Raghavendra, Prakash
2012-Aug-15 10:04 UTC
[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM
Hi Hal,

I was also looking at providing this kind of support in LLVM for capturing parallelism (both explicit and implicit). We had an initial discussion around this, and your proposal comes at the right time. We support such an initiative, and we can work together to get it implemented in LLVM.

However, I have a slightly different view. I think parallelism today does not necessarily mean OpenMP or SIMD; we are in the era of heterogeneous computing. I agree that your primary target was thread-based parallelism, but I think we could extend this while we capture the parallelism in the program.

My idea is to capture parallelism the way you have described, using 'metadata'. I agree with recording the parallel regions in the metadata (as given by the user). However, we could also provide placeholders to record any additional information that the compiler writer needs, such as number of threads, scheduling parameters, chunk size, etc., which are perhaps specific to OpenMP.

The point is that the same parallel loop could be targeted at accelerators (like GPUs) today using another standard, OpenACC. We may get a new standard for capturing and targeting a different kind of parallel device, which could look quite different and would have to be targeted specifically.

Since we are at the intermediate layer, we could be independent of user-level standards like OpenMP, OpenACC, OpenCL, Cilk+, C++AMP, etc. and, at the same time, keep enough information at this stage so that the compiler could generate efficient backend code for the target device. So my suggestion is to keep all this relevant information as 'tags' in the metadata; it is up to the backend to use the information or throw it away. As you said, if the backend ignores it, there should be no harm to the correctness of the final code.

The second point I wanted to make is about the intrinsics. I am not sure why we need these intrinsics at the LLVM level, or why we would need conditional constructs for expressing parallelism. These could be calls made directly to the runtime library at the code generation level.

Again, this is a very good initiative and we would like to see such support in LLVM ASAP.

Prakash Raghavendra
AMD, Bangalore
Email: Prakash.raghavendra at amd.com
Phone: +91-80-3323 0753
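To make the "tags that a backend may use or throw away" idea concrete, here is a minimal Python sketch. It is only a toy model, not LLVM code: the tag names (`num_threads`, `chunk_size`, `acc.device`) and both "backends" are hypothetical. The property it tries to show is the one stated above: a backend that ignores every tag must still produce the same result as one that uses them.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical advisory tags attached to one parallel loop.
# The names are illustrative only; they are not real LLVM metadata.
LOOP_TAGS = {"num_threads": 4, "chunk_size": 3, "acc.device": "gpu"}


def body(i):
    """The loop body; iterations are independent of each other."""
    return i * i


def run_serial(n, tags):
    """A backend that understands none of the tags and ignores them all."""
    return [body(i) for i in range(n)]


def run_threaded(n, tags):
    """A backend that uses the tags it knows and skips the rest."""
    workers = tags.get("num_threads", 1)
    chunk = tags.get("chunk_size", 1)
    chunks = [range(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of the chunks in its results.
        parts = list(pool.map(lambda r: [body(i) for i in r], chunks))
    return [x for part in parts for x in part]
```

Calling `run_serial(10, LOOP_TAGS)` and `run_threaded(10, LOOP_TAGS)` should give the same list; only the execution strategy differs.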
Renato Golin
2012-Aug-15 10:56 UTC
[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM
On 15 August 2012 11:04, Raghavendra, Prakash <Prakash.Raghavendra at amd.com> wrote:
> My idea is to capture parallelism with the way you have said using
> 'metadata'. I agree to record the parallel regions in the metadata (as
> given by the user). However, we could also give placeholders to record
> any additional information that the compiler writer needs like number
> of threads, scheduling parameters, chunk size, etc etc which are
> specific perhaps to OpenMP.

Hi Prakash,

I can't see the silver bullet you do. Different types of parallelism (thread/process/network/heterogeneous) have completely different assumptions, and the same keyword can mean different things depending on the context. If you try to create a magic metadata that will cover everything from OpenCL to OpenMP to MPI, you'll end up having to have namespaces in metadata, which is the same as having N different types of metadata.

If there were a language that could encompass the pure meaning of parallelism (Oracle just failed building that), one could assume many things for each paradigm (OpenCL, mp, etc) much more easily than by trying to fit the already complex rules of C/C++ into the even more complex rules of target/vendor-dependent behaviour. OpenCL is supposed to be less of a problem in that respect, but the target is so different that I wouldn't try to merge OpenCL keywords with OpenMP ones.

True, you can do that with a handful of basic concepts, but the more obscure ones will break your leg. And you *will* have to implement them. Those of us unlucky enough to have had to implement bitfields, anonymous unions, volatile and C++ class layout know what I mean by that.

True, we're talking about the language-agnostic LLVM IR, but you have to remember that LLVM IR is built from real-world languages and is thus full of front-end hacks and fiddles to tell the back end about the ABI decisions in a generic way.

I'm still not convinced there will be a lot of shared keywords between all parallel paradigms, i.e. that you can take the same IR and compile to OpenCL, or OpenMP, or MPI, etc. and it'll just work (and optimise).

--
cheers,
--renato

http://systemcall.org/
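The namespace argument above can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the keys (`omp.schedule`, `acc.gang`, and so on) are invented and are not real LLVM metadata kinds. The sketch only shows that once keys carry a per-standard prefix, a single "generic" tag set splits cleanly into one group per standard, which is in effect N separate kinds of metadata.

```python
from collections import defaultdict

# Hypothetical namespaced tags on one loop; none of these names are
# real LLVM metadata kinds.
TAGS = {
    "omp.schedule": "static",
    "omp.chunk_size": 4,
    "acc.gang": 32,
    "ocl.work_group_size": 64,
}


def split_by_namespace(tags):
    """Group 'namespace.name' keys by their namespace prefix."""
    groups = defaultdict(dict)
    for key, value in tags.items():
        namespace, _, name = key.partition(".")
        groups[namespace][name] = value
    return dict(groups)
```

With the tags above, `split_by_namespace(TAGS)` yields three independent dictionaries (`omp`, `acc`, `ocl`), and a backend for one standard would read only its own group.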
Hal Finkel
2012-Aug-15 16:28 UTC
[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM
On Wed, 15 Aug 2012 10:04:34 +0000 "Raghavendra, Prakash" <Prakash.Raghavendra at amd.com> wrote:
>
> Hi Hal
>
> I was also looking at providing such a support in LLVM for capturing
> (both explicit and implicit) parallelism in LLVM. We had an initial
> discussion around this and your proposal comes at the right time. We
> support such an initiative. We can work together to get this support
> implemented in LLVM.

Great!

>
> But, I have a slight different view. I think today parallelism does
> not necessarily mean OpenMP or SIMD, we are in the area of
> heterogeneous computing. I agree that your primary target was
> thread-based parallelism, but I think we could extend this while we
> capture the parallelism in the program.

I don't think that we have a different view, but my experience with heterogeneous systems is limited, and while I've played around with OpenACC and OpenCL some, I don't feel qualified to design an LLVM support API for those standards. I don't feel that I really understand the use cases well enough. My hope is that others will chime in with ideas on how to best support those models.

I think that the largest difference between shared-memory parallelism (as in OpenMP) and the parallelism targeted by OpenACC, etc. is the memory model. With OpenACC, IIRC, there is an assumption that the accelerator memory is separate and that specific data-copying directives are necessary. Furthermore, with asynchronous-completion support, these data copies are not optional. We could certainly add data-copying intrinsics for this, but the underlying problem is the assumptions that the code makes about the data copies. I'm not sure how to deal with this.

>
> My idea is to capture parallelism with the way you have said using
> 'metadata'. I agree to record the parallel regions in the metadata
> (as given by the user). However, we could also give placeholders to
> record any additional information that the compiler writer needs like
> number of threads, scheduling parameters, chunk size, etc etc which
> are specific perhaps to OpenMP.

I agree, although I think that some of those parameters are generic enough to apply to different parallelization mechanisms. They might also be ignored by mechanisms for which they're irrelevant. I think that making the metadata modular is a good idea. Instead of taking a fixed list of things, for example, we may want to encode name/value pairs.

>
> The point is that the same parallel loop could be targeted by another
> standard to accelerators today (like GPUs) using another standard
> OpenACC. We may get a new standard to capture and target for
> different kind of parallel device, which could look quite different,
> and has to specifically targeted.

Yes. We just need to make sure that we fully capture the semantics of the standards that we're targeting. My idea was to start with OpenMP, make sure that we could fully capture its semantics, and then move on from there.

>
> Since we are at the intermediate layer, we could be independent of
> both user level standards like OpenMP, OpenACC, OpenCL, Cilk+, C++AMP
> etc and at the same time, keep enough information at this stage so
> that the compiler could generate efficient backend code for the
> target device.

Yes, this is, to the extent possible, what I'd like.

> So, my suggestion is to keep all these relevant
> information as 'tags' for metadata and it is up to the backend to use
> or throw the information. As you said, if the backend ignores there
> should not be any harm in correctness of the final code.
>
> Second point I wanted to make was on the intrinsics. I am not sure
> why we need these intrinsics at the LLVM level. I am not sure why we
> would need conditional constructs for expressing parallelism. These
> could be calls directly to the runtime library at the code generation
> level.

These are necessary because of technical requirements. Specifically, metadata variable references do not count as 'uses', so if runtime expressions were not referenced by an intrinsic, those variables would be deleted as dead code. In OpenMP, expressions that reference local variables can appear in the pragmas (such as those that specify the number of threads), and we need to make sure those expressions are not removed prior to lowering. I believe that OpenACC has similar clauses to support. That having been said, I'm certainly open to more generic intrinsics.

>
> Again, this is very good initiative and we would like to support such
> a support in LLVM ASAP.

I am very happy to hear you say that.

 -Hal

>
> Prakash Raghavendra
> AMD, Bangalore
> Email: Prakash.raghavendra at amd.com
> Phone: +91-80-3323 0753
>

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
Hal Finkel
2012-Aug-15 16:36 UTC
[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM
On Wed, 15 Aug 2012 11:56:48 +0100 Renato Golin <rengolin at systemcall.org> wrote:
> On 15 August 2012 11:04, Raghavendra, Prakash
> <Prakash.Raghavendra at amd.com> wrote:
> > My idea is to capture parallelism with the way you have said using
> > 'metadata'. I agree to record the parallel regions in the metadata
> > (as given by the user). However, we could also give placeholders to
> > record any additional information that the compiler writer needs
> > like number of threads, scheduling parameters, chunk size, etc etc
> > which are specific perhaps to OpenMP.
>
> Hi Prakash,
>
> I can't see the silver bullet you do. Different types of parallelism
> (thread/process/network/heterogeneous) have completely different
> assumptions, and the same keyword can mean different things, depending
> on the context. If you try to create a magic metadata that will cover
> from OpenCL to OpenMP to MPI, you'll end up having to have namespaces
> in metadata, which is the same as having N different types of
> metadata.
>
> If there was a language that could encompass the pure meaning of
> parallelism (Oracle just failed building that), one could assume many
> things for each paradigm (OpenCL, mp, etc) much easier than trying to
> fit the already complex rules of C/C++ into the even more complex
> rules of target/vendor-dependent behaviour. OpenCL is supposed to be
> less of a problem in that, but the target is so different that I
> wouldn't try to merge OpenCL keywords with OpenMP ones.
>
> True, you can do that with a handful of basic concepts, but the more
> obscure ones will break your leg. And you *will* have to implement
> them. Those of us unlucky enough to have to have implement bitfields,
> anonymous unions, volatile and C++ class layout know what I mean by
> that.
>
> True, we're talking about the language-agnostic LLVM IR, but you have
> to remember that LLVM IR is build from real-world languages, and thus,
> full of front-end hacks and fiddles to tell the back end about the ABI
> decisions in a generic way.
>
> I'm still not convinced there will be a lot of shared keywords between
> all parallel paradigms, ie. that you can take the same IR and compile
> to OpenCL, or OpenMP, or MPI, etc and it'll just work (and optimise).
>

Renato,

To some extent, I'm not sure that the keywords are the largest problem; the runtime libraries are. OpenMP, OpenACC, Cilk++, etc. all have runtime libraries that provide functions that interact with the respective syntax extensions. Allowing for that in combination with a generic framework might be difficult.

That having been said, from the implementation side, there are certainly commonalities that we should exploit. Basic changes to optimization passes (loop iteration-space changes, LICM, etc.), alias analysis, etc. will be necessary to support many kinds of parallelism, and I think having generic support for that in LLVM is highly preferable to specialized support for many different standards. I think that there will be standard-specific semantics that will need specific modeling, but we should share when possible (and I think that a lot of the basic infrastructure can be shared).

 -Hal

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory