thr3ads.net - llvm dev - [LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM [Aug 2012]

Raghavendra, Prakash

2012-Aug-15 10:04 UTC

[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM

Hi Hal

I was also looking at providing such support in LLVM for capturing
(both explicit and implicit) parallelism in LLVM. We had an initial
discussion around this and your proposal comes at the right time. We
support such an initiative. We can work together to get this support
implemented in LLVM.

But, I have a slightly different view. I think today parallelism does
not necessarily mean OpenMP or SIMD; we are in the era of
heterogeneous computing. I agree that your primary target was
thread-based parallelism, but I think we could extend this while we
capture the parallelism in the program.

My idea is to capture parallelism with the way you have said using
'metadata'. I agree to record the parallel regions in the metadata (as
given by the user). However, we could also give placeholders to record
any additional information that the compiler writer needs like number
of threads, scheduling parameters, chunk size, etc etc which are
specific perhaps to OpenMP.

The point is that the same parallel loop could be targeted by another
standard to accelerators today (like GPUs) using another standard,
OpenACC. We may get a new standard to capture and target a different
kind of parallel device, which could look quite different, and has to
be specifically targeted.

Since we are at the intermediate layer, we could be independent of
both user level standards like OpenMP, OpenACC, OpenCL, Cilk+, C++AMP
etc and at the same time, keep enough information at this stage so
that the compiler could generate efficient backend code for the target
device. So, my suggestion is to keep all this relevant information as
'tags' for metadata and it is up to the backend to use or discard the
information. As you said, if the backend ignores it, there should not
be any harm in correctness of the final code.

Second point I wanted to make was on the intrinsics. I am not sure why
we need these intrinsics at the LLVM level. I am not sure why we would
need conditional constructs for expressing parallelism. These could be
calls directly to the runtime library at the code generation level.

Again, this is a very good initiative and we would like to see such
support in LLVM ASAP.

Prakash Raghavendra
AMD, Bangalore
Email: Prakash.raghavendra at amd.com
Phone: +91-80-3323 0753

Renato Golin

2012-Aug-15 10:56 UTC

On 15 August 2012 11:04, Raghavendra, Prakash
<Prakash.Raghavendra at amd.com> wrote:
> My idea is to capture parallelism with the way you have said using
> ‘metadata’. I agree to record the parallel regions in the metadata
> (as given by the user). However, we could also give placeholders to
> record any additional information that the compiler writer needs
> like number of threads, scheduling parameters, chunk size, etc etc
> which are specific perhaps to OpenMP.

Hi Prakash,

I can't see the silver bullet you do. Different types of parallelism
(thread/process/network/heterogeneous) have completely different
assumptions, and the same keyword can mean different things, depending
on the context. If you try to create a magic metadata that will cover
from OpenCL to OpenMP to MPI, you'll end up having to have namespaces
in metadata, which is the same as having N different types of
metadata.

If there was a language that could encompass the pure meaning of
parallelism (Oracle just failed building that), one could assume many
things for each paradigm (OpenCL, mp, etc) much easier than trying to
fit the already complex rules of C/C++ into the even more complex
rules of target/vendor-dependent behaviour. OpenCL is supposed to be
less of a problem in that, but the target is so different that I
wouldn't try to merge OpenCL keywords with OpenMP ones.

True, you can do that with a handful of basic concepts, but the more
obscure ones will break your leg. And you *will* have to implement
them. Those of us unlucky enough to have to implement bitfields,
anonymous unions, volatile and C++ class layout know what I mean by
that.

True, we're talking about the language-agnostic LLVM IR, but you have
to remember that LLVM IR is built from real-world languages, and thus,
full of front-end hacks and fiddles to tell the back end about the ABI
decisions in a generic way.

I'm still not convinced there will be a lot of shared keywords between
all parallel paradigms, ie. that you can take the same IR and compile
to OpenCL, or OpenMP, or MPI, etc and it'll just work (and optimise).

-- 
cheers,
--renato

http://systemcall.org/

Hal Finkel

2012-Aug-15 16:28 UTC

On Wed, 15 Aug 2012 10:04:34 +0000
"Raghavendra, Prakash" <Prakash.Raghavendra at amd.com> wrote:
> 
> Hi Hal
> 
> I was also looking at providing such support in LLVM for capturing
> (both explicit and implicit) parallelism in LLVM. We had an initial
> discussion around this and your proposal comes at the right time. We
> support such an initiative. We can work together to get this support
> implemented in LLVM.
Great!
> 
> But, I have a slightly different view. I think today parallelism does
> not necessarily mean OpenMP or SIMD; we are in the era of
> heterogeneous computing. I agree that your primary target was
> thread-based parallelism, but I think we could extend this while we
> capture the parallelism in the program.
I don't think that we have a different view, but my experience with
heterogeneous systems is limited, and while I've played around with
OpenACC and OpenCL some, I don't feel qualified to design an LLVM
support API for those standards. I don't feel that I really understand
the use cases well enough. My hope is that others will chime in with
ideas on how to best support those models.

I think that the largest difference between shared-memory parallelism
(as in OpenMP) and the parallelism targeted by OpenACC, etc. is the
memory model. With OpenACC, IIRC, there is an assumption that the
accelerator memory is separate and specific data-copying directives are
necessary. Furthermore, with asynchronous-completion support, these
data copies are not optional. We could certainly add data-copying
intrinsics for this, but the underlying problem is code assumptions
about the data copies. I'm not sure how to deal with this.
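
As an illustration of the separate-memory model described above (not
part of the original message), here is a minimal C sketch using the
OpenACC `data` construct. The function name `scale` and its parameters
are hypothetical. With an OpenACC-enabled compiler the copies become
part of the program's behaviour; a compiler without OpenACC support
ignores the pragmas and runs the loop on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Scale src into dst (hypothetical example function).
   With an OpenACC compiler, the data region copies src to the
   accelerator on entry and copies dst back to the host on exit.
   Without OpenACC, the pragmas are ignored and the loop runs on
   the host. */
static void scale(const float *src, float *dst, size_t n, float k)
{
    assert(src != NULL && dst != NULL);

    #pragma acc data copyin(src[0:n]) copyout(dst[0:n])
    {
        #pragma acc parallel loop
        for (size_t i = 0; i < n; ++i)
            dst[i] = k * src[i];
    }
}
```

Any code between the data region and the loop has to respect where the
current copy of each array lives, which is the kind of assumption the
paragraph above is worried about.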
> 
> My idea is to capture parallelism with the way you have said using
> 'metadata'. I agree to record the parallel regions in the metadata
> (as given by the user). However, we could also give placeholders to
> record any additional information that the compiler writer needs like
> number of threads, scheduling parameters, chunk size, etc etc which
> are specific perhaps to OpenMP.
I agree, although I think that some of those parameters are generic
enough to apply to different parallelization mechanisms. They might
also be ignored by some mechanisms for which they're irrelevant. We
should make the metadata modular; I think that is a good idea. Instead
of taking a fixed list of things, for example, we may want to encode
name/value pairs.
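
To make the name/value idea concrete (this sketch is an illustration,
not something proposed in the thread), one could picture each parallel
region carrying a list of tags that a backend looks up by name. The
type `par_tag`, the function `par_tag_get`, and the tag names are all
hypothetical, and this is plain C rather than LLVM's metadata API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One hypothetical name/value tag attached to a parallel region. */
struct par_tag {
    const char *name;
    long value;
};

/* Look up a tag by name. If the tag is absent, return the caller's
   fallback, so a backend that asks for a tag nobody recorded still
   gets a usable value, and tags it never asks for are simply
   ignored. */
static long par_tag_get(const struct par_tag *tags, size_t n,
                        const char *name, long fallback)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(tags[i].name, name) == 0)
            return tags[i].value;
    }
    return fallback;
}
```

An OpenMP-style backend might query "num_threads" and "chunk_size",
while an accelerator backend could skip both and query a tag of its
own; neither needs to know the other's tag names.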
> 
> The point is that the same parallel loop could be targeted by another
> standard to accelerators today (like GPUs) using another standard,
> OpenACC. We may get a new standard to capture and target a
> different kind of parallel device, which could look quite different,
> and has to be specifically targeted.
Yes. We just need to make sure that we fully capture the semantics of
the standards that we're targeting. My idea was to start with OpenMP,
and make sure that we could fully capture its semantics, and then move
on from there.
> 
> Since we are at the intermediate layer, we could be independent of
> both user level standards like OpenMP, OpenACC, OpenCL, Cilk+, C++AMP
> etc and at the same time, keep enough information at this stage so
> that the compiler could generate efficient backend code for the
> target device.
Yes, this is, to the extent possible, what I'd like.
> So, my suggestion is to keep all this relevant
> information as 'tags' for metadata and it is up to the backend to
> use or discard the information. As you said, if the backend ignores
> it, there should not be any harm in correctness of the final code.
> 
> Second point I wanted to make was on the intrinsics. I am not sure
> why we need these intrinsics at the LLVM level. I am not sure why we
> would need conditional constructs for expressing parallelism. These
> could be calls directly to the runtime library at the code generation
> level.
These are necessary because of technical requirements; specifically,
metadata variable references do not count as 'uses', and so were
runtime expressions not referenced by an intrinsic, those variables
would be deleted as dead code. In OpenMP, expressions which reference
local variables can appear in the pragmas (such as those which specify
the number of threads), and we need to make sure those expressions are
not removed prior to lowering. I believe that OpenACC has similar
clauses to support.
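
A source-level illustration of such an expression (added here as an
example, not taken from the original message): in the C sketch below,
the hypothetical local `nthreads` is read only by the `num_threads`
clause. If the pragma were carried purely as metadata, nothing else in
the function would count as a use of that value.

```c
#include <assert.h>

/* Sum the integers 0..n-1 (hypothetical example function).
   The num_threads clause is a runtime expression that reads the
   local variable nthreads. No ordinary statement uses nthreads, so
   its computation must be kept alive until the parallel region is
   lowered. Without OpenMP enabled, the pragma is ignored and the
   loop runs serially with the same result. */
static long sum_upto(int n, int load_factor)
{
    assert(n >= 0 && load_factor > 0);
    int nthreads = 2 * load_factor;   /* referenced only by the pragma */
    long total = 0;

    #pragma omp parallel for num_threads(nthreads) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += i;

    return total;
}
```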

That having been said, I'm certainly open to more generic intrinsics.
> 
> Again, this is a very good initiative and we would like to see such
> support in LLVM ASAP.
I am very happy to hear you say that.

 -Hal
> 
> Prakash Raghavendra
> AMD, Bangalore
> Email: Prakash.raghavendra at amd.com
> Phone: +91-80-3323 0753
> 


-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

Hal Finkel

2012-Aug-15 16:36 UTC

On Wed, 15 Aug 2012 11:56:48 +0100
Renato Golin <rengolin at systemcall.org> wrote:
> On 15 August 2012 11:04, Raghavendra, Prakash
> <Prakash.Raghavendra at amd.com> wrote:
> > My idea is to capture parallelism with the way you have said using
> > ‘metadata’. I agree to record the parallel regions in the metadata
> > (as given by the user). However, we could also give placeholders to
> > record any additional information that the compiler writer needs
> > like number of threads, scheduling parameters, chunk size, etc etc
> > which are specific perhaps to OpenMP.
> 
> Hi Prakash,
> 
> I can't see the silver bullet you do. Different types of parallelism
> (thread/process/network/heterogeneous) have completely different
> assumptions, and the same keyword can mean different things, depending
> on the context. If you try to create a magic metadata that will cover
> from OpenCL to OpenMP to MPI, you'll end up having to have namespaces
> in metadata, which is the same as having N different types of
> metadata.
> 
> If there was a language that could encompass the pure meaning of
> parallelism (Oracle just failed building that), one could assume many
> things for each paradigm (OpenCL, mp, etc) much easier than trying to
> fit the already complex rules of C/C++ into the even more complex
> rules of target/vendor-dependent behaviour. OpenCL is supposed to be
> less of a problem in that, but the target is so different that I
> wouldn't try to merge OpenCL keywords with OpenMP ones.
> 
> True, you can do that with a handful of basic concepts, but the more
> obscure ones will break your leg. And you *will* have to implement
> them. Those of us unlucky enough to have to implement bitfields,
> anonymous unions, volatile and C++ class layout know what I mean by
> that.
> 
> True, we're talking about the language-agnostic LLVM IR, but you have
> to remember that LLVM IR is built from real-world languages, and thus,
> full of front-end hacks and fiddles to tell the back end about the ABI
> decisions in a generic way.
> 
> I'm still not convinced there will be a lot of shared keywords between
> all parallel paradigms, ie. that you can take the same IR and compile
> to OpenCL, or OpenMP, or MPI, etc and it'll just work (and optimise).
> 
Renato,

To some extent, I'm not sure that the keywords are the largest problem,
but rather the runtime libraries. OpenMP, OpenACC, Cilk++, etc. all
have runtime libraries that provide functions that interact with the
respective syntax extensions. Allowing for that in combination with a
generic framework might be difficult.

That having been said, from the implementation side, there are
certainly commonalities that we should exploit. Basic changes to
optimization passes (loop iteration-space changes, LICM, etc.),
alias analysis, etc. will be necessary to support many kinds of
parallelism, and I think having generic support for that in LLVM is
highly preferable to specialized support for many different standards.
I think that there will be standard-specific semantics that will need
specific modeling, but we should share when possible (and I think that
a lot of the basic infrastructure can be shared).

 -Hal

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
