On 9 March 2015 at 17:30, Tobias Grosser <tgrosser at inf.ethz.ch> wrote:
> If my memory is right, one of the critical issues (besides other
> engineering considerations) was that parallelism metadata in LLVM is
> optional and can always be dropped. However, for OpenMP it is sometimes
> incorrect to execute a loop sequentially that has been marked parallel
> in the source code.

Exactly. The fact that metadata goes stale quickly is not a flaw, but a
design decision. If the producing pass is close enough to the consuming
one, you should be fine. If not, then proving legality might be tricky
and time-consuming. The problem with OpenMP and other vectorizer pragmas
is that the metadata is introduced by the front end, and it is a long way
down the pipeline before it is consumed. Having said that, it works well
enough if you keep your code simple.

I'd be interested in knowing what cannot be accomplished in the IR in
terms of metadata for parallelization, and what new constructs would need
to be added to the IR in order to do that. If there is benefit for your
project as well as for OpenMP and our internal vectorizer annotations,
changing the IR wouldn't be impossible. We have done the same for
exception handling...

cheers,
--renato
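For concreteness, this is roughly what the metadata route looks like on
the producer side: a front end or an early pass tags each memory access
in a parallel loop with !llvm.mem.parallel_loop_access pointing at the
loop's identifying MDNode, and a later consumer (e.g. the loop
vectorizer) honors the tag only if it is still intact. The sketch below
is an illustration of the mechanism, not code from anyone in this
thread; it assumes the loop ID node has already been built and attached
as !llvm.loop to the loop latch terminator.

```cpp
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Metadata.h"

using namespace llvm;

// Mark every memory access in one body block of a parallel loop.
// LoopID is the self-referential MDNode that is assumed to already be
// attached to the loop latch terminator as !llvm.loop.
static void markLoopBodyParallel(BasicBlock &Body, MDNode *LoopID) {
  for (Instruction &I : Body)
    if (I.mayReadOrWriteMemory())
      I.setMetadata("llvm.mem.parallel_loop_access", LoopID);
}
```

The fragility Tobias refers to is visible here: any pass that drops this
metadata, or introduces a new memory access without copying it, silently
turns the loop back into a sequential one. That is fine for a hint, but
not for a guarantee the user has been given.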
> On March 9, 2015 at 6:52 PM Renato Golin <renato.golin at linaro.org> wrote:
>
> On 9 March 2015 at 17:30, Tobias Grosser <tgrosser at inf.ethz.ch> wrote:
>> If my memory is right, one of the critical issues (besides other
>> engineering considerations) was that parallelism metadata in LLVM is
>> optional and can always be dropped. However, for OpenMP it is sometimes
>> incorrect to execute a loop sequentially that has been marked parallel
>> in the source code.
>
> Exactly. The fact that metadata goes stale quickly is not a flaw, but a
> design decision. If the producing pass is close enough to the consuming
> one, you should be fine. If not, then proving legality might be tricky
> and time-consuming. The problem with OpenMP and other vectorizer pragmas
> is that the metadata is introduced by the front end, and it is a long way
> down the pipeline before it is consumed. Having said that, it works well
> enough if you keep your code simple.

I know that this was a long discussion and that the "breakability" of
parallel loop info is the result of a design decision. I also believe
that this is a good approach as long as parallelism is not part of the
contract with the user (i.e., the programmer placing explicit parallelism
annotations, or the language designer introducing parallelism into the
semantics of the language). Tobias already mentioned problems with
breaking OpenMP semantics. Similarly, other forms of parallelism, such as
task parallelism, could certainly be represented using metadata, say by
extracting the task code into a function and annotating the call as being
spawned for parallel execution. Again, optimizations could break it,
violating a possible contract with the user.

Alternatively, we could introduce intrinsics, which is what I currently
do. This forbids certain optimizations, such as moving potential memory
accesses into or out of "parallel code sections", and therefore does not
break parallelism as often. The headache this approach causes me is that
basic analyses like dominance and reachability are broken in that
setting: anything computed in one parallel task that is followed by
another parallel task in the CFG does not dominate, or even reach, the
second task. This of course affects the precision and correctness of
optimizations such as redundant-code elimination or GVN.

> I'd be interested in knowing what cannot be accomplished in the IR in
> terms of metadata for parallelization, and what new constructs would need
> to be added to the IR in order to do that. If there is benefit for your
> project as well as for OpenMP and our internal vectorizer annotations,
> changing the IR wouldn't be impossible. We have done the same for
> exception handling...

I understand that parallelism is a very invasive concept and that
introducing it into a so far "sequential" IR will cause severe breakage
and headaches. But if we accept parallelism as a first-class citizen,
then I would prefer making it a core part of the IR. One way to do this
gradually might be to have a separate, parallel IR, say PIR, that is
lowered to regular IR at some point (however that point is chosen).
Existing optimizations could then be gradually moved from the regular IR
phase to the PIR phase where appropriate and useful. Nevertheless, I do
not propose doing such a thing in LLVM right now; I think this might be
an option for a (bigger) research project at first.

I'd be happy to hear further thoughts about that.
Cheers,

---
Kevin Streit
Neugäßchen 2
66111 Saarbrücken

Tel. +49 (0)151 23003245
streit at mailbox.org · http://www.kevinstreit.de
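A minimal sketch of the marker-call idea Kevin describes above (this is
not his implementation; real intrinsics would be declared through
LLVM's intrinsic machinery, and the names __parallel_region_begin and
__parallel_region_end are placeholders rather than an existing runtime
API): bracketing a section with calls to opaque external functions
forces alias analysis to assume the callees may read or write any
memory, so passes like LICM or GVN will not move memory accesses across
the markers. The cost is exactly the one he mentions: dominance and
reachability queries still see one sequential CFG and know nothing
about the tasks.

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// Insert opaque begin/end marker calls around the instruction range
// [First, Last]. Last is assumed not to be the block terminator.
static void bracketParallelRegion(Instruction *First, Instruction *Last,
                                  Module &M) {
  LLVMContext &Ctx = M.getContext();
  FunctionType *FTy =
      FunctionType::get(Type::getVoidTy(Ctx), /*isVarArg=*/false);
  Function *Begin = Function::Create(FTy, Function::ExternalLinkage,
                                     "__parallel_region_begin", &M);
  Function *End = Function::Create(FTy, Function::ExternalLinkage,
                                   "__parallel_region_end", &M);

  IRBuilder<> B(First);                    // begin marker before First
  B.CreateCall(Begin);
  B.SetInsertPoint(Last->getNextNode());   // end marker right after Last
  B.CreateCall(End);
}
```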
----- Original Message -----
> From: "Kevin Streit" <streit at mailbox.org>
> To: "Renato Golin" <renato.golin at linaro.org>, "Tobias Grosser" <tgrosser at inf.ethz.ch>
> Cc: "William Moses" <wmoses at csail.mit.edu>, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, March 10, 2015 3:36:10 AM
> Subject: Re: [LLVMdev] LLVM Parallel IR
>
> [...]
>
> I understand that parallelism is a very invasive concept and that
> introducing it into a so far "sequential" IR will cause severe breakage
> and headaches. But if we accept parallelism as a first-class citizen,
> then I would prefer making it a core part of the IR. One way to do this
> gradually might be to have a separate, parallel IR, say PIR, that is
> lowered to regular IR at some point (however that point is chosen).
> Existing optimizations could then be gradually moved from the regular IR
> phase to the PIR phase where appropriate and useful. Nevertheless, I do
> not propose doing such a thing in LLVM right now; I think this might be
> an option for a (bigger) research project at first.
>
> I'd be happy to hear further thoughts about that.

Part of the issue here is that the benefit of adding IR features, versus
enhancing LLVM (TLI, BasicAA, etc.) to understand more about the
semantics of the runtime calls, is not clear. To play devil's advocate:

- Currently, runtime calls interfere with optimizations such as LICM
  because of the conservative answer BasicAA must give for unknown
  external functions. However, BasicAA already has knowledge of certain
  external functions, and knowledge of these runtime calls could be
  added in the same way.

- We'd like to perform some optimizations, such as duplicate barrier
  removal. However, an optimization can recognize consecutive calls to a
  barrier runtime library function pretty easily -- we don't need
  special IR features for that, perhaps only a good abstraction layer if
  we support different runtimes.

 -Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
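To make the second bullet concrete, a pass in the spirit Hal describes
only needs to pattern-match the runtime entry point. Below is a rough
sketch, assuming the barrier is the libomp entry point __kmpc_barrier
and ignoring the call arguments; a real pass would be more careful about
what it accepts as "consecutive" and would sit behind the abstraction
layer Hal mentions so the same logic can serve different runtimes.

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Remove the second of any two back-to-back calls to the runtime
// barrier within a basic block. Two adjacent barriers synchronize the
// same set of threads twice, so one of them is redundant.
static bool removeAdjacentBarriers(Function &F) {
  bool Changed = false;
  for (BasicBlock &BB : F) {
    CallInst *PrevBarrier = nullptr;
    for (auto It = BB.begin(); It != BB.end();) {
      Instruction &I = *It++;            // advance before a possible erase
      auto *CI = dyn_cast<CallInst>(&I);
      Function *Callee = CI ? CI->getCalledFunction() : nullptr;
      bool IsBarrier = Callee && Callee->getName() == "__kmpc_barrier";
      if (IsBarrier && PrevBarrier) {
        CI->eraseFromParent();           // drop the duplicate barrier
        Changed = true;
        continue;                        // PrevBarrier is still the survivor
      }
      // Any intervening instruction resets the match, keeping this
      // transformation strictly to adjacent barrier calls.
      PrevBarrier = IsBarrier ? CI : nullptr;
    }
  }
  return Changed;
}
```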
On 10 March 2015 at 08:36, Kevin Streit <streit at mailbox.org> wrote:
> Again, optimizations could break it, violating a possible contract with
> the user.

AFAIK, all these annotations are more hints than contracts. Adding
#pragma omp simd to a loop that has forward dependences that cannot be
resolved should not make the loop vectorize (incorrectly). The same goes
for standard OpenMP, threads and other types of parallel computation.

> The headache this approach causes me is that basic analyses like
> dominance and reachability are broken

Yes, intrinsics are heavy-handed and will stop the compiler from being
smart in many ways.

> One way to do this gradually might be to have a separate, parallel IR,
> say PIR, that is lowered to regular IR at some point (however that
> point is chosen).

So, we have Polly, which does exactly that, and it's not a trivial issue.
Adding yet another representation is worrying. The IR Language Reference
is comprehensive and authoritative, and is a great resource for building
IR from ASTs or adding optimisations. For every new representation we
would have to add a similar document, plus another describing how it maps
to the other IRs, so that transformations are well defined. I don't think
we have that for Polly, and adding PIR would double that problem.

I'd be much more open to slow-moving changes to the current IR
(especially if they benefit Polly, too). This path worked very well for
exception handling (which is an equally complex case), and should work
for parallelism. But that's just my opinion. :)

cheers,
--renato
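The kind of loop Renato has in mind might look like the following (a
made-up example, not taken from any code cited in the thread): the
pragma claims the iterations are independent, but the loop-carried
dependence makes naive vectorization produce wrong results, so a
compiler that treated the annotation as an unbreakable contract rather
than a checkable hint would miscompile it.

```cpp
// Each iteration reads the value the previous iteration just wrote, so
// the loop cannot be vectorized as written, despite the annotation.
void prefix_sum(float *a, int n) {
#pragma omp simd
  for (int i = 1; i < n; ++i)
    a[i] += a[i - 1];
}
```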
These discussions bring back memories... :-)

I suggest that everyone who is looking into extending LLVM IR to express
parallel semantics read the list archives from September and October of
2012:

http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-September/thread.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-October/thread.html

looking for all threads with "OpenMP" in their names. Some things were
discussed extensively back then; for example, this proposal from Kevin:

> I understand that parallelism is a very invasive concept and that
> introducing it into a so far "sequential" IR will cause severe breakage
> and headaches. But if we accept parallelism as a first-class citizen,
> then I would prefer making it a core part of the IR. One way to do this
> gradually might be to have a separate, parallel IR, say PIR, that is
> lowered to regular IR at some point (however that point is chosen).
> Existing optimizations could then be gradually moved from the regular IR
> phase to the PIR phase where appropriate and useful. Nevertheless, I do
> not propose doing such a thing in LLVM right now; I think this might be
> an option for a (bigger) research project at first.

was already refuted by Chris:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-October/054042.html

Andrey