On 9 March 2015 at 17:30, Tobias Grosser <tgrosser at inf.ethz.ch> wrote:
> If my memory is right, one of the critical issues (besides other
> engineering considerations) was that parallelism metadata in LLVM is
> optional and can always be dropped. However, for OpenMP it is sometimes
> incorrect to execute a loop sequentially that has been marked parallel
> in the source code.

Exactly. The fact that metadata goes stale quickly is not a flaw, but a
design decision. If the producing pass is close enough to the consuming
one, you should be fine. If not, then proving legality might be tricky
and time-consuming. The problem with OpenMP and other vectorizer pragmas
is that the metadata is introduced by the front end, and it is a long way
down the pipeline before it is consumed. Having said that, it works well
enough if you keep your code simple.

I'd be interested in knowing what cannot be accomplished in the IR in
terms of metadata for parallelization, and what new constructs would need
to be added to the IR in order to do that. If there is benefit for your
project as well as for OpenMP and our internal vectorizer annotations,
changing the IR wouldn't be impossible. We have done the same for
exception handling...

cheers,
--renato
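For concreteness, this is roughly what the metadata route looks like on
the producer side: a front end or an early pass tags each memory access
in a parallel loop with !llvm.mem.parallel_loop_access pointing at the
loop's identifying MDNode, and a later consumer (e.g. the loop
vectorizer) honors the tag only if it is still intact. The sketch below
is an illustration of the mechanism, not code from anyone in this
thread; it assumes the loop ID node has already been built and attached
as !llvm.loop to the loop latch terminator.

```cpp
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Metadata.h"

using namespace llvm;

// Mark every memory access in one body block of a parallel loop.
// LoopID is the self-referential MDNode that is assumed to already be
// attached to the loop latch terminator as !llvm.loop.
static void markLoopBodyParallel(BasicBlock &Body, MDNode *LoopID) {
  for (Instruction &I : Body)
    if (I.mayReadOrWriteMemory())
      I.setMetadata("llvm.mem.parallel_loop_access", LoopID);
}
```

The fragility Tobias refers to is visible here: any pass that drops this
metadata, or introduces a new memory access without copying it, silently
turns the loop back into a sequential one. That is fine for a hint, but
not for a guarantee the user has been given.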
> On March 9, 2015 at 6:52 PM Renato Golin <renato.golin at linaro.org> wrote:
>
> On 9 March 2015 at 17:30, Tobias Grosser <tgrosser at inf.ethz.ch> wrote:
>> If my memory is right, one of the critical issues (besides other
>> engineering considerations) was that parallelism metadata in LLVM is
>> optional and can always be dropped. However, for OpenMP it is sometimes
>> incorrect to execute a loop sequentially that has been marked parallel
>> in the source code.
>
> Exactly. The fact that metadata goes stale quickly is not a flaw, but a
> design decision. If the producing pass is close enough to the consuming
> one, you should be fine. If not, then proving legality might be tricky
> and time-consuming. The problem with OpenMP and other vectorizer pragmas
> is that the metadata is introduced by the front end, and it is a long way
> down the pipeline before it is consumed. Having said that, it works well
> enough if you keep your code simple.

I know that this was a long discussion and that the "breakability" of
parallel loop info is the result of a design decision. I also believe
that this is a good approach as long as parallelism is not part of the
contract with the user (i.e., the programmer placing explicit parallelism
annotations, or the language designer introducing parallelism into the
semantics of the language). Tobias already mentioned problems with
breaking OpenMP semantics. Similarly, other forms of parallelism, such as
task parallelism, could certainly be represented using metadata, say by
extracting the task code into a function and annotating the call as being
spawned for parallel execution. Again, optimizations could break it,
violating a possible contract with the user.

Alternatively, we could introduce intrinsics, which is what I currently
do. This forbids certain optimizations, such as moving potential memory
accesses into or out of "parallel code sections", and therefore does not
break parallelism as often. The headache this approach causes me is that
basic analyses like dominance and reachability are broken in that
setting: anything computed in one parallel task that is followed by
another parallel task in the CFG does not dominate, or even reach, the
second task. This of course affects the precision and correctness of
optimizations such as redundant-code elimination or GVN.

> I'd be interested in knowing what cannot be accomplished in the IR in
> terms of metadata for parallelization, and what new constructs would need
> to be added to the IR in order to do that. If there is benefit for your
> project as well as for OpenMP and our internal vectorizer annotations,
> changing the IR wouldn't be impossible. We have done the same for
> exception handling...

I understand that parallelism is a very invasive concept and that
introducing it into a so far "sequential" IR will cause severe breakage
and headaches. But if we accept parallelism as a first-class citizen,
then I would prefer making it a core part of the IR. One way to do this
gradually might be to have a separate, parallel IR, say PIR, that is
lowered to regular IR at some point (however that point is chosen).
Existing optimizations could then be gradually moved from the regular IR
phase to the PIR phase where appropriate and useful. Nevertheless, I do
not propose doing such a thing in LLVM right now; I think this might be
an option for a (bigger) research project at first.

I'd be happy to hear further thoughts about that.
Cheers,

---
Kevin Streit
Neugäßchen 2
66111 Saarbrücken

Tel. +49 (0)151 23003245
streit at mailbox.org · http://www.kevinstreit.de
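A minimal sketch of the marker-call idea Kevin describes above (this is
not his implementation; real intrinsics would be declared through
LLVM's intrinsic machinery, and the names __parallel_region_begin and
__parallel_region_end are placeholders rather than an existing runtime
API): bracketing a section with calls to opaque external functions
forces alias analysis to assume the callees may read or write any
memory, so passes like LICM or GVN will not move memory accesses across
the markers. The cost is exactly the one he mentions: dominance and
reachability queries still see one sequential CFG and know nothing
about the tasks.

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// Insert opaque begin/end marker calls around the instruction range
// [First, Last]. Last is assumed not to be the block terminator.
static void bracketParallelRegion(Instruction *First, Instruction *Last,
                                  Module &M) {
  LLVMContext &Ctx = M.getContext();
  FunctionType *FTy =
      FunctionType::get(Type::getVoidTy(Ctx), /*isVarArg=*/false);
  Function *Begin = Function::Create(FTy, Function::ExternalLinkage,
                                     "__parallel_region_begin", &M);
  Function *End = Function::Create(FTy, Function::ExternalLinkage,
                                   "__parallel_region_end", &M);

  IRBuilder<> B(First);                    // begin marker before First
  B.CreateCall(Begin);
  B.SetInsertPoint(Last->getNextNode());   // end marker right after Last
  B.CreateCall(End);
}
```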
----- Original Message -----
> From: "Kevin Streit" <streit at mailbox.org>
> To: "Renato Golin" <renato.golin at linaro.org>, "Tobias Grosser" <tgrosser at inf.ethz.ch>
> Cc: "William Moses" <wmoses at csail.mit.edu>, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, March 10, 2015 3:36:10 AM
> Subject: Re: [LLVMdev] LLVM Parallel IR
>
> [...]
>
> I understand that parallelism is a very invasive concept and that
> introducing it into a so far "sequential" IR will cause severe breakage
> and headaches. But if we accept parallelism as a first-class citizen,
> then I would prefer making it a core part of the IR. One way to do this
> gradually might be to have a separate, parallel IR, say PIR, that is
> lowered to regular IR at some point (however that point is chosen).
> Existing optimizations could then be gradually moved from the regular IR
> phase to the PIR phase where appropriate and useful. Nevertheless, I do
> not propose doing such a thing in LLVM right now; I think this might be
> an option for a (bigger) research project at first.
>
> I'd be happy to hear further thoughts about that.

Part of the issue here is that the benefit of adding IR features, versus
enhancing LLVM (TLI, BasicAA, etc.) to understand more about the
semantics of the runtime calls, is not clear. To play devil's advocate:

- Currently, runtime calls interfere with optimizations such as LICM
  because of the conservative answer BasicAA must give for unknown
  external functions. However, BasicAA already has knowledge of certain
  external functions, and knowledge of these runtime calls could be
  added in the same way.

- We'd like to perform some optimizations, such as duplicate barrier
  removal. However, an optimization can recognize consecutive calls to a
  barrier runtime library function pretty easily -- we don't need
  special IR features for that, perhaps only a good abstraction layer if
  we support different runtimes.

 -Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
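To make the second bullet concrete, a pass in the spirit Hal describes
only needs to pattern-match the runtime entry point. Below is a rough
sketch, assuming the barrier is the libomp entry point __kmpc_barrier
and ignoring the call arguments; a real pass would be more careful about
what it accepts as "consecutive" and would sit behind the abstraction
layer Hal mentions so the same logic can serve different runtimes.

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Remove the second of any two back-to-back calls to the runtime
// barrier within a basic block. Two adjacent barriers synchronize the
// same set of threads twice, so one of them is redundant.
static bool removeAdjacentBarriers(Function &F) {
  bool Changed = false;
  for (BasicBlock &BB : F) {
    CallInst *PrevBarrier = nullptr;
    for (auto It = BB.begin(); It != BB.end();) {
      Instruction &I = *It++;            // advance before a possible erase
      auto *CI = dyn_cast<CallInst>(&I);
      Function *Callee = CI ? CI->getCalledFunction() : nullptr;
      bool IsBarrier = Callee && Callee->getName() == "__kmpc_barrier";
      if (IsBarrier && PrevBarrier) {
        CI->eraseFromParent();           // drop the duplicate barrier
        Changed = true;
        continue;                        // PrevBarrier is still the survivor
      }
      // Any intervening instruction resets the match, keeping this
      // transformation strictly to adjacent barrier calls.
      PrevBarrier = IsBarrier ? CI : nullptr;
    }
  }
  return Changed;
}
```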
On 10 March 2015 at 08:36, Kevin Streit <streit at mailbox.org> wrote:
> Again, optimizations could break it, violating a possible contract with
> the user.

AFAIK, all these annotations are more hints than contracts. Adding
#pragma omp simd to a loop that has forward dependences that cannot be
resolved should not make the loop vectorize (incorrectly). The same goes
for standard OpenMP, threads and other types of parallel computation.

> The headache this approach causes me is that basic analyses like
> dominance and reachability are broken

Yes, intrinsics are heavy-handed and will stop the compiler from being
smart in many ways.

> One way to do this gradually might be to have a separate, parallel IR,
> say PIR, that is lowered to regular IR at some point (however that
> point is chosen).

So, we have Polly, which does exactly that, and it's not a trivial issue.
Adding yet another representation is worrying. The IR Language Reference
is comprehensive and authoritative, and is a great resource for building
IR from ASTs or adding optimisations. For every new representation we
would have to add a similar document, plus another describing how it maps
to the other IRs, so that transformations are well defined. I don't think
we have that for Polly, and adding PIR would double that problem.

I'd be much more open to slow-moving changes to the current IR
(especially if they benefit Polly, too). This path worked very well for
exception handling (which is an equally complex case), and should work
for parallelism. But that's just my opinion. :)

cheers,
--renato
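The kind of loop Renato has in mind might look like the following (a
made-up example, not taken from any code cited in the thread): the
pragma claims the iterations are independent, but the loop-carried
dependence makes naive vectorization produce wrong results, so a
compiler that treated the annotation as an unbreakable contract rather
than a checkable hint would miscompile it.

```cpp
// Each iteration reads the value the previous iteration just wrote, so
// the loop cannot be vectorized as written, despite the annotation.
void prefix_sum(float *a, int n) {
#pragma omp simd
  for (int i = 1; i < n; ++i)
    a[i] += a[i - 1];
}
```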
These discussions bring back memories... :-)

I suggest that everyone who is looking into extending LLVM IR to express
parallel semantics read the list archives from September and October of
2012:

http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-September/thread.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-October/thread.html

looking for all threads with "OpenMP" in their names. Some things were
discussed extensively back then; for example, this proposal from Kevin:

> I understand that parallelism is a very invasive concept and that
> introducing it into a so far "sequential" IR will cause severe breakage
> and headaches. But if we accept parallelism as a first-class citizen,
> then I would prefer making it a core part of the IR. One way to do this
> gradually might be to have a separate, parallel IR, say PIR, that is
> lowered to regular IR at some point (however that point is chosen).
> Existing optimizations could then be gradually moved from the regular IR
> phase to the PIR phase where appropriate and useful. Nevertheless, I do
> not propose doing such a thing in LLVM right now; I think this might be
> an option for a (bigger) research project at first.

was already refuted by Chris:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-October/054042.html

Andrey