I'm part of a research group at MIT looking to create an extension of LLVM that inherently allows one to nicely code a parallel loop.

Most parallel frameworks tend to take the body of a parallel loop and stick it inside of a function for the parallel runtime to call when appropriate. However, this makes optimizations significantly more difficult, as most compiler optimizations tend to be intraprocedural. The goal of this parallel IR is to allow LLVM to more easily optimize parallel code, which often gets the short end of the optimization stick.

Long story short: I was wondering if anyone in the LLVM community has already begun a similar project or knows of something that begins to accomplish this.

Additionally, does anyone who has worked with parallelism in LLVM have any specific comments or suggestions regarding this project?

Sincerely,
William Moses
MIT Computer Science and Artificial Intelligence Laboratory
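For concreteness, here is a minimal sketch of the outlining William describes, at the IR level. The runtime entry point @parallel_for and the outlined @body function are made-up names, not any real runtime's API; the point is only that once the loop body lives in its own function, intraprocedural loop optimizations in the caller no longer see it.

; Hypothetical lowering of "for (i = 0; i < n; i++) a[i] += 1.0f":
; the body is outlined into its own function for a (made-up) runtime to invoke.
define void @body(i64 %i, float* %a) {
entry:
  %p  = getelementptr inbounds float, float* %a, i64 %i
  %v  = load float, float* %p
  %v1 = fadd float %v, 1.0
  store float %v1, float* %p
  ret void
}

; In the caller the loop is replaced by a single opaque call, so LICM,
; vectorization, fusion, etc. have nothing left to work on here.
define void @caller(float* %a, i64 %n) {
entry:
  call void @parallel_for(void (i64, float*)* @body, float* %a, i64 %n)
  ret void
}

; Hypothetical runtime entry point.
declare void @parallel_for(void (i64, float*)*, float*, i64)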
On 09.03.2015 16:58, William Moses wrote:
> I'm part of a research group at MIT looking to create an extension of
> LLVM that inherently allows one to nicely code a parallel loop.
> [...]

Hi William,

this sounds like a very interesting idea, but it is also not an easy problem.

You will probably want to have a look at how LLVM models parallel loops using loop-level metadata. At the time this metadata was introduced (see the archives), we had a longer debate on how to model parallelism. The first ideas suggested simply adding a metadata flag to the loop header to indicate that the loop is parallel. The problem with such an approach is that scalar transformations might introduce memory operations that destroy parallelism. Since those transformations commonly have no idea about loop-level metadata, the metadata needs to be designed such that newly introduced scalar dependences automatically invalidate it.

There were also relevant discussions when LLVM's OpenMP support was introduced. People considered using some kind of parallelism metadata and introducing the OpenMP runtime calls only later, at the IR level, in the hope of being able to perform useful optimizations on the IR. If I remember correctly, one of the critical issues (besides other engineering considerations) was that parallelism metadata in LLVM is optional and can always be dropped. For OpenMP, however, it is sometimes incorrect to execute a loop sequentially that has been marked parallel in the source code.

Best regards,
Tobias
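For reference, this is roughly what the parallel-loop annotation Tobias refers to looks like in the IR: the loop latch branch carries !llvm.loop metadata with a self-referencing id, and every memory access inside the loop carries !llvm.mem.parallel_loop_access pointing at that id. Only the loop body fragment is shown; the surrounding function, names, and types are illustrative.

; Sketch of a loop marked parallel via metadata.
for.body:
  %i      = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
  %addr   = getelementptr inbounds float, float* %a, i64 %i
  %val    = load float, float* %addr, !llvm.mem.parallel_loop_access !0
  %dbl    = fadd float %val, %val
  store float %dbl, float* %addr, !llvm.mem.parallel_loop_access !0
  %i.next = add nuw nsw i64 %i, 1
  %done   = icmp eq i64 %i.next, %n
  br i1 %done, label %for.end, label %for.body, !llvm.loop !0

!0 = distinct !{!0}

A loop is only considered parallel if every memory access inside it carries the annotation, which is what makes the scheme robust against passes that are unaware of it.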
Hi William,

A similar discussion was recently started on this list by a grad student at MIT (the student was Lefteris Ioannidis and the discussion was titled "LLVM parallel annotations"), so you might also want to participate in that thread.

I am working on automatic parallelization in LLVM and have frequently come to the point during my research where I wished I had started by discussing and implementing a parallel extension of LLVM IR. Unfortunately I did not, or at least not in the way it should be implemented, so I am happy to hear that someone plans to work on this.

As you might be aware, there is already a representation for parallel loops in LLVM IR. This representation is based on metadata added to both the loop (llvm.loop <id>) and to all memory accesses within that loop which are safe (llvm.mem.parallel_loop_access <id>). The loop with id <id> can then be executed in parallel (i.e., every iteration thereof), provided all potential memory accesses it contains carry a corresponding mark with the same id. Optimizations which are not aware of the parallelism thus potentially break it, but the result would be sequential execution, which is at least sound. You will find details and discussions about those annotations in the archives of this list and in the sources (e.g., LoopInfo).

Although I understand why such a minimally invasive approach to parallelism in the IR is desirable for most use cases, I think that parallelism is invasive by nature and would have to influence most optimizations. During my PhD work I always wished for a completely parallel IR, with all analyses and transformations being aware of parallelism, at least those that have to be. Such an IR would be usable as a target for parallel source languages like Cilk(+) and OpenMP, or simply for automatic parallelization. For that, however, it would have to be able to express parallelism not only for loops. Instead, one should also be able to "fork" arbitrary code regions for parallel execution. Think of a Cilk spawn.

I think it should be possible to come up with a minimal, clearly defined set of parallelism constructs in the IR. The goal should be to be able to map the most important forms of parallelism to these constructs. Completely parallel loops (DOALL loops) are just one form. A first guide might be OpenMP constructs, including parallel sections and tasks.

I'd be interested in hearing more details about your ideas in that direction.

Cheers,

---
Kevin Streit
Neugäßchen 2
66111 Saarbrücken
Tel. +49 (0)151 23003245
streit at mailbox.org · http://www.kevinstreit.de

> On March 9, 2015 at 4:58 PM William Moses <wmoses at csail.mit.edu> wrote:
> I'm part of a research group at MIT looking to create an extension of LLVM
> that inherently allows one to nicely code a parallel loop.
> [...]
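To illustrate the invalidation behaviour Tobias and Kevin describe, here is the earlier sketch after a hypothetical pass has introduced an extra store that it did not annotate (the loop id !0 is the same self-referencing node as before):

for.body:
  %i      = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
  %addr   = getelementptr inbounds float, float* %a, i64 %i
  %val    = load float, float* %addr, !llvm.mem.parallel_loop_access !0
  %dbl    = fadd float %val, %val
  store float %dbl, float* %addr, !llvm.mem.parallel_loop_access !0
  ; introduced by a later pass, e.g. spilling a scalar to a stack slot
  ; (%tmp would be an alloca in the entry block, not shown); note the
  ; missing !llvm.mem.parallel_loop_access annotation
  store float %dbl, float* %tmp
  %i.next = add nuw nsw i64 %i, 1
  %done   = icmp eq i64 %i.next, %n
  br i1 %done, label %for.end, label %for.body, !llvm.loop !0

Because one access in the loop no longer carries the annotation, a consumer such as the loop vectorizer has to assume the loop is not parallel and treat it sequentially. That fallback is sound for automatically discovered parallelism but, as Tobias notes above, is not always acceptable for loops that OpenMP requires to run in parallel.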
On 9 March 2015 at 17:30, Tobias Grosser <tgrosser at inf.ethz.ch> wrote:
> If I remember correctly, one of the critical issues (besides other
> engineering considerations) was that parallelism metadata in LLVM is
> optional and can always be dropped. For OpenMP, however, it is sometimes
> incorrect to execute a loop sequentially that has been marked parallel
> in the source code.

Exactly. The fact that metadata goes stale quickly is not a flaw but a design decision. If the producing pass is close enough to the consuming one, you should be OK. If not, then proving legality might be tricky and time consuming. The problem with OpenMP and other vectorizer pragmas is that the metadata is introduced by the front-end, and it's a long way down the pipeline before it is consumed. Having said that, it works OK if you keep your code simple.

I'd be interested in knowing what cannot be accomplished with metadata for parallelization, and what new constructs would need to be added to the IR in order to do that. If there is benefit for your project at the same time as for OpenMP and our internal vectorizer annotations, changing the IR wouldn't be impossible. We have done the same for exception handling...

cheers,
--renato
Hi William,

Currently, LLVM lowers parallel constructs of OpenMP at the front-end level to runtime calls; although simple, this ad-hoc solution prevents high-level reasoning about parallel languages while possibly introducing semantic inconsistencies into existing sequential program analyses. Worse, users implicitly rely upon the lack of interprocedural analyses within many compilers to preserve the semantics of programs even in the presence of code optimizations.

As part of my PhD work, I introduced a parallel intermediate representation extension for parallel compilers which I called *SPIRE*. SPIRE was successfully applied to automatically parallelize sequential programs within the PIPS source-to-source compiler. (Here is a link to my PhD thesis: https://tel.archives-ouvertes.fr/pastel-00935483/document.)

As part of my postdoc work at the University of Houston, I am currently applying SPIRE to LLVM in order to optimize PGAS languages, specifically OpenSHMEM. The idea is to be able to recognize OpenSHMEM constructs and represent them explicitly in the LLVM IR using SPIRE, to take advantage of the sequential optimizations offered by LLVM. The ultimate goal is to develop new parallel optimizations based on SPIRE(LLVM IR). The tricky part here is to prove that the enabled optimizations will generate correct code. I am using the SPIRE formal operational semantics I developed to prove the correctness of optimizations performed on parallel programs.

I can send you a submitted paper regarding this work on LLVM if you are interested.

Best,

--
Dounia KHALDI
Postdoctoral Fellow
University of Houston
Department of Computer Science
501 Philip G. Hoffman Hall
Houston, TX 77204-3010

On Mon, Mar 9, 2015 at 10:58 AM, William Moses <wmoses at csail.mit.edu> wrote:
> I'm part of a research group at MIT looking to create an extension of LLVM
> that inherently allows one to nicely code a parallel loop.
> [...]
Hi William,

as has been mentioned by Kevin, the idea of building an explicit parallel IR and associated tools has its merits. With the Insieme Project (http://insieme-compiler.org) we went down that road and built an explicit parallel high-level IR for analysing, manipulating, and tuning parallel codes. In particular, we managed to design a compact IR unifying various different parallel language extensions and APIs, including OpenMP, Cilk, OpenCL and, to some extent, MPI.

In the course of this work we came to believe that, in the general case, compiler-supported coordination and tuning of parallelism in programs is beyond the scope of conventional low-level IRs. If you are interested in our experiences, concepts, design and results, an overview can be found in [1] and a rather extensive description and formalization in the associated thesis [2]. Maybe those can provide you some insights regarding potential alternative approaches for your work.

Best,
Herbert

[1] http://dl.acm.org/citation.cfm?id=2523721.2523727
[2] http://www.dps.uibk.ac.at/~csaf7445/pub/phd_thesis_jordan.pdf - in particular Chapter 3

On 2015-03-17 01:12, Dounia Khaldi wrote:
> Currently, LLVM lowers parallel constructs of OpenMP at the front-end
> level to runtime calls; although simple, this ad-hoc solution prevents
> high-level reasoning about parallel languages while possibly
> introducing semantic inconsistencies in existing sequential program
> analyses.
> [...]