Andrey Bokhanko
2012-Oct-02 10:09 UTC
[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)
Chris,

> My comment was mostly in response to the Intel proposal, which effectively
> translates OpenMP pragmas directly into llvm intrinsics + metadata. I can't
> imagine a way to make this work *correctly* without massive changes to the
> optimizer.

There are four ways to make this work correctly:

1) Ignore OpenMP-related intrinsics and associated metadata. Least effort, least benefit (no OpenMP support). Yet OpenMP programs compile correctly, as if no pragmas were present -- including *exactly the same* number of routines and the same call graph (thanks to no procedurization in the front-end). The OpenMP specification allows such compilation. This might be the choice for targets that don't support an OpenMP runtime library.

2) Perform procedurization (including insertion of all runtime calls -- no intrinsics are left after this step) at the very start of the LLVM optimizer. No changes to optimizations, but no opportunity to optimize parallel code. As cheap and easy as OpenMP support can be. This might be a good choice for an initial implementation.

3) Do some carefully chosen optimizations before procedurization; do the heavy lifting (like loop restructuring optimizations) after procedurization. Some effort, a lot of benefit. This is essentially what is described in [Tian05] (referenced in our proposal).

4) Make all optimizations thread-aware. The best approach in theory; no existing compiler goes that far.

Our proposal makes all these choices possible. One can implement 1) in half an hour, yet keep the door open for a better solution.

Yours,
Andrey
---
Software Engineer
Intel Compiler Team
Intel Corp.
dag at cray.com
2012-Oct-02 19:47 UTC
[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)
Andrey Bokhanko <andreybokhanko at gmail.com> writes:

> There are four ways to make this work correctly:
>
> 1) Ignore OpenMP-related intrinsics and associated metadata. Least
> effort, least benefit (no OpenMP support). Yet OpenMP programs compile
> correctly, as if no pragmas were present -- including *exactly the
> same* number of routines and the same call graph (thanks to no
> procedurization in the front-end). The OpenMP specification allows
> such compilation. This might be the choice for targets that don't
> support an OpenMP runtime library.

Actually, it is perfectly possible to have a program with OpenMP directives that is NOT valid when those directives are ignored. In other words, it's possible to write a legal OpenMP program that relies on parallelism to function correctly. In practice this doesn't happen in production codes, but it's wrong to say the compiler can just ignore directives with no problems whatsoever.

> 2) Perform procedurization (including insertion of all runtime calls --
> no intrinsics are left after this step) at the very start of the LLVM
> optimizer. No changes to optimizations, but no opportunity to optimize
> parallel code. As cheap and easy as OpenMP support can be. This might
> be a good choice for an initial implementation.

This should work fine, but then why support intrinsics in LLVM at all? I understand you're talking about an initial implementation.

> 3) Do some carefully chosen optimizations before procedurization; do
> the heavy lifting (like loop restructuring optimizations) after
> procedurization. Some effort, a lot of benefit. This is essentially
> what is described in [Tian05] (referenced in our proposal).

What are the important optimizations?

> 4) Make all optimizations thread-aware. The best approach in theory;
> no existing compiler goes that far.

This is probably not practical. It may be fine in academia, but in production environments the resources don't exist, unfortunately.

-David
Chris Lattner
2012-Oct-03 05:26 UTC
[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)
On Oct 2, 2012, at 3:09 AM, Andrey Bokhanko <andreybokhanko at gmail.com> wrote:

> Chris,
>
>> My comment was mostly in response to the Intel proposal, which effectively
>> translates OpenMP pragmas directly into llvm intrinsics + metadata. I can't
>> imagine a way to make this work *correctly* without massive changes to the
>> optimizer.
>
> There are four ways to make this work correctly:
>
> 1) Ignore OpenMP-related intrinsics and associated metadata. Least
> effort, least benefit (no OpenMP support).

This is trivially true, but the entire point of supporting OpenMP in the IR would be to have some sort of late "procedurization" pass that actually exposes the parallelism through some runtime. Saying that we could just ignore this is silly: if we wanted to ignore OpenMP, we could do that in the frontend with far less complexity. In fact, we're already done! ;-)

> 2) Perform procedurization (including insertion of all runtime calls --
> no intrinsics are left after this step) at the very start of the LLVM
> optimizer. No changes to optimizations, but no opportunity to optimize
> parallel code. As cheap and easy as OpenMP support can be. This might
> be a good choice for an initial implementation.
>
> 3) Do some carefully chosen optimizations before procedurization; do
> the heavy lifting (like loop restructuring optimizations) after
> procedurization. Some effort, a lot of benefit. This is essentially
> what is described in [Tian05] (referenced in our proposal).

I think you're missing the point here. The whole idea of LLVM IR is that it doesn't have various "forms" that are valid at different points in the optimizer. Even very late lowering passes like strength reduction are pure IR-to-IR passes that do not introduce special forms. This is in stark contrast to other compilers (e.g. Open64), which have several levels of lowering.

My whole objection comes from the (possibly incorrect, I am not an OpenMP expert!) idea that there are only two reasonable implementation approaches:

1. Early procedurization (e.g. in the frontend that produces LLVM IR). This is very easy to preserve, and correctness is trivial, but you lose some (theoretical?) optimization benefits by doing procedurization early.

2. Late procedurization, where the IR has explicit parallelism constructs and all optimizers preserve its correctness requirements (this is your #4). While this is possible in theory, I'm skeptical that this could make sense, and your proposal certainly isn't the right way to do it.

> 4) Make all optimizations thread-aware. The best approach in theory;
> no existing compiler goes that far.

It's not clear to me exactly what sorts of optimizations late procedurization is attempting to allow. I understand that this is the design that the Intel compiler uses, and you are motivated to make LLVM fit that model. However, the technical benefits of this design are not clear to me, and I also understand that late procedurization has been a continuous source of subtle correctness bugs that are still being found even though the product is mature. This is exactly the sort of thing that I want to avoid in LLVM.

-Chris
Andrey Bokhanko
2012-Oct-03 07:56 UTC
[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)
Chris,

> I think you're missing the point here. The whole idea of LLVM IR is
> that it doesn't have various "forms" that are valid at different
> points in the optimizer. Even very late lowering passes like strength
> reduction are pure IR-to-IR passes that do not introduce special
> forms. This is in stark contrast to other compilers (e.g. Open64),
> which have several levels of lowering.

Well, at some point the compiler *has* to insert runtime library calls. This is true for all proposals, both existing and potential ones. Do you mean that runtime calls must be inserted either strictly before the LLVM optimizer or strictly after it -- no other place? More on this later.

As for treating IR with/without OpenMP intrinsics as separate forms, this is a matter of personal taste and design choice, I guess. Does strength reduction (which replaces multiplications with additions) transform IR into another "form"?

> My whole objection comes from the (possibly incorrect, I am not an
> OpenMP expert!) idea that there are only two reasonable implementation
> approaches:
>
> 1. Early procedurization (e.g. in the frontend that produces LLVM IR).
> This is very easy to preserve, and correctness is trivial, but you lose
> some (theoretical?) optimization benefits by doing procedurization early.
>
> 2. Late procedurization, where the IR has explicit parallelism
> constructs and all optimizers preserve its correctness requirements
> (this is your #4). While this is possible in theory, I'm skeptical
> that this could make sense, and your proposal certainly isn't the
> right way to do it.

I understand your point... and respectfully disagree with it. You basically say that it is all or nothing: either *no* optimizations on parallel code (runtime calls inserted before the LLVM optimizer), or *all* optimizations workable on parallel code (calls inserted after the LLVM optimizer). In the former case we lose *all* optimizations, not just some. As for the latter, I share your skepticism -- and double it.

> I understand that this is the design that the Intel compiler uses, and
> you are motivated to make LLVM fit that model.

Yes and yes. And one more: "the proof is in the pudding", or so they say. The Intel Compiler (which, as you correctly noted, uses essentially the same design) is the metaphorical "pudding" that proves the viability and good performance potential of the approach we proposed.

> I also understand that late procedurization has been a continuous
> source of subtle correctness bugs that are still being found even
> though the product is mature.

Hmmm... One would have to analyze Intel Compiler bug statistics to make this assertion, but this is certainly not my impression.

Yours,
Andrey
---
Software Engineer
Intel Compiler Team
Intel Corp.
Andrey Bokhanko
2012-Oct-03 08:30 UTC
[LLVMdev] [RFC] Parallelization metadata and intrinsics in LLVM (for OpenMP, etc.)
David,

> Actually, it is perfectly possible to have a program with OpenMP
> directives that is NOT valid when those directives are ignored. In
> other words, it's possible to write a legal OMP program that relies on
> parallelism to function correctly. In practice this doesn't happen in
> production codes but it's wrong to say the compiler can just ignore
> directives with no problems whatsoever.

You might be right. But this is as good as one can do when compiling an OpenMP program for a target with no OpenMP support.

> What are the important optimizations?

You mean, "that should be done before procedurization"? As you understand, there is only one way to know -- try it. As has been mentioned elsewhere, the Intel Compiler employs essentially the same design as we proposed. [Tian05] (use this link to access the paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.97.3763&rep=rep1&type=pdf) describes the phase ordering that Intel Compiler developers found to provide good performance while preserving correctness.

>> 4) Make all optimizations thread-aware. The best approach in theory;
>> no existing compiler goes that far.
>
> This is probably not practical. It may be fine in academia but in
> production environments the resources don't exist, unfortunately.

I do agree! :-) That's why we propose what we propose -- the design leaves all doors open.

Yours,
Andrey
---
Software Engineer
Intel Compiler Team
Intel Corp.