The PNaCl project has implemented various IR simplification passes that simplify LLVM IR by lowering complex features to simpler features. We'd like to upstream some of these IR passes to LLVM. We'd like to explore if this is acceptable, and if so, how we should go about doing this.

The immediate reason is that Emscripten is reusing PNaCl's IR passes for its new "fastcomp" backend [1]. It would be really useful if PNaCl and Emscripten could collaborate via upstream LLVM rather than a branch.

Some background: There are two related use cases for these IR simplification passes:

1) Simplifying the task of writing a new LLVM backend. This is Emscripten's use case. The IR simplification passes reduce the number of cases a backend has to handle, so they would be useful for anyone else creating a new backend.

2) Using a subset of LLVM IR as a stable distribution format for portable executables. This is PNaCl's use case. PNaCl's IR subset omits various complex IR features, which we lower using the IR simplification passes [2]. Renderscript is an example of another project that uses IR as a stable distribution format, though I think currently Renderscript is not subsetting IR much.

Some examples of PNaCl's IR simplification passes are:

 * Calling conventions lowering: ExpandVarArgs and ExpandByVal lower varargs and by-value argument passing respectively. They would be useful for any backend that doesn't want to implement varargs or by-value calling conventions.

 * Instruction-level lowering:
    * ExpandStructRegs splits up struct values into scalars, removing the "insertvalue" and "extractvalue" instructions.
    * PromoteIntegers legalizes integer types (e.g. i30 is converted to i32).

 * Module-level lowering: This implements, at the IR level, functionality that is traditionally provided by "ld". e.g. ExpandCtors lowers llvm.global_ctors to the __init_array_start and __init_array_end symbols that are used by C libraries at startup.
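As a rough illustration of the PromoteIntegers example above (i30 converted to i32), the width selection can be sketched in Python. This is a toy model, not PNaCl's code: the set of legal widths and the names `promote_width` and `promote_type` are assumptions made for the sketch.

```python
# Toy model only: this is NOT the PNaCl PromoteIntegers implementation.
# ASSUMPTION: the set of "legal" widths below is chosen for illustration.
LEGAL_WIDTHS = (1, 8, 16, 32, 64)


def promote_width(bits: int) -> int:
    """Return the smallest assumed-legal width that can hold `bits` bits."""
    if bits < 1:
        raise ValueError("integer width must be positive")
    for legal in LEGAL_WIDTHS:
        if bits <= legal:
            return legal
    raise ValueError(f"i{bits} is wider than any assumed-legal type")


def promote_type(type_name: str) -> str:
    """Map an LLVM-style integer type name such as 'i30' to its promoted name."""
    if not type_name.startswith("i") or not type_name[1:].isdigit():
        raise ValueError(f"not an integer type name: {type_name!r}")
    return f"i{promote_width(int(type_name[1:]))}"


print(promote_type("i30"))  # i32
```

The sketch only picks a target type. A real pass would also have to keep arithmetic correct with respect to the extra high bits, which is not attempted here.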
PNaCl's IR simplification passes are modular -- most are independent of each other -- so they allow projects to pick and choose which IR features to support and which to pre-lower. The modularity of these passes makes them low-maintenance and easy to write targeted tests for.

The code for these passes can be found here:
https://chromium.googlesource.com/native_client/pnacl-llvm/+/master/lib/Transforms/NaCl/

There seems to be plenty of precedent for IR-to-IR lowering passes -- LLVM already contains passes such as LowerInvoke, LowerSwitch and LowerAtomic.

The PNaCl team (which I'm a member of) is happy to take on the work of maintaining this code, such as updating it as LLVM IR evolves and doing code reviews. We would upstream this gradually, pass by pass, so the changes would be manageable.

Cheers,
Mark

[1] https://github.com/kripken/emscripten/wiki/LLVM-Backend
[2] https://groups.google.com/forum/#!topic/llvm-dev/lk6dZzwW0ls - PNaCl Bitcode reference manual
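The "pick and choose" idea in the post above can be sketched as a small Python model in which each pass removes the IR features the post says it removes, and a backend author checks what is left to support. This is a hedged illustration only: the feature labels ("varargs", "byval", and so on) and the helper names `features_after` and `missing_support` are hypothetical, and real passes rewrite IR rather than editing a set of labels.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Features each pass removes, as described in the post above. The feature
# labels are hypothetical names invented for this toy model.
REMOVES: Dict[str, FrozenSet[str]] = {
    "ExpandVarArgs": frozenset({"varargs"}),
    "ExpandByVal": frozenset({"byval"}),
    "ExpandStructRegs": frozenset({"insertvalue", "extractvalue"}),
    "ExpandCtors": frozenset({"llvm.global_ctors"}),
}


def features_after(used: Iterable[str], passes: Iterable[str]) -> Set[str]:
    """Return the features still present after running the chosen passes."""
    remaining = set(used)
    for name in passes:
        remaining -= REMOVES[name]  # raises KeyError for an unknown pass name
    return remaining


def missing_support(used: Iterable[str], passes: Iterable[str],
                    backend_supports: Iterable[str]) -> Set[str]:
    """Features the backend would still see but does not handle."""
    return features_after(used, passes) - set(backend_supports)


used = {"add", "varargs", "insertvalue", "extractvalue"}
# A backend that handles varargs itself only needs ExpandStructRegs here.
print(missing_support(used, ["ExpandStructRegs"], {"add", "varargs"}))  # set()
```

Because each entry in the table is independent of the others, a project can drop or add a pass without touching the rest, which is the property the post attributes to the real passes.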
To add to what Mark mentioned about Emscripten's new backend [1] using the PNaCl passes: It made writing the backend much easier than it otherwise would have been, given our requirements. We are an 'odd' target in that we want to transform LLVM IR into JavaScript, then run it through our existing external JavaScript optimizer tool, which does very JavaScript-specific optimizations (on a JavaScript AST, which is the natural form for us), and for that reason we don't use the common backend codegen path. Basically, the PNaCl simplification passes convert LLVM IR into a smaller and simpler subset of LLVM IR, which makes writing a backend that processes LLVM IR more convenient.

I think there are other use cases as well that could benefit from these passes being upstream. While typically a backend would want to use the common codegen to get register allocation and so forth, there are situations where you just want to transform LLVM IR into something else. For example, in a university course you could teach people compiler optimizations using LLVM IR, then have them write a tiny backend that compiles that IR into a familiar language (Python, Java, anything else that they already know) to execute it (lli also works of course, but this might feel more "concrete" for the students, and they would learn more, I suspect). Writing that backend in a way that processes LLVM IR means you only need them to understand LLVM IR and not anything about the selection DAG etc.

Also, there are situations where performance is really not a concern, like someone writing a backend for a little VM they invented for fun who just wants to execute small amounts of C code on it. For example, this happened with the DCPU-16 spec, and people made an LLVM backend for it.
In summary, I think the shared thing in these examples is that LLVM IR is very nice to work with, and there are some situations where you're using it and you have a reason to convert it into something else, and you want to do that in as _simple_ a way as possible as opposed to generating the most _optimal_ results. The PNaCl IR simplification passes are in my opinion a big help there.

- Alon

[1] https://github.com/kripken/emscripten/wiki/LLVM-Backend
On Tue, Mar 4, 2014 at 4:04 PM, Mark Seaborn <mseaborn at chromium.org> wrote:

> 1) Simplifying the task of writing a new LLVM backend. This is
> Emscripten's use case. The IR simplification passes reduce the number of
> cases a backend has to handle, so they would be useful for anyone else
> creating a new backend.

FWIW, this sounds to me like a sufficiently compelling use case to support getting this in-tree.

-- Sean Silva
Chandler Carruth
2014-Mar-04 23:17 UTC
[LLVMdev] Upstreaming PNaCl's IR simplification passes
On Tue, Mar 4, 2014 at 1:04 PM, Mark Seaborn <mseaborn at chromium.org> wrote:

> The PNaCl project has implemented various IR simplification passes that
> simplify LLVM IR by lowering complex features to simpler features. We'd
> like to upstream some of these IR passes to LLVM. We'd like to explore if
> this acceptable, and if so, how we should go about doing this.

My question is somewhat different. I'm not questioning whether these are acceptable, I'm questioning why these are interesting and important for the LLVM project.

Neither PNaCl nor Emscripten open source projects have extensive developer overlap with the LLVM community, and the developers have not (so far) become super active maintainers of LLVM, although your recent patches to fix some bugs uncovered by PNaCl have been much appreciated. These lowering passes are likely to have few (most likely, zero) in-tree users for the foreseeable future. I'm not enthusiastic about the community taking on the maintenance, update, and code review burden of these.

I would point you at the several emails I have written to folks adding new significant features to LLVM about how to offset this by contributing maintenance and improvements to the core infrastructure, fixing bugs and generally making things better sufficient to offset the ongoing complexity cost of the new features. Fortunately, the PNaCl passes seem somewhat less complex than (for instance) the x32 backend, but they seem likely to still add a reasonable amount of complexity. They will certainly be challenging to review and get the design into an acceptable state across the community. At this point, I'm not really optimistic about there being a large enough body of community members excited about getting these passes in to offset these costs. I'm happy to be proven wrong of course, and would also be happy to see you, other PNaCl developers, or Emscripten developers become more active in the community in order to build this trust and establish a good basis for these to go into LLVM.

> The immediate reason is that Emscripten is reusing PNaCl's IR passes for
> its new "fastcomp" backend [1]. It would be really useful if PNaCl and
> Emscripten could collaborate via upstream LLVM rather than a branch.

While this does seem like a useful thing for your two projects, it isn't clear why this benefits the LLVM community. Perhaps it does, but I'd like to see that clarified.

> Some background: There are two related use cases for these IR
> simplification passes:
>
> 1) Simplifying the task of writing a new LLVM backend. This is
> Emscripten's use case. The IR simplification passes reduce the number of
> cases a backend has to handle, so they would be useful for anyone else
> creating a new backend.

If these simplify writing a backend, why wouldn't the patches include commensurate simplifications to LLVM's backends? That would both give them an in-tree customer, and more immediate value to the community and project as a whole.

> 2) Using a subset of LLVM IR as a stable distribution format for portable
> executables. This is PNaCl's use case. PNaCl's IR subset omits various
> complex IR features, which we lower using the IR simplification passes [2].
> Renderscript is an example of another project that uses IR as a stable
> distribution format, though I think currently Renderscript is not
> subsetting IR much.

Given that the bitcode is stable, I don't understand why this is important. What technical problems are you solving other than making the IR match some predetermined form chosen by PNaCl?

> Some examples of PNaCl's IR simplification passes are:

I have a bunch of questions about the specific passes you mention. Perhaps these questions are better answered in the review thread for the patches, but they are at least things that I would think about and try to address if and when you send out the code review.

> * Calling conventions lowering: ExpandVarArgs and ExpandByVal lower
> varargs and by-value argument passing respectively. They would be useful
> for any backend that doesn't want to implement varargs or by-value calling
> conventions.

Why wouldn't these be applicable to existing backends? What is hard about the existing representations?

> * Instruction-level lowering:
>    * ExpandStructRegs splits up struct values into scalars, removing the
> "insertvalue" and "extractvalue" instructions.

There are already passes that do this outside of function arguments and return values. Why is a new one needed? How do you handle the overflow-detecting operations?

>    * PromoteIntegers legalizes integer types (e.g. i30 is converted to
> i32).

Does it split up too-wide integers? Do we really want another integer legalization framework in LLVM? I am actually interested in doing (partial) legalization in the IR during lowering (codegenprep time) in order to simplify the backend, but I don't think we should develop such a framework independently of the legalization currently used in the backends.

> * Module-level lowering: This implements, at the IR level, functionality
> that is traditionally provided by "ld". e.g. ExpandCtors lowers
> llvm.global_ctors to the __init_array_start and __init_array_end symbols
> that are used by C libraries at startup.

This doesn't make any sense to me. The IR representation is strictly simpler. It is trivially lowered in a backend. I don't understand what this would benefit.

> There seems to be plenty of precedent for IR-to-IR lowering passes -- LLVM
> already contains passes such as LowerInvoke, LowerSwitch and LowerAtomic.

Note that these are quite different -- they lower from a front-end convenient form toward the canonical IR form. You are talking about something totally different that deals with target-oriented lowering. The correct place to look for analogies is CodeGenPrep.

> The PNaCl team (which I'm a member of) is happy to take on the work of
> maintaining this code, such as updating it as LLVM IR evolves and doing
> code reviews. We would upstream this gradually, pass by pass, so the
> changes would be manageable.

While this is appreciated, the PNaCl team should work to much more actively contribute to the core of LLVM if it wants to be trusted to maintain this code.

All of that said, while I have a lot of concerns, I do want to clarify something: I actually think that this is the correct fundamental direction for LLVM. I *want* to see PNaCl and Emscripten both be significantly more involved in the community, and I think that using lowering to simplify backends is a Very Good Thing. However, I think that unless there is a significant consensus amongst the active LLVM developers that they are OK accepting and maintaining these patches (currently, I'm not), I think that the community engagement needs to happen first.

-Chandler
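For readers unfamiliar with the ExpandCtors example discussed in the message above, here is a minimal Python sketch of the ordering step that any lowering of llvm.global_ctors has to perform. It relies on the LLVM Language Reference rule that entries run in ascending priority order. How PNaCl's pass actually lays out the array between __init_array_start and __init_array_end is not described in this thread, so the function name `init_array_order` and the tie-breaking choice are assumptions.

```python
from typing import List, Tuple


def init_array_order(global_ctors: List[Tuple[int, str]]) -> List[str]:
    """Return constructor names in the order they should run.

    Each entry is (priority, function_name), mirroring two fields of an
    llvm.global_ctors element. Lower priority numbers run first.
    ASSUMPTION: entries with equal priority keep their input order; the
    LLVM Language Reference leaves that order undefined.
    """
    # sorted() is stable, so equal-priority entries stay in input order.
    return [name for _priority, name in sorted(global_ctors, key=lambda e: e[0])]


ctors = [(65535, "init_a"), (101, "init_early"), (65535, "init_b")]
print(init_array_order(ctors))  # ['init_early', 'init_a', 'init_b']
```

The sketch covers only the ordering. Whether this step belongs in an IR pass or in each backend is exactly the point under debate in the thread.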
Chandler Carruth
2014-Mar-04 23:18 UTC
[LLVMdev] Upstreaming PNaCl's IR simplification passes
On Tue, Mar 4, 2014 at 3:11 PM, Sean Silva <chisophugis at gmail.com> wrote:

> On Tue, Mar 4, 2014 at 4:04 PM, Mark Seaborn <mseaborn at chromium.org> wrote:
>
>> 1) Simplifying the task of writing a new LLVM backend. This is
>> Emscripten's use case. The IR simplification passes reduce the number of
>> cases a backend has to handle, so they would be useful for anyone else
>> creating a new backend.
>
> FWIW, this sounds to me like a sufficiently compelling use case to support
> getting this in-tree.

Just in case it gets lost in my longer reply, I want to emphasize that if these will be used to simplify the in-tree backends and those backend maintainers are on board, then I am *totally* in favor of this going into the tree. My concerns are heavily based on the fact that as proposed, none of that seems likely to happen.
I like this and would love to see it in the tree. I think it's broadly useful to projects that want to take IR as input and then do interesting things with it.

-Fil
On Tue, Mar 4, 2014 at 6:17 PM, Chandler Carruth <chandlerc at google.com>wrote:> On Tue, Mar 4, 2014 at 1:04 PM, Mark Seaborn <mseaborn at chromium.org>wrote: > >> The PNaCl project has implemented various IR simplification passes that >> simplify LLVM IR by lowering complex features to simpler features. We'd >> like to upstream some of these IR passes to LLVM. We'd like to explore if >> this acceptable, and if so, how we should go about doing this. >> > > My question is somewhat different. I'm not questioning whether these are > acceptable, I'm questioning why these are interesting and important for the > LLVM project. > > Neither PNaCl nor Emscripten open source projects have extensive developer > overlap with the LLVM community, and the developers have not (so far) > become super active maintainers of LLVM, although your recent patches to > fix some bugs uncovered by PNaCl have been much appreciated. These lowering > passes are likely to have few (most likely, zero) in-tree users for the > foreseeable future. I'm not enthusiastic about the community taking on the > maintenance, update, and code review burden of these. > > I would point you at the several emails I have written to folks adding new > significant features to LLVM about how to offset this by contributing > maintenance and improvements to the core infrastructure, fixing bugs and > generally making things better sufficient to offset the ongoing complexity > cost of the new features. Fortunately, the PNaCl passes seem somewhat less > complex than (for instance) the x32 backend, but they seem likely to still > add a reasonable amount of complexity. They will certainly be challenging > to review and get the design into an acceptable state across the community. > At this point, I'm not really optimistic about there being a large enough > body of community members excited about getting these passes in to offset > these costs. 
I'm happy to be proven wrong of course, and would also be > happy to see you, other PNaCl developers, or Emscripten developers become > more active in the community in order to build this trust and establish a > good basis for these to go into LLVM. > > >> >> The immediate reason is that Emscripten is reusing PNaCl's IR passes for >> its new "fastcomp" backend [1]. It would be really useful if PNaCl and >> Emscripten could collaborate via upstream LLVM rather than a branch. >> > > While this does seem like a useful thing for your two projects, it isn't > clear why this benefits the LLVM community. Perhaps it does, but I'd like > to see that clarified. >I think Alon's point about easing the task for students/people learning (or playing with) LLVM is pretty strong. People playing around with LLVM today are tomorrow's contributors. If we can get them to that feeling of "win" faster, they are more likely to stick with the project.> > >> Some background: There are two related use cases for these IR >> simplification passes: >> >> 1) Simplifying the task of writing a new LLVM backend. This is >> Emscripten's use case. The IR simplification passes reduce the number of >> cases a backend has to handle, so they would be useful for anyone else >> creating a new backend. >> > > If these simplify writing a backend, why wouldn't the patches include > commensurate simplifications to LLVM's backends? That would both give them > an in-tree customer, and more immediate value to the community and project > as a whole. >I'd also like to add: If these simplify writing a backend, should there be commensurate changes to any relevant documentation for getting started writing backends? (we don't have much such documentation though...) (such documentation could also be construed as an in-tree customer if indeed this would simplify it).> > >> >> 2) Using a subset of LLVM IR as a stable distribution format for >> portable executables. This is PNaCl's use case. 
PNaCl's IR subset omits >> various complex IR features, which we lower using the IR simplification >> passes [2]. Renderscript is an example of another project that uses IR as >> a stable distribution format, though I think currently Renderscript is not >> subsetting IR much. >> > > Given that the bitcode is stable, I don't understand why this is > important. What technical problems are you solving other than making the IR > match some predetermined form chosen by PNaCl? > > >> >> Some examples of PNaCl's IR simplification passes are: >> > > I have a bunch of questions about the specific passes you mention. Perhaps > these questions are better answered in the review thread for the patches, > but they are at least things that I would think about and try to address if > and when you send out the code review. > > >> >> * Calling conventions lowering: ExpandVarArgs and ExpandByVal lower >> varargs and by-value argument passing respectively. They would be useful >> for any backend that doesn't want to implement varargs or by-value calling >> conventions. >> > > Why wouldn't these be applicable to existing backends? What is hard about > the existing representations? > > >> >> * Instruction-level lowering: >> * ExpandStructRegs splits up struct values into scalars, removing the >> "insertvalue" and "extractvalue" instructions. >> > > There are already passes that do this outside of function arguments and > return values. Why is a new one needed? How do you handle the > overflow-detecting operations? > > > >> * PromoteIntegers legalizes integer types (e.g. i30 is converted to >> i32). >> > > Does it split up too-wide integers? Do we really want another integer > legalization framework in LLVM? I am actually interested in doing (partial) > legalization in the IR during lowering (codegenprep time) in order to > simplify the backend, but I don't think we should develop such a framework > independently of the legalization currently used in the backends. 
> > >> >> * Module-level lowering: This implements, at the IR level, >> functionality that is traditionally provided by "ld". e.g. ExpandCtors >> lowers llvm.global_ctors to the __init_array_start and __init_array_end >> symbols that are used by C libraries at startup. >> > > This doesn't make any sense to me. The IR representation is strictly > simpler. It is trivially lowered in a backend. I don't understand what this > would benefit. >It might be simpler to do in the backend, but I think that the point is that it is a recurring cost in every backend; in particular for backends written by people starting out/playing around with LLVM (i.e. potential future contributors), where any potential performance loss is acceptable for the sake of simplifying things.> > >> There seems to be plenty of precedent for IR-to-IR lowering passes -- >> LLVM already contains passes such as LowerInvoke, LowerSwitch and >> LowerAtomic. >> > > Note that these are quite different -- they lower from a front-end > convenient form toward the canonical IR form. You are talking about > something totally different that deals with target-oriented lowering. The > correct place to look for analogies is CodeGenPrep. > > >> The PNaCl team (which I'm a member of) is happy to take on the work of >> maintaining this code, such as updating it as LLVM IR evolves and doing >> code reviews. We would upstream this gradually, pass by pass, so the >> changes would be manageable. >> > > While this is appreciated, the PNaCl team should work to much more > actively contribute to the core of LLVM if it wants to be trusted to > maintain this code. >Is eliben still on the PNaCl team? (e.g. 
<http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-June/063010.html>)

I'd also like to point out that IR-level passes are pretty much LLVM's
strongest point of decoupling and modularization, so of all code changes to
have no in-tree users (if indeed there are none), this is probably a
best-case scenario from a maintainability perspective (especially if it
becomes the point of collaboration for Emscripten and PNaCl).

-- Sean Silva

> All of that said, while I have a lot of concerns, I do want to clarify
> something: I actually think that this is the correct fundamental direction
> for LLVM. I *want* to see PNaCl and Emscripten both be significantly more
> involved in the community, and I think that using lowering to simplify
> backends is a Very Good Thing. However, I think that unless there is a
> significant consensus amongst the active LLVM developers that they are OK
> accepting and maintaining these patches (currently, I'm not), I think that
> the community engagement needs to happen first.
>
> -Chandler
Chris Lattner
2014-Mar-05 01:15 UTC
[LLVMdev] Upstreaming PNaCl's IR simplification passes
On Mar 4, 2014, at 3:17 PM, Chandler Carruth <chandlerc at google.com> wrote:

> On Tue, Mar 4, 2014 at 1:04 PM, Mark Seaborn <mseaborn at chromium.org> wrote:
> The PNaCl project has implemented various IR simplification passes that
> simplify LLVM IR by lowering complex features to simpler features. We'd
> like to upstream some of these IR passes to LLVM. We'd like to explore if
> this acceptable, and if so, how we should go about doing this.
>
> My question is somewhat different. I'm not questioning whether these are
> acceptable, I'm questioning why these are interesting and important for
> the LLVM project.

I share Chandler's concern. If these aren't actively used by something in
tree, they will bit rot. The way to counter the bit rot would be to add
extensive testcases... but that would just add an even larger burden on
core LLVM developers to keep them up to date.

We have seen similar "obviously useful" pieces of infrastructure fall to
the same fate (e.g., the C backend, which incidentally had very similar
utilities back when it was alive). Why would this be any different?

-Chris
Sorry to reply to myself, I thought of something else I should have
mentioned before.

Emscripten hopes to eventually upstream its JavaScript backend, if there is
interest. It's a work in progress and far from ready for that right now,
and there are probably lots of issues to figure out regarding that (again,
far too early to get into detail), but one thing will be the dependence of
the backend on the PNaCl IR simplification passes - I guess if they are not
upstream at that point, we'd have to figure things out then.

- Alon

On Tue, Mar 4, 2014 at 2:25 PM, Alon Zakai <alonzakai at gmail.com> wrote:

> To add to what Mark mentioned about Emscripten's new backend [1] using the
> PNaCl passes: It made writing the backend much easier than it otherwise
> would have been, given our requirements - we are an 'odd' target in that we
> want to transform LLVM IR into JavaScript, then run it through our existing
> external JavaScript optimizer tool, which does very JavaScript-specific
> optimizations (on a JavaScript AST which is the natural form for us), and
> for that reason we don't use the common backend codegen path. Basically the
> PNaCl simplification passes convert LLVM IR into a smaller and simpler
> subset of LLVM IR, which makes writing a backend that processes LLVM IR
> more convenient.
>
> I think there are other use cases as well that could benefit from these
> passes being upstream. While typically a backend would want to use the
> common codegen to get register allocation and so forth, there are
> situations where you just want to transform LLVM IR into something else.
> For example in a university course you could teach people compiler
> optimizations using LLVM IR, then have them write a tiny backend that
> compiles that IR into a familiar language (Python, Java, anything else that
> they already know) to execute it (lli also works of course, but this might
> feel more "concrete" for the students, and they would learn more I
> suspect).
> Writing that backend in a way that processes LLVM IR means you
> only need them to understand LLVM IR and not anything about the selection
> DAG etc. Also, there are situations where performance is really not a
> concern, like someone writing a backend for a little VM they invented for
> fun and just want to execute small amounts of C code on it - for example
> this happened with the DCPU-16 spec, and people made an LLVM backend for it.
>
> In summary, I think the shared thing in these examples is that LLVM IR is
> very nice to work with, and there are some situations where you're using it
> and you have a reason to convert it into something else, and you want to do
> that in as _simple_ a way as possible as opposed to generating the most
> _optimal_ results. The PNaCl IR simplification passes are in my opinion a
> big help there.
>
> - Alon
>
> [1] https://github.com/kripken/emscripten/wiki/LLVM-Backend
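To make the "smaller and simpler subset" point concrete, here is a minimal Python sketch of the idea. It does not use any real LLVM, PNaCl or Emscripten API: the tuple-based instruction format and the names `lower_switch` and `emit` are made up for illustration. A LowerSwitch-style pre-lowering step rewrites each `switch` into compare-and-branch instructions, so the toy emitter only ever has to handle two opcodes.

```python
def lower_switch(instrs):
    """Rewrite every toy 'switch' into a chain of conditional branches.

    A switch is ("switch", value, default_label, [(const, label), ...]).
    All other instructions are passed through unchanged.
    """
    out = []
    for ins in instrs:
        if ins[0] == "switch":
            _, value, default_label, cases = ins
            for const, label in cases:
                out.append(("br_if_eq", value, const, label))
            # Fall through to the default target if no case matched.
            out.append(("br", default_label))
        else:
            out.append(ins)
    return out


def emit(instrs):
    """Toy 'backend': only understands the two branch forms."""
    lines = []
    for ins in instrs:
        if ins[0] == "br_if_eq":
            _, value, const, label = ins
            lines.append(f"if ({value} == {const}) goto {label};")
        elif ins[0] == "br":
            lines.append(f"goto {ins[1]};")
        else:
            # Anything not pre-lowered is simply unsupported here.
            raise NotImplementedError(ins[0])
    return lines


prog = [("switch", "x", "dflt", [(0, "a"), (1, "b")])]
for line in emit(lower_switch(prog)):
    print(line)
```

The emitter rejects `switch` outright, which is the trade-off described above: the backend stays small because the pre-lowering step guarantees it never sees the construct.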
There's a lot of questions in your post, so I'll focus on the technical
questions about specific IR passes in this first reply...

On 4 March 2014 15:17, Chandler Carruth <chandlerc at google.com> wrote:

> On Tue, Mar 4, 2014 at 1:04 PM, Mark Seaborn <mseaborn at chromium.org> wrote:
>
>> Some background: There are two related use cases for these IR
>> simplification passes:
>>
>> 1) Simplifying the task of writing a new LLVM backend. This is
>> Emscripten's use case. The IR simplification passes reduce the number of
>> cases a backend has to handle, so they would be useful for anyone else
>> creating a new backend.
>
> If these simplify writing a backend, why wouldn't the patches include
> commensurate simplifications to LLVM's backends? That would both give them
> an in-tree customer, and more immediate value to the community and project
> as a whole.
>

That's a good question. I'll have to have a look around in the LLVM backend
code and see what parts could be replaced by one of PNaCl's simplification
passes.

One answer is that, in some cases, such as calling conventions and global
constructor arrays, LLVM's backend is constrained to follow the ABIs for
particular OSes and architectures. Compatibility makes complexity harder to
remove. I'll elaborate more below. This only applies to a few of PNaCl's IR
passes though.

>> 2) Using a subset of LLVM IR as a stable distribution format for
>> portable executables. This is PNaCl's use case. PNaCl's IR subset omits
>> various complex IR features, which we lower using the IR simplification
>> passes [2]. Renderscript is an example of another project that uses IR as
>> a stable distribution format, though I think currently Renderscript is not
>> subsetting IR much.
>
> Given that the bitcode is stable, I don't understand why this is important.
>

Is the bitcode format stable now? I heard talk that LLVM is trying to do
this now, but I don't remember seeing an llvmdev thread stating that for
sure.
Was there a thread about it that I missed? I just remember hearing
complaints last year that the format was still getting changed. :-)

>> * Calling conventions lowering: ExpandVarArgs and ExpandByVal lower
>> varargs and by-value argument passing respectively. They would be useful
>> for any backend that doesn't want to implement varargs or by-value calling
>> conventions.
>
> Why wouldn't these be applicable to existing backends? What is hard about
> the existing representations?
>

For the calling conventions lowering passes, you wouldn't want to use them
in backends that have to match some existing architecture-specific ABI for
calling conventions. For example, if you use ExpandVarArgs on x86, your .o
file won't be able to successfully call the printf() function provided by
libc.so, because the varargs calling conventions won't match.

But for many targets that is not an issue, either because:

 * there is no existing architecture-specific ABI that LLVM must match, or
 * you're using static linking, or can make similar "closed world"
   assumptions, so that a module can use any calling conventions as long as
   they're used consistently within the module.

Both of these are true for PNaCl and Emscripten. My suspicion is that one
or both of these conditions will be true for other novel backends, such as
for specialised architectures like GPUs.

Aside from PNaCl and Emscripten, I am less familiar with other novel
backends. So one of the things I had hoped to learn from this discussion
was whether other backends would find these passes useful. So far we've had
some people say that yes, they would.

>> * Instruction-level lowering:
>>   * ExpandStructRegs splits up struct values into scalars, removing the
>>     "insertvalue" and "extractvalue" instructions.
>
> There are already passes that do this outside of function arguments and
> return values. Why is a new one needed?
>

Are you referring to the work that SelectionDAGBuilder.cpp does to convert
insertvalue/extractvalue to a SelectionDAG? I don't think there's an
IR-to-IR pass in LLVM for doing this, is there?

The reason PNaCl needs an IR-to-IR pass is that PNaCl's stable IR omits
insertvalue/extractvalue, in order to keep the format simple and reduce the
set of constructs that a PNaCl translator implementation needs to handle.

The reason Emscripten's fastcomp uses ExpandStructRegs is to keep
Emscripten's backend simple, in the context that it doesn't use
lib/CodeGen.

And the reason we have to handle insertvalue/extractvalue at all is largely
that Clang outputs them for uses of C++ method pointers. Otherwise,
structs-as-registers aren't really used. At least, that was the case in 3.3
-- maybe some more uses have appeared since then.

> How do you handle the overflow-detecting operations?
>

PNaCl has the ExpandArithWithOverflow pass, which lowers uses of
llvm.*.with.overflow.*.

>>   * PromoteIntegers legalizes integer types (e.g. i30 is converted to
>>     i32).
>
> Does it split up too-wide integers?
>

PNaCl's version currently doesn't. Emscripten's fastcomp has a version
which splits up 64-bit integer operations into 32-bit operations, which
they need because JavaScript doesn't support 64-bit integer arithmetic.

PNaCl's version didn't need to do that because we were happy to support
64-bit arithmetic in PNaCl's stable ABI. However, we did find that unusual
C bitfields caused Clang to generate integer types larger than 64-bit
(which we don't support in PNaCl's stable ABI), so we started implementing
a pass to split those up. We should probably sync up with Emscripten and
reuse their code for that.

> Do we really want another integer legalization framework in LLVM?
>

At the risk of not answering your question directly, LLVM already has two
instruction selectors, SelectionDAG and FastISel.
So another question might be, when is it OK to have multiple
implementations that perform similar tasks using different approaches, and
when is it not OK? What are the trade-offs involved here?

> I am actually interested in doing (partial) legalization in the IR during
> lowering (codegenprep time) in order to simplify the backend, but I don't
> think we should develop such a framework independently of the legalization
> currently used in the backends.
>
>> * Module-level lowering: This implements, at the IR level,
>> functionality that is traditionally provided by "ld". e.g. ExpandCtors
>> lowers llvm.global_ctors to the __init_array_start and __init_array_end
>> symbols that are used by C libraries at startup.
>
> This doesn't make any sense to me. The IR representation is strictly
> simpler. It is trivially lowered in a backend. I don't understand what this
> would benefit.
>

To elaborate: In PNaCl, pexes are statically linked modules in which
running global constructors is handled by user code inside the pexe. The
special llvm.global_ctors array isn't part of PNaCl's stable subset of IR,
because there's no need for it to be. Running constructors is done in
normal IR by the pexe's entry point, without constructors needing to be
handled specially by PNaCl's IR format.

LLVM's global_ctors construct is incomplete: it provides a mechanism, at
the IR level, to declare functions to be run at startup, but it assumes
that running these functions will be done by a runtime library. At the IR
level, LLVM doesn't provide a way to implement a runtime library that can
read that constructor list. ld linker scripts provide a way to do that --
e.g. on Linux, see /usr/lib/ldscripts/elf_i386.x, which defines
__init_array_{start,end} -- but that's not at the IR level.
ExpandCtors just provides a mechanism for a runtime library to list the
constructor functions, purely at the IR level, without constructors having
to be a special feature in the PNaCl ABI or in the Emscripten backend.

>> There seems to be plenty of precedent for IR-to-IR lowering passes -- LLVM
>> already contains passes such as LowerInvoke, LowerSwitch and LowerAtomic.
>
> Note that these are quite different -- they lower from a front-end
> convenient form toward the canonical IR form.
>

Those three passes don't lower towards canonical IR form -- unless we are
taking "canonical IR form" to mean quite different things?

LowerInvoke and LowerAtomic both strip out information irreversibly.

LowerAtomic "lowers atomic intrinsics to non-atomic form for use in a known
non-preemptible environment".

LowerInvoke strips out exception handling by converting invokes to calls,
so that landingpads, resumes, etc. become dead and can be removed by a
later pass. (As an aside, LowerInvoke has an option for using SJLJ
exception handling, but that option appears to be unused and replaced by
lib/CodeGen/SjLjEHPrepare.cpp.)

LowerSwitch "rewrites switch instructions with a sequence of branches,
which allows targets to get away with not implementing the switch
instruction until it is convenient".

These three are very similar in function to PNaCl's IR simplification
passes, since they reduce the set of language features that must be
supported by a backend or by a stable IR format.

> You are talking about something totally different that deals with
> target-oriented lowering. The correct place to look for analogies is
> CodeGenPrep.
>

CodeGenPrepare.cpp just contains optimisations, doesn't it? It doesn't
lower any language features such that the feature is removed from the
module, so it doesn't seem to be analogous to PNaCl's IR simplification
passes, which do do that. e.g.
LowerAtomic strips out atomicrmw entirely so that anything processing
LowerAtomic's output doesn't have to handle atomicrmw at all. Similarly,
ExpandByVal expands out "byval" entirely.

If you're looking for backend IR-to-IR passes which lower language
features, DwarfEHPrepare and SjLjEHPrepare are analogous to PNaCl's passes.
DwarfEHPrepare only lowers resume instructions, while SjLjEHPrepare handles
more.

Cheers,
Mark
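On the earlier point about splitting 64-bit integer operations into 32-bit ones, the following is a minimal Python sketch of what such a lowering means for addition. It is not taken from fastcomp or PNaCl; the helper names (`split64`, `join64`, `add64_via_32`) are hypothetical, and it only assumes unsigned wrap-around arithmetic on 32-bit halves.

```python
MASK32 = 0xFFFFFFFF


def split64(x):
    """Split an unsigned 64-bit value into (lo, hi) 32-bit halves."""
    return x & MASK32, (x >> 32) & MASK32


def join64(lo, hi):
    """Recombine (lo, hi) 32-bit halves into one 64-bit value."""
    return (hi << 32) | lo


def add64_via_32(a, b):
    """Add two 64-bit values using only 32-bit wrapping operations."""
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    # 32-bit wrapping add of the low halves.
    lo = (a_lo + b_lo) & MASK32
    # The low add wrapped around exactly when the result is below an operand.
    carry = 1 if lo < a_lo else 0
    # 32-bit wrapping add of the high halves plus the carry.
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi


lo, hi = add64_via_32(0x00000000FFFFFFFF, 1)
print(hex(join64(lo, hi)))
```

The carry is derived from a comparison rather than from a 33-bit intermediate, since a target limited to 32-bit integers has no wider value to inspect.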