John Mosby
2009-Feb-26 22:02 UTC
[LLVMdev] Shrink Wrapping - RFC and initial implementation
Hello LLVMdev, I have been working with LLVM for just over a year now, mainly in the area of compilation for HDLs like SystemVerilog and SystemC. Most of this work dealt with translation to LLVM IR, representing concurrent languages with LLVM and using LLVM analyses and transforms for compiling onto proprietary simulation acceleration hardware. All of this work used the C back end exclusively, since I wanted a transparent and easily debuggable flow. To learn more about the code generator, I decided to try implementing shrink wrapping, a reasonably self-contained back end transformation pass. I now have a preliminary implemenation of shrink wrapping, done as an option to prologue/epilogue insertion under the switch --shrink-wrap. It is limited to X86 presently since that is the only target I have access to at the moment. The main features are: - Placing callee saved register (CSR) spills and restores in the CFG to tightly surround uses so that execution paths that do not use CSRs do not pay the spill/restore penalty. - Avoiding placment of spills/restores in loops: if a CSR is used inside a loop(nest), the spills are placed in the loop preheader, and restores are placed in the loop exit nodes (the successors of the loop _exiting_ nodes). - Covering paths without CSR uses: e.g. if a restore is placed in a join block, a matching spill is added to the end of all immediate predecessor blocks that are not reached by a spill. Similarly for saves placed in branch blocks. Since I ran into a non-trivial issue in developing this pass, I would like to submit my implementation as a "RFC + code/tests" rather than a typical contribution, and get people's opinions on how to proceed. The issue is that the code generator assumes all spills and restores of callee saved registers must be placed in the entry and return blocks resp. Shink wrapping violates this assumption, with the result that the frame offsets computed by PEI for frame-relative operands may be incorrect. My limited implementation uses a workaround that adjusts the generation of prologue code and the frame indices used by the target eliminateFrameIndex() when necessary. I am looking at several approaches, but I would like input from anyone who has an opinion. Finally, I realize that shrink wrapping is probably not high priority in the larger scheme of LLVM development, so I'm not expecting a huge response, but any ideas are certainly welcome. The patch and a test tarball are attached. I include basic tests that are run with the supplied Makefile. Thanks, John Mosby -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090226/4b4da3a6/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: shrink-wrapping.patch Type: application/octet-stream Size: 43255 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090226/4b4da3a6/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: test-sw.tar.gz Type: application/x-gzip Size: 2272 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090226/4b4da3a6/attachment.bin>
Anton Korobeynikov
2009-Feb-27 00:33 UTC
[LLVMdev] Shrink Wrapping - RFC and initial implementation
Hello, John> My limited implementation uses a workaround that adjusts the > generation of prologue code and the frame indices used by > the target eliminateFrameIndex() when necessary. I am looking at > several approaches, but I would like input from anyone who > has an opinion.I haven't looked into the patch deep enough yet, but I have at least 2 questions: 1. How do all the stuff play with dynamic stack realignment? 2. It seems, that dwarf information about callee-saved registers is invalidated by your patch. This means, that you won't have sane stack traces in the debugger. Unwinding won't also work. Have you tried to compile some C++ code, which uses EH? -- With best regards, Anton Korobeynikov. Faculty of Mathematics & Mechanics, Saint Petersburg State University.
John Mosby
2009-Feb-27 01:52 UTC
[LLVMdev] Shrink Wrapping - RFC and initial implementation
Hi Anton, Thanks for your questions, that's what I'm looking for. On Thu, Feb 26, 2009 at 5:33 PM, Anton Korobeynikov <anton at korobeynikov.info> wrote:> Hello, John > > > My limited implementation uses a workaround that adjusts the > > generation of prologue code and the frame indices used by > > the target eliminateFrameIndex() when necessary. I am looking at > > several approaches, but I would like input from anyone who > > has an opinion. > I haven't looked into the patch deep enough yet, but I have at least 2 > questions: > 1. How do all the stuff play with dynamic stack realignment? > 2. It seems, that dwarf information about callee-saved registers is > invalidated by your patch. > This means, that you won't have sane stack traces in the debugger. > Unwinding won't also work. > Have you tried to compile some C++ code, which uses EH?Integrating shrink wrapping with dynamic stack realignment, debugging info, EH (and more) requires a more general (or more complete) way of treating callee-saved registers, and I did not attempt to tackle this in the patch. I meant to show a starting point for this work and get some questions coming in (working so far :-)). I am not far along enough with the two approaches I'm looking at to give more detail. I will have more worked out in a few days. I'm still coming up to speed in the code generator areas, so thanks for your patience! Again, thanks for your questions, John> > -- > With best regards, Anton Korobeynikov. > > Faculty of Mathematics & Mechanics, Saint Petersburg State University. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090226/20ae0327/attachment.html>
Evan Cheng
2009-Mar-01 03:05 UTC
[LLVMdev] Shrink Wrapping - RFC and initial implementation
On Feb 26, 2009, at 2:02 PM, John Mosby wrote:> Hello LLVMdev, > > I have been working with LLVM for just over a year now, mainly in > the area of compilation for HDLs like SystemVerilog and SystemC. > Most of this work dealt with translation to LLVM IR, representing > concurrent languages with LLVM and using LLVM analyses and transforms > for compiling onto proprietary simulation acceleration hardware. All > of this work used the C back end exclusively, since I wanted a > transparent > and easily debuggable flow.Welcome to the community.> > To learn more about the code generator, I decided to try > implementing shrink wrapping, a reasonably self-contained back end > transformation pass. > > I now have a preliminary implemenation of shrink wrapping, done as > an option to prologue/epilogue insertion under the switch --shrink- > wrap.Nice.> It is limited to X86 presently since that is the only target I have > access to at the moment.What part of this is target dependent? Is this due to emitPrologue / emitEpilogue being target specific?> > The main features are: > - Placing callee saved register (CSR) spills and restores in the > CFG to tightly surround uses > so that execution paths that do not use CSRs do not pay the > spill/restore penalty. > > - Avoiding placment of spills/restores in loops: if a CSR is used > inside a loop(nest), the spills > are placed in the loop preheader, and restores are placed in > the loop exit nodes (the > successors of the loop _exiting_ nodes). > > - Covering paths without CSR uses: e.g. if a restore is placed in > a join block, a matching spill > is added to the end of all immediate predecessor blocks that > are not reached by a spill. > Similarly for saves placed in branch blocks.Sounds great. It would help everyone if you can show some examples code.> > > Since I ran into a non-trivial issue in developing this pass, I > would like to submit my implementation as a "RFC + code/tests" > rather than a typical contribution, and get people's opinions on how > to proceed. > > The issue is that the code generator assumes all spills and restores > of callee saved registers must be placed in the entry and return > blocks resp. > Shink wrapping violates this assumption, with the result that the > frame offsets computed by PEI for frame-relative operands may be > incorrect.I am not sure how this would happen. Why would frame offsets be affected by where these instructions are placed?> > My limited implementation uses a workaround that adjusts the > generation of prologue code and the frame indices used by > the target eliminateFrameIndex() when necessary. I am looking at > several approaches, but I would like input from anyone who > has an opinion. >I think to do this right for every target is a big job. I'd like to see prologue / epilogue related stuff be moved out of TargetRegisterInfo. Shrink wrapping will only happen when the targets buy-in, i.e. providing the right hooks. When is shrink wrapping happening? Is it possible to do it after CSR spills and restores are inserted but before FI are lowered into sp / fp +/- offset?> Finally, I realize that shrink wrapping is probably not high > priority in the larger scheme of LLVM development, so I'm not > expecting > a huge response, but any ideas are certainly welcome.It's actually a fairly useful optimization. It can really help a class of functions, e.g. functions with early returns.> > The patch and a test tarball are attached. I include basic tests > that are run with the supplied Makefile.Some comments: 1. The code needs some refactoring. :-) It's a big chunk of code so it's hard to digest. 2. There doesn't seem to be a way to turn off shrink wrapping. Please make sure it's optional. When it's not being done, PEI should not require dominator, etc. From what I can see this is a very good first step. I look forward to seeing its completion. Evan> > Thanks, > John Mosby > <shrink-wrapping.patch><test- > sw.tar.gz>_______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
John Mosby
2009-Mar-01 22:57 UTC
[LLVMdev] Shrink Wrapping - RFC and initial implementation
First, thanks very much for your comments! On Sat, Feb 28, 2009 at 8:05 PM, Evan Cheng <evan.cheng at apple.com> wrote:> > On Feb 26, 2009, at 2:02 PM, John Mosby wrote: > > It is limited to X86 presently since that is the only target I have > > access to at the moment. > > What part of this is target dependent? Is this due to emitPrologue / > emitEpilogue being target specific? >It is target dependent (X86) at present because of the way I developed it, just using the X86 target since that is the only one on which I can test the entire (static) flow: test.c -> llvm-gcc -emit-llvm -> (.ll, .bc) -> llc --shrink-wrap -> .s -> gcc test.s -o test. I worked with other targets also, but I decided to take it as far as I could on the first go with X86. First pass was without debugging info and with simple stack frames, in the interest of getting as much worked out as possible. I saw the issue concerning how code gen handles placement of spill and restore code outside of entry/return blocks before I had the first test cases running, but I worked through the details using -march=x86 only. Re: debugging info: I know about the work to change the way debugging info is handled, so I held off trying to make the shrink wrapping work with the current impl.> > The main features are: > > - Placing callee saved register (CSR) spills and restores in the > > CFG to tightly surround uses > > so that execution paths that do not use CSRs do not pay the > > spill/restore penalty. > > > > - Avoiding placment of spills/restores in loops: if a CSR is used > > inside a loop(nest), the spills > > are placed in the loop preheader, and restores are placed in > > the loop exit nodes (the > > successors of the loop _exiting_ nodes). > > > > - Covering paths without CSR uses: e.g. if a restore is placed in > > a join block, a matching spill > > is added to the end of all immediate predecessor blocks that > > are not reached by a spill. > > Similarly for saves placed in branch blocks. > > Sounds great. It would help everyone if you can show some examples code.I am putting documented examples together from the test cases in the patch.> Since I ran into a non-trivial issue in developing this pass, I > > would like to submit my implementation as a "RFC + code/tests" > > rather than a typical contribution, and get people's opinions on how > > to proceed. > > > > The issue is that the code generator assumes all spills and restores > > of callee saved registers must be placed in the entry and return > > blocks resp. > > Shink wrapping violates this assumption, with the result that the > > frame offsets computed by PEI for frame-relative operands may be > > incorrect. > > I am not sure how this would happen. Why would frame offsets be > affected by where these instructions are placed?The issue is illustrated by a simple example in which a single CSR is used in one branch of a conditional. When the stack frame is laid out, the spill for this CSR is accounted for in the calculation of stack size as it should be. The stack slot for the CSR is correctly computed and everything seems fine when the MO_FrameIndex are replaced. The problem is that since the spill instruction for the CSR (e.g. pushl %esi) is moved from the entry block, the push does not happen, and the value of %esp in the entry block is not what it should be to satisfy the offsets produced by eliminateFrameIndex(). A similar situation exists for the BB into which a spill is "moved" (from the entry block): a push happens to spill the CSR on entry to the block, and now %esp is not what it should be for that block. The example below illustrates this issue: assume: int F(int a, int b, int c) uses one CSR in conditional branch prologue, no shrink wrapping: _F: pushl %esi # spill CSR %csi, %esp -= 4 (in this case) subl $56, %esp # create frame, %esp = %esp - 56 movl 64(%esp), %eax # fetch arg 'a' from entry %esp + 4 movl %eax, 52(%esp) movl 68(%esp), %eax # fetch arg 'b' movl %eax, 48(%esp) ... prologue with spill shrink-wrapped to basic block bb: _F: # no spill of %esi, moved to bb subl $56, %esp # create frame same as before %esp = %esp - 56 movl 64(%esp), %eax # error: 'a' is not at 64(%esp), it's at 60(%esp) movl %eax, 52(%esp) ... The simple, ugly hack of adjusting the value by which %esp is decremented in the prologue when one or more CSR spills have been placed into other blocks takes care of the issue on this simple code (no dynamic stack realign., nor EH) on x86. The companion hack for (non entry) MBBs into which spills have been introduced is to adjust the stack size around eliminateFrameIndex()'s for replacement of MO_FrameIndex operands. Obviously, all of this applies only when spills are done with push/pop, which is the case on x86. I used this issue to start looking at generalizing how spills and restores are handled, before looking too closely at other targets, and developed the workaround for the initial implementation.> > > > My limited implementation uses a workaround that adjusts the > > generation of prologue code and the frame indices used by > > the target eliminateFrameIndex() when necessary. I am looking at > > several approaches, but I would like input from anyone who > > has an opinion. > > > > I think to do this right for every target is a big job. I'd like to > see prologue / epilogue related stuff be moved out of > TargetRegisterInfo. Shrink wrapping will only happen when the targets > buy-in, i.e. providing the right hooks. >Part of what I'm doing now is estimating the work, which requires going through the targets. I am not far enough along to send out a proposal. Moving pro/epi generation out of TRI, perhaps into its own "component" is one architecture I am looking at.> When is shrink wrapping happening? Is it possible to do it after CSR > spills and restores are inserted but before FI are lowered into sp / > fp +/- offset?Shrink wrapping starts after calculateCalleeSavedRegisters(), which creates the list of CSRs used in the function. Shrink wrapping assigns MBB placements for spills and restores based on where they are used. calculateCalleeSavedRegisters() determines stack slots for the CSRs used in the function. I don't see an interaction between this and shrink wrapping, of have I missed something?> > Finally, I realize that shrink wrapping is probably not high > > priority in the larger scheme of LLVM development, so I'm not > > expecting > > a huge response, but any ideas are certainly welcome. > > It's actually a fairly useful optimization. It can really help a class > of functions, e.g. functions with early returns. >Quite right, it is certainly worthwhile. I could have left that comment out :-)> > > > The patch and a test tarball are attached. I include basic tests > > that are run with the supplied Makefile. > > Some comments: > > 1. The code needs some refactoring. :-) It's a big chunk of code so > it's hard to digest. > 2. There doesn't seem to be a way to turn off shrink wrapping. Please > make sure it's optional. When it's not being done, PEI should not > require dominator, etc. >I already refactored once, but I knew it would not be enough(!), I'll definitely do another pass. I forgot to put the analysis deps under --shrink-wrap, I will fix that and anything else that I might have left out of the option.> > From what I can see this is a very good first step. I look forward to > seeing its completion. > > Evan >Thanks! Likewise, and it's a pleasure to work on. John -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090301/70df295e/attachment.html>
Possibly Parallel Threads
- [LLVMdev] Shrink Wrapping - RFC and initial implementation
- [LLVMdev] Shrink Wrapping - RFC and initial implementation
- [LLVMdev] Shrink Wrapping - RFC and initial implementation
- [LLVMdev] Shrink Wrapping - RFC and initial implementation
- [LLVMdev] Shrink Wrapping - RFC and initial implementation