Chris Lattner
2008-May-20 18:28 UTC
[LLVMdev] Optimization passes organization and tradeoffs
On May 20, 2008, at 8:57 AM, David Greene wrote:

> On Tuesday 20 May 2008 07:03, Nicolas Capens wrote:
>
>> 1) Does ScalarReplAggregates totally supersede
>> PromoteMemoryToRegister? I
>
> Nope, they are different. Mem2Reg is really important if you want
> register allocation.

Actually SROA does fully subsume Mem2Reg. It iterates between breaking
up aggregates and promoting them to registers.

>> think I need it to optimize small arrays, but what is the expected
>> added complexity?
>
> I shouldn't think it would be very expensive at all.

Agreed.

>> 2) Does SCCP also eliminate multiplying/dividing by 1 and
>> adding/subtracting 0?
>
> That's probably more the purview of instcombine.

Right.

>> 4) Is DeadStoreElimination still necessary when we have
>> AggressiveDCE?
>
> Probably, but I'll let others give the definitive answer.

They are orthogonal. DSE touches stores, AggressiveDCE does mostly
scalars. ADCE basically assumes all stores are live.

>> 5) What are the tradeoffs between the different dead code elimination
>> variants (why not always use the aggressive one)?
>
> Others can speak to this.

ADCE is actually quite expensive from the compile-time perspective,
because it requires building post-dominance information. I'd stick
with a combination of "-dce -simplifycfg" instead of -adce.

>> 6) Is there a better place for CFGSimplification? Should I perform
>> it at multiple points?
>
> I think once is probably enough. Earlier would probably be better as
> it will simplify later passes and potentially help them run faster.

I'd suggest running it earlier as you suggest, but then also late to
clean up. It's a very fast pass.

>> Also, my code will frequently have vectors that are either
>> initialized to all 0.0 or 1.0. This offers a lot of opportunity for
>> eliminating many multiplications and additions, but I was wondering
>> which passes I need for this (probably a Reassociate pass, what
>> else)? And I also wonder whether these passes actually work with
>> vectors?
>
> I would assume they work with vectors. Anything expression-related is
> good to capture these opportunities (reassociation, folding,
> instcombine, etc.).

Note that a lot of optimizations don't apply to floating-point values.
E.g. "x+0.0" can't be eliminated (in general), because "-0.0+0.0" is
0.0, not -0.0. All the optimizations should tolerate vectors, but may
miss certain optimizations on them. If you notice something (e.g.
reassociation) that isn't being done aggressively with vectors, please
file a bug. We do a lot of vector optimizations though, so we are
probably only missing details, not whole classes of important
optimizations.

>> Is there any other highly recommended pass for this kind of
>> application that I'm forgetting? Any passes that I'd better avoid
>> due to poor gains and long optimization time?

Another important optimization is GVN, which does common subexpression
elimination (for example).

>> Sorry for the many question marks. :-) I don't expect there is an
>> absolute definite answer to each of them, but some guidelines and
>> insights would be very welcome and much appreciated.
>
> Phase ordering is one of the trickiest parts to tune. It's often
> highly code-dependent. If your target sources are limited in number,
> you might be able to get away with a funky ordering that would be
> disastrous on general-purpose integer codes, for example. It's often
> a matter of trial and error. Several people have done research on
> using genetic algorithms and other tricks to find an optimal phase
> ordering. A Google search should turn up interesting stuff.

"What he said" :). Also, if you see specific problems in the generated
code, let us know.

-Chris
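[Editorial note: the signed-zero point above can be checked directly. Under IEEE 754, (-0.0) + (+0.0) rounds to +0.0, so folding "x + 0.0" to "x" would silently flip the sign of a negative zero. A minimal Python check:]

```python
import math

x = -0.0
y = x + 0.0                     # IEEE 754: (-0.0) + (+0.0) == +0.0

# copysign(1.0, v) exposes the sign bit even when v compares equal to 0.0
print(math.copysign(1.0, x))    # -1.0: x is negative zero
print(math.copysign(1.0, y))    # 1.0: the sign bit was lost by the add
```

Both values compare equal with ==, which is exactly why the sign bit must be inspected separately and why the fold is unsafe in strict floating-point mode.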
Daniel Berlin
2008-May-20 18:33 UTC
[LLVMdev] Optimization passes organization and tradeoffs
On Tue, May 20, 2008 at 2:28 PM, Chris Lattner <clattner at apple.com> wrote:

>>> 2) Does SCCP also eliminate multiplying/dividing by 1 and
>>> adding/subtracting 0?
>>
>> That's probably more the purview of instcombine.
>
> Right.

SCCP with a good constant folder should definitely be able to take care
of * 0 and */ 1 though. Not that I would go much further than this.
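[Editorial note: the identity simplifications Daniel mentions can be illustrated with a toy folder of the sort an SCCP-style pass might consult. This is a hypothetical sketch, not LLVM's actual constant-folding API; note that x * 0 -> 0 is only safe for integers, since for floats it would discard NaN and signed-zero behavior:]

```python
# Toy integer identity folder: simplify an instruction when one operand
# is a known constant. Hypothetical helper, not LLVM's API.
def fold(op, lhs, rhs):
    if op == "mul":
        if rhs == 1:
            return lhs          # x * 1 -> x
        if rhs == 0:
            return 0            # x * 0 -> 0 (integers only!)
    if op == "add" and rhs == 0:
        return lhs              # x + 0 -> x
    if op == "sdiv" and rhs == 1:
        return lhs              # x / 1 -> x
    return None                 # not foldable here

print(fold("mul", "x", 1))      # x
```

A constant-propagation pass that already tracks which operands are known constants gets these rewrites almost for free by calling such a folder at each visited instruction.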
Nicolas Capens
2008-May-21 09:29 UTC
[LLVMdev] Optimization passes organization and tradeoffs
Hi Chris,

Thanks for the detailed explanations. I have a few remaining questions:

Am I correct that ScalarReplAggregates is hardly more expensive than
Mem2Reg and therefore generally preferable?

What would be the code quality implications of using "-dce -simplifycfg"
instead of -adce? As far as I understand the algorithms involved, -dce
would hardly ever miss a dead instruction if dead stores have already
been eliminated, right?

For my application it is safe to reduce x+0.0 to x. I checked, and my
current set of passes keeps the add. Is there a way to control the
floating-point model in some way? Lots of compilers can select 'strict',
'precise', 'fast', etc. Even adding and then subtracting the same value,
which almost never results in the original value, is almost always safe
to eliminate for my app...

What's the difference between GVN and GCSE, if they both perform common
subexpression elimination?

So far LLVM is working great for me. I'm impressed by the overall
architecture and the fair learning curve. I'm generating complex but
well-optimized code in a matter of days. I haven't found any problems in
the generated code yet, but I'll let you know as soon as I notice a
glitch.

Kind regards,

Nicolas

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu  http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
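[Editorial note: Nicolas's aside that adding and then subtracting the same value almost never round-trips in floating point is easy to demonstrate. When the addend is much larger than x, the sum absorbs x entirely:]

```python
x = 0.1
big = 1e16                      # exactly representable as a double

# 0.1 is far below one ulp of 1e16 (which is 2.0), so the sum rounds
# back to 1e16 and x vanishes.
print((x + big) - big)          # 0.0, not 0.1
print(x)                        # 0.1
```

So "(x + y) - y -> x" changes observable results in general, which is why an optimizer needs an explicit fast-math-style permission before applying it.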
Nicolas Capens
2008-May-21 10:03 UTC
[LLVMdev] Optimization passes organization and tradeoffs
Hi Daniel,

What's a constant folder? I only found a reference to it in the SCCP
implementation. Is there a bit of documentation for it, or could you
explain the basics?

Thanks,

Nicolas
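[Editorial note: in general compiler terms, a constant folder evaluates an instruction whose operands are all compile-time constants and replaces it with the resulting constant. A minimal sketch, with hypothetical helper names rather than LLVM's actual interface:]

```python
import operator

# Map IR-style opcodes to Python evaluators.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def try_fold(op, *operands):
    # Only fold when every operand is already a known integer constant;
    # a symbolic operand (here, a string) blocks folding.
    if all(isinstance(v, int) for v in operands):
        return OPS[op](*operands)   # the instruction becomes this constant
    return None                     # at least one operand is unknown

print(try_fold("mul", 6, 7))        # 42
print(try_fold("add", "x", 1))      # None
```

Passes like SCCP discover that operands are constants and then delegate the actual arithmetic to a folder like this, so the evaluation logic lives in one place.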
Chris Lattner
2008-May-21 20:48 UTC
[LLVMdev] Optimization passes organization and tradeoffs
On Wed, 21 May 2008, Nicolas Capens wrote:

> Thanks for the detailed explanations. I have a few remaining questions:
>
> Am I correct that ScalarReplAggregates is hardly more expensive than
> Mem2Reg and therefore generally preferable?

Right.

> What would be the code quality implications of using "-dce
> -simplifycfg" instead of -adce? As far as I understand the algorithms
> involved, -dce would hardly ever miss a dead instruction if dead
> stores have already been eliminated, right?

-dce -simplifycfg should be basically the same as adce. ADCE was
recently changed to not delete output-free loops. That was the major
advantage it had over simplifycfg. We now have an explicit loop
deletion pass as part of the optimizer.

> For my application it is safe to reduce x+0.0 to x. I checked, and my
> current set of passes keeps the add. Is there a way to control the
> floating-point model in some way? Lots of compilers can select
> 'strict', 'precise', 'fast', etc. Even adding and then subtracting
> the same value, which almost never results in the original value, is
> almost always safe to eliminate for my app...

You can tell the code generator that it is safe to do this (see
llvm/Target/TargetOptions.h), but there isn't currently a way to tell
the optimizers about it. My tentative plan is to split "add" into
"fadd"/"iadd" and then make the "fp" operations have some sort of flags
to parameterize the optimizations allowed for them. This would allow us
to propagate down the equivalent of GCC's -ffast-math into the IR.

> What's the difference between GVN and GCSE, if they both perform
> common subexpression elimination?

GVN does more, and is a better algorithm. GCSE is basically deprecated
and should be removed at some point.

> So far LLVM is working great for me. I'm impressed by the overall
> architecture and the fair learning curve. I'm generating complex but
> well-optimized code in a matter of days. I haven't found any problems
> in the generated code yet, but I'll let you know as soon as I notice
> a glitch.

Awesome!

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/
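[Editorial note: the redundancy that GVN and GCSE both target can be sketched with a minimal local value-numbering pass. This is hypothetical illustration code, far simpler than LLVM's GVN, which also propagates the replacements into later uses and reasons across basic blocks:]

```python
# Local value numbering: detect instructions that recompute an
# expression already computed earlier in the block.
def value_number(instructions):
    table = {}       # (op, operands) -> name of the first computation
    replaced = {}    # redundant name -> name it can be replaced with
    for name, op, *operands in instructions:
        key = (op, tuple(operands))
        if key in table:
            replaced[name] = table[key]   # redundant: reuse earlier value
        else:
            table[key] = name
    return replaced

prog = [("t1", "add", "a", "b"),
        ("t2", "add", "a", "b"),          # same expression as t1
        ("t3", "mul", "t1", "c")]
print(value_number(prog))                 # {'t2': 't1'}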