Chris Lattner
2008-May-21  20:48 UTC
[LLVMdev] Optimization passes organization and tradeoffs
On Wed, 21 May 2008, Nicolas Capens wrote:> Thanks for the detailed explanations. I have a few remaining questions: > > Am I correct that ScalarReplAggregates is hardly more expensive than Mem2Reg > and therefore generally preferable?Right.> What would be the code quality implications of using "-dce -simplifycfg" > instead of -adce? As far as I understand the algorithms involved, -dce would > hardly ever miss a dead instruction if dead stores have already been > eliminated, right?-dce -simplifycfg should be basically the same as adce. ADCE was recently changed to not delete output-free loops. That was the major advantage it had over simplifycfg. We now have an explicit loop deletion pass as part of the optimizer.> For my application it is safe to reduce x+0.0 to x. I checked and my current > set of passes keeps the add. Is there a way to control the floating-point > model in some way? Lots of compilers can select 'strict', 'precise', 'fast', > etc. Even adding and then subtracting the same value, which almost never > results in the original value, is almost always safe to eliminate for my > app...You can tell the code generator that it is safe to do this (see llvm/Target/TargetOptions.h), but there isn't currently a way to tell the optimizers about it. My tentative plan is to split "add" into "fadd"/"iadd" and then make the "fp" operations have some sort of flags to parameterize the optimizations allowed for them. This would allow us to propagate down the equivalent of GCC's -ffast-math into the IR.> What's the difference between GVN and GCSE, if they both perform common > subexpression elimination?GVN does more, and is a better algorithm. GCSE is basically deprecated and should be removed at some point.> So far LLVM is working great for me. I'm impressed by the overall > architecture and the fair learning curve. I'm generating complex but > well-optimized code in a matter of days. I haven't found any problems in the > generated code yet, but I'll let you know as soon as I notice a glitch.Awesome! -Chris> -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On > Behalf Of Chris Lattner > Sent: Tuesday, 20 May, 2008 20:28 > To: LLVM Developers Mailing List > Subject: Re: [LLVMdev] Optimization passes organization and tradeoffs > > > On May 20, 2008, at 8:57 AM, David Greene wrote: > >> On Tuesday 20 May 2008 07:03, Nicolas Capens wrote: >> >>> 1) Does ScalarReplAggregates totally superscede >>> PromoteMemoryToRegister? I >> >> Nope, they are different. Mem2Reg is really important if you want >> register >> allocation. > > Actually SROA does fully subsume Mem2Reg. It iterates between breaking > up aggregates and promoting them to registers. > >>> think I need it to optimize small arrays, but what is the expected >>> added >>> complexity? >> >> I shouldn't think it would be very expensive at all. > > Agreed. > >>> 2) Does SCCP also eliminate multiplying/dividing by 1 and >>> adding/subtracting 0? >> >> That's probably more the purview of instcombine. > > Right. > >>> 4) Is DeadStoreElimination still necessary when we have >>> AggressiveDCE? >> >> Probably, but I'll let others give the definitive answer. > > They are orthogonal. DSE touches stores, AggressiveDCE does mostly > scalars. ADCE basically assumes all stores are live. > >>> 5) What are the tradeoffs between the different dead code elimination >>> variants (why not always use the aggressive one)? >> >> Others can speak to this. > > ADCE is actually quite expensive from the compile-time perspective, > because it requires building post dominance information. I'd stick > with a combination of "-dce -simplifycfg" instead of -adce. > >>> 6) Is there a better place for CFGSimplification? Should I perform >>> it at >>> multiple points? >> >> I think once is probably enough. Earlier would probably be better >> as it will >> simplify later passes and potentially help them run faster. > > I'd suggest running it earlier as you suggest, but then also late to > clean up. It's a very fast pass. > >>> Also, my code will frequently have vectors, that are either >>> initialized to >>> all 0.0 or 1.0. This offers a lot of opportunity for eliminating many >>> multiplications and additions, but I was wondering which passes I >>> need for >>> this (probably a Reassociate pass, what else)? And I also wonder >>> whether >>> these passes actually work with vectors? >> >> I would assume they work with vector. Anything expression-related >> is good >> to capture these opportunities (reassociation, folding, instcombine, >> etc.). > > Note that a lot of optimizations don't apply to floating point > values. E.g. "x+0.0" can't be eliminated (in general), because > "-0.0+0.0" is 0.0, not -0.0. All the optimizations should tolerate > vectors, but may miss certain optimizations on them. If you notice > something (e.g. reassociation) that isn't being done aggressively with > vectors, please file a bug. We do a lot of vector optimizations > though, so we are probably only missing details, not whole classes of > important optimizations. > >>> Is there any other highly recommended pass for this kind of >>> applications >>> that I'm forgetting? Any passes that I better avoid due to poor >>> gains and >>> long optimization time? >> > Another important optimization is GVN, which does common subexpression > elimination (for example). > >>> Sorry for the many question marks. :-) I don't expect there is an >>> absolute >>> definite answer to each of them, but some guidelines and insights >>> would be >>> very welcome and much appreciated. >> >> Phase ordering is one of the trickiest parts to tune. It's often >> highly >> code-dependent. If your target sources are limited in number, you >> might be >> able to get away with a funky ordering that would be disastrous on >> general-purpose integer codes, for example. It's often a matter of >> trial and >> error. Several people have done research on using genetic >> algorithms and >> other tricks to find an optimal phase ordering. A google search >> should turn >> up interesting stuff. > > "What he said" :). Also, if you see specific problems in the > generated code, let us know. > > -Chris > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-Chris -- http://nondot.org/sabre/ http://llvm.org/
David Greene
2008-May-21  22:32 UTC
[LLVMdev] Optimization passes organization and tradeoffs
On Wednesday 21 May 2008 15:48, Chris Lattner wrote:> > What's the difference between GVN and GCSE, if they both perform common > > subexpression elimination? > > GVN does more, and is a better algorithm. GCSE is basically deprecated > and should be removed at some point.Er...waitaminute. Maybe there's something I don't fully grok about GVN, but in general, value numbering and CSE are different and often complementary. Neither is "more powerful" than the other. -Dave
Chris Lattner
2008-May-22  00:28 UTC
[LLVMdev] Optimization passes organization and tradeoffs
On Wed, 21 May 2008, David Greene wrote:> On Wednesday 21 May 2008 15:48, Chris Lattner wrote: >>> What's the difference between GVN and GCSE, if they both perform common >>> subexpression elimination? >> >> GVN does more, and is a better algorithm. GCSE is basically deprecated >> and should be removed at some point. > > Er...waitaminute. Maybe there's something I don't fully grok about GVN, > but in general, value numbering and CSE are different and often complementary. > Neither is "more powerful" than the other.I'm talking about the LLVM passes here, not the abstract algorithms. -Chris -- http://nondot.org/sabre/ http://llvm.org/