Chris Lattner
2008-May-20 18:28 UTC
[LLVMdev] Optimization passes organization and tradeoffs
On May 20, 2008, at 8:57 AM, David Greene wrote:

> On Tuesday 20 May 2008 07:03, Nicolas Capens wrote:
>
>> 1) Does ScalarReplAggregates totally supersede
>> PromoteMemoryToRegister? I
>
> Nope, they are different. Mem2Reg is really important if you want
> register allocation.

Actually SROA does fully subsume Mem2Reg. It iterates between breaking
up aggregates and promoting them to registers.

>> think I need it to optimize small arrays, but what is the expected
>> added complexity?
>
> I shouldn't think it would be very expensive at all.

Agreed.

>> 2) Does SCCP also eliminate multiplying/dividing by 1 and
>> adding/subtracting 0?
>
> That's probably more the purview of instcombine.

Right.

>> 4) Is DeadStoreElimination still necessary when we have
>> AggressiveDCE?
>
> Probably, but I'll let others give the definitive answer.

They are orthogonal. DSE touches stores, AggressiveDCE does mostly
scalars. ADCE basically assumes all stores are live.

>> 5) What are the tradeoffs between the different dead code elimination
>> variants (why not always use the aggressive one)?
>
> Others can speak to this.

ADCE is actually quite expensive from the compile-time perspective,
because it requires building post-dominance information. I'd stick
with a combination of "-dce -simplifycfg" instead of -adce.

>> 6) Is there a better place for CFGSimplification? Should I perform
>> it at multiple points?
>
> I think once is probably enough. Earlier would probably be better as
> it will simplify later passes and potentially help them run faster.

I'd suggest running it earlier as you suggest, but then also late to
clean up. It's a very fast pass.

>> Also, my code will frequently have vectors that are either
>> initialized to all 0.0 or 1.0. This offers a lot of opportunity for
>> eliminating many multiplications and additions, but I was wondering
>> which passes I need for this (probably a Reassociate pass, what
>> else)? And I also wonder whether these passes actually work with
>> vectors?
>
> I would assume they work with vectors. Anything expression-related is
> good to capture these opportunities (reassociation, folding,
> instcombine, etc.).

Note that a lot of optimizations don't apply to floating-point values.
E.g. "x+0.0" can't be eliminated (in general), because "-0.0+0.0" is
0.0, not -0.0. All the optimizations should tolerate vectors, but may
miss certain optimizations on them. If you notice something (e.g.
reassociation) that isn't being done aggressively with vectors, please
file a bug. We do a lot of vector optimizations though, so we are
probably only missing details, not whole classes of important
optimizations.

>> Is there any other highly recommended pass for this kind of
>> application that I'm forgetting? Any passes that I'd better avoid
>> due to poor gains and long optimization time?

Another important optimization is GVN, which does common subexpression
elimination (for example).

>> Sorry for the many question marks. :-) I don't expect there is an
>> absolute definite answer to each of them, but some guidelines and
>> insights would be very welcome and much appreciated.
>
> Phase ordering is one of the trickiest parts to tune. It's often
> highly code-dependent. If your target sources are limited in number,
> you might be able to get away with a funky ordering that would be
> disastrous on general-purpose integer codes, for example. It's often
> a matter of trial and error. Several people have done research on
> using genetic algorithms and other tricks to find an optimal phase
> ordering. A Google search should turn up interesting stuff.

"What he said" :). Also, if you see specific problems in the generated
code, let us know.

-Chris
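[Editorial note: the signed-zero point above can be checked directly. Under IEEE 754, (-0.0) + (+0.0) rounds to +0.0, so folding "x + 0.0" to "x" would silently flip the sign of a negative zero. A minimal Python check:]

```python
import math

x = -0.0
y = x + 0.0                     # IEEE 754: (-0.0) + (+0.0) == +0.0

# copysign(1.0, v) exposes the sign bit even when v compares equal to 0.0
print(math.copysign(1.0, x))    # -1.0: x is negative zero
print(math.copysign(1.0, y))    # 1.0: the sign bit was lost by the add
```

Both values compare equal with ==, which is exactly why the sign bit must be inspected separately and why the fold is unsafe in strict floating-point mode.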
Daniel Berlin
2008-May-20 18:33 UTC
[LLVMdev] Optimization passes organization and tradeoffs
On Tue, May 20, 2008 at 2:28 PM, Chris Lattner <clattner at apple.com> wrote:

>>> 2) Does SCCP also eliminate multiplying/dividing by 1 and
>>> adding/subtracting 0?
>>
>> That's probably more the purview of instcombine.
>
> Right.

SCCP with a good constant folder should definitely be able to take care
of * 0 and */ 1 though. Not that I would go much further than this.
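[Editorial note: the identity simplifications Daniel mentions can be illustrated with a toy folder of the sort an SCCP-style pass might consult. This is a hypothetical sketch, not LLVM's actual constant-folding API; note that x * 0 -> 0 is only safe for integers, since for floats it would discard NaN and signed-zero behavior:]

```python
# Toy integer identity folder: simplify an instruction when one operand
# is a known constant. Hypothetical helper, not LLVM's API.
def fold(op, lhs, rhs):
    if op == "mul":
        if rhs == 1:
            return lhs          # x * 1 -> x
        if rhs == 0:
            return 0            # x * 0 -> 0 (integers only!)
    if op == "add" and rhs == 0:
        return lhs              # x + 0 -> x
    if op == "sdiv" and rhs == 1:
        return lhs              # x / 1 -> x
    return None                 # not foldable here

print(fold("mul", "x", 1))      # x
```

A constant-propagation pass that already tracks which operands are known constants gets these rewrites almost for free by calling such a folder at each visited instruction.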
Nicolas Capens
2008-May-21 09:29 UTC
[LLVMdev] Optimization passes organization and tradeoffs
Hi Chris,

Thanks for the detailed explanations. I have a few remaining questions:

Am I correct that ScalarReplAggregates is hardly more expensive than
Mem2Reg and therefore generally preferable?

What would be the code quality implications of using "-dce -simplifycfg"
instead of -adce? As far as I understand the algorithms involved, -dce
would hardly ever miss a dead instruction if dead stores have already
been eliminated, right?

For my application it is safe to reduce x+0.0 to x. I checked, and my
current set of passes keeps the add. Is there a way to control the
floating-point model in some way? Lots of compilers can select 'strict',
'precise', 'fast', etc. Even adding and then subtracting the same value,
which almost never results in the original value, is almost always safe
to eliminate for my app...

What's the difference between GVN and GCSE, if they both perform common
subexpression elimination?

So far LLVM is working great for me. I'm impressed by the overall
architecture and the fair learning curve. I'm generating complex but
well-optimized code in a matter of days. I haven't found any problems in
the generated code yet, but I'll let you know as soon as I notice a
glitch.

Kind regards,

Nicolas

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu  http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
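[Editorial note: Nicolas's aside that adding and then subtracting the same value almost never round-trips in floating point is easy to demonstrate. When the addend is much larger than x, the sum absorbs x entirely:]

```python
x = 0.1
big = 1e16                      # exactly representable as a double

# 0.1 is far below one ulp of 1e16 (which is 2.0), so the sum rounds
# back to 1e16 and x vanishes.
print((x + big) - big)          # 0.0, not 0.1
print(x)                        # 0.1
```

So "(x + y) - y -> x" changes observable results in general, which is why an optimizer needs an explicit fast-math-style permission before applying it.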
Nicolas Capens
2008-May-21 10:03 UTC
[LLVMdev] Optimization passes organization and tradeoffs
Hi Daniel,

What's a constant folder? I only found a reference to it in the SCCP
implementation. Is there a bit of documentation for it, or could you
explain the basics?

Thanks,

Nicolas
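[Editorial note: in general compiler terms, a constant folder evaluates an instruction whose operands are all compile-time constants and replaces it with the resulting constant. A minimal sketch, with hypothetical helper names rather than LLVM's actual interface:]

```python
import operator

# Map IR-style opcodes to Python evaluators.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def try_fold(op, *operands):
    # Only fold when every operand is already a known integer constant;
    # a symbolic operand (here, a string) blocks folding.
    if all(isinstance(v, int) for v in operands):
        return OPS[op](*operands)   # the instruction becomes this constant
    return None                     # at least one operand is unknown

print(try_fold("mul", 6, 7))        # 42
print(try_fold("add", "x", 1))      # None
```

Passes like SCCP discover that operands are constants and then delegate the actual arithmetic to a folder like this, so the evaluation logic lives in one place.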
Chris Lattner
2008-May-21 20:48 UTC
[LLVMdev] Optimization passes organization and tradeoffs
On Wed, 21 May 2008, Nicolas Capens wrote:

> Thanks for the detailed explanations. I have a few remaining questions:
>
> Am I correct that ScalarReplAggregates is hardly more expensive than
> Mem2Reg and therefore generally preferable?

Right.

> What would be the code quality implications of using "-dce
> -simplifycfg" instead of -adce? As far as I understand the algorithms
> involved, -dce would hardly ever miss a dead instruction if dead
> stores have already been eliminated, right?

-dce -simplifycfg should be basically the same as adce. ADCE was
recently changed to not delete output-free loops. That was the major
advantage it had over simplifycfg. We now have an explicit loop
deletion pass as part of the optimizer.

> For my application it is safe to reduce x+0.0 to x. I checked, and my
> current set of passes keeps the add. Is there a way to control the
> floating-point model in some way? Lots of compilers can select
> 'strict', 'precise', 'fast', etc. Even adding and then subtracting
> the same value, which almost never results in the original value, is
> almost always safe to eliminate for my app...

You can tell the code generator that it is safe to do this (see
llvm/Target/TargetOptions.h), but there isn't currently a way to tell
the optimizers about it. My tentative plan is to split "add" into
"fadd"/"iadd" and then make the "fp" operations have some sort of flags
to parameterize the optimizations allowed for them. This would allow us
to propagate down the equivalent of GCC's -ffast-math into the IR.

> What's the difference between GVN and GCSE, if they both perform
> common subexpression elimination?

GVN does more, and is a better algorithm. GCSE is basically deprecated
and should be removed at some point.

> So far LLVM is working great for me. I'm impressed by the overall
> architecture and the fair learning curve. I'm generating complex but
> well-optimized code in a matter of days. I haven't found any problems
> in the generated code yet, but I'll let you know as soon as I notice
> a glitch.

Awesome!

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/
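[Editorial note: the redundancy that GVN and GCSE both target can be sketched with a minimal local value-numbering pass. This is hypothetical illustration code, far simpler than LLVM's GVN, which also propagates the replacements into later uses and reasons across basic blocks:]

```python
# Local value numbering: detect instructions that recompute an
# expression already computed earlier in the block.
def value_number(instructions):
    table = {}       # (op, operands) -> name of the first computation
    replaced = {}    # redundant name -> name it can be replaced with
    for name, op, *operands in instructions:
        key = (op, tuple(operands))
        if key in table:
            replaced[name] = table[key]   # redundant: reuse earlier value
        else:
            table[key] = name
    return replaced

prog = [("t1", "add", "a", "b"),
        ("t2", "add", "a", "b"),          # same expression as t1
        ("t3", "mul", "t1", "c")]
print(value_number(prog))                 # {'t2': 't1'}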