thr3ads.net - llvm dev - [LLVMdev] Making optimization passes do less [May 2008]

Matthijs Kooijman

2008-May-26 11:23 UTC

[LLVMdev] Making optimization passes do less

Hi all,

I'm currently struggling with a few optimization passes that change things I
don't want changed. However, for the most part those passes (currently
InstructionCombining and SimplifyCFG) do things that I do want, so disabling
them altogether doesn't help me much.

The problem arises because the architecture I'm compiling for is quite
non-standard. In particular, it has the ability to execute a lot of
instructions in parallel, but at the same time can't execute everything you
throw at it.

My problem with SimplifyCFG is the following: whenever the if and else
branches start with the same instruction, it gets hoisted up into the
predecessor block. For my architecture, instructions in different blocks
can't be run in parallel, so this optimization either makes the code very
inefficient or keeps it from compiling at all.
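
To make the hoisting concrete, here is a small source-level sketch in C++. The
function names and the arithmetic are made up for illustration; the real pass
works on LLVM IR, not on C++ source.

```cpp
#include <cassert>

// Before: both arms of the branch begin with the same computation (x * 3).
int beforeHoist(bool cond, int x) {
    if (cond) {
        int t = x * 3;   // common first instruction
        return t + 1;
    } else {
        int t = x * 3;   // the same instruction again
        return t - 1;
    }
}

// After: the common instruction has been hoisted into the predecessor block,
// so it now executes before the branch instead of inside each arm.
int afterHoist(bool cond, int x) {
    int t = x * 3;
    return cond ? t + 1 : t - 1;
}

// The two forms compute the same values; only the block placement differs.
void checkHoistingPreservesResults() {
    for (int x = -10; x <= 10; ++x) {
        assert(beforeHoist(true, x) == afterHoist(true, x));
        assert(beforeHoist(false, x) == afterHoist(false, x));
    }
}
```

The results are identical; what changes is which block holds the multiply,
and that placement is exactly what matters on the architecture described here.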

InstructionCombining has a habit of removing unneeded bits from constants.
For example, if I write i & 63, where i is a loop counter that is always even,
this gets replaced by i & 62. That gives the same results when interpreted, of
course, but our backend cannot use just any constant as an & mask (in
particular, it can only use a limited number of them). I'd very much prefer to
preserve the original value from the source here (I also assume that this
optimization is in place to help further optimizations, because I can't really
see any use for this change on regular architectures...).
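
The arithmetic behind the 63 to 62 rewrite can be checked with a few lines of
C++. This is only a sketch of the effect described above, not InstCombine's
code; the helper names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Drop mask bits that are known to be zero in the operand. With bit 0 of i
// known to be zero (i is always even), the mask 63 shrinks to 62.
uint32_t shrinkMask(uint32_t mask, uint32_t knownZero) {
    return mask & ~knownZero;
}

// Check that two masks give the same result for every even i below limit.
bool masksAgreeOnEvenValues(uint32_t a, uint32_t b, uint32_t limit) {
    for (uint32_t i = 0; i < limit; i += 2)
        if ((i & a) != (i & b))
            return false;
    return true;
}

void checkMaskExample() {
    assert(shrinkMask(63u, 1u) == 62u);
    assert(masksAgreeOnEvenValues(63u, 62u, 1024u));   // interchangeable
    assert(!masksAgreeOnEvenValues(63u, 60u, 1024u));  // 60 also drops bit 1
}
```

So the two masks are interchangeable as far as results go; the only difference
is whether the backend can encode the constant.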

I've been thinking a bit on how to achieve this, and I see a few options:
 * Use a local patch to simply disable the parts of the passes we don't want,
   optionally guarded by some check so that they are only disabled when
   needed. This would be a very effective approach, though also a very
   undesirable one: local patches to the LLVM source are a pain to maintain.
 * Use some kind of subclassed pass. AFAICS, simply subclassing an existing
   pass doesn't really work due to the class ID stuff, so this requires
   splitting the pass in LLVM into a superclass that does most of the work and
   a subclass that decides when to do it, so that we can add another subclass
   (similar to the Inliner pass). Alternatively, the functionality of
   instruction combining could be split off into a utility class, though that
   would prevent overriding methods to disable some functionality. This
   approach could be useful, but I can't really see how it would work out yet.
 * Add options to the current passes. I could add an option to the current
   passes to make them do what I want (either through an argument to
   createXXXPass() and the constructor, or perhaps through a set_XXX_option()
   method or something). This might work for SimplifyCFG, since that option
   could be made a bit more generic, such as "Don't move instructions between
   blocks" (leaving SimplifyCFG free to merge blocks whenever appropriate).

   For InstructionCombining this is harder, since our requirements are not as
   easily captured in an elegant option, I'm afraid.
 * Mark instructions / values as immutable. We could write a pass that marks
   the values we want preserved as immutable, and other passes should leave
   those values alone. This requires quite some modification to LLVM (probably
   even the IR) and to all optimization passes. Though I think it's actually
   quite an elegant solution, it's probably hard to express everything we need
   in it. Also, if we mark some instruction or value as immutable, it's hard
   to prevent a pass from making a copy of the instruction (perhaps
   indirectly) and simply leaving the immutable instruction unused.
 * Use some kind of TargetInfo/TargetData struct to control certain
   optimizations. I'm not really sure how this is used now, but I could
   imagine that there is some interface for optimization passes to find out
   which optimizations are worthwhile and which are not (something like
   bool isBetter(Value *Old, Value *New), as a very simple example). Is there
   already something like this?
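
As a rough illustration of the last option, a target hook along the lines of
isBetter() might look like the sketch below. Everything in it is hypothetical:
TargetOptHooks, LimitedMaskTarget, isBetterAndMask() and the set of legal
masks are invented for this example and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical hook interface that a pass could consult before a rewrite.
struct TargetOptHooks {
    virtual ~TargetOptHooks() = default;
    // Return true if replacing the AND mask oldMask by newMask is acceptable.
    // The default accepts every rewrite, like a target with no restrictions.
    virtual bool isBetterAndMask(uint64_t /*oldMask*/,
                                 uint64_t /*newMask*/) const {
        return true;
    }
};

// Hypothetical target that can only encode a handful of AND masks.
struct LimitedMaskTarget : TargetOptHooks {
    std::set<uint64_t> legalMasks{1, 3, 7, 15, 31, 63, 127, 255};  // assumed

    bool isBetterAndMask(uint64_t oldMask, uint64_t newMask) const override {
        // Reject only rewrites that turn an encodable mask into one that
        // cannot be encoded.
        return legalMasks.count(newMask) != 0 || legalMasks.count(oldMask) == 0;
    }
};

// What a pass might do: shrink the mask only if the target agrees.
uint64_t maybeShrinkMask(const TargetOptHooks &hooks, uint64_t mask,
                         uint64_t knownZero) {
    uint64_t shrunk = mask & ~knownZero;
    return hooks.isBetterAndMask(mask, shrunk) ? shrunk : mask;
}

void checkHookExample() {
    TargetOptHooks unrestricted;
    LimitedMaskTarget limited;
    assert(maybeShrinkMask(unrestricted, 63, 1) == 62);  // rewrite accepted
    assert(maybeShrinkMask(limited, 63, 1) == 63);       // 62 is not encodable
    assert(maybeShrinkMask(limited, 63, 32) == 31);      // 31 is encodable
}
```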
	
None of these options seems too attractive to me. What do others think? Is
there some other option I'm missing here?

Gr.

Matthijs

Devang Patel

2008-May-27 17:42 UTC

[LLVMdev] Making optimization passes do less

Matthijs,

On May 26, 2008, at 4:23 AM, Matthijs Kooijman wrote:
> Hi all,
>
> I'm currently struggling with a few optimization passes that change things I
> don't want changed. However, for the most part those passes (currently
> InstructionCombining and SimplifyCFG) do things that I do want, so disabling
> them altogether doesn't help me much.
>
> The problem arises because the architecture I'm compiling for is quite
> non-standard. In particular, it has the ability to execute a lot of
> instructions in parallel, but at the same time can't execute everything you
> throw at it.
>
> My problem with SimplifyCFG is the following: whenever the if and else
> branches start with the same instruction, it gets hoisted up into the
> predecessor block. For my architecture, instructions in different blocks
> can't be run in parallel, so this optimization either makes the code very
> inefficient or keeps it from compiling at all.
>
> InstructionCombining has a habit of removing unneeded bits from constants.
> For example, if I write i & 63, where i is a loop counter that is always
> even, this gets replaced by i & 62. That gives the same results when
> interpreted, of course, but our backend cannot use just any constant as an &
> mask (in particular, it can only use a limited number of them). I'd very
> much prefer to preserve the original value from the source here (I also
> assume that this optimization is in place to help further optimizations,
> because I can't really see any use for this change on regular
> architectures...).
>
> I've been thinking a bit on how to achieve this, and I see a few options:
>  * Use a local patch to simply disable the parts of the passes we don't
>    want, optionally guarded by some check so that they are only disabled
>    when needed. This would be a very effective approach, though also a very
>    undesirable one: local patches to the LLVM source are a pain to maintain.
Yes, but it works!
>  * Use some kind of subclassed pass. AFAICS, simply subclassing an existing
>    pass doesn't really work due to the class ID stuff, so this requires
>    splitting the pass in LLVM into a superclass that does most of the work
>    and a subclass that decides when to do it, so that we can add another
>    subclass (similar to the Inliner pass). Alternatively, the functionality
>    of instruction combining could be split off into a utility class, though
>    that would prevent overriding methods to disable some functionality. This
>    approach could be useful, but I can't really see how it would work out
>    yet.
>  * Add options to the current passes. I could add an option to the current
>    passes to make them do what I want (either through an argument to
>    createXXXPass() and the constructor, or perhaps through a
>    set_XXX_option() method or something). This might work for SimplifyCFG,
>    since that option could be made a bit more generic, such as "Don't move
>    instructions between blocks" (leaving SimplifyCFG free to merge blocks
>    whenever appropriate).
>
>    For InstructionCombining this is harder, since our requirements are not
>    as easily captured in an elegant option, I'm afraid.
In general, I'd like to avoid additional options if possible.
>  * Mark instructions / values as immutable. We could write a pass that marks
>    the values we want preserved as immutable, and other passes should leave
>    those values alone. This requires quite some modification to LLVM
>    (probably even the IR) and to all optimization passes. Though I think
>    it's actually quite an elegant solution, it's probably hard to express
>    everything we need in it. Also, if we mark some instruction or value as
>    immutable, it's hard to prevent a pass from making a copy of the
>    instruction (perhaps indirectly) and simply leaving the immutable
>    instruction unused.
Usually "volatile" is the hammer used in such situations to tell optimizers
to stay away. But it is not elegant and smells like a hack.
>  * Use some kind of TargetInfo/TargetData struct to control certain
>    optimizations. I'm not really sure how this is used now, but I could
>    imagine that there is some interface for optimization passes to find out
>    which optimizations are worthwhile and which are not (something like
>    bool isBetter(Value *Old, Value *New), as a very simple example). Is
>    there already something like this?
The instruction combiner uses TargetData. However, AFAIK there is no
general-purpose interface available for selecting optimizations based on the
target.
> None of these options seems too attractive to me. What do others think? Is
> there some other option I'm missing here?
Would it be possible to write a codegen-level pass that sinks the
instructions, to maximize the number of parallel instructions in a block for
your target?

-
Devang

Chris Lattner

2008-May-28 06:25 UTC

[LLVMdev] Making optimization passes do less

On May 26, 2008, at 4:23 AM, Matthijs Kooijman wrote:
> I'm currently struggling with a few optimization passes that change things I
> don't want changed.
Hehe ok.
> However, for the most part those passes (currently InstructionCombining and
> SimplifyCFG) do things that I do want, so disabling them altogether doesn't
> help me much.
Ok.
> The problem arises because the architecture I'm compiling for is quite
> non-standard. In particular, it has the ability to execute a lot of
> instructions in parallel, but at the same time can't execute everything you
> throw at it.
Ok, that is odd :)
> My problem with SimplifyCFG is the following: whenever the if and else
> branches start with the same instruction, it gets hoisted up into the
> predecessor block. For my architecture, instructions in different blocks
> can't be run in parallel, so this optimization either makes the code very
> inefficient or keeps it from compiling at all.
There are two different issues here.  Passes like instcombine and  
simplifycfg [which is really "basic block combine" :) ] do two things:

1. They make changes that are clear wins, e.g. deleting unconditional  
branches and noop instrs.
2. They change code into more canonical form.

Merging repeated instructions is an important canonicalization because  
it can unlock other optimizations.  The fact that your target doesn't  
like code in this form is not a good reason for simplifycfg to stop  
doing it. :)
> InstructionCombining has a habit of removing unneeded bits from constants.
> For example, if I write i & 63, where i is a loop counter that is always
> even, this gets replaced by i & 62. That gives the same results when
> interpreted, of course, but our backend cannot use just any constant as an &
> mask (in particular, it can only use a limited number of them).
Sure, this is another example of canonicalization. Are you using the LLVM
code generator? It has support for handling this specifically. ARM and Alpha
in particular have special instructions that only work with very specific
"and" masks. If you write a pattern/instruction that matches
(and myreg, 255), for example, it will also match a DAG node for
"(and myreg, 16)" if the code generator knows that the other bits are
already zero.
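
The matching rule described here can be sketched as a small predicate. The
function below is an illustration of the idea only; its name and signature
are invented for this example and it is not the code generator's actual
implementation.

```cpp
#include <cassert>
#include <cstdint>

// Decide whether a pattern written for (and x, desired) may be used for a
// node (and x, actual), given the bits of x that are known to be zero.
bool andMaskMatches(uint64_t desired, uint64_t actual, uint64_t knownZero) {
    if (actual == desired)
        return true;                       // exact match
    if (actual & ~desired)
        return false;                      // node keeps bits the pattern clears
    uint64_t missing = desired & ~actual;  // bits the pattern keeps, node clears
    return (missing & ~knownZero) == 0;    // harmless if they are zero anyway
}

void checkAndMaskMatching() {
    // (and x, 16) can use the 255 pattern when the other low bits are zero.
    assert(andMaskMatches(255, 16, 0xEF));
    // The loop-counter case from the thread: bit 0 of i is known to be zero.
    assert(andMaskMatches(63, 62, 1));
    // Without that knowledge the two masks are not interchangeable.
    assert(!andMaskMatches(63, 62, 0));
}
```

With a check like this in the matcher, the backend can keep its pattern
written against the encodable mask and still accept the canonicalized
constant.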
> I'd very much prefer to preserve the original value from the source here (I
> also assume that this optimization is in place to help further
> optimizations, because I can't really see any use for this change on regular
> architectures...).
This is folly.  If the user wrote the code in the "optimized" form  
that instcombine transforms it into, your code generator should still  
produce the optimized instructions.  You're trading one missed  
optimization for another one.

> I've been thinking a bit on how to achieve this, and I see a few options:
>
> None of these options seem too attractive to me, what do others think? Is
> there some other option I'm missing here?
I really don't like any of these options.  The best ways to go are:

1) teach your code generator how to do these optimizations, reversing  
the cases that you care about.
2) if #1 isn't feasible, write a canonicalization prepass (like  
codegen prepare) that transforms code from the "canonical optimizer  
form" into a happy form for your target.
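
For the mask case, the core of such a prepass could look roughly like the
sketch below: given the canonicalized mask and the bits known to be zero,
pick a mask the target can encode that behaves identically. The function
name, its signature and the list of legal masks are assumptions made for
this example, not existing LLVM code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Look for a target-encodable mask that is interchangeable with `mask` on a
// value whose known-zero bits are `knownZero`. Two masks are interchangeable
// when they agree on every bit that might be set. On success, store the
// replacement in `out` and return true; otherwise leave `out` untouched.
bool legalizeAndMask(uint64_t mask, uint64_t knownZero,
                     const std::vector<uint64_t> &legalMasks, uint64_t &out) {
    uint64_t maybeSet = ~knownZero;
    for (uint64_t candidate : legalMasks) {
        if ((candidate & maybeSet) == (mask & maybeSet)) {
            out = candidate;
            return true;
        }
    }
    return false;
}

void checkLegalizeExample() {
    std::vector<uint64_t> legal = {1, 3, 7, 15, 31, 63};  // assumed encodings
    uint64_t out = 0;
    // i is even, so the canonical mask 62 can be turned back into 63.
    assert(legalizeAndMask(62, 1, legal, out) && out == 63);
    // With nothing known about i, no legal mask is equivalent to 62.
    assert(!legalizeAndMask(62, 0, legal, out));
}
```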

-Chris
