thr3ads.net - llvm dev - [LLVMdev] How best to represent assume statements in LLVM IR? [Oct 2012]

If this information is useful, please help other people find it:
Share via:

Philip Reames

2012-Oct-02 21:43 UTC

[LLVMdev] How best to represent assume statements in LLVM IR?

On 10/02/2012 12:52 AM, Duncan Sands wrote:> Hi Philip,
>
>> Does anyone have any suggestions on how to best represent an assumption
>> statement(*) in IR?
>
> good question!  There have been various attempts, for example Nick tried
> teaching the optimizers to not prune the branch to unreachable in
>
>    br %cond, label %assumption_holds, %assumption_doesnt_hold
> assumption_doesnt_hold:
>    unreachable
>
> This then leads to %cond being replaced with true everywhere downstream,
> which
> is good.  Unfortunately it also causes a bunch of other optimizations to
> not
> occur, and the overall result was not a win.
>
> Rafael added "range" metadata to the IR, however it is only for
loads.
> If it
> could also be attached to the definition of %cond, saying that the value
> is 1,
> and various places are taught to understand that, then that might be
> another
> possibility.It really seems to me that the core problem is that the assumption 
branch is removed before the information it contained could be leveraged 
to remove downstream code.  If the "unreachable" is replaced with a
call
to an external function marked noreturn (i.e. a trap like function), the 
optimizer does correctly optimize downstream code.  This seems to be a 
classic ordering problem.

Note: I haven't studied how the downstream code is optimized.  It seems 
odd to me that the information is lost when the branch is lost.  I may 
dig into what's causing the analysis information to be dropped - there 
might simply be a bug somewhere along the way.

Nick's approach seems to tackle the ordering problem by preventing the 
optimization of the unreachable branch.  As you pointed out, this is 
undesirable since we don't want the code to calculate the assumption 
condition to be retained.

I'm not quiet clear on how Rafael's solution addresses the problem at 
hand, so I won't address that.

I've gone ahead and implemented my hack solution - use an external 
function to force downstream optimization, replace it with unreachable 
and then re-optimize.  At least for the trivial examples I'm testing 
with, this seems to get the desired behavior.  The downside of course is 
that it requires the all of the other passes to be run twice.

If anyone is interested, I can throw the (nearly trivial) code up on 
github for anyone else who wants it.

 > However I think people would rather only have such> metadata be
> attached to "inputs" to a function: function parameters, loads
from
> memory etc.
> In your case, is the assumption usually about a function parameter?For the initial examples I'm considering - specifically object 
invariants and function preconditions - all assumptions are about 
function parameters.  However, I can think of some examples where having 
the ability to note an assumption about an arbitrary intermediate value 
might be useful.  If I can, I'd like to not exclude that case.

One such example (for a language with array bounds checks):
int n = external_func_compute_loop_bound();
assume n < array.size();
for(int i = 0; i < n; i++) {
   process(arrray[i]);
}

Philip

*removed extra text for readability*

Hal Finkel

2012-Oct-02 21:55 UTC

head link

[LLVMdev] How best to represent assume statements in LLVM IR?

On Tue, 02 Oct 2012 14:43:51 -0700
Philip Reames <listmail at philipreames.com> wrote:
> On 10/02/2012 12:52 AM, Duncan Sands wrote:
> > Hi Philip,
> >
> >> Does anyone have any suggestions on how to best represent an
> >> assumption statement(*) in IR?
> >
> > good question!  There have been various attempts, for example Nick
> > tried teaching the optimizers to not prune the branch to
> > unreachable in
> >
> >    br %cond, label %assumption_holds, %assumption_doesnt_hold
> > assumption_doesnt_hold:
> >    unreachable
> >
> > This then leads to %cond being replaced with true everywhere
> > downstream, which
> > is good.  Unfortunately it also causes a bunch of other
> > optimizations to not
> > occur, and the overall result was not a win.
> >
> > Rafael added "range" metadata to the IR, however it is only
for
> > loads. If it
> > could also be attached to the definition of %cond, saying that the
> > value is 1,
> > and various places are taught to understand that, then that might be
> > another
> > possibility.
> It really seems to me that the core problem is that the assumption 
> branch is removed before the information it contained could be
> leveraged to remove downstream code.  If the "unreachable" is
> replaced with a call to an external function marked noreturn (i.e. a
> trap like function), the optimizer does correctly optimize downstream
> code.  This seems to be a classic ordering problem.
> 
> Note: I haven't studied how the downstream code is optimized.  It
> seems odd to me that the information is lost when the branch is
> lost.  I may dig into what's causing the analysis information to be
> dropped - there might simply be a bug somewhere along the way.
> 
> Nick's approach seems to tackle the ordering problem by preventing
> the optimization of the unreachable branch.  As you pointed out, this
> is undesirable since we don't want the code to calculate the
> assumption condition to be retained.
> 
> I'm not quiet clear on how Rafael's solution addresses the problem
at
> hand, so I won't address that.
> 
> I've gone ahead and implemented my hack solution - use an external 
> function to force downstream optimization, replace it with
> unreachable and then re-optimize.  At least for the trivial examples
> I'm testing with, this seems to get the desired behavior.  The
> downside of course is that it requires the all of the other passes to
> be run twice.
How about having an analysis pass that turns these things into SCEVs
that can be associated as predicates with each downstream instruction?
These can stick around so long as this analysis is not invalidated
(this would imply some infrastructure work, but the work would be
incremental in nature).
> 
> If anyone is interested, I can throw the (nearly trivial) code up on 
> github for anyone else who wants it.
Can't hurt :)

 -Hal
> 
>  > However I think people would rather only have such
> > metadata be
> > attached to "inputs" to a function: function parameters,
loads from
> > memory etc.
> > In your case, is the assumption usually about a function parameter?
> For the initial examples I'm considering - specifically object 
> invariants and function preconditions - all assumptions are about 
> function parameters.  However, I can think of some examples where
> having the ability to note an assumption about an arbitrary
> intermediate value might be useful.  If I can, I'd like to not
> exclude that case.
> 
> One such example (for a language with array bounds checks):
> int n = external_func_compute_loop_bound();
> assume n < array.size();
> for(int i = 0; i < n; i++) {
>    process(arrray[i]);
> }
> 
> Philip
> 
> *removed extra text for readability*
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

Philip Reames

2012-Oct-02 22:01 UTC

head link

[LLVMdev] How best to represent assume statements in LLVM IR?

On 10/02/2012 02:55 PM, Hal Finkel wrote:> On Tue, 02 Oct 2012 14:43:51 -0700
> Philip Reames <listmail at philipreames.com> wrote:
>
>> On 10/02/2012 12:52 AM, Duncan Sands wrote:
>>> Hi Philip,
>>>
>>>> Does anyone have any suggestions on how to best represent an
>>>> assumption statement(*) in IR?
>>>
>>> good question!  There have been various attempts, for example Nick
>>> tried teaching the optimizers to not prune the branch to
>>> unreachable in
>>>
>>>     br %cond, label %assumption_holds, %assumption_doesnt_hold
>>> assumption_doesnt_hold:
>>>     unreachable
>>>
>>> This then leads to %cond being replaced with true everywhere
>>> downstream, which
>>> is good.  Unfortunately it also causes a bunch of other
>>> optimizations to not
>>> occur, and the overall result was not a win.
>>>
>>> Rafael added "range" metadata to the IR, however it is
only for
>>> loads. If it
>>> could also be attached to the definition of %cond, saying that the
>>> value is 1,
>>> and various places are taught to understand that, then that might
be
>>> another
>>> possibility.
>> It really seems to me that the core problem is that the assumption
>> branch is removed before the information it contained could be
>> leveraged to remove downstream code.  If the "unreachable" is
>> replaced with a call to an external function marked noreturn (i.e. a
>> trap like function), the optimizer does correctly optimize downstream
>> code.  This seems to be a classic ordering problem.
>>
>> Note: I haven't studied how the downstream code is optimized.  It
>> seems odd to me that the information is lost when the branch is
>> lost.  I may dig into what's causing the analysis information to be
>> dropped - there might simply be a bug somewhere along the way.
>>
>> Nick's approach seems to tackle the ordering problem by preventing
>> the optimization of the unreachable branch.  As you pointed out, this
>> is undesirable since we don't want the code to calculate the
>> assumption condition to be retained.
>>
>> I'm not quiet clear on how Rafael's solution addresses the
problem at
>> hand, so I won't address that.
>>
>> I've gone ahead and implemented my hack solution - use an external
>> function to force downstream optimization, replace it with
>> unreachable and then re-optimize.  At least for the trivial examples
>> I'm testing with, this seems to get the desired behavior.  The
>> downside of course is that it requires the all of the other passes to
>> be run twice.
>
> How about having an analysis pass that turns these things into SCEVs
> that can be associated as predicates with each downstream instruction?
> These can stick around so long as this analysis is not invalidated
> (this would imply some infrastructure work, but the work would be
> incremental in nature).I like this idea and may pursue it.  This is a side project and I've got 
a few things above this on the queue for that project, so it may be a 
bit.  I'll let you know once I start looking at that
aspect.>
>>
>> If anyone is interested, I can throw the (nearly trivial) code up on
>> github for anyone else who wants it.
>
> Can't hurt :)Give me a day or two to get the code in a shape where I'm not ashamed to 
publicly attach my name to it.  :)

Philip
>
>>
>>   > However I think people would rather only have such
>>> metadata be
>>> attached to "inputs" to a function: function parameters,
loads from
>>> memory etc.
>>> In your case, is the assumption usually about a function parameter?
>> For the initial examples I'm considering - specifically object
>> invariants and function preconditions - all assumptions are about
>> function parameters.  However, I can think of some examples where
>> having the ability to note an assumption about an arbitrary
>> intermediate value might be useful.  If I can, I'd like to not
>> exclude that case.
>>
>> One such example (for a language with array bounds checks):
>> int n = external_func_compute_loop_bound();
>> assume n < array.size();
>> for(int i = 0; i < n; i++) {
>>     process(arrray[i]);
>> }
>>
>> Philip
>>
>> *removed extra text for readability*
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - Oct 2012 - [LLVMdev] How best to represent assume statements in LLVM IR?

[LLVMdev] How best to represent assume statements in LLVM IR?

[LLVMdev] How best to represent assume statements in LLVM IR?

[LLVMdev] How best to represent assume statements in LLVM IR?

Possibly Parallel Threads