thr3ads.net - llvm dev - [LLVMdev] How best to represent assume statements in LLVM IR? [Oct 2012]

If this information is useful, please help other people find it:
Share via:

Philip Reames

2012-Oct-01 20:39 UTC

[LLVMdev] How best to represent assume statements in LLVM IR?

Does anyone have any suggestions on how to best represent an assumption 
statement(*) in IR?  In particular, I want to expose the information 
implied by the assumption to the optimization passes without emitting 
code (after optimization) to check the assumption itself.  I've tried a 
couple of options so far, and none have gotten me quite the right 
semantics.  Has anyone else implemented this or does anyone have 
suggestions on how best to do this?


** Background **

By "assumption statement" I mean the standard "assume(boolean 
expression)" construct that shows up in program verification.  It 
essentially just indicates a programmer (or tool) provided fact that 
holds at that particular line.  It's very similar to an assertion 
statement except that the condition is not checked at runtime. 
Conceptually, an assumption statement has no runtime semantics, but I 
want to be able to leverage the assumption to prune any unnecessary 
conditionals downstream from the assumption.

To give an example in pseudo code, I'd like to start with something like 
this:
assume x == 5
if x != 5 { ...}
use(x)

And after optimization end up with this:
use(5)

** End Background **


The general strategy I've been using is to represent the assumption as a 
branch on the conditional value and rely on the optimizer to clean it 
up.  The options I've considered within the (assumed not taken) branch are:
- "unreachable" - With this choice, the optimizer nicely removes all
of
the branches leading to the unreachable statement - i.e. the assumption 
has no cost - but downstream conditionals which should be constant given 
the assumption are not optimized.  This surprises me.  Am I 
misunderstanding the intended use of unreachable here?
- "call void @llvm.trap() noreturn nounwind" - With this choice, 
downstream conditionals are optimized away, but the trap instruction is 
emitted - i.e. the assumption is checked at runtime.

The next idea I'm considering is to essentially run the optimizer twice. 
  First, I'd emit a call to a custom intrinsic (i.e. a wrapper around 
trap), and run the optimizer.  This would have the effect of removing 
any downstream branches implied by the assumption.  Second, I'd insert a 
custom pass to remove calls to the intrinsic and replace them with 
unreachable.  Then I'd rerun the full set of optimizations.

Based on the observed behavior, this should get me the desired result. 
However, this is starting to feel like somewhat of a hack and I wanted 
to get feedback from the community before continuing.

Notes:
- To clarify terminology, the "optimizer" I refer to above is the 
standard set of passes at "-O3".
- I haven't investigated which of the various optimization passes are 
responsible for each observed behavior.
- All of my tests were run on a x86_64 system using a slightly modified 
Clang/LLVM 3.0.  I don't believe any of the changes are relevant.  If I 
need to rerun my tests using 3.1, let me know.

Yours,
Philip Reames

Duncan Sands

2012-Oct-02 07:52 UTC

head link

[LLVMdev] How best to represent assume statements in LLVM IR?

Hi Philip,
> Does anyone have any suggestions on how to best represent an assumption
> statement(*) in IR?
good question!  There have been various attempts, for example Nick tried
teaching the optimizers to not prune the branch to unreachable in

   br %cond, label %assumption_holds, %assumption_doesnt_hold
assumption_doesnt_hold:
   unreachable

This then leads to %cond being replaced with true everywhere downstream, which
is good.  Unfortunately it also causes a bunch of other optimizations to not
occur, and the overall result was not a win.

Rafael added "range" metadata to the IR, however it is only for loads.
If it
could also be attached to the definition of %cond, saying that the value is 1,
and various places are taught to understand that, then that might be another
possibility.  However I think people would rather only have such metadata be
attached to "inputs" to a function: function parameters, loads from
memory etc.
In your case, is the assumption usually about a function parameter?

Ciao, Duncan.

   In particular, I want to expose the information implied
by> the assumption to the optimization passes without emitting code (after
> optimization) to check the assumption itself.  I've tried a couple of
options so
> far, and none have gotten me quite the right semantics.  Has anyone else
> implemented this or does anyone have suggestions on how best to do this?
>
>
> ** Background **
>
> By "assumption statement" I mean the standard
"assume(boolean expression)"
> construct that shows up in program verification.  It essentially just
indicates
> a programmer (or tool) provided fact that holds at that particular line. 
It's
> very similar to an assertion statement except that the condition is not
checked
> at runtime. Conceptually, an assumption statement has no runtime semantics,
but
> I want to be able to leverage the assumption to prune any unnecessary
> conditionals downstream from the assumption.
>
> To give an example in pseudo code, I'd like to start with something
like this:
> assume x == 5
> if x != 5 { ...}
> use(x)
>
> And after optimization end up with this:
> use(5)
>
> ** End Background **
>
>
> The general strategy I've been using is to represent the assumption as
a branch
> on the conditional value and rely on the optimizer to clean it up.  The
options
> I've considered within the (assumed not taken) branch are:
> - "unreachable" - With this choice, the optimizer nicely removes
all of the
> branches leading to the unreachable statement - i.e. the assumption has no
cost
> - but downstream conditionals which should be constant given the assumption
are
> not optimized.  This surprises me.  Am I misunderstanding the intended use
of
> unreachable here?
> - "call void @llvm.trap() noreturn nounwind" - With this choice,
downstream
> conditionals are optimized away, but the trap instruction is emitted - i.e.
the
> assumption is checked at runtime.
>
> The next idea I'm considering is to essentially run the optimizer
twice.  First,
> I'd emit a call to a custom intrinsic (i.e. a wrapper around trap), and
run the
> optimizer.  This would have the effect of removing any downstream branches
> implied by the assumption.  Second, I'd insert a custom pass to remove
calls to
> the intrinsic and replace them with unreachable.  Then I'd rerun the
full set of
> optimizations.
>
> Based on the observed behavior, this should get me the desired result.
However,
> this is starting to feel like somewhat of a hack and I wanted to get
feedback
> from the community before continuing.
>
> Notes:
> - To clarify terminology, the "optimizer" I refer to above is the
standard set
> of passes at "-O3".
> - I haven't investigated which of the various optimization passes are
> responsible for each observed behavior.
> - All of my tests were run on a x86_64 system using a slightly modified
> Clang/LLVM 3.0.  I don't believe any of the changes are relevant.  If I
need to
> rerun my tests using 3.1, let me know.
>
> Yours,
> Philip Reames
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Philip Reames

2012-Oct-02 21:43 UTC

head link

[LLVMdev] How best to represent assume statements in LLVM IR?

On 10/02/2012 12:52 AM, Duncan Sands wrote:> Hi Philip,
>
>> Does anyone have any suggestions on how to best represent an assumption
>> statement(*) in IR?
>
> good question!  There have been various attempts, for example Nick tried
> teaching the optimizers to not prune the branch to unreachable in
>
>    br %cond, label %assumption_holds, %assumption_doesnt_hold
> assumption_doesnt_hold:
>    unreachable
>
> This then leads to %cond being replaced with true everywhere downstream,
> which
> is good.  Unfortunately it also causes a bunch of other optimizations to
> not
> occur, and the overall result was not a win.
>
> Rafael added "range" metadata to the IR, however it is only for
loads.
> If it
> could also be attached to the definition of %cond, saying that the value
> is 1,
> and various places are taught to understand that, then that might be
> another
> possibility.It really seems to me that the core problem is that the assumption 
branch is removed before the information it contained could be leveraged 
to remove downstream code.  If the "unreachable" is replaced with a
call
to an external function marked noreturn (i.e. a trap like function), the 
optimizer does correctly optimize downstream code.  This seems to be a 
classic ordering problem.

Note: I haven't studied how the downstream code is optimized.  It seems 
odd to me that the information is lost when the branch is lost.  I may 
dig into what's causing the analysis information to be dropped - there 
might simply be a bug somewhere along the way.

Nick's approach seems to tackle the ordering problem by preventing the 
optimization of the unreachable branch.  As you pointed out, this is 
undesirable since we don't want the code to calculate the assumption 
condition to be retained.

I'm not quiet clear on how Rafael's solution addresses the problem at 
hand, so I won't address that.

I've gone ahead and implemented my hack solution - use an external 
function to force downstream optimization, replace it with unreachable 
and then re-optimize.  At least for the trivial examples I'm testing 
with, this seems to get the desired behavior.  The downside of course is 
that it requires the all of the other passes to be run twice.

If anyone is interested, I can throw the (nearly trivial) code up on 
github for anyone else who wants it.

 > However I think people would rather only have such> metadata be
> attached to "inputs" to a function: function parameters, loads
from
> memory etc.
> In your case, is the assumption usually about a function parameter?For the initial examples I'm considering - specifically object 
invariants and function preconditions - all assumptions are about 
function parameters.  However, I can think of some examples where having 
the ability to note an assumption about an arbitrary intermediate value 
might be useful.  If I can, I'd like to not exclude that case.

One such example (for a language with array bounds checks):
int n = external_func_compute_loop_bound();
assume n < array.size();
for(int i = 0; i < n; i++) {
   process(arrray[i]);
}

Philip

*removed extra text for readability*

David Blaikie

2012-Oct-02 23:55 UTC

head link

[LLVMdev] How best to represent assume statements in LLVM IR?

The history of this issue is in http://llvm.org/PR810 - please add any
new insight/ideas there (& read Sabre's (Chris Lattner) ye olde txt
file (mentioned in the bug) on his thoughts on the issue too).

On Mon, Oct 1, 2012 at 1:39 PM, Philip Reames <listmail at
philipreames.com> wrote:> Does anyone have any suggestions on how to best represent an assumption
> statement(*) in IR?  In particular, I want to expose the information
implied
> by the assumption to the optimization passes without emitting code (after
> optimization) to check the assumption itself.  I've tried a couple of
> options so far, and none have gotten me quite the right semantics.  Has
> anyone else implemented this or does anyone have suggestions on how best to
> do this?
>
>
> ** Background **
>
> By "assumption statement" I mean the standard
"assume(boolean expression)"
> construct that shows up in program verification.  It essentially just
> indicates a programmer (or tool) provided fact that holds at that
particular
> line.  It's very similar to an assertion statement except that the
condition
> is not checked at runtime. Conceptually, an assumption statement has no
> runtime semantics, but I want to be able to leverage the assumption to
prune
> any unnecessary conditionals downstream from the assumption.
>
> To give an example in pseudo code, I'd like to start with something
like
> this:
> assume x == 5
> if x != 5 { ...}
> use(x)
>
> And after optimization end up with this:
> use(5)
>
> ** End Background **
>
>
> The general strategy I've been using is to represent the assumption as
a
> branch on the conditional value and rely on the optimizer to clean it up.
> The options I've considered within the (assumed not taken) branch are:
> - "unreachable" - With this choice, the optimizer nicely removes
all of the
> branches leading to the unreachable statement - i.e. the assumption has no
> cost - but downstream conditionals which should be constant given the
> assumption are not optimized.  This surprises me.  Am I misunderstanding
the
> intended use of unreachable here?
> - "call void @llvm.trap() noreturn nounwind" - With this choice,
downstream
> conditionals are optimized away, but the trap instruction is emitted - i.e.
> the assumption is checked at runtime.
>
> The next idea I'm considering is to essentially run the optimizer
twice.
> First, I'd emit a call to a custom intrinsic (i.e. a wrapper around
trap),
> and run the optimizer.  This would have the effect of removing any
> downstream branches implied by the assumption.  Second, I'd insert a
custom
> pass to remove calls to the intrinsic and replace them with unreachable.
> Then I'd rerun the full set of optimizations.
>
> Based on the observed behavior, this should get me the desired result.
> However, this is starting to feel like somewhat of a hack and I wanted to
> get feedback from the community before continuing.
>
> Notes:
> - To clarify terminology, the "optimizer" I refer to above is the
standard
> set of passes at "-O3".
> - I haven't investigated which of the various optimization passes are
> responsible for each observed behavior.
> - All of my tests were run on a x86_64 system using a slightly modified
> Clang/LLVM 3.0.  I don't believe any of the changes are relevant.  If I
need
> to rerun my tests using 3.1, let me know.
>
> Yours,
> Philip Reames
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - Oct 2012 - [LLVMdev] How best to represent assume statements in LLVM IR?

[LLVMdev] How best to represent assume statements in LLVM IR?

[LLVMdev] How best to represent assume statements in LLVM IR?

[LLVMdev] How best to represent assume statements in LLVM IR?

[LLVMdev] How best to represent assume statements in LLVM IR?

Maybe Matching Threads