thr3ads.net - llvm dev - [LLVMdev] Proposal for Poison Semantics [Feb 2015]

If this information is useful, please help other people find it:
Share via:

David Majnemer

2015-Feb-03 17:36 UTC

[LLVMdev] Proposal for Poison Semantics

On Tue, Feb 3, 2015 at 3:15 AM, Nuno Lopes <nuno.lopes at ist.utl.pt>
wrote:
> Hi,
>
>
>
> Thanks David for putting up this proposal together!
>
> I like the idea of having poison values behave more like undef (i.e., per
> bit, with run-time behavior).
>
> One of the problems this proposal solves is speculation of 'a
&& b' into
> 'a & b'. Currently this is illegal (despite sometimes
simplifycfg doing it
> anyway).
>
> It also fixes bugs like http://llvm.org/PR20997
>
>
>
> The proposal doesn't say anything about branching on a poison value. I
> assume this should stay as the current interpretation -- that such branches
> should be undefined behavior (since we cannot branch to multiple places at
> the same time -- even if they would compute the same values; that's
already
> too hard for the compiler to analyze).
>
The RFC intended to make branching on poison values OK.  If branching on
poison wasn't OK, then we couldn't go from select to -> br/phi.

>
>
> There's another caveat: it *does* seem to fix the problem described by
Dan
> in http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046152.html
>
> However, it introduces a potential performance penalty: we won't be
able
> to speculate instructions with undefined behavior whose input may be
poison.
>
>
>
> For example, take the following code:
>
> loop:
>
>   %add = add nsw %x, %y
>
>   %div = udiv %z, %add
>
>   … use %div only in the case that %add does not overflow and is non-zero
>
>
>
> We can move the %add outside of the loop, but we cannot move the division.
> With the reason being that if %add overflows, then %add is poison and
> therefore it can take any value (in particular, it can be 0), triggering
> undef behavior in %div.  Therefore, we cannot freely move %div, unless we
> can prove that %add will never be 0 nor poison.  This sounds hard for the
> compiler to do, and I guess we'll have some regressions (e.g., LICM has
to
> be more conservative). Nevertheless, I'm all for fixing poison once and
for
> all!
>
Believe it or not, I already fixed this bug (PR21412). :)

>
>
> BTW, would it help if I produced a version of Alive that implements the
> semantics being proposed?  (with no performance guarantees for this
> prototype).  The cool thing is that then we can run it through our database
> of 300+ InstCombine optimizations and see which ones would have to be
> removed/fixed.
>
I think such a thing would be great.  However, there is a problem that the
RFC wasn't aware of when it was written:

consider:
%S = select %A, %B, undef

without us knowing anything about %A or %B, we will replace all uses of %S
with %B.  This transform would be considered wrong with the RFC in mind.

If this transform was valid, there could not be any value or value-like
property in LLVM with semantics more powerful than undef.  This makes me
think that what LLVM *actually* implements is not poison or something like
it.

On the flip side, we could say that this transform is nonsense but I'd
rather not pessimize LLVM like that.

>
>
> Nuno
>
>
>
>
>
> *From:* David Majnemer
> *Sent:* 28 January 2015 02:50
> *To:* llvmdev at cs.uiuc.edu
> *Cc:* Sanjoy Das; Dan Gohman; John Regehr; Nuno Lopes
> *Subject:* RFC: Proposal for Poison Semantics
>
>
>
> Hello,
>
>
>
> What follows is my attempt to describe how poison works.  Let me know what
> you think.
>
>
>
> --
>
> David
>
>
>
>
>
> # LLVM Poison Semantics
>
>
>
> Poison is an LLVM concept which exists solely to enable further
> optimization of LLVM IR. The exact behavior of poison has been, to say the
> least, confusing for users, researchers and engineers working with LLVM.
>
>
>
> This document hopes to clear up some of the confusion of poison and
> hopefully explain *why* it has its semantics.
>
>
>
> ## A Quick Introduction to Poison
>
>
>
> Let's start with a concrete motivating example in C:
>
> ```
>
> int isSumGreater(int a, int b) {
>
>   return a + b > a;
>
> }
>
> ```
>
>
>
> The C specification permits us to optimize the comparison in
> `isSumGreater` to `b > 0` because signed overflow results in undefined
> behavior.  A reasonable translation of `isSumGreater` to LLVM IR could be:
>
>
>
> ```
>
> define i32 @isSumGreater(i32 %a, i32 %b) {
>
> entry:
>
>   %add = add i32 %a, %b
>
>   %cmp = icmp sgt i32 %add, %a
>
>   %conv = zext i1 %cmp to i32
>
>   ret i32 %conv
>
> }
>
> ```
>
>
>
> However, LLVM cannot determine that `%cmp` should not consider cases where
> `%add` resulted in signed overflow.  We need a way to communicate this
> information to LLVM.
>
>
>
> This is where the `nsw` and `nuw` flags come into play.  `nsw` is short
> for "no signed wrap", `nuw` is short for "no unsigned
wrap".
>
>
>
> With these, we can come up with a new formulation of `%add`: `add i32 nsw
> %a, %b`.
>
> LLVM can take this into account when it is optimizing the `%cmp` and
> replace it with: `icmp sgt i32 %b, 0`.
>
>
>
> ## Differences Between LLVM and C/C++
>
>
>
> There are some interesting differences between what C++ and C specify and
> how LLVM behaves with respect to performing an operationg which is not
> permitted to overflow.
>
>
>
> Perhaps chief among them is that evaluating an expression in C++ or C
> which results performs an overflow is undefined behavior. In LLVM,
> executing an instruction which is marked `nsw` but which violates signed
> overflow results in poison. Values which have no relationship with poisoned
> values are not effected by them.
>
>
>
> Let us take the following C program into consideration:
>
> ```
>
> int calculateImportantResult(int a, int b) {
>
>   int result = 0;
>
>   if (a) {
>
>     result = a + b;
>
>   }
>
>   return result;
>
> }
>
> ```
>
>
>
> A straightforward lowering to LLVM IR could be:
>
> ```
>
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
>
> entry:
>
>   %tobool = icmp ne i32 %a, 0
>
>   br i1 %tobool, label %if.then, label %if.end
>
>
>
> if.then:
>
>   %add = add nsw i32 %a, %b
>
>   br label %if.end
>
>
>
> if.end:
>
>   %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
>
>   ret i32 %result
>
> }
>
> ```
>
>
>
> Moving `%add` to the `%entry` block would be preferable and would allow
> further optimizations:
>
> ```
>
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
>
> entry:
>
>   %tobool = icmp ne i32 %a, 0
>
>   %add = add nsw i32 %a, %b
>
>   %result = select i1 %tobool, i32 0, i32 %add
>
>   ret i32 %result
>
> }
>
> ```
>
>
>
> In the original code, the calculation of `%add` was control dependent.
>
> Now, `%add` might result in signed overflow in violation of the `nsw` flag.
>
> Despite this, the program should behave as it did before because the
> poisoned value is masked-out by the select. The next section will dive into
> this in greater detail.
>
>
>
> # Computation Involving Poison Values
>
> Poison in a computation results in poison if the result cannot be
> constrained by its non-poison operands.
>
>
>
> Examples of this rule which will result in poison:
>
> ```
>
>   %add = add i32 %x, %always_poison
>
>   %sub = sub i32 %x, %always_poison
>
>   %xor = xor i32 %x, %always_poison
>
>   %mul = mul i32 %always_poison, 1
>
> ```
>
>
>
> Examples of this rule which do not result in poison:
>
> ```
>
>   %or  = or  i32 %always_poison, 2
>
>   %and = and i32 %always_poison, 2
>
>   %mul = mul i32 %always_poison, 0
>
> ```
>
>
>
> In fact, it would be reasonable to optimize `%or` to `2` and `%and` to
> `0`.  In this respect, poison is not different from `undef`.
>
>
>
> The following example is only poison if `%cond` is false:
>
> ```
>
>   %sel = select i1 %cond, i32 2, %always_poison
>
> ```
>
>
>
> ### Is it safe to have poison as a `call` argument?
>
>
>
> A `call` instruction may or may not result in poison depending on exactly
> how the callee  uses the supplied arguments, it is not necessarily the case
> that `call i32 @someFunction(i32 %always_poison)` results in poison.
>
>
>
> LLVM cannot forbid poison from entering `call` arguments without
> prohibiting an optimization pass from outlining code.
>
>
>
> ### Is it safe to store poison to memory?
>
>
>
> `store i32 %always_poison, i32* %mem` does not result in undefined
> behavior. A subsequent load instruction like `%load = load i32* %mem` will
> result in `%load` being a poison value.
>
>
>
> ### Is it safe to load or store a poison memory location?
>
>
>
> No.  Poison works just like `undef` in this respect.
>
>
>
> ### Does comparing a poison value result in poison?
>
>
>
> It depends.  If the comparison couldn't solely be determined by looking
at
> the other operand, the result is poison.
>
>
>
> For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
> poison.
>
> However, `icmp i32 ne %always_poison, 7` is poison.
>
>
>
> ### What if the condition operand in a `select` is poison?
>
>
>
> In the example `%sel = select i1 %always_poison, i1 true, false`, `%sel`
> is either `true` or `false`.  Because, `%sel` depends on `%always_poison`
> it too is poison.
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150203/9a8703ad/attachment.html>

John Regehr

2015-Feb-04 17:13 UTC

head link

[LLVMdev] Proposal for Poison Semantics

> I think such a thing would be great.  However, there is a problem that the
RFC wasn't aware of when it was
> written:
> 
> consider:
> %S = select %A, %B, undef
> 
> without us knowing anything about %A or %B, we will replace all uses of %S
with %B.  This transform would be
> considered wrong with the RFC in mind.
> 
> If this transform was valid, there could not be any value or value-like
property in LLVM with semantics more
> powerful than undef.  This makes me think that what LLVM *actually*
implements is not poison or something like
> it.
Is it possible that the new weaker poison subsumes undef?  The interaction 
between these different but related UB-like concepts is confusing at best.

John

Nuno Lopes

2015-Feb-04 17:20 UTC

head link

[LLVMdev] Proposal for Poison Semantics

>> Thanks David for putting up this proposal together!
>> I like the idea of having poison values behave more like undef (i.e.,
per bit, with run-time behavior).
>> One of the problems this proposal solves is speculation of 'a
&& b' into 'a & b'. Currently this is illegal (despite
sometimes simplifycfg doing it anyway).
>> It also fixes bugs like http://llvm.org/PR20997
 >> >> The proposal doesn't say anything about branching on a poison
value. I assume this should stay as the current interpretation -- that such
branches should be undefined behavior (since we cannot branch to multiple places
at the same time -- even if they >> would compute the same values;
that's already too hard for the compiler to analyze).
> 
> The RFC intended to make branching on poison values OK.  If branching on
poison wasn't OK, then we couldn't go from select to -> br/phi. 
Ok, agreed. That case will be always safe. 

 >> There's another caveat: it *does* seem to fix the problem described
by Dan in http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046152.html
>> However, it introduces a potential performance penalty: we won't be
able to speculate instructions with undefined behavior whose input may be
poison.
>>  
>> For example, take the following code:
>> loop:
>>   %add = add nsw %x, %y
>>   %div = udiv %z, %add
>>   … use %div only in the case that %add does not overflow and is
non-zero
>>  
>> We can move the %add outside of the loop, but we cannot move the
division. With the reason being that if %add overflows, then %add is poison and
therefore it can take any value (in particular, it can be 0), triggering undef
behavior in %div.  Therefore, we cannot freely move %div, unless we can prove
that %add will never be 0 nor poison.  This sounds hard for the compiler to do,
and I guess we'll have some regressions (e.g., LICM has to be more
conservative). Nevertheless, I'm all for fixing poison once and for all!
> 
> Believe it or not, I already fixed this bug (PR21412). :) 
Cool! :)

 >> BTW, would it help if I produced a version of Alive that implements the
semantics being proposed?  (with no performance guarantees for this prototype). 
The cool thing is that then we can run it through our database of 300+
InstCombine optimizations and see which ones would have to be removed/fixed.
> 
> I think such a thing would be great.  However, there is a problem that the
RFC wasn't aware of when it was written:
> 
> consider:
> %S = select %A, %B, undef
> 
> without us knowing anything about %A or %B, we will replace all uses of %S
with %B.  This transform would be considered wrong with the RFC in mind.
> 
> If this transform was valid, there could not be any value or value-like
property in LLVM with semantics more powerful than undef.  This makes me think
that what LLVM *actually* implements is not poison or something like it.
> 
> On the flip side, we could say that this transform is nonsense but I'd
rather not pessimize LLVM like that. 
Ah, you're saying that poison is strictly stronger UB than undef. And the
reason being that poison may lead to UB when used in certain operations.  Nice
catch.
But we could have a simple precondition that states that this transformation is
correct if %A is not any operations with nsw/nuw/exact flags. Sure, it's not
as good as the situation we have today, but the current situation doesn't
look very good anyway :)

I have a question though: When does poison becomes UB? On external calls and
volatile stores only?  Any other visible side-effecting operations?  (at least
those two have to be UB, right?)

Nuno

Nuno Lopes

2015-Feb-04 17:42 UTC

head link

[LLVMdev] Proposal for Poison Semantics

>> I think such a thing would be great.  However, there is a problem that 
>> the RFC wasn't aware of when it was
>> written:
>> 
>> consider:
>> %S = select %A, %B, undef
>> 
>> without us knowing anything about %A or %B, we will replace all uses 
>> of %S with %B.  This transform would be considered wrong with the RFC
in mind.
>> 
>> If this transform was valid, there could not be any value or 
>> value-like property in LLVM with semantics more powerful than undef.  
>> This makes me think that what LLVM *actually* implements is not poison
or something like it.
> 
>Is it possible that the new weaker poison subsumes undef?  The interaction
between these different but related UB-like concepts is confusing at best.
That's a good point!
AFAICT there are only 2 differences between undef and poison:
 1) poison can trigger UB at some point in the future.
 2) the behavior of sext (and zext?) is different (copying the sign bit vs
adding poison bits)

(read more about 2) at
http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-November/045730.html)


Right now. I think the only places that generate undef are reads of
uninitialized variables, and I can't remember of any other.  If that's
the case, it's fair to ask why can't we use poison for everything?
Reading uninitialized memory is undefined behavior, so in theory we could return
a poison value.  On the other hand, the behavior may look strange at times.
Imagine the following:

%a = load i4* %ptr
%b = sext %a to i8
%c = and %b, 0x8080

If %a returns undef, then %c can only be 0 or 0x8080 (because it copies the
sign). If %a is poison, then %c can be any of the 4 combinations.
The developer may not expect this behavior.  But, anyway, UB was executed, so
not our problem..

Nuno

David Majnemer

2015-Feb-04 18:26 UTC

head link

[LLVMdev] Proposal for Poison Semantics

On Wed, Feb 4, 2015 at 9:13 AM, John Regehr <regehr at cs.utah.edu> wrote:
> I think such a thing would be great.  However, there is a problem that the
>> RFC wasn't aware of when it was
>> written:
>>
>> consider:
>> %S = select %A, %B, undef
>>
>> without us knowing anything about %A or %B, we will replace all uses of
>> %S with %B.  This transform would be
>> considered wrong with the RFC in mind.
>>
>> If this transform was valid, there could not be any value or value-like
>> property in LLVM with semantics more
>> powerful than undef.  This makes me think that what LLVM *actually*
>> implements is not poison or something like
>> it.
>>
>
> Is it possible that the new weaker poison subsumes undef?  The interaction
> between these different but related UB-like concepts is confusing at best.

I am actively working towards removing poison altogether.  I think a more
accurate model of LLVM's wrapping flags is not poison but instead something
akin to the fast-math flags on floating point instructions.

>
>
> John
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150204/a8dbeecb/attachment.html>

Sanjoy Das

2015-Feb-04 18:27 UTC

head link

[LLVMdev] Proposal for Poison Semantics

> But we could have a simple precondition that states that this
transformation is correct if %A is not any operations with nsw/nuw/exact flags.
Sure, it's not as good as the situation we have today, but the current
situation doesn't look very good anyway :)
Function arguments can be poison too.  One way to get around this is
to declare that calling a function with poison as an argument is UB,
but then outlining functions without introducing UB becomes tricky.  A
similar issue exists with loading poison -- if we make storing poison
UB period, then reg2mem will introduce UB.
>
> I have a question though: When does poison becomes UB? On external calls
and volatile stores only?  Any other visible side-effecting operations?  (at
least those two have to be UB, right?)
>
> Nuno
>

David Majnemer

2015-Feb-04 18:29 UTC

head link

[LLVMdev] Proposal for Poison Semantics

On Wed, Feb 4, 2015 at 9:20 AM, Nuno Lopes <nuno.lopes at ist.utl.pt>
wrote:
> >> Thanks David for putting up this proposal together!
> >> I like the idea of having poison values behave more like undef
(i.e.,
> per bit, with run-time behavior).
> >> One of the problems this proposal solves is speculation of 'a
&& b'
> into 'a & b'. Currently this is illegal (despite sometimes
simplifycfg
> doing it anyway).
> >> It also fixes bugs like http://llvm.org/PR20997
>  >>
> >> The proposal doesn't say anything about branching on a poison
value. I
> assume this should stay as the current interpretation -- that such branches
> should be undefined behavior (since we cannot branch to multiple places at
> the same time -- even if they >> would compute the same values;
that's
> already too hard for the compiler to analyze).
> >
> > The RFC intended to make branching on poison values OK.  If branching
on
> poison wasn't OK, then we couldn't go from select to -> br/phi.
>
> Ok, agreed. That case will be always safe.
>
>
> >> There's another caveat: it *does* seem to fix the problem
described by
> Dan in
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046152.html
> >> However, it introduces a potential performance penalty: we
won't be
> able to speculate instructions with undefined behavior whose input may be
> poison.
> >>
> >> For example, take the following code:
> >> loop:
> >>   %add = add nsw %x, %y
> >>   %div = udiv %z, %add
> >>   … use %div only in the case that %add does not overflow and is
> non-zero
> >>
> >> We can move the %add outside of the loop, but we cannot move the
> division. With the reason being that if %add overflows, then %add is poison
> and therefore it can take any value (in particular, it can be 0),
> triggering undef behavior in %div.  Therefore, we cannot freely move %div,
> unless we can prove that %add will never be 0 nor poison.  This sounds hard
> for the compiler to do, and I guess we'll have some regressions (e.g.,
LICM
> has to be more conservative). Nevertheless, I'm all for fixing poison
once
> and for all!
> >
> > Believe it or not, I already fixed this bug (PR21412). :)
>
> Cool! :)
>
>
> >> BTW, would it help if I produced a version of Alive that
implements the
> semantics being proposed?  (with no performance guarantees for this
> prototype).  The cool thing is that then we can run it through our database
> of 300+ InstCombine optimizations and see which ones would have to be
> removed/fixed.
> >
> > I think such a thing would be great.  However, there is a problem that
> the RFC wasn't aware of when it was written:
> >
> > consider:
> > %S = select %A, %B, undef
> >
> > without us knowing anything about %A or %B, we will replace all uses
of
> %S with %B.  This transform would be considered wrong with the RFC in mind.
> >
> > If this transform was valid, there could not be any value or
value-like
> property in LLVM with semantics more powerful than undef.  This makes me
> think that what LLVM *actually* implements is not poison or something like
> it.
> >
> > On the flip side, we could say that this transform is nonsense but
I'd
> rather not pessimize LLVM like that.
>
> Ah, you're saying that poison is strictly stronger UB than undef. And
the
> reason being that poison may lead to UB when used in certain operations.
> Nice catch.
> But we could have a simple precondition that states that this
> transformation is correct if %A is not any operations with nsw/nuw/exact
> flags. Sure, it's not as good as the situation we have today, but the
> current situation doesn't look very good anyway :)
>
> I have a question though: When does poison becomes UB? On external calls
> and volatile stores only?  Any other visible side-effecting operations?
> (at least those two have to be UB, right?)
>
Saying that calls to external functions results in UB is too strict.  It's
only if the external function has some side-effecting behavior.

>
> Nuno
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150204/49903b86/attachment.html>

llvm dev - Feb 2015 - [LLVMdev] Proposal for Poison Semantics

[LLVMdev] Proposal for Poison Semantics

[LLVMdev] Proposal for Poison Semantics

[LLVMdev] Proposal for Poison Semantics

[LLVMdev] Proposal for Poison Semantics

[LLVMdev] Proposal for Poison Semantics

[LLVMdev] Proposal for Poison Semantics

[LLVMdev] Proposal for Poison Semantics