thr3ads.net - llvm dev - [LLVMdev] Proposal for Poison Semantics [Feb 2015]

If this information is useful, please help other people find it:
Share via:

David Majnemer

2015-Jan-28 02:50 UTC

[LLVMdev] RFC: Proposal for Poison Semantics

Hello,

What follows is my attempt to describe how poison works.  Let me know what
you think.

-- 
David


# LLVM Poison Semantics

Poison is an LLVM concept which exists solely to enable further
optimization of LLVM IR. The exact behavior of poison has been, to say the
least, confusing for users, researchers and engineers working with LLVM.

This document hopes to clear up some of the confusion of poison and
hopefully explain *why* it has its semantics.

## A Quick Introduction to Poison

Let's start with a concrete motivating example in C:
```
int isSumGreater(int a, int b) {
  return a + b > a;
}
```

The C specification permits us to optimize the comparison in `isSumGreater`
to `b > 0` because signed overflow results in undefined behavior.  A
reasonable translation of `isSumGreater` to LLVM IR could be:

```
define i32 @isSumGreater(i32 %a, i32 %b) {
entry:
  %add = add i32 %a, %b
  %cmp = icmp sgt i32 %add, %a
  %conv = zext i1 %cmp to i32
  ret i32 %conv
}
```

However, LLVM cannot determine that `%cmp` should not consider cases where
`%add` resulted in signed overflow.  We need a way to communicate this
information to LLVM.

This is where the `nsw` and `nuw` flags come into play.  `nsw` is short for
"no signed wrap", `nuw` is short for "no unsigned wrap".

With these, we can come up with a new formulation of `%add`: `add i32 nsw
%a, %b`.
LLVM can take this into account when it is optimizing the `%cmp` and
replace it with: `icmp sgt i32 %b, 0`.

## Differences Between LLVM and C/C++

There are some interesting differences between what C++ and C specify and
how LLVM behaves with respect to performing an operationg which is not
permitted to overflow.

Perhaps chief among them is that evaluating an expression in C++ or C which
results performs an overflow is undefined behavior. In LLVM, executing an
instruction which is marked `nsw` but which violates signed overflow
results in poison. Values which have no relationship with poisoned values
are not effected by them.

Let us take the following C program into consideration:
```
int calculateImportantResult(int a, int b) {
  int result = 0;
  if (a) {
    result = a + b;
  }
  return result;
}
```

A straightforward lowering to LLVM IR could be:
```
define i32 @calculateImportantResult(i32 %a, i32 %b) {
entry:
  %tobool = icmp ne i32 %a, 0
  br i1 %tobool, label %if.then, label %if.end

if.then:
  %add = add nsw i32 %a, %b
  br label %if.end

if.end:
  %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
  ret i32 %result
}
```

Moving `%add` to the `%entry` block would be preferable and would allow
further optimizations:
```
define i32 @calculateImportantResult(i32 %a, i32 %b) {
entry:
  %tobool = icmp ne i32 %a, 0
  %add = add nsw i32 %a, %b
  %result = select i1 %tobool, i32 0, i32 %add
  ret i32 %result
}
```

In the original code, the calculation of `%add` was control dependent.
Now, `%add` might result in signed overflow in violation of the `nsw` flag.
Despite this, the program should behave as it did before because the
poisoned value is masked-out by the select. The next section will dive into
this in greater detail.

# Computation Involving Poison Values
Poison in a computation results in poison if the result cannot be
constrained by its non-poison operands.

Examples of this rule which will result in poison:
```
  %add = add i32 %x, %always_poison
  %sub = sub i32 %x, %always_poison
  %xor = xor i32 %x, %always_poison
  %mul = mul i32 %always_poison, 1
```

Examples of this rule which do not result in poison:
```
  %or  = or  i32 %always_poison, 2
  %and = and i32 %always_poison, 2
  %mul = mul i32 %always_poison, 0
```

In fact, it would be reasonable to optimize `%or` to `2` and `%and` to
`0`.  In this respect, poison is not different from `undef`.

The following example is only poison if `%cond` is false:
```
  %sel = select i1 %cond, i32 2, %always_poison
```

### Is it safe to have poison as a `call` argument?

A `call` instruction may or may not result in poison depending on exactly
how the callee  uses the supplied arguments, it is not necessarily the case
that `call i32 @someFunction(i32 %always_poison)` results in poison.

LLVM cannot forbid poison from entering `call` arguments without
prohibiting an optimization pass from outlining code.

### Is it safe to store poison to memory?

`store i32 %always_poison, i32* %mem` does not result in undefined
behavior. A subsequent load instruction like `%load = load i32* %mem` will
result in `%load` being a poison value.

### Is it safe to load or store a poison memory location?

No.  Poison works just like `undef` in this respect.

### Does comparing a poison value result in poison?

It depends.  If the comparison couldn't solely be determined by looking at
the other operand, the result is poison.

For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
poison.
However, `icmp i32 ne %always_poison, 7` is poison.

### What if the condition operand in a `select` is poison?

In the example `%sel = select i1 %always_poison, i1 true, false`, `%sel` is
either `true` or `false`.  Because, `%sel` depends on `%always_poison` it
too is poison.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150127/386f3afe/attachment.html>

Chandler Carruth

2015-Jan-28 03:22 UTC

head link

[LLVMdev] RFC: Proposal for Poison Semantics

On Tue, Jan 27, 2015 at 6:50 PM, David Majnemer <david.majnemer at
gmail.com>
wrote:
> Examples of this rule which do not result in poison:
> ```
>   %or  = or  i32 %always_poison, 2
>   %and = and i32 %always_poison, 2
>   %mul = mul i32 %always_poison, 0
> ```
>
> In fact, it would be reasonable to optimize `%or` to `2` and `%and` to
> `0`.  In this respect, poison is not different from `undef`.
>
The last time we discussed this, we were considering poison a property of
an individual bit much like undef is a property of an indivdual bit. Are
you no longer proposing those semantics? That is, while %or's second bit is
clearly not poison, is the first bit (or LSB) poison?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150127/b6acf733/attachment.html>

Sanjoy Das

2015-Jan-28 03:23 UTC

head link

[LLVMdev] RFC: Proposal for Poison Semantics

Hi David,

I spent some time thinking about poison semantics this way, but here
is where I always get stuck:

Consider the IR fragment

  %x = zext i32 %maybe_poison to i64
  %y = lshr i64 %x 32
  %ptr = gep %global, %y
  store 42 to %ptr

If %maybe_poison is poison, then is %y poison?  For all i32 values of
%maybe_poison, %y is i64 0, so in some sense you can determine the
value %y without looking at %x.  Given that, the store in the above
fragment is not undefined behavior even if %maybe_poison is poison.
However, this means if "%maybe_poison" is "add nuw %m, %n"
it cannot
be optimized to "add nuw (zext %m) (zext %n)" since that will change
program behavior in an otherwise well-defined program.

One way out of this is to say that zext and sext of a value that is
dependent on poison is poison.  We'll have to do something similar for
load shearing.

The above "problem" can also be seen in
> ### Does comparing a poison value result in poison?
>
> It depends.  If the comparison couldn't solely be determined by
> looking at the other operand, the result is poison.
This means we cannot do the C-style optimization "int a = ...; a <
(a + 1)" ==> "true".  In the case "a + 1" overflows,
a is
INT_SIGNED_MAX.  But if a is INT_SIGNED_MAX, "a < X" is always
false,
for all values of X.  So "a < a + 1" is defined; and it is true for
"a
!= INT_SIGNED_MAX" but false for "a == INT_SIGNED_MAX".  Hence
the
expression cannot be folded to true.


I think the reason why poison is hard to formalize is that it
fundamentally tries to give an N bit value behavior that cannot be
justified by _any_ N bit value.  It "breaks" llvm's type system.

-- Sanjoy


On Tue, Jan 27, 2015 at 6:50 PM, David Majnemer
<david.majnemer at gmail.com> wrote:> Hello,
>
> What follows is my attempt to describe how poison works.  Let me know what
> you think.
>
> --
> David
>
>
> # LLVM Poison Semantics
>
> Poison is an LLVM concept which exists solely to enable further
optimization
> of LLVM IR. The exact behavior of poison has been, to say the least,
> confusing for users, researchers and engineers working with LLVM.
>
> This document hopes to clear up some of the confusion of poison and
> hopefully explain *why* it has its semantics.
>
> ## A Quick Introduction to Poison
>
> Let's start with a concrete motivating example in C:
> ```
> int isSumGreater(int a, int b) {
>   return a + b > a;
> }
> ```
>
> The C specification permits us to optimize the comparison in `isSumGreater`
> to `b > 0` because signed overflow results in undefined behavior.  A
> reasonable translation of `isSumGreater` to LLVM IR could be:
>
> ```
> define i32 @isSumGreater(i32 %a, i32 %b) {
> entry:
>   %add = add i32 %a, %b
>   %cmp = icmp sgt i32 %add, %a
>   %conv = zext i1 %cmp to i32
>   ret i32 %conv
> }
> ```
>
> However, LLVM cannot determine that `%cmp` should not consider cases where
> `%add` resulted in signed overflow.  We need a way to communicate this
> information to LLVM.
>
> This is where the `nsw` and `nuw` flags come into play.  `nsw` is short for
> "no signed wrap", `nuw` is short for "no unsigned
wrap".
>
> With these, we can come up with a new formulation of `%add`: `add i32 nsw
> %a, %b`.
> LLVM can take this into account when it is optimizing the `%cmp` and
replace
> it with: `icmp sgt i32 %b, 0`.
>
> ## Differences Between LLVM and C/C++
>
> There are some interesting differences between what C++ and C specify and
> how LLVM behaves with respect to performing an operationg which is not
> permitted to overflow.
>
> Perhaps chief among them is that evaluating an expression in C++ or C which
> results performs an overflow is undefined behavior. In LLVM, executing an
> instruction which is marked `nsw` but which violates signed overflow
results
> in poison. Values which have no relationship with poisoned values are not
> effected by them.
>
> Let us take the following C program into consideration:
> ```
> int calculateImportantResult(int a, int b) {
>   int result = 0;
>   if (a) {
>     result = a + b;
>   }
>   return result;
> }
> ```
>
> A straightforward lowering to LLVM IR could be:
> ```
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
> entry:
>   %tobool = icmp ne i32 %a, 0
>   br i1 %tobool, label %if.then, label %if.end
>
> if.then:
>   %add = add nsw i32 %a, %b
>   br label %if.end
>
> if.end:
>   %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
>   ret i32 %result
> }
> ```
>
> Moving `%add` to the `%entry` block would be preferable and would allow
> further optimizations:
> ```
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
> entry:
>   %tobool = icmp ne i32 %a, 0
>   %add = add nsw i32 %a, %b
>   %result = select i1 %tobool, i32 0, i32 %add
>   ret i32 %result
> }
> ```
>
> In the original code, the calculation of `%add` was control dependent.
> Now, `%add` might result in signed overflow in violation of the `nsw` flag.
> Despite this, the program should behave as it did before because the
> poisoned value is masked-out by the select. The next section will dive into
> this in greater detail.
>
> # Computation Involving Poison Values
> Poison in a computation results in poison if the result cannot be
> constrained by its non-poison operands.
>
> Examples of this rule which will result in poison:
> ```
>   %add = add i32 %x, %always_poison
>   %sub = sub i32 %x, %always_poison
>   %xor = xor i32 %x, %always_poison
>   %mul = mul i32 %always_poison, 1
> ```
>
> Examples of this rule which do not result in poison:
> ```
>   %or  = or  i32 %always_poison, 2
>   %and = and i32 %always_poison, 2
>   %mul = mul i32 %always_poison, 0
> ```
>
> In fact, it would be reasonable to optimize `%or` to `2` and `%and` to `0`.
> In this respect, poison is not different from `undef`.
>
> The following example is only poison if `%cond` is false:
> ```
>   %sel = select i1 %cond, i32 2, %always_poison
> ```
>
> ### Is it safe to have poison as a `call` argument?
>
> A `call` instruction may or may not result in poison depending on exactly
> how the callee  uses the supplied arguments, it is not necessarily the case
> that `call i32 @someFunction(i32 %always_poison)` results in poison.
>
> LLVM cannot forbid poison from entering `call` arguments without
prohibiting
> an optimization pass from outlining code.
>
> ### Is it safe to store poison to memory?
>
> `store i32 %always_poison, i32* %mem` does not result in undefined
behavior.
> A subsequent load instruction like `%load = load i32* %mem` will result in
> `%load` being a poison value.
>
> ### Is it safe to load or store a poison memory location?
>
> No.  Poison works just like `undef` in this respect.
>
> ### Does comparing a poison value result in poison?
>
> It depends.  If the comparison couldn't solely be determined by looking
at
> the other operand, the result is poison.
>
> For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
> poison.
> However, `icmp i32 ne %always_poison, 7` is poison.
>
> ### What if the condition operand in a `select` is poison?
>
> In the example `%sel = select i1 %always_poison, i1 true, false`, `%sel` is
> either `true` or `false`.  Because, `%sel` depends on `%always_poison` it
> too is poison.

David Majnemer

2015-Jan-28 03:30 UTC

head link

[LLVMdev] RFC: Proposal for Poison Semantics

On Tue, Jan 27, 2015 at 7:22 PM, Chandler Carruth <chandlerc at
google.com>
wrote:
>
> On Tue, Jan 27, 2015 at 6:50 PM, David Majnemer <david.majnemer at
gmail.com>
> wrote:
>
>> Examples of this rule which do not result in poison:
>> ```
>>   %or  = or  i32 %always_poison, 2
>>   %and = and i32 %always_poison, 2
>>   %mul = mul i32 %always_poison, 0
>> ```
>>
>> In fact, it would be reasonable to optimize `%or` to `2` and `%and` to
>> `0`.  In this respect, poison is not different from `undef`.
>>
>
> The last time we discussed this, we were considering poison a property of
> an individual bit much like undef is a property of an indivdual bit. Are
> you no longer proposing those semantics? That is, while %or's second
bit is
> clearly not poison, is the first bit (or LSB) poison?
>
Sorry this wasn't clear.  My intent was to say that bit 1 was set but that
the other bits were poison.  With this in mind, it is legal to throw the
poison away if you don't want to keep the or instruction if you replace it
with a value which has bit 1 set (these semantics are just like undef).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150127/0bf95da7/attachment.html>

David Majnemer

2015-Jan-28 03:44 UTC

head link

[LLVMdev] RFC: Proposal for Poison Semantics

On Tue, Jan 27, 2015 at 7:23 PM, Sanjoy Das <sanjoy at
playingwithpointers.com>
wrote:
> Hi David,
>
> I spent some time thinking about poison semantics this way, but here
> is where I always get stuck:
>
> Consider the IR fragment
>
>   %x = zext i32 %maybe_poison to i64
>   %y = lshr i64 %x 32
>   %ptr = gep %global, %y
>   store 42 to %ptr
>
> If %maybe_poison is poison, then is %y poison?  For all i32 values of
> %maybe_poison, %y is i64 0, so in some sense you can determine the
> value %y without looking at %x.

I agree, this makes sense.

> Given that, the store in the above
> fragment is not undefined behavior even if %maybe_poison is poison.
> However, this means if "%maybe_poison" is "add nuw %m,
%n" it cannot
> be optimized to "add nuw (zext %m) (zext %n)" since that will
change
> program behavior in an otherwise well-defined program.
>
Hmm, I'm not so sure this is right.

Are we talking about transforming:
%add = add nuw i32 %x, %y
%res = zext i32 %add to i64

to:
%z1 = zext i32 %x to i64
%z2 = zext i32 %y to i64
%res = add nuw i64 %z1, %z2

?

This is OK because performing a zext by that many bits cannot result in a
NUW violation.

>
> One way out of this is to say that zext and sext of a value that is
> dependent on poison is poison.  We'll have to do something similar for
> load shearing.
>
sext must be dependent on the underlying value because it splats the sign
bit.

>
> The above "problem" can also be seen in
>
> > ### Does comparing a poison value result in poison?
> >
> > It depends.  If the comparison couldn't solely be determined by
> > looking at the other operand, the result is poison.
>
> This means we cannot do the C-style optimization "int a = ...; a <
> (a + 1)" ==> "true".  In the case "a + 1"
overflows, a is
> INT_SIGNED_MAX.  But if a is INT_SIGNED_MAX, "a < X" is always
false,
> for all values of X.  So "a < a + 1" is defined; and it is
true for "a
> != INT_SIGNED_MAX" but false for "a == INT_SIGNED_MAX". 
Hence the
> expression cannot be folded to true.
>
This part sounds tricky, I'm not immediately sure what to do.

>
>
> I think the reason why poison is hard to formalize is that it
> fundamentally tries to give an N bit value behavior that cannot be
> justified by _any_ N bit value.  It "breaks" llvm's type
system.
>
> -- Sanjoy
>
>
> On Tue, Jan 27, 2015 at 6:50 PM, David Majnemer
> <david.majnemer at gmail.com> wrote:
> > Hello,
> >
> > What follows is my attempt to describe how poison works.  Let me know
> what
> > you think.
> >
> > --
> > David
> >
> >
> > # LLVM Poison Semantics
> >
> > Poison is an LLVM concept which exists solely to enable further
> optimization
> > of LLVM IR. The exact behavior of poison has been, to say the least,
> > confusing for users, researchers and engineers working with LLVM.
> >
> > This document hopes to clear up some of the confusion of poison and
> > hopefully explain *why* it has its semantics.
> >
> > ## A Quick Introduction to Poison
> >
> > Let's start with a concrete motivating example in C:
> > ```
> > int isSumGreater(int a, int b) {
> >   return a + b > a;
> > }
> > ```
> >
> > The C specification permits us to optimize the comparison in
> `isSumGreater`
> > to `b > 0` because signed overflow results in undefined behavior. 
A
> > reasonable translation of `isSumGreater` to LLVM IR could be:
> >
> > ```
> > define i32 @isSumGreater(i32 %a, i32 %b) {
> > entry:
> >   %add = add i32 %a, %b
> >   %cmp = icmp sgt i32 %add, %a
> >   %conv = zext i1 %cmp to i32
> >   ret i32 %conv
> > }
> > ```
> >
> > However, LLVM cannot determine that `%cmp` should not consider cases
> where
> > `%add` resulted in signed overflow.  We need a way to communicate this
> > information to LLVM.
> >
> > This is where the `nsw` and `nuw` flags come into play.  `nsw` is
short
> for
> > "no signed wrap", `nuw` is short for "no unsigned
wrap".
> >
> > With these, we can come up with a new formulation of `%add`: `add i32
nsw
> > %a, %b`.
> > LLVM can take this into account when it is optimizing the `%cmp` and
> replace
> > it with: `icmp sgt i32 %b, 0`.
> >
> > ## Differences Between LLVM and C/C++
> >
> > There are some interesting differences between what C++ and C specify
and
> > how LLVM behaves with respect to performing an operationg which is not
> > permitted to overflow.
> >
> > Perhaps chief among them is that evaluating an expression in C++ or C
> which
> > results performs an overflow is undefined behavior. In LLVM, executing
an
> > instruction which is marked `nsw` but which violates signed overflow
> results
> > in poison. Values which have no relationship with poisoned values are
not
> > effected by them.
> >
> > Let us take the following C program into consideration:
> > ```
> > int calculateImportantResult(int a, int b) {
> >   int result = 0;
> >   if (a) {
> >     result = a + b;
> >   }
> >   return result;
> > }
> > ```
> >
> > A straightforward lowering to LLVM IR could be:
> > ```
> > define i32 @calculateImportantResult(i32 %a, i32 %b) {
> > entry:
> >   %tobool = icmp ne i32 %a, 0
> >   br i1 %tobool, label %if.then, label %if.end
> >
> > if.then:
> >   %add = add nsw i32 %a, %b
> >   br label %if.end
> >
> > if.end:
> >   %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
> >   ret i32 %result
> > }
> > ```
> >
> > Moving `%add` to the `%entry` block would be preferable and would
allow
> > further optimizations:
> > ```
> > define i32 @calculateImportantResult(i32 %a, i32 %b) {
> > entry:
> >   %tobool = icmp ne i32 %a, 0
> >   %add = add nsw i32 %a, %b
> >   %result = select i1 %tobool, i32 0, i32 %add
> >   ret i32 %result
> > }
> > ```
> >
> > In the original code, the calculation of `%add` was control dependent.
> > Now, `%add` might result in signed overflow in violation of the `nsw`
> flag.
> > Despite this, the program should behave as it did before because the
> > poisoned value is masked-out by the select. The next section will dive
> into
> > this in greater detail.
> >
> > # Computation Involving Poison Values
> > Poison in a computation results in poison if the result cannot be
> > constrained by its non-poison operands.
> >
> > Examples of this rule which will result in poison:
> > ```
> >   %add = add i32 %x, %always_poison
> >   %sub = sub i32 %x, %always_poison
> >   %xor = xor i32 %x, %always_poison
> >   %mul = mul i32 %always_poison, 1
> > ```
> >
> > Examples of this rule which do not result in poison:
> > ```
> >   %or  = or  i32 %always_poison, 2
> >   %and = and i32 %always_poison, 2
> >   %mul = mul i32 %always_poison, 0
> > ```
> >
> > In fact, it would be reasonable to optimize `%or` to `2` and `%and` to
> `0`.
> > In this respect, poison is not different from `undef`.
> >
> > The following example is only poison if `%cond` is false:
> > ```
> >   %sel = select i1 %cond, i32 2, %always_poison
> > ```
> >
> > ### Is it safe to have poison as a `call` argument?
> >
> > A `call` instruction may or may not result in poison depending on
exactly
> > how the callee  uses the supplied arguments, it is not necessarily the
> case
> > that `call i32 @someFunction(i32 %always_poison)` results in poison.
> >
> > LLVM cannot forbid poison from entering `call` arguments without
> prohibiting
> > an optimization pass from outlining code.
> >
> > ### Is it safe to store poison to memory?
> >
> > `store i32 %always_poison, i32* %mem` does not result in undefined
> behavior.
> > A subsequent load instruction like `%load = load i32* %mem` will
result
> in
> > `%load` being a poison value.
> >
> > ### Is it safe to load or store a poison memory location?
> >
> > No.  Poison works just like `undef` in this respect.
> >
> > ### Does comparing a poison value result in poison?
> >
> > It depends.  If the comparison couldn't solely be determined by
looking
> at
> > the other operand, the result is poison.
> >
> > For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
> > poison.
> > However, `icmp i32 ne %always_poison, 7` is poison.
> >
> > ### What if the condition operand in a `select` is poison?
> >
> > In the example `%sel = select i1 %always_poison, i1 true, false`,
`%sel`
> is
> > either `true` or `false`.  Because, `%sel` depends on `%always_poison`
it
> > too is poison.
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150127/106edad0/attachment.html>

Sanjoy Das

2015-Jan-28 03:45 UTC

head link

[LLVMdev] RFC: Proposal for Poison Semantics

> However, this means if "%maybe_poison" is "add nuw %m,
%n" it cannot
> be optimized to "add nuw (zext %m) (zext %n)" since that will
change
I should have been clearer here: "it" == "%x" -- IOW you
cannot the
usual zext of sum == sum of zexts transform.

> This means we cannot do the C-style optimization "int a = ...; a <
> (a + 1)" ==> "true".  In the case "a + 1"
overflows, a is
> INT_SIGNED_MAX.  But if a is INT_SIGNED_MAX, "a < X" is always
false,
And also a slight clarification here -- "a + 1" sign-overflows if and
only if a is INT_SIGNED_MAX.  < is signed less than.  And if a is not
poison, "a < a + 1" is not poison because precisely in the case
where
"a + 1" is poison, we can determine the result of the compare without
looking at "a + 1".
> for all values of X.  So "a < a + 1" is defined; and it is
true for "a
> != INT_SIGNED_MAX" but false for "a == INT_SIGNED_MAX". 
Hence the
> expression cannot be folded to true.
>
>
> I think the reason why poison is hard to formalize is that it
> fundamentally tries to give an N bit value behavior that cannot be
> justified by _any_ N bit value.  It "breaks" llvm's type
system.
>
> -- Sanjoy
>
>
> On Tue, Jan 27, 2015 at 6:50 PM, David Majnemer
> <david.majnemer at gmail.com> wrote:
>> Hello,
>>
>> What follows is my attempt to describe how poison works.  Let me know
what
>> you think.
>>
>> --
>> David
>>
>>
>> # LLVM Poison Semantics
>>
>> Poison is an LLVM concept which exists solely to enable further
optimization
>> of LLVM IR. The exact behavior of poison has been, to say the least,
>> confusing for users, researchers and engineers working with LLVM.
>>
>> This document hopes to clear up some of the confusion of poison and
>> hopefully explain *why* it has its semantics.
>>
>> ## A Quick Introduction to Poison
>>
>> Let's start with a concrete motivating example in C:
>> ```
>> int isSumGreater(int a, int b) {
>>   return a + b > a;
>> }
>> ```
>>
>> The C specification permits us to optimize the comparison in
`isSumGreater`
>> to `b > 0` because signed overflow results in undefined behavior.  A
>> reasonable translation of `isSumGreater` to LLVM IR could be:
>>
>> ```
>> define i32 @isSumGreater(i32 %a, i32 %b) {
>> entry:
>>   %add = add i32 %a, %b
>>   %cmp = icmp sgt i32 %add, %a
>>   %conv = zext i1 %cmp to i32
>>   ret i32 %conv
>> }
>> ```
>>
>> However, LLVM cannot determine that `%cmp` should not consider cases
where
>> `%add` resulted in signed overflow.  We need a way to communicate this
>> information to LLVM.
>>
>> This is where the `nsw` and `nuw` flags come into play.  `nsw` is short
for
>> "no signed wrap", `nuw` is short for "no unsigned
wrap".
>>
>> With these, we can come up with a new formulation of `%add`: `add i32
nsw
>> %a, %b`.
>> LLVM can take this into account when it is optimizing the `%cmp` and
replace
>> it with: `icmp sgt i32 %b, 0`.
>>
>> ## Differences Between LLVM and C/C++
>>
>> There are some interesting differences between what C++ and C specify
and
>> how LLVM behaves with respect to performing an operationg which is not
>> permitted to overflow.
>>
>> Perhaps chief among them is that evaluating an expression in C++ or C
which
>> results performs an overflow is undefined behavior. In LLVM, executing
an
>> instruction which is marked `nsw` but which violates signed overflow
results
>> in poison. Values which have no relationship with poisoned values are
not
>> effected by them.
>>
>> Let us take the following C program into consideration:
>> ```
>> int calculateImportantResult(int a, int b) {
>>   int result = 0;
>>   if (a) {
>>     result = a + b;
>>   }
>>   return result;
>> }
>> ```
>>
>> A straightforward lowering to LLVM IR could be:
>> ```
>> define i32 @calculateImportantResult(i32 %a, i32 %b) {
>> entry:
>>   %tobool = icmp ne i32 %a, 0
>>   br i1 %tobool, label %if.then, label %if.end
>>
>> if.then:
>>   %add = add nsw i32 %a, %b
>>   br label %if.end
>>
>> if.end:
>>   %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
>>   ret i32 %result
>> }
>> ```
>>
>> Moving `%add` to the `%entry` block would be preferable and would allow
>> further optimizations:
>> ```
>> define i32 @calculateImportantResult(i32 %a, i32 %b) {
>> entry:
>>   %tobool = icmp ne i32 %a, 0
>>   %add = add nsw i32 %a, %b
>>   %result = select i1 %tobool, i32 0, i32 %add
>>   ret i32 %result
>> }
>> ```
>>
>> In the original code, the calculation of `%add` was control dependent.
>> Now, `%add` might result in signed overflow in violation of the `nsw`
flag.
>> Despite this, the program should behave as it did before because the
>> poisoned value is masked-out by the select. The next section will dive
into
>> this in greater detail.
>>
>> # Computation Involving Poison Values
>> Poison in a computation results in poison if the result cannot be
>> constrained by its non-poison operands.
>>
>> Examples of this rule which will result in poison:
>> ```
>>   %add = add i32 %x, %always_poison
>>   %sub = sub i32 %x, %always_poison
>>   %xor = xor i32 %x, %always_poison
>>   %mul = mul i32 %always_poison, 1
>> ```
>>
>> Examples of this rule which do not result in poison:
>> ```
>>   %or  = or  i32 %always_poison, 2
>>   %and = and i32 %always_poison, 2
>>   %mul = mul i32 %always_poison, 0
>> ```
>>
>> In fact, it would be reasonable to optimize `%or` to `2` and `%and` to
`0`.
>> In this respect, poison is not different from `undef`.
>>
>> The following example is only poison if `%cond` is false:
>> ```
>>   %sel = select i1 %cond, i32 2, %always_poison
>> ```
>>
>> ### Is it safe to have poison as a `call` argument?
>>
>> A `call` instruction may or may not result in poison depending on
exactly
>> how the callee  uses the supplied arguments, it is not necessarily the
case
>> that `call i32 @someFunction(i32 %always_poison)` results in poison.
>>
>> LLVM cannot forbid poison from entering `call` arguments without
prohibiting
>> an optimization pass from outlining code.
>>
>> ### Is it safe to store poison to memory?
>>
>> `store i32 %always_poison, i32* %mem` does not result in undefined
behavior.
>> A subsequent load instruction like `%load = load i32* %mem` will result
in
>> `%load` being a poison value.
>>
>> ### Is it safe to load or store a poison memory location?
>>
>> No.  Poison works just like `undef` in this respect.
>>
>> ### Does comparing a poison value result in poison?
>>
>> It depends.  If the comparison couldn't solely be determined by
looking at
>> the other operand, the result is poison.
>>
>> For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
>> poison.
>> However, `icmp i32 ne %always_poison, 7` is poison.
>>
>> ### What if the condition operand in a `select` is poison?
>>
>> In the example `%sel = select i1 %always_poison, i1 true, false`,
`%sel` is
>> either `true` or `false`.  Because, `%sel` depends on `%always_poison`
it
>> too is poison.

Mehdi Amini

2015-Jan-28 06:18 UTC

head link

[LLVMdev] RFC: Proposal for Poison Semantics

> On Jan 27, 2015, at 6:50 PM, David Majnemer <david.majnemer at
gmail.com> wrote:
> 
> Hello,
> 
> What follows is my attempt to describe how poison works.  Let me know what
you think.
> 
> -- 
> David
> 
> 
> # LLVM Poison Semantics
> 
> Poison is an LLVM concept which exists solely to enable further
optimization of LLVM IR. The exact behavior of poison has been, to say the
least, confusing for users, researchers and engineers working with LLVM.
> 
> This document hopes to clear up some of the confusion of poison and
hopefully explain *why* it has its semantics.
> 
> ## A Quick Introduction to Poison
> 
> Let's start with a concrete motivating example in C:
> ```
> int isSumGreater(int a, int b) {
>   return a + b > a;
> }
> ```
> 
> The C specification permits us to optimize the comparison in `isSumGreater`
to `b > 0` because signed overflow results in undefined behavior.  A
reasonable translation of `isSumGreater` to LLVM IR could be:
Typo: 
   to `b > 0`
should be 
  to `a + b > 0`
IIUC.

I mention in case you intend to publish this in the doc.


— 
Mehdi



> 
> ```
> define i32 @isSumGreater(i32 %a, i32 %b) {
> entry:
>   %add = add i32 %a, %b
>   %cmp = icmp sgt i32 %add, %a
>   %conv = zext i1 %cmp to i32
>   ret i32 %conv
> }
> ```
> 
> However, LLVM cannot determine that `%cmp` should not consider cases where
`%add` resulted in signed overflow.  We need a way to communicate this
information to LLVM.
> 
> This is where the `nsw` and `nuw` flags come into play.  `nsw` is short for
"no signed wrap", `nuw` is short for "no unsigned wrap".
> 
> With these, we can come up with a new formulation of `%add`: `add i32 nsw
%a, %b`.
> LLVM can take this into account when it is optimizing the `%cmp` and
replace it with: `icmp sgt i32 %b, 0`.
> 
> ## Differences Between LLVM and C/C++
> 
> There are some interesting differences between what C++ and C specify and
how LLVM behaves with respect to performing an operationg which is not permitted
to overflow.
> 
> Perhaps chief among them is that evaluating an expression in C++ or C which
results performs an overflow is undefined behavior. In LLVM, executing an
instruction which is marked `nsw` but which violates signed overflow results in
poison. Values which have no relationship with poisoned values are not effected
by them.
> 
> Let us take the following C program into consideration:
> ```
> int calculateImportantResult(int a, int b) {
>   int result = 0;
>   if (a) {
>     result = a + b;
>   }
>   return result;
> }
> ```
> 
> A straightforward lowering to LLVM IR could be:
> ```
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
> entry:
>   %tobool = icmp ne i32 %a, 0
>   br i1 %tobool, label %if.then, label %if.end
> 
> if.then:
>   %add = add nsw i32 %a, %b
>   br label %if.end
> 
> if.end:
>   %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
>   ret i32 %result
> }
> ```
> 
> Moving `%add` to the `%entry` block would be preferable and would allow
further optimizations:
> ```
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
> entry:
>   %tobool = icmp ne i32 %a, 0
>   %add = add nsw i32 %a, %b
>   %result = select i1 %tobool, i32 0, i32 %add
>   ret i32 %result
> }
> ```
> 
> In the original code, the calculation of `%add` was control dependent.
> Now, `%add` might result in signed overflow in violation of the `nsw` flag.
> Despite this, the program should behave as it did before because the
poisoned value is masked-out by the select. The next section will dive into this
in greater detail.
> 
> # Computation Involving Poison Values
> Poison in a computation results in poison if the result cannot be
constrained by its non-poison operands.
> 
> Examples of this rule which will result in poison:
> ```
>   %add = add i32 %x, %always_poison
>   %sub = sub i32 %x, %always_poison
>   %xor = xor i32 %x, %always_poison
>   %mul = mul i32 %always_poison, 1
> ```
> 
> Examples of this rule which do not result in poison:
> ```
>   %or  = or  i32 %always_poison, 2
>   %and = and i32 %always_poison, 2
>   %mul = mul i32 %always_poison, 0
> ```
> 
> In fact, it would be reasonable to optimize `%or` to `2` and `%and` to `0`.
In this respect, poison is not different from `undef`.
> 
> The following example is only poison if `%cond` is false:
> ```
>   %sel = select i1 %cond, i32 2, %always_poison
> ```
> 
> ### Is it safe to have poison as a `call` argument?
> 
> A `call` instruction may or may not result in poison depending on exactly
how the callee  uses the supplied arguments, it is not necessarily the case that
`call i32 @someFunction(i32 %always_poison)` results in poison.
> 
> LLVM cannot forbid poison from entering `call` arguments without
prohibiting an optimization pass from outlining code.
> 
> ### Is it safe to store poison to memory?
> 
> `store i32 %always_poison, i32* %mem` does not result in undefined
behavior. A subsequent load instruction like `%load = load i32* %mem` will
result in `%load` being a poison value.
> 
> ### Is it safe to load or store a poison memory location?
> 
> No.  Poison works just like `undef` in this respect.
> 
> ### Does comparing a poison value result in poison?
> 
> It depends.  If the comparison couldn't solely be determined by looking
at the other operand, the result is poison.
> 
> For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
poison.
> However, `icmp i32 ne %always_poison, 7` is poison.
> 
> ### What if the condition operand in a `select` is poison?
> 
> In the example `%sel = select i1 %always_poison, i1 true, false`, `%sel` is
either `true` or `false`.  Because, `%sel` depends on `%always_poison` it too is
poison.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

David Majnemer

2015-Jan-28 06:29 UTC

head link

[LLVMdev] RFC: Proposal for Poison Semantics

On Tue, Jan 27, 2015 at 10:18 PM, Mehdi Amini <mehdi.amini at apple.com>
wrote:
>
> > On Jan 27, 2015, at 6:50 PM, David Majnemer <david.majnemer at
gmail.com>
> wrote:
> >
> > Hello,
> >
> > What follows is my attempt to describe how poison works.  Let me know
> what you think.
> >
> > --
> > David
> >
> >
> > # LLVM Poison Semantics
> >
> > Poison is an LLVM concept which exists solely to enable further
> optimization of LLVM IR. The exact behavior of poison has been, to say the
> least, confusing for users, researchers and engineers working with LLVM.
> >
> > This document hopes to clear up some of the confusion of poison and
> hopefully explain *why* it has its semantics.
> >
> > ## A Quick Introduction to Poison
> >
> > Let's start with a concrete motivating example in C:
> > ```
> > int isSumGreater(int a, int b) {
> >   return a + b > a;
> > }
> > ```
> >
> > The C specification permits us to optimize the comparison in
> `isSumGreater` to `b > 0` because signed overflow results in undefined
> behavior.  A reasonable translation of `isSumGreater` to LLVM IR could be:
>
> Typo:
>    to `b > 0`
> should be
>   to `a + b > 0`
> IIUC.
>
> I mention in case you intend to publish this in the doc.
>
I intended what I wrote.  We perform this optimization in InstCombine
today:
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp?revision=226783&view=markup#l3193

>
>
> —
> Mehdi
>
>
>
>
> >
> > ```
> > define i32 @isSumGreater(i32 %a, i32 %b) {
> > entry:
> >   %add = add i32 %a, %b
> >   %cmp = icmp sgt i32 %add, %a
> >   %conv = zext i1 %cmp to i32
> >   ret i32 %conv
> > }
> > ```
> >
> > However, LLVM cannot determine that `%cmp` should not consider cases
> where `%add` resulted in signed overflow.  We need a way to communicate
> this information to LLVM.
> >
> > This is where the `nsw` and `nuw` flags come into play.  `nsw` is
short
> for "no signed wrap", `nuw` is short for "no unsigned
wrap".
> >
> > With these, we can come up with a new formulation of `%add`: `add i32
> nsw %a, %b`.
> > LLVM can take this into account when it is optimizing the `%cmp` and
> replace it with: `icmp sgt i32 %b, 0`.
> >
> > ## Differences Between LLVM and C/C++
> >
> > There are some interesting differences between what C++ and C specify
> and how LLVM behaves with respect to performing an operationg which is not
> permitted to overflow.
> >
> > Perhaps chief among them is that evaluating an expression in C++ or C
> which results performs an overflow is undefined behavior. In LLVM,
> executing an instruction which is marked `nsw` but which violates signed
> overflow results in poison. Values which have no relationship with poisoned
> values are not effected by them.
> >
> > Let us take the following C program into consideration:
> > ```
> > int calculateImportantResult(int a, int b) {
> >   int result = 0;
> >   if (a) {
> >     result = a + b;
> >   }
> >   return result;
> > }
> > ```
> >
> > A straightforward lowering to LLVM IR could be:
> > ```
> > define i32 @calculateImportantResult(i32 %a, i32 %b) {
> > entry:
> >   %tobool = icmp ne i32 %a, 0
> >   br i1 %tobool, label %if.then, label %if.end
> >
> > if.then:
> >   %add = add nsw i32 %a, %b
> >   br label %if.end
> >
> > if.end:
> >   %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
> >   ret i32 %result
> > }
> > ```
> >
> > Moving `%add` to the `%entry` block would be preferable and would
allow
> further optimizations:
> > ```
> > define i32 @calculateImportantResult(i32 %a, i32 %b) {
> > entry:
> >   %tobool = icmp ne i32 %a, 0
> >   %add = add nsw i32 %a, %b
> >   %result = select i1 %tobool, i32 0, i32 %add
> >   ret i32 %result
> > }
> > ```
> >
> > In the original code, the calculation of `%add` was control dependent.
> > Now, `%add` might result in signed overflow in violation of the `nsw`
> flag.
> > Despite this, the program should behave as it did before because the
> poisoned value is masked-out by the select. The next section will dive into
> this in greater detail.
> >
> > # Computation Involving Poison Values
> > Poison in a computation results in poison if the result cannot be
> constrained by its non-poison operands.
> >
> > Examples of this rule which will result in poison:
> > ```
> >   %add = add i32 %x, %always_poison
> >   %sub = sub i32 %x, %always_poison
> >   %xor = xor i32 %x, %always_poison
> >   %mul = mul i32 %always_poison, 1
> > ```
> >
> > Examples of this rule which do not result in poison:
> > ```
> >   %or  = or  i32 %always_poison, 2
> >   %and = and i32 %always_poison, 2
> >   %mul = mul i32 %always_poison, 0
> > ```
> >
> > In fact, it would be reasonable to optimize `%or` to `2` and `%and` to
> `0`.  In this respect, poison is not different from `undef`.
> >
> > The following example is only poison if `%cond` is false:
> > ```
> >   %sel = select i1 %cond, i32 2, %always_poison
> > ```
> >
> > ### Is it safe to have poison as a `call` argument?
> >
> > A `call` instruction may or may not result in poison depending on
> exactly how the callee  uses the supplied arguments, it is not necessarily
> the case that `call i32 @someFunction(i32 %always_poison)` results in
> poison.
> >
> > LLVM cannot forbid poison from entering `call` arguments without
> prohibiting an optimization pass from outlining code.
> >
> > ### Is it safe to store poison to memory?
> >
> > `store i32 %always_poison, i32* %mem` does not result in undefined
> behavior. A subsequent load instruction like `%load = load i32* %mem` will
> result in `%load` being a poison value.
> >
> > ### Is it safe to load or store a poison memory location?
> >
> > No.  Poison works just like `undef` in this respect.
> >
> > ### Does comparing a poison value result in poison?
> >
> > It depends.  If the comparison couldn't solely be determined by
looking
> at the other operand, the result is poison.
> >
> > For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
> poison.
> > However, `icmp i32 ne %always_poison, 7` is poison.
> >
> > ### What if the condition operand in a `select` is poison?
> >
> > In the example `%sel = select i1 %always_poison, i1 true, false`,
`%sel`
> is either `true` or `false`.  Because, `%sel` depends on `%always_poison`
> it too is poison.
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150127/e107065a/attachment.html>

Bruce Hoult

2015-Jan-28 10:05 UTC

head link

[LLVMdev] RFC: Proposal for Poison Semantics

On Wed, Jan 28, 2015 at 7:18 PM, Mehdi Amini <mehdi.amini at apple.com>
wrote:
>
> > On Jan 27, 2015, at 6:50 PM, David Majnemer <david.majnemer at
gmail.com>
> wrote:
> > Let's start with a concrete motivating example in C:
> > ```
> > int isSumGreater(int a, int b) {
> >   return a + b > a;
> > }
> > ```
> >
> > The C specification permits us to optimize the comparison in
> `isSumGreater` to `b > 0` because signed overflow results in undefined
> behavior.  A reasonable translation of `isSumGreater` to LLVM IR could be:
>
> Typo:
>    to `b > 0`
> should be
>   to `a + b > 0`
> IIUC.
>
Let a = -10 and b = 1

a + b > a  ===  -10 + 1 > -10  ===  -9 > -10  === true

b > 0 === 1 > 0 === true

All good.

Your version:

a + b > 0  === -10 + 1 > 0 === -9 > 0 === false

Small problem.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150128/1b55302a/attachment.html>

Sean Silva

2015-Jan-28 15:02 UTC

head link

[LLVMdev] RFC: Proposal for Poison Semantics

Could you maybe provide an example where replacing `%always_poison` with
`undef` will change the meaning? At least for me, the thing that I'm most
unclear about is how poison differs from undef.

-- Sean Silva

On Wed, Jan 28, 2015 at 2:50 AM, David Majnemer <david.majnemer at
gmail.com>
wrote:
> Hello,
>
> What follows is my attempt to describe how poison works.  Let me know what
> you think.
>
> --
> David
>
>
> # LLVM Poison Semantics
>
> Poison is an LLVM concept which exists solely to enable further
> optimization of LLVM IR. The exact behavior of poison has been, to say the
> least, confusing for users, researchers and engineers working with LLVM.
>
> This document hopes to clear up some of the confusion of poison and
> hopefully explain *why* it has its semantics.
>
> ## A Quick Introduction to Poison
>
> Let's start with a concrete motivating example in C:
> ```
> int isSumGreater(int a, int b) {
>   return a + b > a;
> }
> ```
>
> The C specification permits us to optimize the comparison in
> `isSumGreater` to `b > 0` because signed overflow results in undefined
> behavior.  A reasonable translation of `isSumGreater` to LLVM IR could be:
>
> ```
> define i32 @isSumGreater(i32 %a, i32 %b) {
> entry:
>   %add = add i32 %a, %b
>   %cmp = icmp sgt i32 %add, %a
>   %conv = zext i1 %cmp to i32
>   ret i32 %conv
> }
> ```
>
> However, LLVM cannot determine that `%cmp` should not consider cases where
> `%add` resulted in signed overflow.  We need a way to communicate this
> information to LLVM.
>
> This is where the `nsw` and `nuw` flags come into play.  `nsw` is short
> for "no signed wrap", `nuw` is short for "no unsigned
wrap".
>
> With these, we can come up with a new formulation of `%add`: `add i32 nsw
> %a, %b`.
> LLVM can take this into account when it is optimizing the `%cmp` and
> replace it with: `icmp sgt i32 %b, 0`.
>
> ## Differences Between LLVM and C/C++
>
> There are some interesting differences between what C++ and C specify and
> how LLVM behaves with respect to performing an operationg which is not
> permitted to overflow.
>
> Perhaps chief among them is that evaluating an expression in C++ or C
> which results performs an overflow is undefined behavior. In LLVM,
> executing an instruction which is marked `nsw` but which violates signed
> overflow results in poison. Values which have no relationship with poisoned
> values are not effected by them.
>
> Let us take the following C program into consideration:
> ```
> int calculateImportantResult(int a, int b) {
>   int result = 0;
>   if (a) {
>     result = a + b;
>   }
>   return result;
> }
> ```
>
> A straightforward lowering to LLVM IR could be:
> ```
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
> entry:
>   %tobool = icmp ne i32 %a, 0
>   br i1 %tobool, label %if.then, label %if.end
>
> if.then:
>   %add = add nsw i32 %a, %b
>   br label %if.end
>
> if.end:
>   %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
>   ret i32 %result
> }
> ```
>
> Moving `%add` to the `%entry` block would be preferable and would allow
> further optimizations:
> ```
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
> entry:
>   %tobool = icmp ne i32 %a, 0
>   %add = add nsw i32 %a, %b
>   %result = select i1 %tobool, i32 0, i32 %add
>   ret i32 %result
> }
> ```
>
> In the original code, the calculation of `%add` was control dependent.
> Now, `%add` might result in signed overflow in violation of the `nsw` flag.
> Despite this, the program should behave as it did before because the
> poisoned value is masked-out by the select. The next section will dive into
> this in greater detail.
>
> # Computation Involving Poison Values
> Poison in a computation results in poison if the result cannot be
> constrained by its non-poison operands.
>
> Examples of this rule which will result in poison:
> ```
>   %add = add i32 %x, %always_poison
>   %sub = sub i32 %x, %always_poison
>   %xor = xor i32 %x, %always_poison
>   %mul = mul i32 %always_poison, 1
> ```
>
> Examples of this rule which do not result in poison:
> ```
>   %or  = or  i32 %always_poison, 2
>   %and = and i32 %always_poison, 2
>   %mul = mul i32 %always_poison, 0
> ```
>
> In fact, it would be reasonable to optimize `%or` to `2` and `%and` to
> `0`.  In this respect, poison is not different from `undef`.
>
> The following example is only poison if `%cond` is false:
> ```
>   %sel = select i1 %cond, i32 2, %always_poison
> ```
>
> ### Is it safe to have poison as a `call` argument?
>
> A `call` instruction may or may not result in poison depending on exactly
> how the callee  uses the supplied arguments, it is not necessarily the case
> that `call i32 @someFunction(i32 %always_poison)` results in poison.
>
> LLVM cannot forbid poison from entering `call` arguments without
> prohibiting an optimization pass from outlining code.
>
> ### Is it safe to store poison to memory?
>
> `store i32 %always_poison, i32* %mem` does not result in undefined
> behavior. A subsequent load instruction like `%load = load i32* %mem` will
> result in `%load` being a poison value.
>
> ### Is it safe to load or store a poison memory location?
>
> No.  Poison works just like `undef` in this respect.
>
> ### Does comparing a poison value result in poison?
>
> It depends.  If the comparison couldn't solely be determined by looking
at
> the other operand, the result is poison.
>
> For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
> poison.
> However, `icmp i32 ne %always_poison, 7` is poison.
>
> ### What if the condition operand in a `select` is poison?
>
> In the example `%sel = select i1 %always_poison, i1 true, false`, `%sel`
> is either `true` or `false`.  Because, `%sel` depends on `%always_poison`
> it too is poison.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150128/b5fe5853/attachment.html>

Philip Reames

2015-Jan-29 04:53 UTC

head link

[LLVMdev] RFC: Proposal for Poison Semantics

On 01/28/2015 07:02 AM, Sean Silva wrote:> Could you maybe provide an example where replacing `%always_poison` 
> with `undef` will change the meaning? At least for me, the thing that 
> I'm most unclear about is how poison differs from undef.
I will second this request for much the same reason.>
> -- Sean Silva
>
> On Wed, Jan 28, 2015 at 2:50 AM, David Majnemer 
> <david.majnemer at gmail.com <mailto:david.majnemer at
gmail.com>> wrote:
>
>     Hello,
>
>     What follows is my attempt to describe how poison works.  Let me
>     know what you think.
>
>     -- 
>     David
>
>
>     # LLVM Poison Semantics
>
>     Poison is an LLVM concept which exists solely to enable further
>     optimization of LLVM IR. The exact behavior of poison has been, to
>     say the least, confusing for users, researchers and engineers
>     working with LLVM.
>
>     This document hopes to clear up some of the confusion of poison
>     and hopefully explain *why* it has its semantics.
>
>     ## A Quick Introduction to Poison
>
>     Let's start with a concrete motivating example in C:
>     ```
>     int isSumGreater(int a, int b) {
>       return a + b > a;
>     }
>     ```
>
>     The C specification permits us to optimize the comparison in
>     `isSumGreater` to `b > 0` because signed overflow results in
>     undefined behavior.  A reasonable translation of `isSumGreater` to
>     LLVM IR could be:
>
>     ```
>     define i32 @isSumGreater(i32 %a, i32 %b) {
>     entry:
>       %add = add i32 %a, %b
>       %cmp = icmp sgt i32 %add, %a
>       %conv = zext i1 %cmp to i32
>       ret i32 %conv
>     }
>     ```
>
>     However, LLVM cannot determine that `%cmp` should not consider
>     cases where `%add` resulted in signed overflow.  We need a way to
>     communicate this information to LLVM.
>
>     This is where the `nsw` and `nuw` flags come into play.  `nsw` is
>     short for "no signed wrap", `nuw` is short for "no
unsigned wrap".
>
>     With these, we can come up with a new formulation of `%add`: `add
>     i32 nsw %a, %b`.
>     LLVM can take this into account when it is optimizing the `%cmp`
>     and replace it with: `icmp sgt i32 %b, 0`.
>
>     ## Differences Between LLVM and C/C++
>
>     There are some interesting differences between what C++ and C
>     specify and how LLVM behaves with respect to performing an
>     operationg which is not permitted to overflow.
>
>     Perhaps chief among them is that evaluating an expression in C++
>     or C which results performs an overflow is undefined behavior. In
>     LLVM, executing an instruction which is marked `nsw` but which
>     violates signed overflow results in poison. Values which have no
>     relationship with poisoned values are not effected by them.
>
>     Let us take the following C program into consideration:
>     ```
>     int calculateImportantResult(int a, int b) {
>       int result = 0;
>       if (a) {
>         result = a + b;
>       }
>       return result;
>     }
>     ```
>
>     A straightforward lowering to LLVM IR could be:
>     ```
>     define i32 @calculateImportantResult(i32 %a, i32 %b) {
>     entry:
>       %tobool = icmp ne i32 %a, 0
>       br i1 %tobool, label %if.then, label %if.end
>
>     if.then:
>       %add = add nsw i32 %a, %b
>       br label %if.end
>
>     if.end:
>       %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
>       ret i32 %result
>     }
>     ```
>
>     Moving `%add` to the `%entry` block would be preferable and would
>     allow further optimizations:
>     ```
>     define i32 @calculateImportantResult(i32 %a, i32 %b) {
>     entry:
>       %tobool = icmp ne i32 %a, 0
>       %add = add nsw i32 %a, %b
>       %result = select i1 %tobool, i32 0, i32 %add
>       ret i32 %result
>     }
>     ```
>
>     In the original code, the calculation of `%add` was control dependent.
>     Now, `%add` might result in signed overflow in violation of the
>     `nsw` flag.
>     Despite this, the program should behave as it did before because
>     the poisoned value is masked-out by the select. The next section
>     will dive into this in greater detail.
>
>     # Computation Involving Poison Values
>     Poison in a computation results in poison if the result cannot be
>     constrained by its non-poison operands.
>
>     Examples of this rule which will result in poison:
>     ```
>       %add = add i32 %x, %always_poison
>       %sub = sub i32 %x, %always_poison
>       %xor = xor i32 %x, %always_poison
>       %mul = mul i32 %always_poison, 1
>     ```
>
>     Examples of this rule which do not result in poison:
>     ```
>       %or  = or  i32 %always_poison, 2
>       %and = and i32 %always_poison, 2
>       %mul = mul i32 %always_poison, 0
>     ```
>
>     In fact, it would be reasonable to optimize `%or` to `2` and
>     `%and` to `0`.  In this respect, poison is not different from `undef`.
>
>     The following example is only poison if `%cond` is false:
>     ```
>       %sel = select i1 %cond, i32 2, %always_poison
>     ```
>
>     ### Is it safe to have poison as a `call` argument?
>
>     A `call` instruction may or may not result in poison depending on
>     exactly how the callee  uses the supplied arguments, it is not
>     necessarily the case that `call i32 @someFunction(i32
>     %always_poison)` results in poison.
>
>     LLVM cannot forbid poison from entering `call` arguments without
>     prohibiting an optimization pass from outlining code.
>
>     ### Is it safe to store poison to memory?
>
>     `store i32 %always_poison, i32* %mem` does not result in undefined
>     behavior. A subsequent load instruction like `%load = load i32*
>     %mem` will result in `%load` being a poison value.
>
>     ### Is it safe to load or store a poison memory location?
>
>     No.  Poison works just like `undef` in this respect.
>
>     ### Does comparing a poison value result in poison?
>
>     It depends.  If the comparison couldn't solely be determined by
>     looking at the other operand, the result is poison.
>
>     For example, `icmp i32 ule %always_poison, 4294967295` is `true`,
>     not poison.
>     However, `icmp i32 ne %always_poison, 7` is poison.
>
>     ### What if the condition operand in a `select` is poison?
>
>     In the example `%sel = select i1 %always_poison, i1 true, false`,
>     `%sel` is either `true` or `false`. Because, `%sel` depends on
>     `%always_poison` it too is poison.
>
>     _______________________________________________
>     LLVM Developers mailing list
>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>     http://llvm.cs.uiuc.edu
>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150128/9d1a9775/attachment.html>

Nuno Lopes

2015-Feb-03 11:15 UTC

head link

[LLVMdev] Proposal for Poison Semantics

Hi,

 

Thanks David for putting up this proposal together!

I like the idea of having poison values behave more like undef (i.e., per bit,
with run-time behavior).

One of the problems this proposal solves is speculation of 'a &&
b' into 'a & b'. Currently this is illegal (despite sometimes
simplifycfg doing it anyway).

It also fixes bugs like http://llvm.org/PR20997

 

The proposal doesn't say anything about branching on a poison value. I
assume this should stay as the current interpretation -- that such branches
should be undefined behavior (since we cannot branch to multiple places at the
same time -- even if they would compute the same values; that's already too
hard for the compiler to analyze).

 

There's another caveat: it *does* seem to fix the problem described by Dan
in http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046152.html

However, it introduces a potential performance penalty: we won't be able to
speculate instructions with undefined behavior whose input may be poison.

 

For example, take the following code:

loop:

  %add = add nsw %x, %y

  %div = udiv %z, %add

  … use %div only in the case that %add does not overflow and is non-zero

 

We can move the %add outside of the loop, but we cannot move the division. With
the reason being that if %add overflows, then %add is poison and therefore it
can take any value (in particular, it can be 0), triggering undef behavior in
%div.  Therefore, we cannot freely move %div, unless we can prove that %add will
never be 0 nor poison.  This sounds hard for the compiler to do, and I guess
we'll have some regressions (e.g., LICM has to be more conservative).
Nevertheless, I'm all for fixing poison once and for all!

 

BTW, would it help if I produced a version of Alive that implements the
semantics being proposed?  (with no performance guarantees for this prototype). 
The cool thing is that then we can run it through our database of 300+
InstCombine optimizations and see which ones would have to be removed/fixed.

 

Nuno

 

 

From: David Majnemer
Sent: 28 January 2015 02:50
To: llvmdev at cs.uiuc.edu
Cc: Sanjoy Das; Dan Gohman; John Regehr; Nuno Lopes
Subject: RFC: Proposal for Poison Semantics

 

Hello,

 

What follows is my attempt to describe how poison works.  Let me know what you
think.

 

-- 

David

 

 

# LLVM Poison Semantics

 

Poison is an LLVM concept which exists solely to enable further optimization of
LLVM IR. The exact behavior of poison has been, to say the least, confusing for
users, researchers and engineers working with LLVM.

 

This document hopes to clear up some of the confusion of poison and hopefully
explain *why* it has its semantics.

 

## A Quick Introduction to Poison

 

Let's start with a concrete motivating example in C:

```

int isSumGreater(int a, int b) {

  return a + b > a;

}

```

 

The C specification permits us to optimize the comparison in `isSumGreater` to
`b > 0` because signed overflow results in undefined behavior.  A reasonable
translation of `isSumGreater` to LLVM IR could be:

 

```

define i32 @isSumGreater(i32 %a, i32 %b) {

entry:

  %add = add i32 %a, %b

  %cmp = icmp sgt i32 %add, %a

  %conv = zext i1 %cmp to i32

  ret i32 %conv

}

```

 

However, LLVM cannot determine that `%cmp` should not consider cases where
`%add` resulted in signed overflow.  We need a way to communicate this
information to LLVM.

 

This is where the `nsw` and `nuw` flags come into play.  `nsw` is short for
"no signed wrap", `nuw` is short for "no unsigned wrap".

 

With these, we can come up with a new formulation of `%add`: `add i32 nsw %a,
%b`.

LLVM can take this into account when it is optimizing the `%cmp` and replace it
with: `icmp sgt i32 %b, 0`.

 

## Differences Between LLVM and C/C++

 

There are some interesting differences between what C++ and C specify and how
LLVM behaves with respect to performing an operationg which is not permitted to
overflow.

 

Perhaps chief among them is that evaluating an expression in C++ or C which
results performs an overflow is undefined behavior. In LLVM, executing an
instruction which is marked `nsw` but which violates signed overflow results in
poison. Values which have no relationship with poisoned values are not effected
by them.

 

Let us take the following C program into consideration:

```

int calculateImportantResult(int a, int b) {

  int result = 0;

  if (a) {

    result = a + b;

  }

  return result;

}

```

 

A straightforward lowering to LLVM IR could be:

```

define i32 @calculateImportantResult(i32 %a, i32 %b) {

entry:

  %tobool = icmp ne i32 %a, 0

  br i1 %tobool, label %if.then, label %if.end

 

if.then:

  %add = add nsw i32 %a, %b

  br label %if.end

 

if.end:

  %result = phi i32 [ %add, %if.then ], [ 0, %entry ]

  ret i32 %result

}

```

 

Moving `%add` to the `%entry` block would be preferable and would allow further
optimizations:

```

define i32 @calculateImportantResult(i32 %a, i32 %b) {

entry:

  %tobool = icmp ne i32 %a, 0

  %add = add nsw i32 %a, %b

  %result = select i1 %tobool, i32 0, i32 %add

  ret i32 %result

}

```

 

In the original code, the calculation of `%add` was control dependent.

Now, `%add` might result in signed overflow in violation of the `nsw` flag.

Despite this, the program should behave as it did before because the poisoned
value is masked-out by the select. The next section will dive into this in
greater detail.

 

# Computation Involving Poison Values

Poison in a computation results in poison if the result cannot be constrained by
its non-poison operands.

 

Examples of this rule which will result in poison:

```

  %add = add i32 %x, %always_poison

  %sub = sub i32 %x, %always_poison

  %xor = xor i32 %x, %always_poison

  %mul = mul i32 %always_poison, 1

```

 

Examples of this rule which do not result in poison:

```

  %or  = or  i32 %always_poison, 2

  %and = and i32 %always_poison, 2

  %mul = mul i32 %always_poison, 0

```

 

In fact, it would be reasonable to optimize `%or` to `2` and `%and` to `0`.  In
this respect, poison is not different from `undef`.

 

The following example is only poison if `%cond` is false:

```

  %sel = select i1 %cond, i32 2, %always_poison

```

 

### Is it safe to have poison as a `call` argument?

 

A `call` instruction may or may not result in poison depending on exactly how
the callee  uses the supplied arguments, it is not necessarily the case that
`call i32 @someFunction(i32 %always_poison)` results in poison.

 

LLVM cannot forbid poison from entering `call` arguments without prohibiting an
optimization pass from outlining code.

 

### Is it safe to store poison to memory?

 

`store i32 %always_poison, i32* %mem` does not result in undefined behavior. A
subsequent load instruction like `%load = load i32* %mem` will result in `%load`
being a poison value.

 

### Is it safe to load or store a poison memory location?

 

No.  Poison works just like `undef` in this respect.

 

### Does comparing a poison value result in poison?

 

It depends.  If the comparison couldn't solely be determined by looking at
the other operand, the result is poison.

 

For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not poison.

However, `icmp i32 ne %always_poison, 7` is poison.

 

### What if the condition operand in a `select` is poison?

 

In the example `%sel = select i1 %always_poison, i1 true, false`, `%sel` is
either `true` or `false`.  Because, `%sel` depends on `%always_poison` it too is
poison.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150203/a15f487d/attachment.html>

David Majnemer

2015-Feb-03 17:36 UTC

head link

[LLVMdev] Proposal for Poison Semantics

On Tue, Feb 3, 2015 at 3:15 AM, Nuno Lopes <nuno.lopes at ist.utl.pt>
wrote:
> Hi,
>
>
>
> Thanks David for putting up this proposal together!
>
> I like the idea of having poison values behave more like undef (i.e., per
> bit, with run-time behavior).
>
> One of the problems this proposal solves is speculation of 'a
&& b' into
> 'a & b'. Currently this is illegal (despite sometimes
simplifycfg doing it
> anyway).
>
> It also fixes bugs like http://llvm.org/PR20997
>
>
>
> The proposal doesn't say anything about branching on a poison value. I
> assume this should stay as the current interpretation -- that such branches
> should be undefined behavior (since we cannot branch to multiple places at
> the same time -- even if they would compute the same values; that's
already
> too hard for the compiler to analyze).
>
The RFC intended to make branching on poison values OK.  If branching on
poison wasn't OK, then we couldn't go from select to -> br/phi.

>
>
> There's another caveat: it *does* seem to fix the problem described by
Dan
> in http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046152.html
>
> However, it introduces a potential performance penalty: we won't be
able
> to speculate instructions with undefined behavior whose input may be
poison.
>
>
>
> For example, take the following code:
>
> loop:
>
>   %add = add nsw %x, %y
>
>   %div = udiv %z, %add
>
>   … use %div only in the case that %add does not overflow and is non-zero
>
>
>
> We can move the %add outside of the loop, but we cannot move the division.
> With the reason being that if %add overflows, then %add is poison and
> therefore it can take any value (in particular, it can be 0), triggering
> undef behavior in %div.  Therefore, we cannot freely move %div, unless we
> can prove that %add will never be 0 nor poison.  This sounds hard for the
> compiler to do, and I guess we'll have some regressions (e.g., LICM has
to
> be more conservative). Nevertheless, I'm all for fixing poison once and
for
> all!
>
Believe it or not, I already fixed this bug (PR21412). :)

>
>
> BTW, would it help if I produced a version of Alive that implements the
> semantics being proposed?  (with no performance guarantees for this
> prototype).  The cool thing is that then we can run it through our database
> of 300+ InstCombine optimizations and see which ones would have to be
> removed/fixed.
>
I think such a thing would be great.  However, there is a problem that the
RFC wasn't aware of when it was written:

consider:
%S = select %A, %B, undef

without us knowing anything about %A or %B, we will replace all uses of %S
with %B.  This transform would be considered wrong with the RFC in mind.

If this transform was valid, there could not be any value or value-like
property in LLVM with semantics more powerful than undef.  This makes me
think that what LLVM *actually* implements is not poison or something like
it.

On the flip side, we could say that this transform is nonsense but I'd
rather not pessimize LLVM like that.

>
>
> Nuno
>
>
>
>
>
> *From:* David Majnemer
> *Sent:* 28 January 2015 02:50
> *To:* llvmdev at cs.uiuc.edu
> *Cc:* Sanjoy Das; Dan Gohman; John Regehr; Nuno Lopes
> *Subject:* RFC: Proposal for Poison Semantics
>
>
>
> Hello,
>
>
>
> What follows is my attempt to describe how poison works.  Let me know what
> you think.
>
>
>
> --
>
> David
>
>
>
>
>
> # LLVM Poison Semantics
>
>
>
> Poison is an LLVM concept which exists solely to enable further
> optimization of LLVM IR. The exact behavior of poison has been, to say the
> least, confusing for users, researchers and engineers working with LLVM.
>
>
>
> This document hopes to clear up some of the confusion of poison and
> hopefully explain *why* it has its semantics.
>
>
>
> ## A Quick Introduction to Poison
>
>
>
> Let's start with a concrete motivating example in C:
>
> ```
>
> int isSumGreater(int a, int b) {
>
>   return a + b > a;
>
> }
>
> ```
>
>
>
> The C specification permits us to optimize the comparison in
> `isSumGreater` to `b > 0` because signed overflow results in undefined
> behavior.  A reasonable translation of `isSumGreater` to LLVM IR could be:
>
>
>
> ```
>
> define i32 @isSumGreater(i32 %a, i32 %b) {
>
> entry:
>
>   %add = add i32 %a, %b
>
>   %cmp = icmp sgt i32 %add, %a
>
>   %conv = zext i1 %cmp to i32
>
>   ret i32 %conv
>
> }
>
> ```
>
>
>
> However, LLVM cannot determine that `%cmp` should not consider cases where
> `%add` resulted in signed overflow.  We need a way to communicate this
> information to LLVM.
>
>
>
> This is where the `nsw` and `nuw` flags come into play.  `nsw` is short
> for "no signed wrap", `nuw` is short for "no unsigned
wrap".
>
>
>
> With these, we can come up with a new formulation of `%add`: `add i32 nsw
> %a, %b`.
>
> LLVM can take this into account when it is optimizing the `%cmp` and
> replace it with: `icmp sgt i32 %b, 0`.
>
>
>
> ## Differences Between LLVM and C/C++
>
>
>
> There are some interesting differences between what C++ and C specify and
> how LLVM behaves with respect to performing an operationg which is not
> permitted to overflow.
>
>
>
> Perhaps chief among them is that evaluating an expression in C++ or C
> which results performs an overflow is undefined behavior. In LLVM,
> executing an instruction which is marked `nsw` but which violates signed
> overflow results in poison. Values which have no relationship with poisoned
> values are not effected by them.
>
>
>
> Let us take the following C program into consideration:
>
> ```
>
> int calculateImportantResult(int a, int b) {
>
>   int result = 0;
>
>   if (a) {
>
>     result = a + b;
>
>   }
>
>   return result;
>
> }
>
> ```
>
>
>
> A straightforward lowering to LLVM IR could be:
>
> ```
>
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
>
> entry:
>
>   %tobool = icmp ne i32 %a, 0
>
>   br i1 %tobool, label %if.then, label %if.end
>
>
>
> if.then:
>
>   %add = add nsw i32 %a, %b
>
>   br label %if.end
>
>
>
> if.end:
>
>   %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
>
>   ret i32 %result
>
> }
>
> ```
>
>
>
> Moving `%add` to the `%entry` block would be preferable and would allow
> further optimizations:
>
> ```
>
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
>
> entry:
>
>   %tobool = icmp ne i32 %a, 0
>
>   %add = add nsw i32 %a, %b
>
>   %result = select i1 %tobool, i32 0, i32 %add
>
>   ret i32 %result
>
> }
>
> ```
>
>
>
> In the original code, the calculation of `%add` was control dependent.
>
> Now, `%add` might result in signed overflow in violation of the `nsw` flag.
>
> Despite this, the program should behave as it did before because the
> poisoned value is masked-out by the select. The next section will dive into
> this in greater detail.
>
>
>
> # Computation Involving Poison Values
>
> Poison in a computation results in poison if the result cannot be
> constrained by its non-poison operands.
>
>
>
> Examples of this rule which will result in poison:
>
> ```
>
>   %add = add i32 %x, %always_poison
>
>   %sub = sub i32 %x, %always_poison
>
>   %xor = xor i32 %x, %always_poison
>
>   %mul = mul i32 %always_poison, 1
>
> ```
>
>
>
> Examples of this rule which do not result in poison:
>
> ```
>
>   %or  = or  i32 %always_poison, 2
>
>   %and = and i32 %always_poison, 2
>
>   %mul = mul i32 %always_poison, 0
>
> ```
>
>
>
> In fact, it would be reasonable to optimize `%or` to `2` and `%and` to
> `0`.  In this respect, poison is not different from `undef`.
>
>
>
> The following example is only poison if `%cond` is false:
>
> ```
>
>   %sel = select i1 %cond, i32 2, %always_poison
>
> ```
>
>
>
> ### Is it safe to have poison as a `call` argument?
>
>
>
> A `call` instruction may or may not result in poison depending on exactly
> how the callee  uses the supplied arguments, it is not necessarily the case
> that `call i32 @someFunction(i32 %always_poison)` results in poison.
>
>
>
> LLVM cannot forbid poison from entering `call` arguments without
> prohibiting an optimization pass from outlining code.
>
>
>
> ### Is it safe to store poison to memory?
>
>
>
> `store i32 %always_poison, i32* %mem` does not result in undefined
> behavior. A subsequent load instruction like `%load = load i32* %mem` will
> result in `%load` being a poison value.
>
>
>
> ### Is it safe to load or store a poison memory location?
>
>
>
> No.  Poison works just like `undef` in this respect.
>
>
>
> ### Does comparing a poison value result in poison?
>
>
>
> It depends.  If the comparison couldn't solely be determined by looking
at
> the other operand, the result is poison.
>
>
>
> For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
> poison.
>
> However, `icmp i32 ne %always_poison, 7` is poison.
>
>
>
> ### What if the condition operand in a `select` is poison?
>
>
>
> In the example `%sel = select i1 %always_poison, i1 true, false`, `%sel`
> is either `true` or `false`.  Because, `%sel` depends on `%always_poison`
> it too is poison.
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150203/9a8703ad/attachment.html>

Philip Reames

2015-Feb-03 19:10 UTC

head link

[LLVMdev] RFC: Proposal for Poison Semantics

(Responding to original post since this is somewhat orthogonal to the 
discussion to date.)

After reading through the thread to date, I'm coming to conclusion that 
finding a single semantics for poison that both allows speculation and 
optimization of a nsw optimization under the assumption that overflow 
"never happens" is *hard*.  Thinking about it a bit, do we actually
need
to solve that problem?

An alternate approach would be to split "nsw" into two pieces:
"nsw
unreachable" and "nsw undef".  A frontend for C or C++ would
produce
"nsw unreachable".

"nsw unreachable" allows the compiler to assume that the overflow path
can't happen.  This enables all of the zext, comparison folding, and 
other desirable optimizations we've mentioned.

"nsw undef" only allows to the compiler to assume that overflow
produces
an undefined value.  In particular, the value produced is not
"poison";
it is simply undef.

When the compiler speculates an "nsw unreachable" instruction, it
would
replace it with an "nsw undef".  However, if the compiler can *prove* 
that the original instruction would execute on all paths without a 
constraint which prevents overflow, it can preserve "nsw unreachable".
LICM today has logic that is close to this in the form of 
isGuaranteedToExecute.  (i.e. an add with loop invariant operands 
overflows in the preheader exactly when it does in the loop header)

This approach would suffer from a tension between the profitability of 
speculation and the potential lost optimizations by going from "nsw 
unreachable" to "nsw undef".  In practice, I don't have a
strong sense
for how painful this would be.  Does anyone have a good sense for how 
aggressive we are, in practice, about exploiting nsw?

My intuition is that we already work hard to handle the "guaranteed to 
execute" case due to needing to avoid newly faulting loads.  I suspect 
that we do (or at least can and should) be surprisingly effective about 
recognizing the cases where we can safely speculate the full "nsw 
unreachable" semantic.

(An alternate way to phrase the above would be to say that a) add "nsw 
unreachable" faults on overflow and b) we can't introduce faults into a
well defined program.  This is very very similar to how we reason about 
loads and dereferencability.)

Philip

p.s. It would also be interesting to weigh the profitability of 
speculating a guarded/predicated form of an "add nsw unreachable". 
But
we should probably do predicated loads and stores before we consider 
predication for simple arithmetic.

On 01/27/2015 06:50 PM, David Majnemer wrote:> Hello,
>
> What follows is my attempt to describe how poison works. Let me know 
> what you think.
>
> -- 
> David
>
>
> # LLVM Poison Semantics
>
> Poison is an LLVM concept which exists solely to enable further 
> optimization of LLVM IR. The exact behavior of poison has been, to say 
> the least, confusing for users, researchers and engineers working with 
> LLVM.
>
> This document hopes to clear up some of the confusion of poison and 
> hopefully explain *why* it has its semantics.
>
> ## A Quick Introduction to Poison
>
> Let's start with a concrete motivating example in C:
> ```
> int isSumGreater(int a, int b) {
>   return a + b > a;
> }
> ```
>
> The C specification permits us to optimize the comparison in 
> `isSumGreater` to `b > 0` because signed overflow results in undefined 
> behavior.  A reasonable translation of `isSumGreater` to LLVM IR could be:
>
> ```
> define i32 @isSumGreater(i32 %a, i32 %b) {
> entry:
>   %add = add i32 %a, %b
>   %cmp = icmp sgt i32 %add, %a
>   %conv = zext i1 %cmp to i32
>   ret i32 %conv
> }
> ```
>
> However, LLVM cannot determine that `%cmp` should not consider cases 
> where `%add` resulted in signed overflow.  We need a way to 
> communicate this information to LLVM.
>
> This is where the `nsw` and `nuw` flags come into play.  `nsw` is 
> short for "no signed wrap", `nuw` is short for "no unsigned
wrap".
>
> With these, we can come up with a new formulation of `%add`: `add i32 
> nsw %a, %b`.
> LLVM can take this into account when it is optimizing the `%cmp` and 
> replace it with: `icmp sgt i32 %b, 0`.
>
> ## Differences Between LLVM and C/C++
>
> There are some interesting differences between what C++ and C specify 
> and how LLVM behaves with respect to performing an operationg which is 
> not permitted to overflow.
>
> Perhaps chief among them is that evaluating an expression in C++ or C 
> which results performs an overflow is undefined behavior. In LLVM, 
> executing an instruction which is marked `nsw` but which violates 
> signed overflow results in poison. Values which have no relationship 
> with poisoned values are not effected by them.
>
> Let us take the following C program into consideration:
> ```
> int calculateImportantResult(int a, int b) {
>   int result = 0;
>   if (a) {
>     result = a + b;
>   }
>   return result;
> }
> ```
>
> A straightforward lowering to LLVM IR could be:
> ```
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
> entry:
>   %tobool = icmp ne i32 %a, 0
>   br i1 %tobool, label %if.then, label %if.end
>
> if.then:
>   %add = add nsw i32 %a, %b
>   br label %if.end
>
> if.end:
>   %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
>   ret i32 %result
> }
> ```
>
> Moving `%add` to the `%entry` block would be preferable and would 
> allow further optimizations:
> ```
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
> entry:
>   %tobool = icmp ne i32 %a, 0
>   %add = add nsw i32 %a, %b
>   %result = select i1 %tobool, i32 0, i32 %add
>   ret i32 %result
> }
> ```
>
> In the original code, the calculation of `%add` was control dependent.
> Now, `%add` might result in signed overflow in violation of the `nsw` 
> flag.
> Despite this, the program should behave as it did before because the 
> poisoned value is masked-out by the select. The next section will dive 
> into this in greater detail.
>
> # Computation Involving Poison Values
> Poison in a computation results in poison if the result cannot be 
> constrained by its non-poison operands.
>
> Examples of this rule which will result in poison:
> ```
>   %add = add i32 %x, %always_poison
>   %sub = sub i32 %x, %always_poison
>   %xor = xor i32 %x, %always_poison
>   %mul = mul i32 %always_poison, 1
> ```
>
> Examples of this rule which do not result in poison:
> ```
>   %or  = or  i32 %always_poison, 2
>   %and = and i32 %always_poison, 2
>   %mul = mul i32 %always_poison, 0
> ```
>
> In fact, it would be reasonable to optimize `%or` to `2` and `%and` to 
> `0`.  In this respect, poison is not different from `undef`.
>
> The following example is only poison if `%cond` is false:
> ```
>   %sel = select i1 %cond, i32 2, %always_poison
> ```
>
> ### Is it safe to have poison as a `call` argument?
>
> A `call` instruction may or may not result in poison depending on 
> exactly how the callee  uses the supplied arguments, it is not 
> necessarily the case that `call i32 @someFunction(i32 %always_poison)` 
> results in poison.
>
> LLVM cannot forbid poison from entering `call` arguments without 
> prohibiting an optimization pass from outlining code.
>
> ### Is it safe to store poison to memory?
>
> `store i32 %always_poison, i32* %mem` does not result in undefined 
> behavior. A subsequent load instruction like `%load = load i32* %mem` 
> will result in `%load` being a poison value.
>
> ### Is it safe to load or store a poison memory location?
>
> No.  Poison works just like `undef` in this respect.
>
> ### Does comparing a poison value result in poison?
>
> It depends.  If the comparison couldn't solely be determined by 
> looking at the other operand, the result is poison.
>
> For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not 
> poison.
> However, `icmp i32 ne %always_poison, 7` is poison.
>
> ### What if the condition operand in a `select` is poison?
>
> In the example `%sel = select i1 %always_poison, i1 true, false`, 
> `%sel` is either `true` or `false`.  Because, `%sel` depends on 
> `%always_poison` it too is poison.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150203/8ee1c2dc/attachment.html>

John Regehr

2015-Feb-04 17:09 UTC

head link

[LLVMdev] Proposal for Poison Semantics

Hi all, sorry to come late to the party. This is a great thread. I wanted 
to toss out an example I ran across the other day that seems somewhat 
interesting. We start with this C program that executes no UB:

int foo(int x1, int x2) {
   return
     ((x2 > 0) && (x1 > (2147483647 - x2))) ||
     (x1 && (x2 < 0));
}

int main(void) {
   printf("%x\n", foo(1, -770027279));
   return 0;
}

Clang+LLVM turns foo() into this:

define i32 @foo(i32 %x1, i32 %x2) #0 {
   %1 = icmp sgt i32 %x2, 0
   %2 = sub nsw i32 2147483647, %x2
   %3 = icmp slt i32 %2, %x1
   %or.cond = and i1 %1, %3
   br i1 %or.cond, label %6, label %4

; <label>:4                                       ; preds = %0
   %5 = icmp slt i32 %x2, 0
   %not. = icmp ne i32 %x1, 0
   %. = and i1 %5, %not.
   br label %6

; <label>:6                                       ; preds = %4, %0
   %7 = phi i1 [ true, %0 ], [ %., %4 ]
   %8 = zext i1 %7 to i32
   ret i32 %8
}

In the semantics described by the LangRef, my interpretation of the
execution is this:

%1 = false
%2 = poison
%3 = poison
%or.cond = poison
... the br goes either way or both ways or something ...
%5 = true
%not. = true
%. = true
%7 = phi true, true

So all is good despite the non-deterministic execution.

In the proposed new model, we don't have to branch on poison since
%or.cond is completely determined by its false argument and is therefore
false. Does that all sound right?

John

Hal Finkel

2015-Feb-07 13:57 UTC

head link

[LLVMdev] RFC: Proposal for Poison Semantics

----- Original Message -----> From: "Philip Reames" <listmail at philipreames.com>
> To: "David Majnemer" <david.majnemer at gmail.com>, llvmdev
at cs.uiuc.edu
> Cc: "Nuno Lopes" <nuno.lopes at ist.utl.pt>, bruce at
hoult.org, "John Regehr" <regehr at cs.utah.edu>
> Sent: Tuesday, February 3, 2015 1:10:29 PM
> Subject: Re: [LLVMdev] RFC: Proposal for Poison Semantics
> 
> (Responding to original post since this is somewhat orthogonal to the
> discussion to date.)
> 
> After reading through the thread to date, I'm coming to conclusion
> that finding a single semantics for poison that both allows
> speculation and optimization of a nsw optimization under the
> assumption that overflow "never happens" is *hard*. Thinking
about
> it a bit, do we actually need to solve that problem?
> 
> An alternate approach would be to split "nsw" into two pieces:
"nsw
> unreachable" and "nsw undef". A frontend for C or C++ would
produce
> "nsw unreachable".
Having thought about this for a few days, I'd like to add that I like this
idea. We clearly seem to want both the strong (unreachable) and weak (undef)
semantics in different situations, the ability to say which is which, and the
ability to degrade the stronger semantics to the weaker ones for speculation.
Fundamentally, this degradation is lossy, and we need to add some extra
information to the IR in order to track it.

 -Hal
> 
> "nsw unreachable" allows the compiler to assume that the overflow
> path can't happen. This enables all of the zext, comparison folding,
> and other desirable optimizations we've mentioned.
> 
> "nsw undef" only allows to the compiler to assume that overflow
> produces an undefined value. In particular, the value produced is
> not "poison"; it is simply undef.
> 
> When the compiler speculates an "nsw unreachable" instruction, it
> would replace it with an "nsw undef". However, if the compiler
can
> *prove* that the original instruction would execute on all paths
> without a constraint which prevents overflow, it can preserve "nsw
> unreachable". LICM today has logic that is close to this in the form
> of isGuaranteedToExecute. (i.e. an add with loop invariant operands
> overflows in the preheader exactly when it does in the loop header)
> 
> This approach would suffer from a tension between the profitability
> of speculation and the potential lost optimizations by going from
> "nsw unreachable" to "nsw undef". In practice, I
don't have a strong
> sense for how painful this would be. Does anyone have a good sense
> for how aggressive we are, in practice, about exploiting nsw?
> 
> My intuition is that we already work hard to handle the "guaranteed
> to execute" case due to needing to avoid newly faulting loads. I
> suspect that we do (or at least can and should) be surprisingly
> effective about recognizing the cases where we can safely speculate
> the full "nsw unreachable" semantic.
> 
> (An alternate way to phrase the above would be to say that a) add
> "nsw unreachable" faults on overflow and b) we can't
introduce
> faults into a well defined program. This is very very similar to how
> we reason about loads and dereferencability.)
> 
> Philip
> 
> p.s. It would also be interesting to weigh the profitability of
> speculating a guarded/predicated form of an "add nsw
unreachable".
> But we should probably do predicated loads and stores before we
> consider predication for simple arithmetic.
> 
> On 01/27/2015 06:50 PM, David Majnemer wrote:
> 
> 
> 
> Hello,
> 
> 
> What follows is my attempt to describe how poison works. Let me know
> what you think.
> 
> 
> --
> David
> 
> 
> 
> 
> 
> # LLVM Poison Semantics
> 
> 
> Poison is an LLVM concept which exists solely to enable further
> optimization of LLVM IR. The exact behavior of poison has been, to
> say the least, confusing for users, researchers and engineers
> working with LLVM.
> 
> 
> This document hopes to clear up some of the confusion of poison and
> hopefully explain *why* it has its semantics.
> 
> 
> ## A Quick Introduction to Poison
> 
> 
> Let's start with a concrete motivating example in C:
> ```
> int isSumGreater(int a, int b) {
> return a + b > a;
> }
> ```
> 
> 
> The C specification permits us to optimize the comparison in
> `isSumGreater` to `b > 0` because signed overflow results in
> undefined behavior. A reasonable translation of `isSumGreater` to
> LLVM IR could be:
> 
> 
> ```
> define i32 @isSumGreater(i32 %a, i32 %b) {
> entry:
> %add = add i32 %a, %b
> %cmp = icmp sgt i32 %add, %a
> %conv = zext i1 %cmp to i32
> ret i32 %conv
> }
> ```
> 
> 
> However, LLVM cannot determine that `%cmp` should not consider cases
> where `%add` resulted in signed overflow. We need a way to
> communicate this information to LLVM.
> 
> 
> This is where the `nsw` and `nuw` flags come into play. `nsw` is
> short for "no signed wrap", `nuw` is short for "no unsigned
wrap".
> 
> 
> With these, we can come up with a new formulation of `%add`: `add i32
> nsw %a, %b`.
> LLVM can take this into account when it is optimizing the `%cmp` and
> replace it with: `icmp sgt i32 %b, 0`.
> 
> 
> ## Differences Between LLVM and C/C++
> 
> 
> There are some interesting differences between what C++ and C specify
> and how LLVM behaves with respect to performing an operationg which
> is not permitted to overflow.
> 
> 
> Perhaps chief among them is that evaluating an expression in C++ or C
> which results performs an overflow is undefined behavior. In LLVM,
> executing an instruction which is marked `nsw` but which violates
> signed overflow results in poison. Values which have no relationship
> with poisoned values are not effected by them.
> 
> 
> Let us take the following C program into consideration:
> ```
> int calculateImportantResult(int a, int b) {
> int result = 0;
> if (a) {
> result = a + b;
> }
> return result;
> }
> ```
> 
> 
> A straightforward lowering to LLVM IR could be:
> ```
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
> entry:
> %tobool = icmp ne i32 %a, 0
> br i1 %tobool, label %if.then, label %if.end
> 
> 
> if.then:
> %add = add nsw i32 %a, %b
> br label %if.end
> 
> 
> if.end:
> %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
> ret i32 %result
> }
> ```
> 
> 
> Moving `%add` to the `%entry` block would be preferable and would
> allow further optimizations:
> ```
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
> entry:
> %tobool = icmp ne i32 %a, 0
> %add = add nsw i32 %a, %b
> %result = select i1 %tobool, i32 0, i32 %add
> ret i32 %result
> }
> ```
> 
> 
> In the original code, the calculation of `%add` was control
> dependent.
> Now, `%add` might result in signed overflow in violation of the `nsw`
> flag.
> Despite this, the program should behave as it did before because the
> poisoned value is masked-out by the select. The next section will
> dive into this in greater detail.
> 
> 
> # Computation Involving Poison Values
> Poison in a computation results in poison if the result cannot be
> constrained by its non-poison operands.
> 
> 
> Examples of this rule which will result in poison:
> ```
> %add = add i32 %x, %always_poison
> %sub = sub i32 %x, %always_poison
> %xor = xor i32 %x, %always_poison
> %mul = mul i32 %always_poison, 1
> ```
> 
> 
> Examples of this rule which do not result in poison:
> ```
> %or = or i32 %always_poison, 2
> %and = and i32 %always_poison, 2
> %mul = mul i32 %always_poison, 0
> ```
> 
> 
> In fact, it would be reasonable to optimize `%or` to `2` and `%and`
> to `0`. In this respect, poison is not different from `undef`.
> 
> 
> The following example is only poison if `%cond` is false:
> ```
> %sel = select i1 %cond, i32 2, %always_poison
> ```
> 
> 
> ### Is it safe to have poison as a `call` argument?
> 
> 
> A `call` instruction may or may not result in poison depending on
> exactly how the callee uses the supplied arguments, it is not
> necessarily the case that `call i32 @someFunction(i32
> %always_poison)` results in poison.
> 
> 
> LLVM cannot forbid poison from entering `call` arguments without
> prohibiting an optimization pass from outlining code.
> 
> 
> ### Is it safe to store poison to memory?
> 
> 
> `store i32 %always_poison, i32* %mem` does not result in undefined
> behavior. A subsequent load instruction like `%load = load i32*
> %mem` will result in `%load` being a poison value.
> 
> 
> ### Is it safe to load or store a poison memory location?
> 
> 
> No. Poison works just like `undef` in this respect.
> 
> 
> ### Does comparing a poison value result in poison?
> 
> 
> It depends. If the comparison couldn't solely be determined by
> looking at the other operand, the result is poison.
> 
> 
> For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
> poison.
> However, `icmp i32 ne %always_poison, 7` is poison.
> 
> 
> ### What if the condition operand in a `select` is poison?
> 
> 
> In the example `%sel = select i1 %always_poison, i1 true, false`,
> `%sel` is either `true` or `false`. Because, `%sel` depends on
> `%always_poison` it too is poison.
> 
> _______________________________________________
> LLVM Developers mailing list LLVMdev at cs.uiuc.edu
> http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

llvm dev - Feb 2015 - [LLVMdev] Proposal for Poison Semantics

[LLVMdev] RFC: Proposal for Poison Semantics

[LLVMdev] RFC: Proposal for Poison Semantics

[LLVMdev] RFC: Proposal for Poison Semantics

[LLVMdev] RFC: Proposal for Poison Semantics

[LLVMdev] RFC: Proposal for Poison Semantics

[LLVMdev] RFC: Proposal for Poison Semantics

[LLVMdev] RFC: Proposal for Poison Semantics

[LLVMdev] RFC: Proposal for Poison Semantics

[LLVMdev] RFC: Proposal for Poison Semantics

[LLVMdev] RFC: Proposal for Poison Semantics

[LLVMdev] RFC: Proposal for Poison Semantics

[LLVMdev] Proposal for Poison Semantics

[LLVMdev] Proposal for Poison Semantics

[LLVMdev] RFC: Proposal for Poison Semantics

[LLVMdev] Proposal for Poison Semantics

[LLVMdev] RFC: Proposal for Poison Semantics