thr3ads.net - llvm dev - [LLVMdev] poison and select [Sep 2014]

If this information is useful, please help other people find it:
Share via:

John Regehr

2014-Sep-19 02:45 UTC

[LLVMdev] poison and select

Today I ran into another aspect of the poison problem...

Basically, SimplifyCFG wants to take

   expr1 && expr2

and flatten it into

   x = expr1
   y = expr2
   x&y

This isn't safe when expr2 might execute UB.  The consequence is that no 
LLVM shift instruction is safe to speculatively execute, nor is any 
nsw/nuw/exact variant, unless the operands can be proven to be in bounds.

Real example here:

   http://llvm.org/bugs/show_bug.cgi?id=20997

John

David Majnemer

2014-Sep-19 07:16 UTC

head link

[LLVMdev] poison and select

I believe Dan struck at the heart of the issue and the consequences have
just become apparent (at least to me).

Consider:
define i32 @f(i32 %a) {
  %cmp = icmp sgt i32 %a, 0
  %sub = sub nsw i32 2147483647, %a
  %sel = select i1 %cmp, i32 %sub, i32 0
  ret i32 %sel
}

If %a is -1, %sub will wrap and be poison. %cmp will be false.
Is %sel poison or not?  We have no definition for what select actually does.

If we assume select boils down to nothing but arithmetic, then %sel may be
poisoned by *any* of its operands.

If we assume select boils down to control flow (branch + PHI), then we are
violating the rules by which it operates in many places in LLVM.
Consider the following input:
%add = add nsw i32 %a, 1
%cmp1 = icmp eq i32 %a, 0
%cmp2 = icmp slt i32 %add, 0
%sel = select i1 %cmp1, i1 %cmp2, i1 false

We currently optimize this to:
%add = add nsw i32 %a, 1
%cmp1 = icmp eq i32 %a, 0
%cmp2 = icmp slt i32 %add, 0
%and = and i1 %cmp1, %cmp2

If %a is 2147483647: %add is poison, %cmp1 is false, %cmp2 is poison, %sel
is false but %and is poison!

The above transform is not an oddity, InstCombine does this sort of thing
*a lot*.

If select is arithmetic, many transforms are "correct" but we can form
select far less often than we thought we could.
If select is control flow, many transforms are "wrong" but we might be
able
to form more selects.

What is select? How does it work? How should it work?

On Thu, Sep 18, 2014 at 7:45 PM, John Regehr <regehr at cs.utah.edu>
wrote:
> Today I ran into another aspect of the poison problem...
>
> Basically, SimplifyCFG wants to take
>
>   expr1 && expr2
>
> and flatten it into
>
>   x = expr1
>   y = expr2
>   x&y
>
> This isn't safe when expr2 might execute UB.  The consequence is that
no
> LLVM shift instruction is safe to speculatively execute, nor is any
> nsw/nuw/exact variant, unless the operands can be proven to be in bounds.
>
> Real example here:
>
>   http://llvm.org/bugs/show_bug.cgi?id=20997
>
>
> John
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140919/a18536ca/attachment.html>

Sanjoy Das

2014-Nov-14 22:12 UTC

head link

[LLVMdev] poison and select

Hi all,

I was initially going to send this email to a thread on llvm-commits
where David explained the issue with poison and select to me, but then
some quick googling led me to this thread.

I've been thinking about a possible semantics for poison values in a
certain way:

The key semantic difference between undef and poison, as I understand
it, is that when justifying an execution trace the compiler needs to
commit to a legal value for every undef flowing through the program --
a N bit undef can be any N bit value, but it has to be *some* N bit
value.  On the other hand, the compiler does not need to commit to a N
bit poison value being *any* legal N bit value -- it may do
optimizations (eg. (t < (t + 1)) ==> true) that cannot be justified by
any assignment of legal values to the poison flowing through the
program.  Since there may not be legal assignments to poison values
that justify a certain execution trace, we have to prohibit any action
that leads to "observing" poison values.  So we say
"observing" poison
values is UB and "cannot happen".  Now the question is "what is
observation?".

One way to formalize the above: make any side-effecting operation
classically data-dependent on a poison value undefined behavior.  I
think this is what the spec tries to do; but the current approach has
issues as pointed out in this thread, especially in relation to
selects.

Since UB is a dynamic property, I think we should make this notion of
dependence dynamic too: a side effecting operation has UB if any of
its operands (including values it is control dependent on via
branches) is `dynamic-dependent` on a poison value (non-side effecting
operations like add and lshr never have UB).

A value X is *not* `dynamic-dependent` on value Y (not necessarily an
operand of X) if for all values of Y, X computes the same value.  So
(X - X) is not `dynamic-dependent` on X, 42 is not dynamic-dependent
on 43.  Otherwise X is `dynamic-dependent` on Y.  The idea here is
that if an expression computes the same value for all possible values
of one of its inputs (transitively), the value computed by the
expression cannot be used to "observe" the value of the said operand;
even if there is a static dependence chain.


It does not make a whole lot of sense to talk about dynamic dependence
for instructions like "load" which are not pure computation [1], and
these should be classified under "side effecting operations".  We will
have to carefully specify the `dynamic-dependent` relation for this
scheme to work (if it does work!), but I'm just trying to get the
general across right now.  For reg2mem to be safe, for instance, we
need to special-case stores to and loads from non-escaping allocas
(see summary below).


Now, the interesting thing about this dynamic dependence thing is that
it is a runtime, exact, property of a program.  So "select %c, %a, %b"
is dynamic-dependent on "%a" only in executions where "%c"
is true.
In the example in previous email, for example,

  %add = add nsw i32 %a, 1
  %cmp1 = icmp eq i32 %a, 0
  %cmp2 = icmp slt i32 %add, 0
  %and = and i1 %cmp1, %cmp2

%and is *not* `dynamic-dependent` on %cmp2 in the runs in which %cmp1
is false.

Another more complex example, involving conversion of br-PHI to a
select:

  %a = maybe poison
  br i1 %a, label %left, label %right
 left:
  %l = 42
  br label %merge
 right:
  %r = < unknown load > can be speculated
  br label %merge
 merge:
  %m = phi %l %r
  store %m to heap

If %c is poison, then the store has UB iff %r != %l (unless it is
meaningful to speculatively talk about %r's value, we can't form a
select anyway).  In that case, it is okay to transform the br-phi to a
select.  If %c is poison, and %r == %l, then the store isn't UB (since
the phi does not dynamically depend on %c).  In that case, the select
we form, "select %c, %l, %r" doesn't observe %c either and so
storing
it is not UB.  That said, I admit this is counter-intuitive -- whether
a program is undefined or not is a function of state of the system
that the program may not even observe in the undefined run.  If
this is a fundamental flaw in this scheme or not I'm not sure.


* Summary / TL;DR *

The compiler may justify executions using assignments to
poison values that don't corresponding to any legal value.  For this
reason, escaping any *information* based off a poison value is UB --
such information is unsound, in some sense.  This notion of
"information dependence" is more precise than classic data dependence
(e.g. (xor A A) does not give us any information about A).  The notion
of escaping is also more precise than "side-effecting operation" --
storing to a non-escaping alloca does not escape the stored value, for
example.

Does this make any sense at all?  Have there been past attempts at
formalizing poison along these lines?

  [1]: in this scheme load cannot result in poison because the store
that stored poison is UB and "could not have happened".

Thank you for your time!
-- Sanjoy

llvm dev - Sep 2014 - [LLVMdev] poison and select

[LLVMdev] poison and select

[LLVMdev] poison and select

[LLVMdev] poison and select