thr3ads.net - llvm dev - [llvm-dev] RFC: Killing undef and spreading poison [Oct 2016]

If this information is useful, please help other people find it:
Share via:

Nuno Lopes via llvm-dev

2016-Oct-18 12:06 UTC

[llvm-dev] RFC: Killing undef and spreading poison

Hi,

Over the past few years we've been trying to kill poison somehow. There have
been a few proposals, but they've all failed to pass the bar and/or to
gather significant support (my own proposals included).
We (David, Gil, John, Juneyoung, Sanjoy, Youngju, Yoonseung, and myself)
have a new proposal to kill undef instead and replace it with poison + a new
'freeze' instruction.  We believe this proposal simplifies things at the
IR
level, allows us to fix long-standing bugs in LLVM, and is roughly
performance-neutral for now (and can enable further optimizations in the
future).

Sorry for the longish email.  We will give a talk about this proposal at the
LLVM dev meeting in November, so hopefully we'll be able to discuss this
further in person.

We would appreciate any comments, feedback, suggestions, etc.  If you have
questions let me know as well (not everything below is super detailed, so
please ask where things are not explicit enough).
(I've CC'ed a few people that have been involved in discussions re
semantics
of LLVM IR in the past year or so.  Apologies in advance if I forgot someone
-- which probably I did.  I've also CC'ed some CodeGen folks, from which
we
would appreciate input on how this proposal fits within the current and the
future pipeline)

Thanks,
Nuno


---------------------------------------------------------


Motivation for undef & poison
============================There were a few motivations behind the creation of
undef and poison:
 1) There was a desire to exploit certain undefined behaviors without
hurting optimization opportunities.  This led to the creation of weaker
forms of undefined behavior (UB), while leaving "full" UB to
operations that
can trap the CPU, like division by zero.  Weaker UB operations include, for
example, signed integer overflow, say 'a + b'.  If signed overflow was
full
UB, then we couldn't speculatively execute these operations without proving
that they could not overflow, otherwise the compiler would be introducing UB
in a perfectly fine program. E.g., the compiler couldn't hoist signed
additions out of loops.  In weaker forms of UB, the triggering of full UB is
delayed so that the compiler can speculatively execute these operations and
there's only full UB if the result is somehow "consumed".

 2) Undef was created to represent uninitialized memory reads. For example,
it's very useful in conjunction with PHI nodes:
int a;
if (foo)
  a_0 = 42;
a_1 = phi(a_0, undef)

By using undef, we don't have to materialize some arbitrary constant on a
branch where a variable is not initialized.

3) And then people realized that undef wasn't enough.  For example, we'd
also like to be able to optimize expressions like "((a + 1) >s a)"
to
"true", which isn't possible with "undef" as defined in
(1), since there's
no value for `%x` that makes `%x > INT_MAX` true.  This was important for
loop analyses and certain InstCombine patterns, and so we needed a stronger
version of UB which was still not full UB. And poison was born. For example,
on 64-bit platforms, poison is very handy for widening induction variables
to 64 bits.



Problems with undef & poison
===========================The interactions between undef and poison are
particularly complicated and
they inhibit innocent-looking optimizations.  For example, the following is
wrong:
%v = select %c, %x, undef
  =>
%v = %x

If %x is poison and %c = false, then we've just replaced undef with poison
which is bad, since poison is stronger than undef.  So, this is an example
of something we want to do but we can't at the moment (even if we still do
it anyway, risking miscompilations!)



Goal
===Our goal was to propose minimal changes to the IR semantics such that 1) we
could keep most of the optimizations we do today, 2) would require minimal
code changes, 3) compilation time, run time, and memory consumption would
stay roughly the same, and 4) we could fix all the longstanding bugs due to
undef/poison semantics and interactions.
We believe this proposal fulfills the goals, and can even enable future
optimizations.



Proposal
======= 1) Kill undef
 2) Create a new poison value (representable in IR, inheriting from
llvm::Constant)
 3) Create a new instruction, '%y = freeze %x', that stops propagation
of
poison.  If the input is poison, then it returns an arbitrary, but fixed,
value. (like old undef, but each use gets the *same* value), otherwise it
just returns its input value.
 4) All operations over poison return poison.
For example:
and %x, poison   ->  poison  ; just like before
and 0, poison    ->  poison  ; just like before

%y = freeze poison
%z = and %y, 1  --- 000..0x   ; just like old undef
%w = xor %y, %y                    ; 0 -- not undef: both uses of %y have
the same value

Instruction-specific semantics:
 - br poison -> UB
 - select poison, %a, %b -> poison   (some InstCombine patterns will need
freeze, but they are wrong right now already!)
 - if %c is not poison, select %c, %a, %b is poison  if %c = 1 and %a is
poison, or %c = 0 and %b is poison  (see discussion below)
 - bitcast between types of different bitwidth can return poison if when
concatenating adjacent values one of these values is poison. For example:
%x = bitcast <3 x i2>  <2, poison, 2> to <2 x i3>
  ->
%x = <poison, poison>

%x = bitcast <6 x i2>  <2, poison, 2, 2, 2, 2> to <4 x i3>
  ->
%x = <poison, poison, 5, 2>



Purpose of Freeze
================Poison is propagated aggressively throughout. However, there are
cases where
this breaks certain optimizations, and therefore freeze is used to
selectively stop poison from being propagated.

A use of freeze is to enable speculative execution.  For example, loop
switching does the following transformation:
while (C) {
  if (C2) {
   A
  } else {
   B
  }
}
=>
if (C2) {
   while (C)
      A
} else {
    while (C)
       B
}

Here we are evaluating C2 before C.  If the original loop never executed
then we had never evaluated C2, while now we do.  So we need to make sure
there's no UB for branching on C2.  Freeze ensures that so we would actually
have 'if (freeze(C2))' instead.
Note that having branch on poison not trigger UB has its own problems.  We
believe this is a good tradeoff.


Another use is, for example, to implement bit-fields:
a.x = 2
becomes:
%v = load %a
%v2 = freeze %v  ; %v could be uninitialized data (poison)
%v3 = ... bitmasking...
store %a, %v3



Performance
==========We measured compile time, running time, memory consumption, and
object/IR
size of a few benchmarks (-O3 vs -O3 w/ freeze).  They should give a rough
picture of the overhead involved.
Benchmarks were run on 2 machines (server1 and server2), both x86_64 running
Ubuntu 16.04.

Summary:
 - Memory consumption: there's generally a 1% increase, with the max of 8%
in oggenc
 - Running time on SPEC: mostly in the noise (about 1% up or down)
 - Compile time: mostly in the noise (about 1% up or down, but usually no
difference)
 - LNT shows 0.5% average slowdown, but with wider swings both ways. We are
aware of a few things that would need tweaking to recover perf like loop
unrolling (likely because of SCEV not knowing what freeze is).

You can see the raw data we have collected in the links below:

Memory consumption:
https://docs.google.com/spreadsheets/d/1ycJaMPLzh_4YV7RQmVaLR-vHzd7ZlYiV2iES
G-lBRbk/edit#gid=0
SPEC running time:
https://docs.google.com/spreadsheets/d/1tAwj-Q5raI4rYD7EEg-tJd-Ex533fy9DezIV
rbfeIns/edit#gid=0
Compilation time:
https://docs.google.com/spreadsheets/d/1_xj6o_ANGcR8xD5Y9rN5VjWbsJaS-GeYKzeW
WAR9O2c/edit#gid=0
Statistics on number of Freeze Instructions (up to 5% in total in gcc):
https://docs.google.com/spreadsheets/d/1mbOpduooEetIR5i9Db07GHbu72a6mLUmkRIF
1SmZ-6Q/edit#gid=0
LNT raw data:
https://github.com/aqjune/freezescript/tree/master/resultcsv-mailinglist/LNT

Server1: Intel Core i7 CPU 870 @ 2.93GHz, 8 GBs RAM
Server2: Intel Core i5-6600 CPU @ 3.30GHz, 8 GBs RAM



Implementation
=============A prototype is available at:
 - https://github.com/snu-sf/llvm-freeze/tree/x86jmp
 - https://github.com/snu-sf/clang-freeze

This implementation includes:
 - Make clang emit freeze for bit-field codegen
 - Add freeze instruction to IR, plus a few instcombine patterns to simply
freeze(freeze(x)), for example
 - Fix a few instcombine select patterns to introduce freeze when needed
(these were wrong with current semantics already)
 - Add a freeze node to selection DAG and add codegen support.  For codegen,
it uses CopyFromReg+CopyToReg to lower the freeze node, and so it assumes
that from that point on LLVM will not propagate undef anymore (to ensure
that 'freeze undef' always returns the same value for all uses, which is
not
what happens with 'undef' on its own). This needs further discussion.
 - Fix loop unswitch to freeze condition
 - Change GVN's load widening to load as vector of bits+bitcast (gvload
branch only; this was not included in the tests above)

(this implementation still uses undef value, but that should be replaced
with poison)



Deployment
=========A proposal for incremental deployment of the proposed changed:
 1) Add freeze instruction + CodeGen support
 2) Change clang to start emitting freeze for bit-field reads
 3) Add auto-upgrade to convert undef to 'freeze poison' (undef and
'freeze
poison' with one use are equivalent)
 4) Fix InstCombine, Loop unswitching, etc to use freeze
 5) Replace references to undef in the code with either poison or 'freeze
poison'
 7) Kill undef
 8) Investigate any perf regressions
 9) Run John's LLVM IR fuzzer with Alive to find leftover bugs

Regarding perf, at least PR30615 needs fixing to enable efficient load
widening at IR level (which seems it's still undecided whether GVN will
continue doing it or not).



Discussion on select
===================Select is a tricky instruction to define semantics for. 
There are multiple
views of select's purpose that are contradictory:
 1) Select should be equivalent to arithmetic (e.g., allow  'select %c,
true, %x' -> 'or %c, %x' and arithmetic -> select)
 2) br + phi -> select should be allowed (simplifyCFG)
 3) Select -> br + phi should be allowed

To have 1), we need to make select poison if either of its arguments is
poison. This disallows 2), since there we need to have select being poison
only if the dynamically-chosen value is poison.
2) and 3) are orthogonal and do not conflict, though.

3) is hard because it requires select to be stronger than branching
(UB-wise), meaning that we would need select to be UB if the condition was
poison. This blocks a bunch of instcombine patterns, like:
Pre: C < 0
%r = udiv %a, C
  =>
%c = icmp ult %a, C
%r = select %c, 0, 1

If %a is poison, then we would be introducing UB. Adding freeze(%c) would
fix the problem, though.


Since we really want to have 2) since that's simplifyCFG, we propose to make
'select %c, %a, %b' poison if any of the following holds:
 - %c is poison
 - %c = true and %a is poison
 - %c = false and %b is poison

This disallows 3) and some transformations of 1). Since 3) is only performed
in CodeGenPrepare, we can safely ignore it (no aggressive optimizations are
run afterwards). For 1), we will have to restrict InstCombine patterns for
the cases when we are sure a given variable is non-poisonous (which is only
when it came from a 'freeze' instruction or it's the result of a
non-poison-producing instruction).
This semantics also allows arithmetic -> select, but not select ->
arithmetic (without use of freeze).



Acknowledgements
===============I would personally like to thank David, John, and Sanjoy for
embarking on
this trip over a year ago and spending a lot of time on this project; Gil
and his students (Juneyoung, Yoonseung, and Youngju) for prototyping and
testing several ideas (and fixing them!); and everybody else that has
contributed in the form of off-line and on-line discussions.



Future work
==========This proposal only handles integers.  It's a good step forward,
but we are
still missing pointers and floats; we are aware of problems (err,
possibilities for improvement) there.  We will work on these next. They are
orthogonal problems, so what we are proposing today won't require further
changes (hopefully).

Hal Finkel via llvm-dev

2016-Oct-18 13:29 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

----- Original Message -----> From: "Nuno Lopes" <nuno.lopes at ist.utl.pt>
> To: llvm-dev at lists.llvm.org
> Cc: "Sanjoy Das" <sanjoy at playingwithpointers.com>,
"David Majnemer" <david.majnemer at gmail.com>, "John
Regehr"
> <regehr at cs.utah.edu>, "Chung-Kil Hur" <gil.hur at
sf.snu.ac.kr>, "Juneyoung Lee" <juneyoung.lee at
sf.snu.ac.kr>,
> "Yoonseung Kim" <yoonseung.kim at sf.snu.ac.kr>,
"Youngju Song" <youngju.song at sf.snu.ac.kr>, hfinkel at
anl.gov, "Dan
> Gohman" <dan433584 at gmail.com>, nlewycky at google.com, mbraun
at apple.com, qcolombet at apple.com, "t p northover"
> <t.p.northover at gmail.com>
> Sent: Tuesday, October 18, 2016 7:06:31 AM
> Subject: RFC: Killing undef and spreading poison
> 
> Hi,
> 
> Over the past few years we've been trying to kill poison somehow.
> There have
> been a few proposals, but they've all failed to pass the bar and/or
> to
> gather significant support (my own proposals included).
> We (David, Gil, John, Juneyoung, Sanjoy, Youngju, Yoonseung, and
> myself)
> have a new proposal to kill undef instead and replace it with poison
> + a new
> 'freeze' instruction.  We believe this proposal simplifies things
at
> the IR
> level, allows us to fix long-standing bugs in LLVM, and is roughly
> performance-neutral for now (and can enable further optimizations in
> the
> future).
> 
> Sorry for the longish email.  We will give a talk about this proposal
> at the
> LLVM dev meeting in November, so hopefully we'll be able to discuss
> this
> further in person.
> 
> We would appreciate any comments, feedback, suggestions, etc.  If you
> have
> questions let me know as well (not everything below is super
> detailed, so
> please ask where things are not explicit enough).
> (I've CC'ed a few people that have been involved in discussions re
> semantics
> of LLVM IR in the past year or so.  Apologies in advance if I forgot
> someone
> -- which probably I did.  I've also CC'ed some CodeGen folks, from
> which we
> would appreciate input on how this proposal fits within the current
> and the
> future pipeline)
> 
Thanks for working on this! A couple of questions/comments below:

...
> 
> Purpose of Freeze
> ================> Poison is propagated aggressively throughout. However,
there are
> cases where
> this breaks certain optimizations, and therefore freeze is used to
> selectively stop poison from being propagated.
> 
> A use of freeze is to enable speculative execution.  For example,
> loop
> switching does the following transformation:
> while (C) {
>   if (C2) {
>    A
>   } else {
>    B
>   }
> }
> =>
> if (C2) {
>    while (C)
>       A
> } else {
>     while (C)
>        B
> }
> 
> Here we are evaluating C2 before C.  If the original loop never
> executed
> then we had never evaluated C2, while now we do.  So we need to make
> sure
> there's no UB for branching on C2.  Freeze ensures that so we would
> actually
> have 'if (freeze(C2))' instead.
> Note that having branch on poison not trigger UB has its own
> problems.
Can you please elaborate on these problems?
>  We
> believe this is a good tradeoff.
> 
> 
...
> 
> Discussion on select
> ===================> Select is a tricky instruction to define semantics
for.  There are
> multiple
> views of select's purpose that are contradictory:
>  1) Select should be equivalent to arithmetic (e.g., allow  'select
>  %c,
> true, %x' -> 'or %c, %x' and arithmetic -> select)
>  2) br + phi -> select should be allowed (simplifyCFG)
>  3) Select -> br + phi should be allowed
> 
> To have 1), we need to make select poison if either of its arguments
> is
> poison. This disallows 2), since there we need to have select being
> poison
> only if the dynamically-chosen value is poison.
> 2) and 3) are orthogonal and do not conflict, though.
> 
> 3) is hard because it requires select to be stronger than branching
> (UB-wise), meaning that we would need select to be UB if the
> condition was
> poison. This blocks a bunch of instcombine patterns, like:
> Pre: C < 0
> %r = udiv %a, C
>   =>
> %c = icmp ult %a, C
> %r = select %c, 0, 1
> 
> If %a is poison, then we would be introducing UB. Adding freeze(%c)
> would
> fix the problem, though.
> 
> 
> Since we really want to have 2) since that's simplifyCFG, we propose
> to make
> 'select %c, %a, %b' poison if any of the following holds:
>  - %c is poison
>  - %c = true and %a is poison
>  - %c = false and %b is poison
> 
> This disallows 3) and some transformations of 1). Since 3) is only
> performed
> in CodeGenPrepare, we can safely ignore it (no aggressive
> optimizations are
> run afterwards).
This strikes me as a dangerous game to play. CGP's transformations need to
be semantically sound (even if they are anti-canonical). In part, this is
because our IR-level analyses are used by CGP. Moreover, targets after CGP run
IR-level transformations, and having different semantic rules there is going to
make code reuse hard in subtle ways. Which brings up another issue: We currently
have undef at the MI level; do you propose changing that, and if so, how?

 -Hal
> For 1), we will have to restrict InstCombine
> patterns for
> the cases when we are sure a given variable is non-poisonous (which
> is only
> when it came from a 'freeze' instruction or it's the result of
a
> non-poison-producing instruction).
> This semantics also allows arithmetic -> select, but not select ->
> arithmetic (without use of freeze).
> 
...
> 
> 
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Nuno Lopes via llvm-dev

2016-Oct-18 15:10 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

>> A use of freeze is to enable speculative execution.  For example, loop 
>> switching does the following transformation:
>> while (C) {
>>   if (C2) {
>>    A
>>   } else {
>>    B
>>   }
>> }
>> =>
>> if (C2) {
>>    while (C)
>>       A
>> } else {
>>     while (C)
>>        B
>> }
>> 
>> Here we are evaluating C2 before C.  If the original loop never 
>> executed then we had never evaluated C2, while now we do.  So we need 
>> to make sure there's no UB for branching on C2.  Freeze ensures
that
>> so we would actually have 'if (freeze(C2))' instead.
>> Note that having branch on poison not trigger UB has its own problems.
> 
> Can you please elaborate on these problems?
Sure! For example, the following transformation would be wrong if branch on
poison was a non-deterministic branch instead of UB:

    %a = add i32 %x, 1
    %c = icmp eq i32 %a, %poison
    br i1 %c, label %bb1, label %bb2
bb1:
    %b = add i32 %x, 1
    %d = call i32 @bar(i32 %b)
->
bb1:
    %d = call i32 @bar(i32 % poison)

GVN will perform this kind of transformation: it concludes that %a, %b, and
%poison are all equal and picks one representative value. However, these values
are equal only when they are not poison.  If %poison is indeed poison, then the
transformation is wrong because before it was passing an ok value to bar() and
after the transformation it is passing poison.
On the other hand, if branching on poison is UB, the original program was
executing UB already because %poison (and therefore %c) were poison. So the
transformation is ok.

>> Discussion on select
>> ===================>> Select is a tricky instruction to define
semantics for.  There are
>> multiple views of select's purpose that are contradictory:
>>  1) Select should be equivalent to arithmetic (e.g., allow  'select
>> %c, true, %x' -> 'or %c, %x' and arithmetic ->
select)
>>  2) br + phi -> select should be allowed (simplifyCFG)
>>  3) Select -> br + phi should be allowed
>> 
>> Since we really want to have 2) since that's simplifyCFG, we
propose
>> to make 'select %c, %a, %b' poison if any of the following
holds:
>>  - %c is poison
>>  - %c = true and %a is poison
>>  - %c = false and %b is poison
>> 
>> This disallows 3) and some transformations of 1). Since 3) is only 
>> performed in CodeGenPrepare, we can safely ignore it (no aggressive 
>> optimizations are run afterwards).
>
> This strikes me as a dangerous game to play. CGP's transformations need
to be semantically sound (even if they are anti-canonical). In part, this is
because our IR-level analyses are used by CGP. Moreover, targets after CGP run
IR-level transformations, and having different semantic rules there is going to
make code reuse hard in subtle ways. Which brings up another issue: We currently
have undef at the MI level; do you propose changing that, and if so, how?
It is sound to do the transformation at IR level if you freeze the condition.
My suggestion was to avoid introducing overhead in CGP.  This is the current
state of affairs and it seems to work right now.  I'm not shocked to see the
IR changing the semantics throughout the pipeline, since while the compiler
proceeds, the type of transformations done also change.
But I have to say that my understanding of LLVM after CGP is very limited. I was
not aware of the code reuse from IR-level analyses.  I agree this needs to be
scrutinized carefully.  Alternatively, we can just introduce freeze and pay the
price (likely low anyway).

Regarding undef at MI level, we are not proposing any change.  I don't see
any reason right now to change it since it seems that optimizations at MI level
are fine with just having undef. There's no nsw/nuw stuff there and undef
seems very useful.
Poison is lowered into undef. And freeze is lowered into this pair of copy nodes
that seems to stop propagation of undef (to ensure all uses see the same value).
I don't know enough about MI to know if there's a better way to lower
freeze, though.

Thanks,
Nuno

Bruce Hoult via llvm-dev

2016-Oct-18 16:27 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

On Tue, Oct 18, 2016 at 3:06 PM, Nuno Lopes via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi,
>
> Over the past few years we've been trying to kill poison somehow. There
> have
> been a few proposals, but they've all failed to pass the bar and/or to
> gather significant support (my own proposals included).
>
Ok, freeze() produces a fixed but undefined value, so that xor'ing the same
frozen value produces 0, ANDing a frozen value with 1 produces known zero
bits everywhere except the LSB, which is undefined.

Does every instance of freeze() produce a *different* fixed but undefined
value? i.e. the value produced by freeze() is labelled with a sequence
number or something like that?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161018/1edfcb0e/attachment.html>

Nuno Lopes via llvm-dev

2016-Oct-18 17:00 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

> Ok, freeze() produces a fixed but undefined value, so that xor'ing the
same frozen value produces 0, ANDing a frozen value with 1 produces known zero
bits everywhere except the LSB, which is undefined.
> 
> Does every instance of freeze() produce a *different* fixed but undefined
value? i.e. the value produced by freeze() is labelled with a sequence number or
something like that?
Freeze produces an arbitrary value.  The compiler is free to pick a
"good" one (one that enables some optimization) if it wishes so. 
There's no determination of these values.
It's similar to current undef, except that all uses see the same value.

Nuno

Friedman, Eli via llvm-dev

2016-Oct-18 18:18 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

On 10/18/2016 5:06 AM, Nuno Lopes via llvm-dev wrote:> Another use is, for example, to implement bit-fields:
> a.x = 2
> becomes:
> %v = load %a
> %v2 = freeze %v  ; %v could be uninitialized data (poison)
> %v3 = ... bitmasking...
> store %a, %v3
It seems like you're saying that an integer load which touches any 
uninitialized byte of memory results in poison.  Therefore, load %a 
simplifies to "poison", and your proposed lowering of a bitfield write
throws away the value of every other adjacent bitfield.  Am I missing 
something?

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

Nuno Lopes via llvm-dev

2016-Oct-18 19:25 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

> On 10/18/2016 5:06 AM, Nuno Lopes via llvm-dev wrote:
>> Another use is, for example, to implement bit-fields:
>> a.x = 2
>> becomes:
>> %v = load %a
>> %v2 = freeze %v  ; %v could be uninitialized data (poison)
>> %v3 = ... bitmasking...
>> store %a, %v3
>
> It seems like you're saying that an integer load which touches any 
> uninitialized byte of memory results in poison.  Therefore, load %a 
> simplifies to "poison", and your proposed lowering of a bitfield
write
> throws away the value of every other adjacent bitfield.  Am I missing 
> something?
Right, a load touching a single uninitialized bit results in poison.
The trick is that on the first bitfield write, all the remaining untouched 
fields become initialized (with an arbitrary value, though). Essentially we 
are making the adjacent bitfields undef.

So if you have:
struct foo a;
a.x = 2;
a.y = 3;

IR becomes:
%a = alloca foo
%x = load %a
%x2 = freeze %v  ; %x2 not poison
%x3 = bitmasking  ; %x3 not poison
store %a, %x3

%y = load %a
%y2 = freeze %y  ;  not needed; %y is not poison
etc..

Nuno

Dan Gohman via llvm-dev

2016-Oct-19 20:04 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

Hi Nuno,

This is an interesting proposal. Here are some high-level thoughts.
> br poison -> UB
What impact does this have on API contracts? Some library functions might
currently tolerate being passed an uninitialized value (at the LLVM level;
C's semantics are different). However, with the new poison, if the
implementation of a function happens to do a branch on the input, the code
has UB if passed an uninitialized value.
> and 0, poison -> poison
This is similar to the existing poison, and it raises similar questions.
Consider the following code:

  %x = call i32 @foo()
  %y = or i32 %x, 1
  %c = call i1 @some_condition()
  br i1 %c, label %true, label %false
true:
  %d = udiv i32 -1, %y
false:
  %z = phi i32 [ %true, %d ], [ %false, 0 ]

Is it safe to turn the phi into a select? (ignoring profitability)

KnownBits can tell us that %y always has at least one bit set, so it's
never zero, so the udiv can never divide by zero, so it's safe to
speculate, so the transform is safe. However, if %x is poison and %c is 0,
the original program has no UB, and the transformed program does. It would
then execute the udiv unconditionally, with a divisor of poison.

Inserting a freeze could fix this. However, where does the freeze go? %y is
an input to the if-conversion pattern as SimpifyCFG knows it, however
freezing %y doesn't help. Instead, the thing that needs to be frozen is %x,
because that was the input to the KnownBits analysis that said it was safe.
It would seem that either KnownBits would need to record the values it
looked at, or there'd have to be a way to recompute the set on demand, so
that these values could be frozen if needed. These might be manageable for
KnownBits, though more complicated analyses (perhaps an Andersen's-style
points-to system?) could be more awkward.


Auto-vectorization, converting adjacent scalar memory accesses into vector
accesses, or into wider combined scalar accesses is a concern. If widening
a load causes it to read a poison that it didn't previously read, and if
the value is subsequently stored, this would put poison in more
individually addressable places than in the original program, so it could
inject UB into a program that didn't previously have it. This is related to
the memcpy topic that's been raised.

It's also related to the bitfield assignment topic. The solution where the
first initialization writes a frozen poison over the adjacent bytes,
thereby protecting subsequent assignments from unfrozen poison, is clever.
However, it also raises some questions: Is the width at which any given
bitfield field is initialized to be an ABI-exposed detail (all front-ends
must emit the same width accesses for a given bitfield)? And, does this
constrain the optimizer's ability to shrink such stores?

Dan


On Tue, Oct 18, 2016 at 5:06 AM, Nuno Lopes <nuno.lopes at ist.utl.pt>
wrote:
> Hi,
>
> Over the past few years we've been trying to kill poison somehow. There
> have
> been a few proposals, but they've all failed to pass the bar and/or to
> gather significant support (my own proposals included).
> We (David, Gil, John, Juneyoung, Sanjoy, Youngju, Yoonseung, and myself)
> have a new proposal to kill undef instead and replace it with poison + a
> new
> 'freeze' instruction.  We believe this proposal simplifies things
at the IR
> level, allows us to fix long-standing bugs in LLVM, and is roughly
> performance-neutral for now (and can enable further optimizations in the
> future).
>
> Sorry for the longish email.  We will give a talk about this proposal at
> the
> LLVM dev meeting in November, so hopefully we'll be able to discuss
this
> further in person.
>
> We would appreciate any comments, feedback, suggestions, etc.  If you have
> questions let me know as well (not everything below is super detailed, so
> please ask where things are not explicit enough).
> (I've CC'ed a few people that have been involved in discussions re
> semantics
> of LLVM IR in the past year or so.  Apologies in advance if I forgot
> someone
> -- which probably I did.  I've also CC'ed some CodeGen folks, from
which we
> would appreciate input on how this proposal fits within the current and the
> future pipeline)
>
> Thanks,
> Nuno
>
>
> ---------------------------------------------------------
>
>
> Motivation for undef & poison
> ============================> There were a few motivations behind the
creation of undef and poison:
>  1) There was a desire to exploit certain undefined behaviors without
> hurting optimization opportunities.  This led to the creation of weaker
> forms of undefined behavior (UB), while leaving "full" UB to
operations
> that
> can trap the CPU, like division by zero.  Weaker UB operations include, for
> example, signed integer overflow, say 'a + b'.  If signed overflow
was full
> UB, then we couldn't speculatively execute these operations without
proving
> that they could not overflow, otherwise the compiler would be introducing
> UB
> in a perfectly fine program. E.g., the compiler couldn't hoist signed
> additions out of loops.  In weaker forms of UB, the triggering of full UB
> is
> delayed so that the compiler can speculatively execute these operations and
> there's only full UB if the result is somehow "consumed".
>
>  2) Undef was created to represent uninitialized memory reads. For example,
> it's very useful in conjunction with PHI nodes:
> int a;
> if (foo)
>   a_0 = 42;
> a_1 = phi(a_0, undef)
>
> By using undef, we don't have to materialize some arbitrary constant on
a
> branch where a variable is not initialized.
>
> 3) And then people realized that undef wasn't enough.  For example,
we'd
> also like to be able to optimize expressions like "((a + 1) >s
a)" to
> "true", which isn't possible with "undef" as
defined in (1), since there's
> no value for `%x` that makes `%x > INT_MAX` true.  This was important
for
> loop analyses and certain InstCombine patterns, and so we needed a stronger
> version of UB which was still not full UB. And poison was born. For
> example,
> on 64-bit platforms, poison is very handy for widening induction variables
> to 64 bits.
>
>
>
> Problems with undef & poison
> ===========================> The interactions between undef and poison
are particularly complicated and
> they inhibit innocent-looking optimizations.  For example, the following is
> wrong:
> %v = select %c, %x, undef
>   =>
> %v = %x
>
> If %x is poison and %c = false, then we've just replaced undef with
poison
> which is bad, since poison is stronger than undef.  So, this is an example
> of something we want to do but we can't at the moment (even if we still
do
> it anyway, risking miscompilations!)
>
>
>
> Goal
> ===> Our goal was to propose minimal changes to the IR semantics such
that 1) we
> could keep most of the optimizations we do today, 2) would require minimal
> code changes, 3) compilation time, run time, and memory consumption would
> stay roughly the same, and 4) we could fix all the longstanding bugs due to
> undef/poison semantics and interactions.
> We believe this proposal fulfills the goals, and can even enable future
> optimizations.
>
>
>
> Proposal
> =======>  1) Kill undef
>  2) Create a new poison value (representable in IR, inheriting from
> llvm::Constant)
>  3) Create a new instruction, '%y = freeze %x', that stops
propagation of
> poison.  If the input is poison, then it returns an arbitrary, but fixed,
> value. (like old undef, but each use gets the *same* value), otherwise it
> just returns its input value.
>  4) All operations over poison return poison.
> For example:
> and %x, poison   ->  poison  ; just like before
> and 0, poison    ->  poison  ; just like before
>
> %y = freeze poison
> %z = and %y, 1  --- 000..0x   ; just like old undef
> %w = xor %y, %y                    ; 0 -- not undef: both uses of %y have
> the same value
>
> Instruction-specific semantics:
>  - br poison -> UB
>  - select poison, %a, %b -> poison   (some InstCombine patterns will
need
> freeze, but they are wrong right now already!)
>  - if %c is not poison, select %c, %a, %b is poison  if %c = 1 and %a is
> poison, or %c = 0 and %b is poison  (see discussion below)
>  - bitcast between types of different bitwidth can return poison if when
> concatenating adjacent values one of these values is poison. For example:
> %x = bitcast <3 x i2>  <2, poison, 2> to <2 x i3>
>   ->
> %x = <poison, poison>
>
> %x = bitcast <6 x i2>  <2, poison, 2, 2, 2, 2> to <4 x
i3>
>   ->
> %x = <poison, poison, 5, 2>
>
>
>
> Purpose of Freeze
> ================> Poison is propagated aggressively throughout. However,
there are cases
> where
> this breaks certain optimizations, and therefore freeze is used to
> selectively stop poison from being propagated.
>
> A use of freeze is to enable speculative execution.  For example, loop
> switching does the following transformation:
> while (C) {
>   if (C2) {
>    A
>   } else {
>    B
>   }
> }
> =>
> if (C2) {
>    while (C)
>       A
> } else {
>     while (C)
>        B
> }
>
> Here we are evaluating C2 before C.  If the original loop never executed
> then we had never evaluated C2, while now we do.  So we need to make sure
> there's no UB for branching on C2.  Freeze ensures that so we would
> actually
> have 'if (freeze(C2))' instead.
> Note that having branch on poison not trigger UB has its own problems.  We
> believe this is a good tradeoff.
>
>
> Another use is, for example, to implement bit-fields:
> a.x = 2
> becomes:
> %v = load %a
> %v2 = freeze %v  ; %v could be uninitialized data (poison)
> %v3 = ... bitmasking...
> store %a, %v3
>
>
>
> Performance
> ==========> We measured compile time, running time, memory consumption,
and object/IR
> size of a few benchmarks (-O3 vs -O3 w/ freeze).  They should give a rough
> picture of the overhead involved.
> Benchmarks were run on 2 machines (server1 and server2), both x86_64
> running
> Ubuntu 16.04.
>
> Summary:
>  - Memory consumption: there's generally a 1% increase, with the max of
8%
> in oggenc
>  - Running time on SPEC: mostly in the noise (about 1% up or down)
>  - Compile time: mostly in the noise (about 1% up or down, but usually no
> difference)
>  - LNT shows 0.5% average slowdown, but with wider swings both ways. We are
> aware of a few things that would need tweaking to recover perf like loop
> unrolling (likely because of SCEV not knowing what freeze is).
>
> You can see the raw data we have collected in the links below:
>
> Memory consumption:
> https://docs.google.com/spreadsheets/d/1ycJaMPLzh_
> 4YV7RQmVaLR-vHzd7ZlYiV2iES
> G-lBRbk/edit#gid=0
> SPEC running time:
> https://docs.google.com/spreadsheets/d/1tAwj-Q5raI4rYD7EEg-tJd-
> Ex533fy9DezIV
> rbfeIns/edit#gid=0
> Compilation time:
> https://docs.google.com/spreadsheets/d/1_xj6o_
> ANGcR8xD5Y9rN5VjWbsJaS-GeYKzeW
> WAR9O2c/edit#gid=0
> Statistics on number of Freeze Instructions (up to 5% in total in gcc):
> https://docs.google.com/spreadsheets/d/1mbOpduooEetIR5i9Db07GHbu72a6m
> LUmkRIF
> 1SmZ-6Q/edit#gid=0
> LNT raw data:
> https://github.com/aqjune/freezescript/tree/master/
> resultcsv-mailinglist/LNT
>
> Server1: Intel Core i7 CPU 870 @ 2.93GHz, 8 GBs RAM
> Server2: Intel Core i5-6600 CPU @ 3.30GHz, 8 GBs RAM
>
>
>
> Implementation
> =============> A prototype is available at:
>  - https://github.com/snu-sf/llvm-freeze/tree/x86jmp
>  - https://github.com/snu-sf/clang-freeze
>
> This implementation includes:
>  - Make clang emit freeze for bit-field codegen
>  - Add freeze instruction to IR, plus a few instcombine patterns to simply
> freeze(freeze(x)), for example
>  - Fix a few instcombine select patterns to introduce freeze when needed
> (these were wrong with current semantics already)
>  - Add a freeze node to selection DAG and add codegen support.  For
> codegen,
> it uses CopyFromReg+CopyToReg to lower the freeze node, and so it assumes
> that from that point on LLVM will not propagate undef anymore (to ensure
> that 'freeze undef' always returns the same value for all uses,
which is
> not
> what happens with 'undef' on its own). This needs further
discussion.
>  - Fix loop unswitch to freeze condition
>  - Change GVN's load widening to load as vector of bits+bitcast (gvload
> branch only; this was not included in the tests above)
>
> (this implementation still uses undef value, but that should be replaced
> with poison)
>
>
>
> Deployment
> =========> A proposal for incremental deployment of the proposed
changed:
>  1) Add freeze instruction + CodeGen support
>  2) Change clang to start emitting freeze for bit-field reads
>  3) Add auto-upgrade to convert undef to 'freeze poison' (undef and
'freeze
> poison' with one use are equivalent)
>  4) Fix InstCombine, Loop unswitching, etc to use freeze
>  5) Replace references to undef in the code with either poison or
'freeze
> poison'
>  7) Kill undef
>  8) Investigate any perf regressions
>  9) Run John's LLVM IR fuzzer with Alive to find leftover bugs
>
> Regarding perf, at least PR30615 needs fixing to enable efficient load
> widening at IR level (which seems it's still undecided whether GVN will
> continue doing it or not).
>
>
>
> Discussion on select
> ===================> Select is a tricky instruction to define semantics
for.  There are multiple
> views of select's purpose that are contradictory:
>  1) Select should be equivalent to arithmetic (e.g., allow  'select %c,
> true, %x' -> 'or %c, %x' and arithmetic -> select)
>  2) br + phi -> select should be allowed (simplifyCFG)
>  3) Select -> br + phi should be allowed
>
> To have 1), we need to make select poison if either of its arguments is
> poison. This disallows 2), since there we need to have select being poison
> only if the dynamically-chosen value is poison.
> 2) and 3) are orthogonal and do not conflict, though.
>
> 3) is hard because it requires select to be stronger than branching
> (UB-wise), meaning that we would need select to be UB if the condition was
> poison. This blocks a bunch of instcombine patterns, like:
> Pre: C < 0
> %r = udiv %a, C
>   =>
> %c = icmp ult %a, C
> %r = select %c, 0, 1
>
> If %a is poison, then we would be introducing UB. Adding freeze(%c) would
> fix the problem, though.
>
>
> Since we really want to have 2) since that's simplifyCFG, we propose to
> make
> 'select %c, %a, %b' poison if any of the following holds:
>  - %c is poison
>  - %c = true and %a is poison
>  - %c = false and %b is poison
>
> This disallows 3) and some transformations of 1). Since 3) is only
> performed
> in CodeGenPrepare, we can safely ignore it (no aggressive optimizations are
> run afterwards). For 1), we will have to restrict InstCombine patterns for
> the cases when we are sure a given variable is non-poisonous (which is only
> when it came from a 'freeze' instruction or it's the result of
a
> non-poison-producing instruction).
> This semantics also allows arithmetic -> select, but not select ->
> arithmetic (without use of freeze).
>
>
>
> Acknowledgements
> ===============> I would personally like to thank David, John, and
Sanjoy for embarking on
> this trip over a year ago and spending a lot of time on this project; Gil
> and his students (Juneyoung, Yoonseung, and Youngju) for prototyping and
> testing several ideas (and fixing them!); and everybody else that has
> contributed in the form of off-line and on-line discussions.
>
>
>
> Future work
> ==========> This proposal only handles integers.  It's a good step
forward, but we are
> still missing pointers and floats; we are aware of problems (err,
> possibilities for improvement) there.  We will work on these next. They are
> orthogonal problems, so what we are proposing today won't require
further
> changes (hopefully).
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161019/ede92fc0/attachment.html>

Nuno Lopes via llvm-dev

2016-Oct-21 20:53 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

Hi Dan,

Thanks a lot for the feedback and apologies for the delay!
>> br poison -> UB
> What impact does this have on API contracts? Some library functions might
currently tolerate being passed an uninitialized value (at the LLVM level;
C's semantics are different). However, with the new poison, if the
implementation of a function happens to do a branch on the input, the code has
UB if passed an uninitialized value.
Right, I guess any LLVM-IR-level API would have to be upgraded to take the new
semantics into account.
I'm not sure that 'br poison -> UB' is new rather than a
clarification, but it's true that there were multiple interpretations for
'br poison'.
I'm curious, btw: did you have any concrete API in mind?

>> and 0, poison -> poison
>
> This is similar to the existing poison, and it raises similar questions.
Consider the following code:
> 
>   %x = call i32 @foo()
>   %y = or i32 %x, 1
>   %c = call i1 @some_condition()
>   br i1 %c, label %true, label %false
> true:
>   %d = udiv i32 -1, %y
> false:
>   %z = phi i32 [ %true, %d ], [ %false, 0 ]
> 
> Is it safe to turn the phi into a select? (ignoring profitability)
> 
> KnownBits can tell us that %y always has at least one bit set, so it's
never zero, so the udiv can never divide by zero, so it's safe to speculate,
so the transform is safe. However, if %x is poison and %c is 0, the original
program has no UB, and the transformed program does. It would then execute the
udiv unconditionally, with a divisor of poison.
> 
> Inserting a freeze could fix this. However, where does the freeze go? %y is
an input to the if-conversion pattern as SimpifyCFG knows it, however freezing
%y doesn't help. Instead, the thing that needs to be frozen is %x, because
that was the input to the KnownBits analysis that said it was safe. It would
seem that either KnownBits would need to record the values it looked at, or
there'd have to be a way to recompute the set on demand, so that these
values could be frozen if needed. These might be manageable for KnownBits,
though more complicated analyses (perhaps an Andersen's-style points-to
system?) could be more awkward.
Right, that's a very good point. We already have this problem today as well,
though.
We can translate this into a select by using freeze, as you say (that's
exactly for these cases why we are proposing to introduce freeze).
What we need to settle on, and we didn't touch this on our proposal, is how
LLVM's analyses API will look like. Most (if not all) LLVM analyses today
report a result that doesn't exclude that the value might be poison. 
That's because if you report %v to be non-null and it's actually poison
it's fine since poison can be restricted to be non-null.  We will need to
introduce a isNotPoison() analysis and a ensureIsNonPoison() utility (that needs
to be used by all transformations doing speculation).
I would freeze %y. What's the argument against?

> Auto-vectorization, converting adjacent scalar memory accesses into vector
accesses, or into wider combined scalar accesses is a concern. If widening a
load causes it to read a poison that it didn't previously read, and if the
value is subsequently stored, this would put poison in more individually
addressable places than in the original program, so it could inject UB into a
program that didn't previously have it. This is related to the memcpy topic
that's been raised.
The problem with widening Is already there today. We are not really changing the
semantics of poison, just creating more opportunities for it to appear.
We have a solution for widening, which is to use vectors. Since we define poison
at value-level, we can widen, say, a "load i32" into "load <2
x i32>" (but not to "load i64").  One may argue that this
representation is even better, since it actually exposes the fact that we are
loading multiple, independent values.  But we need to make sure the rest of LLVM
pipeline is fine with increased usage of vectors throughout.

> It's also related to the bitfield assignment topic. The solution where
the first initialization writes a frozen poison over the adjacent bytes, thereby
protecting subsequent assignments from unfrozen poison, is clever. However, it
also raises some questions: Is the width at which any given bitfield field is
initialized to be an ABI-exposed detail (all front-ends must emit the same width
accesses for a given bitfield)? And, does this constrain the optimizer's
ability to shrink such stores?
That's an interesting point; didn't think about it before.  You're
right that all frontends manipulating a given bitfield would have to agree on
the access size (at IR level; at assembly level it shouldn't matter).
Regarding store shrinking, I feel it's something better suited to SDAG/MI
level.  There if we stick with undef, we can support shrinking.  If we introduce
poison up to (and including) MI, then things become tricky.

Thanks,
Nuno

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - Oct 2016 - RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

Reasonably Related Threads