thr3ads.net - llvm dev - [llvm-dev] Killing undef and spreading poison [Dec 2016]

If this information is useful, please help other people find it:
Share via:

Nuno Lopes via llvm-dev

2016-Dec-07 23:27 UTC

[llvm-dev] Killing undef and spreading poison

>>  1) a bit in the output is poison if flipping any set of poison bits
>>  in the
>> input may yield different values for the output bit.
>
> This is the definition I'd expect.
>
>> For example (bitwise notation):
>> ppp * 000 == 000 (since flipping any of the poison bits cannot yield
>> a
>> result other than zero)
>> 00p + 00p == 0pp
>> ppp << 2 == p00
>> icmp ugt ppp, uint_max  == p  (special case)
>>
>> 2) (thanks to Sanjoy):  for an operation R = A op B, bit R[i] is
>> poison if
>> bit i of 'A op undef' may yield 0 and 1  (same for 'undef
op B').
>> This one unfortunately breaks some algebraic rewrites like: x * -1
>>   ->   0
>> - x
>>
>>
>> I've implemented both semantics in Alive (branches: newsema2 and
>> bitwise-poison2, respectively) and then we run Alive over a large set
>> of
>> optimizations to see which were ok and which became wrong.
>>
>> Many optimizations became wrong with either of the bitwise semantics.
>
> To be clear, "wrong" here means that the value resulting from the
> transformation might have more poison bits for some inputs than the 
> original value, correct?
The vast majority of the examples were of that case, yes.  But we also have 
a few cases where the transformed expression computes a different value (I 
don't have a record of these handy, sorry).

>> So, my suggestion is to go for value-wise poison, but maybe with a
>> few
>> tweaks from what I've presented (based on the feedback we
received):
>>  - Have 2 load instructions: one that does a value-wise access (if
>>  any bit
>> is poison in memory, the whole value becomes poison), and another
>> version
>> that freezes each bit individually (which is just syntactic sugar for
>> a
>> freeze(load <n x i1>) followed by a bitcast). The latter load is
very
>> close
>> to the load instruction we have today.
>>
>> Adding this extra instruction has some benefits: it simplifies IR
>> auto-upgrade, and frontends can move to the new instruction
>> incrementally.
>> Also, it seems that C++ codegen could use the freezing load in a few
>> places.
>
> Can we be specific on what "few places" we're talking about?
I assume this
> means at least:
>
>  1. Accesses to bit fields (or other places where Clang generates loads 
> larger than the underlying data type)
>  2. memcpy-like operations
>  3. Accesses though character pointers? (because character pointers can be 
> used to access data of any type)
1. Bit fields definitely. Sometimes clang needs to widen loads because of 
the ABI, but these can probably be changed to vector loads instead.
2. Clang introduces memcpys to copy objects (which may have uninitialized 
padding). My suggestion (contrary to my previous opinion) is to make memcpy 
a bit-wise copy to simplify things.
3. For char ptrs, not sure that's needed. You would be loading an i8, which 
is the type you are asking for, so it doesn't seem necessary.
4. I'm not a clang expert by any means, but I briefly spoke with Richard 
Smith during the dev meeting and he told me that there are some cases 
involving initialization of unions in C++ that may also need this freezing 
load. I don't know the exact details.

>> On the negative side is that the freezing load is harder to move
>> around and
>
> Why exactly are these harder to move around?
Because they have an implicit freeze, which is also non-trivial to move. 
For example, these freezing loads cannot easily be sank into loops, like:
%v = load %p
while (...) { use(%v) }
  =>
while (...) { %v = load %p; use(%v) }

The load inside the loop may return a different value for each 
uninitialized/poison bit in each loop iteration, while in the original code 
all iterations would see the same value for %v.

So this load should be reserved for the cases when it's actually needed 
only.  For example, in clang, all loads from the program itself should use 
the value-wise-poison-load, not this one.
This load is similar as if we were to make undef yield the same value for 
all uses. Since our load today returns undef for uninitialized memory, the 
current load would become pretty much equivalent to the load we are 
proposing.
(we already established that having undef is not a good idea; this is just 
an analogy that will hopefully give some intuition and help understanding 
the concept of this new load)

Nuno

Hal Finkel via llvm-dev

2016-Dec-07 23:39 UTC

head link

[llvm-dev] Killing undef and spreading poison

----- Original Message -----> From: "Nuno Lopes" <nuno.lopes at ist.utl.pt>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Sanjoy Das" <sanjoy at playingwithpointers.com>,
"David Majnemer" <david.majnemer at gmail.com>, "John
Regehr"
> <regehr at cs.utah.edu>, "Chung-Kil Hur" <gil.hur at
sf.snu.ac.kr>, "Juneyoung Lee" <juneyoung.lee at
sf.snu.ac.kr>, "Dan
> Gohman" <dan433584 at gmail.com>, "Philip Reames"
<listmail at philipreames.com>, clattner at apple.com, "Chandler
Carruth"
> <chandlerc at google.com>, llvm-dev at lists.llvm.org
> Sent: Wednesday, December 7, 2016 5:27:31 PM
> Subject: Re: Killing undef and spreading poison
> 
> >>  1) a bit in the output is poison if flipping any set of poison
> >>  bits
> >>  in the
> >> input may yield different values for the output bit.
> >
> > This is the definition I'd expect.
> >
> >> For example (bitwise notation):
> >> ppp * 000 == 000 (since flipping any of the poison bits cannot
> >> yield
> >> a
> >> result other than zero)
> >> 00p + 00p == 0pp
> >> ppp << 2 == p00
> >> icmp ugt ppp, uint_max  == p  (special case)
> >>
> >> 2) (thanks to Sanjoy):  for an operation R = A op B, bit R[i] is
> >> poison if
> >> bit i of 'A op undef' may yield 0 and 1  (same for
'undef op B').
> >> This one unfortunately breaks some algebraic rewrites like: x * -1
> >>   ->   0
> >> - x
> >>
> >>
> >> I've implemented both semantics in Alive (branches: newsema2
and
> >> bitwise-poison2, respectively) and then we run Alive over a large
> >> set
> >> of
> >> optimizations to see which were ok and which became wrong.
> >>
> >> Many optimizations became wrong with either of the bitwise
> >> semantics.
> >
> > To be clear, "wrong" here means that the value resulting
from the
> > transformation might have more poison bits for some inputs than the
> > original value, correct?
> 
> The vast majority of the examples were of that case, yes.  But we
> also have
> a few cases where the transformed expression computes a different
> value (I
> don't have a record of these handy, sorry).
Those are just bugs, right?
> 
> 
> >> So, my suggestion is to go for value-wise poison, but maybe with a
> >> few
> >> tweaks from what I've presented (based on the feedback we
> >> received):
> >>  - Have 2 load instructions: one that does a value-wise access (if
> >>  any bit
> >> is poison in memory, the whole value becomes poison), and another
> >> version
> >> that freezes each bit individually (which is just syntactic sugar
> >> for
> >> a
> >> freeze(load <n x i1>) followed by a bitcast). The latter
load is
> >> very
> >> close
> >> to the load instruction we have today.
> >>
> >> Adding this extra instruction has some benefits: it simplifies IR
> >> auto-upgrade, and frontends can move to the new instruction
> >> incrementally.
> >> Also, it seems that C++ codegen could use the freezing load in a
> >> few
> >> places.
> >
> > Can we be specific on what "few places" we're talking
about? I
> > assume this
> > means at least:
> >
> >  1. Accesses to bit fields (or other places where Clang generates
> >  loads
> > larger than the underlying data type)
> >  2. memcpy-like operations
> >  3. Accesses though character pointers? (because character pointers
> >  can be
> > used to access data of any type)
> 
> 1. Bit fields definitely. Sometimes clang needs to widen loads
> because of
> the ABI, but these can probably be changed to vector loads instead.
> 2. Clang introduces memcpys to copy objects (which may have
> uninitialized
> padding). My suggestion (contrary to my previous opinion) is to make
> memcpy
> a bit-wise copy to simplify things.
And then how exactly do we turn small memcpys into "regular"
loads/stores?

Thanks again,
Hal
> 3. For char ptrs, not sure that's needed. You would be loading an i8,
> which
> is the type you are asking for, so it doesn't seem necessary.
> 4. I'm not a clang expert by any means, but I briefly spoke with
> Richard
> Smith during the dev meeting and he told me that there are some cases
> involving initialization of unions in C++ that may also need this
> freezing
> load. I don't know the exact details.
> 
> 
> >> On the negative side is that the freezing load is harder to move
> >> around and
> >
> > Why exactly are these harder to move around?
> 
> Because they have an implicit freeze, which is also non-trivial to
> move.
> For example, these freezing loads cannot easily be sank into loops,
> like:
> %v = load %p
> while (...) { use(%v) }
>   =>
> while (...) { %v = load %p; use(%v) }
> 
> The load inside the loop may return a different value for each
> uninitialized/poison bit in each loop iteration, while in the
> original code
> all iterations would see the same value for %v.
> 
> So this load should be reserved for the cases when it's actually
> needed
> only.  For example, in clang, all loads from the program itself
> should use
> the value-wise-poison-load, not this one.
> This load is similar as if we were to make undef yield the same value
> for
> all uses. Since our load today returns undef for uninitialized
> memory, the
> current load would become pretty much equivalent to the load we are
> proposing.
> (we already established that having undef is not a good idea; this is
> just
> an analogy that will hopefully give some intuition and help
> understanding
> the concept of this new load)
> 
> Nuno
> 
> 
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Nuno Lopes via llvm-dev

2016-Dec-08 22:16 UTC

head link

[llvm-dev] Killing undef and spreading poison

>> >> I've implemented both semantics in Alive (branches:
newsema2 and
>> >> bitwise-poison2, respectively) and then we run Alive over a
large
>> >> set
>> >> of
>> >> optimizations to see which were ok and which became wrong.
>> >>
>> >> Many optimizations became wrong with either of the bitwise
>> >> semantics.
>> >
>> > To be clear, "wrong" here means that the value resulting
from the
>> > transformation might have more poison bits for some inputs than
the
>> > original value, correct?
>>
>> The vast majority of the examples were of that case, yes.  But we
>> also have
>> a few cases where the transformed expression computes a different
>> value (I
>> don't have a record of these handy, sorry).
>
> Those are just bugs, right?
Well, increasing the number of poison bits would also be considered a bug.
I guess the main issue is that some InstCombine transformations that we 
consider correct today are not under the bitwise poison semantics.  Some of 
these transformations start producing wrong values because poison is not 
propagating enough for us to be able to discard the inputs that trigger 
different output values as "uninteresting".  Bitwise poison semantics
has
too many opportunities to lose poison (e.g., shifting out poison bits, no 
carry in arithmetic instructions, and/or with zero/one bits, etc).

>> >> Adding this extra instruction has some benefits: it simplifies
IR
>> >> auto-upgrade, and frontends can move to the new instruction
>> >> incrementally.
>> >> Also, it seems that C++ codegen could use the freezing load in
a
>> >> few
>> >> places.
>> >
>> > Can we be specific on what "few places" we're
talking about? I
>> > assume this
>> > means at least:
>> >
>> >  1. Accesses to bit fields (or other places where Clang generates
>> >  loads
>> > larger than the underlying data type)
>> >  2. memcpy-like operations
>> >  3. Accesses though character pointers? (because character
pointers
>> >  can be
>> > used to access data of any type)
>>
>> 1. Bit fields definitely. Sometimes clang needs to widen loads
>> because of
>> the ABI, but these can probably be changed to vector loads instead.
>> 2. Clang introduces memcpys to copy objects (which may have
>> uninitialized
>> padding). My suggestion (contrary to my previous opinion) is to make
>> memcpy
>> a bit-wise copy to simplify things.
>
> And then how exactly do we turn small memcpys into "regular"
loads/stores?
Bit vector load/stores (i.e., <n x i1>).  I was really happy to see that
the
codegen of i1 vectors has gotten some attention recently btw!

Nuno

John Regehr via llvm-dev

2016-Dec-27 18:22 UTC

head link

[llvm-dev] Killing undef and spreading poison

Related work:

http://plv.mpi-sws.org/llvmcs/paper.pdf

Hope everyone had a great Christmas!

John


On 12/7/16 4:27 PM, Nuno Lopes wrote:>>>  1) a bit in the output is poison if flipping any set of poison
bits
>>>  in the
>>> input may yield different values for the output bit.
>>
>> This is the definition I'd expect.
>>
>>> For example (bitwise notation):
>>> ppp * 000 == 000 (since flipping any of the poison bits cannot
yield
>>> a
>>> result other than zero)
>>> 00p + 00p == 0pp
>>> ppp << 2 == p00
>>> icmp ugt ppp, uint_max  == p  (special case)
>>>
>>> 2) (thanks to Sanjoy):  for an operation R = A op B, bit R[i] is
>>> poison if
>>> bit i of 'A op undef' may yield 0 and 1  (same for
'undef op B').
>>> This one unfortunately breaks some algebraic rewrites like: x * -1
>>>   ->   0
>>> - x
>>>
>>>
>>> I've implemented both semantics in Alive (branches: newsema2
and
>>> bitwise-poison2, respectively) and then we run Alive over a large
set
>>> of
>>> optimizations to see which were ok and which became wrong.
>>>
>>> Many optimizations became wrong with either of the bitwise
semantics.
>>
>> To be clear, "wrong" here means that the value resulting from
the
>> transformation might have more poison bits for some inputs than the
>> original value, correct?
>
> The vast majority of the examples were of that case, yes.  But we also
> have a few cases where the transformed expression computes a different
> value (I don't have a record of these handy, sorry).
>
>
>>> So, my suggestion is to go for value-wise poison, but maybe with a
>>> few
>>> tweaks from what I've presented (based on the feedback we
received):
>>>  - Have 2 load instructions: one that does a value-wise access (if
>>>  any bit
>>> is poison in memory, the whole value becomes poison), and another
>>> version
>>> that freezes each bit individually (which is just syntactic sugar
for
>>> a
>>> freeze(load <n x i1>) followed by a bitcast). The latter load
is very
>>> close
>>> to the load instruction we have today.
>>>
>>> Adding this extra instruction has some benefits: it simplifies IR
>>> auto-upgrade, and frontends can move to the new instruction
>>> incrementally.
>>> Also, it seems that C++ codegen could use the freezing load in a
few
>>> places.
>>
>> Can we be specific on what "few places" we're talking
about? I assume
>> this means at least:
>>
>>  1. Accesses to bit fields (or other places where Clang generates
>> loads larger than the underlying data type)
>>  2. memcpy-like operations
>>  3. Accesses though character pointers? (because character pointers
>> can be used to access data of any type)
>
> 1. Bit fields definitely. Sometimes clang needs to widen loads because
> of the ABI, but these can probably be changed to vector loads instead.
> 2. Clang introduces memcpys to copy objects (which may have
> uninitialized padding). My suggestion (contrary to my previous opinion)
> is to make memcpy a bit-wise copy to simplify things.
> 3. For char ptrs, not sure that's needed. You would be loading an i8,
> which is the type you are asking for, so it doesn't seem necessary.
> 4. I'm not a clang expert by any means, but I briefly spoke with
Richard
> Smith during the dev meeting and he told me that there are some cases
> involving initialization of unions in C++ that may also need this
> freezing load. I don't know the exact details.
>
>
>>> On the negative side is that the freezing load is harder to move
>>> around and
>>
>> Why exactly are these harder to move around?
>
> Because they have an implicit freeze, which is also non-trivial to move.
> For example, these freezing loads cannot easily be sank into loops, like:
> %v = load %p
> while (...) { use(%v) }
>  =>
> while (...) { %v = load %p; use(%v) }
>
> The load inside the loop may return a different value for each
> uninitialized/poison bit in each loop iteration, while in the original
> code all iterations would see the same value for %v.
>
> So this load should be reserved for the cases when it's actually needed
> only.  For example, in clang, all loads from the program itself should
> use the value-wise-poison-load, not this one.
> This load is similar as if we were to make undef yield the same value
> for all uses. Since our load today returns undef for uninitialized
> memory, the current load would become pretty much equivalent to the load
> we are proposing.
> (we already established that having undef is not a good idea; this is
> just an analogy that will hopefully give some intuition and help
> understanding the concept of this new load)
>
> Nuno

regehr via llvm-dev

2016-Dec-30 21:37 UTC

head link

[llvm-dev] Killing undef and spreading poison

This is pretty interesting and I had missed the earlier discussion.

http://lists.llvm.org/pipermail/llvm-dev/2016-December/108588.html

John

llvm dev - Dec 2016 - Killing undef and spreading poison

[llvm-dev] Killing undef and spreading poison

[llvm-dev] Killing undef and spreading poison

[llvm-dev] Killing undef and spreading poison

[llvm-dev] Killing undef and spreading poison

[llvm-dev] Killing undef and spreading poison