thr3ads.net - llvm dev - [llvm-dev] RFC: Killing undef and spreading poison [Nov 2016]

If this information is useful, please help other people find it:
Share via:

Chandler Carruth via llvm-dev

2016-Nov-04 17:34 UTC

[llvm-dev] RFC: Killing undef and spreading poison

On Wed, Oct 19, 2016 at 4:52 AM Nuno Lopes <nuno.lopes at ist.utl.pt>
wrote:
> > I'm still digesting the proposal in its entirety, but one point I
want
> to call out here:
> >
> >> On Tue, Oct 18, 2016 at 1:39 PM Friedman, Eli via llvm-dev wrote:
> >> Instcombine currently transforms "memcpy(dst, src, 4)"
to "load i32 src;
> >> store i32 dst".  I guess that's illegal under this model?
How about the
> >> related transform "memcpy(dst, src, 4)" to "load
<4 x i8> src; store <4
> >> x i8> dst"?  (SROA also performs similar transforms.)
> >
> > I think it is really important that memcpy and load/store pair be
> equivalent. If this is in fact true about the proposal, I would want to
> really dig into why and whether anything could be done to make them
> equivalent.
>
> Memcpy does a byte-by-byte copy. So if one of the bits is poison then only
> the byte containing that bit becomes poison.
> Therefore, memcpy(x, y, 1) is equivalent to load i8.  But memcpy(x,y,4) is
> not equivalent to "load i32" since load makes the whole value
poison if any
> of the bits is poison.
>
That is not the semantic model that LLVM uses today. Your proposal is
changing this, and it is an *incredibly* fundamental part of the optimizer.

LLVM, very fundamentally, models an integer, a vector of bytes, and a
vector of bits as absolutely identical to the underlying storage in memory.
Changing this would completely break SROA, various parts of mem2reg and
reg2mem, and all load widening.

I understand you have reasons here, but this is a much bigger change than
the undef vs. poison part. ;] I'm currently deeply concerned by this change
and would strongly prefer looking at alternatives.

> The alternative as given by Eli is to use "load <4 x i8>". 
Since now we
> are loading 4 separate values, poison does not extend past the byte
> boundary.  When load is lowered, you should get exactly the same code as
> with "load i32", though.
> So the hope is that there's no diff at assembly level.
>
Note that vectors in LLVM are both quite limited and will have likely very
different representations in the assembly.

You're essentially conferring a *type* onto *memory*. That is not currently
LLVM's model. Instead, the types in LLVM are tied to *operations*.

As a consequence, when LLVM sees a vector type, it canonicalizes,
optimizes, and code generates to facilitate vector operations on values of
vector type. So for example, a vector of 32 i1s is likely to be put into a
vector mask register.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161104/d43cfd57/attachment.html>

Nuno Lopes via llvm-dev

2016-Nov-08 18:47 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

Hi Chandler,

Thanks for your feedback!

I’m still trying to get to the bottom of your concerns and how to best address
them.

As far as I understand, your main concern is that the following should be
equivalent to a no-op (thanks Sanjoy for the example!):

i32* ptr = ...

%tmp = load i32, i32* %ptr

store i32 %tmp, i32* %ptr

So introducing a store of the loaded value should be ok, but under our proposal
it’s not.  Just to clarify: is this actually your main concern?

I believe many parts of LLVM already interpret poison as a per-value attribute,
including when it comes to loads.  Take the following function as an example:

define i16 @g(i8 %in) {

  %a = alloca i32

  %v = add nsw i8 127, %in

  %ptr = bitcast i32* %a to i8*

  %ptr1 = getelementptr i8, i8* %ptr, i32 1

  store i8 %v, i8* %ptr1

  %ptr16 = bitcast i32* %a to i16*

  %load = load i16, i16* %ptr16

  ret i16 %load

}

This program stores 8 bits, and leaves the remaining 24 bits uninitialized. It
then loads 16 bits, half initialized to %v, half uninitialized. SROA transforms
the above function to:

define i16 @g(i8 %in) {

  %v = add nsw i8 127, %in

  %1 = zext i8 %v to i16

  %2 = shl i16 %1, 8

  %3 = and i16 undef, 255

  %4 = or i16 %3, %2

  ret i16 %4

}

SROA gets rid of the alloca/store/load instructions and replaces those with an
OR of undef and shifted %v.

If I call g() with %in=1, then the addition overflows and %v is poison.  Under
LLVM semantics, all instructions return poison if given poison as input.
Therefore the whole function can be folded to “ret i16 poison”  (if we had a
poison value in IR, or otherwise fold to “add nsw i16 INT_MAX, 1”).

So, to justify the correctness of SROA, the load in the original program had to
be widening the poison from the stored i8 value and return a i16 poison (i.e.,
%load had to be poison if %v was poison).  Therefore, we can conclude that in
current semantics, loads already widen poison from bit-representation into
value-wise representation, which makes the insertion of a load-store round-trip
illegal in general.

Please let me know what you think. I may be missing some important detail.

Thanks,

Nuno

From: Chandler Carruth [mailto:chandlerc at google.com] 
Sent: 4 de novembro de 2016 17:35
Subject: Re: [llvm-dev] RFC: Killing undef and spreading poison

On Wed, Oct 19, 2016 at 4:52 AM Nuno Lopes <nuno.lopes at ist.utl.pt
<mailto:nuno.lopes at ist.utl.pt> > wrote:
> I'm still digesting the proposal in its entirety, but one point I want
to call out here:
>
>> On Tue, Oct 18, 2016 at 1:39 PM Friedman, Eli via llvm-dev wrote:
>> Instcombine currently transforms "memcpy(dst, src, 4)" to
"load i32 src;
>> store i32 dst".  I guess that's illegal under this model?  How
about the
>> related transform "memcpy(dst, src, 4)" to "load <4 x
i8> src; store <4
>> x i8> dst"?  (SROA also performs similar transforms.)
>
> I think it is really important that memcpy and load/store pair be
equivalent. If this is in fact true about the proposal, I would want to really
dig into why and whether anything could be done to make them equivalent.
Memcpy does a byte-by-byte copy. So if one of the bits is poison then only the
byte containing that bit becomes poison.
Therefore, memcpy(x, y, 1) is equivalent to load i8.  But memcpy(x,y,4) is not
equivalent to "load i32" since load makes the whole value poison if
any of the bits is poison.

That is not the semantic model that LLVM uses today. Your proposal is changing
this, and it is an *incredibly* fundamental part of the optimizer.

LLVM, very fundamentally, models an integer, a vector of bytes, and a vector of
bits as absolutely identical to the underlying storage in memory. Changing this
would completely break SROA, various parts of mem2reg and reg2mem, and all load
widening.

I understand you have reasons here, but this is a much bigger change than the
undef vs. poison part. ;] I'm currently deeply concerned by this change and
would strongly prefer looking at alternatives.

The alternative as given by Eli is to use "load <4 x i8>". 
Since now we are loading 4 separate values, poison does not extend past the byte
boundary.  When load is lowered, you should get exactly the same code as with
"load i32", though.
So the hope is that there's no diff at assembly level.

Note that vectors in LLVM are both quite limited and will have likely very
different representations in the assembly.

You're essentially conferring a *type* onto *memory*. That is not currently
LLVM's model. Instead, the types in LLVM are tied to *operations*.

As a consequence, when LLVM sees a vector type, it canonicalizes, optimizes, and
code generates to facilitate vector operations on values of vector type. So for
example, a vector of 32 i1s is likely to be put into a vector mask register.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161108/ffb4a8a1/attachment.html>

Friedman, Eli via llvm-dev

2016-Nov-08 20:48 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

On 11/8/2016 10:47 AM, Nuno Lopes wrote:>
> Hi Chandler,
>
> Thanks for your feedback!
>
> I’m still trying to get to the bottom of your concerns and how to best 
> address them.
>
> As far as I understand, your main concern is that the following should 
> be equivalent to a no-op (thanks Sanjoy for the example!):
>
> i32* ptr = ...
>
> %tmp = load i32, i32* %ptr
>
> store i32 %tmp, i32* %ptr
>
> So introducing a store of the loaded value should be ok, but under our 
> proposal it’s not.  Just to clarify: is this actually your main concern?
>
> I believe many parts of LLVM already interpret poison as a per-value 
> attribute, including when it comes to loads.  Take the following 
> function as an example:
>
> define i16 @g(i8 %in) {
>
>   %a = alloca i32
>
>   %v = add nsw i8 127, %in
>
>   %ptr = bitcast i32* %a to i8*
>
>   %ptr1 = getelementptr i8, i8* %ptr, i32 1
>
>   store i8 %v, i8* %ptr1
>
>   %ptr16 = bitcast i32* %a to i16*
>
>   %load = load i16, i16* %ptr16
>
>   ret i16 %load
>
> }
>
> This program stores 8 bits, and leaves the remaining 24 bits 
> uninitialized. It then loads 16 bits, half initialized to %v, half 
> uninitialized. SROA transforms the above function to:
>
> define i16 @g(i8 %in) {
>
>   %v = add nsw i8 127, %in
>
>   %1 = zext i8 %v to i16
>
>   %2 = shl i16 %1, 8
>
>   %3 = and i16 undef, 255
>
>   %4 = or i16 %3, %2
>
>   ret i16 %4
>
> }
>
> SROA gets rid of the alloca/store/load instructions and replaces those 
> with an OR of undef and shifted %v.
>
> If I call g() with %in=1, then the addition overflows and %v is 
> poison.  Under LLVM semantics, all instructions return poison if given 
> poison as input. Therefore the whole function can be folded to “ret 
> i16 poison”(if we had a poison value in IR, or otherwise fold to “add 
> nsw i16 INT_MAX, 1”).
>
> So, to justify the correctness of SROA, the load in the original 
> program had to be widening the poison from the stored i8 value and 
> return a i16 poison (i.e., %load had to be poison if %v was poison).  
> Therefore, we can conclude that in current semantics, loads already 
> widen poison from bit-representation into value-wise representation, 
> which makes the insertion of a load-store round-trip illegal in general.
>
> Please let me know what you think. I may be missing some important detail.
>
Well, I wouldn't pay too much attention to what LangRef currently says 
about poison, considering that the current model isn't actually consistent.

It's possible we could tweak your model a little to make poison a 
bit-wise property, so an i32 can be partially poison, sort of like how 
undef currently works.  Then we can consider on an operation by 
operation basis how it propagates poison bits.  In other words, let's 
pretend every Value has type <n x i1> for the purpose of poison until we 
actually do math on it.

Under this model, we can say a non-vector icmp with any poisoned bit as 
input returns poison, but a load produces a value whose bits are poison 
only where the input is poison.  We can say "freeze(load i32)" does
the
same thing as "bitcast(freeze(load <32 x i1>) to i32)".  We can
define
lshr and trunc to allow merging two i8 loads into an i16 load.  Not sure 
what the rules for arithmetic and logic operations would be off the top 
of my head; there are some tradeoffs here.

I'm not sure this is the right approach (this model is more complicated, 
and we probably lose a bit of optimization power in some cases), but 
it's definitely nicer from the perspective of SROA and GVN.

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161108/9d3a237a/attachment.html>

Sanjoy Das via llvm-dev

2016-Nov-08 23:32 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

Hi Nuno, Chandler,

Nuno Lopes via llvm-dev wrote:
 > This program stores 8 bits, and leaves the remaining 24 bits
 > uninitialized. It then loads 16 bits, half initialized to %v, half
 > uninitialized. SROA transforms the above function to:
 >
 > define i16 @g(i8 %in) {
 >    %v = add nsw i8 127, %in
 >    %1 = zext i8 %v to i16
 >    %2 = shl i16 %1, 8
 >    %3 = and i16 undef, 255
 >    %4 = or i16 %3, %2
 >    ret i16 %4
 > }

This program above returns i16 poison only if "shl i16 poison, 8" is a
full value poison.  Whether that is the case today or not is anybody's
guess (as Eli says, we should not rely too much on semantics that are
known to be broken), so the important question is whether "shl i16
poison, 8" _can_ be defined to be poison only on the higher 8 bits in
a world with bitwise poison.  If we could make that happen, we'd also
get some other fringe benefits, like ComputeKnownBits would be able to
fearlessly return "the low 8 bits of `shl i16 %val, 8` are 0" as
opposed to "the low 8 bits of `shl i16 %val, 8` are 0 or poison".

One problem with shl always generating non-poison zeroes in the lower
bits is that it then is not equivalent to multiplication.  This breaks
optimizations like:

   %t = ...
   %q = shl i32 %t, 6

=/=>

   %t = ...
   %q = mul i32 %t, 64

and (probably more meaningful)

   %t = ...
   %q0 = shl i32 %t, 6
   %q1 = mul i32 %q0, 3

=/=>

   %t = ...
   ;; %q0 = shl i32 %t, 6
   %q1 = mul i32 %q0, (3 * 64)

and (probably even worse) it also breaks reassociation:

   %t = ...
   %q0 = shl i32 %t, %P
   %q1 = mul i32 %q0, %Q

(No other uses of %q0, but there are uses of %q1)

=/=>

   %t = ...
   %q0 = mul i32 %t, %Q
   %q1 = shl i32 %q1, %P

I'm especially worried about this last re-association example since
SCEV does things like this internally when canonicalizing expressions,
and I'm not convinced that we can change it to tiptoe around these
problematic cases.

-- Sanjoy

Apparently Analagous Threads

Search for more seemingly similar threads

llvm dev - Nov 2016 - RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

Apparently Analagous Threads