thr3ads.net - llvm dev - [llvm-dev] RFC: Killing undef and spreading poison [Oct 2016]

If this information is useful, please help other people find it:
Share via:

Chandler Carruth via llvm-dev

2016-Oct-19 10:30 UTC

[llvm-dev] RFC: Killing undef and spreading poison

I'm still digesting the proposal in its entirety, but one point I want to
call out here:

On Tue, Oct 18, 2016 at 1:39 PM Friedman, Eli via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Instcombine currently transforms "memcpy(dst, src, 4)" to
"load i32 src;
> store i32 dst".  I guess that's illegal under this model?  How
about the
> related transform "memcpy(dst, src, 4)" to "load <4 x
i8> src; store <4
> x i8> dst"?  (SROA also performs similar transforms.)
>
I think it is really important that memcpy and load/store pair be
equivalent. If this is in fact true about the proposal, I would want to
really dig into why and whether anything could be done to make them
equivalent.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161019/bd8ea53e/attachment.html>

Nuno Lopes via llvm-dev

2016-Oct-19 11:52 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

> I'm still digesting the proposal in its entirety, but one point I want
to call out here:
>
>> On Tue, Oct 18, 2016 at 1:39 PM Friedman, Eli via llvm-dev wrote:
>> Instcombine currently transforms "memcpy(dst, src, 4)" to
"load i32 src;
>> store i32 dst".  I guess that's illegal under this model?  How
about the
>> related transform "memcpy(dst, src, 4)" to "load <4 x
i8> src; store <4
>> x i8> dst"?  (SROA also performs similar transforms.)
> 
> I think it is really important that memcpy and load/store pair be
equivalent. If this is in fact true about the proposal, I would want to really
dig into why and whether anything could be done to make them equivalent.
Memcpy does a byte-by-byte copy. So if one of the bits is poison then only the
byte containing that bit becomes poison.
Therefore, memcpy(x, y, 1) is equivalent to load i8.  But memcpy(x,y,4) is not
equivalent to "load i32" since load makes the whole value poison if
any of the bits is poison.
The alternative as given by Eli is to use "load <4 x i8>". 
Since now we are loading 4 separate values, poison does not extend past the byte
boundary.  When load is lowered, you should get exactly the same code as with
"load i32", though.
So the hope is that there's no diff at assembly level.

We were supposed to be doing this already btw; there's no change regarding
poison semantics for load/store. LLVM is risking miscompilation ATM (when doing
this memcpy -> load/store transformation).

Nuno

Bruce Hoult via llvm-dev

2016-Oct-19 12:44 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

On Wed, Oct 19, 2016 at 2:52 PM, Nuno Lopes via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Memcpy does a byte-by-byte copy. So if one of the bits is poison then only
> the byte containing that bit becomes poison.
> Therefore, memcpy(x, y, 1) is equivalent to load i8.  But memcpy(x,y,4) is
> not equivalent to "load i32" since load makes the whole value
poison if any
> of the bits is poison.
> The alternative as given by Eli is to use "load <4 x i8>". 
Since now we
> are loading 4 separate values, poison does not extend past the byte
> boundary.  When load is lowered, you should get exactly the same code as
> with "load i32", though.
> So the hope is that there's no diff at assembly level.
>
I'm curious. Where is it defined that memcpy is byte by byte not, for
example, bit by bit? Why is the destination not identical to the source,
with exactly the same bits poison?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161019/5bbc9478/attachment.html>

Chandler Carruth via llvm-dev

2016-Nov-04 17:34 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

On Wed, Oct 19, 2016 at 4:52 AM Nuno Lopes <nuno.lopes at ist.utl.pt>
wrote:
> > I'm still digesting the proposal in its entirety, but one point I
want
> to call out here:
> >
> >> On Tue, Oct 18, 2016 at 1:39 PM Friedman, Eli via llvm-dev wrote:
> >> Instcombine currently transforms "memcpy(dst, src, 4)"
to "load i32 src;
> >> store i32 dst".  I guess that's illegal under this model?
How about the
> >> related transform "memcpy(dst, src, 4)" to "load
<4 x i8> src; store <4
> >> x i8> dst"?  (SROA also performs similar transforms.)
> >
> > I think it is really important that memcpy and load/store pair be
> equivalent. If this is in fact true about the proposal, I would want to
> really dig into why and whether anything could be done to make them
> equivalent.
>
> Memcpy does a byte-by-byte copy. So if one of the bits is poison then only
> the byte containing that bit becomes poison.
> Therefore, memcpy(x, y, 1) is equivalent to load i8.  But memcpy(x,y,4) is
> not equivalent to "load i32" since load makes the whole value
poison if any
> of the bits is poison.
>
That is not the semantic model that LLVM uses today. Your proposal is
changing this, and it is an *incredibly* fundamental part of the optimizer.

LLVM, very fundamentally, models an integer, a vector of bytes, and a
vector of bits as absolutely identical to the underlying storage in memory.
Changing this would completely break SROA, various parts of mem2reg and
reg2mem, and all load widening.

I understand you have reasons here, but this is a much bigger change than
the undef vs. poison part. ;] I'm currently deeply concerned by this change
and would strongly prefer looking at alternatives.

> The alternative as given by Eli is to use "load <4 x i8>". 
Since now we
> are loading 4 separate values, poison does not extend past the byte
> boundary.  When load is lowered, you should get exactly the same code as
> with "load i32", though.
> So the hope is that there's no diff at assembly level.
>
Note that vectors in LLVM are both quite limited and will have likely very
different representations in the assembly.

You're essentially conferring a *type* onto *memory*. That is not currently
LLVM's model. Instead, the types in LLVM are tied to *operations*.

As a consequence, when LLVM sees a vector type, it canonicalizes,
optimizes, and code generates to facilitate vector operations on values of
vector type. So for example, a vector of 32 i1s is likely to be put into a
vector mask register.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161104/d43cfd57/attachment.html>

llvm dev - Oct 2016 - RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison