Chandler Carruth via llvm-dev
2016-Oct-19 10:30 UTC
[llvm-dev] RFC: Killing undef and spreading poison
I'm still digesting the proposal in its entirety, but one point I want to call out here: On Tue, Oct 18, 2016 at 1:39 PM Friedman, Eli via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Instcombine currently transforms "memcpy(dst, src, 4)" to "load i32 src; > store i32 dst". I guess that's illegal under this model? How about the > related transform "memcpy(dst, src, 4)" to "load <4 x i8> src; store <4 > x i8> dst"? (SROA also performs similar transforms.) >I think it is really important that memcpy and load/store pair be equivalent. If this is in fact true about the proposal, I would want to really dig into why and whether anything could be done to make them equivalent. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161019/bd8ea53e/attachment.html>
Nuno Lopes via llvm-dev
2016-Oct-19 11:52 UTC
[llvm-dev] RFC: Killing undef and spreading poison
> I'm still digesting the proposal in its entirety, but one point I want to call out here: > >> On Tue, Oct 18, 2016 at 1:39 PM Friedman, Eli via llvm-dev wrote: >> Instcombine currently transforms "memcpy(dst, src, 4)" to "load i32 src; >> store i32 dst". I guess that's illegal under this model? How about the >> related transform "memcpy(dst, src, 4)" to "load <4 x i8> src; store <4 >> x i8> dst"? (SROA also performs similar transforms.) > > I think it is really important that memcpy and load/store pair be equivalent. If this is in fact true about the proposal, I would want to really dig into why and whether anything could be done to make them equivalent.Memcpy does a byte-by-byte copy. So if one of the bits is poison then only the byte containing that bit becomes poison. Therefore, memcpy(x, y, 1) is equivalent to load i8. But memcpy(x,y,4) is not equivalent to "load i32" since load makes the whole value poison if any of the bits is poison. The alternative as given by Eli is to use "load <4 x i8>". Since now we are loading 4 separate values, poison does not extend past the byte boundary. When load is lowered, you should get exactly the same code as with "load i32", though. So the hope is that there's no diff at assembly level. We were supposed to be doing this already btw; there's no change regarding poison semantics for load/store. LLVM is risking miscompilation ATM (when doing this memcpy -> load/store transformation). Nuno
Bruce Hoult via llvm-dev
2016-Oct-19 12:44 UTC
[llvm-dev] RFC: Killing undef and spreading poison
On Wed, Oct 19, 2016 at 2:52 PM, Nuno Lopes via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Memcpy does a byte-by-byte copy. So if one of the bits is poison then only > the byte containing that bit becomes poison. > Therefore, memcpy(x, y, 1) is equivalent to load i8. But memcpy(x,y,4) is > not equivalent to "load i32" since load makes the whole value poison if any > of the bits is poison. > The alternative as given by Eli is to use "load <4 x i8>". Since now we > are loading 4 separate values, poison does not extend past the byte > boundary. When load is lowered, you should get exactly the same code as > with "load i32", though. > So the hope is that there's no diff at assembly level. >I'm curious. Where is it defined that memcpy is byte by byte not, for example, bit by bit? Why is the destination not identical to the source, with exactly the same bits poison? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161019/5bbc9478/attachment.html>
Chandler Carruth via llvm-dev
2016-Nov-04 17:34 UTC
[llvm-dev] RFC: Killing undef and spreading poison
On Wed, Oct 19, 2016 at 4:52 AM Nuno Lopes <nuno.lopes at ist.utl.pt> wrote:> > I'm still digesting the proposal in its entirety, but one point I want > to call out here: > > > >> On Tue, Oct 18, 2016 at 1:39 PM Friedman, Eli via llvm-dev wrote: > >> Instcombine currently transforms "memcpy(dst, src, 4)" to "load i32 src; > >> store i32 dst". I guess that's illegal under this model? How about the > >> related transform "memcpy(dst, src, 4)" to "load <4 x i8> src; store <4 > >> x i8> dst"? (SROA also performs similar transforms.) > > > > I think it is really important that memcpy and load/store pair be > equivalent. If this is in fact true about the proposal, I would want to > really dig into why and whether anything could be done to make them > equivalent. > > Memcpy does a byte-by-byte copy. So if one of the bits is poison then only > the byte containing that bit becomes poison. > Therefore, memcpy(x, y, 1) is equivalent to load i8. But memcpy(x,y,4) is > not equivalent to "load i32" since load makes the whole value poison if any > of the bits is poison. >That is not the semantic model that LLVM uses today. Your proposal is changing this, and it is an *incredibly* fundamental part of the optimizer. LLVM, very fundamentally, models an integer, a vector of bytes, and a vector of bits as absolutely identical to the underlying storage in memory. Changing this would completely break SROA, various parts of mem2reg and reg2mem, and all load widening. I understand you have reasons here, but this is a much bigger change than the undef vs. poison part. ;] I'm currently deeply concerned by this change and would strongly prefer looking at alternatives.> The alternative as given by Eli is to use "load <4 x i8>". Since now we > are loading 4 separate values, poison does not extend past the byte > boundary. When load is lowered, you should get exactly the same code as > with "load i32", though. > So the hope is that there's no diff at assembly level. >Note that vectors in LLVM are both quite limited and will have likely very different representations in the assembly. You're essentially conferring a *type* onto *memory*. That is not currently LLVM's model. Instead, the types in LLVM are tied to *operations*. As a consequence, when LLVM sees a vector type, it canonicalizes, optimizes, and code generates to facilitate vector operations on values of vector type. So for example, a vector of 32 i1s is likely to be put into a vector mask register. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161104/d43cfd57/attachment.html>