Chandler Carruth via llvm-dev
2016-Nov-04 17:34 UTC
[llvm-dev] RFC: Killing undef and spreading poison
On Wed, Oct 19, 2016 at 4:52 AM Nuno Lopes <nuno.lopes at ist.utl.pt> wrote:> > I'm still digesting the proposal in its entirety, but one point I want > to call out here: > > > >> On Tue, Oct 18, 2016 at 1:39 PM Friedman, Eli via llvm-dev wrote: > >> Instcombine currently transforms "memcpy(dst, src, 4)" to "load i32 src; > >> store i32 dst". I guess that's illegal under this model? How about the > >> related transform "memcpy(dst, src, 4)" to "load <4 x i8> src; store <4 > >> x i8> dst"? (SROA also performs similar transforms.) > > > > I think it is really important that memcpy and load/store pair be > equivalent. If this is in fact true about the proposal, I would want to > really dig into why and whether anything could be done to make them > equivalent. > > Memcpy does a byte-by-byte copy. So if one of the bits is poison then only > the byte containing that bit becomes poison. > Therefore, memcpy(x, y, 1) is equivalent to load i8. But memcpy(x,y,4) is > not equivalent to "load i32" since load makes the whole value poison if any > of the bits is poison. >That is not the semantic model that LLVM uses today. Your proposal is changing this, and it is an *incredibly* fundamental part of the optimizer. LLVM, very fundamentally, models an integer, a vector of bytes, and a vector of bits as absolutely identical to the underlying storage in memory. Changing this would completely break SROA, various parts of mem2reg and reg2mem, and all load widening. I understand you have reasons here, but this is a much bigger change than the undef vs. poison part. ;] I'm currently deeply concerned by this change and would strongly prefer looking at alternatives.> The alternative as given by Eli is to use "load <4 x i8>". Since now we > are loading 4 separate values, poison does not extend past the byte > boundary. When load is lowered, you should get exactly the same code as > with "load i32", though. > So the hope is that there's no diff at assembly level. >Note that vectors in LLVM are both quite limited and will have likely very different representations in the assembly. You're essentially conferring a *type* onto *memory*. That is not currently LLVM's model. Instead, the types in LLVM are tied to *operations*. As a consequence, when LLVM sees a vector type, it canonicalizes, optimizes, and code generates to facilitate vector operations on values of vector type. So for example, a vector of 32 i1s is likely to be put into a vector mask register. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161104/d43cfd57/attachment.html>
Nuno Lopes via llvm-dev
2016-Nov-08 18:47 UTC
[llvm-dev] RFC: Killing undef and spreading poison
Hi Chandler, Thanks for your feedback! I’m still trying to get to the bottom of your concerns and how to best address them. As far as I understand, your main concern is that the following should be equivalent to a no-op (thanks Sanjoy for the example!): i32* ptr = ... %tmp = load i32, i32* %ptr store i32 %tmp, i32* %ptr So introducing a store of the loaded value should be ok, but under our proposal it’s not. Just to clarify: is this actually your main concern? I believe many parts of LLVM already interpret poison as a per-value attribute, including when it comes to loads. Take the following function as an example: define i16 @g(i8 %in) { %a = alloca i32 %v = add nsw i8 127, %in %ptr = bitcast i32* %a to i8* %ptr1 = getelementptr i8, i8* %ptr, i32 1 store i8 %v, i8* %ptr1 %ptr16 = bitcast i32* %a to i16* %load = load i16, i16* %ptr16 ret i16 %load } This program stores 8 bits, and leaves the remaining 24 bits uninitialized. It then loads 16 bits, half initialized to %v, half uninitialized. SROA transforms the above function to: define i16 @g(i8 %in) { %v = add nsw i8 127, %in %1 = zext i8 %v to i16 %2 = shl i16 %1, 8 %3 = and i16 undef, 255 %4 = or i16 %3, %2 ret i16 %4 } SROA gets rid of the alloca/store/load instructions and replaces those with an OR of undef and shifted %v. If I call g() with %in=1, then the addition overflows and %v is poison. Under LLVM semantics, all instructions return poison if given poison as input. Therefore the whole function can be folded to “ret i16 poison” (if we had a poison value in IR, or otherwise fold to “add nsw i16 INT_MAX, 1”). So, to justify the correctness of SROA, the load in the original program had to be widening the poison from the stored i8 value and return a i16 poison (i.e., %load had to be poison if %v was poison). Therefore, we can conclude that in current semantics, loads already widen poison from bit-representation into value-wise representation, which makes the insertion of a load-store round-trip illegal in general. Please let me know what you think. I may be missing some important detail. Thanks, Nuno From: Chandler Carruth [mailto:chandlerc at google.com] Sent: 4 de novembro de 2016 17:35 Subject: Re: [llvm-dev] RFC: Killing undef and spreading poison On Wed, Oct 19, 2016 at 4:52 AM Nuno Lopes <nuno.lopes at ist.utl.pt <mailto:nuno.lopes at ist.utl.pt> > wrote:> I'm still digesting the proposal in its entirety, but one point I want to call out here: > >> On Tue, Oct 18, 2016 at 1:39 PM Friedman, Eli via llvm-dev wrote: >> Instcombine currently transforms "memcpy(dst, src, 4)" to "load i32 src; >> store i32 dst". I guess that's illegal under this model? How about the >> related transform "memcpy(dst, src, 4)" to "load <4 x i8> src; store <4 >> x i8> dst"? (SROA also performs similar transforms.) > > I think it is really important that memcpy and load/store pair be equivalent. If this is in fact true about the proposal, I would want to really dig into why and whether anything could be done to make them equivalent.Memcpy does a byte-by-byte copy. So if one of the bits is poison then only the byte containing that bit becomes poison. Therefore, memcpy(x, y, 1) is equivalent to load i8. But memcpy(x,y,4) is not equivalent to "load i32" since load makes the whole value poison if any of the bits is poison. That is not the semantic model that LLVM uses today. Your proposal is changing this, and it is an *incredibly* fundamental part of the optimizer. LLVM, very fundamentally, models an integer, a vector of bytes, and a vector of bits as absolutely identical to the underlying storage in memory. Changing this would completely break SROA, various parts of mem2reg and reg2mem, and all load widening. I understand you have reasons here, but this is a much bigger change than the undef vs. poison part. ;] I'm currently deeply concerned by this change and would strongly prefer looking at alternatives. The alternative as given by Eli is to use "load <4 x i8>". Since now we are loading 4 separate values, poison does not extend past the byte boundary. When load is lowered, you should get exactly the same code as with "load i32", though. So the hope is that there's no diff at assembly level. Note that vectors in LLVM are both quite limited and will have likely very different representations in the assembly. You're essentially conferring a *type* onto *memory*. That is not currently LLVM's model. Instead, the types in LLVM are tied to *operations*. As a consequence, when LLVM sees a vector type, it canonicalizes, optimizes, and code generates to facilitate vector operations on values of vector type. So for example, a vector of 32 i1s is likely to be put into a vector mask register. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161108/ffb4a8a1/attachment.html>
Friedman, Eli via llvm-dev
2016-Nov-08 20:48 UTC
[llvm-dev] RFC: Killing undef and spreading poison
On 11/8/2016 10:47 AM, Nuno Lopes wrote:> > Hi Chandler, > > Thanks for your feedback! > > I’m still trying to get to the bottom of your concerns and how to best > address them. > > As far as I understand, your main concern is that the following should > be equivalent to a no-op (thanks Sanjoy for the example!): > > i32* ptr = ... > > %tmp = load i32, i32* %ptr > > store i32 %tmp, i32* %ptr > > So introducing a store of the loaded value should be ok, but under our > proposal it’s not. Just to clarify: is this actually your main concern? > > I believe many parts of LLVM already interpret poison as a per-value > attribute, including when it comes to loads. Take the following > function as an example: > > define i16 @g(i8 %in) { > > %a = alloca i32 > > %v = add nsw i8 127, %in > > %ptr = bitcast i32* %a to i8* > > %ptr1 = getelementptr i8, i8* %ptr, i32 1 > > store i8 %v, i8* %ptr1 > > %ptr16 = bitcast i32* %a to i16* > > %load = load i16, i16* %ptr16 > > ret i16 %load > > } > > This program stores 8 bits, and leaves the remaining 24 bits > uninitialized. It then loads 16 bits, half initialized to %v, half > uninitialized. SROA transforms the above function to: > > define i16 @g(i8 %in) { > > %v = add nsw i8 127, %in > > %1 = zext i8 %v to i16 > > %2 = shl i16 %1, 8 > > %3 = and i16 undef, 255 > > %4 = or i16 %3, %2 > > ret i16 %4 > > } > > SROA gets rid of the alloca/store/load instructions and replaces those > with an OR of undef and shifted %v. > > If I call g() with %in=1, then the addition overflows and %v is > poison. Under LLVM semantics, all instructions return poison if given > poison as input. Therefore the whole function can be folded to “ret > i16 poison”(if we had a poison value in IR, or otherwise fold to “add > nsw i16 INT_MAX, 1”). > > So, to justify the correctness of SROA, the load in the original > program had to be widening the poison from the stored i8 value and > return a i16 poison (i.e., %load had to be poison if %v was poison). > Therefore, we can conclude that in current semantics, loads already > widen poison from bit-representation into value-wise representation, > which makes the insertion of a load-store round-trip illegal in general. > > Please let me know what you think. I may be missing some important detail. >Well, I wouldn't pay too much attention to what LangRef currently says about poison, considering that the current model isn't actually consistent. It's possible we could tweak your model a little to make poison a bit-wise property, so an i32 can be partially poison, sort of like how undef currently works. Then we can consider on an operation by operation basis how it propagates poison bits. In other words, let's pretend every Value has type <n x i1> for the purpose of poison until we actually do math on it. Under this model, we can say a non-vector icmp with any poisoned bit as input returns poison, but a load produces a value whose bits are poison only where the input is poison. We can say "freeze(load i32)" does the same thing as "bitcast(freeze(load <32 x i1>) to i32)". We can define lshr and trunc to allow merging two i8 loads into an i16 load. Not sure what the rules for arithmetic and logic operations would be off the top of my head; there are some tradeoffs here. I'm not sure this is the right approach (this model is more complicated, and we probably lose a bit of optimization power in some cases), but it's definitely nicer from the perspective of SROA and GVN. -Eli -- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161108/9d3a237a/attachment.html>
Sanjoy Das via llvm-dev
2016-Nov-08 23:32 UTC
[llvm-dev] RFC: Killing undef and spreading poison
Hi Nuno, Chandler, Nuno Lopes via llvm-dev wrote: > This program stores 8 bits, and leaves the remaining 24 bits > uninitialized. It then loads 16 bits, half initialized to %v, half > uninitialized. SROA transforms the above function to: > > define i16 @g(i8 %in) { > %v = add nsw i8 127, %in > %1 = zext i8 %v to i16 > %2 = shl i16 %1, 8 > %3 = and i16 undef, 255 > %4 = or i16 %3, %2 > ret i16 %4 > } This program above returns i16 poison only if "shl i16 poison, 8" is a full value poison. Whether that is the case today or not is anybody's guess (as Eli says, we should not rely too much on semantics that are known to be broken), so the important question is whether "shl i16 poison, 8" _can_ be defined to be poison only on the higher 8 bits in a world with bitwise poison. If we could make that happen, we'd also get some other fringe benefits, like ComputeKnownBits would be able to fearlessly return "the low 8 bits of `shl i16 %val, 8` are 0" as opposed to "the low 8 bits of `shl i16 %val, 8` are 0 or poison". One problem with shl always generating non-poison zeroes in the lower bits is that it then is not equivalent to multiplication. This breaks optimizations like: %t = ... %q = shl i32 %t, 6 =/=> %t = ... %q = mul i32 %t, 64 and (probably more meaningful) %t = ... %q0 = shl i32 %t, 6 %q1 = mul i32 %q0, 3 =/=> %t = ... ;; %q0 = shl i32 %t, 6 %q1 = mul i32 %q0, (3 * 64) and (probably even worse) it also breaks reassociation: %t = ... %q0 = shl i32 %t, %P %q1 = mul i32 %q0, %Q (No other uses of %q0, but there are uses of %q1) =/=> %t = ... %q0 = mul i32 %t, %Q %q1 = shl i32 %q1, %P I'm especially worried about this last re-association example since SCEV does things like this internally when canonicalizing expressions, and I'm not convinced that we can change it to tiptoe around these problematic cases. -- Sanjoy