Nuno Lopes via llvm-dev
2016-Nov-09 16:44 UTC
[llvm-dev] RFC: Killing undef and spreading poison
> On 11/8/2016 3:32 PM, Sanjoy Das wrote: >> Hi Nuno, Chandler, >> >> Nuno Lopes via llvm-dev wrote: >> > This program stores 8 bits, and leaves the remaining 24 bits >> > uninitialized. It then loads 16 bits, half initialized to %v, half >> > uninitialized. SROA transforms the above function to: >> > >> > define i16 @g(i8 %in) { >> > %v = add nsw i8 127, %in >> > %1 = zext i8 %v to i16 >> > %2 = shl i16 %1, 8 >> > %3 = and i16 undef, 255 >> > %4 = or i16 %3, %2 >> > ret i16 %4 >> > } >> >> This program above returns i16 poison only if "shl i16 poison, 8" is a >> full value poison. Whether that is the case today or not is anybody's >> guess (as Eli says, we should not rely too much on semantics that are >> known to be broken), so the important question is whether "shl i16 >> poison, 8" _can_ be defined to be poison only on the higher 8 bits in >> a world with bitwise poison. If we could make that happen, we'd also >> get some other fringe benefits, like ComputeKnownBits would be able to >> fearlessly return "the low 8 bits of `shl i16 %val, 8` are 0" as >> opposed to "the low 8 bits of `shl i16 %val, 8` are 0 or poison". >> >> One problem with shl always generating non-poison zeroes in the lower >> bits is that it then is not equivalent to multiplication. This breaks >> optimizations like: >> >> %t = ... >> %q = shl i32 %t, 6 >> >> =/=> >> >> %t = ... >> %q = mul i32 %t, 64 >> >> and (probably more meaningful) >> >> %t = ... >> %q0 = shl i32 %t, 6 >> %q1 = mul i32 %q0, 3 >> >> =/=> >> >> %t = ... >> ;; %q0 = shl i32 %t, 6 >> %q1 = mul i32 %q0, (3 * 64) >> >> >> and (probably even worse) it also breaks reassociation: >> >> %t = ... >> %q0 = shl i32 %t, %P >> %q1 = mul i32 %q0, %Q >> >> (No other uses of %q0, but there are uses of %q1) >> >> =/=> >> >> %t = ... >> %q0 = mul i32 %t, %Q >> %q1 = shl i32 %q1, %P >> >> >> I'm especially worried about this last re-association example since >> SCEV does things like this internally when canonicalizing expressions, >> and I'm not convinced that we can change it to tiptoe around these >> problematic cases. > > Well, we could say non-nsw add and mul are actually "bitwise" operations, so "mul i32 poison, 2" is guaranteed to have its bottom bit zero (but "mul nsw i32 poison, 2" is poison). Of course, there's a limit to how far we can go in that direction, or we just end up with the old notion of undef. Off the top of my head, I'm not exactly sure where that line is.Right, we could try to make it work. I don't have a good model for bit-wise poison yet, but it might be possible. One of the things it will break is rewrites into comparisons, since at least comparisons will need to return poison if any bit of the inputs is poison. So the direction of "arithmetic -> comparison" gets broken (InstCombine has a few of these). But before we try to make bit-wise poison work, I would like to understand what are the deficiencies of value-wise poison. Is there any major issue? I'm asking because value-wise poison seems to be a much simpler concept, so if it works ok it seems to be preferable. During the meeting last week, Richard Smith proposed that instead we add a "load not_poison %ptr" (like load atomic) instruction that would be equivalent to load+freeze+bitcast, because he is concerned that some C++ stuff needs to be lowered to such a load. This load then becomes tricky to move around (e.g., cannot be sank to inside loops), but there are options to attenuate that problem if necessary. Perhaps this construct would make John McCall happy as well (re. his question during the talk last week)? Thanks, Nuno
Hal Finkel via llvm-dev
2016-Nov-09 19:09 UTC
[llvm-dev] RFC: Killing undef and spreading poison
----- Original Message -----> From: "Nuno Lopes via llvm-dev" <llvm-dev at lists.llvm.org> > To: "Eli' 'Friedman" <efriedma at codeaurora.org>, "Sanjoy Das" <sanjoy at playingwithpointers.com> > Cc: llvm-dev at lists.llvm.org, "John Regehr" <regehr at cs.utah.edu> > Sent: Wednesday, November 9, 2016 10:44:19 AM > Subject: Re: [llvm-dev] RFC: Killing undef and spreading poison > > > On 11/8/2016 3:32 PM, Sanjoy Das wrote: > >> Hi Nuno, Chandler, > >> > >> Nuno Lopes via llvm-dev wrote: > >> > This program stores 8 bits, and leaves the remaining 24 bits > >> > uninitialized. It then loads 16 bits, half initialized to %v, > >> > half > >> > uninitialized. SROA transforms the above function to: > >> > > >> > define i16 @g(i8 %in) { > >> > %v = add nsw i8 127, %in > >> > %1 = zext i8 %v to i16 > >> > %2 = shl i16 %1, 8 > >> > %3 = and i16 undef, 255 > >> > %4 = or i16 %3, %2 > >> > ret i16 %4 > >> > } > >> > >> This program above returns i16 poison only if "shl i16 poison, 8" > >> is a > >> full value poison. Whether that is the case today or not is > >> anybody's > >> guess (as Eli says, we should not rely too much on semantics that > >> are > >> known to be broken), so the important question is whether "shl i16 > >> poison, 8" _can_ be defined to be poison only on the higher 8 bits > >> in > >> a world with bitwise poison. If we could make that happen, we'd > >> also > >> get some other fringe benefits, like ComputeKnownBits would be > >> able to > >> fearlessly return "the low 8 bits of `shl i16 %val, 8` are 0" as > >> opposed to "the low 8 bits of `shl i16 %val, 8` are 0 or poison". > >> > >> One problem with shl always generating non-poison zeroes in the > >> lower > >> bits is that it then is not equivalent to multiplication. This > >> breaks > >> optimizations like: > >> > >> %t = ... > >> %q = shl i32 %t, 6 > >> > >> =/=> > >> > >> %t = ... > >> %q = mul i32 %t, 64 > >> > >> and (probably more meaningful) > >> > >> %t = ... > >> %q0 = shl i32 %t, 6 > >> %q1 = mul i32 %q0, 3 > >> > >> =/=> > >> > >> %t = ... > >> ;; %q0 = shl i32 %t, 6 > >> %q1 = mul i32 %q0, (3 * 64) > >> > >> > >> and (probably even worse) it also breaks reassociation: > >> > >> %t = ... > >> %q0 = shl i32 %t, %P > >> %q1 = mul i32 %q0, %Q > >> > >> (No other uses of %q0, but there are uses of %q1) > >> > >> =/=> > >> > >> %t = ... > >> %q0 = mul i32 %t, %Q > >> %q1 = shl i32 %q1, %P > >> > >> > >> I'm especially worried about this last re-association example > >> since > >> SCEV does things like this internally when canonicalizing > >> expressions, > >> and I'm not convinced that we can change it to tiptoe around these > >> problematic cases. > > > > Well, we could say non-nsw add and mul are actually "bitwise" > > operations, so "mul i32 poison, 2" is guaranteed to have its > > bottom bit zero (but "mul nsw i32 poison, 2" is poison). Of > > course, there's a limit to how far we can go in that direction, or > > we just end up with the old notion of undef. Off the top of my > > head, I'm not exactly sure where that line is. > > Right, we could try to make it work. I don't have a good model for > bit-wise poison yet, but it might be possible. One of the things it > will break is rewrites into comparisons, since at least comparisons > will need to return poison if any bit of the inputs is poison. So > the direction of "arithmetic -> comparison" gets broken (InstCombine > has a few of these). > > But before we try to make bit-wise poison work, I would like to > understand what are the deficiencies of value-wise poison. Is there > any major issue? I'm asking because value-wise poison seems to be a > much simpler concept, so if it works ok it seems to be preferable. > > During the meeting last week, Richard Smith proposed that instead we > add a "load not_poison %ptr" (like load atomic) instruction that > would be equivalent to load+freeze+bitcast, because he is concerned > that some C++ stuff needs to be lowered to such a load. This load > then becomes tricky to move around (e.g., cannot be sank to inside > loops), but there are options to attenuate that problem if > necessary.Wouldn't we need to do this for all loads? Does this solve the load/store <-> memcpy equivalence problem? Thanks again, Hal> Perhaps this construct would make John McCall happy as well (re. his > question during the talk last week)? > > Thanks, > Nuno > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory
Nuno Lopes via llvm-dev
2016-Nov-09 23:13 UTC
[llvm-dev] RFC: Killing undef and spreading poison
>> > On 11/8/2016 3:32 PM, Sanjoy Das wrote: >> >> Hi Nuno, Chandler, >> >> >> >> Nuno Lopes via llvm-dev wrote: >> >> > This program stores 8 bits, and leaves the remaining 24 bits >> >> > uninitialized. It then loads 16 bits, half initialized to %v, >> >> > half >> >> > uninitialized. SROA transforms the above function to: >> >> > >> >> > define i16 @g(i8 %in) { >> >> > %v = add nsw i8 127, %in >> >> > %1 = zext i8 %v to i16 >> >> > %2 = shl i16 %1, 8 >> >> > %3 = and i16 undef, 255 >> >> > %4 = or i16 %3, %2 >> >> > ret i16 %4 >> >> > } >> >> >> >> This program above returns i16 poison only if "shl i16 poison, 8" >> >> is a >> >> full value poison. Whether that is the case today or not is >> >> anybody's >> >> guess (as Eli says, we should not rely too much on semantics that >> >> are >> >> known to be broken), so the important question is whether "shl i16 >> >> poison, 8" _can_ be defined to be poison only on the higher 8 bits >> >> in >> >> a world with bitwise poison. If we could make that happen, we'd >> >> also >> >> get some other fringe benefits, like ComputeKnownBits would be >> >> able to >> >> fearlessly return "the low 8 bits of `shl i16 %val, 8` are 0" as >> >> opposed to "the low 8 bits of `shl i16 %val, 8` are 0 or poison". >> >> >> >> One problem with shl always generating non-poison zeroes in the >> >> lower >> >> bits is that it then is not equivalent to multiplication. This >> >> breaks >> >> optimizations like: >> >> >> >> %t = ... >> >> %q = shl i32 %t, 6 >> >> >> >> =/=> >> >> >> >> %t = ... >> >> %q = mul i32 %t, 64 >> >> >> >> and (probably more meaningful) >> >> >> >> %t = ... >> >> %q0 = shl i32 %t, 6 >> >> %q1 = mul i32 %q0, 3 >> >> >> >> =/=> >> >> >> >> %t = ... >> >> ;; %q0 = shl i32 %t, 6 >> >> %q1 = mul i32 %q0, (3 * 64) >> >> >> >> >> >> and (probably even worse) it also breaks reassociation: >> >> >> >> %t = ... >> >> %q0 = shl i32 %t, %P >> >> %q1 = mul i32 %q0, %Q >> >> >> >> (No other uses of %q0, but there are uses of %q1) >> >> >> >> =/=> >> >> >> >> %t = ... >> >> %q0 = mul i32 %t, %Q >> >> %q1 = shl i32 %q1, %P >> >> >> >> >> >> I'm especially worried about this last re-association example >> >> since >> >> SCEV does things like this internally when canonicalizing >> >> expressions, >> >> and I'm not convinced that we can change it to tiptoe around these >> >> problematic cases. >> > >> > Well, we could say non-nsw add and mul are actually "bitwise" >> > operations, so "mul i32 poison, 2" is guaranteed to have its >> > bottom bit zero (but "mul nsw i32 poison, 2" is poison). Of >> > course, there's a limit to how far we can go in that direction, or >> > we just end up with the old notion of undef. Off the top of my >> > head, I'm not exactly sure where that line is. >> >> Right, we could try to make it work. I don't have a good model for >> bit-wise poison yet, but it might be possible. One of the things it >> will break is rewrites into comparisons, since at least comparisons >> will need to return poison if any bit of the inputs is poison. So >> the direction of "arithmetic -> comparison" gets broken (InstCombine >> has a few of these). >> >> But before we try to make bit-wise poison work, I would like to >> understand what are the deficiencies of value-wise poison. Is there >> any major issue? I'm asking because value-wise poison seems to be a >> much simpler concept, so if it works ok it seems to be preferable. >> >> During the meeting last week, Richard Smith proposed that instead we >> add a "load not_poison %ptr" (like load atomic) instruction that >> would be equivalent to load+freeze+bitcast, because he is concerned >> that some C++ stuff needs to be lowered to such a load. This load >> then becomes tricky to move around (e.g., cannot be sank to inside >> loops), but there are options to attenuate that problem if >> necessary. > >Wouldn't we need to do this for all loads?I don't think so. Most loads can be plain value-wise poison tainting (lacking better naming). Only loads that may touch uninitialized memory in a *valid* way need to be frozen. Richard's example was calling the constructor with a union or something similar (sorry, don't remember the exact details). For those cases it's probably easier for the front-end to legally read past boundaries.> Does this solve the load/store <-> memcpy equivalence problem?If we define memcpy to be a bit-wise copy, then it would be equivalent to a bit-wise load/store (e.g., <n x i1>). I know this is not well supported by LLVM at the moment, but we can fix that. Nuno
John McCall via llvm-dev
2016-Nov-11 05:10 UTC
[llvm-dev] RFC: Killing undef and spreading poison
> On Nov 9, 2016, at 8:44 AM, Nuno Lopes <nuno.lopes at ist.utl.pt> wrote: >> On 11/8/2016 3:32 PM, Sanjoy Das wrote: >>> Hi Nuno, Chandler, >>> >>> Nuno Lopes via llvm-dev wrote: >>>> This program stores 8 bits, and leaves the remaining 24 bits >>>> uninitialized. It then loads 16 bits, half initialized to %v, half >>>> uninitialized. SROA transforms the above function to: >>>> >>>> define i16 @g(i8 %in) { >>>> %v = add nsw i8 127, %in >>>> %1 = zext i8 %v to i16 >>>> %2 = shl i16 %1, 8 >>>> %3 = and i16 undef, 255 >>>> %4 = or i16 %3, %2 >>>> ret i16 %4 >>>> } >>> >>> This program above returns i16 poison only if "shl i16 poison, 8" is a >>> full value poison. Whether that is the case today or not is anybody's >>> guess (as Eli says, we should not rely too much on semantics that are >>> known to be broken), so the important question is whether "shl i16 >>> poison, 8" _can_ be defined to be poison only on the higher 8 bits in >>> a world with bitwise poison. If we could make that happen, we'd also >>> get some other fringe benefits, like ComputeKnownBits would be able to >>> fearlessly return "the low 8 bits of `shl i16 %val, 8` are 0" as >>> opposed to "the low 8 bits of `shl i16 %val, 8` are 0 or poison". >>> >>> One problem with shl always generating non-poison zeroes in the lower >>> bits is that it then is not equivalent to multiplication. This breaks >>> optimizations like: >>> >>> %t = ... >>> %q = shl i32 %t, 6 >>> >>> =/=> >>> >>> %t = ... >>> %q = mul i32 %t, 64 >>> >>> and (probably more meaningful) >>> >>> %t = ... >>> %q0 = shl i32 %t, 6 >>> %q1 = mul i32 %q0, 3 >>> >>> =/=> >>> >>> %t = ... >>> ;; %q0 = shl i32 %t, 6 >>> %q1 = mul i32 %q0, (3 * 64) >>> >>> >>> and (probably even worse) it also breaks reassociation: >>> >>> %t = ... >>> %q0 = shl i32 %t, %P >>> %q1 = mul i32 %q0, %Q >>> >>> (No other uses of %q0, but there are uses of %q1) >>> >>> =/=> >>> >>> %t = ... >>> %q0 = mul i32 %t, %Q >>> %q1 = shl i32 %q1, %P >>> >>> >>> I'm especially worried about this last re-association example since >>> SCEV does things like this internally when canonicalizing expressions, >>> and I'm not convinced that we can change it to tiptoe around these >>> problematic cases. >> >> Well, we could say non-nsw add and mul are actually "bitwise" operations, so "mul i32 poison, 2" is guaranteed to have its bottom bit zero (but "mul nsw i32 poison, 2" is poison). Of course, there's a limit to how far we can go in that direction, or we just end up with the old notion of undef. Off the top of my head, I'm not exactly sure where that line is. > > Right, we could try to make it work. I don't have a good model for bit-wise poison yet, but it might be possible. One of the things it will break is rewrites into comparisons, since at least comparisons will need to return poison if any bit of the inputs is poison. So the direction of "arithmetic -> comparison" gets broken (InstCombine has a few of these). > > But before we try to make bit-wise poison work, I would like to understand what are the deficiencies of value-wise poison. Is there any major issue? I'm asking because value-wise poison seems to be a much simpler concept, so if it works ok it seems to be preferable. > > During the meeting last week, Richard Smith proposed that instead we add a "load not_poison %ptr" (like load atomic) instruction that would be equivalent to load+freeze+bitcast, because he is concerned that some C++ stuff needs to be lowered to such a load. This load then becomes tricky to move around (e.g., cannot be sank to inside loops), but there are options to attenuate that problem if necessary. > Perhaps this construct would make John McCall happy as well (re. his question during the talk last week)?I think the source of evil here is the existence of inconsistent values in the representation. So no, if I understand your suggestion correctly, I don't think I'd be happy because it still admits the existence of an unfrozen poison value. It seems to me that the problem is that people are trying to specify the behavior of operations using the template "the operation succeeds normally if the conditions are met, and otherwise <something>". I don't think it works. I think the right template is simply "the result may be analyzed assuming that the conditions will be met", and we aren't really interested in what happens if the conditions aren't met. Undefined behavior is valuable to a compiler when it allows us to simplify some operation assuming that the bad conditions never happen. Actually optimizing code that we've proven has undefined behavior, in contrast, is basically uninteresting and leads us into these silly philosophical problems. I would suggest: 1. Make undef an instruction which produces an arbitrary but consistent result of its type. 2. Revisit most of the transformations that actually try to introduce undef. The consistency rule may make may of them questionable. In particular, sinking undef into a loop may change semantics. 3. Understand that folding something to undef may actually lose information. It's possible to prove that %x < %x +nsw 1. It's not possible to prove that %x < undef. Accept that this is ok because optimizing undefined behavior is not actually interesting. John.
Sanjoy Das via llvm-dev
2016-Nov-11 06:37 UTC
[llvm-dev] RFC: Killing undef and spreading poison
Hi John, John McCall wrote:>>> Well, we could say non-nsw add and mul are actually "bitwise" operations, so "mul i32 poison, 2" is guaranteed to have its bottom bit zero (but "mul nsw i32 poison, 2" is poison). Of course, there's a limit to how far we can go in that direction, or we just end up with the old notion of undef. Off the top of my head, I'm not exactly sure where that line is. >> Right, we could try to make it work. I don't have a good model for bit-wise poison yet, but it might be possible. One of the things it will break is rewrites into comparisons, since at least comparisons will need to return poison if any bit of the inputs is poison. So the direction of "arithmetic -> comparison" gets broken (InstCombine has a few of these). >> >> But before we try to make bit-wise poison work, I would like to understand what are the deficiencies of value-wise poison. Is there any major issue? I'm asking because value-wise poison seems to be a much simpler concept, so if it works ok it seems to be preferable. >> >> During the meeting last week, Richard Smith proposed that instead we add a "load not_poison %ptr" (like load atomic) instruction that would be equivalent to load+freeze+bitcast, because he is concerned that some C++ stuff needs to be lowered to such a load. This load then becomes tricky to move around (e.g., cannot be sank to inside loops), but there are options to attenuate that problem if necessary. >> Perhaps this construct would make John McCall happy as well (re. his question during the talk last week)? > > I think the source of evil here is the existence of inconsistent values in the representation. So no, if I understand your suggestion correctly, I don't think I'd be happy because it still admits the existence of an unfrozen poison value. > > It seems to me that the problem is that people are trying to specify the behavior of operations using the template "the operation succeeds normally if the conditions are met, and otherwise<something>". I don't think it works. I think the right template is simply "the result may be analyzed assuming that the conditions will be met", and we aren't really interested in what happens if the conditions aren't met. Undefined behavior is valuable to a compiler when it allows us to simplify some operation assuming that the bad conditions never happen.I don't think we can get away without assigning some sort of meaning to the operations in the "abnormal case" for operations that don't have immediate UB in abnormal cases. Since the program can legally continue executing after doing the "abnormal" thing, we still need some sort of sane semantics for the values these abnormal operations produce. As a concrete example, say we have: i32 sum = x *nsw y i64 sum.sext = sext sum to i64 ... some use of sum.sext Pretending "x +nsw 1" does not sign-overflow, we can commute the sext into the arithmetic, but we still somehow need to capture the fact that, depending on the optimizer's whims and fancies (i.e. whether it does the commute or not), the high 32 bits of sum.sext can now be something other than all zeroes or all ones.> Actually optimizing code that we've proven has undefined behavior, in contrast, is basically uninteresting and leads us into these silly philosophical problems.I don't think we care about optimizing code that we've _proven_ has undefined behavior. We care about optimizing code in ways that is correct in the face of *other* transforms that we ourselves want to do, where these "other transforms" pretend certain abnormal cases do not exist. Poison is a "stand-in" for these transforms, which are sometimes non-local. For instance, continuing the previous example, say we're interested in the speculatibility of t = K `udiv` (-1 + (sum.sext >> 32)) We don't _really_ care about doing something intelligent when sum.sext is provably poison. However, we do care about taking into the _possibility_ of sum.sext being poison, which is really just a more precise way of saying that we care about taking into the possibility of sum.sext being commuted into (sext(x) * sext(y)) (in which case the division is not speculatable). And we want to do this with enough formalism in place so that we can write correct checking interpreters, fuzzers etc.> I would suggest: > 1. Make undef an instruction which produces an arbitrary but consistent result of its type. > 2. Revisit most of the transformations that actually try to introduce undef. The consistency rule may make may of them questionable. In particular, sinking undef into a loop may change semantics. > 3. Understand that folding something to undef may actually lose information. It's possible to prove that %x< %x +nsw 1. It's not possible to prove that %x< undef. Accept that this is ok because optimizing undefined behavior is not actually interesting.As I said above, we don't really care about folding cases that we've proved to have UB. Moreover, in your scheme, we still won't be able to do optimizations like: %y = ... %undef.0 = undefinst %val = select i1 %cond, %undef.0, %y print(%val) to %y = ... print(%val) since it could have been that %x = INT_MAX at runtime, but unknown at compile time %y = %x +nsw 1 %undef.0 = undefinst %val = select i1 %cond, %undef.0, %y print(%val) == %x s< %val Folding %val to %y will cause us to go always printing false to printing either true or false, depending on the will of the compiler. I have to think a bit about this, but in your scheme I think we will generally be only able to fold `undefinst` to constants since non-constant values could have expressions like `<INT_MAX at runtime> +nsw 1` as their source which would justify non-local transforms that `undefinst` would not. -- Sanjoy