Bjarke Roune
2015-Jun-26 23:01 UTC
[LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution
*** Summary I'd like to propose (and implement) functionality in LLVM to determine when a poison value from an instruction is guaranteed to produce undefined behavior. I want to use that to improve handling of nsw, inbounds etc. flags in scalar evolution and LSR. I imagine that there would be other uses for it. I'd like feedback on this idea before I proceed with it. *** Details Poison values do produce undefined behavior if the poison becomes externally observable. A load or store to a poison address value is externally observable and I'd like to use that in a simple analysis pass to derive guarantees that certain overflows would produce undefined behavior, not just poison. Scalar evolution (and hence LSR) cannot currently make much use of the nsw and similar flags on instructions. That is because two instructions can map to the same scev even if one instruction has the nsw flag and the other one does not. If we applied the nsw flag to the scev, the scev for the instruction without the nsw flag would then incorrectly have the nsw flag. Scalar evolution would be able to use the nsw flag from an instruction for recurrences when the loop header dominates the entire loop, the instruction with nsw post-dominates the loop header and undefined behavior is guaranteed on wrap via the poison value analysis pass that I'd like to write. What do you think? Do we already have something similar to this? Bjarke *** PS: What got me thinking about this: My immediate motivation is that I'd like LSR to be able to create induction variables for expressions like &ptr[i + offset] where i and offset are 32 bit integers, ptr is a loop-invariant 64 bit pointer, i is an induction variable and offset is loop-invariant. For that to happen, scalar evolution needs to propagate the nsw flag from i + offset to the scev so that it can transform ((4 * (sext i32 {%offset,+,1}<nw><%loop> to i64))<nsw> + %ptr)<nsw> to {((4 * (sext i32 %offset to i64)) + %ptr),+,4}<nsw><%loop> Currently the inner <nsw> is actually <nw>, which blocks the transformation (the outer two nsw's shouldn't currently be there either, as it's the same issue for inbounds on GEP: see llvm bug 23527) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150626/263e0f6f/attachment.html>
Sanjoy Das
2015-Jun-27 07:16 UTC
[LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution
Hi Bjarke, Your proposal sounds reasonable and I can't immediately spot any fundamental issues. However, there is yet another way to skin this cat that you may want to consider: First of all, going by the "poison causes UB only when observed", SCEV does not do the right thing currently: declare void @use(i64) define void @f() { entry: br label %loop loop: %idx = phi i32 [ %idx.inc, %loop ], [ 0, %entry ] %idx2 = phi i32 [ %idx2.inc, %loop ], [ 0, %entry ] %idx.inc = add nsw i32 %idx, 1 %idx2.inc = add i32 %idx2, 1 %idx2.sext = sext i32 %idx2 to i64 call void @use(i64 %idx2.sext) br label %loop } opt -analyze -scalar-evolution ==> Classifying expressions for: @f %idx = phi i32 [ %idx.inc, %loop ], [ 0, %entry ] --> {0,+,1}<nuw><nsw><%loop> U: [0,-2147483648) S: [0,-2147483648) Exits: <<Unknown>> %idx2 = phi i32 [ %idx2.inc, %loop ], [ 0, %entry ] --> {0,+,1}<nuw><nsw><%loop> U: [0,-2147483648) S: [0,-2147483648) Exits: <<Unknown>> %idx.inc = add nsw i32 %idx, 1 --> {1,+,1}<nuw><nsw><%loop> U: [1,-2147483648) S: [1,-2147483648) Exits: <<Unknown>> %idx2.inc = add i32 %idx2, 1 --> {1,+,1}<nuw><nsw><%loop> U: [1,-2147483648) S: [1,-2147483648) Exits: <<Unknown>> %idx2.sext = sext i32 %idx2 to i64 --> {0,+,1}<nuw><nsw><%loop> U: [0,-9223372036854775808) S: [0,-9223372036854775808) Exits: <<Unknown>> SCEV concludes that %idx2 is nsw, even though there is no side effecting, observable use of %idx. One way to get what you want and also fix this existing bug(?) is to make the no wrap flags part of a SCEV expression's identity (i.e. make {A,+,B}<nuw> a different expression from {A,+,B}). If you start with the precondition that every <nuw> [0] came from a <nuw> [1] present in the IR, then you don't really have to worry about control dependence -- you can always optimize under the assumption that the SCEV expression does not unsign-overflow; if such an optimization changes program behavior then that program has UB since it was data dependent on poison [2]. We could still use the analysis you suggest and exploit poison-caused-UB to optimize wrapping add recs to non-wrapping add recs, but that's a separate optimization that may not be needed in IR generated from C/C++ frontends (since interesting arithmetic will already be marked <nuw>). All of this is based on what I understand the _intent_ of poison to be. I'll leave it to David to comment on what the future holds for nuw, nsw and poison in LLVM's specification. -- Sanjoy [0]: identical reasoning applies to <nsw>, leaving out <nsw> to avoid clutter [1]: it can also come from SCEV *proving* that the expression does not unsign-overflow, for instance by looking at trip counts [2]: in other words, it is always OK to replace zext(a +nuw b) with zext(a) + zext(b). If the program behavior changes observably by this substitution then the program had UB. On Fri, Jun 26, 2015 at 4:01 PM, Bjarke Roune <broune at google.com> wrote:> *** Summary > I'd like to propose (and implement) functionality in LLVM to determine when > a poison value from an instruction is guaranteed to produce undefined > behavior. I want to use that to improve handling of nsw, inbounds etc. flags > in scalar evolution and LSR. I imagine that there would be other uses for > it. I'd like feedback on this idea before I proceed with it. > > > *** Details > Poison values do produce undefined behavior if the poison becomes externally > observable. A load or store to a poison address value is externally > observable and I'd like to use that in a simple analysis pass to derive > guarantees that certain overflows would produce undefined behavior, not just > poison. > > Scalar evolution (and hence LSR) cannot currently make much use of the nsw > and similar flags on instructions. That is because two instructions can map > to the same scev even if one instruction has the nsw flag and the other one > does not. If we applied the nsw flag to the scev, the scev for the > instruction without the nsw flag would then incorrectly have the nsw flag. > > Scalar evolution would be able to use the nsw flag from an instruction for > recurrences when the loop header dominates the entire loop, the instruction > with nsw post-dominates the loop header and undefined behavior is guaranteed > on wrap via the poison value analysis pass that I'd like to write. > > What do you think? Do we already have something similar to this? > > Bjarke > > > > *** PS: What got me thinking about this: > My immediate motivation is that I'd like LSR to be able to create induction > variables for expressions like &ptr[i + offset] where i and offset are 32 > bit integers, ptr is a loop-invariant 64 bit pointer, i is an induction > variable and offset is loop-invariant. For that to happen, scalar evolution > needs to propagate the nsw flag from i + offset to the scev so that it can > transform > > ((4 * (sext i32 {%offset,+,1}<nw><%loop> to i64))<nsw> + %ptr)<nsw> > > to > > {((4 * (sext i32 %offset to i64)) + %ptr),+,4}<nsw><%loop> > > Currently the inner <nsw> is actually <nw>, which blocks the transformation > (the outer two nsw's shouldn't currently be there either, as it's the same > issue for inbounds on GEP: see llvm bug 23527)
Adam Nemet
2015-Jun-29 18:54 UTC
[LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution
> On Jun 26, 2015, at 4:01 PM, Bjarke Roune <broune at google.com> wrote: > > *** Summary > I'd like to propose (and implement) functionality in LLVM to determine when a poison value from an instruction is guaranteed to produce undefined behavior. I want to use that to improve handling of nsw, inbounds etc. flags in scalar evolution and LSR. I imagine that there would be other uses for it. I'd like feedback on this idea before I proceed with it. > > > *** Details > Poison values do produce undefined behavior if the poison becomes externally observable. A load or store to a poison address value is externally observable and I'd like to use that in a simple analysis pass to derive guarantees that certain overflows would produce undefined behavior, not just poison. > > Scalar evolution (and hence LSR) cannot currently make much use of the nsw and similar flags on instructions. That is because two instructions can map to the same scev even if one instruction has the nsw flag and the other one does not. If we applied the nsw flag to the scev, the scev for the instruction without the nsw flag would then incorrectly have the nsw flag. > > Scalar evolution would be able to use the nsw flag from an instruction for recurrences when the loop header dominates the entire loop, the instruction with nsw post-dominates the loop header and undefined behavior is guaranteed on wrap via the poison value analysis pass that I'd like to write. > > What do you think? Do we already have something similar to this? > > Bjarke > > > > *** PS: What got me thinking about this: > My immediate motivation is that I'd like LSR to be able to create induction variables for expressions like &ptr[i + offset] where i and offset are 32 bit integers, ptr is a loop-invariant 64 bit pointer, i is an induction variable and offset is loop-invariant. For that to happen, scalar evolution needs to propagate the nsw flag from i + offset to the scev so that it can transform > > ((4 * (sext i32 {%offset,+,1}<nw><%loop> to i64))<nsw> + %ptr)<nsw> > > to > > {((4 * (sext i32 %offset to i64)) + %ptr),+,4}<nsw><%loop>I guess what I am missing here why indvars does not create an i64 induction variable for this? Adam> > Currently the inner <nsw> is actually <nw>, which blocks the transformation (the outer two nsw's shouldn't currently be there either, as it's the same issue for inbounds on GEP: see llvm bug 23527) > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Jingyue Wu
2015-Jun-30 03:57 UTC
[LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution
Hi Adam, Indvar widening can sometimes be harmful for architectures (e.g. NVPTX and AMDGPU) where wider integer operations are more expensive ( https://llvm.org/bugs/show_bug.cgi?id=21148). For this reason, we disabled indvar widening in NVPTX in http://reviews.llvm.org/D6196. Hope it helps. Jingyue On Mon, Jun 29, 2015 at 11:59 AM Adam Nemet <anemet at apple.com> wrote:> > > On Jun 26, 2015, at 4:01 PM, Bjarke Roune <broune at google.com> wrote: > > > > *** Summary > > I'd like to propose (and implement) functionality in LLVM to determine > when a poison value from an instruction is guaranteed to produce undefined > behavior. I want to use that to improve handling of nsw, inbounds etc. > flags in scalar evolution and LSR. I imagine that there would be other uses > for it. I'd like feedback on this idea before I proceed with it. > > > > > > *** Details > > Poison values do produce undefined behavior if the poison becomes > externally observable. A load or store to a poison address value is > externally observable and I'd like to use that in a simple analysis pass to > derive guarantees that certain overflows would produce undefined behavior, > not just poison. > > > > Scalar evolution (and hence LSR) cannot currently make much use of the > nsw and similar flags on instructions. That is because two instructions can > map to the same scev even if one instruction has the nsw flag and the other > one does not. If we applied the nsw flag to the scev, the scev for the > instruction without the nsw flag would then incorrectly have the nsw flag. > > > > Scalar evolution would be able to use the nsw flag from an instruction > for recurrences when the loop header dominates the entire loop, the > instruction with nsw post-dominates the loop header and undefined behavior > is guaranteed on wrap via the poison value analysis pass that I'd like to > write. > > > > What do you think? Do we already have something similar to this? > > > > Bjarke > > > > > > > > *** PS: What got me thinking about this: > > My immediate motivation is that I'd like LSR to be able to create > induction variables for expressions like &ptr[i + offset] where i and > offset are 32 bit integers, ptr is a loop-invariant 64 bit pointer, i is an > induction variable and offset is loop-invariant. For that to happen, scalar > evolution needs to propagate the nsw flag from i + offset to the scev so > that it can transform > > > > ((4 * (sext i32 {%offset,+,1}<nw><%loop> to i64))<nsw> + %ptr)<nsw> > > > > to > > > > {((4 * (sext i32 %offset to i64)) + %ptr),+,4}<nsw><%loop> > > I guess what I am missing here why indvars does not create an i64 > induction variable for this? > > Adam > > > > > > Currently the inner <nsw> is actually <nw>, which blocks the > transformation (the outer two nsw's shouldn't currently be there either, as > it's the same issue for inbounds on GEP: see llvm bug 23527) > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150630/fac963bd/attachment.html>
Hal Finkel
2015-Jun-30 21:30 UTC
[LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution
----- Original Message -----> From: "Bjarke Roune" <broune at google.com> > To: llvmdev at cs.uiuc.edu > Sent: Friday, June 26, 2015 6:01:35 PM > Subject: [LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution > > > > > *** Summary > I'd like to propose (and implement) functionality in LLVM to > determine when a poison value from an instruction is guaranteed to > produce undefined behavior. I want to use that to improve handling > of nsw, inbounds etc. flags in scalar evolution and LSR. I imagine > that there would be other uses for it. I'd like feedback on this > idea before I proceed with it. > > > > > *** Details > Poison values do produce undefined behavior if the poison becomes > externally observable. A load or store to a poison address value is > externally observable and I'd like to use that in a simple analysis > pass to derive guarantees that certain overflows would produce > undefined behavior, not just poison. > > > Scalar evolution (and hence LSR) cannot currently make much use of > the nsw and similar flags on instructions. That is because two > instructions can map to the same scev even if one instruction has > the nsw flag and the other one does not. If we applied the nsw flag > to the scev, the scev for the instruction without the nsw flag would > then incorrectly have the nsw flag. > > > Scalar evolution would be able to use the nsw flag from an > instruction for recurrences when the loop header dominates the > entire loop, the instruction with nsw post-dominates the loop header > and undefined behavior is guaranteed on wrap via the poison value > analysis pass that I'd like to write. >My understanding is that SCEV currently only uses no-wrap flags that it can independently prove, in part, because SCEVExpander might be used to materialize the expressions outside of the loop body (in the pre-header, for example). Andy, can you comment on the constraints here? -Hal> > > What do you think? Do we already have something similar to this? > > > > Bjarke > > > > > > > > *** PS: What got me thinking about this: > My immediate motivation is that I'd like LSR to be able to create > induction variables for expressions like &ptr[i + offset] where i > and offset are 32 bit integers, ptr is a loop-invariant 64 bit > pointer, i is an induction variable and offset is loop-invariant. > For that to happen, scalar evolution needs to propagate the nsw flag > from i + offset to the scev so that it can transform > > > ((4 * (sext i32 {%offset,+,1}<nw><%loop> to i64))<nsw> + %ptr)<nsw> > > > > to > > > > {((4 * (sext i32 %offset to i64)) + %ptr),+,4}<nsw><%loop> > > > > Currently the inner <nsw> is actually <nw>, which blocks the > transformation (the outer two nsw's shouldn't currently be there > either, as it's the same issue for inbounds on GEP: see llvm bug > 23527) > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Bjarke Roune
2015-Jul-01 00:27 UTC
[LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution
Hi Sanjoy, thanks for your thoughts on this. On Sat, Jun 27, 2015 at 12:16 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > First of all, going by the "poison causes UB only when observed", SCEV > does not do the right thing currently: [...] > > That seems like a bug? There's also bug 23527 for GEP. Sounds like theremight be more such bugs. One way to get what you want and also fix this existing bug(?) is to> make the no wrap flags part of a SCEV expression's identity (i.e. make > {A,+,B}<nuw> a different expression from {A,+,B}). If you start with > the precondition that every <nuw> [0] came from a <nuw> [1] present in > the IR, then you don't really have to worry about control dependence > -- you can always optimize under the assumption that the SCEV > expression does not unsign-overflow; if such an optimization changes > program behavior then that program has UB since it was data dependent > on poison [2].To make sure that I understand correctly, are you suggesting making the nuw (and similar) flags part of the key when looking up an SCEV in the UniqueSCEVs member variable on ScalarEvolution? That way, an instruction with nuw and one without will not map to the same SCEV unless the nuw can be inferred in some other way. Sounds good to me, I'm happy to go with either one of that or inferring UB from poison. Adding the flags to the key/identity does break the idea that mathematically equivalent expressions should cancel if subtracted. Andrew Trick wrote something about that previously: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141215/249390.html Andrew: I wonder what your view is on this? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150630/11ad37f8/attachment.html>
Andrew Trick
2015-Jul-01 18:49 UTC
[LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution
> On Jun 30, 2015, at 4:30 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > ----- Original Message ----- >> From: "Bjarke Roune" <broune at google.com> >> To: llvmdev at cs.uiuc.edu >> Sent: Friday, June 26, 2015 6:01:35 PM >> Subject: [LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution >> >> >> >> >> *** Summary >> I'd like to propose (and implement) functionality in LLVM to >> determine when a poison value from an instruction is guaranteed to >> produce undefined behavior. I want to use that to improve handling >> of nsw, inbounds etc. flags in scalar evolution and LSR. I imagine >> that there would be other uses for it. I'd like feedback on this >> idea before I proceed with it. >> >> >> >> >> *** Details >> Poison values do produce undefined behavior if the poison becomes >> externally observable. A load or store to a poison address value is >> externally observable and I'd like to use that in a simple analysis >> pass to derive guarantees that certain overflows would produce >> undefined behavior, not just poison. >> >> >> Scalar evolution (and hence LSR) cannot currently make much use of >> the nsw and similar flags on instructions. That is because two >> instructions can map to the same scev even if one instruction has >> the nsw flag and the other one does not. If we applied the nsw flag >> to the scev, the scev for the instruction without the nsw flag would >> then incorrectly have the nsw flag. >> >> >> Scalar evolution would be able to use the nsw flag from an >> instruction for recurrences when the loop header dominates the >> entire loop, the instruction with nsw post-dominates the loop header >> and undefined behavior is guaranteed on wrap via the poison value >> analysis pass that I'd like to write. >> > > My understanding is that SCEV currently only uses no-wrap flags that it can independently prove, in part, because SCEVExpander might be used to materialize the expressions outside of the loop body (in the pre-header, for example). Andy, can you comment on the constraints here?I was hesitant to jump in for fear of rehashing all of the poison/nsw issues and failing to do it justice. But I'll try to give a quick summary: The IR optimizer is free to transform operations in any way that preserves observable side effects. So when no-wrap flags appear on a side-effect free operation what does it mean? I say that those flags only have meaning with respect to that operation's transitive uses that induce side effects (hence poison). Now the idea behind SCEV is to provide an abstract representation of an expression independent of any constraints that do not effect the value of the expression. That allows expressions to be canonicalized and gracefully reduced. Value equivalence becomes expression identity. The poses a conundrum whenever we have multiple IR values with expression equivalence but different constraints. The users of one IR value may anticipate nsw behavior, while the users of another IR value do not. There's no guarantee of control equivalence across all these users, so the same constraints don't need to hold across them. The current compromise is to allow nsw to creep into SCEV only in the case of recurrences. The primary need for SCEV is for induction variable analysis, so we mainly care about the presence of flags on recurrences. In order to optimize as expected based on C/C++ undefined behavior, SCEV has to play loose with the poison rules as Sanjoy pointed out. Essentially, we assume that an IR value marked nsw has an observable use if it is live across loop iterations. Even if this is not strictly true in IR, it's almost certain that undefined behavior semantics would still cover us in that case. I think that Sanjoy's work on inferring nsw on SCEV recurrences from constraints present in the IR is great and avoids the issue above (if you prove nsw for an expression within a loop, the def-use chain no longer matters). I also like what Adam did recently in relying on SCEV nsw on a recurrence but analyzing address expressions at IR level. In general, I like progress in the direction of strengthening LLVM's analysis of value ranges along with adding metadata and explicit markers like llvm.assume, rather than relying on poison and undefined behevior, which I view as an unfortunate legacy. Andy>> >> >> What do you think? Do we already have something similar to this? >> >> >> >> Bjarke >> >> >> >> >> >> >> >> *** PS: What got me thinking about this: >> My immediate motivation is that I'd like LSR to be able to create >> induction variables for expressions like &ptr[i + offset] where i >> and offset are 32 bit integers, ptr is a loop-invariant 64 bit >> pointer, i is an induction variable and offset is loop-invariant. >> For that to happen, scalar evolution needs to propagate the nsw flag >> from i + offset to the scev so that it can transform >> >> >> ((4 * (sext i32 {%offset,+,1}<nw><%loop> to i64))<nsw> + %ptr)<nsw> >> >> >> >> to >> >> >> >> {((4 * (sext i32 %offset to i64)) + %ptr),+,4}<nsw><%loop> >> >> >> >> Currently the inner <nsw> is actually <nw>, which blocks the >> transformation (the outer two nsw's shouldn't currently be there >> either, as it's the same issue for inbounds on GEP: see llvm bug >> 23527) >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >> > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150701/98633e54/attachment.html>
Apparently Analagous Threads
- [LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution
- [LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution
- [LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution
- [LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution
- Improving SCEV's behavior around IR level no-wrap flags