Christof Douma via llvm-dev
2016-Sep-23 14:50 UTC
[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags
On 23 Sep 2016, at 11:08, Andrew Trick via llvm-dev <llvm-dev at lists.llvm.org> wrote:>> >> On Sep 23, 2016, at 2:09 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: >> >> Hi all, >> >> This is about a project I've been prototyping on-and-off for a while >> that has finally reached a point where I can claim it to be >> "potentially viable". I'd like to gather some input from the >> community before moving too far ahead. >> >> >> # The problem >> >> There is a representation issue within SCEV that prevents it from >> fully using information from nsw/nuw flags present in the IR. This >> isn't just a theoretical issue, e.g. today LLVM won't unroll this >> loop: >> >> void f(int x, long* arr) { >> for (int i = x + 1; i < x + 3; i++) >> arr[i] = 40; >> } >> >> since SCEV is unable to exploit the no-overflow on x+1 and x+3 to >> prove that the loop only runs twice. >> >> The fundamental problem here is that SCEV expressions are unique'd but >> the nsw/nuw flags on SCEV expressions are not part of the key they're >> unique'd by. Instead, nsw/nuw flags on SCEV expressions are expressed >> by mutating the SCEV expressions in place. >> >> This means "add %x, 1" and "add nsw %x, 1" both map to the _same_ SCEV >> expression (that is, literally the same SCEV* object), and we can't >> mutate the common SCEV expression to flag it as nsw since that will >> incorrectly denote "add %x, 1" as nsw too. >> >> In general, this means SCEV has to be very conservative about marking >> SCEV expressions as no-wrap. In some cases (e.g. the loop above), >> this ends up being excessively conservative. >> >> One path forward is to have SCEV try to prove that if a certain >> operation produces poison, the program definitely has undefined >> behavior. This can let us mutate the corresponding SCEV objects to >> pull the "nsw"-ness into SCEV. For instance, if we have >> >> %x = load ... >> %t = add i32 nsw %x, 1 >> %addr = gep(%array, %t) >> store i32 0, %addr >> %t2 = add i32 %x, 1 >> >> then transferring NSW to getSCEV(%t) is okay, since even though %t2 >> (which will be mapped to the same SCEV expression as %t) does not have >> "nsw" on the instruction, we know adding 1 to %x cannot overflow since >> the program would have UB otherwise. >> >> Bjarke Roune has implemented some of this. However, this is difficult >> to do for cases like the x+1 .. x+3 loop above without running a >> control flow analysis over the entire function. And this approach >> does not work in the presence of function calls or general control >> flow, like >> >> %x = load ... >> %t = add i32 nsw %x, 1 >> call void @f() >> %addr = gep(%array, %t) >> store i32 0, %addr >> >> or >> >> %x = load ... >> %t = add i32 nsw %x, 1 >> if (<condition>) >> return; >> %addr = gep(%array, %t) >> store i32 0, %addr >> >> since unless the side-effecting use of %t (the store) "strongly"[1] >> post dominates the def of %x, there is no guaranteed undefined >> behavior on a poisonous %t. Things are even more complex if %x is not >> a load, but an expression SCEV an look through, like an add or a shift >> by a constant. > > Great explanation. > >> *I think the current representation of nsw/nuw in SCEV expressions is >> not congruent with LLVM's specification of poison values, and that is >> blocking us from exploiting poison values as intended by LLVM's >> design.* > > I’m not 100% sure what you mean, but since SCEV expressions don’t represent IR values, mapping poison semantics, which are specific to IR values, onto SCEV expressions never made sense to me.As far as I understand this, the reasoning is that in SCEV the flags must be interpreted as UB/not-allowed to happen because of SCEV expressions can represent instructions in different contexts (e.g. in different execution paths). In the IR these flags mean poison. This is a semantical mismatch that gives a big proof burden on SCEV. To give an example (test/Analysis/ScalarEvolution/nsw.ll - addnsw): 01: entry: 02: %tmp = add i32 %a, %b 03: %cmp = icmp sgt i32 %tmp, 0 04: br i1 %cmp, label %greater, label %exit 05: greater: 06: %tmp2 = add nsw i32 %a, %b 07: br label %exit 08: exit: 09: %result = phi i32 [ %a, %entry ], [ %tmp2, %greater ] 10: ret i32 %result In the following example line 2 and 6 are mapped to a single SCEV expressions: "(%a + %b)”. The SCEV version of ‘NSW' does not hold because at line 2 the instruction might actually wrap around. Up till now I had not realised that sharing of SCEV expressions was the only reason for SCEV to have this different semantics.> >> # The proposed solution >> >> Since poison values are, well, _values_, I propose we model them as >> data within SCEV. We treat nsw/nuw flags as "operands" since they >> contribute to the result of an SCEV expression just like normal inputs >> to the expression. >> >> This means we'd treat "add %x, %y" as a different SCEV expression than >> "add nsw %x, %y", since the latter sometimes produces poison while the >> former doesn't. The latter would be known to not overflow, and SCEV >> would use that fact in the usual ways. > > Ok. What happens when you canonicalize the SCEV expressions? Reassociation is just one of the ways the nsw flags will be lost. > >> With this change SCEV expressions will be pointer equal less often, >> and while relying on pointer equality for value equality will be >> correct, it will be somewhat pessimistic; and we'll have to implement >> and use some form of structural equality. >> >> In other words, some places that do >> >> SCEV *X = ... >> SCEV *Y = ... >> if (X == Y) >> ... >> >> will have to be changed to do >> >> SCEV *X = ... >> SCEV *Y = ... >> if (X->equals(Y)) >> ... >> >> This has potential for compile-time regressions. Hopefully they'll >> all be addressable. > > That’s the main reason we never did this. It wasn’t clear to me what impact it might have on reducing SCEV expressions in general. Are you saying that this will not affect the outcome of SCEV reduction/canonicalization as long as we pay the cost of slow comparisons? What about nested expressions? Won’t this require scanning the whole SCEV tree for each comparison?There is a chance that due to the folding, the nsw and non-nsw nodes will look quite different. Currently SCEV does represent all these nodes identically. Part of this might be recovered by moving the current code in scalar evolution that performs the strong post-dominator analysis to a new pass that propagate nsw flags in the IR, but that will likely be only a small part now that SCEV is going to add NSW to many more expressions (which is the whole point).>> There are cases in which SCEV (via trip count analysis, say) can >> _prove_ that a certain expression does not overflow. In those cases >> we will mutate the SCEV expression to indicate no-wrap; since the >> no-wrap flag is just a "cache" of a proof based on the structure of >> the SCEV expression, and _does_ apply to all SCEV expressions with the >> same shapes. > > This is getting confusing. I think you’re saying that add %x, %y, and add nsw %x, %y should be different SCEV opcodes. In addition, you will keep the nsw flags independent of the opcode and only as a cache of the analysis. So add nsw %x, %y will always have the flag set, and add %x, %y may or may not have the flag. > > Let’s remember that SCEV expressions are immutable and, currently, nsw/nuw really is just a cache of if side info hanging off the SCEV. So that doesn’t really change. > > SCEV should really be a stable analysis. You should get the same expressions for two values regardless of which you compute first. Today that’s not the case, and it’s crazy. I don’t think you’re solving that problem.I hope that I understood Sanjoy its attempt correctly. I belief that a non-wrap and a wrap instruction result into two different SCEV expressions.> >> Concretely, we'll endow relevant SCEV expression types with two sets >> distinct of flags: >> >> - AxiomaticFlags: These flags follow from nsw/nuw annotations in the >> IR. These will be part of the key the SCEV expression is unique'd >> on. >> - ComputedFlags: These flags are derived from the structure of the >> SCEV expression, and they're *not* a part of the key the SCEV >> expression is unique'd on. >> >> For the purposes of consumption, there will be no difference between >> AxiomaticFlags and ComputedFlags. Consumers will get a union of the >> two when they ask for the set of flags that apply to a specific SCEV >> expression. >> >> ComputedFlags will, in general, depend on AxiomaticFlags. For >> instance if AxiomaticFlags is "nsw" for, say, {1,+,1}, we can add >> "nuw" to its ComputedFlags. There is no need to further distinguish >> "{1,+,1}-axiomatic<nsw>" on the computed<nuw> dimension since >> "{1,+,1}-axiomatic<nsw>" will always be computed<nuw>. > > Great. That answers my question above. > >> What do you think? Does the overall picture here make sense? > > Makes sense except for the issues I raised above.It is definitely something that improves SCEV its capabilities. And I love to see SCEV get the iteration count on that loop correct so it can be unrolled. I just hope the issue Andrew raised is not too bad.> >> Alternate solutions are also more than welcome (especially if they're >> easier to implement!). > > The alternative is to represent proofs about IR values independent from SCEV. I don’t know how much net value we get out of embedding these facts within SCEV (aside from the convenience given the current implementation). SCEV is about canonicalizing and algebraic reduction so that simple relations between IR values can be discovered. Of course, computing trip count needs this information, but is could always look elsewhere for those facts. The question is what SCEV reductions themselves need the nsw/nuw info, vs. simply propagating the flags. Can those reductions in turn be pulled out of as some high level analysis? I think sign-extend elimination would need some serious thought. But the current situation with reducing truncations/extensions in SCEV is not good anyway. It defeats SCEVs canonical form—you can get different answers depending on the order of reductions. > > -AndyI can’t help to ask. Why not define a wrapping nsw instruction as UB, instead of “delayed UB” aka poison? I believe we have the notion of poison solely to ease the movement of instructions. In my example above line 6 can be moved to line 2 due to poison. If the wrapping was UB such a move is still possible but would require dropping the nsw flag, nothing more. If this information loss is considered a problem, the use of llvm.assume can counter that. Has somebody ever looked at not having poison at all? What are the downsides of that?> >> Thanks, >> -- Sanjoy >> >> [1]: That is, it the store is guaranteed to execute once the load has >> been issued. > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Andrew Trick via llvm-dev
2016-Sep-23 17:44 UTC
[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags
> On Sep 23, 2016, at 7:50 AM, Christof Douma <Christof.Douma at arm.com> wrote: > > I can’t help to ask. Why not define a wrapping nsw instruction as UB, instead of “delayed UB” aka poison? I believe we have the notion of poison solely to ease the movement of instructions. In my example above line 6 can be moved to line 2 due to poison. If the wrapping was UB such a move is still possible but would require dropping the nsw flag, nothing more. If this information loss is considered a problem, the use of llvm.assume can counter that. > > Has somebody ever looked at not having poison at all? What are the downsides of that?That’s basically incompatible with transformation of the IR. I’m sure that’s been explained by DanG in the past, but I don’t have a link. e.g. the optimizer doesn’t think that an ‘add’ has side effects and wants to freely hoist and combine these operations. If the operation has an ‘nsw’ flag that would change the UB semantics. I have to say, my heart is not really into optimizing for UB. Is this relevant to any scenario other than legacy C-without-overflow/bounds checks? -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160923/fa7ded68/attachment.html>
Sanjoy Das via llvm-dev
2016-Sep-24 02:36 UTC
[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags
Hi Christof, Christof Douma wrote: > As far as I understand this, the reasoning is that in SCEV the flags > must be interpreted as UB/not-allowed to happen because of SCEV > expressions can represent instructions in different contexts (e.g. in > different execution paths). In the IR these flags mean poison. This is > a semantical mismatch that gives a big proof burden on SCEV. > > To give an example (test/Analysis/ScalarEvolution/nsw.ll - addnsw): > > 01: entry: > 02: %tmp = add i32 %a, %b > 03: %cmp = icmp sgt i32 %tmp, 0 > 04: br i1 %cmp, label %greater, label %exit > 05: greater: > 06: %tmp2 = add nsw i32 %a, %b > 07: br label %exit > 08: exit: > 09: %result = phi i32 [ %a, %entry ], [ %tmp2, %greater ] > 10: ret i32 %result > > In the following example line 2 and 6 are mapped to a single SCEV > expressions: "(%a + %b)”. The SCEV version of ‘NSW' does not hold > because at line 2 the instruction might actually wrap around. > > Up till now I had not realised that sharing of SCEV expressions was > the only reason for SCEV to have this different semantics. Yeah, that's how poison is intended to work. The branch adds nothing, you have the same problem with > 01: %tmp = add i32 %a, %b > 02: %cmp = icmp sgt i32 %tmp, 0 > 03: %tmp2 = add nsw i32 %a, %b However, lack of context sensitivity does mean that in %t0 = add i32 %a, 1 if (%a s< 100) %t1 = add i32 %a, 1 SCEV cannot mark the expression for %t1 as nsw, even though that addition won't sign-overflow; since, as you said, both %t0 and %t1 will be mapped to the same SCEV expression, in the current scheme as well as in the new scheme I'm proposing. In the new scheme, there is a solution for cases like the above: teach instcombine or LVI etc. to transform that IR to %t0 = add i32 %a, 1 if (%a s< 100) %t1 = add nsw i32 %a, 1 and then let the normal logic (that will be newly added in SCEV) kick in. > I can’t help to ask. Why not define a wrapping nsw instruction as > UB, instead of “delayed UB” aka poison? I believe we have the notion > of poison solely to ease the movement of instructions. In my example > above line 6 can be moved to line 2 due to poison. If the wrapping was > UB such a move is still possible but would require dropping the nsw > flag, nothing more. If this information loss is considered a problem, > the use of llvm.assume can counter that. > > Has somebody ever looked at not having poison at all? What are the > downsides of that? That's just the way things are at the moment. We _could_ change that (and make overflowing nsw/nuw operations undefined behavior) but since that is a very deep change to LLVM IR (basically arithmetic will be sometimes non-speculatable), it has a very high bar for acceptance. -- Sanjoy
Sanjoy Das via llvm-dev
2016-Sep-24 05:32 UTC
[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags
Hi Andy, Andrew Trick wrote: >> On Sep 23, 2016, at 7:50 AM, Christof Douma <Christof.Douma at arm.com <mailto:Christof.Douma at arm.com>> wrote: >> >> I can’t help to ask. Why not define a wrapping nsw instruction as UB, instead of “delayed UB” aka poison? I believe we >> have the notion of poison solely to ease the movement of instructions. In my example above line 6 can be moved to line >> 2 due to poison. If the wrapping was UB such a move is still possible but would require dropping the nsw flag, nothing >> more. If this information loss is considered a problem, the use of llvm.assume can counter that. >> >> Has somebody ever looked at not having poison at all? What are the downsides of that? > > That’s basically incompatible with transformation of the IR. I’m sure that’s been explained by DanG in the past, but I > don’t have a link. e.g. the optimizer doesn’t think that an ‘add’ has side effects and wants to freely hoist and combine > these operations. If the operation has an ‘nsw’ flag that would change the UB semantics. +1. > I have to say, my heart is not really into optimizing for UB. Is this relevant to any scenario other than legacy > C-without-overflow/bounds checks? I don't think we'll be able to give up on C and C++ inspired optimizations anytime soon. Given what I've seen on llvm-dev and bugzilla, it is pretty clear to me that we have to do _something_ to address the missing pieces here. -- Sanjoy