Nuno Lopes via llvm-dev
2017-Jun-07 21:23 UTC
[llvm-dev] RFC: Killing undef and spreading poison
Since most add/sub operations compiled from C have the nsw attribute, we cannot simply restrict movement of these instructions. Sure, we could drop nsw when moving these instructions, but there are still many other problems left. Please read more about the topic here: https://blog.regehr.org/archives/1496 For example, doing loop unswitching, followed by inline (just to illustrate that the reasoning cannot be local), plus GVN gives you end-to-end miscompilations easily. Nuno -----Original Message----- From: Peter Lawrence via llvm-dev Sent: Wednesday, June 7, 2017 8:48 PM Subject: Re: [llvm-dev] RFC: Killing undef and spreading poison Sanjoy, In that case I change my answer and say that the illegal optimization was (step 1) hoisting the “nsw” out of the if-statement, When inside the protection of the if-clause the “nsw” was conditional on the if-clause, but when you hoisted it out of the if-statement there is no reason to continue assuming “nsw” is true. (This is true regardless of how the original “nsw” got there, either by language definition or by analysis). “Nsw” isn’t an opcode, it is analysis information, and as we all know Analysis information can and does get invalidated by transformations. This is a case in point. I think the “nsw” got hoisted out of the if-statement by accident, assuming it was part of the opcode, but we have just seen that this is incorrect. - - - - - - - - - - - - - - - - - - - - If the above analysis is correct (I am pretty sure it is, but have been known to make mistakes!), then my concern is that since this example was incorrectly root-caused I am not sure we are ready to commit to “poison” and “freeze” changes, and need to see more original source causes of end-to-end-miscompilation to be sure we haven’t also mis-diagnosed those as well. Peter Lawrence. On Jun 5, 2017, at 12:02 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: Hi Peter, On Mon, Jun 5, 2017 at 9:43 AM, Peter Lawrence <peterl95124 at sbcglobal.net> wrote: I think it is valid by definition -- since "A *nsw B" does not sign wrap, "sext(A *nsw B)" == (sext A) * (sext B). In other words, these two expressions are inequal only if "A *nsw B" sign wraps. The way you use “nsw” it seems like an assertion or an assumption. Yes, that is correct. Where did the “nsw” come from, is it from the source language definition, Or compiler derived analysis information ? Both are possible. In C/C++, as you know, most (all?) signed arithmetic can be marked nsw. In many cases the optimizer will also infer nsw if it can prove the arithmetic does not overflow (e.g. if (a != INT_SMAX) b = a + 1; ==> if (a != INT_SMAX) b = a+nsw 1;). -- Sanjoy Here’s the original source example for context ——————————————————————————————— [llvm-dev] RFC: Killing undef and spreading poison Sanjoy Das via llvm-dev llvm-dev at lists.llvm.org Fri Nov 11 23:55:32 PST 2016 Firstly consider this sequence of transforms: for (...) { if (condition){ i32 prod = x *nsw y i64 prod.sext = sext prod to i64 i64 t = K `udiv` (-1 + (sum.prod >> 32)) use(t) } } ==> (step 1) multiplication and sext can be speculated i32 prod = x *nsw y i64 prod.sext = sext prod to i64 for (...) { if (condition){ i64 t = K `udiv` (-1 + (prod.sext >> 32)) use(t) } } ==> (step 2) known bits analysis on sext to prove divisor is non-zero i32 prod = x *nsw y i64 prod.sext = sext prod to i64 i64 t = K `udiv` (-1 + (prod.sext >> 32)) for (...) { if (condition){ use(t) } } ==> (step 3) commute sext through nsw multiplicationa ;; i32 prod = x *nsw y i64 prod.sext = (sext x) *nsw (sext y) i64 t = K `udiv` (-1 + (prod.sext >> 32)) for (...) { if (condition){ use(t) } } Now we've managed to introduce a fault if we were in case 1 -- the source program was well defined but the target program divides by zero. _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Peter Lawrence via llvm-dev
2017-Jun-08 16:41 UTC
[llvm-dev] RFC: Killing undef and spreading poison
> On Jun 7, 2017, at 2:23 PM, Nuno Lopes <nunoplopes at sapo.pt> wrote: > > Since most add/sub operations compiled from C have the nsw attribute, we cannot simply restrict movement of these instructions.Nuno, I’m not saying the operations can’t be moved, I’m saying that once you move one the ‘nsw’ attribute no longer applies, unless you can mathematically prove that it still does, otherwise an “add nsw” has to be converted to plain “add”. It is only by illegally retaining the ‘nsw’ after hoisting the multiply out of the ‘if’ statement that subsequent transformations yield end-to-end-miscompilation in Sanjoy’s example. Don’t you agree ? -Peter.> Sure, we could drop nsw when moving these instructions, but there are still many other problems left. Please read more about the topic here: https://blog.regehr.org/archives/1496 > For example, doing loop unswitching, followed by inline (just to illustrate that the reasoning cannot be local), plus GVN gives you end-to-end miscompilations easily. > > Nuno > > -----Original Message----- From: Peter Lawrence via llvm-dev > Sent: Wednesday, June 7, 2017 8:48 PM > Subject: Re: [llvm-dev] RFC: Killing undef and spreading poison > > > Sanjoy, > In that case I change my answer and say that the illegal > optimization was (step 1) hoisting the “nsw” out of the if-statement, > > When inside the protection of the if-clause the “nsw” was conditional on > the if-clause, but when you hoisted it out of the if-statement there is no > reason to continue assuming “nsw” is true. (This is true regardless of how > the original “nsw” got there, either by language definition or by analysis). > > “Nsw” isn’t an opcode, it is analysis information, and as we all know > Analysis information can and does get invalidated by transformations. > This is a case in point. > > I think the “nsw” got hoisted out of the if-statement by accident, assuming > it was part of the opcode, but we have just seen that this is incorrect. > > - - - - - - - - - - - - - - - - - - - - > > If the above analysis is correct (I am pretty sure it is, but have been known > to make mistakes!), then my concern is that since this example was > incorrectly root-caused I am not sure we are ready to commit to “poison” > and “freeze” changes, and need to see more original source causes of > end-to-end-miscompilation to be sure we haven’t also mis-diagnosed those > as well. > > > Peter Lawrence. > > > > > > On Jun 5, 2017, at 12:02 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > > Hi Peter, > > On Mon, Jun 5, 2017 at 9:43 AM, Peter Lawrence > <peterl95124 at sbcglobal.net> wrote: > I think it is valid by definition -- since "A *nsw B" does not sign > wrap, "sext(A *nsw B)" == (sext A) * (sext B). In other words, these > two expressions are inequal only if "A *nsw B" sign wraps. > > The way you use “nsw” it seems like an assertion or an assumption. > > Yes, that is correct. > > Where did the “nsw” come from, is it from the source language definition, > Or compiler derived analysis information ? > > Both are possible. In C/C++, as you know, most (all?) signed > arithmetic can be marked nsw. In many cases the optimizer will also > infer nsw if it can prove the arithmetic does not overflow (e.g. if (a > != INT_SMAX) b = a + 1; ==> if (a != INT_SMAX) b = a+nsw 1;). > > -- Sanjoy > > > > Here’s the original source example for context > ——————————————————————————————— > > [llvm-dev] RFC: Killing undef and spreading poison > > Sanjoy Das via llvm-dev llvm-dev at lists.llvm.org > > Fri Nov 11 23:55:32 PST 2016 > > Firstly consider this sequence of transforms: for (...) { if (condition){ i32 prod = x *nsw y i64 prod.sext = sext prod to i64 i64 t = K `udiv` (-1 + (sum.prod >> 32)) use(t) } } ==> (step 1) multiplication and sext can be speculated i32 prod = x *nsw y i64 prod.sext = sext prod to i64 for (...) { if (condition){ i64 t = K `udiv` (-1 + (prod.sext >> 32)) use(t) } } ==> (step 2) known bits analysis on sext to prove divisor is non-zero i32 prod = x *nsw y i64 prod.sext = sext prod to i64 i64 t = K `udiv` (-1 + (prod.sext >> 32)) for (...) { if (condition){ use(t) } } ==> (step 3) commute sext through nsw multiplicationa ;; i32 prod = x *nsw y i64 prod.sext = (sext x) *nsw (sext y) i64 t = K `udiv` (-1 + (prod.sext >> 32)) for (...) { if (condition){ use(t) } } Now we've managed to introduce a fault if we were in case 1 -- the source program was well defined but the target program divides by zero. > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Sanjoy Das via llvm-dev
2017-Jun-08 17:33 UTC
[llvm-dev] RFC: Killing undef and spreading poison
Hi Peter, On Thu, Jun 8, 2017 at 9:41 AM, Peter Lawrence <peterl95124 at sbcglobal.net> wrote:> >> On Jun 7, 2017, at 2:23 PM, Nuno Lopes <nunoplopes at sapo.pt> wrote: >> >> Since most add/sub operations compiled from C have the nsw attribute, we cannot simply restrict movement of these instructions. > > Nuno, > I’m not saying the operations can’t be moved, > I’m saying that once you move one the ‘nsw’ attribute no longer applies, > unless you can mathematically prove that it still does, > otherwise an “add nsw” has to be converted to plain “add”. > > It is only by illegally retaining the ‘nsw’ after hoisting the multiply out of the ‘if’ statement > that subsequent transformations yield end-to-end-miscompilation in Sanjoy’s example.That would be correct (and we do this for some constructs: for instance when we have !dereferenceable attached to a load instruction we will strip the !dereferenceable when hoisting it out of control flow). However, with poison we want to have our cake and eat it too (perhaps eating is not the best analogy with poison :) ) -- we want to (whenever correct) exploit the fact that a certain operation does not overflow when possible even when hoisting it above control flow. For instance, if we have: if (cond) { t = a +nsw b; print(t); } Now if once we hoist t out of the control block: t = a +nsw b; if (cond) { print(t); } in the transformed program, t itself may sign overflow. In LLVM IR (or at least in the semantics we'd like), this has no correctness implications -- t becomes "poison" (which is basically deferred undefined behavior), and the program is undefined only if the generated poison value is used in a "side effecting" manner. Assuming that print is a "side effect", this means at print, we can assume t isn't poison (and thus a + b did not sign overflow). This is a weaker model than C/C++; and the difficult bits are getting the poison propagation rules correct, and to have a sound definition of a "side effect" (i.e. the points at which poison == deferred UB actually becomes UB).> I think the LLVM community in general has misunderstood and misused ‘nsw’, don’t you agree now ?FYI, I think it is poor form to insinuate such things when you clearly haven't made an effort to dig back and understand the all of prior discussions we've had in this area (hint: we've discussed and explicitly decided to not implement the semantics you're suggesting). Of course, fresh ideas are always welcome but I suggest you start by first reading http://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf and some of the mailing list discussions we've had in the past on this topic. Thanks! -- Sanjoy