> Message: 5
> Date: Tue, 25 Jul 2017 00:12:35 -0700
> From: Sean Silva via llvm-dev <llvm-dev at lists.llvm.org>
> To: Peter Lawrence <peterl95124 at sbcglobal.net>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>, John Regehr <regehr at cs.utah.edu>
> Subject: Re: [llvm-dev] GEP with a null pointer base
>
> On Fri, Jul 21, 2017 at 6:29 PM, Peter Lawrence <peterl95124 at sbcglobal.net> wrote:
>
>> Sean,
>>       Dan Gohman’s “transform” changes a loop induction variable, but does
>> not change the CFG. Hal’s “transform” deletes blocks out of the CFG,
>> fundamentally altering it. These are two totally different transforms, and
>> even the analysis is different: the first is based on an *assumption* of
>> non-UB (actually there is no analysis to perform), while the second is based
>> on a *proof* of the existence of UB (here, typically, some non-trivial
>> analysis is required). These have, practically speaking, nothing in common.
>
> Ah, I think I see your point of confusion now. They are actually the same
> thing, but one has a hidden step. You can consider the induction variable
> transformation to be something like this:
>
> ```
> a = add nsw i32 b, c
> ```
>
> 1. ==> widen `a` and expand the add into:
>
> ```
> if ("add i32 b, c" does not overflow) {
>     a = add nsw i64 b, c
> } else {
>     a = (cast to i64) add nsw i32 b, c
> }
> ```
>
> 2. ==> if the `else` branch executes, then we provably have UB, so it can be
> deleted, leaving:
>
> ```
> a = add nsw i64 b, c
> ```
>
> The last step is the same as in Hal's example, where we delete a part of the
> CFG by proving that if we reach it there is guaranteed to be UB. In Hal's
> example, the frontend marked the shift as having UB if the shift amount
> exceeds the bitwidth. In this case, the frontend marked the add as
> non-overflowing, and the fact that it is guaranteed to overflow in the
> `else` branch has to be proven by looking at a dominating conditional check,
> instead of being manifest from a peephole observation (after
> inlining/constant folding) as it is in Hal's example.
>
> Hopefully that makes sense. If not, could you indicate specifically which of
> transformations 1 and/or 2 you think is illegal, and why?
>
> -- Sean Silva

Sean,
      Yes, I can see your point, but in practice that’s not how it’s done: no
one expands the “+nsw” into control flow and then applies Hal’s transform to
it to get Dan’s transform. Dan’s transform doesn’t alter the _original_ CFG
and is done as part of induction-variable analysis and simplification. Hal’s
does alter the _original_ CFG, and uses a totally different analysis as well
as a different transformation. The only thing they have in common is the
assumption that the program is UB-free at runtime; they have nothing in
common in implementation.

So I’m not saying your transforms 1 and 2 are illegal, I’m saying they are
unnecessary.

And we’re missing the point, which is this: why ever delete UB code, which
can cover up a bug, and, more importantly, why ever make that the default,
given the emphasis in software engineering on avoiding UB at the source
level? Why are we implementing both -fsanitize=undefined and deleting
undefined behavior as an optimization?

Peter.
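For concreteness, here is a minimal C sketch (illustrative, not from the
thread) of the loop shape Dan Gohman's widening targets on an LP64 target:

```
#include <stdio.h>

/* 'i' is a signed 32-bit induction variable indexing in a 64-bit
 * address space. Its increment lowers to an 'add nsw i32', so signed
 * overflow would be UB; the compiler may therefore widen 'i' to 64
 * bits and drop the per-iteration sign-extension, without touching
 * the CFG it was handed. */
long sum32(const int *a, int n) {
    long s = 0;
    for (int i = 0; i < n; ++i)  /* widened to a 64-bit counter */
        s += a[i];               /* a[i] needs a 64-bit index */
    return s;
}

int main(void) {
    int a[] = {1, 2, 3, 4};
    printf("%ld\n", sum32(a, 4));  /* prints 10 */
    return 0;
}
```

Under -fwrapv, where signed overflow wraps instead of being UB, this
widening is no longer justified and the sign-extension stays in the loop.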
>> Peter Lawrence.
>>
>> On Jul 21, 2017, at 5:00 PM, Sean Silva <chisophugis at gmail.com> wrote:
>>
>> On Jul 21, 2017 3:24 PM, "Peter Lawrence" <peterl95124 at sbcglobal.net> wrote:
>>
>> On Jul 20, 2017, at 11:22 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> On Wed, Jul 19, 2017 at 10:17 AM Peter Lawrence via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> Chandler,
>>>       The only thing David made clear that wasn’t already clear is that he
>>> believes UB to be “comparatively rare”, which is in agreement with what Hal
>>> already said, which is that he does not expect deleting UB to be of
>>> benefit to, for example, the SPEC benchmarks. Given that it is
>>> “comparatively rare”, why all the effort to delete it? And why make
>>> deleting it the default rather than warning about it?
>>
>> There seems to be some confusion/misunderstanding here. My best
>> understanding is that when David said this:
>>
>> "The cases where the compiler can statically prove that undefined
>> behaviour is present are comparatively rare."
>>
>> what he was describing was a contrast with the optimizations described
>> prior to that. It's something like this:
>>
>> UB-based optimizations don't prove UB is present - they optimize on the
>> assumption that it is not present, due to invariants in the program that
>> are unproven by the compiler but assumed to be known by the developer.
>>
>> Think about a simple case like array bounds: the compiler emits an
>> unconditional load from memory because it assumes the developer correctly
>> validated the bounds, or otherwise constructed the code so that
>> out-of-bounds indexes never reach that piece of code. This is quite common
>> - UB is assumed not to happen, and the compiler optimizes on that fact.
>>
>> What is less common is for the compiler to be able to prove (in reasonable
>> time) that UB /does/ happen. In many cases this would require complex
>> interprocedural analysis - the array is defined in one function, maybe
>> with a complex dynamic bound, then passed to another function and indexed
>> using a non-trivial dynamic expression. Statically proving that to be true
>> or false is complex and expensive, and so is basically not done by any
>> compiler. The cases that are caught by the compiler are therefore
>> relatively trivial ("oh, you declared a const null pointer value, then
>> dereferenced it within the same function", etc.) and don't happen very
>> often, because they're also fairly obvious to developers.
>>
>> Does that help explain the difference/distinction being drawn here?
>>
>> Dave,
>>       perhaps you missed these parts of the discussion.
>>
>> Here is the definition, acknowledged by Hal, of what we’re doing:
>>
>> 1. Sometimes there are abstraction penalties in C++ code
>> 2. That can be optimized away after template instantiation, function
>>    inlining, etc
>> 3. When they for example exhibit this pattern
>>        if (A) {
>>            stuff;
>>        } else {
>>            other stuff including “undefined behavior”;
>>        }
>> 4. Where the compiler assumes “undefined behavior” doesn’t actually
>>    happen, because in the C language standard it is the user’s
>>    responsibility to avoid it
>> 5. Therefore in this example the compiler can a) delete the else-clause,
>>    b) delete the if-cond, and c) assume A is true and propagate that
>>    information
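A minimal C sketch of the pattern in point 3 above (illustrative, and not
Hal's actual example): here the undefined behavior is shifting a 32-bit
unsigned by 32 or more.

```
/* After inlining a generic helper, both arms can end up textually
 * identical; only the guard distinguishes them. Shifting a 32-bit
 * unsigned by 32 or more is undefined behavior, so the else-clause
 * is UB whenever it executes. Per points 4 and 5, the optimizer may
 * delete the else-clause, delete the if-cond, and propagate
 * "amount < 32" as a known fact downstream. */
unsigned scale(unsigned value, unsigned amount) {
    if (amount < 32u) {
        return value << amount;  /* "stuff": well-defined */
    } else {
        return value << amount;  /* "other stuff including UB" */
    }
}
```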
>> And here’s the optimization that, according to Sean, we’re using to delete
>> UB:
>>
>> [ … ]
>>
>> In other words, if we can prove "when program statement A executes, then
>> program statement B is guaranteed to execute and have undefined behavior",
>> then we can assume that program statement A is never executed. In
>> particular, deleting program statement A is a correct transformation.
>> Deleting program statement B itself is a special case of this (when A = B).
>>
>> And yes, this does include everything up to and including `main`,
>> intraprocedurally and interprocedurally.
>>
>> [ … ]
>>
>> -- Sean Silva
>>
>> This is entirely a separate issue from what Dan Gohman did to optimize the
>> sign extension of i32 induction variables out of loops for LP64 target
>> machines, where the optimization is justified based on “the assumption
>> that UB does not happen”, and no actual UB exists either statically or
>> dynamically.
>>
>> Sorry, the way I phrased this in terms of program statements may have made
>> this unclear, but this is precisely the particular case A = B that I
>> mentioned. In this case, A = B = "the induction variable increment", and
>> we use that to deduce that the statement will not execute and overflow,
>> which is what justifies the widening.
>>
>> Notice that I mentioned that deleting code is only a particular case. In
>> the general case we deduce that dynamically something simply does not
>> happen, which is what we do in order to prove that induction variable
>> widening is safe (overflow cannot happen). There is nothing special about
>> induction variable widening with respect to UB; it is justified by
>> applying the same principles as all other UB-related transformations.
>>
>> Briefly, there is only one axiom in the compiler writer's toolbox w.r.t.
>> UB, and that is "the input program does not execute UB". Everything else
>> is derived from that by pure logical reasoning. Does that make sense? Can
>> you see how all the descriptions I gave above are derivable from that
>> axiom? The common application is that we can assume any program property P
>> whatsoever (not just liveness, but variable ranges, etc.) if we can prove
>> that the program would execute UB should property P fail to hold.
>>
>> -- Sean Silva
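A minimal C instance of the rule quoted above, using the familiar
null-pointer case rather than a shift (names are illustrative):

```
#include <stddef.h>

int visits;  /* illustrative global side effect */

/* If the then-clause runs, the increment (statement A) is followed
 * unconditionally by a dereference of a null pointer (statement B),
 * which is UB. By the rule above, A is never executed: the whole
 * clause can be deleted, the test folds away, and p != NULL may be
 * assumed afterward. */
int first_element(const int *p) {
    if (p == NULL) {
        ++visits;    /* A: provably never executes */
        return *p;   /* B: guaranteed UB on this path */
    }
    return *p;       /* reached with p known non-null */
}
```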
>> But when it comes to actual provable UB, the plan is to delete it. On that
>> there is no confusion, and there is no misunderstanding.
>>
>> Peter Lawrence.
>>
>> - Dave
>>
>>> Peter
>>>
>>> On Jul 13, 2017, at 2:15 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
>>>
>>> On Thu, Jul 13, 2017 at 5:13 PM Peter Lawrence via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>>> David,
>>>>       Here is the definition accepted by Hal of what we’re doing:
>>>>
>>>> [ … the same five-point definition quoted above … ]
>>>>
>>>> We are actively deleting undefined behavior, and the question is why,
>>>> given that doing so potentially masks a real source-code bug. At the
>>>> very least, deleting undefined behavior should not be the default.
>>>
>>> You are asserting this (again), but others have clearly stated that they
>>> disagree. David gave detailed and clear reasons why. Continuing to
>>> re-state positions is not productive.
>>>
>>> -Chandler
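An illustrative C example of Sean's "assume property P" application, where P
is a value range rather than the deadness of a block:

```
#include <limits.h>

/* Property P: "x < INT_MAX" when the comparison runs. If P were
 * false, the increment would already have overflowed a signed int,
 * which is UB, so the compiler may assume P and fold the comparison
 * to true. Under -fwrapv the assumption is withdrawn and the
 * comparison must be evaluated honestly. */
int next_is_bigger(int x) {
    int next = x + 1;  /* UB if x == INT_MAX */
    return next > x;   /* folds to 1 under the no-UB axiom */
}
```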
On Wed, Jul 26, 2017 at 8:40 PM, Peter Lawrence <peterl95124 at sbcglobal.net> wrote:

> [ … ]
>
> And we’re missing the point, which is this: why ever delete UB code, which
> can cover up a bug, and, more importantly, why ever make that the default,
> given the emphasis in software engineering on avoiding UB at the source
> level? Why are we implementing both -fsanitize=undefined and deleting
> undefined behavior as an optimization?
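For reference, the sanitizer named above checks at runtime the very
operations the optimizer assumes can never misbehave. A minimal sketch,
assuming clang or GCC (the exact diagnostic wording varies by version):

```
#include <stdio.h>

/* Build and run with runtime UB detection instead of UB-based
 * optimization:
 *     cc -O2 -fsanitize=undefined shift.c && ./a.out
 * UBSan reports something like "shift exponent 40 is too large for
 * 32-bit type 'int'" at the faulting line. */
int main(void) {
    volatile int amount = 40;     /* volatile: hide the constant so the
                                     shift survives to run time */
    printf("%d\n", 1 << amount);  /* UB: shift count >= bit-width */
    return 0;
}
```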
Did you ever manage to get those performance numbers on 32-bit code with
-fwrapv? In the other thread you volunteered to do that as a means of
demonstrating that at least the +nsw transformations (excluding induction
variable widening) were actually unnecessary for performance, contrary to
everybody else's beliefs (and supporting your own position).

-- Sean Silva

> [ … ]