> Message: 5
> Date: Tue, 25 Jul 2017 00:12:35 -0700
> From: Sean Silva via llvm-dev <llvm-dev at lists.llvm.org>
> To: Peter Lawrence <peterl95124 at sbcglobal.net>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>, John Regehr <regehr at cs.utah.edu>
> Subject: Re: [llvm-dev] GEP with a null pointer base
>
> On Fri, Jul 21, 2017 at 6:29 PM, Peter Lawrence <peterl95124 at sbcglobal.net> wrote:
>
>> Sean,
>>       Dan Gohman’s “transform” changes a loop induction variable, but does
>> not change the CFG. Hal’s “transform” deletes blocks out of the CFG,
>> fundamentally altering it. These are two totally different transforms, and
>> even the analysis is different: the first is based on an *assumption* of
>> non-UB (actually there is no analysis to perform), while the second is based
>> on a *proof* of the existence of UB (here, typically, some non-trivial
>> analysis is required). These have, practically speaking, nothing in common.
>
> Ah, I think I see your point of confusion now. They are actually the same
> thing, but one has a hidden step. You can consider the induction variable
> transformation to be something like this:
>
> ```
> a = add nsw i32 b, c
> ```
>
> 1. ==> widen `a` and expand the add into:
>
> ```
> if ("add i32 b, c" does not overflow) {
>     a = add nsw i64 b, c
> } else {
>     a = (cast to i64) add nsw i32 b, c
> }
> ```
>
> 2. ==> if the `else` branch executes, then we provably have UB, so it can be
> deleted, leaving:
>
> ```
> a = add nsw i64 b, c
> ```
>
> The last step is the same as in Hal's example, where we delete a part of the
> CFG by proving that if we reach it there is guaranteed to be UB. In Hal's
> example, the frontend marked the shift as having UB if the shift amount
> exceeds the bitwidth. In this case, the frontend marked the add as
> non-overflowing, and the fact that it is guaranteed to overflow in the
> `else` branch has to be proven by looking at a dominating conditional check,
> instead of being manifest from a peephole observation (after
> inlining/constant folding) as it is in Hal's example.
>
> Hopefully that makes sense. If not, could you indicate specifically which of
> transformations 1 and/or 2 you think is illegal, and why?
>
> -- Sean Silva

Sean,
      Yes, I can see your point, but in practice that’s not how it’s done: no
one expands the “+nsw” into control flow and then applies Hal’s transform to
it to get Dan’s transform. Dan’s transform doesn’t alter the _original_ CFG
and is done as part of induction-variable analysis and simplification. Hal’s
does alter the _original_ CFG, and uses a totally different analysis as well
as a different transformation. The only thing they have in common is the
assumption that the program is UB-free at runtime; they have nothing in
common in implementation.

So I’m not saying your transforms 1 and 2 are illegal, I’m saying they are
unnecessary.

And we’re missing the point, which is this: why ever delete UB code, which
can cover up a bug, and, more importantly, why ever make that the default,
given the emphasis in software engineering on avoiding UB at the source
level? Why are we implementing both -fsanitize=undefined and deleting
undefined behavior as an optimization?

Peter.
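For concreteness, here is a minimal C sketch (illustrative, not from the
thread) of the loop shape Dan Gohman's widening targets on an LP64 target:

```
#include <stdio.h>

/* 'i' is a signed 32-bit induction variable indexing in a 64-bit
 * address space. Its increment lowers to an 'add nsw i32', so signed
 * overflow would be UB; the compiler may therefore widen 'i' to 64
 * bits and drop the per-iteration sign-extension, without touching
 * the CFG it was handed. */
long sum32(const int *a, int n) {
    long s = 0;
    for (int i = 0; i < n; ++i)  /* widened to a 64-bit counter */
        s += a[i];               /* a[i] needs a 64-bit index */
    return s;
}

int main(void) {
    int a[] = {1, 2, 3, 4};
    printf("%ld\n", sum32(a, 4));  /* prints 10 */
    return 0;
}
```

Under -fwrapv, where signed overflow wraps instead of being UB, this
widening is no longer justified and the sign-extension stays in the loop.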
>> Peter Lawrence.
>>
>> On Jul 21, 2017, at 5:00 PM, Sean Silva <chisophugis at gmail.com> wrote:
>>
>> On Jul 21, 2017 3:24 PM, "Peter Lawrence" <peterl95124 at sbcglobal.net> wrote:
>>
>> On Jul 20, 2017, at 11:22 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> On Wed, Jul 19, 2017 at 10:17 AM Peter Lawrence via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> Chandler,
>>>       The only thing David made clear that wasn’t already clear is that he
>>> believes UB to be “comparatively rare”, which is in agreement with what Hal
>>> already said, which is that he does not expect deleting UB to be of
>>> benefit to, for example, the SPEC benchmarks. Given that it is
>>> “comparatively rare”, why all the effort to delete it? And why make
>>> deleting it the default rather than warning about it?
>>
>> There seems to be some confusion/misunderstanding here. My best
>> understanding is that when David said this:
>>
>> "The cases where the compiler can statically prove that undefined
>> behaviour is present are comparatively rare."
>>
>> what he was describing was a contrast with the optimizations described
>> prior to that. It's something like this:
>>
>> UB-based optimizations don't prove UB is present - they optimize on the
>> assumption that it is not present, due to invariants in the program that
>> are unproven by the compiler but assumed to be known by the developer.
>>
>> Think about a simple case like array bounds: the compiler emits an
>> unconditional load from memory because it assumes the developer correctly
>> validated the bounds, or otherwise constructed the code so that
>> out-of-bounds indexes never reach that piece of code. This is quite common
>> - UB is assumed not to happen, and the compiler optimizes on that fact.
>>
>> What is less common is for the compiler to be able to prove (in reasonable
>> time) that UB /does/ happen. In many cases this would require complex
>> interprocedural analysis - the array is defined in one function, maybe
>> with a complex dynamic bound, then passed to another function and indexed
>> using a non-trivial dynamic expression. Statically proving that to be true
>> or false is complex and expensive, and so is basically not done by any
>> compiler. The cases that are caught by the compiler are therefore
>> relatively trivial ("oh, you declared a const null pointer value, then
>> dereferenced it within the same function", etc.) and don't happen very
>> often, because they're also fairly obvious to developers.
>>
>> Does that help explain the difference/distinction being drawn here?
>>
>> Dave,
>>       perhaps you missed these parts of the discussion.
>>
>> Here is the definition, acknowledged by Hal, of what we’re doing:
>>
>> 1. Sometimes there are abstraction penalties in C++ code
>> 2. That can be optimized away after template instantiation, function
>>    inlining, etc
>> 3. When they for example exhibit this pattern
>>        if (A) {
>>            stuff;
>>        } else {
>>            other stuff including “undefined behavior”;
>>        }
>> 4. Where the compiler assumes “undefined behavior” doesn’t actually
>>    happen, because in the C language standard it is the user’s
>>    responsibility to avoid it
>> 5. Therefore in this example the compiler can a) delete the else-clause,
>>    b) delete the if-cond, and c) assume A is true and propagate that
>>    information
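A minimal C sketch of the pattern in point 3 above (illustrative, and not
Hal's actual example): here the undefined behavior is shifting a 32-bit
unsigned by 32 or more.

```
/* After inlining a generic helper, both arms can end up textually
 * identical; only the guard distinguishes them. Shifting a 32-bit
 * unsigned by 32 or more is undefined behavior, so the else-clause
 * is UB whenever it executes. Per points 4 and 5, the optimizer may
 * delete the else-clause, delete the if-cond, and propagate
 * "amount < 32" as a known fact downstream. */
unsigned scale(unsigned value, unsigned amount) {
    if (amount < 32u) {
        return value << amount;  /* "stuff": well-defined */
    } else {
        return value << amount;  /* "other stuff including UB" */
    }
}
```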
>> And here’s the optimization that, according to Sean, we’re using to delete
>> UB:
>>
>> [ … ]
>>
>> In other words, if we can prove "when program statement A executes, then
>> program statement B is guaranteed to execute and have undefined behavior",
>> then we can assume that program statement A is never executed. In
>> particular, deleting program statement A is a correct transformation.
>> Deleting program statement B itself is a special case of this (when A = B).
>>
>> And yes, this does include everything up to and including `main`,
>> intraprocedurally and interprocedurally.
>>
>> [ … ]
>>
>> -- Sean Silva
>>
>> This is entirely a separate issue from what Dan Gohman did to optimize the
>> sign extension of i32 induction variables out of loops for LP64 target
>> machines, where the optimization is justified based on “the assumption
>> that UB does not happen”, and no actual UB exists either statically or
>> dynamically.
>>
>> Sorry, the way I phrased this in terms of program statements may have made
>> this unclear, but this is precisely the particular case A = B that I
>> mentioned. In this case, A = B = "the induction variable increment", and
>> we use that to deduce that the statement will not execute and overflow,
>> which is what justifies the widening.
>>
>> Notice that I mentioned that deleting code is only a particular case. In
>> the general case we deduce that dynamically something simply does not
>> happen, which is what we do in order to prove that induction variable
>> widening is safe (overflow cannot happen). There is nothing special about
>> induction variable widening with respect to UB; it is justified by
>> applying the same principles as all other UB-related transformations.
>>
>> Briefly, there is only one axiom in the compiler writer's toolbox w.r.t.
>> UB, and that is "the input program does not execute UB". Everything else
>> is derived from that by pure logical reasoning. Does that make sense? Can
>> you see how all the descriptions I gave above are derivable from that
>> axiom? The common application is that we can assume any program property P
>> whatsoever (not just liveness, but variable ranges, etc.) if we can prove
>> that the program would execute UB should property P fail to hold.
>>
>> -- Sean Silva
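A minimal C instance of the rule quoted above, using the familiar
null-pointer case rather than a shift (names are illustrative):

```
#include <stddef.h>

int visits;  /* illustrative global side effect */

/* If the then-clause runs, the increment (statement A) is followed
 * unconditionally by a dereference of a null pointer (statement B),
 * which is UB. By the rule above, A is never executed: the whole
 * clause can be deleted, the test folds away, and p != NULL may be
 * assumed afterward. */
int first_element(const int *p) {
    if (p == NULL) {
        ++visits;    /* A: provably never executes */
        return *p;   /* B: guaranteed UB on this path */
    }
    return *p;       /* reached with p known non-null */
}
```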
>> But when it comes to actual provable UB, the plan is to delete it. On that
>> there is no confusion, and there is no misunderstanding.
>>
>> Peter Lawrence.
>>
>> - Dave
>>
>>> Peter
>>>
>>> On Jul 13, 2017, at 2:15 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
>>>
>>> On Thu, Jul 13, 2017 at 5:13 PM Peter Lawrence via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>>> David,
>>>>       Here is the definition accepted by Hal of what we’re doing:
>>>>
>>>> [ … the same five-point definition quoted above … ]
>>>>
>>>> We are actively deleting undefined behavior, and the question is why,
>>>> given that doing so potentially masks a real source-code bug. At the
>>>> very least, deleting undefined behavior should not be the default.
>>>
>>> You are asserting this (again), but others have clearly stated that they
>>> disagree. David gave detailed and clear reasons why. Continuing to
>>> re-state positions is not productive.
>>>
>>> -Chandler
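An illustrative C example of Sean's "assume property P" application, where P
is a value range rather than the deadness of a block:

```
#include <limits.h>

/* Property P: "x < INT_MAX" when the comparison runs. If P were
 * false, the increment would already have overflowed a signed int,
 * which is UB, so the compiler may assume P and fold the comparison
 * to true. Under -fwrapv the assumption is withdrawn and the
 * comparison must be evaluated honestly. */
int next_is_bigger(int x) {
    int next = x + 1;  /* UB if x == INT_MAX */
    return next > x;   /* folds to 1 under the no-UB axiom */
}
```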
On Wed, Jul 26, 2017 at 8:40 PM, Peter Lawrence <peterl95124 at sbcglobal.net> wrote:

> [ … ]
>
> And we’re missing the point, which is this: why ever delete UB code, which
> can cover up a bug, and, more importantly, why ever make that the default,
> given the emphasis in software engineering on avoiding UB at the source
> level? Why are we implementing both -fsanitize=undefined and deleting
> undefined behavior as an optimization?
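For reference, the sanitizer named above checks at runtime the very
operations the optimizer assumes can never misbehave. A minimal sketch,
assuming clang or GCC (the exact diagnostic wording varies by version):

```
#include <stdio.h>

/* Build and run with runtime UB detection instead of UB-based
 * optimization:
 *     cc -O2 -fsanitize=undefined shift.c && ./a.out
 * UBSan reports something like "shift exponent 40 is too large for
 * 32-bit type 'int'" at the faulting line. */
int main(void) {
    volatile int amount = 40;     /* volatile: hide the constant so the
                                     shift survives to run time */
    printf("%d\n", 1 << amount);  /* UB: shift count >= bit-width */
    return 0;
}
```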
Did you ever manage to get those performance numbers on 32-bit code with
-fwrapv? In the other thread you volunteered to do that as a means of
demonstrating that at least the +nsw transformations (excluding induction
variable widening) were actually unnecessary for performance, contrary to
everybody else's beliefs (and supporting your own position).

-- Sean Silva

> [ … ]