Sean,
Let me re-phrase a couple of words to make it perfectly clear:
> On Jul 21, 2017, at 6:29 PM, Peter Lawrence <peterl95124 at
sbcglobal.net> wrote:
>
> Sean,
>
> Dan Gohman’s “transform” changes a loop induction variable, but does not
change the CFG,
>
> Hal’s “transform” deletes blocks out of the CFG, fundamentally altering it.
>
> These are two totally different transforms.
>
>
> And even the analysis is different,
>
> The first is based on an *assumption* of non-UB (actually there is no
analysis to perform)
re-phrased: an *assumption* of the *absence* of UB.
> The second is based on a *proof* of existence of UB (here typically some
non-trivial analysis is required)
re-phrased: a *proof* of the *presence* of UB.
> These have, practically speaking, nothing in common.
>
In particular, the first is an optimization, while the second is a
transformation that fails to be an optimization: in real-world code that is
expected to pass compilation without warnings, static analysis without
warnings, and dynamic sanitizers without warnings, the opportunity for it
to apply is zero.
Or to put it another way, if llvm manages to find some UB that no analyzer or
sanitizer does, and then deletes the UB, then the author of that part of llvm
is in the wrong group, and belongs over in the analyzer and/or sanitizer group.
Peter.
>
> Peter Lawrence.
>
>
>
>> On Jul 21, 2017, at 5:00 PM, Sean Silva <chisophugis at gmail.com
<mailto:chisophugis at gmail.com>> wrote:
>>
>>
>>
>> On Jul 21, 2017 3:24 PM, "Peter Lawrence" <peterl95124 at
sbcglobal.net <mailto:peterl95124 at sbcglobal.net>> wrote:
>>
>>> On Jul 20, 2017, at 11:22 AM, David Blaikie <dblaikie at
gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>>
>>>
>>>
>>> On Wed, Jul 19, 2017 at 10:17 AM Peter Lawrence via llvm-dev
<llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
wrote:
>>> Chandler,
>>> The only thing David made clear that wasn’t already clear
>>> is that he believes UB to be “comparatively rare”, which is in
agreement with what Hal already said: that he does not expect
deleting UB to be of benefit to, for example, the SPEC benchmarks.
>>>
>>> Given that it is “comparatively rare”, why all the effort to delete it?
>>> And why make deleting it the default rather than warning about it?
>>>
>>> There seems to be some confusion/misunderstanding here. My best
understanding is that when David said this:
>>>
>>> "The cases where the compiler can statically prove that
undefined behaviour is present are comparatively rare."
>>>
>>> What he was referring to/describing was a contrast with the
optimizations described prior to that.
>>>
>>> It's something like this:
>>>
>>> UB-based optimizations don't prove UB is present - they
optimize on the assumption that it is not present due to some unproven (by the
compiler, but assumed to be known by the developer) invariants in the program.
>>>
>>> Think about a simple case like array bounds - the compiler emits an
unconditional load from the memory because it assumes the developer correctly
validated the bounds, or otherwise constructed the code so that out-of-bounds
indexes never reach that piece of code. This is quite common - that UB is
assumed not to happen, and the compiler optimizes on this fact.
>>>
>>> What is less common is for the compiler to be able to (in
reasonable time) prove that UB /does/ happen. In many cases this would require
complex interprocedural analysis - the array is defined in one function, maybe
with a complex dynamic bound, then passed to another function and indexed using
a non-trivial dynamic expression. Statically proving that to be true or false
is complex/expensive, and so basically not done by any compiler - so the cases
that are caught by the compiler are relatively trivial (“oh, you declared a
const null pointer value, then dereferenced it within the same function”,
etc.) and so don't happen very often (because they're also fairly
obvious to developers).
>>>
>>> Does that help explain the difference/distinction being drawn here?
>>
>>
>>
>> Dave,
>> perhaps you missed these parts of the discussion
>>
>> Here is the definition, acknowledged by Hal, of what we’re doing
>>
>>> 1. Sometimes there are abstraction penalties in C++ code
>>> 2. That can be optimized away after template instantiation,
function inlining, etc
>>> 3. When they for example exhibit this pattern
>>> if (A) {
>>>     stuff;
>>> } else {
>>>     other stuff including “undefined behavior”;
>>> }
>>> 4. Where the compiler assumes “undefined behavior” doesn’t actually
happen because
>>> In the C language standard it is the user’s responsibility to
avoid it
>>> 5. Therefore in this example the compiler can a) delete the
else-clause
>>> b) delete the if-cond, c) assume A is true and propagate that
information
>>
>>
>> And here’s the optimization that, according to Sean, we’re using to
delete UB:
>>
>>> [ … ]
>>>
>>> In other words, if we can prove "when program statement A
executes then
>>> program statement B is guaranteed to execute and have undefined
behavior"
>>> then we can assume that program statement A is never executed.
>>>
>>> In particular, deleting program statement A is a correct
transformation.
>>> Deleting program statement B itself is a special case of this (when
A = B).
>>>
>>> And yes, this does include everything up to and including `main`,
>>> intraprocedurally and interprocedurally.
>>>
>>>
>>> [ … ]
>>>
>>> -- Sean Silva
>>
>>
>> This is entirely a separate issue from what Dan Gohman did to optimize
sign extension
>> of i32 induction variables out of loops for LP64 target machines, where
the optimization
>> is justified based on “the assumption that UB does not happen”, and no
actual UB
>> exists either statically or dynamically.
>>
>> Sorry, the way I phrased this in terms of program statements may have
made this unclear, but this is precisely the particular case A=B that I
mentioned. In this case, A=B="the induction variable increment", and we
use that to deduce that the statement will never both execute and overflow,
which is what justifies the widening.
>>
>> Notice that I mentioned that deleting code is only a particular case.
In the general case we deduce that dynamically something simply does not happen,
which is what we do in order to prove that induction variable widening is safe
(overflow cannot happen).
>>
>> There is nothing special about induction variable widening with respect
to UB. It is justified by applying the same principles as all other UB-related
transformations.
>>
>>
>> Briefly, there is only one axiom in the compiler writer's toolbox
w.r.t. UB and that is "the input program does not execute UB".
Everything else is derived from that by pure logical reasoning. Does that make
sense? Can you see how all the descriptions I gave above are derivable from that
axiom?
>>
>> The common application of this is that we can assume any program
property P whatsoever (not just liveness, but variable ranges, etc.) if we can
prove that the program would execute UB should that property P fail to hold.
>>
>>
>> -- Sean Silva
>>
>>
>> But when it comes to actual provable UB, the plan is to delete it.
>> On that there is no confusion, and there is no misunderstanding.
>>
>>
>> Peter Lawrence.
>>
>>
>>
>>
>>>
>>> - Dave
>>>
>>> Peter
>>>
>>>
>>>> On Jul 13, 2017, at 2:15 PM, Chandler Carruth <chandlerc at
gmail.com <mailto:chandlerc at gmail.com>> wrote:
>>>>
>>>> On Thu, Jul 13, 2017 at 5:13 PM Peter Lawrence via llvm-dev
<llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
wrote:
>>>> David,
>>>> Here is the definition accepted by Hal of what we’re
doing
>>>>
>>>> > 1. Sometimes there are abstraction penalties in C++ code
>>>> > 2. That can be optimized away after template
instantiation, function inlining, etc
>>>> > 3. When they for example exhibit this pattern
>>>> > if (A) {
>>>> >     stuff;
>>>> > } else {
>>>> >     other stuff including “undefined behavior”;
>>>> > }
>>>> > 4. Where the compiler assumes “undefined behavior” doesn’t
actually happen because
>>>> > In the C language standard it is the user’s
responsibility to avoid it
>>>> > 5. Therefore in this example the compiler can a) delete
the else-clause
>>>> > b) delete the if-cond, c) assume A is true and
propagate that information
>>>>
>>>>
>>>>
>>>> We are actively deleting undefined behavior, and the question
is why
>>>> given that doing so potentially masks a real source code bug.
>>>> At the very least deleting undefined behavior should not be the
default.
>>>>
>>>> You are asserting this (again), but others have clearly stated
that they disagree. David gave detailed and clear reasons why. Continuing to
re-state positions is not productive.
>>>>
>>>> -Chandler
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>