Mehdi,
          Hal’s transformation only kicks in in the *presence* of UB, and
it does not matter how that UB got there, whether by function inlining
or without function inlining.

The problem with Hal’s argument is that the compiler does not have
a built-in ouija board with which it can conjure up the spirit of the
author of the source code and find out if the UB was intentional
with the expectation of it being deleted, or is simply a bug.
Function inlining does not magically turn a bug into not-a-bug, nor
does post-inlining simplification magically turn a bug into not-a-bug.

Let me say it again: if the compiler can find this UB (after whatever
optimizations it takes to get there) then the static analyzer must
be able to do the same thing, forcing the programmer to fix it
rather than have the compiler optimize it.

Or, to put it another way: there is no difference between a compiler
and a static analyzer [*]. So regardless of whether it is the compiler or
the static analyzer that finds any UB, the only rational thing to do with
it is report it as a bug.


Peter Lawrence.


[* in fact that’s one of the primary reasons Apple adopted llvm, to use
it as a base for static analysis]


> On Jul 21, 2017, at 10:03 PM, Mehdi AMINI <joker.eph at gmail.com> wrote:
>
> 2017-07-21 21:27 GMT-07:00 Peter Lawrence <peterl95124 at sbcglobal.net>:
> Sean,
>      Let me re-phrase a couple words to make it perfectly clear
>
>> On Jul 21, 2017, at 6:29 PM, Peter Lawrence <peterl95124 at sbcglobal.net> wrote:
>>
>> Sean,
>>
>> Dan Gohman’s “transform” changes a loop induction variable, but does not change the CFG,
>>
>> Hal’s “transform” deletes blocks out of the CFG, fundamentally altering it.
>>
>> These are two totally different transforms.
>>
>> And even the analysis is different,
>>
>> The first is based on an *assumption* of non-UB (actually there is no analysis to perform)
>           the *absence* of UB
>>
>> the second is based on a *proof* of existence of UB (here typically some non-trivial analysis is required)
>           the *presence* of UB
>
>> These have, practically speaking, nothing in common.
>>
>
> In particular, the first is an optimization, while the second is a transformation that
> fails to be an optimization because the opportunity for it happening in real world
> code that is expected to pass compilation without warnings, static analysis without
> warnings, and dynamic sanitizers without warnings, is zero.
>
> Or to put it another way, if llvm manages to find some UB that no analyzer or
> sanitizer does, and then deletes the UB, then the author of that part of llvm
> is in the wrong group, and belongs over in the analyzer and/or sanitizer group.
>
> I don't understand your claim; it does not match at all my understanding of what we managed to agree on in the past.
>
> The second transformation (dead code elimination to simplify) is based on the assumption that there is no UB.
>
> I.e. after inlining, for example, the extra context of the calling function allows us to deduce the value of some conditional branch in the inlined body based on the impossibility of one of the paths *in the context of this particular caller*.
>
> This does not mean that the program written by the programmer has any UB inside.
>
> This is exactly the example that Hal gave.
>
> This can't be used to expose any meaningful information to the programmer, because it would be full of false positives. Basically a program could be clean of any static analyzer error, of any UBSAN error, and totally UB-free, and still exhibit tons and tons of such issues.
>
> --
> Mehdi
2017-07-21 22:44 GMT-07:00 Peter Lawrence <peterl95124 at sbcglobal.net>:

> Mehdi,
>           Hal’s transformation only kicks in in the *presence* of UB
>

No, sorry, I entirely disagree with this assertion: I believe we optimize
program where there is no UB. We delete dead code, code that never runs, so
it is code that does not exercise UB.

The example Hal showed does not exhibit UB; it is perfectly valid according
to the standard.

> , and
> it does not matter how that UB got there, whether by function inlining
> or without function inlining.
>
> The problem with Hal’s argument is that the compiler does not have
> a built in ouija board with which it can conjure up the spirit of the
> author of the source code and find out if the UB was intentional
> with the expectation of it being deleted, or is simply a bug.
> Function inlining does not magically turn a bug into not-a-bug, nor
> does post-inlining simplification magically turn a bug into not-a-bug.
>
> Let me say it again: if the compiler can find this UB (after whatever
> optimizations it takes to get there) then the static analyzer must
> be able to do the same thing, forcing the programmer to fix it
> rather than have the compiler optimize it.
>

This is again incorrect: there is no UB in the program, so there is nothing
the static analyzer should report.

The compiler is still able to delete some code, because of breaking the
abstraction through inlining or template instantiation, for example (cf.
Hal's example).

--
Mehdi

>
> Or, to put it another way: there is no difference between a compiler
> and a static analyzer [*]. So regardless of whether it is the compiler or
> the static analyzer that finds any UB, the only rational thing to do with
> it is report it as a bug.
>
>
> Peter Lawrence.
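A minimal sketch of the kind of situation Mehdi describes, written here as an illustration rather than taken from the thread: a helper with a defensive null check is inlined into a caller that can never pass null. The names value_or_default, read_through_reference and check_inlining_example are hypothetical. Under the usual assumption that the address of a reference is non-null, an optimizer may fold the check to false for that one call site, even though the program contains no UB and the check remains meaningful for other callers.

```cpp
#include <cassert>

// Hypothetical library helper: defensively handles a null pointer.
inline int value_or_default(const int *p) {
    if (p == nullptr)   // meaningful for callers that may pass null
        return -1;
    return *p;
}

// Hypothetical caller: in a UB-free program the address of a reference is
// never null, so after inlining the null check above may be folded to false
// and the "return -1" block dropped for this call site only.
inline int read_through_reference(const int &r) {
    return value_or_default(&r);
}

// Small self-check: both paths are well defined, and no UB is ever executed.
inline void check_inlining_example() {
    int x = 7;
    int through_ref = read_through_reference(x);
    int from_null = value_or_default(nullptr);
    assert(through_ref == 7);
    assert(from_null == -1);
}
```

If every branch removed this way were reported as a bug, this call site would presumably be flagged even though nothing in the source is wrong, which is the false-positive concern raised above.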
> On Jul 21, 2017, at 10:55 PM, Mehdi AMINI <joker.eph at gmail.com> wrote:
>
> 2017-07-21 22:44 GMT-07:00 Peter Lawrence <peterl95124 at sbcglobal.net>:
> Mehdi,
>           Hal’s transformation only kicks in in the *presence* of UB
>
> No, sorry I entirely disagree with this assertion: I believe we optimize program where there is no UB. We delete dead code, code that never runs, so it is code that does not exercise UB.
>

Mehdi,
      I had to read that sentence several times to figure out what
the problem is, which is sloppy terminology on my part.

Strictly speaking, the C standard uses “undefined behavior” to describe
what happens at runtime when an “illegal” construct is executed. I have
been using “undefined behavior” and UB to describe the “illegal” construct
whether it is executed or not.

Hence I say “Hal’s transform is triggered by UB”, when I should be saying
“Hal’s transformation is triggered by illegal IR”.

All I can say is I’m not the only one being sloppy: what started this entire
conversation is the paper titled “Taming Undefined Behavior in LLVM”, while
the correct title would be “Taming Illegal IR in LLVM”. (I think we are all
pretty confident that LLVM itself is UB-free, or at least we all hope so :-).

I believe you are being sloppy when you say
        “we optimize program where there is no UB”,
because I believe you mean
        “we optimize program under the assumption that there is no UB”.
In other words, we recognize “illegal” constructs and then assume they are
unreachable, and delete them, even when we can’t prove by any other means
that they are unreachable. We don’t know that there is no (runtime) UB,
we just assume it.

> The example Hal showed does not exhibit UB, it is perfectly valid according to the standard.
>

Whether it exhibits UB at runtime or not is not the issue; the issue is
what a static analyzer or compiler can tell before runtime, see below.

> , and
> it does not matter how that UB got there, whether by function inlining
> or without function inlining.
>
> The problem with Hal’s argument is that the compiler does not have
> a built in ouija board with which it can conjure up the spirit of the
> author of the source code and find out if the UB was intentional
> with the expectation of it being deleted, or is simply a bug.
> Function inlining does not magically turn a bug into not-a-bug, nor
> does post-inlining simplification magically turn a bug into not-a-bug.
>
> Let me say it again: if the compiler can find this UB (after whatever
> optimizations it takes to get there) then the static analyzer must
> be able to do the same thing, forcing the programmer to fix it
> rather than have the compiler optimize it.
>
> This is again incorrect: there is no UB in the program, there is nothing the static analyzer should report.

Hal’s example starts with this template

> template <typename T>
> int do_something(T mask, bool cond) {
>   if (mask & 2)
>     return 42;
>
>   if (cond) {
>     T high_mask = mask >> 48; // UB if sizeof(T) < 8, and cond true
>     if (high_mask > 5)
>       do_something_1(high_mask);
>     else
>       do_something_2();
>   }
>
>   return 0;
> }

Which is then instantiated with T = char, and where it is impossible for
either a static analyzer or a compiler to figure out and prove that ‘cond’
is always false.

Hence a static analyzer issues a warning about the shift, while llvm gives
no warning and instead optimizes the entire if-statement away on the
assumption that it is unreachable.

Yes, a static analyzer does issue a warning in this case.

This is not the only optimization to be based on assumption rather than
fact; for example, type-based alias analysis is based on the assumption
that the program is free of this sort of aliasing. The difference is that
a user can disable TBAA, and only TBAA, if a program seems to be running
incorrectly when optimized, and thereby possibly track down a bug. So far
there is no command line option to disable UB-based analysis
(or “illegal-IR-based” :-), but there really needs to be.

Do we at least agree on that last paragraph?


Peter Lawrence.

>
> The compiler is still able to delete some code, because of breaking the abstraction through inlining or template instantiation for example (cf Hal example).
>
> --
> Mehdi
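For readers who want to experiment, here is a self-contained sketch built around the template quoted above. The bodies of do_something_1 and do_something_2 and the function check_hal_example are assumptions added for illustration; the thread does not show them. The T = char instantiation is only called in ways that never reach the shift, so running the sketch executes no UB, although some compilers may still warn about the shift count when instantiating it.

```cpp
#include <cassert>

// Hypothetical stand-ins for the callees in Hal's example; they only count calls.
static int calls_1 = 0;
static int calls_2 = 0;

template <typename T>
void do_something_1(T) { ++calls_1; }

inline void do_something_2() { ++calls_2; }

template <typename T>
int do_something(T mask, bool cond) {
    if (mask & 2)
        return 42;

    if (cond) {
        // UB when the promoted type of T is 48 bits wide or narrower
        // and this line is actually executed.
        T high_mask = mask >> 48;
        if (high_mask > 5)
            do_something_1(high_mask);
        else
            do_something_2();
    }

    return 0;
}

// Exercises only well-defined paths.
inline void check_hal_example() {
    // T = long long (at least 64 bits): the shift is well defined.
    int r1 = do_something<long long>(7LL << 48, true);  // high_mask == 7
    assert(r1 == 0 && calls_1 == 1 && calls_2 == 0);
    int r2 = do_something<long long>(1LL << 48, true);  // high_mask == 1
    assert(r2 == 0 && calls_1 == 1 && calls_2 == 1);

    // T = char: cond is false, or the function returns before the shift.
    int r3 = do_something<char>(1, false);
    int r4 = do_something<char>(2, true);
    assert(r3 == 0 && r4 == 42);
    assert(calls_1 == 1 && calls_2 == 1);
}
```

Whether the char instantiation should be diagnosed or silently simplified is exactly the disagreement in this thread; the sketch only makes the two readings easy to try with a compiler and an analyzer of your choice.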