> On Jul 6, 2017, at 3:07 PM, Chris Lattner <clattner at nondot.org> wrote: > > >> On Jul 6, 2017, at 2:05 PM, Peter Lawrence via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> >>> On Jul 6, 2017, at 1:00 PM, via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>> >>>> So far, so good. The problem is that while LLVM seems to consider >>>> the above IR to be valid, we officially do not allow dereferencing >>>> a pointer constructed in this way (if I’m reading the rules >>>> correctly). Consequently, if this GEP ever gets close enough to a >>>> load using the pointer, InstCombine will eliminate the GEP and the >>>> load. >> >> This is the part that confuses me, why would such code be eliminated. >> If it is illegal then this should be a compilation failure, > > This is illegal code, and if we only cared about the C spec, we could at least warn about it if not reject it outright. > > That said, the purpose of clang is to build real code, and real code contains some amount of invalid constructs in important code bases. We care about building a pragmatic compiler that gets the job done, so we sometimes “make things work” even though we don’t have to. There are numerous patterns in old-style “offsetof” macros that do similar things. Instead of fighting to make all the world’s code be theoretically ideal, it is better to just eat it and “do what they meant”. >Chris, The issue the original poster brought up is that instead of a compiler that as you say “makes things work” and “gets the job done” we have a compiler that intentionally deletes “undefined behavior”, on the assumption that since it is the users responsibility to avoid UB this code must be unreachable and is therefore safe to delete. It seems like there are three things the compiler could do with undefined behavior 1) let the code go through (perhaps with a warning) 2) replace the code with a trap 3) optimize the code as unreachable (no warning because we’re assuming this is the users intention) It looks like 3 is the llvm default, but IMHO is the least desirable choice, real world examples showing the benefit are practically non-existent, and it can mask a real source code bug. In spite of option 3 being (IMHO) the least desirable choice, considerable resources are being devoted to implementing it, and it does not seem to be being done according to good software engineering practice. This optimization seems to fit the “compiler design pattern” of a separate analysis and transform pass where “poison” is an attribute that gets forward propagated through expressions and assignments until it reaches some instruction that turns “poison” into “undefined behavior”, after which the block containing the UB can be deleted. Putting this analysis and transform into a separate pass means that the LangRef and IR can be cleaned up, there is no reason to have “poison” and “freeze” in the IR, nor have any other passes have to deal with them. Some folks are saying damn the torpedoes full speed ahead on option 3 in its least software-engineering-friendly form, others are saying wait-a-minute lets slow down take a deep breath and consider the big picture first. Thoughts ? Comments ? Questions ? Peter Lawrence.> -Chris > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170707/1d58acc0/attachment.html>
On Jul 7, 2017, at 1:40 PM, Peter Lawrence <peterl95124 at sbcglobal.net> wrote:> Chris, > The issue the original poster brought up is that instead of a compiler > that as you say “makes things work” and “gets the job done” we have a compiler > that intentionally deletes “undefined behavior”, on the assumption that since it > is the users responsibility to avoid UB this code must be unreachable and > is therefore safe to delete. > > It seems like there are three things the compiler could do with undefined behavior > 1) let the code go through (perhaps with a warning) > 2) replace the code with a trap > 3) optimize the code as unreachable (no warning because we’re assuming this is the users intention)Hi Peter, I think you have a somewhat fundamental misunderstanding of how UB works (or rather, why it is so crappy and doesn’t really work :-). The compiler can and does do all three of those, and it doesn’t have to have consistent algorithms for how it picks. I highly recommend you read some blog posts I wrote about it years ago, starting with: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html John Regehr also has written a lot on the topic, including the recent post: https://blog.regehr.org/archives/1520 What you should take from this is that while UB is an inseperable part of C programming, that this is a depressing and faintly terrifying thing. The tooling built around the C family of languages helps make the situation “less bad”, but it is still pretty bad. The only solution is to move to new programming languages that don’t inherit the problem of C. I’m a fan of Swift, but there are others. In the case of this particular thread, we aren’t trying to fix UB, we’re trying to switch one very specific syntactic idiom from UB to defined. -Chris
In this particular case, I believe llvm is really emitting a store of undef to a nullptr. I think this later gets turned into a trap. Here's the relevant code from instcombine. // load(gep null, ...) -> unreachable // load null/undef -> unreachable // TODO: Consider a target hook for valid address spaces for this xforms. if (canSimplifyNullLoadOrGEP(LI, Op)) { // Insert a new store to null instruction before the load to indicate // that this code is not reachable. We do this instead of inserting // an unreachable instruction directly because we cannot modify the // CFG. new StoreInst(UndefValue::get(LI.getType()), Constant::getNullValue(Op->getType()), &LI); return replaceInstUsesWith(LI, UndefValue::get(LI.getType())); } ~Craig On Fri, Jul 7, 2017 at 3:44 PM, Chris Lattner via llvm-dev < llvm-dev at lists.llvm.org> wrote:> On Jul 7, 2017, at 1:40 PM, Peter Lawrence <peterl95124 at sbcglobal.net> > wrote: > > Chris, > > The issue the original poster brought up is that instead of a > compiler > > that as you say “makes things work” and “gets the job done” we have a > compiler > > that intentionally deletes “undefined behavior”, on the assumption that > since it > > is the users responsibility to avoid UB this code must be unreachable and > > is therefore safe to delete. > > > > It seems like there are three things the compiler could do with > undefined behavior > > 1) let the code go through (perhaps with a warning) > > 2) replace the code with a trap > > 3) optimize the code as unreachable (no warning because we’re assuming > this is the users intention) > > Hi Peter, > > I think you have a somewhat fundamental misunderstanding of how UB works > (or rather, why it is so crappy and doesn’t really work :-). The compiler > can and does do all three of those, and it doesn’t have to have consistent > algorithms for how it picks. I highly recommend you read some blog posts I > wrote about it years ago, starting with: > http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html > > John Regehr also has written a lot on the topic, including the recent post: > https://blog.regehr.org/archives/1520 > > What you should take from this is that while UB is an inseperable part of > C programming, that this is a depressing and faintly terrifying thing. The > tooling built around the C family of languages helps make the situation > “less bad”, but it is still pretty bad. The only solution is to move to > new programming languages that don’t inherit the problem of C. I’m a fan > of Swift, but there are others. > > In the case of this particular thread, we aren’t trying to fix UB, we’re > trying to switch one very specific syntactic idiom from UB to defined. > > -Chris > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170708/2376c6ba/attachment.html>
Chris, nice segue to Swift ! :-), but... The question is what should LLVM do with UB in general, saying that we are going to change one specific idiom from undefined to defined glosses over the real question: why should we ever optimize / delete any UB at all ? This “depressing and faintly terrifying thing” as you call it, should be viewed not as an opportunity for optimization, but rather as the source of bugs that need to be warned about at the very least. The current action is to delete UB (without warning), is that really right ? Who can possibly benefit from this optimization ? And how can that ever outweigh the harm in not reporting an error ? Why are we expending any effort at all to optimize something that just about everyone agrees should always be avoided ? These last questions are all rhetorical so no need to answer them, the problem as I see it now is that everyone CC'ed on this email probably by now would agree privately that optimizing away undefined behavior is wrong, but no one wants to be the first to say so publicly. We’re stuck in a log-jam. We need someone like you to take that first step so the rest can go along. Please help in un-jamming the current log-jam. Peter Lawrence.> On Jul 7, 2017, at 3:44 PM, Chris Lattner <clattner at nondot.org> wrote: > > On Jul 7, 2017, at 1:40 PM, Peter Lawrence <peterl95124 at sbcglobal.net> wrote: >> Chris, >> The issue the original poster brought up is that instead of a compiler >> that as you say “makes things work” and “gets the job done” we have a compiler >> that intentionally deletes “undefined behavior”, on the assumption that since it >> is the users responsibility to avoid UB this code must be unreachable and >> is therefore safe to delete. >> >> It seems like there are three things the compiler could do with undefined behavior >> 1) let the code go through (perhaps with a warning) >> 2) replace the code with a trap >> 3) optimize the code as unreachable (no warning because we’re assuming this is the users intention) > > Hi Peter, > > I think you have a somewhat fundamental misunderstanding of how UB works (or rather, why it is so crappy and doesn’t really work :-). The compiler can and does do all three of those, and it doesn’t have to have consistent algorithms for how it picks. I highly recommend you read some blog posts I wrote about it years ago, starting with: > http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html > > John Regehr also has written a lot on the topic, including the recent post: > https://blog.regehr.org/archives/1520 > > What you should take from this is that while UB is an inseperable part of C programming, that this is a depressing and faintly terrifying thing. The tooling built around the C family of languages helps make the situation “less bad”, but it is still pretty bad. The only solution is to move to new programming languages that don’t inherit the problem of C. I’m a fan of Swift, but there are others. > > In the case of this particular thread, we aren’t trying to fix UB, we’re trying to switch one very specific syntactic idiom from UB to defined. > > -Chris >