Chris,
nice segue to Swift ! :-), but...
The question is what should LLVM do with UB in general, saying that
we are going to change one specific idiom from undefined to defined glosses
over the real question: why should we ever optimize / delete any UB at all ?
This “depressing and faintly terrifying thing” as you call it, should be viewed not as
an opportunity for optimization, but rather as the source of bugs that need to be
warned about at the very least.
The current action is to delete UB (without warning), is that really right ?
Who can possibly benefit from this optimization ?
And how can that ever outweigh the harm in not reporting an error ?
Why are we expending any effort at all to optimize something that just
about everyone agrees should always be avoided ?
These last questions are all rhetorical so no need to answer them, the problem
as I see it now is that everyone CC'ed on this email probably by now would agree
privately that optimizing away undefined behavior is wrong, but no one wants
to be the first to say so publicly. We’re stuck in a log-jam. We need someone like
you to take that first step so the rest can go along.
Please help in un-jamming the current log-jam.
Peter Lawrence.
> On Jul 7, 2017, at 3:44 PM, Chris Lattner <clattner at nondot.org> wrote:
>
> On Jul 7, 2017, at 1:40 PM, Peter Lawrence <peterl95124 at sbcglobal.net> wrote:
>> Chris,
>> The issue the original poster brought up is that instead of a compiler
>> that as you say “makes things work” and “gets the job done” we have a compiler
>> that intentionally deletes “undefined behavior”, on the assumption that since it
>> is the user's responsibility to avoid UB this code must be unreachable and
>> is therefore safe to delete.
>>
>> It seems like there are three things the compiler could do with undefined behavior
>> 1) let the code go through (perhaps with a warning)
>> 2) replace the code with a trap
>> 3) optimize the code as unreachable (no warning because we’re assuming this is the user's intention)
>
> Hi Peter,
>
> I think you have a somewhat fundamental misunderstanding of how UB works
> (or rather, why it is so crappy and doesn’t really work :-). The compiler can
> and does do all three of those, and it doesn’t have to have consistent
> algorithms for how it picks. I highly recommend you read some blog posts I
> wrote about it years ago, starting with:
> http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
>
> John Regehr also has written a lot on the topic, including the recent post:
> https://blog.regehr.org/archives/1520
>
> What you should take from this is that while UB is an inseparable part of C
> programming, this is a depressing and faintly terrifying thing. The
> tooling built around the C family of languages helps make the situation “less
> bad”, but it is still pretty bad. The only solution is to move to new
> programming languages that don’t inherit the problem of C. I’m a fan of Swift,
> but there are others.
>
> In the case of this particular thread, we aren’t trying to fix UB, we’re
> trying to switch one very specific syntactic idiom from UB to defined.
>
> -Chris
>
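The three outcomes listed in the quoted message (let the code through, trap, or treat the code as unreachable) can be made concrete with a small sketch. The function names below are hypothetical and not from the thread; signed overflow is used only because it is a familiar case of undefined behavior in C. Which outcome a given build produces depends on the compiler, the optimization level and options such as `-fwrapv`, `-ftrapv` or `-fsanitize=undefined`, so the comments describe what is permitted, not what any particular compiler version is guaranteed to do.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper that tries to detect overflow *after* it happened.
 * For x == INT_MAX the expression x + 1 is undefined behavior, so a
 * compiler is allowed to:
 *   1) emit a plain add and compare (the code "goes through"),
 *   2) emit a trap (for example under a trapping or sanitizer option),
 *   3) assume overflow never happens and fold the result to false.
 * It is well defined for every other value of x. */
bool next_is_smaller(int x) {
    return x + 1 < x;
}

/* A defined way to ask the same question: no overflowing arithmetic is
 * performed, so all three choices above collapse into one behavior. */
bool next_would_overflow(int x) {
    return x == INT_MAX;
}
```

Calling `next_would_overflow(INT_MAX)` is well defined on every implementation, whereas `next_is_smaller(INT_MAX)` may legitimately differ between builds of the same source.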
On 07/10/2017 01:36 PM, Peter Lawrence wrote:
> Chris,
> nice segue to Swift ! :-), but...
>
> The question is what should LLVM do with UB in general, saying that
> we are going to change one specific idiom from undefined to defined glosses
> over the real question: why should we ever optimize / delete any UB at all ?
>
> This “depressing and faintly terrifying thing” as you call it, should be viewed not as
> an opportunity for optimization, but rather as the source of bugs that need to be
> warned about at the very least.
>
> The current action is to delete UB (without warning), is that really right ?
> Who can possibly benefit from this optimization ?
> And how can that ever outweigh the harm in not reporting an error ?
> Why are we expending any effort at all to optimize something that just
> about everyone agrees should always be avoided ?
>
> These last questions are all rhetorical so no need to answer them, the problem
> as I see it now is that everyone CC'ed on this email probably by now would agree
> privately that optimizing away undefined behavior is wrong, but no one wants
> to be the first to say so publicly.

I'm certain this hypothesis is false.

 -Hal

> We’re stuck in a log-jam. We need someone like
> you to take that first step so the rest can go along.
>
> Please help in un-jamming the current log-jam.
>
> Peter Lawrence.

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
On Mon, Jul 10, 2017 at 11:42 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On 07/10/2017 01:36 PM, Peter Lawrence wrote:
>> These last questions are all rhetorical so no need to answer them, the problem
>> as I see it now is that everyone CC'ed on this email probably by now would agree
>> privately that optimizing away undefined behavior is wrong, but no one wants
>> to be the first to say so publicly.
>
> I'm certain this hypothesis is false.

It's definitely false.

I think it's quite right to optimize away undefined behavior.
I'm in the camp of "if you want better programming languages, take full
advantage of what you can in the current ones, and if people don't like it,
awesome, let them use a language that has more guarantees".
(Though I'm usually in the camp of using flags and defaults to achieve this,
since I'm somewhat pragmatic)

If you don't take advantage at all, or only of some things, in the name of
pragmatics, now every compiler does something different, and users can't
even know what to expect.

In practice, taking advantage is also the only way this stuff gets fixed.
It becomes enough of a pain that either later revisions of the same
programming language, or a new programming language, fix it.

I'm emphatically not a fan of "well, the language designers suck at giving
users what they want and need, so we should fix it for them in the compiler".
Down this path, madness lies.
Instead, I'd rather make these folks do their job ;)
Chandler Carruth via llvm-dev
2017-Jul-10 19:00 UTC
[llvm-dev] GEP with a null pointer base
On Mon, Jul 10, 2017 at 2:36 PM Peter Lawrence <peterl95124 at sbcglobal.net> wrote:
> The question is what should LLVM do with UB in general, saying that
> we are going to change one specific idiom from undefined to defined glosses
> over the real question: why should we ever optimize / delete any UB at all ?
>
> This “depressing and faintly terrifying thing” as you call it, should be viewed not as
> an opportunity for optimization, but rather as the source of bugs that need to be
> warned about at the very least.
>
> The current action is to delete UB (without warning), is that really right ?
> Who can possibly benefit from this optimization ?
> And how can that ever outweigh the harm in not reporting an error ?
> Why are we expending any effort at all to optimize something that just
> about everyone agrees should always be avoided ?
>
> These last questions are all rhetorical so no need to answer them, the problem
> as I see it now is that everyone CC'ed on this email probably by now would agree
> privately that optimizing away undefined behavior is wrong, but no one wants
> to be the first to say so publicly. We’re stuck in a log-jam. We need someone like
> you to take that first step so the rest can go along.
>
> Please help in un-jamming the current log-jam.

As several others have pointed out, we are not in a log-jam and your
hypothesis about how people feel is false.

However, I also want to point out that using rhetorical questions (or more
broadly, rhetorical devices and techniques) is probably not the best
approach for any of these discussions. And down this path comes a form of
dialog that has been a problem in prior discussions, so I strongly urge you
to phrase your emails differently.

If you want to persuade people on this list about how LLVM should work, I
would suggest that you avoid rhetoric and instead focus on actually making
changes to LLVM (it is open source) and showing the empirical results of the
experiment.
On 10 Jul 2017, at 19:36, Peter Lawrence via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> why should we ever optimize / delete any UB at all ?

No one has addressed this yet, which may be why you are still under the
impression that it’s under discussion. You seem to have some misconceptions
about UB. There are two major reasons for UB in a language such as C:

1. The safe version may be prohibitively expensive on some architectures.
For example, division by zero will give a trap on some architectures or an
unspecified value on others. If the C specification stated that it gave an
unspecified value, then on architectures that trap, every division would have
to be wrapped in a branch that tested if the divisor was zero and branched over
the divide instruction if so. This would lead to very inefficient code on these
architectures, so the C standard says that anything is allowed to happen.

2. Some properties are effectively impossible to verify statically[1]. For
example, it is undefined behaviour in C to use a pointer to a deallocated
object[2]. If C required well-defined behaviour here then it would effectively
be mandating some form of garbage collector: you’d need to either find all
pointers to an object and null them on free, or you’d need to mark the object,
check the mark on every load, and not reuse the memory until later. This would
be unacceptable overhead for a C implementation. A lot of other things are in
this category.

Many things are a mixture of these. Most optimisations that rely on
undefined behaviour are not saying ‘aha, we’ve spotted that the program invokes
undefined behaviour, we can replace the whole thing with a trap instruction!’,
they’re saying ‘this program does either X or Y, X would be undefined behaviour
and so we can assume that it does Y’. Not being able to do the latter would
mean that they’d need to insert run-time checks each time, which would negate
the optimisation and *in well-written code would never be hit*.

The cases where the compiler can statically prove that undefined behaviour
is present are comparatively rare.

David

[1] Note: Some of these are possible with whole-program analysis, so if
you’re happy without shared libraries and with week-long compile times then it’s
possible.

[2] The wording of this actually means that it’s impossible to implement
malloc() in ISO C, because as soon as the pointer has been passed to free then
it becomes invalid and using it to put the object on a free list invokes
undefined behaviour.
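Two of the points in the message above can be sketched in C. This is an illustration under stated assumptions, not code from the thread: `div_checked` and `first_or_default` are hypothetical names. The first function shows the per-division branch that a fully defined division would need; the second shows the "either X or Y" reasoning, where the dereference would be undefined for a null pointer, so a compiler may assume the pointer is non-null and treat the later check as dead.

```c
#include <limits.h>
#include <stddef.h>

/* Fully defined division: every call pays for the guard, even though
 * well-written callers never pass a zero divisor. The INT_MIN / -1 case
 * is guarded too, because it overflows and is also undefined in C. */
int div_checked(int a, int b, int fallback) {
    if (b == 0 || (a == INT_MIN && b == -1)) {
        return fallback;
    }
    return a / b;
}

/* "This program does either X or Y": the load on the first line is
 * undefined if p is NULL (X), so a compiler may assume p is not NULL (Y)
 * and is then permitted to remove the check below as unreachable.
 * Whether a particular compiler does so depends on version and flags. */
int first_or_default(const int *p) {
    int v = *p;
    if (p == NULL) {
        return -1;
    }
    return v;
}
```

Both functions are well defined for the inputs a careful caller would use, for example `div_checked(7, 2, 0)` or `first_or_default(&x)` for a live `int x`. The difference only shows up for inputs that the language already leaves undefined.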
David,
Here is the definition accepted by Hal of what we’re doing
> 1. Sometimes there are abstraction penalties in C++ code
> 2. That can be optimized away after template instantiation, function inlining, etc
> 3. When they for example exhibit this pattern
>        if (A) {
>            stuff;
>        } else {
>            other stuff including “undefined behavior”;
>        }
> 4. Where the compiler assumes “undefined behavior” doesn’t actually happen because
>    in the C language standard it is the user's responsibility to avoid it
> 5. Therefore in this example the compiler can a) delete the else-clause
>    b) delete the if-cond, c) assume A is true and propagate that information
We are actively deleting undefined behavior, and the question is why
given that doing so potentially masks a real source code bug.
At the very least deleting undefined behavior should not be the default.
Peter Lawrence.
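The pattern in steps 3 to 5 of the message above can be written out as a small C sketch. The names `pattern` and `pattern_as_optimized` are hypothetical, and the null dereference is just one convenient stand-in for "other stuff including undefined behavior". The second function is a hand-written picture of what a compiler is permitted to produce after applying (a), (b) and (c); it is an assumption for illustration, not the output of any specific LLVM pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Step 3: the source as written. */
int pattern(bool a, int x) {
    if (a) {
        return x + 1;             /* "stuff" */
    } else {
        const int *null_ptr = NULL;
        return *null_ptr;         /* undefined behavior if ever reached */
    }
}

/* Steps 4 and 5, done by hand: the else-clause is assumed unreachable,
 * so it is deleted (a), the condition is no longer tested (b), and the
 * rest of the function is compiled as if a were true (c). */
int pattern_as_optimized(bool a, int x) {
    (void)a;                      /* the condition is no longer consulted */
    return x + 1;
}
```

For every call with `a` true the two functions agree. They can differ only for `a` false, which is exactly the input for which the original has no defined meaning, and that difference is what the disagreement in this thread is about.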