Peter Sewell
2015-Jul-01 12:10 UTC
[LLVMdev] C as used/implemented in practice: analysis of responses
On 1 July 2015 at 10:17, Renato Golin <renato.golin at linaro.org> wrote:
> On 1 July 2015 at 03:53, Sean Silva <chisophugis at gmail.com> wrote:
>> Unfortunately in other cases it is very hard to communicate what the user
>> should assert/why they should assert it, as Chris talks about in his blog
>> posts. So it realistically becomes sort of black and white -- either don't
>> optimize based on UB or do. For what is probably just social reasons, the
>> desire to optimize wins out; from an economic standpoint (e.g. power saved)
>> it overall may be the right choice (I haven't run any ballpark figures
>> though and don't claim this to be true).
>
> This is *so* true. There's a natural progression of programmers as
> they age. Initially, people are averse to side effects and they hate
> "misbehaviours" from their compiler. As time passes and their
> experience grows, they start to like some of the side effects, and as
> maturity reaches them, they are already *relying* on them. C/C++
> undefined behaviour and Perl's utter disregard for clarity are some of
> the examples.
>
> Chandler said something at the last US LLVM meeting that stuck with
> me: "you guys expect hardware to behave in ways that hardware can't".
> Undefined behaviour and implementation-defined features in the C/C++
> standards are what they are, on purpose. If it wasn't for that, C/C++
> couldn't perform well on most hardware architectures of today.

I fear that this:

> Programmers *must* learn not to rely on their particular desires or
> compilers, to understand the language for what it is, and to exploit
> its perks while still being platform independent. It is possible, but
> *very* hard.

while attractive from the compiler-writer point of view, is just not
realistic, given the enormous body of C code out there which does
depend on some particular properties which are not guaranteed by the
ISO standard. That code is not necessarily all gospel, of course, far
from it - but its existence does have to be taken seriously.

Peter

>> Ideally together with the compiler there would be a static analyzer with the
>> invariant that it finds all situations where the compiler, while compiling,
>> optimizes based on UB. This static analyzer would report in an intelligible
>> way on all such situations. Unfortunately this is a really hard problem.
>
> And that's why you have static analyser tools! Lints, checks,
> sanitizers, warnings and error messages are all there to make you into
> a better programmer, so you can learn about the language, and how to
> use your compiler.
>
> Ultimately, compilers are tools. The sharper they get, the more
> carefully you need to handle them. They also have a safety trigger:
> it's called -O0.
>
> cheers,
> --renato
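A minimal sketch of the kind of UB-based optimisation under discussion
(the example is invented, but the folding described is typical of what
GCC and Clang do at -O2):

    /* Signed integer overflow is undefined behaviour in ISO C, so a
     * compiler may assume that i + 1 > i always holds and fold the
     * body of this function to "return 1;" -- even though on
     * two's-complement hardware i + 1 wraps to INT_MIN when
     * i == INT_MAX, which would make the comparison false. */
    int increment_is_bigger(int i) {
        return i + 1 > i;
    }

At -O0 the comparison is compiled as written, which is the sense in
which -O0 acts as the safety trigger mentioned above.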
Renato Golin
2015-Jul-01 12:29 UTC
[LLVMdev] C as used/implemented in practice: analysis of responses
On 1 July 2015 at 13:10, Peter Sewell <Peter.Sewell at cl.cam.ac.uk> wrote:
> while attractive from the compiler-writer point of view, is just not
> realistic, given the enormous body of C code out there which does
> depend on some particular properties which are not guaranteed by the
> ISO standard.

There is also an enormous body of code that is just wrong. Do we have
to worry about getting that right, too? Trying to "understand" the
authors' intentions and do that instead of what they asked?

Where do we draw the line? What do we consider "a reasonable
deviation" from just "plain wrong"?

There is a large portion of non-standard documented behaviours in all
compilers, and GCC and Clang are particularly important here. Most
builtin functions, attributes, and extensions are supported by both
compilers in a similar way, and people can somewhat rely on them. But
the only true reliable sources are the standards.
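Constructs like the following illustrate that shared surface (a
minimal sketch; the "die" helper is invented for illustration):

    /* Neither __attribute__((noreturn)) nor __builtin_expect() is in
     * ISO C, but GCC and Clang both document them and implement
     * compatible semantics, so code can somewhat rely on them across
     * the two compilers. */
    __attribute__((noreturn)) void die(const char *msg);

    void check_fd(int fd) {
        if (__builtin_expect(fd < 0, 0))   /* hint: branch is unlikely */
            die("bad file descriptor");
    }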
However, the very definition of undefined behaviour is "here be
dragons", and that's something that was purposely done to aid
compilers in optimising code. You may try to unite the open source
compilers in many ways (as I tried last year), but trying to regulate
undefined behaviour is not one of them.

> That code is not necessarily all gospel, of course, far from it - but
> its existence does have to be taken seriously.

And we do! Though in a completely different direction than you would
expect. :)

You advocate for better consistent support, which is ok and I, for
one, have gone down that path multiple times. But in this specific
case, the way we take it seriously is by warning the users of the
potential peril AND abusing it for performance reasons. This is a
sweet spot because novice users will learn the language and advanced
users will like the performance.

cheers,
--renato

Yury Gribov
2015-Jul-01 12:38 UTC
[LLVMdev] C as used/implemented in practice: analysis of responses
On 07/01/2015 03:10 PM, Peter Sewell wrote:
> On 1 July 2015 at 10:17, Renato Golin <renato.golin at linaro.org> wrote:
>> On 1 July 2015 at 03:53, Sean Silva <chisophugis at gmail.com> wrote:
>>> Unfortunately in other cases it is very hard to communicate what the user
>>> should assert/why they should assert it, as Chris talks about in his blog
>>> posts. So it realistically becomes sort of black and white -- either don't
>>> optimize based on UB or do. For what is probably just social reasons, the
>>> desire to optimize wins out; from an economic standpoint (e.g. power saved)
>>> it overall may be the right choice (I haven't run any ballpark figures
>>> though and don't claim this to be true).
>>
>> This is *so* true. There's a natural progression of programmers as
>> they age. Initially, people are averse to side effects and they hate
>> "misbehaviours" from their compiler. As time passes and their
>> experience grows, they start to like some of the side effects, and as
>> maturity reaches them, they are already *relying* on them. C/C++
>> undefined behaviour and Perl's utter disregard for clarity are some of
>> the examples.
>>
>> Chandler said something at the last US LLVM meeting that stuck with
>> me: "you guys expect hardware to behave in ways that hardware can't".
>> Undefined behaviour and implementation-defined features in the C/C++
>> standards are what they are, on purpose. If it wasn't for that, C/C++
>> couldn't perform well on most hardware architectures of today.
>
> I fear that this:
>
>> Programmers *must* learn not to rely on their particular desires or
>> compilers, to understand the language for what it is, and to exploit
>> its perks while still being platform independent. It is possible, but
>> *very* hard.
>
> while attractive from the compiler-writer point of view, is just not
> realistic, given the enormous body of C code out there which does
> depend on some particular properties which are not guaranteed by the
> ISO standard.
> That code is not necessarily all gospel, of course, far from it - but
> its existence does have to be taken seriously.

Sounds like the endless compiler writers vs. maintainers dispute.

-Y
Peter Sewell
2015-Jul-01 15:06 UTC
[LLVMdev] C as used/implemented in practice: analysis of responses
On 1 July 2015 at 13:29, Renato Golin <renato.golin at linaro.org> wrote:
> On 1 July 2015 at 13:10, Peter Sewell <Peter.Sewell at cl.cam.ac.uk> wrote:
>> while attractive from the compiler-writer point of view, is just not
>> realistic, given the enormous body of C code out there which does
>> depend on some particular properties which are not guaranteed by the
>> ISO standard.
>
> There is also an enormous body of code that is just wrong. Do we have
> to worry about getting that right, too? Trying to "understand" the
> authors' intentions and do that instead of what they asked?
>
> Where do we draw the line? What do we consider "a reasonable
> deviation" from just "plain wrong"?

It varies from case to case, and one has to be pragmatic. But from
what we see, in our survey results and in Table 1 of
http://www.cl.cam.ac.uk/~dc552/papers/asplos15-memory-safe-c.pdf,
there are a number of non-ISO idioms that really are used pervasively,
and for good reasons, in systems code.

- Some are actually supported by mainstream compilers but not
  documented as such, e.g. where the ISO standard forbade things for
  now-obsolete h/w reasons. For those, we can identify a
  stronger-than-ISO mainstream semantics. For example, our Q12, making
  a null pointer by casting from an expression that isn't a constant
  but that evaluates to 0, might be in this category (sketched below).

- Some are used more rarely but in important use-cases, and we could
  have options to turn off the relevant optimisations, or perhaps
  additional annotations in the source types, that guarantee they
  work. For example, our Q3, "Can one use pointer arithmetic between
  separately allocated C objects?", may be like this (also sketched
  below).

- For some, OS developers are already routinely turning off
  optimisations for the sake of more predictable semantics, e.g. with
  -fno-strict-aliasing (the type-punning at issue is sketched below
  as well).

- For a few (e.g. our Q1 and Q2, and maybe also Q9 and Q10), there are
  real conflicts, and it's not clear how to reconcile the compiler and
  systems-programmer views; there we're trying to understand what's
  possible. That might involve restricting some optimisations (and one
  should try to understand the cost thereof), or additional options,
  or documenting what compilers already do more clearly.
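To make the first three of those concrete (minimal hand-written
sketches, invented for illustration rather than taken from the survey
or from real systems code):

    #include <stddef.h>
    #include <stdlib.h>

    void idiom_sketches(void) {
        /* Q12: forming a null pointer by casting an integer
         * expression that is not an integer constant expression but
         * that evaluates to 0.  ISO C only defines the
         * constant-expression case; the conversion below is
         * implementation-defined (C11 6.3.2.3), though mainstream
         * compilers do yield a null pointer. */
        int zero = 0;
        char *p = (char *)zero;

        /* Q3: pointer arithmetic between separately allocated
         * objects.  ISO C (6.5.6) leaves the subtraction undefined
         * unless both pointers point into (or one past) the same
         * object, yet allocators, garbage collectors and the like
         * compute such offsets. */
        char *a = malloc(16);
        char *b = malloc(16);
        ptrdiff_t off = (a && b) ? b - a : 0;

        /* The -fno-strict-aliasing case: type-punning that violates
         * the effective-type rules (C11 6.5), which OS code often
         * keeps predictable by disabling the optimisation. */
        float fl = 1.0f;
        unsigned bits = *(unsigned *)&fl;

        (void)p; (void)off; (void)bits;
        free(a); free(b);
    }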
> There is a large portion of non-standard documented behaviours in all
> compilers, and GCC and Clang are particularly important here. Most
> builtin functions, attributes, and extensions are supported by both
> compilers in a similar way, and people can somewhat rely on them.
> But the only true reliable sources are the standards.

Sadly the ISO standards are neither completely unambiguous nor a good
guide to what can be or is assumed about implementations. (I say this
having contributed to the C/C++11 standards.)

> However, the very definition of undefined behaviour is "here be
> dragons", and that's something that was purposely done to aid
> compilers in optimising code. You may try to unite the open source
> compilers in many ways (as I tried last year), but trying to regulate
> undefined behaviour is not one of them.
>
>> That code is not necessarily all gospel, of course, far from it - but
>> its existence does have to be taken seriously.
>
> And we do! Though in a completely different direction than you would
> expect. :)
>
> You advocate for better consistent support, which is ok and I, for
> one, have gone down that path multiple times. But in this specific
> case, the way we take it seriously is by warning the users of the
> potential peril AND abusing it for performance reasons. This is a
> sweet spot because novice users will learn the language and advanced
> users will like the performance.

What we see in discussions with at least some communities of advanced
users is not completely consistent with that, I'm afraid...

thanks,
Peter

> cheers,
> --renato