Peter Sewell
2015-Jul-01 15:06 UTC
[LLVMdev] C as used/implemented in practice: analysis of responses
On 1 July 2015 at 13:29, Renato Golin <renato.golin at linaro.org> wrote:
> On 1 July 2015 at 13:10, Peter Sewell <Peter.Sewell at cl.cam.ac.uk> wrote:
>> while attractive from the compiler-writer point of view, is just not
>> realistic, given the enormous body of C code out there which does
>> depend on some particular properties which are not guaranteed by the
>> ISO standard.
>
> There is also an enormous body of code that is just wrong. Do we have
> to worry about getting that right, too? Trying to "understand" the
> authors' intentions and do that instead of what they asked?
>
> Where do we draw the line? What do we consider "a reasonable
> deviation" from just "plain wrong"?

It varies from case to case, and one has to be pragmatic. But from
what we see, in our survey results and in Table 1 of
http://www.cl.cam.ac.uk/~dc552/papers/asplos15-memory-safe-c.pdf,
there are a number of non-ISO idioms that really are used pervasively
and for good reasons in systems code.

- some are actually supported by mainstream compilers but not
  documented as such, e.g. where the ISO standard forbade things for
  now-obsolete h/w reasons. For those, we can identify a
  stronger-than-ISO mainstream semantics. For example, our Q12, making
  a null pointer by casting from an expression that isn't a constant
  but that evaluates to 0, might be in this category.

- some are used more rarely but in important use-cases, and we could
  have options to turn off the relevant optimisations, or perhaps
  additional annotations in the source types, that guarantee they
  work. For example, our Q3 "Can one use pointer arithmetic between
  separately allocated C objects" may be like this.

- for some, OS developers are already routinely turning off
  optimisations for the sake of more predictable semantics, e.g. with
  -fno-strict-aliasing.

- for a few (e.g. our Q1 and Q2, and maybe also Q9 and Q10), there are
  real conflicts, and it's not clear how to reconcile the compiler and
  systems-programmer views; there we're trying to understand what's
  possible. That might involve restricting some optimisations (and one
  should try to understand the cost thereof), or additional options,
  or documenting what compilers already do more clearly.

> There is a large portion of non-standard documented behaviours in all
> compilers, and GCC and Clang are particularly important here. Most
> builtin functions, attributes, and extensions are supported by both
> compilers in a similar way, and people can somewhat rely on it.
> But the only true reliable sources are the standards.

Sadly the ISO standards are neither completely unambiguous nor a good
guide to what can be or is assumed about implementations. (I say this
having contributed to the C/C++11 standards.)

> However, the very definition of undefined behaviour is "here be
> dragons", and that's something that was purposely done to aid
> compilers at optimising code. You may try to unite the open source
> compilers in many ways (as I tried last year), but trying to regulate
> undefined behaviour is not one of them.
>
>> That code is not necessarily all gospel, of course, far from it - but
>> its existence does have to be taken seriously.
>
> And we do! Though, in a completely different direction than you would
> expect. :)
>
> You advocate for better consistent support, which is ok and I, for
> one, have gone down that path multiple times. But in this specific
> case, the way we take it seriously is by warning the users of the
> potential peril AND abuse of it for performance reasons. This is a
> sweet spot because novice users will learn the language and advanced
> users will like the performance.

What we see in discussions with at least some communities of advanced
users is not completely consistent with that, I'm afraid...

thanks,
Peter

> cheers,
> --renato
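
For readers who want something concrete, the Q12 idiom mentioned in the
message above can be sketched as follows. This is only an illustration:
the function names are hypothetical and are not taken from the survey.
ISO C guarantees a null pointer only when an integer constant expression
with value 0 is converted; for a run-time integer the result of the
conversion is implementation-defined, although mainstream compilers are
expected to treat the two cases alike.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts a run-time integer to a pointer.
   The result is implementation-defined in ISO C. */
static void *ptr_from_runtime_int(uintptr_t v)
{
    return (void *)v;
}

/* Returns 1 if a pointer built from a non-constant zero compares
   equal to NULL, which is the stronger-than-ISO behaviour that
   systems code tends to rely on. */
static int is_null_from_runtime_zero(void)
{
    /* volatile keeps the zero from being a compile-time constant */
    volatile uintptr_t zero = 0;
    void *p = ptr_from_runtime_int(zero);
    return p == NULL;
}
```

On an implementation with the mainstream semantics,
is_null_from_runtime_zero() should return 1; the ISO text alone does not
promise that.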
Joerg Sonnenberger
2015-Jul-02 16:07 UTC
[LLVMdev] C as used/implemented in practice: analysis of responses
On Wed, Jul 01, 2015 at 04:06:45PM +0100, Peter Sewell wrote:
> - for some, OS developers are already routinely turning off
> optimisations for the sake of more predictable semantics, e.g. with
> -fno-strict-aliasing.

This one is interesting, because the biggest problem with strict
aliasing is that there is no standard-compliant way to override it.
The most basic issue is how the allocator is supposed to work
internally. If you can fully inline malloc/free pairs, it is practically
impossible to avoid aliasing conflicts.

Other important use cases are things like vectorizing access, which
often means checking for the alignment of the data and casting to a more
appropriate type. Not everyone wants to implement strlen in assembler,
but writing a standard-compliant and still fast implementation in C
seems impossible.

Joerg
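
To make the vectorized-access use case above concrete, here is a minimal
sketch of word-at-a-time scanning for a NUL byte. It is written under
two assumptions that are not in the message: the buffer length n is
known, and each word is loaded with a fixed-size memcpy instead of a
cast to a wider pointer type. The names has_zero_byte and bounded_strlen
are hypothetical. It does not cover the unbounded strlen case, where a
fast implementation typically reads whole aligned words that may extend
past the terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Well-known bit trick: returns 1 iff at least one byte of w is zero. */
static int has_zero_byte(uint64_t w)
{
    return ((w - UINT64_C(0x0101010101010101)) & ~w &
            UINT64_C(0x8080808080808080)) != 0;
}

/* Hypothetical bounded variant of strlen: index of the first NUL in
   s[0..n), or n if there is none. Never reads outside s[0..n). */
static size_t bounded_strlen(const char *s, size_t n)
{
    size_t i = 0;

    /* Scan eight bytes at a time while a full word still fits. */
    while (n - i >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);  /* load without a pointer cast */
        if (has_zero_byte(w))
            break;                    /* NUL is somewhere in this word */
        i += sizeof w;
    }

    /* Finish byte by byte: locates the NUL or reaches the bound. */
    while (i < n && s[i] != '\0')
        i++;
    return i;
}
```

For example, bounded_strlen(buf, sizeof buf) on a NUL-terminated array
gives the same result as strlen(buf), while never touching memory beyond
the array.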
Sean Silva
2015-Jul-02 21:19 UTC
[LLVMdev] C as used/implemented in practice: analysis of responses
On Thu, Jul 2, 2015 at 9:07 AM, Joerg Sonnenberger <joerg at britannica.bec.de> wrote:
> On Wed, Jul 01, 2015 at 04:06:45PM +0100, Peter Sewell wrote:
> > - for some, OS developers are already routinely turning off
> > optimisations for the sake of more predictable semantics, e.g. with
> > -fno-strict-aliasing.
>
> This one is interesting, because the biggest problem with strict
> aliasing is that there is no standard compliant way to override it.
> The most basic issue is how the allocator is supposed to work
> internally. If you can fully inline malloc/free pairs, it is practically
> impossible to avoid aliasing conflicts.

I thought strict aliasing was more about types? I think memcpy is the
standard escape hatch there. Generally speaking, a fixed-size memcpy into
a local variable is optimized down into the load/store that you want.
This is e.g. how Support/Endian.h works. Yes, this is very awkward, and
there might be some namespacing issues when writing memcpy itself
though :)

-- Sean Silva

> Other important use cases are things like vectorizing access, which
> often means checking for the alignment of the data and casting to a more
> appropriate type. Not everyone wants to implement strlen in assembler,
> but writing a standard compliant and still fast implementation in C
> seems impossible.
>
> Joerg
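
A minimal sketch of the memcpy escape hatch described above, for
illustration only. The function names load_u32 and float_bits are
hypothetical and are not the interface of Support/Endian.h; the second
function also assumes that float is a 32-bit type. Whether the copy
really collapses into a single load depends on the compiler and the
optimisation level.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value in host byte order from a possibly unaligned
   address, without dereferencing a uint32_t pointer into the buffer. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);  /* fixed-size copy into a local variable */
    return v;
}

/* Reinterpret the object representation of a float as an integer.
   Assumption: sizeof(float) == sizeof(uint32_t). */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Typical use is reading a field out of a byte buffer, e.g.
load_u32(buf + 1), where casting buf + 1 to uint32_t * and dereferencing
it would run into both the alignment and the aliasing rules.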