On Tue, Jun 25, 2019 at 2:53 AM Peter Smith <peter.smith at linaro.org> wrote:
> On Mon, 24 Jun 2019 at 23:23, Siva Chandra via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > Hello LLVM Developers,
> >
> > Within Google, we have a growing range of needs that existing libc
> > implementations don't quite address. This is pushing us to start
> > working on a new libc implementation.
>
> Are you able to share what some of these needs are? My reason for
> asking is to see if there is a particular niche where existing libc
> designs are not working, or if there is an approach that will handle
> many use cases better than existing libc implementations.

There have been a lot of questions about our reasons for opting to build a new libc, and about why an existing libc implementation does not meet our needs. I will try to address these questions in a general fashion in this email and answer individual concerns separately. Before I start, I also want to apologize if I am late to answer or appear to be ignoring some of the emails. I am not trying to ignore or avoid anyone or any question; I just need time to process your questions and compose meaningful answers.

So, we have a number of reasons for a new libc, and for preferring that it be part of the LLVM project:

1. Static linking without the complexity of dynamic linking - Most libc implementations end up being complicated because they support dynamic loading/linking. That is not bad in itself, but we want to be able to take out the dynamic linking capability where possible and get the benefits of a much simpler system. We believe that building everything "as a library" will facilitate this.

2. As somebody else has pointed out on the list, we want a libc with as much fine-grained modularity as possible. This not only lets one pick and choose the pieces one wants, but also makes it easy to adapt to different build systems. Moreover, such a modular system will facilitate deploying chunks of functionality during the transition from another libc to this new libc.

3. Sanitizer-supported testing and fuzz testing from the start - Doing this from the start will affect a few design choices non-trivially. For example, sanitizers require that a target be rebuilt with sanitizer-specific options. We want to develop the new libc in such a fashion that it works with these specialized options as well.

4. ABI-independent implementation as far as possible - There will be places where an ABI-independent implementation is not possible. However, wherever possible, we want to use normal source code so that compiler-based changes to the ABI are easy. Our reasons for ABI-independent implementations fall into two categories:

a) Long-term changes to the ABI for security, like SCADS, and for performance tuning, like caller/callee register ratios that better match software and hardware.

b) Rapid deployment of specific ABI changes as part of security mitigation strategies such as those for Spectre. For example, speculative load hardening would have benefited greatly from being able to change the calling convention.

5. Avoid assembly language as far as possible - Again, there will be places where one cannot avoid assembly-level implementations, but wherever possible we want to avoid them. There are a few reasons here as well:

a) We want to leverage the compiler for performance wherever possible and, as part of the LLVM project, fix compiler bugs rather than resort to assembly.

b) Enable sanitizers and coverage-based fuzzing to work well across the implementation of libc.

c) Allow deploying compiler-based security mitigations such as those we needed for Spectre.

6. Having the support of the LLVM community, project, and infrastructure - From access to the broad platform expertise in the community to the strong license and project structure, we think the project will be significantly more successful as part of LLVM than elsewhere.

All this does not mean we want to implement everything from scratch. If someone has implementations for parts of the libc ready and would like to contribute them to this project under the LLVM license, we will certainly welcome that.
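[Editor's note: to make points 3 and 5 concrete, here is a minimal sketch, not taken from any actual libc, of what "normal source code instead of assembly" buys you. The function name is invented. Because the routine is plain C, the compiler is free to vectorize the loop, and sanitizers can instrument every access.]

```c
#include <stddef.h>

/* Hypothetical sketch: a byte-wise memcpy written in plain C. Nothing
 * here is target-specific; at -O2 the compiler can vectorize the loop,
 * and ASan/MSan can instrument each load and store -- exactly the
 * properties points 3 and 5 above are after. */
void *llvm_libc_memcpy(void *restrict dst, const void *restrict src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];   /* simple loop the auto-vectorizer understands */
    return dst;
}
```

The bet described above is that when this compiles to something slower than hand-written assembly, the fix goes into the compiler rather than into an .S file.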
On Jun 26, 2019, at 11:20 AM, Siva Chandra via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> a) We want to leverage the compiler for performance wherever possible,
> and as part of the LLVM project, fix compiler bugs rather than use
> assembly.

I love this approach as a way of driving low-level performance forward! How do you anticipate this working in practice?

For example, if someone says “I can shave 1 cycle out of this important thing if I write it in asm” and you know that a suitably capable compiler engineer could achieve the same thing given enough time, how do you plan to push back?

-Chris
On Wed, Jun 26, 2019 at 11:52 PM Chris Lattner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Jun 26, 2019, at 11:20 AM, Siva Chandra via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > a) We want to leverage the compiler for performance wherever possible,
> > and as part of the LLVM project, fix compiler bugs rather than use
> > assembly.
>
> I love this approach as a way of driving low level performance forward!
> How do you anticipate this working in practice?
>
> For example, if someone says “I can shave 1 cycle out of this important
> thing if I write it in asm” and you know that a suitably capable
> compiler engineer can achieve the same thing given enough time, how do
> you plan to push back?

I think it's becoming uncommon to find cases like that today; the person who thinks they have a magic assembly hack finds that it works well for one microbenchmark on one architecture variant, but disappoints when used in real code. In fact, glibc has been throwing out a bunch of assembly code in recent years, as testing shows much of it not to have any noticeable advantage.

If the customized calling convention scheme works out, it's going to be a huge incentive to fix the compiler in case of performance lossage; it will be quite difficult to write assembly that is equally performant for all possible calling conventions, and if you try to assume a convention, that assumption propagates up through the program, possibly defeating more important optimizations.
> On Jun 26, 2019, at 2:20 PM, Siva Chandra via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> 5. Avoid assembly language as far as possible - Again, there will be
> places where one cannot avoid assembly level implementations. But,
> wherever possible, we want to avoid assembly level implementations.
> There are a few reasons here as well:
>
> a) We want to leverage the compiler for performance wherever possible,
> and as part of the LLVM project, fix compiler bugs rather than use
> assembly.

As a long-time libm and libc developer, and occasional compiler contributor, I will point out that this is either fundamentally in conflict with your other stated goals, entails a commitment to wide-ranging compiler improvements, or requires some very specific choices about your implementation.

Much of a libc can be implemented quite easily in C or C++. However:

- You say you want to conform to relevant standards; however, e.g. the UNIX test suite requires that math.h functions not set spurious flags. This is impossible to achieve reliably in C with clang, because clang and LLVM do not precisely model the floating-point environment. On Apple’s platforms, much of the math library is written in assembly as much for this reason as for performance. I see four basic options for you here:

1. You could partially work around this by adding builtins and an extensive conformance suite, making your implementations fragile to compiler optimization but detecting the breakages immediately.
2. You could do the work of precisely modeling the floating-point environment.
3. You could simply declare that you are not going to care about flags at all, which is fine for 99% of users, but is a clear break from relevant standards (and would make your libc unable to be adopted by some platform maintainers).
4. You could implement significant pieces of the math library in assembly.

None of these is a decision to be undertaken lightly. Have you thought about this issue at all?
I would also be curious what your plans are with regard to reproducible results in the math library: is it your intention to produce the same result on all platforms? On all microarchitectures? If so, and you’re developing for baseline x86_64 first, you’re locking yourself out of using many architectural features that are critical to delivering 30-50% of the performance of these functions on other platforms (and even on newer x86): static rounding control, FMA, etc. Even if you don’t care about that, implementation choices you make around x86_64 will severely restrict your performance on other platforms if exact reproducibility is a requirement and you don’t carefully choose a set of “required ISA operations” on which to implement your math functions.

- For most platforms, there are significant performance wins available for some of the core string and memory functions using assembly, even as compared to the best compiler auto-vectorization output. There are a few reasons for this, but one of the major ones is that, in assembly, on most architectures, we can safely do aligned memory accesses that are partially outside the buffer that has been passed in, and mask off or ignore the bytes that are invalid. This is a hugely significant optimization for edging around core vector loops, and it’s simply unavailable in C and C++ because of the abstract memory models they define. A compiler could do this for you automatically, but this is not yet implemented in LLVM (and you don’t want to be tightly coupled to LLVM anyway?). In practice, on many systems, the small-buffer case dominates usage for these functions, so getting the most efficient edging code is basically the only thing that matters.

1. Are you going to teach LLVM to perform these optimizations? If so, awesome, but this is not at all a small project: you’re not just fixing an isolated perf bug, you’re fundamentally reworking autovectorization. What about other compilers?
2. Are you going to simply write off performance in these cases and let the autovectorizer do what it does?
3. Will you use assembly instead purely for optimization purposes?

A bunch of other questions will probably come to me around the math library, but I would encourage you to think very carefully about what specifications you want to have for a libm before you start building one.

All that said, I think having more libc implementations is great, but I would be very careful to define what design tradeoffs you’re making around these choices, and to what spec(s) you plan to conform, and why they necessitate a new libc rather than adapting an existing one.

– Steve
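[Editor's note: for readers unfamiliar with the technique being described, here is a hedged sketch of the related word-at-a-time trick as glibc writes it in C, simplified. In the real deployment, the aligned 8-byte load may extend past the string's terminator; an aligned load cannot cross a page boundary, so it cannot fault at the machine level, but formally it reads outside the object, which is exactly the "unavailable in C and C++" point above. Call this sketch only on buffers padded to a multiple of 8 bytes so the reads stay in bounds.]

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Word-at-a-time strlen sketch (glibc-style, simplified). Requires the
 * buffer to be padded to a multiple of 8 bytes; see the note above about
 * why the unpadded version lies outside the C abstract memory model. */
static size_t wordwise_strlen(const char *s) {
    const char *p = s;
    while (((uintptr_t)p & 7) != 0) {   /* byte loop up to 8-byte alignment */
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }
    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);        /* one aligned 8-byte load */
        /* Classic bit trick: nonzero iff some byte of w is zero. */
        if (((w - ones) & ~w & highs) != 0) {
            for (int i = 0; i < 8; i++) /* locate the zero byte */
                if (p[i] == '\0')
                    return (size_t)(p + i - s);
        }
        p += 8;
    }
}
```

The "edging" Steve describes generalizes this: a vector loop plus masked aligned loads at the ends, which assembly can express directly.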
On Fri, Jun 28, 2019 at 11:58 AM Stephen Canon via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
>
> 1. You could partially work around this by adding builtins and an extensive conformance suite, making your implementations fragile to compiler optimization but detecting the breakages immediately.
> 2. You could do the work of precisely modeling the floating-point environment.
> 3. You could simply declare that you are not going to care about flags at all, which is fine for 99% of users, but is a clear break from relevant standards (and would make your libc unable to be adopted by some platform maintainers).
> 4. You could implement significant pieces of the math library in assembly.

I'm no math expert, but I tangle with clang vs. glibc's math code regularly, and I have discussed all this with Siva. It's too early to say exactly what the implementation will look like, but I anticipate it will be a combination of 1) and 2). There's really no alternative to having a mode that does accurate flag handling, but if the compiler has both the library sources and the call sites in hand, it should be able to determine whether it needs to include, say, underflow handling, and compile in only those parts. We've handicapped ourselves somewhat by having shifted to a model where the library functions are black boxes because of dynamic linking, and I think we can do better than just introducing more and more ifuncs or whatever.

I also expect there will be more work to do in the compiler, both for builtins and for additional optimizations, and to me that is part of the rationale for putting the libc project under LLVM in general. There won't be any secrets - if GCC folks want to try their hand at compiling this libc, they're welcome to it - but there will be some opportunities to co-develop library code that takes advantage of new compiler abilities, and vice versa.

> - For most platforms, there are significant performance wins available
> for some of the core string and memory functions using assembly, even
> as compared to the best compiler auto-vectorization output. [...] In
> practice, on many systems, the small-buffer case dominates usage for
> these functions, so getting the most efficient edging code is basically
> the only thing that matters.

Google does have a little experience in this area, mem* being the libc functions that perennially show up at the top of fleetwide performance profiles. (Lots of protobufs to move, I guess. :-) ) I imagine there will be both assembly and high-level versions in libc, and it will be the compiler's challenge to meet or beat the assembly code.

> 1. Are you going to teach LLVM to perform these optimizations? If so,
> awesome, but this is not at all a small project: you’re not just fixing
> an isolated perf bug, you’re fundamentally reworking autovectorization.
> What about other compilers?
> 2. Are you going to simply write off performance in these cases and let
> the autovectorizer do what it does?
> 3. Will you use assembly instead purely for optimization purposes?
>
> A bunch of other questions will probably come to me around the math
> library, but I would encourage you to think very carefully about what
> specifications you want to have for a libm before you start building
> one. All that said, I think having more libc implementations is great,
> but I would be very careful to define what design tradeoffs you’re
> making around these choices, and to what spec(s) you plan to conform,
> and why they necessitate a new libc rather than adapting an existing one.
>
> – Steve