Guillaume Chatelet via llvm-dev
2019-Apr-26 11:47 UTC
[llvm-dev] [RFC][clang/llvm] Allow efficient implementation of libc's memory functions in C/C++
*TL;DR:* Defining memory functions in C / C++ results in a chicken-and-egg problem: Clang can mutate the code into semantically equivalent calls to libc. None of `-fno-builtin-memcpy`, `-ffreestanding` or `-nostdlib` provides a satisfactory answer to the problem.

*Goal*

Create libc's memory functions (`memcpy`, `memset`, `memcmp`, ...) in C++ to benefit from the compiler's knowledge and profile-guided optimizations.

*Current state*

LLVM is allowed to replace a piece of code that looks like a memcpy with an IR intrinsic that implements the same semantics, namely `call void @llvm.memcpy.p0i8.p0i8.i64` (e.g. https://godbolt.org/z/0y1Yqh). This is a problem when designing a libc memory function, as the compiler may choose to replace the implementation with a call to itself (e.g. https://godbolt.org/z/eg0p_E).

Using `-fno-builtin-memcpy` prevents the compiler from understanding that an expression has memory copy semantics, effectively removing `@llvm.memcpy` at the IR level: https://godbolt.org/z/lnCIIh. In this specific example, the vectorizer kicks in and the generated code is quite good. Unfortunately this is not always the case: https://godbolt.org/z/mHpAYe.

In addition, `-fno-builtin-memcpy` prevents the compiler from understanding that a piece of code has memory copy semantics, but it does not prevent the compiler from generating calls to libc's `memcpy`, for instance:

- Using `__builtin_memcpy`: https://godbolt.org/z/O0sjIl
- Passing big structs by value: https://godbolt.org/z/4BUDc0

In both cases, the generated `@llvm.memcpy` IR intrinsic is lowered into a libc `memcpy` call.

We would like to use `__builtin_memcpy` to communicate the semantics to the compiler but prevent it from generating calls to libc.

One could argue that this is the purpose of `-ffreestanding`, but the standard leaves a lot of freestanding requirements implementation-defined (see https://en.cppreference.com/w/cpp/freestanding).
In practice, making sure that `-ffreestanding` never calls libc memory functions will probably do more harm than good. People using `-ffreestanding` now expect the compiler to call these functions, and inlining bloat can be problematic for the embedded world (see comments in https://reviews.llvm.org/D60719).

*Proposals*

We envision two approaches: an *attribute to prevent the compiler from synthesizing calls* or a *set of builtins* to communicate the intent more precisely to the compiler.

1. A function/module attribute to disable synthesis of calls

1.1 A specific attribute to disable the synthesis of a single call

    __attribute__((disable_call_synthesis("memcpy")))

Question: Is it possible to specify the attribute several times on a function to disable many calls?

1.2 A specific attribute to disable synthesis of all libc calls

    __attribute__((disable_libc_call_synthesis))

With this one we are losing precision and we may inline too much. There is also the question of what is considered a libc function; LLVM mainly defines target library calls.

1.3 Stretch - a specific attribute to redirect a single synthesizable function

This one would help explore the impact of replacing a synthesized function call with another function, but it is not strictly required to solve the problem at hand.

    __attribute__((redirect_synthesized_calls("memcpy", "my_memcpy")))

2. A set of builtins in clang to communicate the intent clearly

    __builtin_memcpy_alwaysinline(...)
    __builtin_memmove_alwaysinline(...)
    __builtin_memset_alwaysinline(...)

To achieve this we may have to provide new IR builtins (e.g. `@llvm.alwaysinline_memcpy`), which can be a lot of work.
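To make the problem and proposal 1.1 concrete, here is a minimal sketch of a byte-wise copy routine of the kind an optimizer may recognize as a memory copy. The name `my_memcpy` and the macro `DISABLE_CALL_SYNTHESIS` are illustrative assumptions. The attribute spelling is the hypothetical one from the RFC, so the macro expands to nothing and the file compiles with any C99 compiler.

```c
#include <stddef.h>

/* Hypothetical spelling taken from the RFC text. It is only a proposal,
 * so this macro expands to nothing and merely marks where the attribute
 * would be written. */
#define DISABLE_CALL_SYNTHESIS(name) /* __attribute__((disable_call_synthesis(name))) */

/* Byte-by-byte copy. At higher optimization levels a compiler may
 * recognize this loop as a memory copy and emit a call to memcpy.
 * If this function were itself libc's memcpy, that call would recurse. */
DISABLE_CALL_SYNTHESIS("memcpy")
void *my_memcpy(void *restrict dst, const void *restrict src, size_t count) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < count; ++i) {
        d[i] = s[i];
    }
    return dst;
}
```

Under the proposal, the attribute would let the compiler keep optimizing the loop (vectorizing it, for example) while forbidding it from emitting a `memcpy` call inside this one function.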
David Chisnall via llvm-dev
2019-Apr-29 08:48 UTC
[llvm-dev] [RFC][clang/llvm] Allow efficient implementation of libc's memory functions in C/C++
On 26/04/2019 12:47, Guillaume Chatelet via llvm-dev wrote:
> 1.2 A specific attribute to disable synthesis of all libc calls
> __attribute__((disable_libc_call_synthesis))
> With this one we are losing precision and we may inline too much. There
> is also the question of what is considered a libc function, LLVM mainly
> defines target library calls.

Target library is probably more relevant than libc. We have a number of issues with libm on tier 2 platforms for FreeBSD without assembly fast paths. This requires work-arounds for the fact that clang likes to say 'oh, this function seems to be calling X on the result of Y, and I know that this can be more efficient if you replace that sequence with Z', ignoring the fact that this case is an implementation of Z.

The same thing is true in Objective-C runtime implementations, where we need to be careful to avoid LLVM performing optimisations on the ARC functions that result in infinite recursion.

There are numerous cases of compiler-rt suffering from the same issue.

TL;DR: This is a really important problem for clang and your proposed solution 1 looks like it is far more broadly applicable.

David
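As an illustration of the "X on the result of Y becomes Z" pattern, consider a zeroing allocator written as `memset` on the result of `malloc`. This example is not taken from the thread; it assumes an optimizer that folds that pair into a `calloc` call, which some compiler versions are known to do. The name `my_calloc` is a placeholder. If the function were the real `calloc`, such a fold would make it call itself.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A zeroing allocator built from malloc (Y) followed by memset (X).
 * An optimizer that knows calloc (Z) may rewrite the pair into a single
 * calloc call, which is self-recursive if this function is calloc. */
void *my_calloc(size_t count, size_t size) {
    /* Reject requests whose total byte count would overflow size_t. */
    if (size != 0 && count > SIZE_MAX / size) {
        return NULL;
    }
    size_t total = count * size;
    void *p = malloc(total);
    if (p != NULL) {
        memset(p, 0, total);
    }
    return p;
}
```

A per-function attribute listing `calloc` would, under proposal 1, express exactly the constraint needed here without disabling other optimizations in the file.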
Guillaume Chatelet via llvm-dev
2019-Apr-30 14:01 UTC
[llvm-dev] [RFC][clang/llvm] Allow efficient implementation of libc's memory functions in C/C++
Thx for the feedback David. So we're heading toward a broader

    __attribute__((disable_call_synthesis))

David, what do you think about the additional version that restricts the effect to a few named functions? e.g.

    __attribute__((disable_call_synthesis("memset", "memcpy", "sqrt")))

A warning should be issued if the arguments are not part of RuntimeLibcalls.def.

Also, I'd like to get your take on whether it makes sense to have this attribute apply to functions only or at module level as well.

Thx,
Guillaume
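The suggested warning amounts to a membership check of each attribute argument against the set of known runtime library calls. A rough sketch follows. The table is a small illustrative subset chosen for this example, not the contents of RuntimeLibcalls.def, and the function names are assumptions rather than anything in clang.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative subset only. A real implementation would derive this
 * list from RuntimeLibcalls.def instead of hard-coding it. */
static const char *const kKnownLibcalls[] = {"memcpy", "memmove", "memset", "sqrt"};

/* Returns true if `name` appears in the table above. */
bool is_known_libcall(const char *name) {
    size_t count = sizeof(kKnownLibcalls) / sizeof(kKnownLibcalls[0]);
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(kKnownLibcalls[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Prints one warning per unrecognized name and returns how many there were. */
size_t count_unknown_libcalls(const char *const *names, size_t n) {
    size_t unknown = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!is_known_libcall(names[i])) {
            fprintf(stderr, "warning: '%s' is not a known runtime library call\n", names[i]);
            ++unknown;
        }
    }
    return unknown;
}
```

Warning rather than rejecting keeps the attribute usable across compiler versions whose libcall tables differ.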