thr3ads.net - llvm dev - [llvm-dev] [RFC][clang/llvm] Allow efficient implementation of libc's memory functions in C/C++ [Apr 2019]

Guillaume Chatelet via llvm-dev

2019-Apr-26 11:47 UTC

[llvm-dev] [RFC][clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

*TL;DR:*
Defining memory functions in C / C++ results in a chicken-and-egg problem:
Clang can rewrite the code into semantically equivalent calls to libc. None
of `-fno-builtin-memcpy`, `-ffreestanding`, or `-nostdlib` provides a
satisfactory answer to the problem.

*Goal*
Create libc's memory functions (e.g. `memcpy`, `memset`, `memcmp`, ...) in
C++ to benefit from the compiler's knowledge and profile-guided optimizations.

*Current state*
LLVM is allowed to replace a piece of code that looks like a memcpy with an
IR intrinsic that implements the same semantics, namely `call void
@llvm.memcpy.p0i8.p0i8.i64` (e.g. https://godbolt.org/z/0y1Yqh).

This is a problem when implementing a libc memory function, as the compiler
may replace the implementation with a call to the very function being
defined (e.g. https://godbolt.org/z/eg0p_E).
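
To make the self-call risk concrete, here is a minimal sketch of the kind of byte loop an optimizer may recognize as a memory copy. The name `naive_memcpy` is hypothetical and chosen so the example is safe to run; the problem described above arises when the same body is exported as `memcpy` itself. Whether the loop is actually rewritten depends on the compiler version and flags.

```cpp
#include <cstddef>

// Hypothetical name for illustration. A real libc would have to export this
// body as `memcpy`, which is where the chicken-and-egg problem appears.
// `__restrict` is a common compiler extension, not standard C++.
void *naive_memcpy(void *__restrict dst, const void *__restrict src,
                   std::size_t n) {
  unsigned char *d = static_cast<unsigned char *>(dst);
  const unsigned char *s = static_cast<const unsigned char *>(src);
  // An optimizer may recognize this loop as a memory copy and replace it
  // with @llvm.memcpy, which can in turn be lowered to a call to memcpy.
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
  return dst;
}
```

If this function were named `memcpy`, that lowering would make the function call itself.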

Using `-fno-builtin-memcpy` prevents the compiler from understanding that
an expression has memory copy semantics, effectively removing `@llvm.memcpy`
at the IR level: https://godbolt.org/z/lnCIIh. In this specific example,
the vectorizer kicks in and the generated code is quite good. Unfortunately
this is not always the case: https://godbolt.org/z/mHpAYe.

In addition, `-fno-builtin-memcpy` prevents the compiler from recognizing
that a piece of code has memory copy semantics, but it does not prevent the
compiler from generating calls to libc's `memcpy`, for instance:
  - Using `__builtin_memcpy`: https://godbolt.org/z/O0sjIl
  - Passing big structs by value: https://godbolt.org/z/4BUDc0

In both cases, the generated `@llvm.memcpy` IR intrinsic is lowered into a
libc `memcpy` call.
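
The two cases can be sketched as follows. The struct `Big` and both function names are made up for illustration, and `__builtin_memcpy` is a GCC/Clang extension. Whether a call to `memcpy` is actually emitted depends on size, target, inlining and optimization level.

```cpp
#include <cstddef>

// Hypothetical large aggregate, big enough that a copy is unlikely to be
// expanded into a handful of loads and stores.
struct Big {
  char bytes[4096];
};

// Explicit form: the builtin states "this is a memory copy" directly. For a
// size unknown at compile time the compiler will typically emit a call to
// libc's memcpy.
void copy_with_builtin(void *dst, const void *src, std::size_t n) {
  __builtin_memcpy(dst, src, n);
}

// Implicit form: taking Big by value means each caller must materialize a
// copy of the whole struct, which the compiler may implement as a memcpy
// call even though no copy routine is named in the source.
char last_byte_by_value(Big b) { return b.bytes[sizeof(b.bytes) - 1]; }
```

Neither function mentions libc, yet both can end up depending on it.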

We would like to use `__builtin_memcpy` to communicate the semantics to the
compiler but prevent it from generating calls to the libc.

One could argue that this is the purpose of `-ffreestanding`, but the
standard leaves a lot of freestanding requirements implementation-defined
(see https://en.cppreference.com/w/cpp/freestanding).

In practice, making sure that `-ffreestanding` never calls libc memory
functions will probably do more harm than good. People using
`-ffreestanding` now expect the compiler to call these functions, and the
code bloat from inlining can be problematic for the embedded world (see
comments in https://reviews.llvm.org/D60719).

*Proposals*
We envision two approaches: an *attribute to prevent the compiler from
synthesizing calls* or a *set of builtins* to communicate the intent more
precisely to the compiler.

  1. A function/module attribute to disable synthesis of calls

    1.1 A specific attribute to disable the synthesis of a single call
        __attribute__((disable_call_synthesis("memcpy")))
    Question: Is it possible to specify the attribute several times on a
    function to disable many calls?

    1.2 A specific attribute to disable synthesis of all libc calls
        __attribute__((disable_libc_call_synthesis))
    With this one we are losing precision and we may inline too much. There is
    also the question of what is considered a libc function, LLVM mainly
    defines target library calls.

    1.3 Stretch - a specific attribute to redirect a single synthesizable
    function.
    This one would help explore the impact of replacing a synthesized function
    call with another function but is not strictly required to solve the
    problem at hand.
        __attribute__((redirect_synthesized_calls("memcpy", "my_memcpy")))

  2. A set of builtins in clang to communicate the intent clearly

        __builtin_memcpy_alwaysinline(...)
        __builtin_memmove_alwaysinline(...)
        __builtin_memset_alwaysinline(...)

To achieve this we may have to provide new IR builtins (e.g.
`@llvm.alwaysinline_memcpy`) which can be a lot of work.
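
As a rough sketch of how a libc might use either proposal: the attribute spelling `disable_call_synthesis` is the one proposed above and is treated as hypothetical, so the macro below expands to nothing on compilers that do not know it. For approach 2, the sketch assumes the Clang builtin `__builtin_memcpy_inline`, which exists in later Clang releases with a compile-time-constant size, and falls back to a plain loop elsewhere. The names `NO_MEMCPY_SYNTHESIS`, `copy_block` and `sketch_memcpy` are made up for illustration.

```cpp
#include <cstddef>

// Feature-test fallbacks for compilers that lack these macros.
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Approach 1 (hypothetical attribute from this RFC): a no-op unless the
// compiler reports support for it.
#if __has_attribute(disable_call_synthesis)
#define NO_MEMCPY_SYNTHESIS __attribute__((disable_call_synthesis("memcpy")))
#else
#define NO_MEMCPY_SYNTHESIS
#endif

// Approach 2: copy a block whose size is known at compile time without
// emitting a library call, using the builtin when it is available.
template <std::size_t N>
inline void copy_block(unsigned char *dst, const unsigned char *src) {
#if __has_builtin(__builtin_memcpy_inline)
  __builtin_memcpy_inline(dst, src, N);
#else
  for (std::size_t i = 0; i < N; ++i)
    dst[i] = src[i];
#endif
}

NO_MEMCPY_SYNTHESIS
void *sketch_memcpy(void *dst, const void *src, std::size_t n) {
  unsigned char *d = static_cast<unsigned char *>(dst);
  const unsigned char *s = static_cast<const unsigned char *>(src);
  // Copy 16-byte blocks first, then finish the tail byte by byte.
  while (n >= 16) {
    copy_block<16>(d, s);
    d += 16;
    s += 16;
    n -= 16;
  }
  while (n > 0) {
    *d++ = *s++;
    --n;
  }
  return dst;
}
```

Without one of the two mechanisms, nothing stops the compiler from turning the fallback loops back into a `memcpy` call, which is the gap the proposals aim to close.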

David Chisnall via llvm-dev

2019-Apr-29 08:48 UTC


On 26/04/2019 12:47, Guillaume Chatelet via llvm-dev wrote:
>      1.2 A specific attribute to disable synthesis of all libc calls
> __attribute__((disable_libc_call_synthesis))
> With this one we are losing precision and we may inline too much. There 
> is also the question of what is considered a libc function, LLVM mainly 
> defines target library calls.
Target library is probably more relevant than libc. We have a number of 
issues with libm on tier 2 platforms for FreeBSD without assembly fast 
paths.  This requires work-arounds for the fact that clang likes to say 
'oh, this function seems to be calling X on the result of Y, and I know 
that this can be more efficient if you replace that sequence with Z', 
ignoring the fact that this case is an implementation of Z.
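
A related, well-known instance of this pattern (not necessarily the one meant here) is a `sincos` written in terms of `sin` and `cos`. On some targets the compiler may fuse the two calls on the same argument back into a single `sincos` call. The name `sketch_sincos` is hypothetical and keeps the example safe to run; the recursion only arises if the function is the library's own `sincos`.

```cpp
#include <cmath>

// Hypothetical stand-in for a libm `sincos`. If this body were compiled
// under the real name, a compiler that fuses sin(x) and cos(x) into one
// sincos(x) call could make the function call itself.
void sketch_sincos(double x, double *s, double *c) {
  *s = std::sin(x);
  *c = std::cos(x);
}
```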

The same thing is true in Objective-C runtime implementations, where we 
need to be careful to avoid LLVM performing optimisations on the ARC 
functions that result in infinite recursion.

There are numerous cases of compiler-rt suffering from the same issue.

TL;DR: This is a really important problem for clang and your proposed 
solution 1 looks like it is far more broadly applicable.

David

Guillaume Chatelet via llvm-dev

2019-Apr-30 14:01 UTC


Thx for the feedback David.

So we're heading toward a broader
    __attribute__((disable_call_synthesis))
David, what do you think about the additional version that restricts the
effect to a few named functions?
    e.g. __attribute__((disable_call_synthesis("memset", "memcpy", "sqrt")))
A warning should be issued if the arguments are not part of
RuntimeLibcalls.def.
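
The check behind such a warning could look roughly like this. It is a sketch only: the table below is a small illustrative subset typed in by hand, not the real contents of RuntimeLibcalls.def, and the function names are hypothetical.

```cpp
#include <string>
#include <vector>

// Illustrative subset only; a real implementation would derive this list
// from RuntimeLibcalls.def instead of hard-coding it.
static const char *const kKnownLibcalls[] = {"memcpy", "memmove", "memset",
                                             "sqrt"};

bool is_known_libcall(const std::string &name) {
  for (const char *known : kKnownLibcalls)
    if (name == known)
      return true;
  return false;
}

// Returns the attribute arguments that would deserve a warning.
std::vector<std::string>
names_to_warn_about(const std::vector<std::string> &args) {
  std::vector<std::string> unknown;
  for (const std::string &arg : args)
    if (!is_known_libcall(arg))
      unknown.push_back(arg);
  return unknown;
}
```

With this subset, the arguments "memset", "memcpy", "sqrt" produce no warning, while a misspelled or user-defined name such as "my_memcpy" is reported.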

Also I'd like to get your take on whether it makes sense to have this
attribute apply to functions only or at module level as well.

Thx,
Guillaume

