thr3ads.net - llvm dev - [LLVMdev] [RFC] Linkage of user-supplied library functions in LTO [Mar 2014]


Duncan P. N. Exon Smith

2014-Mar-10 16:49 UTC

[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO

On Mar 8, 2014, at 3:43 PM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:
> I believe it doesn't matter if the symbols in sections are internal or
> external---that only matters for symbol resolution.
Given this...
> I've read the original thread (the 3 emails), and I'm still not
> sure what the purpose of internalization is in the context of user-provided
> library functions.
...I’m not sure there is a point.

The general idea is: unless the linker has told us to preserve a symbol,
internalize it, exposing it to other optimizations (like -globalopt).
However, for library functions, this breaks down because later passes
insert calls (e.g., -instcombine converts printf => puts, and
-codegenprepare converts llvm.memcpy => memcpy).  So, add them to
@llvm.compiler.used to protect them temporarily.
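To make this strategy concrete, here is a minimal C++ sketch over a hypothetical, simplified symbol model. The `Symbol` type, `internalizeModule`, and the returned list standing in for @llvm.compiler.used are illustrative assumptions, not LLVM's Internalize pass or its API. Every definition the linker did not ask to preserve is internalized, and any library functions among them are also recorded as "compiler used".

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of a global symbol in the merged LTO module.
enum class Linkage { External, Internal };

struct Symbol {
  std::string name;
  Linkage linkage = Linkage::External;
  bool isDefinition = true;
};

// Sketch only (not LLVM's Internalize pass): every defined symbol the linker
// did not ask to preserve becomes internal. Known library functions are also
// returned in a stand-in for @llvm.compiler.used, so that later passes that
// insert calls to them (e.g. printf => puts) still find the definition.
std::vector<std::string> internalizeModule(
    std::vector<Symbol>& symbols,
    const std::set<std::string>& mustPreserve,    // requested by the linker
    const std::set<std::string>& libraryFuncs) {  // names LLVM may call
  std::vector<std::string> compilerUsed;
  for (Symbol& s : symbols) {
    if (!s.isDefinition || mustPreserve.count(s.name))
      continue;  // declarations and preserved symbols stay external
    s.linkage = Linkage::Internal;
    if (libraryFuncs.count(s.name))
      compilerUsed.push_back(s.name);  // protected only "temporarily"
  }
  return compilerUsed;
}
```

Note that the library definitions are still internalized; only their temporary membership in the compiler-used list keeps them alive, which is exactly what the rest of the thread is about.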

If

  - the linker (e.g., /bin/ld) will delete unreferenced symbols (through
    -dead_strip, etc.) only if they have local linkage, or

  - LTO has a pass that will delete unreferenced symbols with local
    linkage *after* @llvm.compiler.used gets dropped (maybe we can add
    this),

then there’s a point.
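The second condition could look roughly like the following sketch, again over a hypothetical model (`Func` and `deleteDeadLocals` are made-up names, not an existing LLVM pass). It assumes @llvm.compiler.used has already been dropped, so no later pass will insert new library calls, and it repeatedly deletes local-linkage functions that no other function references.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each function records which symbols its body references.
struct Func {
  bool isLocal;                      // internal/private linkage
  std::vector<std::string> callees;  // symbols referenced from its body
};

// Sketch: run only after @llvm.compiler.used has been dropped. Deletes every
// local-linkage function that no *other* function references. External
// functions are kept because code outside the LTO module may call them.
void deleteDeadLocals(std::map<std::string, Func>& module) {
  bool changed = true;
  while (changed) {  // repeat: deleting one function can orphan others
    changed = false;
    std::set<std::string> referenced;
    for (const auto& entry : module)
      for (const std::string& callee : entry.second.callees)
        if (callee != entry.first)  // ignore direct self-recursion
          referenced.insert(callee);
    for (auto it = module.begin(); it != module.end();) {
      if (it->second.isLocal && !referenced.count(it->first)) {
        it = module.erase(it);
        changed = true;
      } else {
        ++it;
      }
    }
  }
}
```

Because it only counts references, a cycle of unreferenced local functions would survive; a fuller implementation would more likely compute reachability from the external symbols.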
> If the output of LTO is one giant object file, it could make some sense
> (since the assembler could potentially do the "symbol resolution").
The output of LTO *is* one giant object file, but the linker (e.g.,
/bin/ld) may be linking it with other object files.
> Otherwise the problem is in telling the "ld" which definition of
> "printf" it needs to pick up,
In the LTO API, the linker should call
lto_codegen_add_must_preserve_symbol() on symbols it expects to come out
the other side.  Basically, the user of LTO decides which version of
printf to pick up.  If there are any calls to printf from outside the
bitcode and the linker is using the one in the bitcode, then the one in
the bitcode won’t be internalized.
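From the linker's side, that decision might be sketched as follows. The `Definition` type and `symbolsToPreserve` are hypothetical; a real linker would pass each resulting name to lto_codegen_add_must_preserve_symbol() instead of collecting them, and always preserving the entry point is an assumption of this sketch.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical linker-side view of one symbol definition chosen during
// symbol resolution.
struct Definition {
  std::string name;
  bool fromBitcode;  // defined in the LTO bitcode (vs. a native object/archive)
};

// Sketch: a bitcode definition must survive LTO if a native object refers to
// it, or (assumption) if it is the program entry point. A real linker would
// call lto_codegen_add_must_preserve_symbol() for each such name; here we
// only collect the names.
std::vector<std::string> symbolsToPreserve(
    const std::vector<Definition>& chosen,
    const std::set<std::string>& nativeReferences,  // undefined refs in native objects
    const std::string& entryPoint) {
  std::vector<std::string> preserve;
  for (const Definition& d : chosen) {
    if (!d.fromBitcode) continue;  // native definitions are not LTO's concern
    if (d.name == entryPoint || nativeReferences.count(d.name))
      preserve.push_back(d.name);
  }
  return preserve;
}
```

In this model, a user-supplied printf in the bitcode that is only called from other bitcode is not returned, so it remains a candidate for internalization.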
> or asking the user not to link the program with libc (a bit of a
> questionable request).
A common case for user-supplied library functions is that users cannot
link against libc, so they supply their own.  This shouldn’t be the only
supported case, though.
> How are optimizations "incorrectly modifying" (user-provided)
> library functions?
The current problem is that -instcombine will rename the function through
Module::getOrInsertFunction().  getOrInsertFunction() takes this path
(moving the existing function aside under a new, uniqued name and creating
a fresh declaration under the requested name) because the function has
local linkage.  However, the function is a member of @llvm.compiler.used,
so it shouldn’t really be modified.

I think in the normal case (non-LTO, where -internalize hasn’t run),
Module::getOrInsertFunction() *should* take this path with functions that
have local linkage.  And it’s not trivial to check for membership in
@llvm.compiler.used.
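The renaming can be reproduced with a toy symbol table. This is not LLVM code: `ToyModule` only mimics the local-linkage path of Module::getOrInsertFunction() described above, and the ".1" suffix used for the uniqued name is an assumption of the toy. It does not model @llvm.compiler.used, whose membership check is the part described as non-trivial.

```cpp
#include <map>
#include <string>

// Toy stand-in for an llvm::Module (NOT LLVM code).
struct ToyFunction {
  bool isLocal;
  bool isDeclaration;
};

struct ToyModule {
  std::map<std::string, ToyFunction> functions;

  // Returns an unused name derived from `base` (suffix scheme is assumed).
  std::string uniqueName(const std::string& base) const {
    for (int i = 1;; ++i) {
      std::string candidate = base + "." + std::to_string(i);
      if (!functions.count(candidate)) return candidate;
    }
  }

  // Mimics the described behavior: if an existing function with this name
  // has local linkage, it is moved aside under a uniqued name and a new
  // external declaration takes the requested name.
  std::string getOrInsertFunction(const std::string& name) {
    auto it = functions.find(name);
    if (it == functions.end()) {
      functions[name] = {false, true};  // new external declaration
      return name;
    }
    if (it->second.isLocal) {
      ToyFunction local = it->second;
      functions.erase(it);
      functions[uniqueName(name)] = local;  // user's definition is renamed
      functions[name] = {false, true};      // newly inserted calls use this
    }
    return name;
  }
};
```

For an internalized user definition of puts, a pass asking for "puts" ends up calling an external declaration, while the user's body survives only under the renamed symbol.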

Krzysztof Parzyszek

2014-Mar-10 19:30 UTC


[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO

On 3/10/2014 11:49 AM, Duncan P. N. Exon Smith wrote:
> The general idea is: unless the linker has told us to preserve a symbol,
> internalize it, exposing it to other optimizations (like -globalopt).
> However, for library functions, this breaks down because later passes
> insert calls (e.g., -instcombine converts printf => puts, and
> -codegenprepare converts llvm.memcpy => memcpy).  So, add them to
> @llvm.compiler.used to protect them temporarily.
I see. Thanks for the explanation.

How about this: resolve symbols during LTO and detect which ones are 
used in the program, and which are not?  It would probably require a lot 
more work in the LTO framework, but it has the benefit that we no longer 
need any "preserve" list merely to allow it to link, or 
"internalization".  The only exception would be export lists for shared
objects, but it would be a lot easier for the user to provide that, than
to list all the functions referenced from other objects/libraries.
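The proposal amounts to a reachability walk over the whole program. Here is a minimal sketch, assuming a hypothetical `CallGraph` map from each defined symbol to the symbols it references, and a root set made up of the entry point plus any export list.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical whole-program call graph: symbol -> symbols it references.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Sketch: starting from the roots, walk the call graph. Anything not reached
// could be internalized or dropped without a per-symbol preserve list.
std::set<std::string> reachableSymbols(const CallGraph& graph,
                                       const std::vector<std::string>& roots) {
  std::set<std::string> seen;
  std::vector<std::string> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    std::string sym = worklist.back();
    worklist.pop_back();
    if (!seen.insert(sym).second) continue;  // already visited
    auto it = graph.find(sym);
    if (it == graph.end()) continue;  // not defined here (e.g. in libc)
    for (const std::string& callee : it->second) worklist.push_back(callee);
  }
  return seen;
}
```

The next reply in the thread points out the catch: codegen can introduce references (such as memcpy) that are not in this graph before codegen runs.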

To help with user-provided library functions we could develop a way for 
the user to specify "resolution preference", i.e. if "printf" is not
explicitly defined, pick it up from /usr/lib/libc.a, otherwise ignore 
the definition from libc.a.
This would only work for linking non-shared objects though.  If the user 
wants to get the rest of the functions from libc, the replaced ones 
would need to be internally renamed to avoid conflicts with those in 
libc (the system linker could otherwise complain).
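A "resolution preference" could be sketched as a per-symbol lookup that prefers user definitions over libc. The `Source` and `Resolution` types, the `resolve` function, and the `__user_` prefix used for the internal rename are hypothetical choices for illustration; the message does not specify a naming scheme.

```cpp
#include <map>
#include <set>
#include <string>

// Where a symbol's definition comes from, in this hypothetical model.
enum class Source { User, Libc, Unresolved };

struct Resolution {
  Source source;
  std::string emittedName;  // name the chosen definition is emitted under
};

// Sketch: a user definition wins; otherwise fall back to libc. If the rest of
// libc is linked as well, each user replacement is renamed (made-up
// "__user_" prefix) so the system linker never sees two definitions.
std::map<std::string, Resolution> resolve(
    const std::set<std::string>& referenced,
    const std::set<std::string>& userDefs,
    const std::set<std::string>& libcDefs,
    bool alsoLinkingLibc) {
  std::map<std::string, Resolution> result;
  for (const std::string& name : referenced) {
    if (userDefs.count(name)) {
      std::string emitted = alsoLinkingLibc ? "__user_" + name : name;
      result[name] = {Source::User, emitted};
    } else if (libcDefs.count(name)) {
      result[name] = {Source::Libc, name};
    } else {
      result[name] = {Source::Unresolved, name};
    }
  }
  return result;
}
```

As the message notes, this only covers static linking; a shared libc could still resolve its own internal calls to its own copies.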

Has anything like this been considered?

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

Rafael Espíndola

2014-Mar-10 20:01 UTC


[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO

> I see. Thanks for the explanation.
>
>
> How about this: resolve symbols during LTO and detect which ones are used in
> the program, and which are not?  It would probably require a lot more work
> in the LTO framework, but it has the benefit that we no longer need any
> "preserve" list merely to allow it to link, or "internalization".  The only
> exception would be export lists for shared objects, but it would be a lot
> easier for the user to provide that, than to list all the functions
> referenced from other objects/libraries.
>
> To help with user-provided library functions we could develop a way for the
> user to specify "resolution preference", i.e. if "printf" is not explicitly
> defined, pick it up from /usr/lib/libc.a, otherwise ignore the definition
> from libc.a.
> This would only work for linking non-shared objects though.  If the user
> wants to get the rest of the functions from libc, the replaced ones would
> need to be internally renamed to avoid conflicts with those in libc (the
> system linker could otherwise complain).
During LTO time we don't know exactly which functions will be used.
Whether memcpy is used, for example, depends on which backend we are
using. The only reliable way would be to iterate codegen, making note at
each step of which new undefined references show up, which is
overkill.
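For comparison, the iterate-until-stable approach might look like this sketch, assuming a hypothetical `CodegenFn` callback that runs codegen with a given set of preserved symbols and reports the undefined references in its output.

```cpp
#include <functional>
#include <set>
#include <string>

// Hypothetical stand-in for "run codegen, then list the undefined symbols the
// output references". Which libcalls appear depends on the backend.
using CodegenFn =
    std::function<std::set<std::string>(const std::set<std::string>& preserved)>;

// Sketch: rerun codegen, each time preserving any symbol that showed up as a
// new undefined reference, until a run introduces nothing new. Terminates
// once codegen stops reporting new names. Returns the number of codegen runs.
int iterateUntilStable(const CodegenFn& codegen, std::set<std::string>& preserved) {
  int runs = 0;
  while (true) {
    ++runs;
    std::set<std::string> undefined = codegen(preserved);
    bool grew = false;
    for (const std::string& sym : undefined)
      if (preserved.insert(sym).second) grew = true;
    if (!grew) return runs;
  }
}
```

Every extra round is a full codegen run. That cost is what makes this overkill compared with preserving a fixed set of symbols LLVM might use.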

The current strategy of just knowing which symbols llvm *might* use
seems appropriate; we just need to fix llvm to always respect it.

Cheers,
Rafael
