thr3ads.net - llvm dev - [LLVMdev] [RFC] Linkage of user-supplied library functions in LTO [Mar 2014]


Duncan P. N. Exon Smith

2014-Mar-08 16:59 UTC

[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO

+nick and rafael, who seem to know a lot about linkage.

I made the following claim on llvm-commits [1]:

On 2014 Mar 3, at 15:01, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> Giving these functions internal linkage allows them to be dead-stripped.

Is that even correct?

This is the assumption I’ve been working under, but I’m not sure where I
got it from.  It seems like the linker is free to dead-strip symbols
whether they’re internal or external.

The current state of user-supplied library functions in LTO is that we
internalize library functions, add them to llvm.compiler.used so that
optimizations don’t modify them, and then optimizations incorrectly
modify them.  LLVM *really* doesn't expect library functions to have
local linkage.

And I’m not sure internal linkage is the right model anyway.

I see two paths forward:

 1. Add a new linkage type called “linker_internal”, which LLVM treats as
    a type of non-local linkage, but gets emitted as internal.  This
    might be worth it if linkers don’t dead strip external symbols.

 2. If linkers *do* dead strip external symbols, then we should not
    internalize user-supplied library functions in -internalize.

Do linkers dead strip symbols with external linkage?  Any other reason to
prefer one path over the other?  Is there another way?

Duncan

[1]: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140303/207033.html

Krzysztof Parzyszek

2014-Mar-08 23:43 UTC


[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO

On 3/8/2014 10:59 AM, Duncan P. N. Exon Smith wrote:
> +nick and rafael, who seem to know a lot about linkage.
>
> I made the following claim on llvm-commits [1]:
>
> On 2014 Mar 3, at 15:01, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>
>> Giving these functions internal linkage allows them to be dead-stripped.
>
> Is that even correct?

If by "linker" you mean things like /bin/ld, then linkers can
garbage-collect sections that don't contain any referenced symbols, not
the symbols themselves. I believe it doesn't matter if the symbols in
sections are internal or external; that only matters for symbol resolution.
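To make the distinction concrete, here is a minimal Python sketch of section-level garbage collection in the style of `ld --gc-sections`. It is a toy model, not how any real linker is implemented: the `Symbol` and `Section` classes and `gc_sections()` are hypothetical names. What it illustrates is that liveness is tracked per section, and a symbol's binding (local vs. global) is never consulted.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    name: str
    binding: str  # "local" or "global"; the GC below never looks at this


@dataclass
class Section:
    name: str
    symbols: list
    refs: set = field(default_factory=set)  # symbol names referenced via relocations


def gc_sections(sections, roots):
    """Return the names of sections reachable from the root symbol names."""
    owner = {sym.name: sec for sec in sections for sym in sec.symbols}
    live = set()
    worklist = list(roots)
    while worklist:
        sec = owner.get(worklist.pop())
        if sec is None or sec.name in live:
            continue  # symbol not defined in these sections, or already marked
        live.add(sec.name)
        worklist.extend(sec.refs)
    return live


sections = [
    Section(".text.main", [Symbol("main", "global")], {"helper"}),
    Section(".text.helper", [Symbol("helper", "local")]),
    Section(".text.printf", [Symbol("printf", "global")]),
]
```

Starting from `main`, the section holding the local `helper` is kept because it is referenced, while the section holding the unused global `printf` is collected.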

> The current state of user-supplied library functions in LTO is that we
> internalize library functions, add them to llvm.compiler.used so that
> optimizations don’t modify them, and then optimizations incorrectly
> modify them.  LLVM *really* doesn't expect library functions to have
> local linkage.
I've read the original thread (the 3 emails), and I'm still not sure 
what the purpose of internalization is in the context of user-provided 
library functions. If the output of LTO is one giant object file, it 
could make some sense (since the assembler could potentially do the 
"symbol resolution"). Otherwise the problem is in telling the "ld" which
definition of "printf" it needs to pick up, or asking the user not to
link the program with libc (a bit of a questionable request).

How are optimizations "incorrectly modifying" (user-provided) library 
functions?


-Krzysztof


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

Nick Kledzik

2014-Mar-09 00:03 UTC


[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO

On Mar 8, 2014, at 8:59 AM, Duncan P. N. Exon Smith wrote:
> +nick and rafael, who seem to know a lot about linkage.
> 
> I made the following claim on llvm-commits [1]:
> 
> On 2014 Mar 3, at 15:01, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
>> Giving these functions internal linkage allows them to be dead-stripped.
> 
> Is that even correct?
> 
> This is the assumption I’ve been working under, but I’m not sure where I
> got it from.  It seems like the linker is free to dead-strip symbols
> whether they’re internal or external.

The darwin linker does not care about internal vs. external when doing
liveness analysis.  But by default, when dylibs (DSOs) are created,
all global symbols are marked live at the start of the analysis.  This
is not the case for main executables, which have just main() and any
initializers marked live initially.
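A rough Python sketch of the liveness analysis described above, showing that the internal/external distinction only affects the choice of roots. Everything here is a hypothetical model: `Atom`, `initial_roots()` and `live_atoms()` are illustrative names, symbols use plain C names (Darwin actually prefixes them with an underscore), and treating initializers as roots for dylibs as well as executables is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """A function or data item as tracked by the linker (toy model)."""
    name: str
    is_global: bool
    is_initializer: bool = False
    refs: set = field(default_factory=set)  # names of atoms this one uses


def initial_roots(atoms, output_kind):
    """Roots of the liveness analysis for "dylib" or "executable" output."""
    roots = {a.name for a in atoms if a.is_initializer}
    if output_kind == "dylib":
        # Any global symbol might be used by a client of the library.
        roots |= {a.name for a in atoms if a.is_global}
    else:
        # A main executable starts from main() (plus the initializers above).
        roots.add("main")
    return roots


def live_atoms(atoms, output_kind):
    """Mark everything reachable from the roots; the rest can be stripped."""
    by_name = {a.name: a for a in atoms}
    live, worklist = set(), list(initial_roots(atoms, output_kind))
    while worklist:
        name = worklist.pop()
        if name in live or name not in by_name:
            continue
        live.add(name)
        worklist.extend(by_name[name].refs)
    return live


atoms = [
    Atom("main", is_global=True, refs={"puts"}),
    Atom("puts", is_global=True),
    Atom("printf", is_global=True),  # user-supplied, never called
    Atom("init", is_global=False, is_initializer=True),
]
```

With this input, `live_atoms(atoms, "executable")` keeps `main`, `puts` and `init` but drops the unused global `printf`; for a dylib, `printf` stays live because every global is a root.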



> 
> The current state of user-supplied library functions in LTO is that we
> internalize library functions, add them to llvm.compiler.used so that
> optimizations don’t modify them, and then optimizations incorrectly
> modify them.  LLVM *really* doesn't expect library functions to have
> local linkage.

Since most code is dynamic, that is probably why LLVM expects library
functions not to be local.

The case that revealed this bug was someone building a static binary.
Most code these days is dynamic, so libLTO will not see the implementation
of libcall functions, whereas with static binaries it will always see libcall
function implementations.  I don't know if supplying that (static vs. dynamic)
to libLTO would help.

> And I’m not sure internal linkage is the right model anyway.
> 
> I see two paths forward:
> 
> 1. Add a new linkage type called “linker_internal”, which LLVM treats as
>    a type of non-local linkage, but gets emitted as internal.  This
>    might be worth it if linkers don’t dead strip external symbols.
> 
> 2. If linkers *do* dead strip external symbols, then we should not
>    internalize user-supplied library functions in -internalize.
> 
> Do linkers dead strip symbols with external linkage?

This is probably the wrong question.  When linking main executables, the
linker can dead strip external functions.  But if LTO is used with a main
executable, the linker will not tell libLTO to preserve the global symbols,
so they will quickly be made internal.

My question:  will libLTO ever dead strip a non-internal function?  If not,
is that one of the reasons the internalize pass tries to make functions 
internal?


> Any other reason to
> prefer one path over the other?  Is there another way?

From the darwin linker's perspective, if libLTO does not dead strip a
libcall function implementation, that is OK.  The linker runs another
liveness analysis pass after the libLTO result and any other mach-o
files are merged, so any extra functions will still get deleted from the
final output.

-Nick

Rafael Espíndola

2014-Mar-10 14:48 UTC


[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO

>> Giving these functions internal linkage allows them to be dead-stripped.
>
> Is that even correct?

From LLVM's point of view, yes. It can drop linkonce and local (private* +
internal) globals.

> This is the assumption I’ve been working under, but I’m not sure where I
> got it from.  It seems like the linker is free to dead-strip symbols
> whether they’re internal or external.
The system linker, yes. LLVM knows it is not seeing the full picture
with regards to external ones.
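The rule stated here can be written as a small predicate. This is a sketch of the policy as described in this thread, not LLVM's source: the function name is hypothetical, linkage names follow LLVM IR spelling, and "private*" is approximated by plain `private`. Membership in @llvm.used or @llvm.compiler.used overrides the linkage.

```python
# Linkage kinds that LLVM may delete when nothing in the module refers to them.
LOCAL_LINKAGES = {"private", "internal"}
LINKONCE_LINKAGES = {"linkonce", "linkonce_odr"}


def llvm_may_drop_if_unused(linkage, in_used_list=False):
    """Can LLVM delete an unreferenced global with this linkage?

    External definitions are kept: LLVM cannot see references coming from
    other object files, so only the system linker can decide they are dead.
    """
    if in_used_list:  # listed in @llvm.used or @llvm.compiler.used
        return False
    return linkage in LOCAL_LINKAGES or linkage in LINKONCE_LINKAGES
```

This is why internalizing a symbol is what makes it deletable during LTO: the same unreferenced function goes from "keep" to "drop" when its linkage changes from `external` to `internal`, unless it is on one of the used lists.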
> The current state of user-supplied library functions in LTO is that we
> internalize library functions, add them to llvm.compiler.used so that
> optimizations don’t modify them, and then optimizations incorrectly
> modify them.  LLVM *really* doesn't expect library functions to have
> local linkage.
Why not just fix the optimizations that are not handling
llvm.compiler.used correctly?
> And I’m not sure internal linkage is the right model anyway.
>
> I see two paths forward:
>
>  1. Add a new linkage type called “linker_internal”, which LLVM treats as
>     a type of non-local linkage, but gets emitted as internal.  This
>     might be worth it if linkers don’t dead strip external symbols.
>
>  2. If linkers *do* dead strip external symbols, then we should not
>     internalize user-supplied library functions in -internalize.
>
> Do linkers dead strip symbols with external linkage?  Any other reason to
> prefer one path over the other?  Is there another way?
So, linkers have a better view of what is and is not used. They pass
that information down to LLVM during link. The thing llvm has to be
careful about are symbols it can introduce calls to (like memcpy). For
those llvm.compiler.used should be fine.

Using llvm.compiler.used and llvm.used is pretty annoying, and they should
probably be made into an easier-to-use attribute, but probably not
folded into linkage. It is pretty orthogonal to other linkage
properties. We can have an llvm.used global that is weak_odr, external or
internal, for example.

Cheers,
Rafael

Duncan P. N. Exon Smith

2014-Mar-10 16:49 UTC


[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO

On Mar 8, 2014, at 3:43 PM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:
> I believe it doesn't matter if the symbols in sections are internal or
> external; that only matters for symbol resolution.

Given this...

> I've read the original thread (the 3 emails), and I'm still not sure
> what the purpose of internalization is in the context of user-provided
> library functions.

...I’m not sure there is a point.

The general idea is: unless the linker has told us to preserve a symbol,
internalize it, exposing it to other optimizations (like -globalopt).
However, for library functions, this breaks down because later passes
insert calls (e.g., -instcombine converts printf => puts, and
-codegenprepare converts llvm.memcpy => memcpy).  So, add them to
@llvm.compiler.used to protect them temporarily.
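A simplified Python model of that flow, for illustration only. The module is just a dict from symbol name to linkage, and `internalize()` and its parameters are hypothetical; the `internalize_libcalls` flag corresponds to option 2 from the original RFC (leave user-supplied library functions external).

```python
def internalize(module, must_preserve, libcall_names, internalize_libcalls=True):
    """Toy -internalize over a {name: linkage} dict of definitions.

    Mutates `module` in place and returns the names that would be added to
    @llvm.compiler.used.
    """
    compiler_used = set()
    for name, linkage in module.items():
        if linkage != "external" or name in must_preserve:
            continue  # already local, or the linker asked us to keep it
        if name in libcall_names:
            if not internalize_libcalls:
                continue  # option 2: leave library functions external
            # Later passes may introduce calls to this symbol (printf => puts,
            # llvm.memcpy => memcpy), so protect it from being deleted or changed.
            compiler_used.add(name)
        module[name] = "internal"
    return compiler_used
```

For example, with `{"main": "external", "helper": "external", "puts": "external"}`, `must_preserve={"main"}` and `puts` as a libcall, `helper` and `puts` become internal and only `puts` is returned for @llvm.compiler.used.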

If

  - the linker (e.g., /bin/ld) will delete unreferenced symbols (through
    -dead_strip, etc.) only if they have local linkage, or

  - LTO has a pass that will delete unreferenced symbols with local
    linkage *after* @llvm.compiler.used gets dropped (maybe we can add
    this),

then there’s a point.

> If the output of LTO is one giant object file, it could make some sense
> (since the assembler could potentially do the "symbol resolution").
The output of LTO *is* one giant object file, but the linker (e.g.,
/bin/ld) may be linking it with other object files.
> Otherwise the problem is in telling the "ld" which definition of
> "printf" it needs to pick up,
In the LTO API, the linker should call
lto_codegen_add_must_preserve_symbol() on symbols it expects to come out
the other side.  Basically, the user of LTO decides which version of
printf to pick up.  If there are any calls to printf from outside the
bitcode and the linker is using the one in the bitcode, then the one in
the bitcode won’t be internalized.
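On the linker side, the preserve set boils down to a set intersection. The sketch below is a hypothetical model of that decision (the function names and parameters are illustrative); the callback stands in for libLTO's `lto_codegen_add_must_preserve_symbol()`, which a real linker calls through the C API once per symbol.

```python
def must_preserve_symbols(bitcode_defs, native_refs, exported):
    """Symbols defined in bitcode that must survive LTO.

    bitcode_defs: symbols defined by the bitcode given to LTO
    native_refs:  symbols referenced from native (non-bitcode) objects
    exported:     symbols the output itself must export (e.g. for a dylib)
    """
    return bitcode_defs & (native_refs | exported)


def report_preserved(bitcode_defs, native_refs, exported, add_must_preserve):
    # In a real linker, add_must_preserve would wrap
    # lto_codegen_add_must_preserve_symbol(cg, name).
    for name in sorted(must_preserve_symbols(bitcode_defs, native_refs, exported)):
        add_must_preserve(name)
```

If the bitcode defines `main`, `printf` and `helper`, and native objects call `main` and `printf`, only those two are reported; `helper` is left free to be internalized.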
> or asking the user not to link the program with libc (a bit of a
> questionable request).
A common case for user-supplied library functions is that users cannot
link against libc, so they supply their own.  This shouldn’t be the only
supported case, though.
> How are optimizations "incorrectly modifying" (user-provided)
> library functions?
The current problem is that -instcombine will rename the function through
Module::getOrInsertFunction().  getOrInsertFunction() chooses this path
because the function has local linkage.  However, the function is a
member of @llvm.compiler.used, so it shouldn’t really be modified.

I think in the normal case (non-LTO, where -internalize hasn’t run),
Module::getOrInsertFunction() *should* take this path with functions that
have local linkage.  And it’s not trivial to check for membership in
@llvm.compiler.used.
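A toy model of the renaming path, to show why it breaks a user-supplied definition. This mimics the behaviour described above rather than reproducing LLVM's code: the module is a {name: linkage} dict, `get_or_insert_function()` is a hypothetical stand-in for Module::getOrInsertFunction(), and the numeric suffix used for the renamed function is an assumption (LLVM's uniquing scheme may differ).

```python
def get_or_insert_function(module, name):
    """Make `name` refer to an externally visible function.

    Returns the new name of a displaced local function, or None.
    """
    linkage = module.get(name)
    if linkage is None:
        module[name] = "external"  # no such function yet: add a declaration
        return None
    if linkage not in ("private", "internal"):
        return None  # an externally visible function is reused as is
    # Local linkage: move the existing function aside under a unique name...
    suffix = 1
    while f"{name}{suffix}" in module:
        suffix += 1
    renamed = f"{name}{suffix}"
    module[renamed] = module.pop(name)
    # ...and bind `name` to a fresh external declaration.
    module[name] = "external"
    return renamed


# After -internalize, the user's puts() is internal (and in @llvm.compiler.used).
module = {"puts": "internal", "main": "external"}
moved = get_or_insert_function(module, "puts")  # e.g. instcombine: printf => puts
```

Afterwards new calls target an external `puts` declaration, while the user's implementation lives on as the internal `puts1`: the function listed in @llvm.compiler.used has been modified anyway.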

Duncan P. N. Exon Smith

2014-Mar-10 17:17 UTC


[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO

On Mar 8, 2014, at 4:03 PM, Nick Kledzik <kledzik at apple.com> wrote:
> The darwin linker does not care about internal vs external when doing
> liveness analysis.  But, by default when dylibs (DSOs) are created,
> all global symbols are marked live at the start of the analysis.  This
> is not the case for main executables which just have main() and any 
> initializers marked live initially.  
This is interesting.  So the distinction does matter for shared objects,
but not for main executables.  But LTO will be told to preserve all the
global symbols anyway for shared objects.
> Since most code is dynamic, that is probably why LLVM expects library
> functions not to be local.
> 
> The case that revealed this bug was someone building a static binary.
> Most code these days is dynamic, so libLTO will not see the implementation
> of libcall functions, whereas with static binaries it will always see
> libcall function implementations.  I don't know if supplying that (static
> vs. dynamic) to libLTO would help.
That’s an idea, but I don’t think it’s necessary.

In static binaries, it’s important to optimize out unused user-supplied
library functions for size reasons.  But the linker is going to do that
whether we make these functions local or global, so there’s no benefit
to internalizing them.

In dynamic binaries, users will link against a dynamic libc and are
unlikely to provide their own library function implementations.  So
there’s no benefit to internalizing them here, either.
>> Do linkers dead strip symbols with external linkage?  
> 
> This is probably the wrong question.  When linking main executables, the
> linker can dead strip external functions.  But if LTO is used with a main
> executable, the linker will not tell libLTO to preserve the global symbols,
> so they will quickly be made internal.  
> 
> My question:  will libLTO ever dead strip a non-internal function?  If not,
> is that one of the reasons the internalize pass tries to make functions 
> internal?
Exactly.  libLTO will only dead strip functions with local linkage (such
as internal).  During normal (non-LTO) optimizations, only functions with
local linkage are safe to remove.  LTO runs -internalize before other
optimizations so that the rest of the optimizations don’t need to know
that anything is different.

(This is where I got the (apparently incorrect) idea that the linker
would only dead-strip internal functions.)
> From the darwin linker's perspective, if libLTO does not dead strip a
> libcall function implementation, that is OK.  The linker runs another
> liveness analysis pass after the libLTO result and any other mach-o
> files are merged, so any extra functions will still get deleted from the
> final output.
Okay, great.  I think then it’s safe to remove the @llvm.compiler.used
hack that I went with originally, and just leave them external.

Duncan P. N. Exon Smith

2014-Mar-10 17:43 UTC


[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO

On Mar 10, 2014, at 7:48 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> From LLVM's point of view, yes. It can drop linkonce and local (private* +
> internal) globals.
[...]
> The system linker, yes. LLVM knows it is not seeing the full picture
> with regards to external ones.
I mistakenly assumed the LLVM perspective applied also to the system
linker (!).
> Why not just fix the optimizations that are not handling
> llvm.compiler.used correctly?
That’s valuable work.  However, for this use case, there doesn’t seem
to be any benefit in relying on @llvm.compiler.used.  If I’d properly
understood how linkers work, I wouldn’t have complicated the flow in
the first place (i.e., I think r194514 should have just blocked
-internalize from giving these functions local linkage).

I’m also not sure what Module::getOrInsertFunction() *should* do when
it finds a function with local linkage in @llvm.compiler.used.  I
think its current behaviour is correct most of the time (moving
functions with local linkage seems correct in the usual case), and as
you point out below, checking for membership in @llvm.compiler.used
is not cheap.
> So, linkers have a better view of what is and is not used. They pass
> that information down to LLVM during link. The thing llvm has to be
> careful about are symbols it can introduce calls to (like memcpy). For
> those llvm.compiler.used should be fine.
> 
> Using llvm.compiler.used and llvm.used is pretty annoying, and they should
> probably be made into an easier-to-use attribute, but probably not
> folded into linkage. It is pretty orthogonal to other linkage
> properties. We can have an llvm.used global that is weak_odr, external or
> internal, for example.
> 
> Cheers,
> Rafael
