Duncan P. N. Exon Smith
2014-Mar-08 16:59 UTC
[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO
+nick and rafael, who seem to know a lot about linkage.

I made the following claim on llvm-commits [1]:

On 2014 Mar 3, at 15:01, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> Giving these functions internal linkage allows them to be dead-stripped.

Is that even correct?

This is the assumption I’ve been working under, but I’m not sure where I got it from. It seems like the linker is free to dead-strip symbols whether they’re internal or external.

The current state of user-supplied library functions in LTO is that we internalize library functions, add them to llvm.compiler.used so that optimizations don’t modify them, and then optimizations incorrectly modify them. LLVM *really* doesn't expect library functions to have local linkage.

And I’m not sure internal linkage is the right model anyway.

I see two paths forward:

 1. Add a new linkage type called “linker_internal”, which LLVM treats as a type of non-local linkage, but which gets emitted as internal. This might be worth it if linkers don’t dead strip external symbols.

 2. If linkers *do* dead strip external symbols, then we should not internalize user-supplied library functions in -internalize.

Do linkers dead strip symbols with external linkage? Any other reason to prefer one path over the other? Is there another way?

Duncan

[1]: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140303/207033.html
Krzysztof Parzyszek
2014-Mar-08 23:43 UTC
[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO
On 3/8/2014 10:59 AM, Duncan P. N. Exon Smith wrote:
> +nick and rafael, who seem to know a lot about linkage.
>
> I made the following claim on llvm-commits [1]:
>
> On 2014 Mar 3, at 15:01, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>
>> Giving these functions internal linkage allows them to be dead-stripped.
>
> Is that even correct?

If by "linker" you mean things like /bin/ld, then linkers can garbage-collect sections that don't contain any referenced symbols, not the symbols themselves. I believe it doesn't matter if the symbols in sections are internal or external---that only matters for symbol resolution.

> The current state of user-supplied library functions in LTO is that we
> internalize library functions, add them to llvm.compiler.used so that
> optimizations don’t modify them, and then optimizations incorrectly
> modify them. LLVM *really* doesn't expect library functions to have
> local linkage.

I've read the original thread (the 3 emails), and I'm still not sure what the purpose of internalization is in the context of user-provided library functions. If the output of LTO is one giant object file, it could make some sense (since the assembler could potentially do the "symbol resolution"). Otherwise the problem is in telling "ld" which definition of "printf" it needs to pick up, or asking the user not to link the program with libc (a bit of a questionable request).

How are optimizations "incorrectly modifying" (user-provided) library functions?

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
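The section-level garbage collection described in this message can be pictured with a small toy model. This is only a sketch under stated assumptions, not how /bin/ld is implemented: the `Section` type and the `liveSections` function are hypothetical names made up for illustration, and symbol binding (local vs. global) is deliberately left out of the model to mirror the claim that it only matters for symbol resolution.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model (hypothetical types, not a real linker): a section defines some
// symbols and refers to other symbols by name.
struct Section {
    std::vector<std::string> defines;
    std::vector<std::string> references;
};

// Return the indices of all sections reachable from the root symbols.
// Whether a symbol is local or global plays no role here: a section
// survives only if one of its symbols is referenced from a live section.
std::set<std::size_t> liveSections(const std::vector<Section>& sections,
                                   const std::vector<std::string>& roots) {
    // Map each defined symbol to the section that contains it.
    std::map<std::string, std::size_t> definedIn;
    for (std::size_t i = 0; i < sections.size(); ++i)
        for (const std::string& sym : sections[i].defines)
            definedIn[sym] = i;

    std::set<std::size_t> live;
    std::vector<std::string> worklist(roots.begin(), roots.end());
    while (!worklist.empty()) {
        std::string sym = worklist.back();
        worklist.pop_back();
        auto it = definedIn.find(sym);
        if (it == definedIn.end())
            continue;  // not defined in these sections (e.g. lives elsewhere)
        assert(it->second < sections.size());
        if (!live.insert(it->second).second)
            continue;  // section was already marked live
        for (const std::string& ref : sections[it->second].references)
            worklist.push_back(ref);
    }
    return live;
}
```

In this model, a section that defines an unreferenced function is dropped whether or not the function is external; the whole section goes, not the individual symbol.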
Nick Kledzik
2014-Mar-09 00:03 UTC
[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO
On Mar 8, 2014, at 8:59 AM, Duncan P. N. Exon Smith wrote:
> +nick and rafael, who seem to know a lot about linkage.
>
> I made the following claim on llvm-commits [1]:
>
> On 2014 Mar 3, at 15:01, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>
>> Giving these functions internal linkage allows them to be dead-stripped.
>
> Is that even correct?
>
> This is the assumption I’ve been working under, but I’m not sure where I
> got it from. It seems like the linker is free to dead-strip symbols
> whether they’re internal or external.

The darwin linker does not care about internal vs external when doing liveness analysis. But, by default, when dylibs (DSOs) are created, all global symbols are marked live at the start of the analysis. This is not the case for main executables, which just have main() and any initializers marked live initially.

> The current state of user-supplied library functions in LTO is that we
> internalize library functions, add them to llvm.compiler.used so that
> optimizations don’t modify them, and then optimizations incorrectly
> modify them. LLVM *really* doesn't expect library functions to have
> local linkage.

Since most code is dynamic, that is probably why LLVM expects library functions to not be local.

The case that revealed this bug was someone building a static binary. Most code these days is dynamic, so libLTO will not see the implementation of libcall functions, whereas with static binaries it will always see libcall function implementations. I don't know if supplying that (static vs dynamic) to libLTO would help.

> And I’m not sure internal linkage is the right model anyway.
>
> I see two paths forward:
>
> 1. Add a new linkage type called “linker_internal”, which LLVM treats as
>    a type of non-local linkage, but gets emitted as internal. This
>    might be worth it if linkers don’t dead strip external symbols.
>
> 2. If linkers *do* dead strip external symbols, then we should not
>    internalize user-supplied library functions in -internalize.
>
> Do linkers dead strip symbols with external linkage?

This is probably the wrong question. When linking main executables, the linker can dead strip external functions. But if LTO is used with a main executable, the linker will not tell libLTO to preserve the global symbols, so they will quickly be made internal.

My question: will libLTO ever dead strip a non-internal function? If not, is that one of the reasons the internalize pass tries to make functions internal?

> Any other reason to
> prefer one path over the other? Is there another way?

From the darwin linker's perspective, if libLTO does not dead strip a libcall function implementation, that is OK. The linker runs another liveness analysis pass after the libLTO result and any other mach-o files are merged. So any extra functions will still get deleted from the final output.

-Nick
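The difference in initial liveness roots between main executables and dylibs can be sketched as follows. This is a toy model, not the darwin linker's code: `OutputKind`, `Symbol`, and `initialRoots` are hypothetical names, the entry point is matched by the plain name "main", and treating initializers as roots for dylibs as well is an assumption that goes beyond what the message states.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the kind of image being linked.
enum class OutputKind { MainExecutable, Dylib };

// Hypothetical symbol record; real linkers track far more state.
struct Symbol {
    std::string name;
    bool isGlobal;       // externally visible symbol
    bool isInitializer;  // e.g. a static constructor
};

// Pick the symbols that are considered live before the liveness analysis runs.
std::vector<std::string> initialRoots(OutputKind kind,
                                      const std::vector<Symbol>& symbols) {
    std::vector<std::string> roots;
    for (const Symbol& s : symbols) {
        assert(!s.name.empty());  // the model only handles named symbols
        bool live = false;
        if (kind == OutputKind::Dylib) {
            // Dylib: every global symbol starts out live.
            // Assumption: initializers are roots here too.
            live = s.isGlobal || s.isInitializer;
        } else {
            // Main executable: only main() and initializers start out live.
            live = (s.name == "main") || s.isInitializer;
        }
        if (live)
            roots.push_back(s.name);
    }
    return roots;
}
```

With these roots, an unreferenced global `memcpy` supplied by the user would be a candidate for stripping in a main executable but would be kept in a dylib.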
Rafael Espíndola
2014-Mar-10 14:48 UTC
[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO
>> Giving these functions internal linkage allows them to be dead-stripped.
>
> Is that even correct?

From LLVM's point of view, yes. It can drop linkonce and local (private* + internal) globals.

> This is the assumption I’ve been working under, but I’m not sure where I
> got it from. It seems like the linker is free to dead-strip symbols
> whether they’re internal or external.

The system linker, yes. LLVM knows it is not seeing the full picture with regard to external ones.

> The current state of user-supplied library functions in LTO is that we
> internalize library functions, add them to llvm.compiler.used so that
> optimizations don’t modify them, and then optimizations incorrectly
> modify them. LLVM *really* doesn't expect library functions to have
> local linkage.

Why not just fix the optimizations that are not handling llvm.compiler.used correctly?

> And I’m not sure internal linkage is the right model anyway.
>
> I see two paths forward:
>
> 1. Add a new linkage type called “linker_internal”, which LLVM treats as
>    a type of non-local linkage, but gets emitted as internal. This
>    might be worth it if linkers don’t dead strip external symbols.
>
> 2. If linkers *do* dead strip external symbols, then we should not
>    internalize user-supplied library functions in -internalize.
>
> Do linkers dead strip symbols with external linkage? Any other reason to
> prefer one path over the other? Is there another way?

So, linkers have a better view of what is and is not used. They pass that information down to LLVM during the link. The thing LLVM has to be careful about is symbols it can introduce calls to (like memcpy). For those, llvm.compiler.used should be fine.

Using llvm.compiler.used and llvm.used is pretty annoying, and should probably be made into an easier-to-use attribute, but probably not folded into linkage. It is pretty orthogonal to other linkage properties. We can have an llvm.used that is weak_odr, external or internal, for example.

Cheers,
Rafael
Duncan P. N. Exon Smith
2014-Mar-10 16:49 UTC
[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO
On Mar 8, 2014, at 3:43 PM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:
> I believe it doesn't matter if the symbols in sections are internal or external---that only matters for symbol resolution.

Given this...

> I've read the original thread (the 3 emails), and I'm still not sure what the purpose of internalization is in the context of user-provided library functions.

...I’m not sure there is a point.

The general idea is: unless the linker has told us to preserve a symbol, internalize it, exposing it to other optimizations (like -globalopt).

However, for library functions, this breaks down because later passes insert calls (e.g., -instcombine converts printf => puts, and -codegenprepare converts llvm.memcpy => memcpy). So, add them to @llvm.compiler.used to protect them temporarily.

If

  - the linker (e.g., /bin/ld) will delete unreferenced symbols (through -dead_strip, etc.) only if they have local linkage, or

  - LTO has a pass that will delete unreferenced symbols with local linkage *after* @llvm.compiler.used gets dropped (maybe we can add this),

then there’s a point.

> If the output of LTO is one giant object file, it could make some sense (since the assembler could potentially do the "symbol resolution”).

The output of LTO *is* one giant object file, but the linker (e.g., /bin/ld) may be linking it with other object files.

> Otherwise the problem is in telling the "ld" which definition of "printf" it needs to pick up,

In the LTO API, the linker should call lto_codegen_add_must_preserve_symbol() on symbols it expects to come out the other side. Basically, the user of LTO decides which version of printf to pick up. If there are any calls to printf from outside the bitcode and the linker is using the one in the bitcode, then the one in the bitcode won’t be internalized.

> or asking the user not to link the program with libc (a bit of a questionable request).

A common case for user-supplied library functions is that users cannot link against libc, so they supply their own. This shouldn’t be the only supported case, though.

> How are optimizations "incorrectly modifying" (user-provided) library functions?

The current problem is that -instcombine will rename the function through Module::getOrInsertFunction(). getOrInsertFunction() chooses this path because the function has local linkage. However, the function is a member of @llvm.compiler.used, so it shouldn’t really be modified.

I think in the normal case (non-LTO, where -internalize hasn’t run), Module::getOrInsertFunction() *should* take this path with functions that have local linkage. And it’s not trivial to check for membership in @llvm.compiler.used.
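The interaction described above (internalize everything the linker did not ask to preserve, then have a later pass request a library function by name) can be sketched with a toy model. This is not LLVM's implementation: the `Module`, `Function`, and `Linkage` types here are simplified stand-ins, the free functions `internalize` and `getOrInsertFunction` only imitate the behaviour described in this message, and the `.renamed` suffix is a hypothetical naming choice.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Simplified stand-ins for the real LLVM classes (hypothetical model).
enum class Linkage { External, Internal };

struct Function {
    Linkage linkage;
    bool isDefinition;
};

using Module = std::map<std::string, Function>;

// Model of -internalize: every definition the linker did not ask to
// preserve gets local (internal) linkage.
void internalize(Module& m, const std::set<std::string>& mustPreserve) {
    for (auto& entry : m)
        if (entry.second.isDefinition && mustPreserve.count(entry.first) == 0)
            entry.second.linkage = Linkage::Internal;
}

// Model of the lookup a pass performs when it wants to call a library
// function by name. If a function with that name exists but has local
// linkage, it is treated as unrelated: it is renamed out of the way and a
// fresh external declaration takes over the requested name.
void getOrInsertFunction(Module& m, const std::string& name) {
    auto it = m.find(name);
    if (it == m.end()) {
        m[name] = Function{Linkage::External, false};
        return;
    }
    if (it->second.linkage != Linkage::Internal)
        return;  // an externally visible function is used as-is

    Function local = it->second;
    m.erase(it);
    const std::string renamed = name + ".renamed";  // hypothetical new name
    assert(m.count(renamed) == 0);
    m[renamed] = local;
    m[name] = Function{Linkage::External, false};
}
```

In this model, once a user-supplied `puts` has been internalized, a request for `puts` ends up bound to a new external declaration rather than to the user's definition, which is the kind of modification the message objects to. If `puts` is in the preserve set and keeps external linkage, the lookup leaves it alone.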
Duncan P. N. Exon Smith
2014-Mar-10 17:17 UTC
[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO
On Mar 8, 2014, at 4:03 PM, Nick Kledzik <kledzik at apple.com> wrote:
> The darwin linker does not care about internal vs external when doing
> liveness analysis. But, by default when dylibs (DSOs) are created,
> all global symbols are marked live at the start of the analysis. This
> is not the case for main executables which just have main() and any
> initializers marked live initially.

This is interesting. So the distinction does matter for shared objects, but not for main executables. But LTO will be told to preserve all the global symbols anyway for shared objects.

> Since most code is dynamic that is probably why LLVM expects library
> functions to not be local.
>
> The case that revealed this bug was someone building a static binary.
> Most code these days is dynamic, so libLTO will not see the implementation
> of libcall functions. Whereas with static binaries, it will always see libcall
> function implementations. I don't know if supplying that (static vs dynamic)
> to libLTO would help.

That’s an idea, but I don’t think it’s necessary.

In static binaries, it’s important to optimize out unused user-supplied library functions for size reasons. But the linker is going to do that whether we make these functions local or global, so there’s no benefit to internalizing them.

In dynamic binaries, users will link against a dynamic libc and are unlikely to provide their own library function implementations. So there’s no benefit to internalizing them here, either.

>> Do linkers dead strip symbols with external linkage?
>
> This is probably the wrong question. When linking main executables, the
> linker can dead strip external functions. But if LTO is used with a main
> executable, the linker will not tell libLTO to preserve the global symbols,
> so they will quickly be made internal.
>
> My question: will libLTO ever dead strip a non-internal function? If not,
> is that one of the reasons the internalize pass tries to make functions
> internal?

Exactly. libLTO will only dead strip functions with local linkage (such as internal).

During normal (non-LTO) optimizations, only functions with local linkage are safe to remove. LTO runs -internalize before other optimizations so that the rest of the optimizations don’t need to know that anything is different.

(This is where I got the (apparently incorrect) idea that the linker would only dead-strip internal functions.)

> From the darwin linker's perspective, if libLTO does not dead strip
> a libcall function implementation, that is OK. The linker runs another
> liveness analysis pass after the libLTO result and any other mach-o
> files are merged. So any extra functions will still get deleted from the
> final output.

Okay, great. I think then it’s safe to remove the @llvm.compiler.used hack that I went with originally, and just leave them external.
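The two-stage behaviour agreed on here (libLTO removes only unreferenced local functions, and the linker's later liveness pass removes whatever is still unreferenced) can be illustrated with a toy model. Everything in it is a simplifying assumption: `Func`, `ltoDeadStrip`, and `linkerDeadStrip` are hypothetical names, reachability is reduced to a precomputed `isReferenced` flag, and the linker stage models a main executable only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical function record; "isReferenced" stands in for the result of
// a real reachability analysis.
struct Func {
    std::string name;
    bool isLocal;       // local (e.g. internal) linkage
    bool isReferenced;  // reachable from a root such as main()
};

// Stage 1, modelled on libLTO: only unreferenced functions with local
// linkage may be removed; external ones might be used from outside.
std::vector<Func> ltoDeadStrip(std::vector<Func> funcs) {
    funcs.erase(std::remove_if(funcs.begin(), funcs.end(),
                               [](const Func& f) {
                                   assert(!f.name.empty());
                                   return f.isLocal && !f.isReferenced;
                               }),
                funcs.end());
    return funcs;
}

// Stage 2, modelled on the linker's liveness pass for a main executable:
// linkage no longer matters, anything unreferenced is removed.
std::vector<Func> linkerDeadStrip(std::vector<Func> funcs) {
    funcs.erase(std::remove_if(funcs.begin(), funcs.end(),
                               [](const Func& f) { return !f.isReferenced; }),
                funcs.end());
    return funcs;
}
```

In this model, an unused user-supplied `memcpy` that is left external survives the LTO stage but is still removed by the linker stage, which is why leaving such functions external costs nothing in the final binary.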
Duncan P. N. Exon Smith
2014-Mar-10 17:43 UTC
[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO
On Mar 10, 2014, at 7:48 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> By LLVM point of view, yes. It can drop linkonce and local (private* +
> internal) globals.

[...]

> The system linker, yes. LLVM knows it is not seeing the full picture
> with regards to external ones.

I mistakenly assumed the LLVM perspective applied also to the system linker (!).

> Why not just fix the optimizations that are not handling
> llvm.compiler.used correctly?

That’s valuable work. However, for this use case, there doesn’t seem to be any benefit in relying on @llvm.compiler.used. If I’d properly understood how linkers work, I wouldn’t have complicated the flow in the first place (i.e., I think r194514 should have just blocked -internalize from giving these functions local linkage).

I’m also not sure what Module::getOrInsertFunction() *should* do when it finds a function with local linkage in @llvm.compiler.used. I think its current behaviour is correct most of the time (moving functions with local linkage seems correct in the usual case), and as you point out below, checking for membership in @llvm.compiler.used is not cheap.

> So, linkers have a better view of what is and is not used. They pass
> that information down to LLVM during link. The thing llvm has to be
> careful about are symbols it can introduce calls to (like memcpy). For
> those llvm.compiler.used should be fine.
>
> Using llvm.compiler.used and llvm.used is pretty annoying, and should
> probably be made into an easier to use attribute, but probably not
> folded into linkage. It is pretty orthogonal to other linkage
> properties. We can have a llvm.used that is weak_odr, external or
> internal for example.
>
> Cheers,
> Rafael