Duncan P. N. Exon Smith
2014-Mar-10 16:49 UTC
[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO
On Mar 8, 2014, at 3:43 PM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:

> I believe it doesn't matter if the symbols in sections are internal or external---that only matters for symbol resolution.

Given this...

> I've read the original thread (the 3 emails), and I'm still not sure what the purpose of internalization is in the context of user-provided library functions.

...I’m not sure there is a point.

The general idea is: unless the linker has told us to preserve a symbol, internalize it, exposing it to other optimizations (like -globalopt). However, for library functions, this breaks down because later passes insert calls (e.g., -instcombine converts printf => puts, and -codegenprepare converts llvm.memcpy => memcpy). So, add them to @llvm.compiler.used to protect them temporarily.

If

- the linker (e.g., /bin/ld) will delete unreferenced symbols (through -dead_strip, etc.) only if they have local linkage, or
- LTO has a pass that will delete unreferenced symbols with local linkage *after* @llvm.compiler.used gets dropped (maybe we can add this),

then there’s a point.

> If the output of LTO is one giant object file, it could make some sense (since the assembler could potentially do the "symbol resolution").

The output of LTO *is* one giant object file, but the linker (e.g., /bin/ld) may be linking it with other object files.

> Otherwise the problem is in telling the "ld" which definition of "printf" it needs to pick up,

In the LTO API, the linker should call lto_codegen_add_must_preserve_symbol() on symbols it expects to come out the other side. Basically, the user of LTO decides which version of printf to pick up. If there are any calls to printf from outside the bitcode and the linker is using the one in the bitcode, then the one in the bitcode won’t be internalized.

> or asking the user not to link the program with libc (a bit of a questionable request).

A common case for user-supplied library functions is that users cannot link against libc, so they supply their own. This shouldn’t be the only supported case, though.

> How are optimizations "incorrectly modifying" (user-provided) library functions?

The current problem is that -instcombine will rename the function through Module::getOrInsertFunction(). getOrInsertFunction() chooses this path because the function has local linkage. However, the function is a member of @llvm.compiler.used, so it shouldn’t really be modified. I think in the normal case (non-LTO, where -internalize hasn’t run), Module::getOrInsertFunction() *should* take this path with functions that have local linkage. And it’s not trivial to check for membership in @llvm.compiler.used.
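To make the failure mode concrete, here is a toy Python model of the sequence described in this message. Everything in it (ToyFunction, ToyModule, internalize, get_or_insert_function, and the numeric rename suffix) is hypothetical and only mimics the behavior as described in this thread; it is not LLVM's API. User-supplied printf and puts are internalized and marked compiler-used, then a simulated -instcombine asks the module for puts.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class ToyFunction:
    """Hypothetical stand-in for an IR function; not LLVM's API."""
    name: str
    internal: bool = False        # True once given local linkage
    is_declaration: bool = False  # True if there is no body in this module
    compiler_used: bool = False   # stands in for membership in @llvm.compiler.used


class ToyModule:
    def __init__(self, functions):
        self.functions = list(functions)
        self._suffix = count()

    def internalize(self, must_preserve, library_functions):
        """Give local linkage to every definition the linker did not ask to
        preserve; mark library functions compiler-used so they are not deleted
        before later passes get a chance to insert calls to them."""
        for f in self.functions:
            if f.is_declaration or f.name in must_preserve:
                continue
            f.internal = True
            if f.name in library_functions:
                f.compiler_used = True

    def get_or_insert_function(self, name):
        """Toy version of the lookup path described above: an existing function
        with local linkage is renamed out of the way and a new external
        declaration takes the name. compiler_used is never consulted."""
        for f in self.functions:
            if f.name == name:
                if not f.internal:
                    return f
                f.name = f"{name}.{next(self._suffix)}"
                break
        decl = ToyFunction(name, is_declaration=True)
        self.functions.append(decl)
        return decl


# User-supplied printf and puts; the linker only asked to preserve main.
m = ToyModule([ToyFunction("main"), ToyFunction("printf"), ToyFunction("puts")])
m.internalize(must_preserve={"main"}, library_functions={"printf", "puts"})

# Simulate -instcombine rewriting a printf call into a puts call.
callee = m.get_or_insert_function("puts")
print(callee.name, callee.is_declaration)  # puts True
print([f.name for f in m.functions])       # ['main', 'printf', 'puts.0', 'puts']
```

In this model the renamed user function (puts.0) is still compiler-used, so it survives, but the new call targets a fresh external declaration of puts, which the final link would presumably resolve to some other definition or leave undefined. Skipping the rename when compiler_used is set would avoid this in the toy, but, as the message says, that membership check is not trivial in the real IR.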
Krzysztof Parzyszek
2014-Mar-10 19:30 UTC
[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO
On 3/10/2014 11:49 AM, Duncan P. N. Exon Smith wrote:
>
> The general idea is: unless the linker has told us to preserve a symbol,
> internalize it, exposing it to other optimizations (like -globalopt).
> However, for library functions, this breaks down because later passes
> insert calls (e.g., -instcombine converts printf => puts, and
> -codegenprepare converts llvm.memcpy => memcpy). So, add them to
> @llvm.compiler.used to protect them temporarily.

I see. Thanks for the explanation.

How about this: resolve symbols during LTO and detect which ones are used in the program, and which are not? It would probably require a lot more work in the LTO framework, but it has the benefit that we no longer need any "preserve" list merely to allow it to link, or "internalization". The only exception would be export lists for shared objects, but it would be a lot easier for the user to provide that than to list all the functions referenced from other objects/libraries.

To help with user-provided library functions we could develop a way for the user to specify "resolution preference", i.e. if "printf" is not explicitly defined, pick it up from /usr/lib/libc.a, otherwise ignore the definition from libc.a.
This would only work for linking non-shared objects though. If the user wants to get the rest of the functions from libc, the replaced ones would need to be internally renamed to avoid conflicts with those in libc (the system linker could otherwise complain).

Has anything like this been considered?

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
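The "resolution preference" idea amounts to a small lookup rule. The following is a hypothetical Python sketch, not an existing linker or LTO feature: resolve(), its arguments, and the symbol sets are invented for illustration, and the rule can only act on the references known at the moment it runs.

```python
def resolve(references, user_defined, archive_defined, archive="/usr/lib/libc.a"):
    """Hypothetical 'resolution preference': a symbol defined by the user wins;
    otherwise it is taken from the archive; anything else stays undefined."""
    resolution, undefined = {}, []
    for sym in sorted(references):
        if sym in user_defined:
            resolution[sym] = "user"
        elif sym in archive_defined:
            resolution[sym] = archive
        else:
            undefined.append(sym)
    return resolution, undefined


# Hypothetical inputs: the user supplies printf but wants the rest from libc.
refs = {"printf", "malloc", "memcpy"}
user = {"main", "printf"}
libc = {"printf", "puts", "malloc", "memcpy"}
resolution, undefined = resolve(refs, user, libc)
print(resolution)  # {'malloc': '/usr/lib/libc.a', 'memcpy': '/usr/lib/libc.a', 'printf': 'user'}
print(undefined)   # []
```

A real static link pulls in whole archive members rather than individual symbols, so if a libc.a member that also defines printf is pulled in for some other symbol, the two printf definitions may clash; that is the conflict the internal renaming mentioned in the message is meant to avoid.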
Rafael Espíndola
2014-Mar-10 20:01 UTC
[LLVMdev] [RFC] Linkage of user-supplied library functions in LTO
> I see. Thanks for the explanation.
>
> How about this: resolve symbols during LTO and detect which ones are used in
> the program, and which are not? It would probably require a lot more work
> in the LTO framework, but it has the benefit that we no longer need any
> "preserve" list merely to allow it to link, or "internalization". The only
> exception would be export lists for shared objects, but it would be a lot
> easier for the user to provide that, than to list all the functions
> referenced from other objects/libraries.
>
> To help with user-provided library functions we could develop a way for the
> user to specify "resolution preference", i.e. if "printf" is not explicitly
> defined, pick it up from /usr/lib/libc.a, otherwise ignore the definition
> from libc.a.
> This would only work for linking non-shared objects though. If the user
> wants to get the rest of the functions from libc, the replaced ones would
> need to be internally renamed to avoid conflicts with those in libc (the
> system linker could otherwise complain).

At LTO time we don't know exactly which functions will be used. Whether memcpy is used or not depends on which backend we are using, for example. The only reliable way would be to iterate codegen, making note at each step of which new undefined references show up, which is overkill.

The current strategy of just knowing which symbols LLVM *might* use seems appropriate; we just need to fix LLVM to always respect it.

Cheers,
Rafael
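The iterate-until-stable approach can be illustrated with a toy fixed-point loop. This is a hypothetical Python sketch: toy_codegen(), the two "backends", and the call graph are invented, and real code generation is far more expensive than a set union, which is why repeating it is overkill. Each round "compiles" the kept functions, collects the references that come out, and keeps any user-defined function that is newly referenced.

```python
# Toy model only: symbols, call graph, and "backends" are hypothetical.

# IR-level calls made by each user-supplied function in the bitcode.
IR_CALLS = {
    "main": ["printf", "llvm.memcpy"],
    "puts": ["write"],
}
USER_DEFINED = {"main", "printf", "puts", "memcpy", "write"}

# What each toy backend turns an IR call into (missing key = call emitted as-is).
BACKEND_A = {"printf": {"puts"}, "llvm.memcpy": {"memcpy"}}
BACKEND_B = {"printf": {"puts"}, "llvm.memcpy": set()}  # copies expanded inline


def toy_codegen(kept, lowerings):
    """Return every symbol referenced by the object code emitted for `kept`."""
    refs = set()
    for fn in kept:
        for callee in IR_CALLS.get(fn, ()):
            refs |= lowerings.get(callee, {callee})
    return refs


def iterate_codegen(roots, lowerings):
    """Re-run toy codegen until no new user-defined symbol is referenced."""
    kept = set(roots)
    rounds = 0
    while True:
        rounds += 1
        newly_needed = (toy_codegen(kept, lowerings) & USER_DEFINED) - kept
        if not newly_needed:
            return kept, rounds
        kept |= newly_needed


for name, backend in [("A", BACKEND_A), ("B", BACKEND_B)]:
    kept, rounds = iterate_codegen({"main"}, backend)
    print(name, sorted(kept), rounds)
# A ['main', 'memcpy', 'puts', 'write'] 3
# B ['main', 'puts', 'write'] 3
```

With BACKEND_A, memcpy becomes needed; with BACKEND_B, which in this toy expands llvm.memcpy inline, it never does. printf drops out in both because every call to it was rewritten to puts. The strategy favored in the reply instead keeps a fixed, conservative list of symbols LLVM might introduce, which needs no iteration at the cost of over-approximating.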