Hi everyone,

In lld, I need to conditionally add symbols (like GLOBAL_OFFSET_TABLE) during static linking because they may be used by relocations (R_ARM_TLS_IE32) or by other things such as STT_GNU_IFUNC symbols.

The problem is that symbols are currently added in a declarative way, by specifying them in an ExecutableWriter::addDefaultAtoms() override. At that stage, there's no way to determine whether additional symbols are required. But libraries providing optimizations like STT_GNU_IFUNC (glibc, for example) expect the GOT symbol to be defined, so the linking process fails in Resolver::resolve() if the symbol is not found.

I propose adding the ability to ignore undefined symbols during the initial resolution, and then to postprocess only those undefines a second time after the pass manager has run.

Technically, this shouldn't be a problem:
- There will be a new option in the linking context to signal that postprocessing of undefined symbols should be performed.
- If the postprocessing option is set, newly added symbols will be collected in the MergedFile returned by the Resolver, and then only those new symbols will take part in a resolution process very similar to what Resolver::resolve() does.
- Existing implementations will not break and will keep working without using the postprocessing feature.

So my proposal is to move from the declarative style towards a more flexible, imperative approach. Of course, there's a downside: the code loses some of its regularity and becomes more volatile. But in the end, we have tests to cover such things and ensure everything works as expected.

Any ideas?

- Denis Protivensky.
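The flow proposed above can be sketched as a small toy model in C++. Everything here is an assumption made for illustration: SymbolTable, resolveInitial, runPasses and resolveDeferred are hypothetical names and are not lld's actual classes or functions.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model only: hypothetical types, not lld's real classes.
struct SymbolTable {
  std::set<std::string> defined;
  std::set<std::string> undefined;

  // A definition satisfies any pending undefined reference of the same name.
  void addDefined(const std::string &name) {
    assert(!name.empty());
    defined.insert(name);
    undefined.erase(name);
  }

  // A reference stays undefined only if nothing defines it yet.
  void addUndefined(const std::string &name) {
    if (defined.count(name) == 0)
      undefined.insert(name);
  }
};

// Initial resolution. With deferUndefines set (the proposed context option),
// leftover undefines are tolerated instead of failing the link.
bool resolveInitial(const SymbolTable &table, bool deferUndefines) {
  return deferUndefines || table.undefined.empty();
}

// Stand-in for the pass manager run: a target pass that discovers the GOT is
// needed adds the symbol at this point.
void runPasses(SymbolTable &table, bool gotNeeded) {
  if (gotNeeded)
    table.addDefined("GLOBAL_OFFSET_TABLE");
}

// Postprocessing after the passes: anything still undefined is now an error.
bool resolveDeferred(const SymbolTable &table) {
  return table.undefined.empty();
}
```

With the option off, a pending reference to GLOBAL_OFFSET_TABLE fails resolveInitial right away. With it on, the link proceeds, runPasses gets a chance to define the symbol, and resolveDeferred reports only what is genuinely missing.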
On Wed, Feb 18, 2015 at 01:38:15AM -0800, Denis Protivensky wrote:
> The problem is that now symbols are added in a declarative way by
> specifying in ExecutableWriter::addDefaultAtoms() override.
> At that stage, there's no way to determine if additional symbols are
> required.

Correct, and this is actually quite a bit more fundamental. If you check various test cases, you will find symbol table pollution from unused items like __tls_get_addr.

> I propose to add the ability to ignore undefined symbols during initial
> resolution, and then postprocess only those undefines for the second time
> after the pass manager execution.

Do you want to do that before or after dead code elimination?

Joerg
On 2/18/2015 3:38 AM, Denis Protivensky wrote:
> Hi everyone,
>
> In lld, I need to conditionally add symbols (like GLOBAL_OFFSET_TABLE) during
> static linking because they may be used by relocations (R_ARM_TLS_IE32) or
> by some other stuff like STT_GNU_IFUNC symbols.
> The problem is that now symbols are added in a declarative way by
> specifying in ExecutableWriter::addDefaultAtoms() override.
> At that stage, there's no way to determine if additional symbols are
> required.
> But libraries providing optimizations like STT_GNU_IFUNC
> (glibc, for example) expect the GOT symbol to be defined, so the linking process
> fails in Resolver::resolve() if the symbol is not found.
>
> I propose to add the ability to ignore undefined symbols during initial
> resolution, and then postprocess only those undefines for the second time
> after the pass manager execution.

I came across this same problem and was planning to add a notifyUndefinedSymbol hook to the LinkingContext. If the linker wants to add a defined symbol and coalesce it, that would then be possible. Do you think this would work for your case too?
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation
Joerg:

>> I propose to add the ability to ignore undefined symbols during initial
>> resolution, and then postprocess only those undefines for the second time
>> after the pass manager execution.
>
> Do you want to do that before or after dead code elimination?

I think dead code elimination should be performed after all object code modifications that lld may make. Therefore, it should also happen after the postprocessing of undefines.

Shankar:

>> I propose to add the ability to ignore undefined symbols during initial
>> resolution, and then postprocess only those undefines for the second time
>> after the pass manager execution.
>
> I came across this same problem, and was planning on adding a
> notifyUndefinedSymbol to the LinkingContext, if the linker wants to add a
> defined symbol and coalesce it, it would be possible.
> Do you think this will work for your case too ?

With this option, I don't see:
- How to postpone the processing of, and the reaction to, undefines. If the callback is called from within Resolver::resolve(), you have to react to it immediately, because otherwise the code will still fail in Resolver::resolve().
- How to know within the callback body whether a symbol is needed. The need for any symbol is determined in some other place, so I would have to keep some sort of indication (boolean flags, whatever) to know which symbols are really needed.
- The exact interface of the notifyUndefinedSymbol callback. If it receives the `StringRef` name of the undefined symbol, what should the reaction be? Should it return new symbols to the caller as `const Atom*`?

Thanks,
Denis.
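To make the interface question concrete, here is one possible shape for such a callback, written as a toy C++ sketch. It is only an assumption: the thread never settles on a signature, and Atom, UndefinedHandler, resolveUndefines and GotHandler are hypothetical names, not lld's API (std::string stands in for StringRef).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an atom; not lld's Atom class.
struct Atom {
  std::string name;
};

// One possible callback shape (an assumption): given the undefined name,
// append any atoms that define it to `out` and return true if it did so.
using UndefinedHandler =
    std::function<bool(const std::string &name, std::vector<Atom> &out)>;

// Asks the handler about each undefined name during resolution and returns
// the names that are still unresolved afterwards.
std::vector<std::string>
resolveUndefines(const std::vector<std::string> &undefines,
                 const UndefinedHandler &handler, std::vector<Atom> &added) {
  assert(handler && "a handler must be supplied");
  std::vector<std::string> unresolved;
  for (const std::string &name : undefines) {
    if (!handler(name, added))
      unresolved.push_back(name);
  }
  return unresolved;
}

// Example handler. It has to answer immediately, so it depends on a flag
// that some other part of the linker must have set beforehand.
struct GotHandler {
  bool gotNeeded = false;

  bool operator()(const std::string &name, std::vector<Atom> &out) const {
    if (!gotNeeded || name != "GLOBAL_OFFSET_TABLE")
      return false;
    out.push_back(Atom{name});
    return true;
  }
};
```

The gotNeeded flag is exactly the kind of side state the second bullet worries about: the callback itself cannot decide whether the symbol is needed, so that decision has to be made and stored before resolution reaches the undefined name.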
On Wed, Feb 18, 2015 at 1:38 AM, Denis Protivensky <dprotivensky at accesssoftek.com> wrote:
> Hi everyone,
>
> In lld, I need to conditionally add symbols (like GLOBAL_OFFSET_TABLE) during
> static linking because they may be used by relocations (R_ARM_TLS_IE32) or
> by some other stuff like STT_GNU_IFUNC symbols.
> The problem is that now symbols are added in a declarative way by
> specifying in ExecutableWriter::addDefaultAtoms() override.
> At that stage, there's no way to determine if additional symbols are
> required.
> But libraries providing optimizations like STT_GNU_IFUNC
> (glibc, for example) expect the GOT symbol to be defined, so the linking process
> fails in Resolver::resolve() if the symbol is not found.

I don't know if this is directly applicable to your problem, but for PE/COFF I needed to add symbols conditionally. If you have a function func and there's a reference to __imp_func, the linker needs to create a piece of data named __imp_func whose content is the address of func. This is rarely used, so I wanted to create the __imp_ atom only when there's an unresolved reference to that symbol.

What I did at the time was to define a (virtual) library file that creates atoms dynamically. The virtual library file is added at the end of the input file list, and if the core linker looks it up for a symbol starting with __imp_, the library creates an object file containing the symbol on the fly and returns it.

My experience is that this worked but may have been too tricky. If this trick is directly applicable to your problem, you may want to do that. If not, I'm perhaps okay with your suggestion (although I haven't thought hard about it yet).

Thanks
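The virtual library trick described above can be sketched roughly as follows. This is a toy C++ model under stated assumptions: ImpAtom and VirtualImpLibrary are hypothetical names, the lookup returns a single atom rather than a whole object file, and none of this is the actual PE/COFF code in lld.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical atom type for this sketch; not lld's DefinedAtom.
struct ImpAtom {
  std::string name;   // e.g. "__imp_func"
  std::string target; // the function whose address the atom would hold
};

// Toy "virtual library": it owns no real members and synthesizes an atom
// only when asked for a matching symbol.
class VirtualImpLibrary {
public:
  explicit VirtualImpLibrary(std::set<std::string> definedFunctions)
      : functions_(std::move(definedFunctions)) {}

  // Returns a freshly created atom for "__imp_<func>" if <func> is known,
  // or nullptr so the lookup falls through as a normal undefined symbol.
  std::unique_ptr<ImpAtom> find(const std::string &name) const {
    static const std::string prefix = "__imp_";
    if (name.compare(0, prefix.size(), prefix) != 0)
      return nullptr;
    // compare() matched, so the whole prefix is present in name.
    assert(name.size() >= prefix.size());
    std::string target = name.substr(prefix.size());
    if (functions_.count(target) == 0)
      return nullptr;
    return std::unique_ptr<ImpAtom>(new ImpAtom{name, target});
  }

private:
  std::set<std::string> functions_;
};
```

Because the library sits at the end of the input list, it is only consulted for names that no real input defined, so the synthesized atom appears only when something actually references it.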
On Wed, Feb 18, 2015 at 1:38 AM, Denis Protivensky <dprotivensky at accesssoftek.com> wrote:
> Hi everyone,
>
> In lld, I need to conditionally add symbols (like GLOBAL_OFFSET_TABLE) during
> static linking because they may be used by relocations (R_ARM_TLS_IE32) or
> by some other stuff like STT_GNU_IFUNC symbols.
> The problem is that now symbols are added in a declarative way by
> specifying in ExecutableWriter::addDefaultAtoms() override.
> At that stage, there's no way to determine if additional symbols are
> required.
> But libraries providing optimizations like STT_GNU_IFUNC
> (glibc, for example) expect the GOT symbol to be defined, so the linking process
> fails in Resolver::resolve() if the symbol is not found.
>
> I propose to add the ability to ignore undefined symbols during initial
> resolution, and then postprocess only those undefines for the second time
> after the pass manager execution.
>
> Technically, this shouldn't be a problem:
> - there will be a new option in the linking context that should signal
> that the postprocessing of undefined symbols should be performed.
> - if postprocessing option is set, newly added symbols will be collected
> in the MergedFile returned by the Resolver, and then only those new symbols
> will take part in the resolution process very similar to what
> Resolver::resolve() does.
> - available implementations will not break and keep working without use of
> postprocessing feature.

I'm fine with the basic idea of allowing undefined symbols in the first resolver pass. A few questions about the implementation:

- How do you know which atoms are newly added and which are not? Once an atom is added to a MutableFile, there's no easy way to tell, I guess.
- Does the second resolver pass need to run after all the other passes? Why don't you run the resolver once, then call some externally given function (from the resolver) to get a list of atoms that need to be added to the result, and then resolve again, all inside the resolver?
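The alternative raised in the second question, keeping both rounds inside the resolver and asking an externally supplied function for extra atoms, could look something like this toy C++ sketch. ResolverState, ExtraAtomProvider and finishResolve are hypothetical names chosen for illustration and do not correspond to lld's Resolver.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Toy resolver state; hypothetical, not lld's Resolver.
struct ResolverState {
  std::set<std::string> defined;
  std::set<std::string> undefined;
};

// Externally supplied hook (an assumption): given the current undefines,
// return the names of atoms the target wants to add as definitions.
using ExtraAtomProvider =
    std::function<std::vector<std::string>(const std::set<std::string> &)>;

// Runs after the first round has filled in `state`. Asks the provider once
// for extra definitions, merges them, and reports whether the link is clean.
bool finishResolve(ResolverState &state, const ExtraAtomProvider &provider) {
  if (!state.undefined.empty() && provider) {
    // The provider returns by value, so mutating state below is safe.
    const std::vector<std::string> extra = provider(state.undefined);
    for (const std::string &name : extra) {
      assert(!name.empty());
      state.defined.insert(name);
      state.undefined.erase(name);
    }
  }
  return state.undefined.empty();
}
```

In this shape the provider is consulted before any pass has run, so it only helps if the target can decide from the set of undefined names alone which symbols to supply.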
Rui, see inline.

On 02/20/2015 10:20 PM, Rui Ueyama wrote:
> On Wed, Feb 18, 2015 at 1:38 AM, Denis Protivensky <dprotivensky at accesssoftek.com> wrote:
>> Hi everyone,
>>
>> In lld, I need to conditionally add symbols (like GLOBAL_OFFSET_TABLE) during static linking because they may be used by relocations (R_ARM_TLS_IE32) or by some other stuff like STT_GNU_IFUNC symbols.
>> The problem is that now symbols are added in a declarative way by specifying in ExecutableWriter::addDefaultAtoms() override. At that stage, there's no way to determine if additional symbols are required.
>> But libraries providing optimizations like STT_GNU_IFUNC (glibc, for example) expect the GOT symbol to be defined, so the linking process fails in Resolver::resolve() if the symbol is not found.
>>
>> I propose to add the ability to ignore undefined symbols during initial resolution, and then postprocess only those undefines for the second time after the pass manager execution.
>>
>> Technically, this shouldn't be a problem:
>> - there will be a new option in the linking context that should signal that the postprocessing of undefined symbols should be performed.
>> - if postprocessing option is set, newly added symbols will be collected in the MergedFile returned by the Resolver, and then only those new symbols will take part in the resolution process very similar to what Resolver::resolve() does.
>> - available implementations will not break and keep working without use of postprocessing feature.
>
> I'm fine with the basic idea of allowing undefined symbols in the first resolver pass. A few questions about the implementation.
>
> - How do you know which atom is newly added and which is not? Once an atom is added to a MutableFile, there's no easy way to recognize that, I guess.

The Resolver returns a Resolver::MergedFile as the result of the call to resolve(), and we can override its addAtom method to put newly added atoms into a separate collection, which can then be examined for undefines.

> - Does the second resolver pass need to be run after all other passes? Why don't you run the resolver once, and then call some externally-given function (from the resolver) to get a list of atoms that needs to be added to the result, and then resolve again, all inside the resolver?

Since we can determine the newly added atoms after resolution, I don't see why we should complicate the process with external functions and additional call dependencies. It can all be done by adding a second resolve()-like function call in Driver::link() after the PassManager run.
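The addAtom override described above might be sketched like this. It is a toy C++ model only: Atom, MergedFile and TrackingMergedFile here are simplified, hypothetical stand-ins, and the real Resolver::MergedFile interface in lld may differ.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical atom and file types for this sketch; not lld's classes.
struct Atom {
  std::string name;
  bool isDefined;
};

class MergedFile {
public:
  virtual ~MergedFile() = default;
  virtual void addAtom(const Atom &atom) {
    assert(!atom.name.empty());
    atoms_.push_back(atom);
  }
  const std::vector<Atom> &atoms() const { return atoms_; }

private:
  std::vector<Atom> atoms_;
};

// Records every atom added after tracking starts, e.g. atoms added by passes.
class TrackingMergedFile : public MergedFile {
public:
  void startTracking() { tracking_ = true; }

  void addAtom(const Atom &atom) override {
    MergedFile::addAtom(atom);
    if (tracking_)
      added_.push_back(atom);
  }

  // Names of newly added undefined atoms that nothing in the file defines.
  std::vector<std::string> newUndefines() const {
    std::set<std::string> defined;
    for (const Atom &atom : atoms())
      if (atom.isDefined)
        defined.insert(atom.name);
    std::vector<std::string> result;
    for (const Atom &atom : added_)
      if (!atom.isDefined && defined.count(atom.name) == 0)
        result.push_back(atom.name);
    return result;
  }

private:
  bool tracking_ = false;
  std::vector<Atom> added_;
};
```

A driver could call startTracking() once the initial resolution is done, run the passes, and then hand only the names from newUndefines() to the second, resolve()-like step.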