thr3ads.net - llvm dev - [LLVMdev] [lld] Undefined symbols postprocessing [Feb 2015]

Denis Protivensky

2015-Feb-18 09:38 UTC

[LLVMdev] [lld] Undefined symbols postprocessing

Hi everyone,

In lld, I need to conditionally add symbols (like GLOBAL_OFFSET_TABLE) during
static linking because they may be used by relocations (R_ARM_TLS_IE32) or
by other features such as STT_GNU_IFUNC symbols.
The problem is that symbols are currently added declaratively, by specifying
them in an ExecutableWriter::addDefaultAtoms() override.
At that stage, there's no way to determine whether additional symbols are
required.
But libraries providing optimizations like STT_GNU_IFUNC (glibc, for example)
expect the GOT symbol to be defined, so the linking process fails in
Resolver::resolve() if the symbol is not found.

I propose adding the ability to ignore undefined symbols during initial
resolution, and then to postprocess only those undefined symbols in a second
pass after the pass manager has run.

Technically, this shouldn't be a problem:
- there will be a new option in the linking context that signals that
postprocessing of undefined symbols should be performed.
- if the postprocessing option is set, newly added symbols will be collected
in the MergedFile returned by the Resolver, and then only those new symbols
will take part in a resolution process very similar to what
Resolver::resolve() does.
- existing implementations will not break and will keep working without using
the postprocessing feature.
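
A minimal sketch of the two-pass idea in the list above, using simplified stand-in types. The names Atom, SymbolTable, resolve and linkTwoPhase are illustrative assumptions and do not correspond to lld's actual classes; the "passes" are modelled simply as a list of atoms added after the first resolution.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Illustrative stand-in; not lld's real Atom class.
struct Atom {
  std::string name;
  bool defined;  // true: definition, false: undefined reference
};

// Illustrative symbol table tracking defined and still-undefined names.
struct SymbolTable {
  std::set<std::string> defined;
  std::set<std::string> undefined;

  void add(const Atom &atom) {
    if (atom.defined) {
      defined.insert(atom.name);
      undefined.erase(atom.name);
    } else if (defined.count(atom.name) == 0) {
      undefined.insert(atom.name);
    }
  }
};

// Resolve one batch of atoms. With allowUndefines == false, any name that is
// still undefined afterwards makes the resolution fail.
bool resolve(SymbolTable &table, const std::vector<Atom> &atoms,
             bool allowUndefines) {
  for (const Atom &atom : atoms)
    table.add(atom);
  return allowUndefines || table.undefined.empty();
}

// Two-phase link: tolerate undefines first, let the passes add atoms, then
// resolve only the newly added atoms and require a clean result.
bool linkTwoPhase(const std::vector<Atom> &inputs,
                  const std::vector<Atom> &addedByPasses) {
  SymbolTable table;
  bool ok = resolve(table, inputs, /*allowUndefines=*/true);
  assert(ok && "the first pass cannot fail when undefines are allowed");
  (void)ok;
  return resolve(table, addedByPasses, /*allowUndefines=*/false);
}
```

In this sketch the second call only feeds the new atoms into the existing table, which mirrors the point that only newly added symbols take part in the second resolution.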

So my proposal is to move from the declarative style towards a more flexible
imperative approach. Of course, there's a downside: the code loses some of its
regularity and becomes more volatile. But in the end, we have tests to cover
such things and ensure everything works as expected.

Any ideas?

- Denis Protivensky.

Joerg Sonnenberger

2015-Feb-18 13:45 UTC

On Wed, Feb 18, 2015 at 01:38:15AM -0800, Denis Protivensky wrote:
> The problem is that now symbols are added in a declarative way by
> specifying in ExecutableWriter::addDefaultAtoms() override.
> At that stage, there's no way to determine if additional symbols are
> required.
Correct, this is actually quite a bit more fundamental. If you check
various test cases, you will find symbol table pollution with unused
items like __tls_get_addr.
> I propose to add the ability to ignore undefined symbols during initial
> resolution, and then postprocess only those undefines for the second time
> after the pass manager execution.
Do you want to do that before or after dead code elimination?

Joerg

Shankar Easwaran

2015-Feb-18 16:31 UTC

On 2/18/2015 3:38 AM, Denis Protivensky wrote:
> Hi everyone,
>
> In lld, I need to conditionally add symbols (like GLOBAL_OFFSET_TABLE)
> during static linking because they may be used by relocations
> (R_ARM_TLS_IE32) or by some other stuff like STT_GNU_IFUNC symbols.
> The problem is that now symbols are added in a declarative way by
> specifying in ExecutableWriter::addDefaultAtoms() override.
> At that stage, there's no way to determine if additional symbols are
> required.
> But libraries providing optimizations like STT_GNU_IFUNC
> (glibc, for example) expect the GOT symbol to be defined, so the linking
> process fails in Resolver::resolve() if the symbol is not found.
>
> I propose to add the ability to ignore undefined symbols during initial
> resolution, and then postprocess only those undefines for the second time
> after the pass manager execution.

I came across this same problem and was planning on adding a
notifyUndefinedSymbol callback to the LinkingContext. If the linker wants to
add a defined symbol and coalesce it, that would then be possible.

Do you think this will work for your case too?
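
One possible shape for such a hook, sketched under assumptions: the thread does not define the interface of notifyUndefinedSymbol, so the signature below (name in, optional definition out), the DefinedAtom type and the resolveWithHook helper are all hypothetical and are not lld API.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Illustrative stand-in for a synthesized definition; not lld's DefinedAtom.
struct DefinedAtom {
  std::string name;
};

// Hypothetical hook: given an undefined name, fill `out` and return true if
// the linking context wants to provide a definition for it.
using UndefinedHook =
    std::function<bool(const std::string &name, DefinedAtom &out)>;

// Walk the references; for each undefined one, ask the hook. Definitions the
// hook supplies are coalesced into `defined`. Returns the names that remain
// undefined.
std::vector<std::string>
resolveWithHook(std::set<std::string> &defined,
                const std::vector<std::string> &references,
                const UndefinedHook &notifyUndefinedSymbol) {
  assert(notifyUndefinedSymbol && "a hook must be supplied");
  std::vector<std::string> stillUndefined;
  for (const std::string &name : references) {
    if (defined.count(name) != 0)
      continue;
    DefinedAtom atom;
    if (notifyUndefinedSymbol(name, atom) && atom.name == name)
      defined.insert(atom.name);  // coalesce the synthesized definition
    else
      stillUndefined.push_back(name);
  }
  return stillUndefined;
}
```

Note that in this form the hook has to answer immediately, while resolution is still in progress.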

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the
Linux Foundation

Denis Protivensky

2015-Feb-19 09:58 UTC

Joerg:
>> I propose to add the ability to ignore undefined symbols during initial
>> resolution, and then postprocess only those undefines for the second time
>> after the pass manager execution.
> Do you want to do that before or after dead code elimination?
I think dead code elimination should be performed after all possible object code
modifications done by lld. Therefore, it should be done after undefines'
postprocessing as well.

Shankar:
>> I propose to add the ability to ignore undefined symbols during initial
>> resolution, and then postprocess only those undefines for the second time
>> after the pass manager execution.
> I came across this same problem, and was planning on adding a
> notifyUndefinedSymbol to the LinkingContext, if the linker wants to add
> a defined symbol and coalesce it, it would be possible.
>
> Do you think this will work for your case too ?
With this option, I don't see:
- how to postpone the processing of, and reaction to, undefines. If the
callback is called from within Resolver::resolve(), you must react to it
immediately, because otherwise the code will still fail in Resolver::resolve().
- how to know within the callback body whether a symbol is needed. The need for
any symbol is determined in some other place, so I would have to keep some sort
of indication (boolean flags, whatever) of which symbols are really needed.
- the exact interface of the notifyUndefinedSymbol callback. If it receives the
`StringRef` name of the undefined symbol, what should the reaction be? Should it
return new symbols to add back to the caller as `const Atom*`?

Thanks,
  Denis.

Rui Ueyama

2015-Feb-19 20:46 UTC

On Wed, Feb 18, 2015 at 1:38 AM, Denis Protivensky <dprotivensky at accesssoftek.com> wrote:
> Hi everyone,
>
> In lld, I need to conditionally add symbols (like GLOBAL_OFFSET_TABLE)
> during static linking because they may be used by relocations
> (R_ARM_TLS_IE32) or by some other stuff like STT_GNU_IFUNC symbols.
> The problem is that now symbols are added in a declarative way by
> specifying in ExecutableWriter::addDefaultAtoms() override.
> At that stage, there's no way to determine if additional symbols are
> required.
> But libraries providing optimizations like STT_GNU_IFUNC
> (glibc, for example) expect the GOT symbol to be defined, so the linking
> process fails in Resolver::resolve() if the symbol is not found.
>
I don't know if this is directly applicable to your problem, but for
PE/COFF I needed to add symbols conditionally. If you have a function func
and there's a reference to __imp_func, the linker needs to create a data
atom containing the address of func as the content of __imp_func. It's rarely
used, so I wanted to create the __imp_ atom only when there's an unresolved
reference to that symbol.

What I did at the time was to define a (virtual) library file that
dynamically creates an atom. The virtual library file is added at the end of
the input file list, and if the core linker looks it up for a symbol
starting with __imp_, the library creates an object file containing the symbol
on the fly and returns it.

My experience is that it worked but might have been too tricky.
If this trick is directly applicable to your problem, you may want to do
that. If not, I'm probably okay with your suggestion (although I haven't
thought about it hard yet).
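
The virtual-library trick described above could look roughly like the following sketch. ObjectFile, VirtualImpLibrary and its find() method are illustrative assumptions; lld's real archive and file classes have different interfaces.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Illustrative stand-in for an object file holding one synthesized symbol.
struct ObjectFile {
  std::string symbol;  // the symbol this synthetic object defines
  std::string target;  // the function whose address it would contain
};

// Hypothetical archive-like file placed last in the input list. It creates
// members on demand instead of reading them from disk.
class VirtualImpLibrary {
public:
  explicit VirtualImpLibrary(std::set<std::string> functions)
      : functions_(std::move(functions)) {}

  // Called for a symbol nothing else defined. Returns a freshly created
  // object for "__imp_<func>" if <func> is known, otherwise nullptr.
  std::unique_ptr<ObjectFile> find(const std::string &name) const {
    static const std::string prefix = "__imp_";
    if (name.compare(0, prefix.size(), prefix) != 0)
      return nullptr;
    std::string target = name.substr(prefix.size());
    if (functions_.count(target) == 0)
      return nullptr;
    std::unique_ptr<ObjectFile> obj(new ObjectFile{name, target});
    assert(obj->symbol == name);
    return obj;
  }

private:
  std::set<std::string> functions_;
};
```

Because the library is only consulted for names that are still unresolved, the __imp_ atom is never created unless something references it.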

Thanks

Rui Ueyama

2015-Feb-20 19:20 UTC

On Wed, Feb 18, 2015 at 1:38 AM, Denis Protivensky <dprotivensky at accesssoftek.com> wrote:
> Hi everyone,
>
> In lld, I need to conditionally add symbols (like GLOBAL_OFFSET_TABLE)
> during static linking because they may be used by relocations
> (R_ARM_TLS_IE32) or by some other stuff like STT_GNU_IFUNC symbols.
> The problem is that now symbols are added in a declarative way by
> specifying in ExecutableWriter::addDefaultAtoms() override.
> At that stage, there's no way to determine if additional symbols are
> required.
> But libraries providing optimizations like STT_GNU_IFUNC
> (glibc, for example) expect the GOT symbol to be defined, so the linking
> process fails in Resolver::resolve() if the symbol is not found.
>
> I propose to add the ability to ignore undefined symbols during initial
> resolution, and then postprocess only those undefines for the second time
> after the pass manager execution.
>
> Technically, this shouldn't be a problem:
> - there will be a new option in the linking context that should signal
> that the postprocessing of undefined symbols should be performed.
> - if postprocessing option is set, newly added symbols will be collected
> in the MergedFile returned by the Resolver, and then only those new symbols
> will take part in the resolution process very similar to what
> Resolver::resolve() does.
> - available implementations will not break and keep working without use of
> postprocessing feature.
>
I'm fine with the basic idea of allowing undefined symbols in the first
resolver pass. A few questions about the implementation.

- How do you know which atom is newly added and which is not? Once an atom
is added to a MutableFile, there's no easy way to recognize that, I guess.

- Does the second resolver pass need to be run after all other passes? Why
don't you run the resolver once, then call some externally given
function (from the resolver) to get a list of atoms that need to be added
to the result, and then resolve again, all inside the resolver?
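
A rough sketch of that alternative, with everything kept inside one resolver routine. The AtomProvider callback and resolveAll are hypothetical names introduced for illustration; the loop repeats the "ask, then resolve again" step until the provider stops helping, which is an assumption beyond the single extra pass suggested above.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical externally supplied function: given the names that are still
// undefined, return the names it is able to define.
using AtomProvider = std::function<std::vector<std::string>(
    const std::set<std::string> &undefined)>;

// Resolve, ask the provider for extra definitions, and resolve again until
// nothing changes. Returns the names that remain undefined.
std::set<std::string> resolveAll(std::set<std::string> defined,
                                 const std::set<std::string> &referenced,
                                 const AtomProvider &provider) {
  assert(provider && "a provider must be supplied");
  for (;;) {
    std::set<std::string> undefined;
    for (const std::string &name : referenced)
      if (defined.count(name) == 0)
        undefined.insert(name);
    if (undefined.empty())
      return undefined;

    // Accept only definitions for names that are actually undefined, so the
    // undefined set shrinks on every productive iteration.
    bool progress = false;
    for (const std::string &name : provider(undefined))
      if (undefined.count(name) != 0 && defined.insert(name).second)
        progress = true;
    if (!progress)
      return undefined;
  }
}
```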

Denis Protivensky

2015-Feb-23 08:26 UTC

Rui, see inline.

On 02/20/2015 10:20 PM, Rui Ueyama wrote:
> I'm fine with the basic idea of allowing undefined symbols in the first
> resolver pass. A few questions about the implementation.
>
> - How do you know which atom is newly added and which is not? Once an atom is
> added to a MutableFile, there's no easy way to recognize that, I guess.
The Resolver returns a Resolver::MergedFile as the result of the call to
resolve(), and we can override its addAtom method to put newly added atoms into
a separate collection, which may then be examined for undefines.
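
As a sketch of that tracking idea: the classes below are simplified stand-ins, not lld's MutableFile or Resolver::MergedFile, and the finishInitialResolution() switch is an assumed way of marking where "newly added" begins.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in; not lld's real Atom class.
struct Atom {
  std::string name;
  bool defined;  // true: definition, false: undefined reference
};

// Illustrative base class standing in for a mutable output file.
class MutableFile {
public:
  virtual ~MutableFile() = default;
  virtual void addAtom(const Atom &atom) { atoms_.push_back(atom); }
  const std::vector<Atom> &atoms() const { return atoms_; }

private:
  std::vector<Atom> atoms_;
};

// Remembers every atom added after the initial resolution has finished.
class TrackingMergedFile : public MutableFile {
public:
  void finishInitialResolution() {
    assert(!tracking_ && "initial resolution finishes only once");
    tracking_ = true;
  }

  void addAtom(const Atom &atom) override {
    MutableFile::addAtom(atom);
    if (tracking_)
      newlyAdded_.push_back(atom);
  }

  // Undefined atoms among those added since the initial resolution.
  std::vector<Atom> newUndefines() const {
    std::vector<Atom> result;
    for (const Atom &atom : newlyAdded_)
      if (!atom.defined)
        result.push_back(atom);
    return result;
  }

private:
  bool tracking_ = false;
  std::vector<Atom> newlyAdded_;
};
```

Passes keep calling addAtom() as before; only the separate collection needs to be examined afterwards.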

> - Does the second resolver pass need to be run after all other passes? Why
> don't you run the resolver once, and then call some externally-given
> function (from the resolver) to get a list of atoms that needs to be added to
> the result, and then resolve again, all inside the resolver?
Since we can determine the newly added atoms after resolution, I don't see why
we should complicate the process with external functions and additional call
dependencies. It can all be done by adding a second resolve()-like function
call in Driver::link() after the PassManager run.
