On Wed, 4 Dec 2019 at 07:05, Fāng-ruì Sòng <maskray at google.com> wrote:
>
> On Tue, Dec 3, 2019 at 7:02 PM Shoaib Meenai via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > LLD treats any symbol referenced from a linker script as a GC root,
> > which makes sense. Unfortunately, it also processes --defsym as a
> > linker script fragment internally, so all target symbols of a
> > --defsym also get treated as GC roots (i.e., if you have something
> > like --defsym SRC=TGT, TGT will become a GC root). I believe this to
> > be unnecessary for defsym specifically, since you're just aliasing a
> > symbol, and if the original or aliased symbols are referenced from
> > anywhere, the symbol's section will get preserved anyway. (There's
> > also cases where the defsym target can be an expression instead of
> > just a symbol name, which I admittedly haven't thought about too
> > hard, but I believe the same logic should hold in terms of any
> > needed sections getting preserved regardless.) I want to change
> > defsym targets specifically to not be considered as GC roots, so
> > that they can be dead code eliminated. Does anyone foresee any
> > issues with this?
>
> % cat a.s
> .globl _start, foo, bar
> .text; _start: movabs $d, %rax
> .section .text_foo,"ax"; foo: ret
> .section .text_bar,"ax"; bar: nop
> % as a.s -o a.o
>
> % ld.bfd a.o --defsym d=foo --gc-sections -o a => .text_foo is retained
> % ld.bfd a.o --defsym d=bar --gc-sections -o a => .text_bar is retained
> % ld.bfd a.o --defsym d=1 --gc-sections -o a => Neither .text_foo nor .text_bar is retained
> % ld.bfd a.o --defsym c=foo --defsym d=1 --gc-sections -o a => Neither .text_foo nor .text_bar is retained; lld will retain .text_foo.
>
> For --defsym from=an_expression_with_to, GNU ld appears to add a
> reference from 'from' to 'to'. lld's behavior
> (https://reviews.llvm.org/D34195) is more conservative.
>
> If we stop treating script->referencedSymbols as GC roots,
> instructions like `movabs $d, %rax` will no longer be able to access
> the intended section. We can tweak our behavior to be like GNU ld, but
> the additional complexity may not be worthwhile.

I think it would be a step too far for defsym symbol=expression to
have no effect on GC. I'd expect that something like defsym foo=bar is
used because some live code refers to foo, but does not refer to bar,
so ideally we'd like defsym foo=bar to keep bar live. I've seen this
idiom used in embedded systems in the presence of binary only
libraries. It is true that the programmer can always go the extra mile
to force bar to be marked live, however I think the expectation would
be defsym foo=bar would do it.

I think the GNU ld behaviour is reasonable. If nothing refers to
either foo or bar then there is no reason to mark them live. On the
implementation cost-benefit trade off I guess we won't know until
there is a prototype, and some idea of what implementing it will save
on a real example.

Peter
Shoaib Meenai via llvm-dev
2019-Dec-04 16:51 UTC
[llvm-dev] GC for defsym'd symbols in LLD
I completely agree that --defsym foo=bar should keep bar (or more precisely the
section containing bar) alive if foo is referenced.

My mental model of how --defsym foo=bar behaves is that (assuming bar is a
defined symbol) we create a symbol foo that points to the same location as bar
(as in it has the same section + address within that section). Any reference to
foo should therefore prevent that section from getting garbage collected. bar
doesn't need to enter the picture directly (and we don't need to store
any sort of explicit link between foo and bar); its section getting preserved
just naturally falls out of foo getting preserved.

For example, in Fāng-ruì's movabs example, the symbol _start (which is the
entry point and therefore a GC root) will have a relocation against d, so d will
be kept alive too. With --defsym d=foo, the symbol d should point to the same
section as foo, so that section will be preserved; it doesn't matter if the
symbol foo itself is preserved (unless there are other non-dead references to
it, of course, but then those references should cause foo to be marked alive as
well).

I haven't actually studied how LLD models a defsym though, so my mental
model might be way off. I apologize for not having done so before replying, but
it'll be at least a few days before I have the chance to get to that. If my
mental model is accurate, preserving the needed section for defsym should just
fall out naturally from it (without needing to give the target of a defsym any
special treatment), but if not, the whole thing might be much more complicated
and not worth it.
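
The mental model above can be sketched as a toy mark phase. This is only an illustration in Python, not lld code: the `Symbol` class, the `live_sections` helper, and the relocation table are hypothetical names chosen for the sketch, and the symbol layout is taken from the a.s example quoted in this thread. It assumes that --defsym d=foo simply gives d the same section and offset as foo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    section: str  # name of the section that holds the symbol
    offset: int   # offset of the symbol within that section


def live_sections(symbols, relocs, entry):
    """Return the sections reachable from the entry symbol.

    symbols: symbol name -> Symbol
    relocs:  section name -> names of the symbols that section refers to
    """
    live = set()
    worklist = [symbols[entry].section]
    while worklist:
        sec = worklist.pop()
        if sec in live:
            continue
        live.add(sec)
        # Follow every relocation out of this section.
        for name in relocs.get(sec, ()):
            worklist.append(symbols[name].section)
    return live


# Layout of the a.s example (hypothetical encoding for this sketch).
symbols = {
    "_start": Symbol(".text", 0),
    "foo": Symbol(".text_foo", 0),
    "bar": Symbol(".text_bar", 0),
}
relocs = {".text": ["d"]}  # movabs $d, %rax

# --defsym d=foo under this model: d shares foo's section and offset.
symbols["d"] = symbols["foo"]

retained = live_sections(symbols, relocs, "_start")
```

In this sketch foo is never a root and no link between d and foo is stored; .text_foo survives only because d, which _start refers to, lives in it, while .text_bar is collected.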
On 12/4/19, 1:35 AM, "Peter Smith" <peter.smith at linaro.org> wrote:
>
> On Wed, 4 Dec 2019 at 07:05, Fāng-ruì Sòng <maskray at google.com> wrote:
> >
> > On Tue, Dec 3, 2019 at 7:02 PM Shoaib Meenai via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> > >
> > > LLD treats any symbol referenced from a linker script as a GC
> > > root, which makes sense. Unfortunately, it also processes --defsym
> > > as a linker script fragment internally, so all target symbols of a
> > > --defsym also get treated as GC roots (i.e., if you have something
> > > like --defsym SRC=TGT, TGT will become a GC root). I believe this
> > > to be unnecessary for defsym specifically, since you're just
> > > aliasing a symbol, and if the original or aliased symbols are
> > > referenced from anywhere, the symbol's section will get preserved
> > > anyway. (There's also cases where the defsym target can be an
> > > expression instead of just a symbol name, which I admittedly
> > > haven't thought about too hard, but I believe the same logic
> > > should hold in terms of any needed sections getting preserved
> > > regardless.) I want to change defsym targets specifically to not
> > > be considered as GC roots, so that they can be dead code
> > > eliminated. Does anyone foresee any issues with this?
> >
> > % cat a.s
> > .globl _start, foo, bar
> > .text; _start: movabs $d, %rax
> > .section .text_foo,"ax"; foo: ret
> > .section .text_bar,"ax"; bar: nop
> > % as a.s -o a.o
> >
> > % ld.bfd a.o --defsym d=foo --gc-sections -o a => .text_foo is retained
> > % ld.bfd a.o --defsym d=bar --gc-sections -o a => .text_bar is retained
> > % ld.bfd a.o --defsym d=1 --gc-sections -o a => Neither .text_foo nor .text_bar is retained
> > % ld.bfd a.o --defsym c=foo --defsym d=1 --gc-sections -o a => Neither .text_foo nor .text_bar is retained; lld will retain .text_foo.
> >
> > For --defsym from=an_expression_with_to, GNU ld appears to add a
> > reference from 'from' to 'to'. lld's behavior
> > (https://reviews.llvm.org/D34195) is more conservative.
> >
> > If we stop treating script->referencedSymbols as GC roots,
> > instructions like `movabs $d, %rax` will no longer be able to access
> > the intended section. We can tweak our behavior to be like GNU ld, but
> > the additional complexity may not be worthwhile.
>
> I think it would be a step too far for defsym symbol=expression to
> have no effect on GC. I'd expect that something like defsym foo=bar is
> used because some live code refers to foo, but does not refer to bar,
> so ideally we'd like defsym foo=bar to keep bar live. I've seen this
> idiom used in embedded systems in the presence of binary only
> libraries. It is true that the programmer can always go the extra mile
> to force bar to be marked live, however I think the expectation would
> be defsym foo=bar would do it.
>
> I think the GNU ld behaviour is reasonable. If nothing refers to
> either foo or bar then there is no reason to mark them live. On the
> implementation cost-benefit trade off I guess we won't know until
> there is a prototype, and some idea of what implementing it will save
> on a real example.
>
> Peter
Fāng-ruì Sòng via llvm-dev
2019-Dec-05 22:17 UTC
[llvm-dev] GC for defsym'd symbols in LLD
I have investigated further. My conclusion is that GNU ld does not do
better than lld, and that making the --defsym behavior ideal is difficult
in the current framework.
GNU ld has some unintended behaviors:

  ld.bfd a.o --defsym 'd=foo' --gc-sections -o a         => GNU ld retains .text_foo
  ld.bfd a.o --defsym 'd=foo+3' --gc-sections -o a       => GNU ld drops .text_foo
  ld.bfd a.o --defsym 'd=bar-bar+foo' --gc-sections -o a => GNU ld drops .text_foo
I traced its logic under a debugger. Here is the stack trace:

  ld/ldlang.c:lang_gc_sections
  bfd/elflink.c:bfd_elf_gc_sections
  bfd/elflink.c:_bfd_elf_gc_mark_reloc
  ...
  bfd/elflink.c:_bfd_elf_gc_mark_hook

  asection *
  _bfd_elf_gc_mark_hook (asection *sec,
  ...
      case bfd_link_hash_defined:
      case bfd_link_hash_defweak:
        // It points to .text_foo for --defsym d=foo, but *ABS* for
        // --defsym d=bar-bar+foo or --defsym d=foo+3
        return h->root.u.def.section;

GNU ld evaluates symbol assignments in many passes, and the representation
of a symbol (section+offset) can vary between passes. In the GC pass, its
rule only works for simple expressions like --defsym d=foo, not for
anything even slightly more complex.
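
A small Python model reproduces the three outcomes listed above. It is only a sketch of the observed results, not of how bfd evaluates expressions: the names `section_at_gc_time` and `retained_sections` are made up for the illustration, and the rule "only a bare symbol name carries a section, everything else is seen as *ABS*" is an assumption inferred from the debugger observation.

```python
ABS = "*ABS*"

# Sections that define the symbols in the a.s example from this thread.
DEFINED_IN = {"foo": ".text_foo", "bar": ".text_bar"}


def section_at_gc_time(expr):
    """Section this toy GC pass attributes to d for --defsym d=<expr>."""
    # Assumption: only a bare symbol name carries a section here;
    # constants and compound expressions are treated as absolute.
    return DEFINED_IN.get(expr.strip(), ABS)


def retained_sections(expr):
    """Sections kept when .text (which holds _start) refers to d."""
    kept = {".text"}
    section = section_at_gc_time(expr)
    if section != ABS:
        kept.add(section)
    return kept


observed = {expr: retained_sections(expr)
            for expr in ("foo", "foo+3", "bar-bar+foo")}
```

Under this model d=foo keeps .text_foo, while d=foo+3 and d=bar-bar+foo both lose it, even though all three expressions ultimately point into .text_foo.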

In lld, it would be difficult to drop the following rule in MarkLive.cpp:

  for (StringRef s : script->referencedSymbols)
    markSymbol(symtab->find(s));

The issue can be demonstrated by the following call tree:

  LinkerDriver::link
    markLive
      ...
        resolveReloc
          // Defined::section is nullptr for `d` because the assignment
          // d=foo hasn't been evaluated yet.
    writeResult
      Writer<ELFT>::run
        Writer<ELFT>::finalizeSections
          LinkerScript::processSymbolAssignments
            // Symbol section+offset are evaluated here.
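
The ordering problem in the call tree can be imitated with a toy mark phase in Python. This is a sketch under stated assumptions, not lld's implementation: `mark`, `sym_section`, and `referenced_symbols` are hypothetical stand-ins, and `None` plays the role of the null `Defined::section` that d has before its assignment is evaluated.

```python
def mark(sym_section, relocs, roots):
    """Toy mark phase: return the set of live sections."""
    live = set()
    worklist = [sym_section[r] for r in roots]
    while worklist:
        sec = worklist.pop()
        # None means the symbol's section is not known yet.
        if sec is None or sec in live:
            continue
        live.add(sec)
        worklist.extend(sym_section[s] for s in relocs.get(sec, ()))
    return live


# State at mark time: the assignment d=foo has not been evaluated,
# so d has no section yet.
sym_section = {
    "_start": ".text",
    "foo": ".text_foo",
    "bar": ".text_bar",
    "d": None,
}
relocs = {".text": ["d"]}     # movabs $d, %rax
referenced_symbols = ["foo"]  # symbols named by --defsym d=foo

without_rule = mark(sym_section, relocs, ["_start"])
with_rule = mark(sym_section, relocs, ["_start"] + referenced_symbols)
```

Without the extra roots, the relocation against d leads nowhere and .text_foo is dropped; treating the symbols referenced by the assignment as roots keeps it, at the price of also keeping it in cases like --defsym c=foo where nothing refers to c.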

It seems that GitHub issues may be a good place to record the problem. I
just created https://github.com/llvm/llvm-project/issues/52
I wanted to mark it low priority, but there is no such label.
--
宋方睿