thr3ads.net - llvm dev - [llvm-dev] GC for defsym'd symbols in LLD [Dec 2019]

Shoaib Meenai via llvm-dev

2019-Dec-04 03:02 UTC

[llvm-dev] GC for defsym'd symbols in LLD

LLD treats any symbol referenced from a linker script as a GC root, which makes
sense. Unfortunately, it also processes --defsym as a linker script fragment
internally, so all target symbols of a --defsym also get treated as GC roots
(i.e., if you have something like --defsym SRC=TGT, TGT will become a GC root).
I believe this to be unnecessary for defsym specifically, since you're just
aliasing a symbol, and if the original or aliased symbols are referenced from
anywhere, the symbol's section will get preserved anyway. (There are also
cases where the defsym target can be an expression instead of just a symbol
name, which I admittedly haven't thought about too hard, but I believe the
same logic should hold in terms of any needed sections getting preserved
regardless.) I want to change defsym targets specifically to not be considered
as GC roots, so that they can be dead code eliminated. Does anyone foresee any
issues with this?

Thanks,
Shoaib

Fāng-ruì Sòng via llvm-dev

2019-Dec-04 07:05 UTC

On Tue, Dec 3, 2019 at 7:02 PM Shoaib Meenai via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> LLD treats any symbol referenced from a linker script as a GC root, which
> makes sense. Unfortunately, it also processes --defsym as a linker script
> fragment internally, so all target symbols of a --defsym also get treated as
> GC roots (i.e., if you have something like --defsym SRC=TGT, TGT will become
> a GC root). I believe this to be unnecessary for defsym specifically, since
> you're just aliasing a symbol, and if the original or aliased symbols are
> referenced from anywhere, the symbol's section will get preserved anyway.
> (There's also cases where the defsym target can be an expression instead of
> just a symbol name, which I admittedly haven't thought about too hard, but I
> believe the same logic should hold in terms of any needed sections getting
> preserved regardless.) I want to change defsym targets specifically to not be
> considered as GC roots, so that they can be dead code eliminated. Does anyone
> foresee any issues with this?

% cat a.s
.globl _start, foo, bar
.text; _start: movabs $d, %rax
.section .text_foo,"ax"; foo: ret
.section .text_bar,"ax"; bar: nop
% as a.s -o a.o

% ld.bfd a.o --defsym d=foo --gc-sections -o a
  => .text_foo is retained
% ld.bfd a.o --defsym d=bar --gc-sections -o a
  => .text_bar is retained
% ld.bfd a.o --defsym d=1 --gc-sections -o a
  => Neither .text_foo nor .text_bar is retained
% ld.bfd a.o --defsym c=foo --defsym d=1 --gc-sections -o a
  => Neither .text_foo nor .text_bar is retained; lld will retain .text_foo.

For --defsym from=an_expression_with_to, GNU ld appears to add a
reference from 'from' to 'to'. lld's behavior
(https://reviews.llvm.org/D34195) is more conservative.

If we stop treating script->referencedSymbols as GC roots,
instructions like `movabs $d, %rax` will no longer be able to access
the intended section. We can tweak our behavior to be like GNU ld, but
the additional complexity may not be worthwhile.
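
The four cases above can be reproduced with a toy reachability model. This is only a sketch in Python, not code from LLD or GNU ld: the function name `live_sections`, the policy names "roots", "edge" and "none", and the tables describing a.s are all made up for illustration. "roots" models the lld behaviour described in this thread, "edge" models what GNU ld appears to do, and "none" models --defsym having no effect on GC at all.

```python
# Toy model of --gc-sections for the a.s example; illustrative only,
# not taken from LLD or GNU ld.
SYMBOL_SECTION = {"_start": ".text", "foo": ".text_foo", "bar": ".text_bar"}
# Symbols referenced by relocations in each section (movabs $d lives in .text).
SECTION_REFS = {".text": ["d"], ".text_foo": [], ".text_bar": []}


def live_sections(defsyms, policy):
    """Return the set of sections kept under a (hypothetical) GC policy.

    defsyms maps each --defsym name to the symbol its expression mentions,
    or to None for a constant such as d=1.
    """
    worklist = ["_start"]  # the entry point is always a root
    if policy == "roots":
        # Every symbol mentioned in a defsym expression is a GC root.
        worklist += [t for t in defsyms.values() if t is not None]
    live, seen = set(), set()
    while worklist:
        sym = worklist.pop()
        if sym in seen:
            continue
        seen.add(sym)
        if policy == "edge" and defsyms.get(sym) is not None:
            # A reference to 'from' becomes a reference to 'to'.
            worklist.append(defsyms[sym])
        sec = SYMBOL_SECTION.get(sym)
        if sec is not None and sec not in live:
            live.add(sec)
            worklist.extend(SECTION_REFS[sec])
    return live
```

Under these assumptions, `live_sections({"c": "foo", "d": None}, "roots")` keeps .text_foo while the "edge" policy keeps only .text, which matches the fourth case. With `{"d": "foo"}` the "none" policy drops .text_foo even though .text uses d, which is the breakage described in the paragraph above.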

Peter Smith via llvm-dev

2019-Dec-04 09:35 UTC

On Wed, 4 Dec 2019 at 07:05, Fāng-ruì Sòng <maskray at google.com> wrote:
>
> On Tue, Dec 3, 2019 at 7:02 PM Shoaib Meenai via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > LLD treats any symbol referenced from a linker script as a GC root, which
> > makes sense. Unfortunately, it also processes --defsym as a linker script
> > fragment internally, so all target symbols of a --defsym also get treated
> > as GC roots (i.e., if you have something like --defsym SRC=TGT, TGT will
> > become a GC root). I believe this to be unnecessary for defsym
> > specifically, since you're just aliasing a symbol, and if the original or
> > aliased symbols are referenced from anywhere, the symbol's section will
> > get preserved anyway. (There's also cases where the defsym target can be
> > an expression instead of just a symbol name, which I admittedly haven't
> > thought about too hard, but I believe the same logic should hold in terms
> > of any needed sections getting preserved regardless.) I want to change
> > defsym targets specifically to not be considered as GC roots, so that they
> > can be dead code eliminated. Does anyone foresee any issues with this?
>
> % cat a.s
> .globl _start, foo, bar
> .text; _start: movabs $d, %rax
> .section .text_foo,"ax"; foo: ret
> .section .text_bar,"ax"; bar: nop
> % as a.s -o a.o
>
> % ld.bfd a.o --defsym d=foo --gc-sections -o a => .text_foo is retained
> % ld.bfd a.o --defsym d=bar --gc-sections -o a => .text_bar is retained
> % ld.bfd a.o --defsym d=1 --gc-sections -o a => Neither .text_foo nor
> .text_bar is retained
> % ld.bfd a.o --defsym c=foo --defsym d=1 --gc-sections -o a => Neither
> .text_foo nor .text_bar is retained; lld will retain .text_foo.
>
> For --defsym from=an_expression_with_to, GNU ld appears to add a
> reference from 'from' to 'to'. lld's behavior
> (https://reviews.llvm.org/D34195) is more conservative.
>
> If we stop treating script->referencedSymbols as GC roots,
> instructions like `movabs $d, %rax` will no longer be able to access
> the intended section. We can tweak our behavior to be like GNU ld, but
> the additional complexity may not be worthwhile.

I think it would be a step too far for defsym symbol=expression to
have no effect on GC. I'd expect that something like defsym foo=bar is
used because some live code refers to foo but does not refer to bar,
so ideally we'd like defsym foo=bar to keep bar live. I've seen this
idiom used in embedded systems in the presence of binary-only
libraries. It is true that the programmer can always go the extra mile
to force bar to be marked live; however, I think the expectation would
be that defsym foo=bar would do it.

I think the GNU ld behaviour is reasonable. If nothing refers to
either foo or bar then there is no reason to mark them live. On the
implementation cost-benefit trade-off, I guess we won't know until
there is a prototype and some idea of what implementing it will save
on a real example.
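
As a rough illustration of the expectation described above, here is a minimal Python sketch of the GNU ld-like rule, where a reference to foo is followed through defsym foo=bar to bar. The function `reachable_sections`, the section names and the symbol layout are hypothetical and only stand in for a program that calls foo while a binary-only library defines bar.

```python
def reachable_sections(roots, refs, defined_in, aliases):
    """Mark sections reachable from roots, following alias -> target edges.

    refs maps a section to the symbols it references, defined_in maps a
    symbol to its defining section, and aliases models --defsym from=to.
    """
    live, seen, work = set(), set(), list(roots)
    while work:
        sym = work.pop()
        if sym in seen:
            continue
        seen.add(sym)
        if sym in aliases:
            work.append(aliases[sym])  # a use of 'from' keeps 'to' alive
        sec = defined_in.get(sym)
        if sec is not None and sec not in live:
            live.add(sec)
            work.extend(refs.get(sec, []))
    return live


# Hypothetical layout: main refers to foo; only bar has a definition.
defined_in = {"main": ".text.main", "bar": ".text.bar"}
aliases = {"foo": "bar"}  # --defsym foo=bar

calls_foo = reachable_sections(["main"], {".text.main": ["foo"]}, defined_in, aliases)
no_caller = reachable_sections(["main"], {".text.main": []}, defined_in, aliases)
```

In this sketch `calls_foo` contains .text.bar because live code refers to foo, while `no_caller` contains only .text.main because nothing refers to either foo or bar.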

Peter
