Fangrui Song via llvm-dev
2021-Feb-20 00:01 UTC
[llvm-dev] ld.lld "Don't let __start_/__stop_ retain C identifier name sections" && Swift
tl;dr With --gc-sections, I think the rule "__start_foo/__stop_foo
references from live sections retains all non-SHF_LINK_ORDER input sections
foo" does not cary its weight, so I'd like to drop it entirely in
https://reviews.llvm.org/D96914
I have done a large-scale internal test with huge amount of OSS usage and
spotted two issues:
(1) Linking systemd.
https://github.com/systemd/systemd/blob/main/src/libsystemd/sd-bus/bus-error.h#L33
there will be an `undefined symbol: __start_SYSTEMD_BUS_ERROR_MAP` error.
Supposedly it can be trivially fixed by using undefined weak symbols on
__start_/__stop_.
(2) Linking Swift. There will be errors like `undefined hidden symbol:
__start_swift5_protocols`.
https://github.com/apple/swift/blob/main/stdlib/public/runtime/SwiftRT-ELF.cpp
It seems that trivially making `extern const char __start_##name` does not
work.
The code relies on some `swift5_*` input sections being GC root.
(If someone can file an issue to Swift, I'd appreciate that.)
(If Swift folks can fix it, I'll give my big thanks:)
This can still potentially break some propritary code so I am sending this
heads-up.
I'll place rationale below (it is complicated).
The current rule is:
__start_/__stop_ references retains all non-SHF_LINK_ORDER C identifier name
sections.
After https://reviews.llvm.org/D96753 , it will become
__start_/__stop_ references retains all non-SHF_LINK_ORDER non-SHF_GROUP C
identifier name sections.
(The section group special case is to allow garbage collecting __llvm_prf_*
sections for -fprofile-generate/-fprofile-instr-generate. The saving is huge.)
Personally I'd drop the rule entirely (D96914) (get support from jhenderson
and phosek), i.e.
__start_/__stop_ references do not retain C identifier name sections.
and hope folks can fix Swift/systemd to not rely on the original rule.
---
I have placed more details in
https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order
discussing why the rule gets in the away and why SHF_LINK_ORDER is not a
solution.
(Section groups have size overhead for single metadata section.)
I'll paste the relevant paragraph here for your convenience.
(I may edit my article to make it clear)
This is a common usage of metadata sections: each text section references a
metadata section.
The metadata sections have a C identifier name to allow the runtime to collect
them via `__start_`/`__stop_` symbols.
Since `__start_`/`__stop_` references are always present from live sections, the
C identifier name sections appear like GC roots, which means they cannot be
discarded by `ld --gc-sections`.
For users who want to keep GC for these metadata sections, they can set the
`SHF_LINK_ORDER` flag or make the metadata section a member of a section group.
(GNU ld does not implement the `SHF_LINK_ORDER` rule yet.
<https://sourceware.org/bugzilla/show_bug.cgi?id=27259>)
(In LLD, some folks have concluded that this rule does not cary its weight, so
possibly it would be nice it we can drop it
([D96914](https://reviews.llvm.org/D96914)).)
Now, let's walk through an `SHF_LINK_ORDER` example when inlining can cause
problems.
```asm
# Monolithic meta.
.globl _start
_start:
leaq __start_meta(%rip), %rdi
leaq __stop_meta(%rip), %rsi
.section .text.foo,"ax", at progbits
.globl foo
foo:
leaq .Lmeta.foo(%rip), %rax
ret
.section .text.bar,"ax", at progbits
.globl bar
bar:
call foo
leaq .Lmeta.bar(%rip), %rax
ret
.section meta,"a", at progbits
.Lmeta.foo:
.byte 0
.Lmeta.bar:
.byte 1
```
The monolithic `meta` does not enable precise garbage collection.
It may be attempting to make `meta` separate and add the `SHF_LINK_ORDER` flag
(to defeat the "C identifier name sections appear like GC roots"
rule):
```asm
.section meta,"ao", at progbits,foo
.Lmeta.foo:
.byte 0
.section meta,"ao", at progbits,bar
.Lmeta.bar:
.byte 1
```
However, due to inlining (foo into bar), the `meta` for `.text.foo` may now get
a reference from another text section `.text.bar`, breaking an implicit
assumption of `SHF_LINK_ORDER`: such a section can only be referenced from its
linked-to section.
```asm
# Both .text.foo and .text.bar reference meta.
.section .text.foo,"ax", at progbits
.globl foo
foo:
leaq .Lmeta.foo(%rip), %rax
ret
.section .text.bar,"ax", at progbits
.globl bar
bar:
leaq .Lmeta.foo(%rip), %rax
leaq .Lmeta.bar(%rip), %rax
ret
```
If `.text.bar` (caller) is retained while `.text.foo` (callee) is discarded, the
`meta` for `foo` will link to a discarded section: an invalid state.
LLD has an error: `{{.*}}:(meta): sh_link points to discarded section
{{.*}}:(.text.foo)`.
Shoaib Meenai via llvm-dev
2021-Feb-20 00:46 UTC
[llvm-dev] ld.lld "Don't let __start_/__stop_ retain C identifier name sections" && Swift
We rely on this rule in various places (e.g. see the "Merging
JNI_OnLoad" section of
https://engineering.fb.com/2018/01/23/android/android-native-library-merging/),
so thank you for the heads up.
SHF_GNU_RETAIN looks like a good solution for our use cases, and is something
I've wanted for ELF for a long time. If I'm understanding the current
setup correctly though, we would need to apply it using a .section attribute in
inline assembly, which is inconvenient. For Mach-O (and I believe for COFF as
well), __attribute__((used)) instructs the linker to not garbage collect the
symbol. Would we consider translating __attribute__((used)) to SHF_GNU_RETAIN
for ELF as well and thereby get consistent linker behavior across platforms?
That should work great for our use cases, and it'd be much cleaner and more
explicit than the current linker special-casing.
On 2/19/21, 4:01 PM, "Fangrui Song via llvm-dev" <llvm-dev at
lists.llvm.org> wrote:
tl;dr With --gc-sections, I think the rule "__start_foo/__stop_foo
references from live sections retains all non-SHF_LINK_ORDER input sections
foo" does not cary its weight, so I'd like to drop it entirely in
https://reviews.llvm.org/D96914
I have done a large-scale internal test with huge amount of OSS usage and
spotted two issues:
(1) Linking systemd.
https://github.com/systemd/systemd/blob/main/src/libsystemd/sd-bus/bus-error.h#L33
there will be an `undefined symbol: __start_SYSTEMD_BUS_ERROR_MAP` error.
Supposedly it can be trivially fixed by using undefined weak symbols on
__start_/__stop_.
(2) Linking Swift. There will be errors like `undefined hidden symbol:
__start_swift5_protocols`.
https://github.com/apple/swift/blob/main/stdlib/public/runtime/SwiftRT-ELF.cpp
It seems that trivially making `extern const char __start_##name` does
not work.
The code relies on some `swift5_*` input sections being GC root.
(If someone can file an issue to Swift, I'd appreciate that.)
(If Swift folks can fix it, I'll give my big thanks:)
This can still potentially break some propritary code so I am sending this
heads-up.
I'll place rationale below (it is complicated).
The current rule is:
__start_/__stop_ references retains all non-SHF_LINK_ORDER C identifier
name sections.
After https://reviews.llvm.org/D96753 , it will become
__start_/__stop_ references retains all non-SHF_LINK_ORDER non-SHF_GROUP
C identifier name sections.
(The section group special case is to allow garbage collecting __llvm_prf_*
sections for -fprofile-generate/-fprofile-instr-generate. The saving is huge.)
Personally I'd drop the rule entirely (D96914) (get support from
jhenderson and phosek), i.e.
__start_/__stop_ references do not retain C identifier name sections.
and hope folks can fix Swift/systemd to not rely on the original rule.
---
I have placed more details in
https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order
discussing why the rule gets in the away and why SHF_LINK_ORDER is not a
solution.
(Section groups have size overhead for single metadata section.)
I'll paste the relevant paragraph here for your convenience.
(I may edit my article to make it clear)
This is a common usage of metadata sections: each text section references a
metadata section.
The metadata sections have a C identifier name to allow the runtime to
collect them via `__start_`/`__stop_` symbols.
Since `__start_`/`__stop_` references are always present from live sections,
the C identifier name sections appear like GC roots, which means they cannot be
discarded by `ld --gc-sections`.
For users who want to keep GC for these metadata sections, they can set the
`SHF_LINK_ORDER` flag or make the metadata section a member of a section group.
(GNU ld does not implement the `SHF_LINK_ORDER` rule yet.
<https://sourceware.org/bugzilla/show_bug.cgi?id=27259 >)
(In LLD, some folks have concluded that this rule does not cary its weight,
so possibly it would be nice it we can drop it
([D96914](https://reviews.llvm.org/D96914 )).)
Now, let's walk through an `SHF_LINK_ORDER` example when inlining can
cause problems.
```asm
# Monolithic meta.
.globl _start
_start:
leaq __start_meta(%rip), %rdi
leaq __stop_meta(%rip), %rsi
.section .text.foo,"ax", at progbits
.globl foo
foo:
leaq .Lmeta.foo(%rip), %rax
ret
.section .text.bar,"ax", at progbits
.globl bar
bar:
call foo
leaq .Lmeta.bar(%rip), %rax
ret
.section meta,"a", at progbits
.Lmeta.foo:
.byte 0
.Lmeta.bar:
.byte 1
```
The monolithic `meta` does not enable precise garbage collection.
It may be attempting to make `meta` separate and add the `SHF_LINK_ORDER`
flag (to defeat the "C identifier name sections appear like GC roots"
rule):
```asm
.section meta,"ao", at progbits,foo
.Lmeta.foo:
.byte 0
.section meta,"ao", at progbits,bar
.Lmeta.bar:
.byte 1
```
However, due to inlining (foo into bar), the `meta` for `.text.foo` may now
get a reference from another text section `.text.bar`, breaking an implicit
assumption of `SHF_LINK_ORDER`: such a section can only be referenced from its
linked-to section.
```asm
# Both .text.foo and .text.bar reference meta.
.section .text.foo,"ax", at progbits
.globl foo
foo:
leaq .Lmeta.foo(%rip), %rax
ret
.section .text.bar,"ax", at progbits
.globl bar
bar:
leaq .Lmeta.foo(%rip), %rax
leaq .Lmeta.bar(%rip), %rax
ret
```
If `.text.bar` (caller) is retained while `.text.foo` (callee) is discarded,
the `meta` for `foo` will link to a discarded section: an invalid state.
LLD has an error: `{{.*}}:(meta): sh_link points to discarded section
{{.*}}:(.text.foo)`.
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
James Henderson via llvm-dev
2021-Feb-22 10:34 UTC
[llvm-dev] ld.lld "Don't let __start_/__stop_ retain C identifier name sections" && Swift
I've filed an internal issue tracker for us to investigate the impact of this proposal, although I don't know when we'll get a chance to schedule the work at this point. Also, it's worth noting that we can't test all downstream codebases that potentially could use this feature, so we'll likely want some potential source code or switch that will allow our users to keep their unused sections. On Sat, 20 Feb 2021 at 00:01, Fangrui Song via llvm-dev < llvm-dev at lists.llvm.org> wrote:> > tl;dr With --gc-sections, I think the rule "__start_foo/__stop_foo > references from live sections retains all non-SHF_LINK_ORDER input sections > foo" does not cary its weight, so I'd like to drop it entirely in > https://reviews.llvm.org/D96914 > > I have done a large-scale internal test with huge amount of OSS usage and > spotted two issues: > > (1) Linking systemd. > https://github.com/systemd/systemd/blob/main/src/libsystemd/sd-bus/bus-error.h#L33 > there will be an `undefined symbol: __start_SYSTEMD_BUS_ERROR_MAP` error. > Supposedly it can be trivially fixed by using undefined weak symbols on > __start_/__stop_. > (2) Linking Swift. There will be errors like `undefined hidden symbol: > __start_swift5_protocols`. > > https://github.com/apple/swift/blob/main/stdlib/public/runtime/SwiftRT-ELF.cpp > It seems that trivially making `extern const char __start_##name` does > not work. > The code relies on some `swift5_*` input sections being GC root. > (If someone can file an issue to Swift, I'd appreciate that.) > (If Swift folks can fix it, I'll give my big thanks:) > > This can still potentially break some propritary code so I am sending this > heads-up. > I'll place rationale below (it is complicated). > > > > The current rule is: > > __start_/__stop_ references retains all non-SHF_LINK_ORDER C identifier > name sections. > > After https://reviews.llvm.org/D96753 , it will become > > __start_/__stop_ references retains all non-SHF_LINK_ORDER > non-SHF_GROUP C identifier name sections. > > (The section group special case is to allow garbage collecting > __llvm_prf_* sections for -fprofile-generate/-fprofile-instr-generate. The > saving is huge.) > > Personally I'd drop the rule entirely (D96914) (get support from > jhenderson and phosek), i.e. > > __start_/__stop_ references do not retain C identifier name sections. > > and hope folks can fix Swift/systemd to not rely on the original rule. > > --- > > I have placed more details in > https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order > discussing why the rule gets in the away and why SHF_LINK_ORDER is not a > solution. > (Section groups have size overhead for single metadata section.) > > I'll paste the relevant paragraph here for your convenience. > (I may edit my article to make it clear) > > This is a common usage of metadata sections: each text section references > a metadata section. > The metadata sections have a C identifier name to allow the runtime to > collect them via `__start_`/`__stop_` symbols. > > Since `__start_`/`__stop_` references are always present from live > sections, the C identifier name sections appear like GC roots, which means > they cannot be discarded by `ld --gc-sections`. > > For users who want to keep GC for these metadata sections, they can set > the `SHF_LINK_ORDER` flag or make the metadata section a member of a > section group. > (GNU ld does not implement the `SHF_LINK_ORDER` rule yet. < > https://sourceware.org/bugzilla/show_bug.cgi?id=27259>) > (In LLD, some folks have concluded that this rule does not cary its > weight, so possibly it would be nice it we can drop it ([D96914]( > https://reviews.llvm.org/D96914)).) > > Now, let's walk through an `SHF_LINK_ORDER` example when inlining can > cause problems. > > ```asm > # Monolithic meta. > .globl _start > _start: > leaq __start_meta(%rip), %rdi > leaq __stop_meta(%rip), %rsi > > .section .text.foo,"ax", at progbits > .globl foo > foo: > leaq .Lmeta.foo(%rip), %rax > ret > > .section .text.bar,"ax", at progbits > .globl bar > bar: > call foo > leaq .Lmeta.bar(%rip), %rax > ret > > .section meta,"a", at progbits > .Lmeta.foo: > .byte 0 > .Lmeta.bar: > .byte 1 > ``` > > The monolithic `meta` does not enable precise garbage collection. > It may be attempting to make `meta` separate and add the `SHF_LINK_ORDER` > flag (to defeat the "C identifier name sections appear like GC roots" rule): > > ```asm > .section meta,"ao", at progbits,foo > .Lmeta.foo: > .byte 0 > > .section meta,"ao", at progbits,bar > .Lmeta.bar: > .byte 1 > ``` > > However, due to inlining (foo into bar), the `meta` for `.text.foo` may > now get a reference from another text section `.text.bar`, breaking an > implicit assumption of `SHF_LINK_ORDER`: such a section can only be > referenced from its linked-to section. > ```asm > # Both .text.foo and .text.bar reference meta. > .section .text.foo,"ax", at progbits > .globl foo > foo: > leaq .Lmeta.foo(%rip), %rax > ret > > .section .text.bar,"ax", at progbits > .globl bar > bar: > leaq .Lmeta.foo(%rip), %rax > leaq .Lmeta.bar(%rip), %rax > ret > ``` > > If `.text.bar` (caller) is retained while `.text.foo` (callee) is > discarded, the `meta` for `foo` will link to a discarded section: an > invalid state. > LLD has an error: `{{.*}}:(meta): sh_link points to discarded section > {{.*}}:(.text.foo)`. > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210222/427b108c/attachment.html>