Zequan Wu via llvm-dev
2021-Mar-23 18:41 UTC
[llvm-dev] [RFC] Annotating global functions and variables to prevent ICF during linking
The size increase of chrome on Linux by switching from all ICF to safe ICF
is small.
All ICF:
text data bss dec hex filename
169314343 8472660 2368965 180155968 abcf640 chrome
Safe ICF:
text data bss dec hex filename
174521550 8497604 2368965 185388119 b0ccc57 chrome
On Windows, chrome.dll increases size by around 14 MB (12MB increases in
.text section).
All ICF:
Size of out\Default\chrome.dll is 170.715648 MB
name: mem size , disk size
.text: 141.701417 MB
.rdata: 22.458476 MB
.data: 3.093948 MB, 0.523264 MB
.pdata: 4.412364 MB
.00cfg: 0.000040 MB
.gehcont: 0.000132 MB
.retplne: 0.000108 MB
.rodata: 0.004544 MB
.tls: 0.000561 MB
CPADinfo: 0.000056 MB
_RDATA: 0.000244 MB
.rsrc: 0.285232 MB
.reloc: 1.324196 MB
Safe ICF:
Size of out\icf-safe\chrome.dll is 184.499712 MB
name: mem size , disk size
.text: 153.809529 MB
.rdata: 23.123628 MB
.data: 3.093948 MB, 0.523264 MB
.pdata: 5.367396 MB
.00cfg: 0.000040 MB
.gehcont: 0.000132 MB
.retplne: 0.000108 MB
.rodata: 0.004544 MB
.tls: 0.000561 MB
CPADinfo: 0.000056 MB
_RDATA: 0.000244 MB
.rsrc: 0.285232 MB
.reloc: 1.379364 MB
If an attribute is used and it affects unnamed_addr of a symbol, it
determines whether the symbols should show up in the .addrsig table.
All-ICF mode in ld.lld and lld-link ignore symbols in the .addrsig table,
if they belong to code sections. So, it won't have an effect on disabling
ICF.
On Mon, Mar 22, 2021 at 10:19 PM Fangrui Song <maskray at google.com>
wrote:
>
> On 2021-03-22, David Blaikie via llvm-dev wrote:
> >ICF: Identical Code Folding
> >
> >Linker deduplicates functions by collapsing any identical functions
> >together - with icf=safe, the linker looks at a .addressing section in
the
> >object file and any functions listed in that section are not treated as
> >collapsible (eg: because they need to meet C++'s "distinct
functions have
> >distinct addresses" guarantee)
>
> The name originated from MSVC link.exe where icf stands for "identical
> COMDAT folding".
> gold named it "identical code folding" - which makes some sense
because
> gold does not fold readonly data.
>
> In LLD, the name is not accurate for two reasons: (1) the feature can
> apply to readonly data as well; (2) the folding is by section, not by
> function.
>
> We define identical sections as they have identical content and their
> outgoing relocation sets cannot be distinguished: they need to have the
> same number of relocations, with the same relative locations, with the
> referenced symbols indistinguishable.
>
> Then, ld.lld --icf={safe,all} works like this:
>
> For a set of identical sections, the linker picks one representative and
> drops the rest, then redirects references to the representative.
>
> Note: this can confuse debuggers/symbolizers/profilers easily.
>
> lld-link /opt:icf is different from ld.lld --icf but I haven't looked
> into it closely.
>
>
> I find that the feature's saving is small given its downside
> (also increaded link time: the current LLD's implementation is
inferior:
> it performs a quadratic number of comparisons among an equality class):
>
> This is the size differences for the 'lld' executable:
>
> % size lld.{none,safe,all}
> text data bss dec hex filename
> 96821040 7210504 550810 104582354 63bccd2 lld.none
> 95217624 7167656 550810 102936090 622ae1a lld.safe
> 94038808 7167144 550810 101756762 610af5a lld.all
> % size gold.{none,safe,all}
> text data bss dec hex filename
> 96857302 7174792 550825 104582919 63bcf07 gold.none
> 94469390 7174792 550825 102195007 6175f3f gold.safe
> 94184430 7174792 550825 101910047 613061f gold.all
>
> Note that the --icf=all result caps the potential saving of the proposed
> annotation.
>
> Actually with some large internal targets I get even smaller savings.
>
>
> ld.lld --icf=safe is safer than gold --icf=safe but probably misses some
> opportunities.
> It can be that clang codegen/optimizer fail to mark some cases as
> {,local_}unnamed_addr.
>
> I know Chromium and the Windows world can be different:) But I'd still
> want to
> get some numbers first.
>
>
> Last, I have seen that Chromium has some code like
>
>
https://source.chromium.org/chromium/chromium/src/+/master:skia/ext/SkMemory_new_handler.cpp
>
> void sk_abort_no_print() {
> // Linker's ICF feature may merge this function with other
> functions with
> // the same definition (e.g. any function whose sole job is to call
> abort())
> // and it may confuse the crash report processing system.
> // http://crbug.com/860850
> static int static_variable_to_make_this_function_unique = 0x736b;
> // "sk"
>
base::debug::Alias(&static_variable_to_make_this_function_unique);
>
> abort();
> }
>
> If we want an approach to work with link.exe, I don't know what we can
> do...
> If no desire for link.exe compatibility, I can see that having a proper
> way marking the function
> can be useful... but in any case if an attribute is used, it probably
> should affect
> unnamed_addr directly instead of being called *icf*.
>
>
>
> >On Mon, Mar 22, 2021 at 6:16 PM Philip Reames via llvm-dev <
> >llvm-dev at lists.llvm.org> wrote:
> >
> >> Can you define ICF please? And give a bit of context?
> >>
> >> Philip
> >> On 3/22/21 5:27 PM, Zequan Wu via llvm-dev wrote:
> >>
> >> Hi all,
> >>
> >> Background:
> >> It's been a longstanding difficulty of debugging with ICF.
Programmers
> >> don't have control over which sections should be folded by
ICF, which
> >> sections shouldn't. The existing address significant table
won't have
> >> effect for code sections during all ICF mode in both ld.lld and
> lld-link.
> >> By switching to safe ICF could mark code sections as unique, but
at a
> cost
> >> of increasing binary size out of control. So, it would be good if
> >> programmers could selectively disable ICF in source code by
annotating
> >> global functions/variables with an attribute to improve debugging
> >> experience and have the control on the binary size increase.
> >>
> >> My plan is to add a new section table(`.no_icf`) to object files.
> Sections
> >> of all symbols inside the table should not be folded by all ICF
mode.
> And
> >> symbols can only be added into the table by annotating global
> >> functions/variables with a new attribute(`no_icf`) in source code.
> >>
> >> What do you think about this approach?
> >>
> >> Thanks,
> >> Zequan
> >>
> >>
> >> _______________________________________________
> >> LLVM Developers mailing listllvm-dev at lists.llvm.orghttps://
> lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>
>
> >_______________________________________________
> >LLVM Developers mailing list
> >llvm-dev at lists.llvm.org
> >https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210323/0b60b89f/attachment.html>
Fangrui Song via llvm-dev
2021-Mar-23 18:50 UTC
[llvm-dev] [RFC] Annotating global functions and variables to prevent ICF during linking
On 2021-03-23, Zequan Wu wrote:>The size increase of chrome on Linux by switching from all ICF to safe ICF >is small. >All ICF: > text data bss dec hex filename >169314343 8472660 2368965 180155968 abcf640 chrome >Safe ICF: > text data bss dec hex filename >174521550 8497604 2368965 185388119 b0ccc57 chrome > >On Windows, chrome.dll increases size by around 14 MB (12MB increases in >.text section). >All ICF: >Size of out\Default\chrome.dll is 170.715648 MB > name: mem size , disk size > .text: 141.701417 MB > .rdata: 22.458476 MB > .data: 3.093948 MB, 0.523264 MB > .pdata: 4.412364 MB > .00cfg: 0.000040 MB > .gehcont: 0.000132 MB > .retplne: 0.000108 MB > .rodata: 0.004544 MB > .tls: 0.000561 MB > CPADinfo: 0.000056 MB > _RDATA: 0.000244 MB > .rsrc: 0.285232 MB > .reloc: 1.324196 MB >Safe ICF: >Size of out\icf-safe\chrome.dll is 184.499712 MB > name: mem size , disk size > .text: 153.809529 MB > .rdata: 23.123628 MB > .data: 3.093948 MB, 0.523264 MB > .pdata: 5.367396 MB > .00cfg: 0.000040 MB > .gehcont: 0.000132 MB > .retplne: 0.000108 MB > .rodata: 0.004544 MB > .tls: 0.000561 MB > CPADinfo: 0.000056 MB > _RDATA: 0.000244 MB > .rsrc: 0.285232 MB > .reloc: 1.379364 MB > >If an attribute is used and it affects unnamed_addr of a symbol, it >determines whether the symbols should show up in the .addrsig table. >All-ICF mode in ld.lld and lld-link ignore symbols in the .addrsig table, >if they belong to code sections. So, it won't have an effect on disabling >ICF.Is there something which can be improved on lld-link? Note also that the size difference between gold --icf={safe,all} is smaller than the difference between ld.lld --icf={safe,all}. Some may be due to gold performing "unsafe" safe ICF, some may suggest places where clang fails to add {,local_}unnamed_addr.>On Mon, Mar 22, 2021 at 10:19 PM Fangrui Song <maskray at google.com> wrote: > >> >> On 2021-03-22, David Blaikie via llvm-dev wrote: >> >ICF: Identical Code Folding >> > >> >Linker deduplicates functions by collapsing any identical functions >> >together - with icf=safe, the linker looks at a .addressing section in the >> >object file and any functions listed in that section are not treated as >> >collapsible (eg: because they need to meet C++'s "distinct functions have >> >distinct addresses" guarantee) >> >> The name originated from MSVC link.exe where icf stands for "identical >> COMDAT folding". >> gold named it "identical code folding" - which makes some sense because >> gold does not fold readonly data. >> >> In LLD, the name is not accurate for two reasons: (1) the feature can >> apply to readonly data as well; (2) the folding is by section, not by >> function. >> >> We define identical sections as they have identical content and their >> outgoing relocation sets cannot be distinguished: they need to have the >> same number of relocations, with the same relative locations, with the >> referenced symbols indistinguishable. >> >> Then, ld.lld --icf={safe,all} works like this: >> >> For a set of identical sections, the linker picks one representative and >> drops the rest, then redirects references to the representative. >> >> Note: this can confuse debuggers/symbolizers/profilers easily. >> >> lld-link /opt:icf is different from ld.lld --icf but I haven't looked >> into it closely. >> >> >> I find that the feature's saving is small given its downside >> (also increaded link time: the current LLD's implementation is inferior: >> it performs a quadratic number of comparisons among an equality class): >> >> This is the size differences for the 'lld' executable: >> >> % size lld.{none,safe,all} >> text data bss dec hex filename >> 96821040 7210504 550810 104582354 63bccd2 lld.none >> 95217624 7167656 550810 102936090 622ae1a lld.safe >> 94038808 7167144 550810 101756762 610af5a lld.all >> % size gold.{none,safe,all} >> text data bss dec hex filename >> 96857302 7174792 550825 104582919 63bcf07 gold.none >> 94469390 7174792 550825 102195007 6175f3f gold.safe >> 94184430 7174792 550825 101910047 613061f gold.all >> >> Note that the --icf=all result caps the potential saving of the proposed >> annotation. >> >> Actually with some large internal targets I get even smaller savings. >> >> >> ld.lld --icf=safe is safer than gold --icf=safe but probably misses some >> opportunities. >> It can be that clang codegen/optimizer fail to mark some cases as >> {,local_}unnamed_addr. >> >> I know Chromium and the Windows world can be different:) But I'd still >> want to >> get some numbers first. >> >> >> Last, I have seen that Chromium has some code like >> >> https://source.chromium.org/chromium/chromium/src/+/master:skia/ext/SkMemory_new_handler.cpp >> >> void sk_abort_no_print() { >> // Linker's ICF feature may merge this function with other >> functions with >> // the same definition (e.g. any function whose sole job is to call >> abort()) >> // and it may confuse the crash report processing system. >> // http://crbug.com/860850 >> static int static_variable_to_make_this_function_unique = 0x736b; >> // "sk" >> base::debug::Alias(&static_variable_to_make_this_function_unique); >> >> abort(); >> } >> >> If we want an approach to work with link.exe, I don't know what we can >> do... >> If no desire for link.exe compatibility, I can see that having a proper >> way marking the function >> can be useful... but in any case if an attribute is used, it probably >> should affect >> unnamed_addr directly instead of being called *icf*. >> >> >> >> >On Mon, Mar 22, 2021 at 6:16 PM Philip Reames via llvm-dev < >> >llvm-dev at lists.llvm.org> wrote: >> > >> >> Can you define ICF please? And give a bit of context? >> >> >> >> Philip >> >> On 3/22/21 5:27 PM, Zequan Wu via llvm-dev wrote: >> >> >> >> Hi all, >> >> >> >> Background: >> >> It's been a longstanding difficulty of debugging with ICF. Programmers >> >> don't have control over which sections should be folded by ICF, which >> >> sections shouldn't. The existing address significant table won't have >> >> effect for code sections during all ICF mode in both ld.lld and >> lld-link. >> >> By switching to safe ICF could mark code sections as unique, but at a >> cost >> >> of increasing binary size out of control. So, it would be good if >> >> programmers could selectively disable ICF in source code by annotating >> >> global functions/variables with an attribute to improve debugging >> >> experience and have the control on the binary size increase. >> >> >> >> My plan is to add a new section table(`.no_icf`) to object files. >> Sections >> >> of all symbols inside the table should not be folded by all ICF mode. >> And >> >> symbols can only be added into the table by annotating global >> >> functions/variables with a new attribute(`no_icf`) in source code. >> >> >> >> What do you think about this approach? >> >> >> >> Thanks, >> >> Zequan >> >> >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing listllvm-dev at lists.llvm.orghttps:// >> lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing list >> >> llvm-dev at lists.llvm.org >> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >> >> >_______________________________________________ >> >LLVM Developers mailing list >> >llvm-dev at lists.llvm.org >> >https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >>