Zequan Wu via llvm-dev
2021-Mar-23 18:41 UTC
[llvm-dev] [RFC] Annotating global functions and variables to prevent ICF during linking
The size increase of chrome on Linux from switching from all ICF to safe ICF is small: about 5.2 MB (roughly 3%) of text.

All ICF:
       text     data      bss        dec      hex  filename
  169314343  8472660  2368965  180155968  abcf640  chrome

Safe ICF:
       text     data      bss        dec      hex  filename
  174521550  8497604  2368965  185388119  b0ccc57  chrome

On Windows, chrome.dll grows by around 14 MB, about 12 MB of which is in the .text section.

All ICF:
Size of out\Default\chrome.dll is 170.715648 MB
       name:  mem size, disk size
      .text: 141.701417 MB
     .rdata:  22.458476 MB
      .data:   3.093948 MB, 0.523264 MB
     .pdata:   4.412364 MB
     .00cfg:   0.000040 MB
   .gehcont:   0.000132 MB
   .retplne:   0.000108 MB
    .rodata:   0.004544 MB
       .tls:   0.000561 MB
   CPADinfo:   0.000056 MB
     _RDATA:   0.000244 MB
      .rsrc:   0.285232 MB
     .reloc:   1.324196 MB

Safe ICF:
Size of out\icf-safe\chrome.dll is 184.499712 MB
       name:  mem size, disk size
      .text: 153.809529 MB
     .rdata:  23.123628 MB
      .data:   3.093948 MB, 0.523264 MB
     .pdata:   5.367396 MB
     .00cfg:   0.000040 MB
   .gehcont:   0.000132 MB
   .retplne:   0.000108 MB
    .rodata:   0.004544 MB
       .tls:   0.000561 MB
   CPADinfo:   0.000056 MB
     _RDATA:   0.000244 MB
      .rsrc:   0.285232 MB
     .reloc:   1.379364 MB

If an attribute is used and it affects a symbol's unnamed_addr, it only determines whether the symbol shows up in the .addrsig table. In all-ICF mode, ld.lld and lld-link ignore .addrsig entries for symbols in code sections, so such an attribute would have no effect on disabling ICF.

On Mon, Mar 22, 2021 at 10:19 PM Fangrui Song <maskray at google.com> wrote:
>
> On 2021-03-22, David Blaikie via llvm-dev wrote:
> >ICF: Identical Code Folding
> >
> >The linker deduplicates functions by collapsing identical functions
> >together. With icf=safe, the linker looks at the .addrsig section in the
> >object file, and any functions listed in that section are not treated as
> >collapsible (e.g. because they need to meet C++'s "distinct functions have
> >distinct addresses" guarantee).
>
> The name originated from MSVC link.exe, where ICF stands for "identical
> COMDAT folding".
> gold named it "identical code folding", which makes some sense because
> gold does not fold read-only data.
>
> In LLD, the name is not accurate for two reasons: (1) the feature can
> apply to read-only data as well; (2) the folding is by section, not by
> function.
>
> We define two sections as identical if they have identical content and
> their outgoing relocation sets cannot be distinguished: they need to have
> the same number of relocations, at the same relative offsets, with the
> referenced symbols indistinguishable.
>
> Then, ld.lld --icf={safe,all} works like this:
>
> For a set of identical sections, the linker picks one representative,
> drops the rest, and redirects references to the representative.
>
> Note: this can easily confuse debuggers, symbolizers, and profilers.
>
> lld-link /opt:icf is different from ld.lld --icf, but I haven't looked
> into it closely.
>
>
> I find that the feature's saving is small given its downsides
> (it also increases link time, and the current LLD implementation is
> inferior: it performs a quadratic number of comparisons within an
> equivalence class).
>
> This is the size difference for the 'lld' executable:
>
> % size lld.{none,safe,all}
>      text     data     bss        dec      hex  filename
>  96821040  7210504  550810  104582354  63bccd2  lld.none
>  95217624  7167656  550810  102936090  622ae1a  lld.safe
>  94038808  7167144  550810  101756762  610af5a  lld.all
> % size gold.{none,safe,all}
>      text     data     bss        dec      hex  filename
>  96857302  7174792  550825  104582919  63bcf07  gold.none
>  94469390  7174792  550825  102195007  6175f3f  gold.safe
>  94184430  7174792  550825  101910047  613061f  gold.all
>
> Note that the --icf=all result caps the potential saving of the proposed
> annotation.
>
> Actually, with some large internal targets I get even smaller savings.
>
>
> ld.lld --icf=safe is safer than gold --icf=safe but probably misses some
> opportunities.
> It may be that clang's codegen or optimizer fails to mark some cases as
> {,local_}unnamed_addr.
>
> I know Chromium and the Windows world can be different :) but I'd still
> want to get some numbers first.
>
>
> Last, I have seen that Chromium has some code like
>
> https://source.chromium.org/chromium/chromium/src/+/master:skia/ext/SkMemory_new_handler.cpp
>
> void sk_abort_no_print() {
>     // Linker's ICF feature may merge this function with other functions with
>     // the same definition (e.g. any function whose sole job is to call abort())
>     // and it may confuse the crash report processing system.
>     // http://crbug.com/860850
>     static int static_variable_to_make_this_function_unique = 0x736b;  // "sk"
>     base::debug::Alias(&static_variable_to_make_this_function_unique);
>
>     abort();
> }
>
> If we want an approach that works with link.exe, I don't know what we can
> do...
> If there is no desire for link.exe compatibility, I can see that having a
> proper way to mark the function can be useful... but in any case, if an
> attribute is used, it probably should affect unnamed_addr directly instead
> of being called *icf*.
>
>
>
> >On Mon, Mar 22, 2021 at 6:16 PM Philip Reames via llvm-dev <
> >llvm-dev at lists.llvm.org> wrote:
> >
> >> Can you define ICF please? And give a bit of context?
> >>
> >> Philip
> >> On 3/22/21 5:27 PM, Zequan Wu via llvm-dev wrote:
> >>
> >> Hi all,
> >>
> >> Background:
> >> Debugging with ICF has long been difficult. Programmers have no control
> >> over which sections ICF folds and which it leaves alone. The existing
> >> address-significance table has no effect on code sections in all-ICF mode
> >> in either ld.lld or lld-link. Switching to safe ICF would keep such code
> >> sections unique, but at the cost of an uncontrolled increase in binary
> >> size. So it would be good if programmers could selectively disable ICF in
> >> source code by annotating global functions/variables with an attribute,
> >> improving the debugging experience while keeping control over the binary
> >> size increase.
> >>
> >> My plan is to add a new section table (`.no_icf`) to object files.
> >> Sections of all symbols in the table should not be folded in all-ICF
> >> mode, and symbols can only be added to the table by annotating global
> >> functions/variables with a new attribute (`no_icf`) in source code.
> >>
> >> What do you think about this approach?
> >>
> >> Thanks,
> >> Zequan
> >>
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
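The folding rule quoted above (identical content, plus relocation sets that cannot be told apart) can be sketched in C++ as a small partition refinement: assume every eligible section is identical, then keep splitting groups until the members of each group agree on content, relocation offsets, and the groups their relocations point to. This is a minimal sketch under stated assumptions, not LLD's implementation: `Section`, `Reloc`, `IcfMode`, and `foldIdenticalSections` are hypothetical names, relocation types and addends are ignored, and the `noIcf` flag models the `.no_icf` table, which is only a proposal in this thread. Following the thread's description, `.addrsig` pins a section only in safe mode, while `.no_icf` would pin it in both modes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical, simplified model of an input section (not LLD's types).
// Real linkers also compare relocation types, addends, and section flags.
struct Reloc {
  std::size_t offset;  // offset of the relocation within the section
  int target;          // index of the referenced section
};

struct Section {
  std::string name;
  std::vector<std::uint8_t> content;
  std::vector<Reloc> relocs;
  bool addrsig = false;  // address-significant: listed in .addrsig
  bool noIcf = false;    // hypothetical: listed in the proposed .no_icf table
};

enum class IcfMode { Safe, All };

// .no_icf pins a section in both modes; .addrsig pins it only in safe mode.
bool isEligible(const Section &s, IcfMode mode) {
  if (s.noIcf) return false;
  return !(mode == IcfMode::Safe && s.addrsig);
}

// Returns rep, where rep[i] is the section that section i is folded into
// (rep[i] == i when section i is kept).
std::vector<int> foldIdenticalSections(const std::vector<Section> &secs,
                                       IcfMode mode) {
  const int n = static_cast<int>(secs.size());
  // Optimistic start: all eligible sections share class 0, and every
  // ineligible section gets a class of its own.
  std::vector<int> cls(n);
  for (int i = 0; i < n; ++i) cls[i] = isEligible(secs[i], mode) ? 0 : i + 1;

  // Split classes by content, relocation offsets, and the classes of the
  // relocation targets. Keying on the old class means classes only split,
  // so an unchanged class count means a fixed point has been reached.
  using RelocSig = std::vector<std::pair<std::size_t, int>>;
  using Key = std::tuple<int, std::vector<std::uint8_t>, RelocSig>;
  std::size_t numClasses = std::set<int>(cls.begin(), cls.end()).size();
  for (;;) {
    std::map<Key, int> ids;
    std::vector<int> next(n);
    for (int i = 0; i < n; ++i) {
      RelocSig sig;
      for (const Reloc &r : secs[i].relocs) {
        assert(r.target >= 0 && r.target < n);
        sig.emplace_back(r.offset, cls[r.target]);
      }
      Key key{cls[i], secs[i].content, std::move(sig)};
      next[i] = ids.emplace(std::move(key), static_cast<int>(ids.size()))
                    .first->second;
    }
    cls = std::move(next);
    if (ids.size() == numClasses) break;
    numClasses = ids.size();
  }

  // The lowest-indexed member of each class is its representative. A linker
  // would drop the other members and redirect references to them there.
  std::map<int, int> firstOfClass;
  std::vector<int> rep(n);
  for (int i = 0; i < n; ++i)
    rep[i] = firstOfClass.emplace(cls[i], i).first->second;
  return rep;
}
```

With several sections that contain the same `call abort` bytes and a single relocation to the same `abort` section, `IcfMode::All` folds them into one, `IcfMode::Safe` keeps any of them marked `addrsig`, and one marked `noIcf` survives in both modes. A section with the same bytes whose relocation points at a different function also stays separate; that is roughly why Chromium's static-variable trick works, since the extra reference to a unique variable makes the function's relocation set distinguishable.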
Fangrui Song via llvm-dev
2021-Mar-23 18:50 UTC
On 2021-03-23, Zequan Wu wrote:
>If an attribute is used and it affects unnamed_addr of a symbol, it
>determines whether the symbols should show up in the .addrsig table.
>All-ICF mode in ld.lld and lld-link ignore symbols in the .addrsig table,
>if they belong to code sections. So, it won't have an effect on disabling
>ICF.

Is there something that can be improved in lld-link?

Note also that the size difference between gold --icf=safe and gold --icf=all (about 0.28 MB of text for the 'lld' executable) is smaller than the difference between ld.lld --icf=safe and ld.lld --icf=all (about 1.18 MB). Some of this may be due to gold performing "unsafe" safe ICF; some may point to places where clang fails to add {,local_}unnamed_addr.
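The {,local_}unnamed_addr question comes down to the C++ guarantee, mentioned earlier in the thread, that distinct functions have distinct addresses. The sketch below compares the addresses of two functions whose bodies typically compile to identical code; the function names are made up for illustration. Built and linked normally, the comparison must be true. In a binary linked with ld.lld --icf=all, the two sections may be folded, and then it can be false. Keeping such a comparison correct is what listing address-significant symbols in .addrsig, and excluding them from safe ICF, is meant to protect.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical example functions: their bodies typically compile to identical
// machine code, so --icf=all may fold them into a single section.
[[noreturn]] void crash_in_module_a() { std::abort(); }
[[noreturn]] void crash_in_module_b() { std::abort(); }

// Reading the addresses through volatile objects keeps the compiler from
// constant-folding the comparison, so the result reflects the final link.
bool addresses_are_distinct() {
  using Fn = void (*)();
  Fn volatile a = &crash_in_module_a;
  Fn volatile b = &crash_in_module_b;
  return a != b;  // C++ requires true; an --icf=all link may make it false
}
```

Asserting `addresses_are_distinct()` in a normally linked program should succeed; running the same check against an --icf=all link is one way to observe the folding the thread is concerned about.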