Fangrui Song via llvm-dev
2021-Mar-23 05:18 UTC
[llvm-dev] [RFC] Annotating global functions and variables to prevent ICF during linking
On 2021-03-22, David Blaikie via llvm-dev wrote:
>ICF: Identical Code Folding
>
>Linker deduplicates functions by collapsing any identical functions
>together - with icf=safe, the linker looks at a .addressing section in the
>object file and any functions listed in that section are not treated as
>collapsible (eg: because they need to meet C++'s "distinct functions have
>distinct addresses" guarantee)

The name originated from MSVC link.exe where icf stands for "identical
COMDAT folding".
gold named it "identical code folding" - which makes some sense because
gold does not fold readonly data.

In LLD, the name is not accurate for two reasons: (1) the feature can
apply to readonly data as well; (2) the folding is by section, not by
function.

We define identical sections as sections that have identical content and
whose outgoing relocation sets cannot be distinguished: they need to have
the same number of relocations, with the same relative locations, with the
referenced symbols indistinguishable.

Then, ld.lld --icf={safe,all} works like this:

For a set of identical sections, the linker picks one representative and
drops the rest, then redirects references to the representative.

Note: this can confuse debuggers/symbolizers/profilers easily.

lld-link /opt:icf is different from ld.lld --icf but I haven't looked
into it closely.
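
The folding step described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not LLD's code: the `Section` and `Reloc` types, the `keepUnique` flag and the `foldIdentical` function are hypothetical names, referenced symbols are compared by name only, and the grouping is done in a single pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of an input section (not LLD's real types).
struct Reloc {
  std::size_t offset;  // location of the relocation within the section
  std::string target;  // name of the referenced symbol
};

struct Section {
  std::string name;
  std::vector<std::uint8_t> content;
  std::vector<Reloc> relocs;
  bool keepUnique = false;  // e.g. address-significant under --icf=safe
};

// Returns, for each section, the index of the representative it folds into.
// A section that is not folded maps to its own index.
std::vector<std::size_t> foldIdentical(const std::vector<Section>& sections) {
  using RelocKey = std::vector<std::pair<std::size_t, std::string>>;
  using Key = std::pair<std::vector<std::uint8_t>, RelocKey>;

  std::map<Key, std::size_t> representative;
  std::vector<std::size_t> foldedInto(sections.size());

  for (std::size_t i = 0; i < sections.size(); ++i) {
    const Section& s = sections[i];
    foldedInto[i] = i;
    if (s.keepUnique) continue;  // never folded, never a representative

    RelocKey relocKey;
    for (const Reloc& r : s.relocs) relocKey.emplace_back(r.offset, r.target);

    // The first section seen with a given (content, relocations) key becomes
    // the representative; later identical sections are redirected to it.
    auto result = representative.emplace(Key{s.content, relocKey}, i);
    if (!result.second) foldedInto[i] = result.first->second;
  }
  return foldedInto;
}
```

With two sections holding the same bytes and no relocations, the second one maps to the index of the first, while a section calling `abort` and one calling `exit` stay separate because their relocation targets differ. A real linker has to do more than this, for example treating two referenced symbols as indistinguishable when their own sections end up folded together.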

I find that the feature's saving is small given its downside
(it also increases link time: LLD's current implementation is inferior in
that it performs a quadratic number of comparisons within an equivalence
class):

These are the size differences for the 'lld' executable:

% size lld.{none,safe,all}
    text    data    bss       dec     hex filename
96821040 7210504 550810 104582354 63bccd2 lld.none
95217624 7167656 550810 102936090 622ae1a lld.safe
94038808 7167144 550810 101756762 610af5a lld.all
% size gold.{none,safe,all}
    text    data    bss       dec     hex filename
96857302 7174792 550825 104582919 63bcf07 gold.none
94469390 7174792 550825 102195007 6175f3f gold.safe
94184430 7174792 550825 101910047 613061f gold.all

Note that the --icf=all result caps the potential saving of the proposed
annotation.

Actually with some large internal targets I get even smaller savings.

ld.lld --icf=safe is safer than gold --icf=safe but probably misses some
opportunities.
It can be that clang codegen/optimizer fails to mark some cases as
{,local_}unnamed_addr.

I know Chromium and the Windows world can be different:) But I'd still
want to get some numbers first.

Last, I have seen that Chromium has some code like
https://source.chromium.org/chromium/chromium/src/+/master:skia/ext/SkMemory_new_handler.cpp

  void sk_abort_no_print() {
    // Linker's ICF feature may merge this function with other functions with
    // the same definition (e.g. any function whose sole job is to call abort())
    // and it may confuse the crash report processing system.
    // http://crbug.com/860850
    static int static_variable_to_make_this_function_unique = 0x736b;  // "sk"
    base::debug::Alias(&static_variable_to_make_this_function_unique);

    abort();
  }

If we want an approach to work with link.exe, I don't know what we can
do...
If there is no desire for link.exe compatibility, I can see that having a
proper way of marking the function can be useful...
but in any case if an attribute is used, it probably should affect
unnamed_addr directly instead of being called *icf*.

>On Mon, Mar 22, 2021 at 6:16 PM Philip Reames via llvm-dev <
>llvm-dev at lists.llvm.org> wrote:
>
>> Can you define ICF please?  And give a bit of context?
>>
>> Philip
>> On 3/22/21 5:27 PM, Zequan Wu via llvm-dev wrote:
>>
>> Hi all,
>>
>> Background:
>> It's been a longstanding difficulty of debugging with ICF. Programmers
>> don't have control over which sections should be folded by ICF, which
>> sections shouldn't. The existing address significant table won't have
>> effect for code sections during all ICF mode in both ld.lld and lld-link.
>> By switching to safe ICF could mark code sections as unique, but at a cost
>> of increasing binary size out of control. So, it would be good if
>> programmers could selectively disable ICF in source code by annotating
>> global functions/variables with an attribute to improve debugging
>> experience and have the control on the binary size increase.
>>
>> My plan is to add a new section table(`.no_icf`) to object files. Sections
>> of all symbols inside the table should not be folded by all ICF mode. And
>> symbols can only be added into the table by annotating global
>> functions/variables with a new attribute(`no_icf`) in source code.
>>
>> What do you think about this approach?
>>
>> Thanks,
>> Zequan
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
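
For readers outside Chromium, the sk_abort_no_print workaround quoted earlier in this message can be approximated in a standalone form. The sketch below is an assumption-laden illustration: `alias` is a hypothetical stand-in for base::debug::Alias (the real one is implemented differently), and the two function names are made up. The idea is only that each body references its own static variable, so the two sections carry relocations against different symbols and no longer look identical to the linker.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for Chromium's base::debug::Alias. The volatile
// store forces the compiler to materialize the address passed in, so the
// reference to the static variable is not optimized away.
inline void alias(const void* p) {
  const void* volatile sink = p;
  (void)sink;
}

// Two functions whose "sole job is to call abort()". Without the static
// variables their bodies would be byte-for-byte identical and foldable.
[[noreturn]] void abort_from_parser() {
  static int unique_tag = 0x7061;  // "pa"
  alias(&unique_tag);
  std::abort();
}

[[noreturn]] void abort_from_decoder() {
  static int unique_tag = 0x6465;  // "de"
  alias(&unique_tag);
  std::abort();
}
```

This relies on the statics living in writable data, which the linkers discussed here do not fold; it is a per-function trick rather than the kind of annotation the RFC asks for.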
Zequan Wu via llvm-dev
2021-Mar-23 18:41 UTC
[llvm-dev] [RFC] Annotating global functions and variables to prevent ICF during linking
The size increase of chrome on Linux by switching from all ICF to safe ICF
is small.

All ICF:
     text    data     bss       dec     hex filename
169314343 8472660 2368965 180155968 abcf640 chrome

Safe ICF:
     text    data     bss       dec     hex filename
174521550 8497604 2368965 185388119 b0ccc57 chrome

On Windows, chrome.dll increases in size by around 14 MB (a 12 MB increase
in the .text section).

All ICF:
Size of out\Default\chrome.dll is 170.715648 MB
      name:   mem size  ,  disk size
     .text: 141.701417 MB
    .rdata:  22.458476 MB
     .data:   3.093948 MB,   0.523264 MB
    .pdata:   4.412364 MB
    .00cfg:   0.000040 MB
  .gehcont:   0.000132 MB
  .retplne:   0.000108 MB
   .rodata:   0.004544 MB
      .tls:   0.000561 MB
  CPADinfo:   0.000056 MB
    _RDATA:   0.000244 MB
     .rsrc:   0.285232 MB
    .reloc:   1.324196 MB

Safe ICF:
Size of out\icf-safe\chrome.dll is 184.499712 MB
      name:   mem size  ,  disk size
     .text: 153.809529 MB
    .rdata:  23.123628 MB
     .data:   3.093948 MB,   0.523264 MB
    .pdata:   5.367396 MB
    .00cfg:   0.000040 MB
  .gehcont:   0.000132 MB
  .retplne:   0.000108 MB
   .rodata:   0.004544 MB
      .tls:   0.000561 MB
  CPADinfo:   0.000056 MB
    _RDATA:   0.000244 MB
     .rsrc:   0.285232 MB
    .reloc:   1.379364 MB

If an attribute is used and it affects unnamed_addr of a symbol, it
determines whether the symbol should show up in the .addrsig table.
All-ICF mode in ld.lld and lld-link ignores symbols in the .addrsig table
if they belong to code sections. So, it won't have an effect on disabling
ICF.
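
To put the chrome figures from this message side by side, the growth from all ICF to safe ICF can be worked out with a few lines of arithmetic. The sketch below only restates numbers quoted in the message; the `SizeDelta` type, the `growth` helper and the variable names are hypothetical and introduced for illustration.

```cpp
#include <cassert>
#include <cmath>

// Absolute growth and growth as a percentage of the starting size.
struct SizeDelta {
  double absolute;
  double percent;
};

SizeDelta growth(double before, double after) {
  return {after - before, (after - before) / before * 100.0};
}

// Linux: `text` column of the `size` output, in bytes.
const SizeDelta linuxText = growth(169314343.0, 174521550.0);

// Windows: total chrome.dll size and .text section size, in MB.
const SizeDelta winTotal = growth(170.715648, 184.499712);
const SizeDelta winText = growth(141.701417, 153.809529);
```

Under these inputs the Linux text segment grows by 5207207 bytes, a little over 3%, while chrome.dll grows by about 13.8 MB overall and about 12.1 MB in .text, both a little over 8%.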