Gaël Jobin via llvm-dev
2020-Sep-16 13:14 UTC
[llvm-dev] [ELF] String literals don't obey -fdata-sections
On 2020-09-16 00:18, Fangrui Song wrote:
> Usually it is because nobody has noticed the problem or nobody is
> motivated enough to fix the problems, not that they intentionally leave
> a problem open:) I took some time to look at the problem and conclude
> that clang should do nothing on this. Actually, with the clang behavior,
> you can discard "Unused" if you use LLD. Read on.

Sorry if I misspoke; I was not suggesting that the bug was known and deliberately left unfixed out of laziness ;-). I was sure there was a valid reason and wanted to know what it was. As you explained, it appears that LLVM relies on LLD to do this instead of enforcing it in the middle-end, which is a different approach from GCC's.

> In GCC, -O turns on -fmerge-constants. Clang does not implement this
> option, but implement the level 2 -fmerge-all-constants, which is
> non-conforming ("Languages like C or C++
> require each variable, including multiple instances of the same variable
> in recursive calls, to have distinct locations, so using this option
> results in non-conforming behavior.").

Non-conforming in the sense of the C/C++ standard? How is that related to the -fdata-sections implementation?

> With (-fmerge-constants or -fmerge-all-constants) & -fdata-sections,
> string literals are placed in .rodata.xxx.str1.1
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192#c16
> This is, however, suboptimal because the cost of a section header
> (sizeof(Elf64_Shdr)=64) + a section name (".rodata.xxx.str1.1") is quite large.
> I have replied on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192#c19 and
> created a GNU ld feature request
> (https://sourceware.org/bugzilla/show_bug.cgi?id=26622)

In my example, LLVM/Clang already puts the two pointers "test" and "unused" in different data sections because of -fdata-sections, as seen below.

> ; Segment unnamed segment
> ; Range: [0x5c; 0x64[ (8 bytes)
> ; File offset : [144; 152[ (8 bytes)
> ; Permissions: -
>
> ; Section .data.test
> ; Range: [0x5c; 0x60[ (4 bytes)
> ; File offset : [144; 148[ (4 bytes)
> ; Flags: 0x3
> ; SHT_PROGBITS
> ; SHF_WRITE
> ; SHF_ALLOC
>
> test:
>
> 0000005c dd 0x00000063
>
> ; Section .data.unused
> ; Range: [0x60; 0x64[ (4 bytes)
> ; File offset : [148; 153[ (4 bytes)
> ; Flags: 0x3
> ; SHT_PROGBITS
> ; SHF_WRITE
> ; SHF_ALLOC
>
> unused:
>
> 00000060 dw 0x00000070

So I am not sure I understand the point about sub-optimality here. The same cost already applies to the .data sections, where each variable pays the price of its own section header. How are C-string-like data different? I mean, the concept of -fdata-sections/-ffunction-sections ("one section for each data item/function") should be the same for every kind of data, no?
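To put a rough number on the cost quoted above, here is a minimal sketch in C. It takes the 64-byte ELF64 section header size and the two section names from the thread, and counts one header plus the NUL-terminated name as the overhead of a dedicated section. The `SectionHeader64` struct and the `section_overhead` helper are hypothetical names introduced for illustration; the struct mirrors the ELF64 section header fields so the sketch does not depend on `<elf.h>`. The estimate ignores relocation sections, alignment padding, and any suffix sharing in the section name string table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the ELF64 section header fields (hypothetical type name). */
typedef struct {
    uint32_t sh_name;
    uint32_t sh_type;
    uint64_t sh_flags;
    uint64_t sh_addr;
    uint64_t sh_offset;
    uint64_t sh_size;
    uint32_t sh_link;
    uint32_t sh_info;
    uint64_t sh_addralign;
    uint64_t sh_entsize;
} SectionHeader64;

/* Matches the sizeof(Elf64_Shdr)=64 figure quoted in the thread. */
static_assert(sizeof(SectionHeader64) == 64,
              "ELF64 section header should be 64 bytes");

/* Hypothetical helper: approximate bytes of metadata added to an object
   file by one extra section: its header plus its NUL-terminated name. */
size_t section_overhead(const char *section_name) {
    return sizeof(SectionHeader64) + strlen(section_name) + 1;
}
```

Under these assumptions, `section_overhead(".rodata.xxx.str1.1")` comes to 83 bytes and `section_overhead(".data.test")` to 75 bytes, so in both cases the metadata is much larger than a 4-byte pointer or a short string literal. That is the ratio the question above is about.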
Fangrui Song via llvm-dev
2020-Sep-16 17:42 UTC
[llvm-dev] [ELF] String literals don't obey -fdata-sections
On 2020-09-16, Gaël Jobin wrote:
> In my example, LLVM/Clang already put both pointer "test" and "unused"
> in different data section because of "-fdata-sections" as seen below.

Your example uses the global mutable variables "test" and "unused", which is why they are in the .data.* sections. They are initialized to the addresses of string literals in .rodata.*. The .rodata.* sections are what we care about, not .data.* (.data.* can always be correctly garbage collected by GNU ld/gold/LLD).
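The distinction between the pointer variables and the literals they point to can be sketched in C as follows. The original test case is not reproduced in these messages, so this is an assumed reconstruction: the variable names `test` and `unused` and the string "Unused" come from the thread, while the string "Test" and the accessor `test_length` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Mutable global pointers. With -fdata-sections each is expected to get
   its own writable section, like the .data.test and .data.unused
   sections in the dump quoted in the thread. */
const char *test = "Test";     /* literal content is an assumption */
const char *unused = "Unused";

/* The string literals are separate read-only objects. Per the thread,
   they end up in a .rodata.* string section rather than next to the
   pointers, and that section is the one under discussion. */

/* Hypothetical accessor so that only `test` is referenced by code. */
size_t test_length(void) {
    return strlen(test);
}
```

Compiling a file like this with `clang -fdata-sections -c` and listing the sections with `readelf -S` should show where the pointers and the literals land; the exact section names depend on the target and compiler version.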
Gaël Jobin via llvm-dev
2020-Sep-17 08:38 UTC
[llvm-dev] [ELF] String literals don't obey -fdata-sections
On 2020-09-16 19:42, Fangrui Song wrote:
> Your example uses global mutable variables "test" and "unused" and that
> is why they are in the .data.* sections. They are initialized to
> addresses of string literals in .rodata.* . .rodata.* are what we care
> about, not .data.* (.data.* can always be correctly garbage collected by
> GNU ld/gold/LLD).

Of course, the issue here is .rodata.*. I used the .data.* sections as a counterexample, but it could be any section. I compared the two because both contain small data items, so the ratio of section header size to data size is poor in both cases.

But my point is: why should the implementation of -fdata-sections differ between the .data.* and .rodata.* sections? In other words, why should .rodata.* be treated differently? If the only reason is that the additional section header makes it suboptimal, that is not a valid reason. Putting everything in its own section is the whole purpose of the -f*-sections options, and it is what allows the linker to easily strip unused items. I really don't get the exception made for .rodata.* here.