Gaël Jobin via llvm-dev
2020-Sep-15 20:32 UTC
[llvm-dev] [ELF] String literals don't obey -fdata-sections
Hi there,

When I compile my code with -fdata-sections and -ffunction-sections, I still see unused strings in my shared library (Android). The strings appear together inside a single .rodata.str1.1 section instead of each getting its own section. It seems that C string literals are treated differently from other constants and that -fdata-sections is not respected in https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp#L799 [1].

I came across the following GCC bug, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192, where the issue was fixed back in 2015. Is there any reason not to do the same in LLVM?

My code example:

- static library 1: exposes functions api1() and api3()

> #include "lib1.h"
>
> static char *test = "Test";
> static char *unused = "Unused";
>
> void api1(){
>     printf(test);
> }
>
> void api3(){
>     printf(unused);
> }

- shared library: uses only function api1() from static library 1

> #include "lib1.h"
>
> void test(){
>     api1();
> }

Both were compiled with "-fdata-sections -ffunction-sections -fvisibility=hidden" and linked with "--gc-sections".

While the api3() function is correctly gone, the result for the C strings is the following (in Hopper):

> ; Section .rodata.str1.1
> ; Range: [0x63; 0x6f[ (12 bytes)
> ; File offset : [151; 163[ (12 bytes)
> ; Flags: 0x32
> ; SHT_PROGBITS
> ; SHF_ALLOC
>
> .L.str:
> 00000063 db "Test", 0
>
> .L.str.1:
> 00000068 db "Unused", 0

Links:
------
[1] https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp#L799
Fangrui Song via llvm-dev
2020-Sep-15 22:18 UTC
[llvm-dev] [ELF] String literals don't obey -fdata-sections
On 2020-09-15, Gaël Jobin via llvm-dev wrote:
>Hi there,
>
>When I compile my code with -fdata-sections and -ffunction-sections, I
>still see unused strings in my shared library (Android). The strings
>appear together inside a single .rodata.str1.1 section instead of each
>getting its own section. It seems that C string literals are treated
>differently from other constants and that -fdata-sections is not
>respected in
>https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp#L799
>[1].
>
>I came across the following GCC bug,
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192, where the issue was
>fixed back in 2015. Is there any reason not to do the same in LLVM?

Usually it is because nobody has noticed the problem, or nobody has been motivated enough to fix it, not because a problem is intentionally left open :) I took some time to look at this and concluded that clang should do nothing here. In fact, with clang's current behavior you can discard "Unused" if you link with LLD. Read on.

>My code example:
>
>- static library 1: exposes functions api1() and api3()
>
>>#include "lib1.h"
>>
>>static char *test = "Test";
>>static char *unused = "Unused";
>>
>>void api1(){
>>    printf(test);
>>}
>>
>>void api3(){
>>    printf(unused);
>>}
>
>- shared library: uses only function api1() from static library 1
>
>>#include "lib1.h"
>>
>>void test(){
>>    api1();
>>}
>
>Both were compiled with "-fdata-sections -ffunction-sections
>-fvisibility=hidden" and linked with "--gc-sections".
>
>While the api3() function is correctly gone, the result for the
>C strings is the following (in Hopper):
>
>>; Section .rodata.str1.1
>>; Range: [0x63; 0x6f[ (12 bytes)
>>; File offset : [151; 163[ (12 bytes)
>>; Flags: 0x32
>>; SHT_PROGBITS
>>; SHF_ALLOC
>>
>>.L.str:
>>00000063 db "Test", 0
>>
>>.L.str.1:
>>00000068 db "Unused", 0
>
>Links:
>------
>[1]
>https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp#L799

In GCC, -O turns on -fmerge-constants. Clang does not implement this option, but it does implement the level 2 option, -fmerge-all-constants, which is non-conforming ("Languages like C or C++ require each variable, including multiple instances of the same variable in recursive calls, to have distinct locations, so using this option results in non-conforming behavior.").

With (-fmerge-constants or -fmerge-all-constants) and -fdata-sections, string literals are placed in .rodata.xxx.str1.1 (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192#c16). This is, however, suboptimal because the cost of a section header (sizeof(Elf64_Shdr)=64) plus a section name (".rodata.xxx.str1.1") is quite large.

I have replied on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192#c19 and created a GNU ld feature request (https://sourceware.org/bugzilla/show_bug.cgi?id=26622).
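The overhead argument above can be made concrete with a little arithmetic. The following is only a sketch: struct shdr64 is a hand-written mirror of the ELF64 section header layout (so the example does not depend on <elf.h>), the helper names section_overhead() and literal_size() are made up for illustration, and ".rodata.api3.str1.1" is a hypothetical stand-in for the ".rodata.xxx.str1.1" pattern. It also ignores any sharing of name suffixes inside the section name string table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field-for-field mirror of the ELF64 section header (illustrative;
 * a real tool would use Elf64_Shdr from <elf.h>). */
struct shdr64 {
    uint32_t sh_name;
    uint32_t sh_type;
    uint64_t sh_flags;
    uint64_t sh_addr;
    uint64_t sh_offset;
    uint64_t sh_size;
    uint32_t sh_link;
    uint32_t sh_info;
    uint64_t sh_addralign;
    uint64_t sh_entsize;
};

_Static_assert(sizeof(struct shdr64) == 64,
               "ELF64 section header is 64 bytes");

/* Metadata bytes added by one extra section: the section header plus
 * the NUL-terminated section name. */
size_t section_overhead(const char *section_name)
{
    assert(section_name != NULL);
    return sizeof(struct shdr64) + strlen(section_name) + 1;
}

/* Payload bytes of a string literal, including its terminator. */
size_t literal_size(const char *literal)
{
    assert(literal != NULL);
    return strlen(literal) + 1;
}
```

Under these assumptions, section_overhead(".rodata.api3.str1.1") is 84 bytes of metadata, while literal_size("Unused") is only 7 bytes of payload, which is the imbalance the reply is pointing at.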
Gaël Jobin via llvm-dev
2020-Sep-16 13:14 UTC
[llvm-dev] [ELF] String literals don't obey -fdata-sections
On 2020-09-16 00:18, Fangrui Song wrote:
> Usually it is because nobody has noticed the problem, or nobody has
> been motivated enough to fix it, not because a problem is intentionally
> left open :) I took some time to look at this and concluded that clang
> should do nothing here. In fact, with clang's current behavior you can
> discard "Unused" if you link with LLD. Read on.

Sorry if I misspoke; I was not suggesting that the bug was known and deliberately left unfixed out of laziness ;-). I was sure there was a valid reason and wanted to know what it was. As you explained, it appears that LLVM relies on LLD to do this instead of enforcing it in the middle-end, which is a different approach from GCC's.

> In GCC, -O turns on -fmerge-constants. Clang does not implement this
> option, but it does implement the level 2 option, -fmerge-all-constants,
> which is non-conforming ("Languages like C or C++ require each variable,
> including multiple instances of the same variable in recursive calls,
> to have distinct locations, so using this option results in
> non-conforming behavior.").

Non-conforming in the sense of the C/C++ standard? How is that related to the -fdata-sections implementation?

> With (-fmerge-constants or -fmerge-all-constants) and -fdata-sections,
> string literals are placed in .rodata.xxx.str1.1 (see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192#c16). This is, however,
> suboptimal because the cost of a section header
> (sizeof(Elf64_Shdr)=64) plus a section name (".rodata.xxx.str1.1") is
> quite large.
>
> I have replied on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192#c19 and
> created a GNU ld feature request
> (https://sourceware.org/bugzilla/show_bug.cgi?id=26622).

In my example, LLVM/Clang already puts the pointers "test" and "unused" in different data sections because of -fdata-sections, as seen below.

> ; Segment unnamed segment
> ; Range: [0x5c; 0x64[ (8 bytes)
> ; File offset : [144; 152[ (8 bytes)
> ; Permissions: -
>
> ; Section .data.test
> ; Range: [0x5c; 0x60[ (4 bytes)
> ; File offset : [144; 148[ (4 bytes)
> ; Flags: 0x3
> ; SHT_PROGBITS
> ; SHF_WRITE
> ; SHF_ALLOC
>
> test:
> 0000005c dd 0x00000063
>
> ; Section .data.unused
> ; Range: [0x60; 0x64[ (4 bytes)
> ; File offset : [148; 152[ (4 bytes)
> ; Flags: 0x3
> ; SHT_PROGBITS
> ; SHF_WRITE
> ; SHF_ALLOC
>
> unused:
> 00000060 dd 0x00000068

So I am not sure I understand the point about sub-optimality here, since it is already the case for the .data section, where each variable incurs the same cost of a section header. How is C-string data different? I mean, the concept of -fdata-sections/-ffunction-sections ("one section for each data item/function") should be the same for every kind of data, no?
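One way C-string data can differ: the Flags value 0x32 in the first dump is SHF_ALLOC | SHF_MERGE | SHF_STRINGS, and a section flagged that way can be cut at every NUL terminator, so a linker is free to treat each string as its own piece and drop the unreferenced ones without needing one section per string. The sketch below only illustrates that idea; it is not LLD's actual code, and the function name keep_live_strings() and its interface are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy into `out` only the NUL-terminated pieces of `sec` that contain
 * at least one referenced offset, and return the number of bytes kept.
 * `out` must have room for `size` bytes. Hypothetical helper. */
size_t keep_live_strings(const char *sec, size_t size,
                         const size_t *refs, size_t nrefs, char *out)
{
    assert(sec != NULL && out != NULL);
    size_t kept = 0;
    size_t start = 0;
    while (start < size) {
        /* A piece runs from `start` through its terminating NUL
         * (or to the end of the section if no NUL is found). */
        const char *nul = memchr(sec + start, '\0', size - start);
        size_t end = nul ? (size_t)(nul - sec) + 1 : size;

        /* The piece is live if any reference points into it. */
        int live = 0;
        for (size_t i = 0; i < nrefs; i++) {
            if (refs[i] >= start && refs[i] < end) {
                live = 1;
                break;
            }
        }
        if (live) {
            memcpy(out + kept, sec + start, end - start);
            kept += end - start;
        }
        start = end;
    }
    return kept;
}
```

With the 12-byte contents "Test\0Unused\0" from the dump and a single reference to offset 0 (the one reachable from api1()), only the 5 bytes of "Test\0" are kept.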