Sriraman Tallam via llvm-dev
2017-May-08 20:55 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
+llvm-dev

Discussion here: https://sourceware.org/ml/gnu-gabi/2017-q2/msg00000.html

On Tue, May 2, 2017 at 10:17 AM, Suprateeka R Hegde <hegdesmailbox at gmail.com> wrote:
> On 02-May-2017 12:05 AM, Florian Weimer wrote:
>> On 05/01/2017 08:28 PM, Suprateeka R Hegde wrote:
>>> So the ratio shows ~96% is RELATIVE reloc. And only ~4% others. This is
>>> not the case on HP-UX/Itanium. But as I said, this comparison does not
>>> make sense as the runtime architecture and ISA are totally different.
>>
>> It could be that HP-UX was written in a way to reduce relative
>> relocations,
>
> Rather, the Itanium runtime architecture itself provides a way to reduce
> them.
>
>> or that the final executables aren't actually PIC anymore.
>
> I was referring to shlibs (PIC) on HP-UX and it was implicit in my mind.
> Sorry for that.
>
> I just built a large C++ shlib both on HP-UX/Itanium with our aCC
> compiler and Linux x86-64 using GCC-6.2. The sources are almost same
> with only a couple of lines differing between platforms.
>
> (HP-UX/Linux)
> Total: 12224/38311
> RELATIVE: 18/6397
>
> I will try to check the reason for such a huge difference in RELATIVE
> reloc count. It might be useful for this discussion (just a guess)
>
> --
> Supra
Rahul Chaudhry via llvm-dev
2017-Dec-07 22:51 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
Sri and I have been working on this over the past few months, and we've made some good progress that we'd like to share and get feedback on.

Our work is based on the 'experimental-relr' prototype from Cary that is available at 'users/ccoutant/experimental-relr' branch in the binutils repository [1], and was described earlier in this thread: https://sourceware.org/ml/gnu-gabi/2017-q2/msg00003.html

We've taken the '.relr.dyn' section from Cary's prototype, and implemented a custom encoding to compactly represent the list of offsets. We're calling the new compressed section '.relrz.dyn' (for relocations-relative-compressed).

The encoding used is a simple combination of delta-encoding and a bitmap of offsets. The section consists of 64-bit entries: higher 8-bits contain delta since last offset, and lower 56-bits contain a bitmap for which words to apply the relocation to. This is best described by showing the code for decoding the section:

    typedef struct
    {
      Elf64_Xword r_data;  /* jump and bitmap for relative relocations */
    } Elf64_Relrz;

    #define ELF64_R_JUMP(val)    ((val) >> 56)
    #define ELF64_R_BITS(val)    ((val) & 0xffffffffffffff)

    #ifdef DO_RELRZ
      {
        ElfW(Addr) offset = 0;
        for (; relative < end; ++relative)
          {
            ElfW(Addr) jump = ELFW(R_JUMP) (relative->r_data);
            ElfW(Addr) bits = ELFW(R_BITS) (relative->r_data);
            offset += jump * sizeof(ElfW(Addr));
            if (jump == 0)
              {
                ++relative;
                offset = relative->r_data;
              }
            ElfW(Addr) r_offset = offset;
            for (; bits != 0; bits >>= 1)
              {
                if ((bits&1) != 0)
                  elf_machine_relrz_relative (l_addr, (void *) (l_addr + r_offset));
                r_offset += sizeof(ElfW(Addr));
              }
          }
      }
    #endif

Note that the 8-bit 'jump' encodes the number of _words_ since last offset. The case where jump would not fit in 8-bits is handled by setting jump to 0, and emitting the full offset for the next relocation in the subsequent entry.

The above code is the entirety of the implementation for decoding and processing '.relrz.dyn' sections in glibc dynamic loader.
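To make the entry layout concrete, here is a minimal standalone sketch of the 64-bit format described above. The decoder mirrors the loop from the message but collects offsets into an array instead of applying relocations. The encoder is not from the thread: it is one possible greedy scheme, assumed here only for illustration, and the names relrz_encode, relrz_decode, WORD, JUMP_MAX and BITS_MASK are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD      8u                     /* size of a 64-bit address in bytes */
#define JUMP_MAX  255u                   /* largest delta that fits in 8 bits */
#define BITS_MASK 0x00ffffffffffffffULL  /* low 56 bits hold the bitmap */

/* Hypothetical greedy encoder (an assumption, not the authors' linker code).
   offs[] must be strictly increasing and word-aligned; out[] needs room for
   2 * n entries. Returns the number of 64-bit entries written. */
static size_t relrz_encode(const uint64_t *offs, size_t n, uint64_t *out)
{
    size_t k = 0, i = 0;
    uint64_t prev_base = 0;

    while (i < n) {
        uint64_t base = offs[i];
        uint64_t jump = (base - prev_base) / WORD;
        uint64_t bits = 0;

        /* Fold every offset within 56 words of base into one bitmap. */
        while (i < n && (offs[i] - base) / WORD < 56) {
            assert(offs[i] % WORD == 0);
            bits |= 1ULL << ((offs[i] - base) / WORD);
            i++;
        }
        if (jump >= 1 && jump <= JUMP_MAX) {
            out[k++] = (jump << 56) | bits;
        } else {
            /* Escape: jump field is 0 and the full offset follows. */
            out[k++] = bits;
            out[k++] = base;
        }
        prev_base = base;
    }
    return k;
}

/* Decoder following the loop quoted in the message; writes each decoded
   offset to out[] and returns how many were produced. */
static size_t relrz_decode(const uint64_t *in, size_t k, uint64_t *out)
{
    size_t n = 0;
    uint64_t offset = 0;

    for (size_t j = 0; j < k; j++) {
        uint64_t jump = in[j] >> 56;
        uint64_t bits = in[j] & BITS_MASK;

        offset += jump * WORD;
        if (jump == 0) {
            j++;                 /* next entry carries the full offset */
            offset = in[j];
        }
        uint64_t r_offset = offset;
        for (; bits != 0; bits >>= 1) {
            if (bits & 1)
                out[n++] = r_offset;
            r_offset += WORD;
        }
    }
    return n;
}
```

Under this sketch, encoding a sorted offset list and decoding the result should give back the same list. Note that each jump is measured from the base of the previous entry, not from the last relocated word, because the decoder never advances 'offset' past the bitmap.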
This encoding can represent up to 56 relocation offsets in a single 64-bit word. For many of the binaries we tested, this encoding provides >40x compression for storing offsets over the original `.relr.dyn` section.

For 32-bit targets, we use 32-bit entries: 8-bits for 'jump' and 24-bits for the bitmap.

Here are three real world examples that demonstrate the savings:

1) Chrome browser (x86_64, built as PIE):
   File size (stripped): 152265064 bytes (145.21MB)
   605159 relocation entries (24 bytes each) in '.rela.dyn'
   594542 are R_X86_64_RELATIVE relocations (98.25%)
   14269008 bytes (13.61MB) in use in '.rela.dyn' section
   109256 bytes (0.10MB) if moved to '.relrz.dyn' section
   Savings: 14159752 bytes, or 9.29% of original file size.

2) Go net/http test binary (x86_64, 'go test -buildmode=pie -c net/http')
   File size (stripped): 10238168 bytes (9.76MB)
   83810 relocation entries (24 bytes each) in '.rela.dyn'
   83804 are R_X86_64_RELATIVE relocations (99.99%)
   2011296 bytes (1.92MB) in use in .rela.dyn section
   43744 bytes (0.04MB) if moved to .relrz.dyn section
   Savings: 1967552 bytes, or 19.21% of original file size.

3) Vim binary in /usr/bin on my workstation (Ubuntu, x86_64)
   File size (stripped): 3030032 bytes (2.89MB)
   6680 relocation entries (24 bytes each) in '.rela.dyn'
   6272 are R_X86_64_RELATIVE relocations (93.89%)
   150528 bytes (0.14MB) in use in .rela.dyn section
   1992 bytes (0.00MB) if moved to .relrz.dyn section
   Savings: 148536 bytes, or 4.90% of original file size.

Recent releases of Debian, Ubuntu, and several other distributions build executables as PIE by default. Suprateeka posted some statistics earlier in this thread on the prevalence of relative relocations in executables residing in /usr/bin: https://sourceware.org/ml/gnu-gabi/2017-q2/msg00013.html

The third example above shows that using '.relrz.dyn' sections to encode relative relocations can bring decent savings to executable sizes in /usr/bin across many distributions.
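The figures in the three examples can be re-derived with a few lines of integer arithmetic. The sketch below uses hypothetical helper names (table_bytes, savings_centipercent). It also assumes that the uncompressed '.relr.dyn' section stores one 8-byte offset per relative relocation, which is an assumption about the prototype and is not stated explicitly in this message.

```c
#include <stdint.h>

/* Bytes occupied by `count` entries of `entry_size` bytes each. */
static uint64_t table_bytes(uint64_t count, uint64_t entry_size)
{
    return count * entry_size;
}

/* Savings in hundredths of a percent of the file size, truncated
   (so 929 means 9.29%). */
static uint64_t savings_centipercent(uint64_t saved, uint64_t file_size)
{
    return saved * 10000 / file_size;
}
```

For the Chrome example, table_bytes(594542, 24) reproduces the 14269008 bytes attributed to relative relocations in '.rela.dyn', and subtracting the 109256 bytes of '.relrz.dyn' gives the reported savings of 14159752 bytes. Under the 8-bytes-per-offset assumption, the same binary would need 4756336 bytes in '.relr.dyn', roughly 43 times the '.relrz.dyn' size, which is consistent with the ">40x" claim.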
We have working ld.gold and ld.so implementations for arm, aarch64, and x86_64, and would be happy to send patches to the binutils and glibc communities for review.

However, before that can happen, we need agreement on the ABI side for the new section type and the encoding. We haven't worked on a change of this magnitude before, one that touches so many different pieces, from the linker and ELF tools to the dynamic loader. Specifically, we need agreement and/or guidance on where and how the new section type and its encoding should be documented. We're proposing adding new defines for SHT_RELRZ, DT_RELRZ, DT_RELRZSZ, DT_RELRZENT, and DT_RELRZCOUNT that all the different parts of the toolchains can agree on.

Thanks,
Rahul

[1]: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=shortlog;h=refs/heads/users/ccoutant/experimental-relr
Ian Lance Taylor via llvm-dev
2017-Dec-07 23:37 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
On Thu, Dec 7, 2017 at 2:51 PM, Rahul Chaudhry <rahulchaudhry at google.com> wrote:
>
> However, before that can happen, we need agreement on the ABI side for the new
> section type and the encoding. We haven't worked on a change of this magnitude
> before that touches so many different pieces from the linker, elf tools, and
> the dynamic loader. Specifically, we need agreement and/or guidance on where
> and how should the new section type and its encoding be documented. We're
> proposing adding new defines for SHT_RELRZ, DT_RELRZ, DT_RELRZSZ, DT_RELRZENT,
> and DT_RELRZCOUNT that all the different parts of the toolchains can agree on.

Sounds like good work.

The place to hold a discussion on ELF ABI issues is generic-abi at googlegroups.com and gnu-gabi at sourceware.org. Inasmuch as there is an official ELF ABI any more now that SCO has gone under, it is maintained on the generic-abi list.

Ian
Cary Coutant via llvm-dev
2017-Dec-09 06:36 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
> We've taken the '.relr.dyn' section from Cary's prototype, and implemented a
> custom encoding to compactly represent the list of offsets. We're calling the
> new compressed section '.relrz.dyn' (for relocations-relative-compressed).

I'd suggest just using .relr.dyn -- your encoding is straightforward enough that I'd just make that the standard representation for this section type.

> The encoding used is a simple combination of delta-encoding and a bitmap of
> offsets. The section consists of 64-bit entries: higher 8-bits contain delta
> since last offset, and lower 56-bits contain a bitmap for which words to apply
> the relocation to. This is best described by showing the code for decoding the
> section:
>
> ...
>
> The above code is the entirety of the implementation for decoding and
> processing '.relrz.dyn' sections in glibc dynamic loader.
>
> This encoding can represent up to 56 relocation offsets in a single 64-bit
> word. For many of the binaries we tested, this encoding provides >40x
> compression for storing offsets over the original `.relr.dyn` section.
>
> For 32-bit targets, we use 32-bit entries: 8-bits for 'jump' and 24-bits for
> the bitmap.

Very nice! Simple and effective.

> Here are three real world examples that demonstrate the savings:

Impressive numbers. I've gotta admit, the savings are better than I expected.

> However, before that can happen, we need agreement on the ABI side for the new
> section type and the encoding. We haven't worked on a change of this magnitude
> before that touches so many different pieces from the linker, elf tools, and
> the dynamic loader. Specifically, we need agreement and/or guidance on where
> and how should the new section type and its encoding be documented. We're
> proposing adding new defines for SHT_RELRZ, DT_RELRZ, DT_RELRZSZ, DT_RELRZENT,
> and DT_RELRZCOUNT that all the different parts of the toolchains can agree on.

Yes, as Ian mentioned, the generic ABI discussion is at generic-abi at googlegroups.com.
Most people who would be interested are already on the gnu-gabi at sourceware.org list, but there are a few who are not, and who may not yet have seen this discussion. I'll support the proposal. Thanks for taking this idea the extra mile! -cary
Florian Weimer via llvm-dev
2017-Dec-09 23:06 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
* Rahul Chaudhry via gnu-gabi:

> The encoding used is a simple combination of delta-encoding and a
> bitmap of offsets. The section consists of 64-bit entries: higher
> 8-bits contain delta since last offset, and lower 56-bits contain a
> bitmap for which words to apply the relocation to. This is best
> described by showing the code for decoding the section:
>
> typedef struct
> {
>   Elf64_Xword r_data;  /* jump and bitmap for relative relocations */
> } Elf64_Relrz;
>
> #define ELF64_R_JUMP(val)    ((val) >> 56)
> #define ELF64_R_BITS(val)    ((val) & 0xffffffffffffff)
>
> #ifdef DO_RELRZ
>   {
>     ElfW(Addr) offset = 0;
>     for (; relative < end; ++relative)
>       {
>         ElfW(Addr) jump = ELFW(R_JUMP) (relative->r_data);
>         ElfW(Addr) bits = ELFW(R_BITS) (relative->r_data);
>         offset += jump * sizeof(ElfW(Addr));
>         if (jump == 0)
>           {
>             ++relative;
>             offset = relative->r_data;
>           }
>         ElfW(Addr) r_offset = offset;
>         for (; bits != 0; bits >>= 1)
>           {
>             if ((bits&1) != 0)
>               elf_machine_relrz_relative (l_addr, (void *) (l_addr + r_offset));
>             r_offset += sizeof(ElfW(Addr));
>           }
>       }
>   }
> #endif

That data-dependent “if ((bits&1) != 0)” branch looks a bit nasty. Have you investigated whether some sort of RLE-style encoding would be beneficial?

If there are blocks of relative relocations, it might even be possible to use vector instructions to process them (although more than four relocations at a time are probably not achievable in a power-efficient manner on current x86-64).
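As a point of comparison for the branch concern, one way to avoid testing every bit position without changing the encoding is to jump straight from one set bit to the next with a count-trailing-zeros operation. This is not the RLE scheme suggested above and was not proposed in the thread; it is only a sketch, assuming GCC or Clang (for __builtin_ctzll), with the hypothetical name expand_bitmap_ctz.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand one 56-bit bitmap into byte offsets, visiting only the set bits.
   `base` is the byte offset that bit 0 refers to; each bit covers one
   8-byte word. Returns the number of offsets written to out[].
   Assumes GCC/Clang for __builtin_ctzll. */
static size_t expand_bitmap_ctz(uint64_t bits, uint64_t base, uint64_t *out)
{
    size_t n = 0;

    while (bits != 0) {
        unsigned idx = (unsigned)__builtin_ctzll(bits);  /* lowest set bit */
        out[n++] = base + (uint64_t)idx * 8;
        bits &= bits - 1;                                /* clear that bit */
    }
    return n;
}
```

The loop still branches once per relocation, but it no longer branches on each individual bit value, so sparse bitmaps cost fewer iterations. Whether this helps in a real dynamic loader would need measurement.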