Rahul Chaudhry via llvm-dev
2017-Dec-13 00:53 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
On Mon, Dec 11, 2017 at 6:14 PM, Roland McGrath <roland at hack.frob.com> wrote:> > On Mon, Dec 11, 2017 at 3:50 PM Rahul Chaudhry via gnu-gabi <gnu-gabi at sourceware.org> wrote: >> >> A simple combination of delta-encoding and run_length-encoding is one of the >> first schemes we experimented with (32-bit entries with 24-bit 'delta' and an >> 8-bit 'count'). This gave really good results, but as Sri mentions, we observed >> several cases where the relative relocations were not on consecutive offsets. >> There were common cases where the relocations applied to alternate words, and >> that totally wrecked the scheme (a bunch of entries with delta==16 and >> count==1). > > > For the same issue in a different context, I recently implemented a scheme using run-length-encoding but using a variable stride. So for a run of alternate words, you still get a single entry, but with stride 16 instead of 8. In my application, most cases of strides > 8 are a run of only 2 or 3 but there are a few cases of dozens or hundreds with a stride of 16. My case is a solution tailored to exactly one application (a kernel), so there is a closed sample set that's all that matters and the trade-off between simplicity of the analysis and compactness of the results is different than the general case you're addressing (my "analysis" consists of a few lines of AWK). But I wonder if it might be worthwhile to study the effect a variable-stride RLE scheme or adding the variable-stride ability into your hybrid scheme has on your sample applications. > > Since we're talking about specifying a new ABI that will be serving us for many years to come and will be hard to change once deployed, it seems worth spending quite a bit of effort up front to come to the most compact scheme that's feasible.I agree. Can you share more details of the encoding scheme that you found useful (size of each entry, number of bits used for stride/count etc.)? I just ran some experiments with an encoding with 32-bit entries: 16-bits for delta, 8-bits for stride, and 8-bits for count. Here are the numbers, inlined with those from the previous schemes for comparison: 1. Chrome browser (x86_64, built as PIE): 605159 relocation entries (24 bytes each) in '.rela.dyn' 594542 are R_X86_64_RELATIVE relocations (98.25%) 14269008 bytes (13.61MB) in use in '.rela.dyn' section 385420 bytes (0.37MB) using delta+count encoding 232540 bytes (0.22MB) using delta+stride+count encoding 109256 bytes (0.10MB) using jump+bitmap encoding 2. Go net/http test binary (x86_64, 'go test -buildmode=pie -c net/http') 83810 relocation entries (24 bytes each) in '.rela.dyn' 83804 are R_X86_64_RELATIVE relocations (99.99%) 2011296 bytes (1.92MB) in use in .rela.dyn section 204476 bytes (0.20MB) using delta+count encoding 132568 bytes (0.13MB) using delta+stride+count encoding 43744 bytes (0.04MB) using jump+bitmap encoding 3. Vim binary in /usr/bin on my workstation (Ubuntu, x86_64) 6680 relocation entries (24 bytes each) in '.rela.dyn' 6272 are R_X86_64_RELATIVE relocations (93.89%) 150528 bytes (0.14MB) in use in .rela.dyn section 14388 bytes (0.01MB) using delta+count encoding 7000 bytes (0.01MB) using delta+stride+count encoding 1992 bytes (0.00MB) using jump+bitmap encoding delta+count encoding is using 32-bit entries: 24-bit delta: number of bytes since last offset. 8-bit count: number of relocations to apply (consecutive words). delta+stride+count encoding is using 32-bit entries: 16-bit delta: number of bytes since last offset. 8-bit stride: stride (in bytes) for applying 'count' relocations. 8-bit count: number of relocations to apply (using 'stride'). jump+bitmap encoding is using 64-bit entries: 8-bit jump: number of words since last offset. 56-bit bitmap: bitmap for which words to apply relocations to. While adding a 'stride' field is definitely an improvement over simple delta+count encoding, it doesn't compare well against the bitmap based encoding. I took a look inside the encoding for the Vim binary. There are some instances in the bitmap based encoding like [0x3855555555555555 0x3855555555555555 0x3855555555555555 ...] that encode sequences of relocations applying to alternate words. The stride based encoding works very well on these and turns it into much more compact [0x0ff010ff 0x0ff010ff 0x0ff010ff ...] using stride==0x10 and count==0xff. However, for the vast majority of cases, the stride based encoding ends up with count <= 2, and that kills it in the end. I could try something more complex with 16-bit entries, but that can only give 2x improvement at best, so it still won't be better than the bitmap approach. Thanks, Rahul> -- > > > Thanks, > Roland
Cary Coutant via llvm-dev
2017-Dec-14 08:11 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
> While adding a 'stride' field is definitely an improvement over simple > delta+count encoding, it doesn't compare well against the bitmap based > encoding. > > I took a look inside the encoding for the Vim binary. There are some instances > in the bitmap based encoding like > [0x3855555555555555 0x3855555555555555 0x3855555555555555 ...] > that encode sequences of relocations applying to alternate words. The stride > based encoding works very well on these and turns it into much more compact > [0x0ff010ff 0x0ff010ff 0x0ff010ff ...] > using stride==0x10 and count==0xff.Have you looked much at where the RELATIVE relocations are coming from? I've looked at a PIE build of gold, and they're almost all for vtables, which mostly have consecutive entries with 8-byte strides. There are a few for the GOT, a few for static constructors (in .init_array), and a few for other initialized data, but vtables seem to account for the vast majority. (Gold has almost 19,000 RELATIVE dynamic relocs, and only about 500 non-RELATIVE dynamic relocs.) Where do the 16-byte strides come from? Vim is plain C, right? I'm guessing its RELATIVE relocation count is fairly low compared to big C++ apps. I'm also guessing that the pattern comes from some large structure or structures in the source code where initialized pointers alternate with non-pointer values. I'm also curious about Roland's app. In my opinion, the data and my intuition both support your choice of a jump + bit vector representation. -cary
Rahul Chaudhry via llvm-dev
2017-Dec-15 20:23 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
On Thu, Dec 14, 2017 at 12:11 AM, Cary Coutant <ccoutant at gmail.com> wrote:>> While adding a 'stride' field is definitely an improvement over simple >> delta+count encoding, it doesn't compare well against the bitmap based >> encoding. >> >> I took a look inside the encoding for the Vim binary. There are some instances >> in the bitmap based encoding like >> [0x3855555555555555 0x3855555555555555 0x3855555555555555 ...] >> that encode sequences of relocations applying to alternate words. The stride >> based encoding works very well on these and turns it into much more compact >> [0x0ff010ff 0x0ff010ff 0x0ff010ff ...] >> using stride==0x10 and count==0xff. > > Have you looked much at where the RELATIVE relocations are coming from? > > I've looked at a PIE build of gold, and they're almost all for > vtables, which mostly have consecutive entries with 8-byte strides. > There are a few for the GOT, a few for static constructors (in > .init_array), and a few for other initialized data, but vtables seem > to account for the vast majority. (Gold has almost 19,000 RELATIVE > dynamic relocs, and only about 500 non-RELATIVE dynamic relocs.) > > Where do the 16-byte strides come from? Vim is plain C, right? I'm > guessing its RELATIVE relocation count is fairly low compared to big > C++ apps. I'm also guessing that the pattern comes from some large > structure or structures in the source code where initialized pointers > alternate with non-pointer values. I'm also curious about Roland's > app.I took a look inside vim for the source of the ..5555.. pattern (relative relocations applying to alternate words). One of the sources is the "builtin_termcaps" symbol, which is an array of "struct builtin_term": struct builtin_term { int bt_entry; char *bt_string; }; So the pattern makes sense. An encoding using strides will work really well here with stride == 0x10. There is another repeating pattern I noticed in vim ..9999... One of the sources behind this pattern is the "cmdnames" symbol, which is an array of "struct cmdname": struct cmdname { char_u *cmd_name; /* name of the command */ ex_func_T cmd_func; /* function for this command */ long_u cmd_argt; /* flags declared above */ int cmd_addr_type; /* flag for address type */ }; In this struct, the first two fields are pointers, and the next two are scalars. This explains the ..9999.. pattern for relative relocations. This is an example where a stride based encoding does not work well, simply because there is no single stride. The deltas are 8,24,8,24,8,24,... I think these two examples demonstrate the main weakness of using a simple stride based encoding: it is too sensitive to how the data structures are laid out in the program source. Rahul
Apparently Analagous Threads
- Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
- Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
- Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
- Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
- Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section