Rahul Chaudhry via llvm-dev
2017-Dec-15 20:23 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
On Thu, Dec 14, 2017 at 12:11 AM, Cary Coutant <ccoutant at gmail.com> wrote:
>> While adding a 'stride' field is definitely an improvement over simple
>> delta+count encoding, it doesn't compare well against the bitmap based
>> encoding.
>>
>> I took a look inside the encoding for the Vim binary. There are some
>> instances in the bitmap based encoding like
>>   [0x3855555555555555 0x3855555555555555 0x3855555555555555 ...]
>> that encode sequences of relocations applying to alternate words. The
>> stride based encoding works very well on these and turns it into the much
>> more compact
>>   [0x0ff010ff 0x0ff010ff 0x0ff010ff ...]
>> using stride == 0x10 and count == 0xff.
>
> Have you looked much at where the RELATIVE relocations are coming from?
>
> I've looked at a PIE build of gold, and they're almost all for vtables,
> which mostly have consecutive entries with 8-byte strides. There are a few
> for the GOT, a few for static constructors (in .init_array), and a few for
> other initialized data, but vtables seem to account for the vast majority.
> (Gold has almost 19,000 RELATIVE dynamic relocs, and only about 500
> non-RELATIVE dynamic relocs.)
>
> Where do the 16-byte strides come from? Vim is plain C, right? I'm guessing
> its RELATIVE relocation count is fairly low compared to big C++ apps. I'm
> also guessing that the pattern comes from some large structure or
> structures in the source code where initialized pointers alternate with
> non-pointer values. I'm also curious about Roland's app.

I took a look inside vim for the source of the ..5555.. pattern (relative
relocations applying to alternate words). One of the sources is the
"builtin_termcaps" symbol, which is an array of "struct builtin_term":

    struct builtin_term
    {
        int     bt_entry;
        char    *bt_string;
    };

So the pattern makes sense. An encoding using strides will work really well
here with stride == 0x10.

There is another repeating pattern I noticed in vim: ..9999.. One of the
sources behind this pattern is the "cmdnames" symbol, which is an array of
"struct cmdname":

    struct cmdname
    {
        char_u      *cmd_name;      /* name of the command */
        ex_func_T   cmd_func;       /* function for this command */
        long_u      cmd_argt;       /* flags declared above */
        int         cmd_addr_type;  /* flag for address type */
    };

In this struct, the first two fields are pointers, and the next two are
scalars. This explains the ..9999.. pattern for relative relocations. This is
an example where a stride based encoding does not work well, simply because
there is no single stride. The deltas are 8, 24, 8, 24, 8, 24, ...

I think these two examples demonstrate the main weakness of using a simple
stride based encoding: it is too sensitive to how the data structures are
laid out in the program source.

Rahul
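The delta patterns behind the two vim structs above are easy to reproduce,
which makes the trade-off concrete. The following is a small illustrative
sketch, not the actual encoder under discussion: the base addresses, array
lengths, and the greedy (start, stride, count) grouping are made up for the
example, and an LP64 layout (8-byte pointers) is assumed.

    def relative_offsets(base, struct_size, pointer_offsets, count):
        """r_offset values for an array of `count` structs starting at `base`,
        with pointers at the given offsets inside each struct."""
        return [base + i * struct_size + p
                for i in range(count) for p in pointer_offsets]

    def stride_groups(offsets):
        """Greedily group a sorted offset list into (start, stride, count) runs."""
        groups, i = [], 0
        while i < len(offsets):
            start, n = offsets[i], 1
            if i + 1 < len(offsets):
                stride = offsets[i + 1] - offsets[i]
                while i + n < len(offsets) and offsets[i + n] - offsets[i + n - 1] == stride:
                    n += 1
            else:
                stride = 0
            groups.append((start, stride, n))
            i += n
        return groups

    # struct builtin_term: 16 bytes, one pointer at offset 8 -> deltas 16,16,16,...
    termcaps = relative_offsets(0x1000, 16, [8], 64)
    # struct cmdname: 32 bytes, pointers at offsets 0 and 8 -> deltas 8,24,8,24,...
    cmdnames = relative_offsets(0x4000, 32, [0, 8], 64)

    print(len(stride_groups(termcaps)))   # 1: the whole array is a single run
    print(len(stride_groups(cmdnames)))   # 64: one run per struct, no single stride

The builtin_term array collapses into a single run, while the cmdname array
needs one run per struct because its deltas alternate between 8 and 24, which
is exactly the sensitivity to source-level data layout described above.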
Roland McGrath via llvm-dev
2018-Jan-07 07:43 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
The generic-abi thread has gone into broader subjects of the benefits and
desirability of the work. I'm willing to take it as given that reducing the
encoded size of pure-relative address relocs (i.e. R_*_RELATIVE
equivalents)--ultimately the RODATA segment size of a given ET_DYN file--is,
as the sole metric, a worthy goal, and that the ballpark savings ratios we're
seeing are worth committing to a new ABI.

But I am circumspect about choosing an encoding we will be supporting for
decades to come. Whatever we do now will surely be good enough that nobody
will want to innovate again for many years just to get a little or a fair bit
better. It behooves us to be deliberate in getting it as good as we reasonably
can now for the broad range of ET_DYN files that will appear in years to come.

I tend to share the intuitions people have expressed about what kinds of
patterns of offsets are likely. I also have a contrary intuition that there
are large codebases with lots of formulaic or generated code and data tables
that may well have truly enormous numbers of such relocs fitting highly
regular patterns just slightly different from the ones we're considering most
likely. Moreover, I don't think there is any excuse for relying on our
intuition when vast quantities of actual data are pretty readily available.

I don't mean picking a few "important" real-world binaries and using their
real data. Examples like Chrome and Firefox have already been tuned by
sophisticated developers to minimize relocation overhead and may well not be
representative of other programs of similar size and complexity. I mean
collecting data from a large and varied corpus of ET_DYN files across
machines, operating systems, and codebases.

A pretty simple analysis tool can extract from any ET_DYN file the essential
statistics (byte sizes of segments and relocs) and the list of R_*_RELATIVE
reloc r_offset values. (I started writing one in Python and I can finish it
if people want to use it.) It's easy enough to feed that with lots of ET_DYN
files available in public collections such as Linux distros. The tool is
simple to run and the data extracted is not really revealing (beyond rough
binary size), so it can be applied to lots of proprietary sets of ET_DYN
files and the data contributed to the public corpus, from places like
Google's internal binaries, Android system binaries, ET_DYN files in APKs in
app stores, etc., across many companies.

Given this corpus of "reloc traces" you can code up many competing encoding
formats and do serious measurements of their space savings across the entire
corpus from simple simulations, without having to implement each encoding in
an actual toolchain and dynamic linker to do the analysis.

This is some work, but I don't think it's a huge undertaking in comparison to
the actual full deployment roll-out of a new ELF dynamic linking ABI feature
and its impact, which always wind up being much more than just the actual new
code in toolchain and runtime implementations. I think the work that will
ripple out all over from the deployment, and the many-year commitment to the
new format that will inevitably be incurred, merit a rigorous and public,
data-driven approach.

--
Thanks,
Roland
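As a reference point, a minimal sketch of the kind of extraction described
above might look like the following. This is not Roland's tool; it assumes
the third-party pyelftools package, and it hard-codes the x86-64 RELATIVE
relocation type, so a complete tool would map the type per machine.

    #!/usr/bin/env python3
    import sys
    from elftools.elf.elffile import ELFFile
    from elftools.elf.relocation import RelocationSection

    R_X86_64_RELATIVE = 8  # per-machine value; x86-64 assumed here

    def reloc_trace(path):
        with open(path, 'rb') as f:
            elf = ELFFile(f)
            # rough size statistics: loadable segments and reloc sections
            for seg in elf.iter_segments():
                if seg['p_type'] == 'PT_LOAD':
                    print('segment', hex(seg['p_vaddr']), seg['p_filesz'], 'bytes')
            for sec in elf.iter_sections():
                if not isinstance(sec, RelocationSection):
                    continue
                print('reloc section', sec.name, sec['sh_size'], 'bytes')
                # the actual "reloc trace": r_offset of every RELATIVE reloc
                for rel in sec.iter_relocations():
                    if rel['r_info_type'] == R_X86_64_RELATIVE:
                        print(hex(rel['r_offset']))

    if __name__ == '__main__':
        for path in sys.argv[1:]:
            reloc_trace(path)

Run over a directory of ET_DYN files, the per-file offset lists can then be
replayed through candidate encoders to compare sizes, without touching a real
toolchain or dynamic linker.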
Florian Weimer via llvm-dev
2018-Jan-07 10:31 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
* Roland McGrath:

> Given this corpus of "reloc traces" you can code up many competing encoding
> formats and do serious measurements of their space savings across the
> entire corpus from simple simulations without having to implement each
> encoding in an actual toolchain and dynamic linker to do the analysis.

On the other hand, the linker currently assumes that the order in which
relative relocations are emitted does not matter. With a different encoding
scheme, future linkers might reorder the relocations or the data objects
themselves, either to reduce relocation encoding size, or to make relocations
more efficient (perhaps to support vectorization). Unfortunately, the numbers
which can be derived from ET_DYN files will not reflect to what extent such
reordering is possible.
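A toy illustration of that caveat, with made-up offsets and object sizes
rather than a real linker experiment: the same thousand pointer words need
very different numbers of (start, stride, count) runs depending on whether
the pointer-bearing objects are scattered among pointer-free ones or grouped
together, and only the scattered figure is visible in today's binaries.

    def runs(offsets):
        """Count greedy (start, stride, count) runs in a sorted offset list."""
        offsets, n_runs, i = sorted(offsets), 0, 0
        while i < len(offsets):
            j = i + 1
            if j < len(offsets):
                stride = offsets[j] - offsets[i]
                while j < len(offsets) and offsets[j] - offsets[j - 1] == stride:
                    j += 1
            n_runs, i = n_runs + 1, j
        return n_runs

    # Scattered layout: each pointer-bearing word is followed by a pointer-free
    # object of varying (made-up) size, so the gaps between relocated words vary.
    object_gaps = [16, 40, 16, 72, 24, 16, 56, 32]
    scattered, addr = [], 0
    for i in range(1000):
        scattered.append(addr)
        addr += 8 + object_gaps[i % len(object_gaps)]

    # Grouped layout: the same 1000 pointer words placed next to each other.
    grouped = [i * 8 for i in range(1000)]

    print(runs(scattered), runs(grouped))   # ~500 runs vs. a single run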