Rahul Chaudhry via llvm-dev
2017-Dec-15 20:23 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
On Thu, Dec 14, 2017 at 12:11 AM, Cary Coutant <ccoutant at gmail.com> wrote:
>> While adding a 'stride' field is definitely an improvement over simple
>> delta+count encoding, it doesn't compare well against the bitmap based
>> encoding.
>>
>> I took a look inside the encoding for the Vim binary. There are some
>> instances in the bitmap based encoding like
>>   [0x3855555555555555 0x3855555555555555 0x3855555555555555 ...]
>> that encode sequences of relocations applying to alternate words. The
>> stride based encoding works very well on these and turns it into the much
>> more compact
>>   [0x0ff010ff 0x0ff010ff 0x0ff010ff ...]
>> using stride == 0x10 and count == 0xff.
>
> Have you looked much at where the RELATIVE relocations are coming from?
>
> I've looked at a PIE build of gold, and they're almost all for vtables,
> which mostly have consecutive entries with 8-byte strides. There are a few
> for the GOT, a few for static constructors (in .init_array), and a few for
> other initialized data, but vtables seem to account for the vast majority.
> (Gold has almost 19,000 RELATIVE dynamic relocs, and only about 500
> non-RELATIVE dynamic relocs.)
>
> Where do the 16-byte strides come from? Vim is plain C, right? I'm guessing
> its RELATIVE relocation count is fairly low compared to big C++ apps. I'm
> also guessing that the pattern comes from some large structure or
> structures in the source code where initialized pointers alternate with
> non-pointer values. I'm also curious about Roland's app.

I took a look inside vim for the source of the ..5555.. pattern (relative
relocations applying to alternate words). One of the sources is the
"builtin_termcaps" symbol, which is an array of "struct builtin_term":

    struct builtin_term
    {
        int     bt_entry;
        char    *bt_string;
    };

So the pattern makes sense. An encoding using strides will work really well
here with stride == 0x10.

There is another repeating pattern I noticed in vim: ..9999.. One of the
sources behind this pattern is the "cmdnames" symbol, which is an array of
"struct cmdname":

    struct cmdname
    {
        char_u      *cmd_name;      /* name of the command */
        ex_func_T   cmd_func;       /* function for this command */
        long_u      cmd_argt;       /* flags declared above */
        int         cmd_addr_type;  /* flag for address type */
    };

In this struct, the first two fields are pointers, and the next two are
scalars. This explains the ..9999.. pattern for relative relocations. This is
an example where a stride based encoding does not work well, simply because
there is no single stride. The deltas are 8, 24, 8, 24, 8, 24, ...

I think these two examples demonstrate the main weakness of using a simple
stride based encoding: it is too sensitive to how the data structures are
laid out in the program source.

Rahul
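The delta patterns behind the two vim structs above are easy to reproduce,
which makes the trade-off concrete. The following is a small illustrative
sketch, not the actual encoder under discussion: the base addresses, array
lengths, and the greedy (start, stride, count) grouping are made up for the
example, and an LP64 layout (8-byte pointers) is assumed.

    def relative_offsets(base, struct_size, pointer_offsets, count):
        """r_offset values for an array of `count` structs starting at `base`,
        with pointers at the given offsets inside each struct."""
        return [base + i * struct_size + p
                for i in range(count) for p in pointer_offsets]

    def stride_groups(offsets):
        """Greedily group a sorted offset list into (start, stride, count) runs."""
        groups, i = [], 0
        while i < len(offsets):
            start, n = offsets[i], 1
            if i + 1 < len(offsets):
                stride = offsets[i + 1] - offsets[i]
                while i + n < len(offsets) and offsets[i + n] - offsets[i + n - 1] == stride:
                    n += 1
            else:
                stride = 0
            groups.append((start, stride, n))
            i += n
        return groups

    # struct builtin_term: 16 bytes, one pointer at offset 8 -> deltas 16,16,16,...
    termcaps = relative_offsets(0x1000, 16, [8], 64)
    # struct cmdname: 32 bytes, pointers at offsets 0 and 8 -> deltas 8,24,8,24,...
    cmdnames = relative_offsets(0x4000, 32, [0, 8], 64)

    print(len(stride_groups(termcaps)))   # 1: the whole array is a single run
    print(len(stride_groups(cmdnames)))   # 64: one run per struct, no single stride

The builtin_term array collapses into a single run, while the cmdname array
needs one run per struct because its deltas alternate between 8 and 24, which
is exactly the sensitivity to source-level data layout described above.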
Roland McGrath via llvm-dev
2018-Jan-07 07:43 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
The generic-abi thread has gone into broader subjects of the benefits and
desirability of the work. I'm willing to take it as given that reducing the
encoded size of pure-relative address relocs (i.e. R_*_RELATIVE
equivalents)--ultimately the RODATA segment size of a given ET_DYN file--is,
as the sole metric, a worthy goal, and that the ballpark savings ratios we're
seeing are worth committing to a new ABI.

But I am circumspect about choosing an encoding we will be supporting for
decades to come. Whatever we do now will surely be good enough that nobody
will want to innovate again for many years just to get a little or a fair bit
better. It behooves us to be deliberate in getting it as good as we reasonably
can now for the broad range of ET_DYN files that will appear in years to come.

I tend to share the intuitions people have expressed about what kinds of
patterns of offsets are likely. I also have a contrary intuition that there
are large codebases with lots of formulaic or generated code and data tables
that may well have truly enormous numbers of such relocs fitting highly
regular patterns just slightly different from the ones we're considering most
likely. Moreover, I don't think there is any excuse for relying on our
intuition when vast quantities of actual data are pretty readily available.

I don't mean picking a few "important" real-world binaries and using their
real data. Examples like Chrome and Firefox have already been tuned by
sophisticated developers to minimize relocation overhead and may well not be
representative of other programs of similar size and complexity. I mean
collecting data from a large and varied corpus of ET_DYN files across
machines, operating systems, and codebases.

A pretty simple analysis tool can extract from any ET_DYN file the essential
statistics (byte sizes of segments and relocs) and the list of R_*_RELATIVE
reloc r_offset values. (I started writing one in Python and I can finish it
if people want to use it.) It's easy enough to feed that with lots of ET_DYN
files available in public collections such as Linux distros. The tool is
simple to run and the data extracted is not really revealing (beyond rough
binary size), so it can be applied to lots of proprietary sets of ET_DYN
files and the data contributed to the public corpus, from places like
Google's internal binaries, Android system binaries, ET_DYN files in APKs in
app stores, etc., across many companies.

Given this corpus of "reloc traces" you can code up many competing encoding
formats and do serious measurements of their space savings across the entire
corpus from simple simulations, without having to implement each encoding in
an actual toolchain and dynamic linker to do the analysis.

This is some work, but I don't think it's a huge undertaking in comparison to
the actual full deployment roll-out of a new ELF dynamic linking ABI feature
and its impact, which always wind up being much more than just the actual new
code in toolchain and runtime implementations. I think the work that will
ripple out all over from the deployment, and the many-year commitment to the
new format that will inevitably be incurred, merit a rigorous and public,
data-driven approach.

--
Thanks,
Roland
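As a reference point, a minimal sketch of the kind of extraction described
above might look like the following. This is not Roland's tool; it assumes
the third-party pyelftools package, and it hard-codes the x86-64 RELATIVE
relocation type, so a complete tool would map the type per machine.

    #!/usr/bin/env python3
    import sys
    from elftools.elf.elffile import ELFFile
    from elftools.elf.relocation import RelocationSection

    R_X86_64_RELATIVE = 8  # per-machine value; x86-64 assumed here

    def reloc_trace(path):
        with open(path, 'rb') as f:
            elf = ELFFile(f)
            # rough size statistics: loadable segments and reloc sections
            for seg in elf.iter_segments():
                if seg['p_type'] == 'PT_LOAD':
                    print('segment', hex(seg['p_vaddr']), seg['p_filesz'], 'bytes')
            for sec in elf.iter_sections():
                if not isinstance(sec, RelocationSection):
                    continue
                print('reloc section', sec.name, sec['sh_size'], 'bytes')
                # the actual "reloc trace": r_offset of every RELATIVE reloc
                for rel in sec.iter_relocations():
                    if rel['r_info_type'] == R_X86_64_RELATIVE:
                        print(hex(rel['r_offset']))

    if __name__ == '__main__':
        for path in sys.argv[1:]:
            reloc_trace(path)

Run over a directory of ET_DYN files, the per-file offset lists can then be
replayed through candidate encoders to compare sizes, without touching a real
toolchain or dynamic linker.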
Florian Weimer via llvm-dev
2018-Jan-07 10:31 UTC
[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
* Roland McGrath:

> Given this corpus of "reloc traces" you can code up many competing encoding
> formats and do serious measurements of their space savings across the
> entire corpus from simple simulations without having to implement each
> encoding in an actual toolchain and dynamic linker to do the analysis.

On the other hand, the linker currently assumes that the order in which
relative relocations are emitted does not matter. With a different encoding
scheme, future linkers might reorder the relocations or the data objects
themselves, either to reduce relocation encoding size, or to make relocations
more efficient (perhaps to support vectorization). Unfortunately, the numbers
which can be derived from ET_DYN files will not reflect to what extent such
reordering is possible.
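A toy illustration of that caveat, with made-up offsets and object sizes
rather than a real linker experiment: the same thousand pointer words need
very different numbers of (start, stride, count) runs depending on whether
the pointer-bearing objects are scattered among pointer-free ones or grouped
together, and only the scattered figure is visible in today's binaries.

    def runs(offsets):
        """Count greedy (start, stride, count) runs in a sorted offset list."""
        offsets, n_runs, i = sorted(offsets), 0, 0
        while i < len(offsets):
            j = i + 1
            if j < len(offsets):
                stride = offsets[j] - offsets[i]
                while j < len(offsets) and offsets[j] - offsets[j - 1] == stride:
                    j += 1
            n_runs, i = n_runs + 1, j
        return n_runs

    # Scattered layout: each pointer-bearing word is followed by a pointer-free
    # object of varying (made-up) size, so the gaps between relocated words vary.
    object_gaps = [16, 40, 16, 72, 24, 16, 56, 32]
    scattered, addr = [], 0
    for i in range(1000):
        scattered.append(addr)
        addr += 8 + object_gaps[i % len(object_gaps)]

    # Grouped layout: the same 1000 pointer words placed next to each other.
    grouped = [i * 8 for i in range(1000)]

    print(runs(scattered), runs(grouped))   # ~500 runs vs. a single run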