thr3ads.net - llvm dev - [llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section [Dec 2017]

If this information is useful, please help other people find it:
Share via:

Rahul Chaudhry via llvm-dev

2017-Dec-13 00:53 UTC

[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section

On Mon, Dec 11, 2017 at 6:14 PM, Roland McGrath <roland at hack.frob.com>
wrote:>
> On Mon, Dec 11, 2017 at 3:50 PM Rahul Chaudhry via gnu-gabi <gnu-gabi at
sourceware.org> wrote:
>>
>> A simple combination of delta-encoding and run_length-encoding is one
of the
>> first schemes we experimented with (32-bit entries with 24-bit
'delta' and an
>> 8-bit 'count'). This gave really good results, but as Sri
mentions, we observed
>> several cases where the relative relocations were not on consecutive
offsets.
>> There were common cases where the relocations applied to alternate
words, and
>> that totally wrecked the scheme (a bunch of entries with delta==16 and
>> count==1).
>
>
> For the same issue in a different context, I recently implemented a scheme
using run-length-encoding but using a variable stride.  So for a run of
alternate words, you still get a single entry, but with stride 16 instead of 8. 
In my application, most cases of strides > 8 are a run of only 2 or 3 but
there are a few cases of dozens or hundreds with a stride of 16.  My case is a
solution tailored to exactly one application (a kernel), so there is a closed
sample set that's all that matters and the trade-off between simplicity of
the analysis and compactness of the results is different than the general case
you're addressing (my "analysis" consists of a few lines of AWK). 
But I wonder if it might be worthwhile to study the effect a variable-stride RLE
scheme or adding the variable-stride ability into your hybrid scheme has on your
sample applications.
>
> Since we're talking about specifying a new ABI that will be serving us
for many years to come and will be hard to change once deployed, it seems worth
spending quite a bit of effort up front to come to the most compact scheme
that's feasible.
I agree. Can you share more details of the encoding scheme that you found
useful (size of each entry, number of bits used for stride/count etc.)?

I just ran some experiments with an encoding with 32-bit entries: 16-bits for
delta, 8-bits for stride, and 8-bits for count. Here are the numbers, inlined
with those from the previous schemes for comparison:

1. Chrome browser (x86_64, built as PIE):
   605159 relocation entries (24 bytes each) in '.rela.dyn'
   594542 are R_X86_64_RELATIVE relocations (98.25%)
       14269008 bytes (13.61MB) in use in '.rela.dyn' section
         385420 bytes (0.37MB) using delta+count encoding
         232540 bytes (0.22MB) using delta+stride+count encoding
         109256 bytes  (0.10MB) using jump+bitmap encoding

2. Go net/http test binary (x86_64, 'go test -buildmode=pie -c
net/http')
   83810 relocation entries (24 bytes each) in '.rela.dyn'
   83804 are R_X86_64_RELATIVE relocations (99.99%)
       2011296 bytes (1.92MB) in use in .rela.dyn section
        204476 bytes (0.20MB) using delta+count encoding
        132568 bytes (0.13MB) using delta+stride+count encoding
         43744 bytes (0.04MB) using jump+bitmap encoding

3. Vim binary in /usr/bin on my workstation (Ubuntu, x86_64)
   6680 relocation entries (24 bytes each) in '.rela.dyn'
   6272 are R_X86_64_RELATIVE relocations (93.89%)
       150528 bytes (0.14MB) in use in .rela.dyn section
        14388 bytes (0.01MB) using delta+count encoding
         7000 bytes (0.01MB) using delta+stride+count encoding
         1992 bytes (0.00MB) using jump+bitmap encoding

delta+count encoding is using 32-bit entries:
  24-bit delta: number of bytes since last offset.
   8-bit count: number of relocations to apply (consecutive words).

delta+stride+count encoding is using 32-bit entries:
  16-bit delta: number of bytes since last offset.
   8-bit stride: stride (in bytes) for applying 'count' relocations.
   8-bit count: number of relocations to apply (using 'stride').

jump+bitmap encoding is using 64-bit entries:
   8-bit jump: number of words since last offset.
  56-bit bitmap: bitmap for which words to apply relocations to.

While adding a 'stride' field is definitely an improvement over simple
delta+count encoding, it doesn't compare well against the bitmap based
encoding.

I took a look inside the encoding for the Vim binary. There are some instances
in the bitmap based encoding like
  [0x3855555555555555 0x3855555555555555 0x3855555555555555 ...]
that encode sequences of relocations applying to alternate words. The stride
based encoding works very well on these and turns it into much more compact
  [0x0ff010ff 0x0ff010ff 0x0ff010ff ...]
using stride==0x10 and count==0xff.

However, for the vast majority of cases, the stride based encoding ends up with
count <= 2, and that kills it in the end.

I could try something more complex with 16-bit entries, but that can only give
2x improvement at best, so it still won't be better than the bitmap
approach.

Thanks,
Rahul

> --
>
>
> Thanks,
> Roland

Cary Coutant via llvm-dev

2017-Dec-14 08:11 UTC

head link

[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section

> While adding a 'stride' field is definitely an improvement over
simple
> delta+count encoding, it doesn't compare well against the bitmap based
> encoding.
>
> I took a look inside the encoding for the Vim binary. There are some
instances
> in the bitmap based encoding like
>   [0x3855555555555555 0x3855555555555555 0x3855555555555555 ...]
> that encode sequences of relocations applying to alternate words. The
stride
> based encoding works very well on these and turns it into much more compact
>   [0x0ff010ff 0x0ff010ff 0x0ff010ff ...]
> using stride==0x10 and count==0xff.
Have you looked much at where the RELATIVE relocations are coming from?

I've looked at a PIE build of gold, and they're almost all for
vtables, which mostly have consecutive entries with 8-byte strides.
There are a few for the GOT, a few for static constructors (in
.init_array), and a few for other initialized data, but vtables seem
to account for the vast majority. (Gold has almost 19,000 RELATIVE
dynamic relocs, and only about 500 non-RELATIVE dynamic relocs.)

Where do the 16-byte strides come from? Vim is plain C, right? I'm
guessing its RELATIVE relocation count is fairly low compared to big
C++ apps. I'm also guessing that the pattern comes from some large
structure or structures in the source code where initialized pointers
alternate with non-pointer values. I'm also curious about Roland's
app.

In my opinion, the data and my intuition both support your choice of a
jump + bit vector representation.

-cary

Rahul Chaudhry via llvm-dev

2017-Dec-15 20:23 UTC

head link

[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section

On Thu, Dec 14, 2017 at 12:11 AM, Cary Coutant <ccoutant at gmail.com>
wrote:>> While adding a 'stride' field is definitely an improvement over
simple
>> delta+count encoding, it doesn't compare well against the bitmap
based
>> encoding.
>>
>> I took a look inside the encoding for the Vim binary. There are some
instances
>> in the bitmap based encoding like
>>   [0x3855555555555555 0x3855555555555555 0x3855555555555555 ...]
>> that encode sequences of relocations applying to alternate words. The
stride
>> based encoding works very well on these and turns it into much more
compact
>>   [0x0ff010ff 0x0ff010ff 0x0ff010ff ...]
>> using stride==0x10 and count==0xff.
>
> Have you looked much at where the RELATIVE relocations are coming from?
>
> I've looked at a PIE build of gold, and they're almost all for
> vtables, which mostly have consecutive entries with 8-byte strides.
> There are a few for the GOT, a few for static constructors (in
> .init_array), and a few for other initialized data, but vtables seem
> to account for the vast majority. (Gold has almost 19,000 RELATIVE
> dynamic relocs, and only about 500 non-RELATIVE dynamic relocs.)
>
> Where do the 16-byte strides come from? Vim is plain C, right? I'm
> guessing its RELATIVE relocation count is fairly low compared to big
> C++ apps. I'm also guessing that the pattern comes from some large
> structure or structures in the source code where initialized pointers
> alternate with non-pointer values. I'm also curious about Roland's
> app.
I took a look inside vim for the source of the ..5555.. pattern (relative
relocations applying to alternate words). One of the sources is the
"builtin_termcaps" symbol, which is an array of "struct
builtin_term":

  struct builtin_term
  {
    int   bt_entry;
    char  *bt_string;
  };

So the pattern makes sense. An encoding using strides will work really well
here with stride == 0x10.

There is another repeating pattern I noticed in vim ..9999... One of the
sources behind this pattern is the "cmdnames" symbol, which is an
array of
"struct cmdname":

  struct cmdname
  {
    char_u      *cmd_name;      /* name of the command */
    ex_func_T   cmd_func;       /* function for this command */
    long_u      cmd_argt;       /* flags declared above */
    int         cmd_addr_type;  /* flag for address type */
  };

In this struct, the first two fields are pointers, and the next two are
scalars. This explains the ..9999.. pattern for relative relocations. This is
an example where a stride based encoding does not work well, simply because
there is no single stride. The deltas are 8,24,8,24,8,24,...

I think these two examples demonstrate the main weakness of using a simple
stride based encoding: it is too sensitive to how the data structures are laid
out in the program source.

Rahul

llvm dev - Dec 2017 - Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section

[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section

[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section

[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section