thr3ads.net - llvm dev - [llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section [May 2017]

Sriraman Tallam via llvm-dev

2017-May-08 20:55 UTC

[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section

+llvm-dev

Discussion here: https://sourceware.org/ml/gnu-gabi/2017-q2/msg00000.html

On Tue, May 2, 2017 at 10:17 AM, Suprateeka R Hegde
<hegdesmailbox at gmail.com> wrote:
> On 02-May-2017 12:05 AM, Florian Weimer wrote:
>> On 05/01/2017 08:28 PM, Suprateeka R Hegde wrote:
>>> So the ratio shows ~96% is RELATIVE reloc. And only ~4% others. This is
>>> not the case on HP-UX/Itanium. But as I said, this comparison does not
>>> make sense as the runtime architecture and ISA are totally different.
>>
>> It could be that HP-UX was written in a way to reduce relative
>> relocations,
>
> Rather, the Itanium runtime architecture itself provides a way to reduce
> them.
>
>> or that the final executables aren't actually PIC anymore.
>
> I was referring to shlibs (PIC) on HP-UX and it was implicit in my mind.
> Sorry for that.
>
> I just built a large C++ shlib both on HP-UX/Itanium with our aCC
> compiler and Linux x86-64 using GCC-6.2. The sources are almost same
> with only a couple of lines differing between platforms.
>
> (HP-UX/Linux)
> Total:    12224/38311
> RELATIVE: 18/6397
>
> I will try to check the reason for such a huge difference in RELATIVE
> reloc count. It might be useful for this discussion (just a guess)
>
> --
> Supra

Rahul Chaudhry via llvm-dev

2017-Dec-07 22:51 UTC

Sri and I have been working on this over the past few months, and we've made
some good progress that we'd like to share and get feedback on.

Our work is based on the 'experimental-relr' prototype from Cary that is
available at 'users/ccoutant/experimental-relr' branch in the binutils
repository [1], and was described earlier in this thread:
https://sourceware.org/ml/gnu-gabi/2017-q2/msg00003.html

We've taken the '.relr.dyn' section from Cary's prototype, and implemented a
custom encoding to compactly represent the list of offsets. We're calling the
new compressed section '.relrz.dyn' (for relocations-relative-compressed).

The encoding used is a simple combination of delta-encoding and a bitmap of
offsets. The section consists of 64-bit entries: higher 8-bits contain delta
since last offset, and lower 56-bits contain a bitmap for which words to apply
the relocation to. This is best described by showing the code for decoding the
section:

typedef struct
{
  Elf64_Xword  r_data;  /* jump and bitmap for relative relocations */
} Elf64_Relrz;

#define ELF64_R_JUMP(val)    ((val) >> 56)
#define ELF64_R_BITS(val)    ((val) & 0xffffffffffffff)

#ifdef DO_RELRZ
  {
    ElfW(Addr) offset = 0;
    for (; relative < end; ++relative)
      {
        ElfW(Addr) jump = ELFW(R_JUMP) (relative->r_data);
        ElfW(Addr) bits = ELFW(R_BITS) (relative->r_data);
        offset += jump * sizeof(ElfW(Addr));
        if (jump == 0)
          {
            ++relative;
            offset = relative->r_data;
          }
        ElfW(Addr) r_offset = offset;
        for (; bits != 0; bits >>= 1)
          {
            if ((bits&1) != 0)
              elf_machine_relrz_relative (l_addr, (void *) (l_addr + r_offset));
            r_offset += sizeof(ElfW(Addr));
          }
      }
  }
#endif

Note that the 8-bit 'jump' encodes the number of _words_ since last offset. The
case where jump would not fit in 8-bits is handled by setting jump to 0, and
emitting the full offset for the next relocation in the subsequent entry.

The above code is the entirety of the implementation for decoding and
processing '.relrz.dyn' sections in glibc dynamic loader.

This encoding can represent up to 56 relocation offsets in a single 64-bit
word. For many of the binaries we tested, this encoding provides >40x
compression for storing offsets over the original `.relr.dyn` section.
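
To make the layout concrete, here is a minimal sketch of how an encoder for this format could look, together with a decoder that mirrors the loop above but records offsets instead of applying relocations. It is not part of the posted patch: the names relrz_encode and relrz_decode are made up for illustration, and the sketch assumes 64-bit entries and a sorted list of 8-byte-aligned offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD        8ULL   /* size of one relocated word on a 64-bit target */
#define BITMAP_BITS 56     /* low 56 bits of an entry hold the bitmap */
#define JUMP_MAX    255    /* high 8 bits of an entry hold the jump */

/* Hypothetical encoder (not from the patch).  Packs sorted, word-aligned
   offsets into entries.  'out' needs room for 2 * n words in the worst
   case.  Returns the number of 64-bit words written.  */
static size_t
relrz_encode (const uint64_t *offs, size_t n, uint64_t *out)
{
  size_t i = 0, w = 0;
  uint64_t base = 0;                 /* offset named by the previous entry */

  while (i < n)
    {
      uint64_t first = offs[i];
      uint64_t delta = first - base;
      uint64_t jump = 0, bits = 0;

      /* A non-zero jump is usable only if the gap is a whole number of
         words that fits in 8 bits; otherwise fall back to the escape.  */
      if (delta % WORD == 0 && delta / WORD >= 1 && delta / WORD <= JUMP_MAX)
        jump = delta / WORD;

      /* Set one bit for every offset inside the 56-word window.  */
      for (; i < n; ++i)
        {
          uint64_t d = offs[i] - first;
          if (d % WORD != 0 || d / WORD >= BITMAP_BITS)
            break;
          bits |= (uint64_t) 1 << (d / WORD);
        }

      out[w++] = (jump << 56) | bits;
      if (jump == 0)
        out[w++] = first;            /* escape: full offset in the next word */
      base = first;
    }
  return w;
}

/* Decoder with the same control flow as the posted loop, writing the
   decoded offsets to 'offs'.  Returns the number of offsets produced.  */
static size_t
relrz_decode (const uint64_t *in, size_t words, uint64_t *offs)
{
  size_t n = 0;
  uint64_t offset = 0;

  for (size_t w = 0; w < words; ++w)
    {
      uint64_t jump = in[w] >> 56;
      uint64_t bits = in[w] & 0xffffffffffffffULL;

      offset += jump * WORD;
      if (jump == 0)
        offset = in[++w];
      for (uint64_t r = offset; bits != 0; bits >>= 1, r += WORD)
        if (bits & 1)
          offs[n++] = r;
    }
  return n;
}
```

In this sketch every entry sets bit 0 of its bitmap, since the jump always lands on the first offset of the group. A production encoder could pick group boundaries differently to reduce the number of entries.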

For 32-bit targets, we use 32-bit entries: 8-bits for 'jump' and 24-bits for
the bitmap.
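
By analogy with the ELF64 macros above, the 32-bit split could be expressed as follows. This is only a sketch: the names ELF32_R_JUMP, ELF32_R_BITS, and relrz32_pack do not appear in the posted code and are assumed here for illustration.

```c
#include <stdint.h>

/* Hypothetical 32-bit counterparts of ELF64_R_JUMP / ELF64_R_BITS:
   the top 8 bits are the jump, the low 24 bits are the bitmap.  */
#define ELF32_R_JUMP(val)    ((uint32_t) (val) >> 24)
#define ELF32_R_BITS(val)    ((uint32_t) (val) & 0xffffffu)

/* Pack a jump (counted in 4-byte words) and a 24-bit bitmap into one
   32-bit entry.  Bits above the 24-bit bitmap are discarded.  */
static inline uint32_t
relrz32_pack (uint32_t jump, uint32_t bits)
{
  return (jump << 24) | (bits & 0xffffffu);
}
```

With this layout a single 32-bit entry can cover at most 24 relocation offsets, compared with 56 for a 64-bit entry.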

Here are three real world examples that demonstrate the savings:

1) Chrome browser (x86_64, built as PIE):
   File size (stripped): 152265064 bytes (145.21MB)
   605159 relocation entries (24 bytes each) in '.rela.dyn'
   594542 are R_X86_64_RELATIVE relocations (98.25%)
       14269008 bytes (13.61MB) in use in '.rela.dyn' section
         109256 bytes  (0.10MB) if moved to '.relrz.dyn' section

   Savings: 14159752 bytes, or 9.29% of original file size.

2) Go net/http test binary (x86_64, 'go test -buildmode=pie -c net/http')
   File size (stripped): 10238168 bytes (9.76MB)
   83810 relocation entries (24 bytes each) in '.rela.dyn'
   83804 are R_X86_64_RELATIVE relocations (99.99%)
       2011296 bytes (1.92MB) in use in .rela.dyn section
         43744 bytes (0.04MB) if moved to .relrz.dyn section

   Savings: 1967552 bytes, or 19.21% of original file size.

3) Vim binary in /usr/bin on my workstation (Ubuntu, x86_64)
   File size (stripped): 3030032 bytes (2.89MB)
   6680 relocation entries (24 bytes each) in '.rela.dyn'
   6272 are R_X86_64_RELATIVE relocations (93.89%)
       150528 bytes (0.14MB) in use in .rela.dyn section
         1992 bytes (0.00MB) if moved to .relrz.dyn section

   Savings: 148536 bytes, or 4.90% of original file size.
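
The arithmetic behind these figures can be reproduced with a few lines of C. This is a sketch only: the helper names are made up, and it assumes each relative relocation occupies one 24-byte Elf64_Rela entry, as stated in the examples.

```c
#include <stdint.h>

#define RELA_ENTRY_SIZE 24   /* bytes per '.rela.dyn' entry on x86_64 */

/* Bytes used in '.rela.dyn' by the given number of relative relocations.  */
static uint64_t
rela_bytes (uint64_t relative_count)
{
  return relative_count * RELA_ENTRY_SIZE;
}

/* Bytes saved by moving those relocations to a '.relrz.dyn' section of
   the given size.  */
static uint64_t
bytes_saved (uint64_t relative_count, uint64_t relrz_bytes)
{
  return rela_bytes (relative_count) - relrz_bytes;
}

/* Savings as a percentage of the stripped file size.  */
static double
savings_percent (uint64_t saved, uint64_t file_size)
{
  return 100.0 * (double) saved / (double) file_size;
}
```

For the Chrome example, rela_bytes (594542) gives 14269008 and bytes_saved (594542, 109256) gives 14159752, matching the numbers listed.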

Recent releases of Debian, Ubuntu, and several other distributions build
executables as PIE by default. Suprateeka posted some statistics earlier in
this thread on the prevalence of relative relocations in executables residing
in /usr/bin: https://sourceware.org/ml/gnu-gabi/2017-q2/msg00013.html

The third example above shows that using '.relrz.dyn' sections to encode
relative relocations can bring decent savings to executable sizes in /usr/bin
across many distributions.

We have working ld.gold and ld.so implementations for arm, aarch64, and x86_64,
and would be happy to send patches to the binutils and glibc communities for
review.

However, before that can happen, we need agreement on the ABI side for the new
section type and the encoding. We haven't worked on a change of this magnitude
before that touches so many different pieces from the linker, elf tools, and
the dynamic loader. Specifically, we need agreement and/or guidance on where
and how should the new section type and its encoding be documented. We're
proposing adding new defines for SHT_RELRZ, DT_RELRZ, DT_RELRZSZ, DT_RELRZENT,
and DT_RELRZCOUNT that all the different parts of the toolchains can agree on.

Thanks,
Rahul

[1]:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=shortlog;h=refs/heads/users/ccoutant/experimental-relr

On Mon, May 8, 2017 at 1:55 PM, Sriraman Tallam <tmsriram at google.com> wrote:
> +llvm-dev
>
> Discussion here: https://sourceware.org/ml/gnu-gabi/2017-q2/msg00000.html
>
> On Tue, May 2, 2017 at 10:17 AM, Suprateeka R Hegde
> <hegdesmailbox at gmail.com> wrote:
>> On 02-May-2017 12:05 AM, Florian Weimer wrote:
>>> On 05/01/2017 08:28 PM, Suprateeka R Hegde wrote:
>>>> So the ratio shows ~96% is RELATIVE reloc. And only ~4% others. This is
>>>> not the case on HP-UX/Itanium. But as I said, this comparison does not
>>>> make sense as the runtime architecture and ISA are totally different.
>>>
>>> It could be that HP-UX was written in a way to reduce relative
>>> relocations,
>>
>> Rather, the Itanium runtime architecture itself provides a way to reduce
>> them.
>>
>>> or that the final executables aren't actually PIC anymore.
>>
>> I was referring to shlibs (PIC) on HP-UX and it was implicit in my mind.
>> Sorry for that.
>>
>> I just built a large C++ shlib both on HP-UX/Itanium with our aCC
>> compiler and Linux x86-64 using GCC-6.2. The sources are almost same
>> with only a couple of lines differing between platforms.
>>
>> (HP-UX/Linux)
>> Total:    12224/38311
>> RELATIVE: 18/6397
>>
>> I will try to check the reason for such a huge difference in RELATIVE
>> reloc count. It might be useful for this discussion (just a guess)
>>
>> --
>> Supra

Ian Lance Taylor via llvm-dev

2017-Dec-07 23:37 UTC

On Thu, Dec 7, 2017 at 2:51 PM, Rahul Chaudhry <rahulchaudhry at google.com> wrote:
>
> However, before that can happen, we need agreement on the ABI side for the new
> section type and the encoding. We haven't worked on a change of this magnitude
> before that touches so many different pieces from the linker, elf tools, and
> the dynamic loader. Specifically, we need agreement and/or guidance on where
> and how should the new section type and its encoding be documented. We're
> proposing adding new defines for SHT_RELRZ, DT_RELRZ, DT_RELRZSZ, DT_RELRZENT,
> and DT_RELRZCOUNT that all the different parts of the toolchains can agree on.

Sounds like good work.

The place to hold a discussion on ELF ABI issues is
generic-abi at googlegroups.com and gnu-gabi at sourceware.org.  Inasmuch as
there is an official ELF ABI any more now that SCO has gone under, it
is maintained on the generic-abi list.

Ian

Cary Coutant via llvm-dev

2017-Dec-09 06:36 UTC

> We've taken the '.relr.dyn' section from Cary's prototype, and implemented a
> custom encoding to compactly represent the list of offsets. We're calling the
> new compressed section '.relrz.dyn' (for relocations-relative-compressed).

I'd suggest just using .relr.dyn -- your encoding is straightforward
enough that I'd just make that the standard representation for this
section type.

> The encoding used is a simple combination of delta-encoding and a bitmap of
> offsets. The section consists of 64-bit entries: higher 8-bits contain delta
> since last offset, and lower 56-bits contain a bitmap for which words to apply
> the relocation to. This is best described by showing the code for decoding the
> section:
>
> ...
>
> The above code is the entirety of the implementation for decoding and
> processing '.relrz.dyn' sections in glibc dynamic loader.
>
> This encoding can represent up to 56 relocation offsets in a single 64-bit
> word. For many of the binaries we tested, this encoding provides >40x
> compression for storing offsets over the original `.relr.dyn` section.
>
> For 32-bit targets, we use 32-bit entries: 8-bits for 'jump' and 24-bits for
> the bitmap.

Very nice! Simple and effective.

> Here are three real world examples that demonstrate the savings:

Impressive numbers. I've gotta admit, the savings are better than I
expected.

> However, before that can happen, we need agreement on the ABI side for the new
> section type and the encoding. We haven't worked on a change of this magnitude
> before that touches so many different pieces from the linker, elf tools, and
> the dynamic loader. Specifically, we need agreement and/or guidance on where
> and how should the new section type and its encoding be documented. We're
> proposing adding new defines for SHT_RELRZ, DT_RELRZ, DT_RELRZSZ, DT_RELRZENT,
> and DT_RELRZCOUNT that all the different parts of the toolchains can agree on.

Yes, as Ian mentioned, the generic ABI discussion is at
generic-abi at googlegroups.com. Most people who would be interested are
already on the gnu-gabi at sourceware.org list, but there are a few who
are not, and who may not yet have seen this discussion. I'll support
the proposal.

Thanks for taking this idea the extra mile!

-cary

Florian Weimer via llvm-dev

2017-Dec-09 23:06 UTC

* Rahul Chaudhry via gnu-gabi:
> The encoding used is a simple combination of delta-encoding and a
> bitmap of offsets. The section consists of 64-bit entries: higher
> 8-bits contain delta since last offset, and lower 56-bits contain a
> bitmap for which words to apply the relocation to. This is best
> described by showing the code for decoding the section:
>
> typedef struct
> {
>   Elf64_Xword  r_data;  /* jump and bitmap for relative relocations */
> } Elf64_Relrz;
>
> #define ELF64_R_JUMP(val)    ((val) >> 56)
> #define ELF64_R_BITS(val)    ((val) & 0xffffffffffffff)
>
> #ifdef DO_RELRZ
>   {
>     ElfW(Addr) offset = 0;
>     for (; relative < end; ++relative)
>       {
>         ElfW(Addr) jump = ELFW(R_JUMP) (relative->r_data);
>         ElfW(Addr) bits = ELFW(R_BITS) (relative->r_data);
>         offset += jump * sizeof(ElfW(Addr));
>         if (jump == 0)
>           {
>             ++relative;
>             offset = relative->r_data;
>           }
>         ElfW(Addr) r_offset = offset;
>         for (; bits != 0; bits >>= 1)
>           {
>             if ((bits&1) != 0)
>               elf_machine_relrz_relative (l_addr, (void *) (l_addr + r_offset));
>             r_offset += sizeof(ElfW(Addr));
>           }
>       }
>   }
> #endif
That data-dependent “if ((bits&1) != 0)” branch looks a bit nasty.

Have you investigated whether some sort of RLE-style encoding would be
beneficial? If there are blocks of relative relocations, it might even
be possible to use vector instructions to process them (although more
than four relocations at a time are probably not achievable in a
power-efficient manner on current x86-64).
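
For comparison, one way to avoid testing every bit position is to visit only the set bits of the bitmap. The sketch below is not the RLE-style encoding suggested above and is not taken from any posted patch. It assumes GCC or Clang (for __builtin_ctzll), 8-byte words, and a hypothetical helper name relrz_expand_bits. Whether it is actually faster than the per-bit test has not been measured here.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand one bitmap into relocation offsets, visiting set bits only.
   'base' is the offset that bit 0 refers to.  The loop runs once per
   set bit instead of once per bit position.  Returns the number of
   offsets written to 'offs'.  */
static size_t
relrz_expand_bits (uint64_t base, uint64_t bits, uint64_t *offs)
{
  size_t n = 0;

  while (bits != 0)
    {
      /* Index of the lowest set bit (GCC/Clang builtin; undefined for 0,
         which the loop condition rules out).  */
      unsigned idx = (unsigned) __builtin_ctzll (bits);

      offs[n++] = base + (uint64_t) idx * 8;
      bits &= bits - 1;          /* clear the lowest set bit */
    }
  return n;
}
```

The loop still branches on bits != 0, but the trip count equals the number of relocations rather than the position of the highest set bit.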
