thr3ads.net - llvm dev - [llvm-dev] [RFC] Moving RELRO segment [Aug 2019]

Vic (Chun-Ju) Yang via llvm-dev

2019-Aug-28 17:58 UTC

[llvm-dev] [RFC] Moving RELRO segment

Hey all,

TL;DR: Move the RELRO segment to immediately after the read-only segment so
that the dynamic linker has the option to merge the two virtual memory
areas at run time.

This is an RFC for moving the RELRO segment. Currently, lld orders ELF
sections in the following order: R, RX, RWX, RW, and RW contains RELRO. At
run time, after RELRO is write-protected, we'd have VMAs in the order R,
RX, RWX, R (RELRO), RW. I'd like to propose that we move RELRO to be
immediately after the read-only sections, so that the order of VMAs
becomes: R, R (RELRO), RX, RWX, RW, and the dynamic linker would have the
option to merge the two read-only VMAs to reduce bookkeeping costs.
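As a rough illustration of the VMA-count argument above, the C sketch below models one DSO's run-time mappings as an ordered list of permissions and counts the VMAs left after adjacent mappings with identical permissions are merged. It is a simplified model, not kernel code: it assumes permissions are the only merge criterion (the real kernel also compares other VM flags and the backing file and offset), and `enum perm`, `count_merged_vmas`, and the two layout arrays are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Permissions of one run-time mapping; a simplified, hypothetical model. */
enum perm { PERM_R, PERM_RX, PERM_RWX, PERM_RW };

/* Count the VMAs left after merging every run of adjacent mappings whose
 * permissions are identical.  Real kernels also require other VM flags
 * and the backing object/offset to line up; this sketch ignores that. */
static size_t count_merged_vmas(const enum perm *maps, size_t n)
{
    assert(maps != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        /* A new VMA starts whenever the permissions change. */
        if (i == 0 || maps[i] != maps[i - 1])
            count++;
    }
    return count;
}

/* The two layouts from the RFC, after RELRO has been made read-only. */
static const enum perm current_layout[] = {
    PERM_R, PERM_RX, PERM_RWX, PERM_R /* RELRO */, PERM_RW
};
static const enum perm proposed_layout[] = {
    PERM_R, PERM_R /* RELRO */, PERM_RX, PERM_RWX, PERM_RW
};
```

In this model, `count_merged_vmas(current_layout, 5)` returns 5 and `count_merged_vmas(proposed_layout, 5)` returns 4: placing RELRO next to the R segment is what makes one VMA per DSO reclaimable.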

While I only tested this proposal on an ARM64 Android platform, the same
optimization should be applicable to other platforms as well. My test
showed an overall ~1MB decrease in kernel slab memory used for
vm_area_struct, with about 150 processes running. For this to work, I had
to modify the dynamic linker:
  1. The dynamic linker needs to make the read-only VMA briefly writable so
that it has the same VM flags as the RELRO VMA and the two can be merged.
Specifically, VM_ACCOUNT is set when a VMA is made writable, so the RELRO
VMA (writable while relocations are applied) carries it and the read-only
VMA does not.
  2. The cross-DSO CFI implementation in the Android dynamic linker
currently assumes __cfi_check is at a lower address than all CFI targets,
so the CFI check fails when RELRO is moved below the text section. After I
added support for CFI targets below __cfi_check, I don't see CFI failures
anymore.
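Item 1 explains why matching permissions alone are not enough. On Linux, a private mapping that is writable carries VM_ACCOUNT, and dropping write permission later does not clear it, so the RELRO VMA keeps the flag while the never-writable R VMA lacks it. The sketch below extends a toy model with an `accounted` field standing in for VM_ACCOUNT; `struct vma`, `set_prot`, `can_merge`, and the merge rule are hypothetical simplifications, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified VMA: current protection plus a flag that
 * models VM_ACCOUNT, which stays set once a private mapping has been
 * writable, even after write permission is removed again. */
struct vma {
    bool readable, writable, executable;
    bool accounted; /* stands in for VM_ACCOUNT */
};

/* Simplified merge rule: protections and the accounting flag must match. */
static bool can_merge(const struct vma *a, const struct vma *b)
{
    assert(a != NULL && b != NULL);
    return a->readable == b->readable &&
           a->writable == b->writable &&
           a->executable == b->executable &&
           a->accounted == b->accounted;
}

/* Model of an mprotect() call covering the whole VMA. */
static void set_prot(struct vma *v, bool r, bool w, bool x)
{
    assert(v != NULL);
    v->readable = r;
    v->writable = w;
    v->executable = x;
    if (w)
        v->accounted = true; /* never cleared when write is dropped */
}
```

In this model, a RELRO VMA that went RW then R cannot merge with an untouched R VMA; after the R VMA is itself made writable and then read-only again (the dynamic-linker change in item 1), `can_merge` returns true.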

One drawback of this change is that the number of LOAD segments increases
by one for DSOs whose RW LOAD segment contains anything other than RELRO
sections.
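The extra LOAD segment follows from the same adjacency logic applied at link time, when RELRO is still RW. The hypothetical helper below counts PT_LOAD segments by starting a new one whenever the permission flags change in file order; it is a sketch that ignores RWX segments, alignment, and lld's other segment-splitting rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: the permission flags of each output section group,
 * in file order, written as "R", "RX" or "RW".  At link time RELRO is
 * still RW; it only becomes read-only after the dynamic linker has
 * applied relocations. */
static size_t count_pt_load(const char *const *flags, size_t n)
{
    assert(flags != NULL || n == 0);
    size_t loads = 0;
    for (size_t i = 0; i < n; i++) {
        /* Simplification: a new PT_LOAD starts whenever the flags change. */
        if (i == 0 || strcmp(flags[i], flags[i - 1]) != 0)
            loads++;
    }
    return loads;
}
```

For the current order {R, RX, RW (RELRO), RW (.data/.bss)} it returns 3, and for the proposed order {R, RW (RELRO), RX, RW (.data/.bss)} it returns 4; a DSO with no RW content outside RELRO gets 3 either way.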

This would be a somewhat tedious change (especially the part about having
to update all the unit tests), but the benefit is pretty good, especially
considering that kernel slab memory is not swappable/evictable. Please let
me know your thoughts!

Thanks,
Vic

Rui Ueyama via llvm-dev

2019-Aug-29 08:22 UTC

Hi Vic,

I'm in favor of this proposal. Saving that amount of kernel memory by
changing the memory layout seems like a win. I believe that there are
programs in the wild that assume a specific segment order, and moving
the RELRO segment might break some of them, but it looks like it's worth
the risk.

Peter Smith via llvm-dev

2019-Aug-29 09:42 UTC

Hello Vic,

I don't have a lot to add myself. I think that the majority of the input
needs to come from the OS stakeholders. My main concern is that if it
requires work on every platform to take advantage of it or to avoid
regressions, then perhaps it is worth adding as an option rather than
changing the default.

Some questions:
- Does this need work in every OS for correctness of programs? For
example, you mention that the cross-DSO CFI implementation in Android
needed to be updated; could that also be the case on other platforms?
- Does this need work in every OS to take advantage of it? For example,
would this need an ld.so change on Linux?

The last time we updated the position of RELRO was in
https://reviews.llvm.org/D56828; it would be worth going through the
arguments there to see if anything in them triggers any thoughts.

Peter

David Chisnall via llvm-dev

2019-Aug-30 12:27 UTC

On 28/08/2019 18:58, Vic (Chun-Ju) Yang via llvm-dev wrote:
> This is an RFC for moving RELRO segment. Currently, lld orders ELF
> sections in the following order: R, RX, RWX, RW, and RW contains RELRO.
> At run time, after RELRO is write-protected, we'd have VMAs in the order
> of: R, RX, RWX, R (RELRO), RW. I'd like to propose that we move RELRO to
> be immediately after the read-only sections, so that the order of VMAs
> become: R, R (RELRO), RX, RWX, RW, and the dynamic linker would have the
> option to merge the two read-only VMAs to reduce bookkeeping costs.

I am not convinced by this change. With current hardware, to make any
mapping more efficient, you need both the virtual-to-physical
translation and the permissions to be the same.

Anything that is writeable at any point will be a CoW mapping that, when
written, will be replaced by a different page. Anything that is never
writeable will be backed by the same physical pages. This means that the
old order is (S for shared, P for private):

S S P P

The new order is:

S P S P P

This means that, in the new order, the translation for the shared part is
*definitely* not contiguous. Modern architectures currently (though not
necessarily indefinitely) conflate protection and translation, and so both
versions require the same number of page table and TLB entries.
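The shared/private argument can be checked mechanically. The sketch below counts maximal runs of one kind of segment in layout strings like the ones above ('S' for shared, 'P' for private); `count_runs` is a hypothetical helper, and the model considers nothing but adjacency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count maximal runs of adjacent segments of the given kind ('S' for
 * shared, 'P' for private) in a layout string such as "SSPP".  More runs
 * means the mappings of that kind are more fragmented in the address
 * space. */
static size_t count_runs(const char *layout, char kind)
{
    assert(layout != NULL);
    size_t runs = 0;
    bool in_run = false;
    for (const char *p = layout; *p != '\0'; p++) {
        if (*p == kind) {
            if (!in_run)
                runs++; /* entering a new run of this kind */
            in_run = true;
        } else {
            in_run = false;
        }
    }
    return runs;
}
```

For the old order "SSPP", both `count_runs(layout, 'S')` and `count_runs(layout, 'P')` return 1; for the new order "SPSPP", both return 2, so each kind is split into one more piece.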

This, however, is true only when you think about single-level
translation. When you consider nested paging in a VM, things get more
complex because the translation is a two-stage lookup and the protection
is based on the intersection of the permissions at each level.

The hypervisor will typically try to use superpages for the second-level 
translation and so both of the shared pages have a high probability of 
hitting in the same PTE for the second-level translation.  The same is 
true for the RW and RELRO segments, because they will be allocated at 
the same time and any OS that does transparent superpage promotion (I 
think Linux does now?  FreeBSD has for almost a decade) will therefore 
try to allocate contiguous physical memory for the mappings if possible.
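To make the "same PTE for the second-level translation" point concrete: if the hypervisor backs a region of guest memory with a 2 MiB superpage (a common large-page size with a 4 KiB base granule, assumed here), two guest-physical pages in that region share one second-level entry exactly when they fall in the same 2 MiB-aligned block. The helper and the addresses used with it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed superpage size: 2 MiB (2^21 bytes). */
#define SUPERPAGE_SHIFT 21u

/* True if two guest-physical addresses lie in the same 2 MiB-aligned
 * block, i.e. they would be covered by a single second-level entry if
 * the hypervisor mapped that block with a superpage. */
static bool same_superpage(uint64_t gpa_a, uint64_t gpa_b)
{
    return (gpa_a >> SUPERPAGE_SHIFT) == (gpa_b >> SUPERPAGE_SHIFT);
}
```

For example, 0x40000000 and 0x401FF000 share a block, while 0x401FF000 and 0x40200000 are only one page apart but straddle a 2 MiB boundary and would need two second-level entries even under superpage backing.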

I would expect your scheme to translate to more memory traffic from
page-table walks in any virtualised environment, and I don't see (given
that you have increased address space fragmentation) where you are
seeing a saving. With RELRO as part of RW, the kernel is free to split
and recombine adjacent VM objects; with the new layout, it is not able to
combine adjacent objects because they are backed by different storage.

David