Vic (Chun-Ju) Yang via llvm-dev
2019-Aug-29 17:35 UTC
[llvm-dev] [RFC] Moving RELRO segment
On Thu, Aug 29, 2019 at 3:10 AM Fāng-ruì Sòng <maskray at google.com> wrote:
> Hello Vic,
>
> To make sure I understand the proposal correctly, do you propose:
>
> Old: R RX RW(RELRO) RW
> New: R(R+RELRO) RX RW; R includes the traditional R part and the RELRO part
> Runtime (before relocation resolving): RW RX RW
> Runtime (after relocation resolving): R RX RW

I actually see two ways of implementing this, and yes, what you mentioned
here is one of them:
1. Move RELRO to before RX and merge it with the R segment. This is what you
said above.
2. Move RELRO to before RX, but keep it as a separate segment. This is what I
implemented in my test.

As I mentioned in my reply to Peter, option 1 would allow existing
implementations to take advantage of this without any change. While I think
this optimization is well worth it, if we go with option 1, dynamic linkers
won't have the choice to keep RO separate if they want to for whatever reason
(e.g. less VM commit, finer granularity in VM maps, or not wanting RO to be
writable even briefly). So there's a trade-off to be made here (or an option
to be added, even though we all want to avoid that if we can).

> How to layout the segments if --no-rosegment is specified?
>
> One option is to keep the old layout if --no-rosegment is specified, the
> other is:
>
> Old: RX RW(RELRO) RW
> New: RX(R+RELRO+RX) RW; RX includes the traditional R part, the RELRO
> part, and the RX part
> Runtime (before relocation resolving): RW RX RW; ifunc can't run if
> RX is not kept
> Runtime (after relocation resolving): RX RW; some people may be
> concerned with writable stuff (the relocated part) being made executable

Indeed, I think the weakened security may be a problem if we are to merge
RELRO into RX. Keeping the old layout would be preferable IMHO.

> Another problem is that in the default -z relro -z lazy (-z now not
> specified) layout, .got and .got.plt will be separated by potentially huge
> code sections (e.g. .text). I'm still thinking what problems this layout
> change may bring.

Not sure if this is the same issue as what you mentioned here, but I also see
a comment in lld/ELF/Writer.cpp about how .rodata and .eh_frame should be as
close to .text as possible for fear of relocation overflow. If we go with
option 2 above, the distance would have to be made larger. With option 1, we
may still have some leeway in how to order sections within the merged RELRO
segment.

Vic

> On Thu, Aug 29, 2019 at 5:42 PM Peter Smith <peter.smith at linaro.org>
> wrote:
>
>> Hello Vic,
>>
>> I don't have a lot to add myself. I think that the majority of the input
>> needs to come from the OS stakeholders. My main concern is that if it
>> requires work on every platform to take advantage of it or to avoid
>> regressions, then perhaps it is worth adding as an option rather than
>> changing the default.
>>
>> Some questions:
>> - Does this need work in every OS for correctness of programs? For
>> example, you mention that the cross-DSO CFI implementation in Android
>> needed to be updated; could that also be the case on other platforms?
>> - Does this need work in every OS to take advantage of it? For example,
>> would this need an ld.so change on Linux?
>>
>> The last time we updated the position of RELRO was in
>> https://reviews.llvm.org/D56828. It will be worth going through the
>> arguments in there to see if there is anything that triggers any
>> thoughts.
>>
>> Peter
>>
>> On Thu, 29 Aug 2019 at 09:22, Rui Ueyama <ruiu at google.com> wrote:
>> >
>> > Hi Vic,
>> >
>> > I'm in favor of this proposal. Saving that amount of kernel memory by
>> > changing the memory layout seems like a win. I believe that there are
>> > programs in the wild that assume some specific segment order, and
>> > moving the RELRO segment might break some of them, but it looks like
>> > it's worth the risk.
>> >
>> > On Thu, Aug 29, 2019 at 2:51 PM Vic (Chun-Ju) Yang via llvm-dev <
>> > llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> Hey all,
>> >>
>> >> TL;DR: Moving the RELRO segment to be immediately after the read-only
>> >> segment so that the dynamic linker has the option to merge the two
>> >> virtual memory areas at run time.
>> >>
>> >> This is an RFC for moving the RELRO segment. Currently, lld orders
>> >> ELF sections in the following order: R, RX, RWX, RW, and RW contains
>> >> RELRO. At run time, after RELRO is write-protected, we'd have VMAs in
>> >> the order of: R, RX, RWX, R (RELRO), RW. I'd like to propose that we
>> >> move RELRO to be immediately after the read-only sections, so that
>> >> the order of VMAs becomes: R, R (RELRO), RX, RWX, RW, and the dynamic
>> >> linker would have the option to merge the two read-only VMAs to
>> >> reduce bookkeeping costs.
>> >>
>> >> While I only tested this proposal on an ARM64 Android platform, the
>> >> same optimization should be applicable to other platforms as well. My
>> >> test showed an overall ~1MB decrease in kernel slab memory usage on
>> >> vm_area_struct, with about 150 processes running. For this to work, I
>> >> had to modify the dynamic linker:
>> >> 1. The dynamic linker needs to make the read-only VMA briefly
>> >> writable in order for it to have the same VM flags as the RELRO VMA
>> >> so that they can be merged. Specifically, VM_ACCOUNT is set when a
>> >> VMA is made writable.
>> >> 2. The cross-DSO CFI implementation in the Android dynamic linker
>> >> currently assumes __cfi_check is at a lower address than all CFI
>> >> targets, so the CFI check fails when RELRO is moved below the text
>> >> section. After I added support for CFI targets below __cfi_check, I
>> >> no longer see CFI failures.
>> >> One drawback of this change is that the number of LOAD segments
>> >> increases by one for DSOs whose RW LOAD segment contains anything
>> >> other than RELRO.
>> >>
>> >> This would be a somewhat tedious change (especially the part about
>> >> having to update all the unit tests), but the benefit is pretty good,
>> >> especially considering that kernel slab memory is not
>> >> swappable/evictable. Please let me know your thoughts!
>> >>
>> >> Thanks,
>> >> Vic
>> >>
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> llvm-dev at lists.llvm.org
>> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> --
> 宋方睿
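As an aside, one way to inspect the segment order under discussion is to dump
each loaded object's PT_LOAD and PT_GNU_RELRO program headers at run time. A
minimal, self-contained sketch using dl_iterate_phdr (Linux/Bionic/glibc; the
output format below is only illustrative):

#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Print the PT_LOAD and PT_GNU_RELRO program headers of every loaded object
 * in file order, which shows whether PT_GNU_RELRO sits before or after the
 * executable PT_LOAD. */
static int show_phdrs(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    (void)data;
    printf("%s\n", info->dlpi_name[0] ? info->dlpi_name : "[main executable]");
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD && ph->p_type != PT_GNU_RELRO)
            continue;
        printf("  %-12s p_vaddr=0x%llx flags=%c%c%c\n",
               ph->p_type == PT_LOAD ? "PT_LOAD" : "PT_GNU_RELRO",
               (unsigned long long)ph->p_vaddr,
               (ph->p_flags & PF_R) ? 'R' : '-',
               (ph->p_flags & PF_W) ? 'W' : '-',
               (ph->p_flags & PF_X) ? 'X' : '-');
    }
    return 0;   /* keep iterating over all loaded objects */
}

int main(void) {
    dl_iterate_phdr(show_phdrs, NULL);
    return 0;
}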
Fāng-ruì Sòng via llvm-dev
2019-Aug-30
[llvm-dev] [RFC] Moving RELRO segment
> > Old: R RX RW(RELRO) RW
> > New: R(R+RELRO) RX RW; R includes the traditional R part and the
> > RELRO part
> > Runtime (before relocation resolving): RW RX RW
> > Runtime (after relocation resolving): R RX RW
> >
> I actually see two ways of implementing this, and yes, what you mentioned
> here is one of them:
> 1. Move RELRO to before RX and merge it with the R segment. This is what
> you said above.
> 2. Move RELRO to before RX, but keep it as a separate segment. This is
> what I implemented in my test.
> As I mentioned in my reply to Peter, option 1 would allow existing
> implementations to take advantage of this without any change. While I
> think this optimization is well worth it, if we go with option 1, dynamic
> linkers won't have the choice to keep RO separate if they want to for
> whatever reason (e.g. less VM commit, finer granularity in VM maps, or
> not wanting RO to be writable even briefly). So there's a trade-off to be
> made here (or an option to be added, even though we all want to avoid
> that if we can).

Then you probably meant:

Old: R RX RW(RELRO) RW
New: R | RW(RELRO) RX RW
Runtime (before relocation resolving): R RW RX RW
Runtime (after relocation resolving): R R RX RW ; the two R cannot be merged

| means a maxpagesize alignment. I am not sure whether you are going to add
it, because I still do not understand where the saving comes from.

If the alignment is added, the R and RW maps can get contiguous
(non-overlapping) p_offset ranges. However, the RW map is private dirty; it
cannot be merged with adjacent maps, so I am not clear how it can save kernel
memory.

If the alignment is not added, the two maps will get overlapping p_offset
ranges.

> My test showed an overall ~1MB decrease in kernel slab memory usage on
> vm_area_struct, with about 150 processes running. For this to work, I had
> to modify the dynamic linker:

Can you elaborate on how this decreases the kernel slab memory usage on
vm_area_struct? References to source code are very welcome :) This is
contrary to my intuition because the second R is private dirty. The number of
VMAs does not decrease.

> 1. The dynamic linker needs to make the read-only VMA briefly writable in
> order for it to have the same VM flags as the RELRO VMA so that they can
> be merged. Specifically, VM_ACCOUNT is set when a VMA is made writable.

Same question. I hope you can give a bit more details.

> > How to layout the segments if --no-rosegment is specified?
> > Runtime (after relocation resolving): RX RW; some people may be
> > concerned with writable stuff (the relocated part) being made executable
> Indeed, I think the weakened security may be a problem if we are to merge
> RELRO into RX. Keeping the old layout would be preferable IMHO.

This means the new layout conflicts with --no-rosegment.
In Driver.cpp, there should be a "... cannot be used together" error.

> > Another problem is that in the default -z relro -z lazy (-z now not
> > specified) layout, .got and .got.plt will be separated by potentially
> > huge code sections (e.g. .text). I'm still thinking what problems this
> > layout change may bring.
> >
> Not sure if this is the same issue as what you mentioned here, but I also
> see a comment in lld/ELF/Writer.cpp about how .rodata and .eh_frame
> should be as close to .text as possible for fear of relocation overflow.
> If we go with option 2 above, the distance would have to be made larger.
> With option 1, we may still have some leeway in how to order sections
> within the merged RELRO segment.

For huge executables (>2G or 3G), it may cause relocation overflows between
.text and .rodata if other large sections like .dynsym and .dynstr are placed
in between.

I do not worry too much about overflows potentially caused by moving
PT_GNU_RELRO around. PT_GNU_RELRO is usually less than 10% of the size of the
RX PT_LOAD.

> This would be a somewhat tedious change (especially the part about having
> to update all the unit tests), but the benefit is pretty good, especially
> considering that kernel slab memory is not swappable/evictable. Please
> let me know your thoughts!

Definitely! I have prototyped this and found that ~260 tests will need
address changes.
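On the question above about where the vm_area_struct saving would come from:
each line in /proc/<pid>/maps corresponds to one VMA, so merging the plain R
and RELRO mappings shows up as one fewer line per loaded object. A rough,
Linux-specific sketch for counting them in a single process (an illustration
only, not the methodology behind the ~1MB figure quoted above):

#include <stdio.h>

/* Count the VMAs of the current process by counting lines in
 * /proc/self/maps; each line is backed by one vm_area_struct in the kernel
 * slab. Comparing the count before and after the layout change indicates
 * how many structs are saved per process. */
int main(void) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    int vmas = 0;
    int c;
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n')
            vmas++;
    }
    fclose(f);
    printf("VMAs: %d\n", vmas);
    return 0;
}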
Vic (Chun-Ju) Yang via llvm-dev
2019-Sep-03 17:40 UTC
[llvm-dev] [RFC] Moving RELRO segment
On Fri, Aug 30, 2019 at 4:54 AM Fāng-ruì Sòng <maskray at google.com> wrote:
> > > Old: R RX RW(RELRO) RW
> > > New: R(R+RELRO) RX RW; R includes the traditional R part and the
> > > RELRO part
> > > Runtime (before relocation resolving): RW RX RW
> > > Runtime (after relocation resolving): R RX RW
> > >
> > I actually see two ways of implementing this, and yes, what you
> > mentioned here is one of them:
> > 1. Move RELRO to before RX and merge it with the R segment. This is
> > what you said above.
> > 2. Move RELRO to before RX, but keep it as a separate segment. This is
> > what I implemented in my test.
> > As I mentioned in my reply to Peter, option 1 would allow existing
> > implementations to take advantage of this without any change. While I
> > think this optimization is well worth it, if we go with option 1,
> > dynamic linkers won't have the choice to keep RO separate if they want
> > to for whatever reason (e.g. less VM commit, finer granularity in VM
> > maps, or not wanting RO to be writable even briefly). So there's a
> > trade-off to be made here (or an option to be added, even though we all
> > want to avoid that if we can).
>
> Then you probably meant:
>
> Old: R RX RW(RELRO) RW
> New: R | RW(RELRO) RX RW
> Runtime (before relocation resolving): R RW RX RW
> Runtime (after relocation resolving): R R RX RW ; the two R cannot be
> merged
>
> | means a maxpagesize alignment. I am not sure whether you are going to
> add it, because I still do not understand where the saving comes from.
>
> If the alignment is added, the R and RW maps can get contiguous
> (non-overlapping) p_offset ranges. However, the RW map is private dirty;
> it cannot be merged with adjacent maps, so I am not clear how it can save
> kernel memory.

My understanding (and my test result shows so) is that two VMAs can be merged
even when one of them contains dirty pages. As far as I can tell from reading
vma_merge() in mm/mmap.c in the Linux kernel, there's nothing preventing the
merging of consecutively mmapped regions in that case.

That said, we may not care about this case too much if we decide that this
change should be put behind a flag, because in that case, I think we can just
go with option 1.

> If the alignment is not added, the two maps will get overlapping p_offset
> ranges.
>
> > My test showed an overall ~1MB decrease in kernel slab memory usage on
> > vm_area_struct, with about 150 processes running. For this to work, I
> > had to modify the dynamic linker:
>
> Can you elaborate on how this decreases the kernel slab memory usage on
> vm_area_struct? References to source code are very welcome :) This is
> contrary to my intuition because the second R is private dirty. The
> number of VMAs does not decrease.

In mm/mprotect.c, merging is done in mprotect_fixup(), which calls
vma_merge() to do the actual work. In the same function you can also see that
the VM_ACCOUNT flag is set for a writable VMA, which is why I had to modify
the dynamic linker to make the R section temporarily writable for it to be
mergeable with RELRO (they need to have the same flags to be merged).

Again, IMO all these somewhat indirect manipulations of VMAs were because I
was hoping to give the dynamic linker an option to choose whether to take
advantage of this or not. If, for any reason, we put this behind a build-time
flag, there's no reason to jump through these hoops instead of just going
with option 1.
> > 1. The dynamic linker needs to make the read-only VMA briefly writable
> > in order for it to have the same VM flags as the RELRO VMA so that they
> > can be merged. Specifically, VM_ACCOUNT is set when a VMA is made
> > writable.
>
> Same question. I hope you can give a bit more details.
>
> > > How to layout the segments if --no-rosegment is specified?
> > > Runtime (after relocation resolving): RX RW; some people may be
> > > concerned with writable stuff (the relocated part) being made
> > > executable
> > Indeed, I think the weakened security may be a problem if we are to
> > merge RELRO into RX. Keeping the old layout would be preferable IMHO.
>
> This means the new layout conflicts with --no-rosegment.
> In Driver.cpp, there should be a "... cannot be used together" error.
>
> > > Another problem is that in the default -z relro -z lazy (-z now not
> > > specified) layout, .got and .got.plt will be separated by potentially
> > > huge code sections (e.g. .text). I'm still thinking what problems
> > > this layout change may bring.
> > >
> > Not sure if this is the same issue as what you mentioned here, but I
> > also see a comment in lld/ELF/Writer.cpp about how .rodata and
> > .eh_frame should be as close to .text as possible for fear of
> > relocation overflow. If we go with option 2 above, the distance would
> > have to be made larger. With option 1, we may still have some leeway in
> > how to order sections within the merged RELRO segment.
>
> For huge executables (>2G or 3G), it may cause relocation overflows
> between .text and .rodata if other large sections like .dynsym and
> .dynstr are placed in between.
>
> I do not worry too much about overflows potentially caused by moving
> PT_GNU_RELRO around. PT_GNU_RELRO is usually less than 10% of the size of
> the RX PT_LOAD.

That's good to know!

> > This would be a somewhat tedious change (especially the part about
> > having to update all the unit tests), but the benefit is pretty good,
> > especially considering that kernel slab memory is not
> > swappable/evictable. Please let me know your thoughts!
>
> Definitely! I have prototyped this and found that ~260 tests will need
> address changes.
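To make the mprotect_fixup()/VM_ACCOUNT point above concrete, here is a
minimal sketch of the dance a dynamic linker could do once relocations are
resolved. The function name and arguments are illustrative (this is not
bionic's actual code), and the addresses are assumed to be page-aligned
bounds taken from the PT_LOAD and PT_GNU_RELRO program headers:

#include <stddef.h>
#include <sys/mman.h>

/* Illustrative sketch only. Make the plain read-only map briefly writable so
 * the kernel sets VM_ACCOUNT on it (mprotect_fixup() does this when a
 * mapping becomes writable), then return both it and the now-resolved RELRO
 * range to read-only. With matching flags and adjacent address ranges,
 * vma_merge() can coalesce the two VMAs into one. */
static int merge_ro_and_relro(void *ro, size_t ro_len,
                              void *relro, size_t relro_len) {
    if (mprotect(ro, ro_len, PROT_READ | PROT_WRITE) != 0)
        return -1;                      /* VM flags now match the RELRO map */
    if (mprotect(ro, ro_len, PROT_READ) != 0)
        return -1;
    return mprotect(relro, relro_len, PROT_READ);  /* usual RELRO step */
}

Roughly speaking, with option 1 above (RELRO folded into the R PT_LOAD) none
of the flag juggling would be needed: the merged segment starts out RW and a
single mprotect to PROT_READ after relocation resolving covers both the plain
read-only data and the RELRO region.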