Hello Vic, I don't have a lot to add myself. I think that majority of the input needs to come from the OS stakeholders. My main concern is if it requires work on every platform to take advantage or avoid regressions then perhaps it is worth adding as an option rather than changing the default. Some questions: - Does this need work in every OS for correctness of programs? For example you mention that cross-DSO CFI implementation in Android needed to be updated, could that also be the case on other platforms? - Does this need work in every OS to take advantage of it? For example would this need a ld.so change on Linux? The last time we updated the position of RELRO was in https://reviews.llvm.org/D56828 it will be worth going through the arguments in there to see if there is anything that triggers any thoughts. Peter On Thu, 29 Aug 2019 at 09:22, Rui Ueyama <ruiu at google.com> wrote:> > Hi Vic, > > I'm in favor of this proposal. Saving that amount of kernel memory by changing the memory layout seems like a win. I believe that there are programs in the wild that assume some specific segment order, and moving the RELRO segment might break some of them, but looks like it's worth the risk. > > On Thu, Aug 29, 2019 at 2:51 PM Vic (Chun-Ju) Yang via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> Hey all, >> >> TL;DR: Moving RELRO segment to be immediately after read-only segment so that the dynamic linker has the option to merge the two virtual memory areas at run time. >> >> This is an RFC for moving RELRO segment. Currently, lld orders ELF sections in the following order: R, RX, RWX, RW, and RW contains RELRO. At run time, after RELRO is write-protected, we'd have VMAs in the order of: R, RX, RWX, R (RELRO), RW. I'd like to propose that we move RELRO to be immediately after the read-only sections, so that the order of VMAs become: R, R (RELRO), RX, RWX, RW, and the dynamic linker would have the option to merge the two read-only VMAs to reduce bookkeeping costs. >> >> While I only tested this proposal on an ARM64 Android platform, the same optimization should be applicable to other platforms as well. My test showed an overall ~1MB decrease in kernel slab memory usage on vm_area_struct, with about 150 processes running. For this to work, I had to modify the dynamic linker: >> 1. The dynamic linker needs to make the read-only VMA briefly writable in order for it to have the same VM flags with the RELRO VMA so that they can be merged. Specifically VM_ACCOUNT is set when a VMA is made writable. >> 2. The cross-DSO CFI implementation in Android dynamic linker currently assumes __cfi_check is at a lower address than all CFI targets, so CFI check fails when RELRO is moved to below text section. After I added support for CFI targets below __cfi_check, I don't see CFI failures anymore. >> One drawback that comes with this change is that the number of LOAD segments increases by one for DSOs with anything other than those in RELRO in its RW LOAD segment. >> >> This would be a somewhat tedious change (especially the part about having to update all the unit tests), but the benefit is pretty good, especially considering the kernel slab memory is not swappable/evictable. Please let me know your thoughts! >> >> Thanks, >> Vic >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Hello Vic, To make sure I understand the proposal correctly, do you propose: Old: R RX RW(RELRO) RW New: R(R+RELRO) RX RW; R includes the traditional R part and the RELRO part Runtime (before relocation resolving): RW RX RW Runtime (after relocation resolving): R RX RW How to layout the segments if --no-rosegment is specified? One option is to keep the old layout if --no-rosegment is specified, the other is: Old: RX RW(RELRO) RW New: RX(R+RELRO+RX) RW; RX includes the traditional R part, the RELRO part, and the RX part Runtime (before relocation resolving): RW RX RW; ifunc can't run if RX is not kept Runtime (before relocation resolving): RX RW ; some people may be concered with writable stuff (relocated part) being made executable Another problem is that in the default -z relro -z lazy (-z now not specified) layout, .got and .got.plt will be separated by potentially huge code sections (e.g. .text). I'm still thinking what problems this layout change may bring. On Thu, Aug 29, 2019 at 5:42 PM Peter Smith <peter.smith at linaro.org> wrote:> Hello Vic, > > I don't have a lot to add myself. I think that majority of the input > needs to come from the OS stakeholders. My main concern is if it > requires work on every platform to take advantage or avoid regressions > then perhaps it is worth adding as an option rather than changing the > default. > > Some questions: > - Does this need work in every OS for correctness of programs? For > example you mention that cross-DSO CFI implementation in Android > needed to be updated, could that also be the case on other platforms? > - Does this need work in every OS to take advantage of it? For example > would this need a ld.so change on Linux? > > The last time we updated the position of RELRO was in > https://reviews.llvm.org/D56828 it will be worth going through the > arguments in there to see if there is anything that triggers any > thoughts. > > Peter > > On Thu, 29 Aug 2019 at 09:22, Rui Ueyama <ruiu at google.com> wrote: > > > > Hi Vic, > > > > I'm in favor of this proposal. Saving that amount of kernel memory by > changing the memory layout seems like a win. I believe that there are > programs in the wild that assume some specific segment order, and moving > the RELRO segment might break some of them, but looks like it's worth the > risk. > > > > On Thu, Aug 29, 2019 at 2:51 PM Vic (Chun-Ju) Yang via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> > >> Hey all, > >> > >> TL;DR: Moving RELRO segment to be immediately after read-only segment > so that the dynamic linker has the option to merge the two virtual memory > areas at run time. > >> > >> This is an RFC for moving RELRO segment. Currently, lld orders ELF > sections in the following order: R, RX, RWX, RW, and RW contains RELRO. At > run time, after RELRO is write-protected, we'd have VMAs in the order of: > R, RX, RWX, R (RELRO), RW. I'd like to propose that we move RELRO to be > immediately after the read-only sections, so that the order of VMAs become: > R, R (RELRO), RX, RWX, RW, and the dynamic linker would have the option to > merge the two read-only VMAs to reduce bookkeeping costs. > >> > >> While I only tested this proposal on an ARM64 Android platform, the > same optimization should be applicable to other platforms as well. My test > showed an overall ~1MB decrease in kernel slab memory usage on > vm_area_struct, with about 150 processes running. For this to work, I had > to modify the dynamic linker: > >> 1. The dynamic linker needs to make the read-only VMA briefly > writable in order for it to have the same VM flags with the RELRO VMA so > that they can be merged. Specifically VM_ACCOUNT is set when a VMA is made > writable. > >> 2. The cross-DSO CFI implementation in Android dynamic linker > currently assumes __cfi_check is at a lower address than all CFI targets, > so CFI check fails when RELRO is moved to below text section. After I added > support for CFI targets below __cfi_check, I don't see CFI failures anymore. > >> One drawback that comes with this change is that the number of LOAD > segments increases by one for DSOs with anything other than those in RELRO > in its RW LOAD segment. > >> > >> This would be a somewhat tedious change (especially the part about > having to update all the unit tests), but the benefit is pretty good, > especially considering the kernel slab memory is not swappable/evictable. > Please let me know your thoughts! > >> > >> Thanks, > >> Vic > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> llvm-dev at lists.llvm.org > >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- 宋方睿 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190829/88fe1845/attachment.html>
Vic (Chun-Ju) Yang via llvm-dev
2019-Aug-29 17:18 UTC
[llvm-dev] [RFC] Moving RELRO segment
On Thu, Aug 29, 2019 at 2:42 AM Peter Smith <peter.smith at linaro.org> wrote:> Hello Vic, > > I don't have a lot to add myself. I think that majority of the input > needs to come from the OS stakeholders. My main concern is if it > requires work on every platform to take advantage or avoid regressions > then perhaps it is worth adding as an option rather than changing the > default.> Some questions: > - Does this need work in every OS for correctness of programs? For > example you mention that cross-DSO CFI implementation in Android > needed to be updated, could that also be the case on other platforms? >Indeed this could be a problem for other platforms. I'm not familiar with what CFI implementations are commonly in use, but from what I can tell, the implementation mentioned in Clang CFI design doc has this problem as well, so I wouldn't be surprised that we see this problem in other implementations: https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html#cfi-shadow. Either those implementations need to be fixed or we need to add an option for where RELRO is placed (which brings more maintenance cost).> - Does this need work in every OS to take advantage of it? For example > would this need a ld.so change on Linux? >I can't say for sure for other platforms, but for Linux, I think it depends on how we implement this. If we still keep RO and RELRO segments separate, ld.so needs to be updated for the VM_ACCOUNT issue I mentioned in order to take advantage of this. However, we can consider merging RO segment into RELRO segment if they are adjacent to each other (i.e. make what's RO now a part of RELRO), so that we have one less LOAD and also existing linkers can take advantage of this without change (well, except for the CFI issue.)> > The last time we updated the position of RELRO was in > https://reviews.llvm.org/D56828 it will be worth going through the > arguments in there to see if there is anything that triggers any > thoughts. >Thanks for the pointer! I'll go through it. Vic> > Peter > > On Thu, 29 Aug 2019 at 09:22, Rui Ueyama <ruiu at google.com> wrote: > > > > Hi Vic, > > > > I'm in favor of this proposal. Saving that amount of kernel memory by > changing the memory layout seems like a win. I believe that there are > programs in the wild that assume some specific segment order, and moving > the RELRO segment might break some of them, but looks like it's worth the > risk. > > > > On Thu, Aug 29, 2019 at 2:51 PM Vic (Chun-Ju) Yang via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> > >> Hey all, > >> > >> TL;DR: Moving RELRO segment to be immediately after read-only segment > so that the dynamic linker has the option to merge the two virtual memory > areas at run time. > >> > >> This is an RFC for moving RELRO segment. Currently, lld orders ELF > sections in the following order: R, RX, RWX, RW, and RW contains RELRO. At > run time, after RELRO is write-protected, we'd have VMAs in the order of: > R, RX, RWX, R (RELRO), RW. I'd like to propose that we move RELRO to be > immediately after the read-only sections, so that the order of VMAs become: > R, R (RELRO), RX, RWX, RW, and the dynamic linker would have the option to > merge the two read-only VMAs to reduce bookkeeping costs. > >> > >> While I only tested this proposal on an ARM64 Android platform, the > same optimization should be applicable to other platforms as well. My test > showed an overall ~1MB decrease in kernel slab memory usage on > vm_area_struct, with about 150 processes running. For this to work, I had > to modify the dynamic linker: > >> 1. The dynamic linker needs to make the read-only VMA briefly > writable in order for it to have the same VM flags with the RELRO VMA so > that they can be merged. Specifically VM_ACCOUNT is set when a VMA is made > writable. > >> 2. The cross-DSO CFI implementation in Android dynamic linker > currently assumes __cfi_check is at a lower address than all CFI targets, > so CFI check fails when RELRO is moved to below text section. After I added > support for CFI targets below __cfi_check, I don't see CFI failures anymore. > >> One drawback that comes with this change is that the number of LOAD > segments increases by one for DSOs with anything other than those in RELRO > in its RW LOAD segment. > >> > >> This would be a somewhat tedious change (especially the part about > having to update all the unit tests), but the benefit is pretty good, > especially considering the kernel slab memory is not swappable/evictable. > Please let me know your thoughts! > >> > >> Thanks, > >> Vic > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> llvm-dev at lists.llvm.org > >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190829/3198fed7/attachment.html>
Vic (Chun-Ju) Yang via llvm-dev
2019-Aug-29 17:35 UTC
[llvm-dev] [RFC] Moving RELRO segment
On Thu, Aug 29, 2019 at 3:10 AM Fāng-ruì Sòng <maskray at google.com> wrote:> Hello Vic, > > To make sure I understand the proposal correctly, do you propose: > > Old: R RX RW(RELRO) RW > New: R(R+RELRO) RX RW; R includes the traditional R part and the > RELRO part > Runtime (before relocation resolving): RW RX RW > Runtime (after relocation resolving): R RX RW >I actually see two ways of implementing this, and yes what you mentioned here is one of them: 1. Move RELRO to before RX, and merge it with R segment. This is what you said above. 2. Move RELRO to before RX, but keep it as a separate segment. This is what I implemented in my test. As I mentioned in my reply to Peter, option 1 would allow existing implementations to take advantage of this without any change. While I think this optimization is well worth it, if we go with option 1, the dynamic linkers won't have a choice to keep RO separate if they want to for whatever reason (e.g. less VM commit, finer granularity in VM maps, not wanting to have RO as writable even if for a short while.) So there's a trade-off to be made here (or an option to be added, even though we all want to avoid that if we can.)> > How to layout the segments if --no-rosegment is specified? > > One option is to keep the old layout if --no-rosegment is specified, the > other is: > > Old: RX RW(RELRO) RW > New: RX(R+RELRO+RX) RW; RX includes the traditional R part, the RELRO > part, and the RX part > Runtime (before relocation resolving): RW RX RW; ifunc can't run if > RX is not kept > Runtime (before relocation resolving): RX RW ; some people may be > concered with writable stuff (relocated part) being made executable >Indeed I think weakening in the security aspect may be a problem if we are to merge RELRO into RX. Keeping the old layout would be more preferable IMHO.> > > Another problem is that in the default -z relro -z lazy (-z now not > specified) layout, .got and .got.plt will be separated by potentially huge > code sections (e.g. .text). I'm still thinking what problems this layout > change may bring. >Not sure if this is the same issue as what you mentioned here, but I also see a comment in lld/ELF/Writer.cpp about how .rodata and .eh_frame should be as close to .text as possible due to fear of relocation overflow. If we go with option 2 above, the distance would have to be made larger. With option 1, we may still have some leeway in how to order sections within the merged RELRO segment. Vic> > On Thu, Aug 29, 2019 at 5:42 PM Peter Smith <peter.smith at linaro.org> > wrote: > >> Hello Vic, >> >> I don't have a lot to add myself. I think that majority of the input >> needs to come from the OS stakeholders. My main concern is if it >> requires work on every platform to take advantage or avoid regressions >> then perhaps it is worth adding as an option rather than changing the >> default. >> >> Some questions: >> - Does this need work in every OS for correctness of programs? For >> example you mention that cross-DSO CFI implementation in Android >> needed to be updated, could that also be the case on other platforms? >> - Does this need work in every OS to take advantage of it? For example >> would this need a ld.so change on Linux? >> >> The last time we updated the position of RELRO was in >> https://reviews.llvm.org/D56828 it will be worth going through the >> arguments in there to see if there is anything that triggers any >> thoughts. >> >> Peter >> >> On Thu, 29 Aug 2019 at 09:22, Rui Ueyama <ruiu at google.com> wrote: >> > >> > Hi Vic, >> > >> > I'm in favor of this proposal. Saving that amount of kernel memory by >> changing the memory layout seems like a win. I believe that there are >> programs in the wild that assume some specific segment order, and moving >> the RELRO segment might break some of them, but looks like it's worth the >> risk. >> > >> > On Thu, Aug 29, 2019 at 2:51 PM Vic (Chun-Ju) Yang via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >> >> >> Hey all, >> >> >> >> TL;DR: Moving RELRO segment to be immediately after read-only segment >> so that the dynamic linker has the option to merge the two virtual memory >> areas at run time. >> >> >> >> This is an RFC for moving RELRO segment. Currently, lld orders ELF >> sections in the following order: R, RX, RWX, RW, and RW contains RELRO. At >> run time, after RELRO is write-protected, we'd have VMAs in the order of: >> R, RX, RWX, R (RELRO), RW. I'd like to propose that we move RELRO to be >> immediately after the read-only sections, so that the order of VMAs become: >> R, R (RELRO), RX, RWX, RW, and the dynamic linker would have the option to >> merge the two read-only VMAs to reduce bookkeeping costs. >> >> >> >> While I only tested this proposal on an ARM64 Android platform, the >> same optimization should be applicable to other platforms as well. My test >> showed an overall ~1MB decrease in kernel slab memory usage on >> vm_area_struct, with about 150 processes running. For this to work, I had >> to modify the dynamic linker: >> >> 1. The dynamic linker needs to make the read-only VMA briefly >> writable in order for it to have the same VM flags with the RELRO VMA so >> that they can be merged. Specifically VM_ACCOUNT is set when a VMA is made >> writable. >> >> 2. The cross-DSO CFI implementation in Android dynamic linker >> currently assumes __cfi_check is at a lower address than all CFI targets, >> so CFI check fails when RELRO is moved to below text section. After I added >> support for CFI targets below __cfi_check, I don't see CFI failures anymore. >> >> One drawback that comes with this change is that the number of LOAD >> segments increases by one for DSOs with anything other than those in RELRO >> in its RW LOAD segment. >> >> >> >> This would be a somewhat tedious change (especially the part about >> having to update all the unit tests), but the benefit is pretty good, >> especially considering the kernel slab memory is not swappable/evictable. >> Please let me know your thoughts! >> >> >> >> Thanks, >> >> Vic >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing list >> >> llvm-dev at lists.llvm.org >> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> > > > -- > 宋方睿 >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190829/d16de5b7/attachment.html>