Renato Golin via llvm-dev
2015-Sep-25 08:19 UTC
[llvm-dev] Dynamic VMA in Sanitizers for AArch64
Hi folks,

After long talks with lots of people, I think we have a winning strategy to deal with the variable nature of the VMA address on AArch64. It seems that the best way forward is to try the dynamic calculation at runtime, evaluate the performance, and then, only if the hit is too great, think about compile-time alternatives. I'd like to know if everyone is in agreement, so we can get cracking.

The Issues

If you're not familiar with the problem, here's a quick run-down...

On most systems, the VMA address (and thus the shadow mask and shift value) is a constant. This produces very efficient code, as the shadow address computation becomes an immediate shift plus a constant mask. But AArch64 is different.

In order to execute 32-bit code, the kernel has to use 4k pages, and that is currently configured with either a 39- or 48-bit VMA. For 64-bit-only execution, 64k pages are used, with either a 42- or 48-bit VMA. In theory, the kernel could use even more bits and different page sizes; systems are free to choose, and different systems have already chosen different values.

What this means is that the VMA value can change depending on the kernel, so cross-compiling for testing on multiple systems will not work unless the true value is computed at runtime. But it also means that the value has to be stored in a global, which requires additional loads and register shifts per instrumentation point, and can slow down execution even further.

The Current Status

Right now, in order to test it, we made it a compiler-build option. As you build Clang/LLVM, you can use a CMake option to set the VMA, with 39 being the default. We have 39 and 42 buildbots to make sure everything works, but that's clearly the wrong solution for anything other than enablement.

With all the sanitizers going in for AArch64, we can now focus on making a good implementation for the VMA issue, in a way that benefits both LLVM and GCC, since they have different usage models (static vs dynamic linkage).
With the build-time option making the value static, we have the best performance we could ever have. That means any further change will impact performance, but the changes are necessary, so we just need to take the lowest-cost / highest-benefit option.

The Options

The two options we have are:

1. Dynamic VMA: instrument main() to read the VMA value and set a global. Instrument each function to load that value into a local register, and instrument each load/store/malloc/free to compute the shadow address from that register. This can be optimised by the compiler for compiler-instrumented code, but not for the library calls.

2. Add a compiler option -mvma=NN that chooses the VMA at compile time and makes it static in the user code. This has the same performance as the current scheme for compiler-instrumented code, but not for library calls, especially for the dynamically linked version. This is faster, but also less flexible than option 1, though more flexible than the current implementation.

The Plan

Right now, we plan to implement the full dynamic VMA and investigate the performance impact. If it is within an acceptable range, we just go along with it, and consider the compile-time flag at a later time, as a further optimisation.

If the impact is too great, we may want to profile and implement -mvma straight after the dynamic VMA checks. If that's the case, we should keep *both* implementations, so that users can choose what suits them best.

Either way, I'd like to get everybody's opinion to make sure I'm not forgetting anything before we start cracking the problem into an acceptable solution.

cheers,
--renato
Kristof Beyls via llvm-dev
2015-Sep-25 08:30 UTC
[llvm-dev] Dynamic VMA in Sanitizers for AArch64
Thanks for writing this up, Renato. What you describe below has been the option I've preferred for a while, so it looks like a good approach to me.

I just wanted to note that on AArch64, having the shadow offset in a register rather than as an immediate could result in faster execution rather than slower, as the shadow address can then be computed in a single instruction rather than two. Assuming x(SHADOW_OFFSET) is the register containing the shadow offset:

  add x8, x(SHADOW_OFFSET), x0, lsr #3

instead of

  lsr x8, x0, #3
  orr x8, x8, #0x1000000000

But as you say, the overall performance effect of dynamic VMA support will need to be measured.

Thanks,

Kristof

> -----Original Message-----
> From: Renato Golin [mailto:renato.golin at linaro.org]
> Sent: 25 September 2015 09:20
> To: Kostya Serebryany; Evgenii Stepanov; Kristof Beyls; James Molloy;
> Adhemerval Zanella; Saleem Abdulrasool; Christophe Lyon
> Cc: Jakub Jelinek; Ramana Radhakrishnan; Will Deacon; LLVM Dev
> Subject: Dynamic VMA in Sanitizers for AArch64
Jakub Jelinek via llvm-dev
2015-Sep-25 08:53 UTC
[llvm-dev] Dynamic VMA in Sanitizers for AArch64
On Fri, Sep 25, 2015 at 01:19:48AM -0700, Renato Golin wrote:
> After long talks with lots of people, I think we have a winning
> strategy to deal with the variable nature of VMA address in AArch64.
> It seems that the best way forward is to try the dynamic calculation
> at runtime, evaluate the performance, and then, only if the hit is too
> great, think about compile-time alternatives. I'd like to know if
> everyone is in agreement, so we could get cracking.

You mean you want a dynamic shadow offset on aarch64, as opposed to the fixed kAArch64_ShadowOffset64 (1UL << 36) one? I think this is completely unnecessary; all you need to change is the libsanitizer internals, IMNSHO. All that is needed is to make runtime decisions during libasan initialization about the memory layout, and also (because a 39-bit VMA is too slow) a dynamic decision whether to use the 32-bit or 64-bit allocator. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64435 for details.

Jakub
Yury Gribov via llvm-dev
2015-Sep-25 10:27 UTC
[llvm-dev] Dynamic VMA in Sanitizers for AArch64
On 09/25/2015 11:53 AM, Jakub Jelinek via llvm-dev wrote:
> You mean you want a dynamic shadow offset on aarch64 as opposed to
> fixed kAArch64_ShadowOffset64 (1UL << 36) one?
> I think this is completely unnecessary, all you need to change is
> libsanitizer internals IMNSHO, all that is needed is to make runtime
> decisions during libasan initialization on the memory layout, and
> also (because 39 bit VMA is too slow), dynamic decision whether to use
> 32-bit or 64-bit allocator. See
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64435
> for details.

Added kcc.

FYI, an optional dynamic offset would also help in the not-so-rare situations where ASan's shadow range is stolen by early constructors in unsanitized libraries.

-Y