Jonas Wagner via llvm-dev
2016-Feb-09 14:57 UTC
[llvm-dev] Adding support for self-modifying branches to LLVM?
Hi,

I'm coming back to this old thread with data about the performance of NOPs. To recap, I was considering transforming NOP instructions into branches and back, in order to enable code dynamically. One use case for this was enabling/disabling individual sanitizer checks (ASan, UBSan) on demand.

I wrote a pass which takes an ASan-instrumented program and replaces each ASan check with an llvm.experimental.patchpoint intrinsic. This intrinsic inserts a NOP of configurable size. It otherwise has no effect on the program semantics. It does prevent some optimizations, presumably because instructions cannot be moved across the patchpoint.

Some results:
- On SPEC, patchpoints introduce an overhead of ~25% compared to a version where ASan checks are removed.
- This is almost half of the cost of the checks themselves.
- The results are similar for NOPs of size 1 and 5 bytes.
- Interestingly, the results are similar for NOPs of 0 bytes, too. These are patchpoints that don't insert any code and only inhibit optimizations. I've only tested this on one benchmark, though.

To summarize, only part of the cost of NOPs is due to executing them. Their effect on optimizations is significant, too. I guess this would hold for branches and sanitizer checks as well.

Best,
Jonas

On Thu, Jan 21, 2016 at 11:52 PM Jonas Wagner <jonas.wagner at epfl.ch> wrote:
> Hello,
>
> There is some data on this, e.g., in “High System-Code Security with Low
> Overhead” <http://dslab.epfl.ch/proj/asap/#publications>. In this work we
> found that, for ASan as well as other instrumentation tools, most overhead
> comes from the checks. Especially for CPU-intensive applications, the cost
> of maintaining shadow memory is small.
>
> How did you measure this? If it was measured by removing the checks before
> optimization happens, then what you may have been measuring is not the
> execution overhead of the branches (which is what would be eliminated by
> nop’ing them out) but the effect on the optimizer.
>
> Interesting. Indeed this was measured by removing some checks and then
> re-optimizing the program.
>
> I’m aware of some impact checks may have on optimization. For example,
> I’ve seen cases where much less inlining happens because functions with
> checks are larger. Do you know other concrete examples? This is definitely
> something I’ll have to be careful about. Philip Reames confirms this, too.
>
> On the other hand, we’ve also found that the benefit from removing a check
> is roughly proportional to the number of cycles spent executing that
> check’s instructions. Our model of this is not very precise, but it shows
> that the cost of executing the check’s instructions matters.
>
> I'll try to measure this, and will come back when I have data.
>
> Best,
> Jonas
Philip Reames via llvm-dev
2016-Feb-09 16:07 UTC
[llvm-dev] Adding support for self-modifying branches to LLVM?
On 02/09/2016 06:57 AM, Jonas Wagner wrote:
> Hi,
>
> I'm coming back to this old thread with data about the performance of
> NOPs. Recalling that I was considering transforming NOP instructions
> into branches and back, in order to dynamically enable code. One use
> case for this was enabling/disabling individual sanitizer checks
> (ASan, UBSan) on demand.
>
> I wrote a pass which takes an ASan-instrumented program, and replaces
> each ASan check with an llvm.experimental.patchpoint intrinsic. This
> intrinsic inserts a NOP of configurable size. It has otherwise no
> effect on the program semantics. It does prevent some optimizations,
> presumably because instructions cannot be moved across the patchpoint.
>
> Some results:
> - On SPEC, patchpoints introduce an overhead of ~25% compared to a
>   version where ASan checks are removed.
> - This is almost half of the cost of the checks themselves.
> - The results are similar for NOPs of size 1 and 5 bytes.
> - Interestingly, the results are similar for NOPs of 0 bytes, too.
>   These are patchpoints that don't insert any code and only inhibit
>   optimizations. I've only tested this on one benchmark, though.
>
> To summarize, only part of the cost of NOPs is due to executing them.
> Their effect on optimizations is significant, too. I guess this would
> hold for branches and sanitizer checks as well.

I don't think you can really draw strong conclusions from the experiments you described. What you've ended up measuring is essentially the impact of not optimizing across patchpoints at the check locations. This doesn't really tell you much about what a check (which is likely to inhibit optimization much less) costs over a nop at the same position.

One bit of data you could extract from the experiment as constructed would be the relative cost of extra nops. You do mention that the results are similar for sizes 1-5 bytes, but "similar" is very vague in this context. Are the results statistically indistinguishable? Or is there a noticeable but small slowdown? (Numbers would be great here.)
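One way to put a number on "statistically indistinguishable" is a permutation test on repeated runtimes of two builds. The following is only a minimal sketch and not something used in this thread: the function name `permutation_p_value` is hypothetical, and the runtimes are made-up placeholders that would need to be replaced with real measurements.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, rounds=10_000, seed=0):
    """Two-sided permutation test on the difference of mean runtimes.

    Returns the fraction of random relabelings whose mean difference is
    at least as large as the observed one (with a +1 correction so the
    estimate is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:len(a)]) - _mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (rounds + 1)


# Placeholder runtimes in seconds (NOT measurements from this thread).
runs_nop_1_byte = [10.02, 10.05, 9.98, 10.01, 10.04]
runs_nop_5_byte = [10.03, 10.07, 10.00, 10.06, 10.02]

p = permutation_p_value(runs_nop_1_byte, runs_nop_5_byte)
print(f"p-value for 1-byte vs 5-byte NOP builds: {p:.3f}")
```

A large p-value would mean the runs cannot be told apart at this sample size; a small one would suggest a real, if possibly small, slowdown. More runs per configuration make the test more sensitive.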
Sean Silva via llvm-dev
2016-Feb-09 22:22 UTC
[llvm-dev] Adding support for self-modifying branches to LLVM?
On Tue, Feb 9, 2016 at 8:07 AM, Philip Reames <listmail at philipreames.com> wrote:
> I don't think you can really draw strong conclusions from the experiments
> you described. What you've ended up measuring is nearly the impact of not
> optimizing over patchpoints at the check locations. This doesn't really
> tell you much about what a check (which is likely to inhibit optimization
> much less) costs over a nop at the same position.
>
> One bit of data you could extract from the experiment as constructed would
> be the relative cost of extra nops. You do mention that the results are
> similar for sizes 1-5 bytes, but similar is very vague in this context.
> Are the results statistically indistinguishable? Or is there a noticeable
> but small slowdown that results? (Numbers would be great here.)

In this same vein, try inserting 1, 2, 3, 4, 5, 6, ... nops and measure the performance impact (the total size of the nops is also interesting but is more difficult to measure reliably).

I've used this kind of technique successfully in the past, e.g. for measuring the cost of "stat" syscalls on Windows. I call the technique "stuffing". Basically, make a plot of the performance degradation as you insert more and more redundant stuff (e.g. 1 nop, 2 nops, 3 nops, etc.). If the result is a strong linear trend, then you can pretty confidently extrapolate backward to the "0 nop" case to see the overhead of inserting 1 nop.

-- Sean Silva
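The "stuffing" extrapolation can be sketched as an ordinary least-squares line fit. This is only an illustration under assumptions: `fit_line` is a hypothetical helper, and the runtimes below are made-up placeholders, not results from this thread. The slope estimates the marginal cost of one additional nop, the intercept estimates the "0 nop" baseline, and R² indicates how strongly linear the trend is.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # If all y values are equal the fit is trivially exact.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared


# Placeholder data: nops inserted per check site vs. runtime in seconds.
# These numbers are invented for illustration only.
nops_per_site = [1, 2, 3, 4, 5, 6]
runtime_s = [10.4, 10.9, 11.3, 11.8, 12.3, 12.7]

slope, intercept, r2 = fit_line(nops_per_site, runtime_s)
print(f"marginal cost per nop: {slope:.3f} s")
print(f"extrapolated 0-nop baseline: {intercept:.3f} s")
print(f"R^2: {r2:.3f}")
```

If R² is well below 1, the trend is not linear and the extrapolation should not be trusted; comparing the extrapolated baseline against a measured build without nops is a useful sanity check.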