Philip Reames via llvm-dev
2016-Jan-21 21:33 UTC
[llvm-dev] Adding support for self-modifying branches to LLVM?
On 01/19/2016 09:04 PM, Sean Silva via llvm-dev wrote:
>
> AFAIK, the cost of a well-predicted, not-taken branch is the same as a nop on every x86 made in the last many years. See http://www.agner.org/optimize/instruction_tables.pdf
> Generally speaking, a correctly-predicted not-taken branch is basically identical to a nop, and a correctly-predicted taken branch has an extra overhead similar to an "add" or other extremely cheap operation.

Specifically on this point only: while absolutely true for most micro-benchmarks, this is less true at large scale. I've definitely seen the removal of a highly predictable branch (in many, many places, some of which are hot) improve performance in the 5-10% range. For instance, removing highly predictable branches is the primary motivation of implicit null checking (http://llvm.org/docs/FaultMaps.html). Where exactly the performance improvement comes from is hard to say, but, empirically, it does matter.

(Caveat to the above: I have not run an experiment that actually put in the same number of bytes of nops. It's possible the entire benefit I mentioned is code-size related, but I doubt it given how many ticks a sample profiler will show on said branches.)

p.s. Sean mentions down-thread that most of the slowdown from checks is in the effect on the optimizer, not the direct impact of the instructions emitted. This is absolutely our experience as well. I don't intend for anything I said above to imply otherwise.

Philip
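To make the implicit null checking idea concrete, here is a minimal C sketch contrasting the two shapes. It assumes a POSIX system where dereferencing NULL raises SIGSEGV (e.g. Linux); `load_explicit`, `load_implicit`, `install_fault_handler`, and `fault_recovery` are hypothetical names, and the sigsetjmp/siglongjmp recovery stands in for the PC-keyed fault map a real runtime would consult. It illustrates the control flow only, not the speedup: sigsetjmp here costs far more than the branch it replaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Explicit null check: a compare-and-branch on every access. */
static int load_explicit(const int *p, int fallback) {
    if (p == NULL)
        return fallback;
    return *p;
}

/* Single recovery point for this sketch; a real runtime would instead
   look up the faulting PC in a fault map and redirect to its handler. */
static sigjmp_buf fault_recovery;

static void on_fault(int sig) {
    (void)sig;
    siglongjmp(fault_recovery, 1);
}

static void install_fault_handler(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, NULL);
}

/* Implicit null check: no compare on p. A NULL p makes the load fault,
   and the handler resumes execution at the sigsetjmp point. */
static int load_implicit(const int *p, int fallback) {
    /* Volatile copy keeps the compiler from proving q is NULL and
       rewriting the load into a trap instruction. */
    const int *volatile q = p;
    if (sigsetjmp(fault_recovery, 1) != 0)
        return fallback;
    return *q;
}
```

In a real implementation the hot path carries only the load; the null-handling code runs solely when the load actually faults.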
Sean Silva via llvm-dev
2016-Jan-21 21:51 UTC
[llvm-dev] Adding support for self-modifying branches to LLVM?
On Thu, Jan 21, 2016 at 1:33 PM, Philip Reames <listmail at philipreames.com> wrote:
>
> On 01/19/2016 09:04 PM, Sean Silva via llvm-dev wrote:
>
> > AFAIK, the cost of a well-predicted, not-taken branch is the same as a nop on every x86 made in the last many years. See http://www.agner.org/optimize/instruction_tables.pdf
> > Generally speaking, a correctly-predicted not-taken branch is basically identical to a nop, and a correctly-predicted taken branch has an extra overhead similar to an "add" or other extremely cheap operation.
>
> Specifically on this point only: while absolutely true for most micro-benchmarks, this is less true at large scale. I've definitely seen the removal of a highly predictable branch (in many, many places, some of which are hot) improve performance in the 5-10% range. For instance, removing highly predictable branches is the primary motivation of implicit null checking (http://llvm.org/docs/FaultMaps.html). Where exactly the performance improvement comes from is hard to say, but, empirically, it does matter.
>
> (Caveat to the above: I have not run an experiment that actually put in the same number of bytes of nops. It's possible the entire benefit I mentioned is code-size related, but I doubt it given how many ticks a sample profiler will show on said branches.)

Interesting. Another possible explanation is that these extra branches cause contention on branch-prediction resources. In the past, when I talked with Dan about WebAssembly sandboxing, IIRC he said that they found about 15% overhead, due primarily to branch-prediction resource contention. In fact, I think they had a pretty clear idea of wanting a new instruction which is just a "statically predict never taken and don't use any branch-prediction resources" branch (this is on x86 IIRC; some arches actually have such an instruction!).

-- Sean Silva
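Absent such an instruction, the closest software-level knob is a compiler hint. A minimal sketch, assuming GCC or Clang (`__builtin_expect` and the `cold`/`noinline` attributes are compiler extensions; `UNLIKELY`, `checked_get`, and `on_bounds_failure` are hypothetical names). The hint steers block layout so the check falls through on the hot path, but on x86 the emitted check is typically still an ordinary compare plus conditional jump, so it keeps consuming branch-prediction resources, which is exactly what the wished-for instruction would avoid.

```c
#include <stddef.h>

/* GCC/Clang extension: tell the compiler cond is expected to be false. */
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)

/* Hypothetical slow path, kept out of line and marked cold so the
   hot path stays a straight fall-through. */
static __attribute__((cold, noinline)) long on_bounds_failure(size_t i, size_t n) {
    (void)i;
    (void)n;
    return -1;
}

/* The check still compiles to a compare and a conditional jump; the hint
   only influences which side is laid out as the fall-through. */
static long checked_get(const long *a, size_t n, size_t i) {
    if (UNLIKELY(i >= n))
        return on_bounds_failure(i, n);
    return a[i];
}
```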
Philip Reames via llvm-dev
2016-Jan-21 22:09 UTC
[llvm-dev] Adding support for self-modifying branches to LLVM?
On 01/21/2016 01:51 PM, Sean Silva wrote:
>
> On Thu, Jan 21, 2016 at 1:33 PM, Philip Reames <listmail at philipreames.com> wrote:
>>
>> On 01/19/2016 09:04 PM, Sean Silva via llvm-dev wrote:
>>>
>>> AFAIK, the cost of a well-predicted, not-taken branch is the same as a nop on every x86 made in the last many years. See http://www.agner.org/optimize/instruction_tables.pdf
>>> Generally speaking, a correctly-predicted not-taken branch is basically identical to a nop, and a correctly-predicted taken branch has an extra overhead similar to an "add" or other extremely cheap operation.
>>
>> Specifically on this point only: while absolutely true for most micro-benchmarks, this is less true at large scale. I've definitely seen the removal of a highly predictable branch (in many, many places, some of which are hot) improve performance in the 5-10% range. For instance, removing highly predictable branches is the primary motivation of implicit null checking (http://llvm.org/docs/FaultMaps.html). Where exactly the performance improvement comes from is hard to say, but, empirically, it does matter.
>>
>> (Caveat to the above: I have not run an experiment that actually put in the same number of bytes of nops. It's possible the entire benefit I mentioned is code-size related, but I doubt it given how many ticks a sample profiler will show on said branches.)
>
> Interesting. Another possible explanation is that these extra branches cause contention on branch-prediction resources.

I've heard and proposed this explanation in the past as well, but I've never heard of anyone being able to categorically answer the question. The other explanation I've considered is that the processor has a finite speculation depth (i.e., how many predicted branches can be in flight at once), and the extra branches keep the processor from speculating past "interesting" branches because the in-flight slots are full of uninteresting ones. However, my hardware friends tell me this is a somewhat questionable explanation, since the check branches should be easy to resolve and retire quickly.

> In the past, when I talked with Dan about WebAssembly sandboxing, IIRC he said that they found about 15% overhead, due primarily to branch-prediction resource contention.

15% seems a bit high to me, but unfortunately I don't have anything concrete to share here.

> In fact, I think they had a pretty clear idea of wanting a new instruction which is just a "statically predict never taken and don't use any branch-prediction resources" branch (this is on x86 IIRC; some arches actually have such an instruction!).

This has been on my wish list for a while. It would make many things so much easier. The sickly amusing bit is that x86 has two different forms of this, neither of which actually works:

1) There are prefixes for branches which are supposed to control the predicted direction. My understanding is that code which tried using them was so often wrong that modern chips treat them as nop padding. We actually use this to produce near-arbitrary-length nops. :)

2) x86 (but not x86-64) had an "into" instruction which triggered an interrupt if the overflow flag was set. (Hey, signal handlers are just weird branches, right? :p) However, this does not work as designed in x86-64. My understanding is that the original AMD implementation had a bug in this instruction and the bug essentially got written into the spec for all future chips. :(

Philip
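For the first point, the trick is visible in padding: 0x2E, the prefix that means "branch not taken" on a conditional jump, is also the CS segment-override prefix, which has no effect on a NOP. A minimal C sketch, assuming the multi-byte NOP forms recommended in Intel's SDM for lengths 1 to 9, the 10-byte `nopw %cs:0x0(%rax,%rax,1)` form (66 2E 0F 1F 84 ...) commonly seen in compiler output, and redundant 0x66 prefixes up to the 15-byte instruction limit. `emit_nop` and `kNops` are hypothetical names, not LLVM's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Multi-byte NOPs for lengths 1..10. Lengths 1..9 follow the SDM's
   recommended forms; the 10-byte entry adds a 0x2E (CS override)
   prefix: nopw %cs:0x0(%rax,%rax,1). Short rows are zero-filled. */
static const unsigned char kNops[10][10] = {
    {0x90},
    {0x66, 0x90},
    {0x0F, 0x1F, 0x00},
    {0x0F, 0x1F, 0x40, 0x00},
    {0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
    {0x66, 0x2E, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

/* Hypothetical helper: write one NOP instruction of exactly len bytes
   (1..15, the x86 maximum instruction length) into out. Lengths above
   10 prepend redundant 0x66 prefixes. Returns len, or 0 if out of range. */
static size_t emit_nop(unsigned char *out, size_t len) {
    if (len == 0 || len > 15)
        return 0;
    size_t base = len > 10 ? 10 : len;
    size_t extra = len - base;
    memset(out, 0x66, extra);
    memcpy(out + extra, kNops[base - 1], base);
    return len;
}
```

How efficiently long prefix runs decode varies by microarchitecture, so a real assembler picks its maximum NOP length per target.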