thr3ads.net - llvm dev - [llvm-dev] Adding support for self-modifying branches to LLVM? [Jan 2016]

Philip Reames via llvm-dev

2016-Jan-21 21:33 UTC

[llvm-dev] Adding support for self-modifying branches to LLVM?

On 01/19/2016 09:04 PM, Sean Silva via llvm-dev wrote:
> AFAIK, the cost of a well-predicted, not-taken branch is the same as a
> nop on every x86 made in the last many years.
> See http://www.agner.org/optimize/instruction_tables.pdf
> Generally speaking a correctly-predicted not-taken branch is basically
> identical to a nop, and a correctly-predicted taken branch has an
> extra overhead similar to an "add" or other extremely cheap operation.

Specifically on this point only: While absolutely true for most
micro-benchmarks, this is less true at large scale.  I've definitely
seen removing a highly predictable branch (in many, many places, some of
which are hot) to benefit performance in the 5-10% range. For instance,
removing highly predictable branches is the primary motivation of
implicit null checking (http://llvm.org/docs/FaultMaps.html).  Where
exactly the performance improvement comes from is hard to say, but,
empirically, it does matter.

(Caveat to above: I have not run an experiment that actually put in the 
same number of bytes in nops.  It's possible the entire benefit I 
mentioned is code size related, but I doubt it given how many ticks a 
sample profiler will show on said branches.)
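
For readers unfamiliar with implicit null checking, here is a rough C sketch of the two code shapes being compared. It is an illustration only, not code from LLVM: `struct object`, `load_field_explicit`, and `load_field_implicit` are hypothetical names, and the real mechanism lives in generated machine code plus a runtime fault handler, which plain C cannot express.

```c
#include <stddef.h>

/* Hypothetical object layout, used only for illustration. */
struct object {
    int field;
};

/* Explicit form: a compare-and-branch guards the dereference.
 * The branch is almost never taken, so it predicts well, but it is
 * still an instruction the processor has to fetch and track. */
int load_field_explicit(const struct object *obj, int on_null) {
    if (obj == NULL) {
        return on_null;   /* cold path: the null handler */
    }
    return obj->field;    /* hot path */
}

/* Implicit form (conceptual only): no branch is emitted. In the scheme
 * described in the FaultMaps document, the load itself faults on NULL
 * and the runtime uses a side table to redirect execution to the cold
 * path. In plain C, passing NULL here is undefined behavior, so this
 * sketch must only be called with valid pointers. */
int load_field_implicit(const struct object *obj) {
    return obj->field;
}
```

Calling `load_field_explicit(NULL, -1)` returns the fallback value, while the implicit form has no safe equivalent in C without the fault-handling runtime.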

p.s. Sean mentions down-thread that most of the slowdown from checks is 
in the effect on the optimizer, not the direct impact of the 
instructions emitted.  This is absolutely our experience as well.  I 
don't intend for anything I said above to imply otherwise.

Philip

Sean Silva via llvm-dev

2016-Jan-21 21:51 UTC

On Thu, Jan 21, 2016 at 1:33 PM, Philip Reames <listmail at philipreames.com>
wrote:
>
> On 01/19/2016 09:04 PM, Sean Silva via llvm-dev wrote:
>
> > AFAIK, the cost of a well-predicted, not-taken branch is the same as a nop
> > on every x86 made in the last many years. See
> > http://www.agner.org/optimize/instruction_tables.pdf
> > Generally speaking a correctly-predicted not-taken branch is basically
> > identical to a nop, and a correctly-predicted taken branch has an extra
> > overhead similar to an "add" or other extremely cheap operation.
>
> Specifically on this point only: While absolutely true for most
> micro-benchmarks, this is less true at large scale.  I've definitely seen
> removing a highly predictable branch (in many, many places, some of which
> are hot) to benefit performance in the 5-10% range.  For instance, removing
> highly predictable branches is the primary motivation of implicit null
> checking (http://llvm.org/docs/FaultMaps.html).  Where exactly the
> performance improvement comes from is hard to say, but, empirically, it
> does matter.
>
> (Caveat to above: I have not run an experiment that actually put in the
> same number of bytes in nops.  It's possible the entire benefit I mentioned
> is code size related, but I doubt it given how many ticks a sample profiler
> will show on said branches.)
>
Interesting. Another possible explanation is that these extra branches
cause contention on branch-prediction resources. In the past when talking
with Dan about WebAssembly sandboxing, IIRC he said that they found about
15% overhead, due primarily to branch-prediction resource contention. In
fact I think they had a pretty clear idea of wanting a new instruction
which is just a "statically predict never taken and don't use any
branch-prediction resources" branch (this is on x86 IIRC; some arches
actually obviously have such an instruction!).

-- Sean Silva



>
> p.s. Sean mentions down-thread that most of the slowdown from checks is in
> the effect on the optimizer, not the direct impact of the instructions
> emitted.  This is absolutely our experience as well.  I don't intend for
> anything I said above to imply otherwise.
>
> Philip

Philip Reames via llvm-dev

2016-Jan-21 22:09 UTC

On 01/21/2016 01:51 PM, Sean Silva wrote:
>
> On Thu, Jan 21, 2016 at 1:33 PM, Philip Reames
> <listmail at philipreames.com> wrote:
>
>     On 01/19/2016 09:04 PM, Sean Silva via llvm-dev wrote:
>>
>>     AFAIK, the cost of a well-predicted, not-taken branch is the same
>>     as a nop on every x86 made in the last many years. See
>>     http://www.agner.org/optimize/instruction_tables.pdf
>>     Generally speaking a correctly-predicted not-taken branch is
>>     basically identical to a nop, and a correctly-predicted taken
>>     branch has an extra overhead similar to an "add" or other
>>     extremely cheap operation.
>     Specifically on this point only: While absolutely true for most
>     micro-benchmarks, this is less true at large scale.  I've
>     definitely seen removing a highly predictable branch (in many,
>     many places, some of which are hot) to benefit performance in the
>     5-10% range.  For instance, removing highly predictable branches
>     is the primary motivation of implicit null checking. 
>     (http://llvm.org/docs/FaultMaps.html). Where exactly the
>     performance improvement comes from is hard to say, but,
>     empirically, it does matter.
>
>     (Caveat to above: I have not run an experiment that actually put
>     in the same number of bytes in nops.  It's possible the entire
>     benefit I mentioned is code size related, but I doubt it given how
>     many ticks a sample profiler will show on said branches.)
>
>
> Interesting. Another possible explanation is that these extra branches
> cause contention on branch-prediction resources.

I've heard and proposed this explanation in the past as well, but I've
never heard of anyone able to categorically answer the question.

The other explanation I've considered is that the processor has a finite
speculation depth (i.e. how many predicted branches can be in flight), and
the extra branches keep the processor from speculating on "interesting"
branches because the available depth is filled with uninteresting ones.
However, my hardware friends tell me this is a somewhat questionable
explanation since the check branches should be easy to satisfy and
retire quickly.

> In the past when talking with Dan about WebAssembly sandboxing, IIRC
> he said that they found about 15% overhead, due primarily to
> branch-prediction resource contention.

15% seems a bit high to me, but I don't have anything concrete to share
here unfortunately.

> In fact I think they had a pretty clear idea of wanting a new
> instruction which is just a "statically predict never taken and don't
> use any branch-prediction resources" branch (this is on x86 IIRC; some
> arches actually obviously have such an instruction!).

This has been on my wish list for a while.  It would make many things so
much easier.

The sickly amusing bit is that x86 has two different forms of this,
neither of which actually works:
1) There are prefixes for branches which are supposed to control the
prediction direction.  My understanding is that code which tried using
them was so often wrong that modern chips interpret them as nop
padding.  We actually use this to produce near arbitrary length nops.  :)
2) x86 (but not x86-64) had an "into" instruction which triggered an
interrupt if the overflow flag is set.  (Hey, signal handlers are just
weird branches right? :p)  However, this does not work as designed in
x86-64.  My understanding is that the original AMD implementation had a
bug in this instruction and the bug essentially got written into the
spec for all future chips.  :(
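
As a point of comparison, the closest thing available from source code today is a compiler hint rather than a hardware one. The sketch below assumes GCC or Clang, whose `__builtin_expect` extension tells the compiler which way a branch usually goes. It typically influences code layout (the cold path is moved out of the fall-through sequence) and does not stop the hardware predictor from tracking the branch, so it is not the instruction wished for above. `UNLIKELY` and `sum_checked` are hypothetical names chosen for this example.

```c
#include <stddef.h>

/* UNLIKELY is a hypothetical helper macro, not something from the thread.
 * __builtin_expect is a GCC/Clang extension; on other compilers the macro
 * falls back to the plain condition. */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#define UNLIKELY(cond) (cond)
#endif

/* Sums the array, returning -1 as soon as a negative value is found.
 * The hint only tells the compiler the check rarely fires; the branch
 * itself is still emitted and still predicted by the hardware. */
long sum_checked(const int *values, size_t count) {
    long total = 0;
    for (size_t i = 0; i < count; ++i) {
        if (UNLIKELY(values[i] < 0)) {
            return -1;          /* cold path */
        }
        total += values[i];     /* hot path */
    }
    return total;
}
```

The hint changes nothing about the result of the function; it is purely advice to the compiler about which path to favor.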

Philip