thr3ads.net - llvm dev - [llvm-dev] FENV_ACCESS and floating point LibFunc calls [May 2017]

Tim Northover via llvm-dev

2017-May-12 01:48 UTC

[llvm-dev] FENV_ACCESS and floating point LibFunc calls

On 11 May 2017 at 18:30, Michael Clark via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I note that on your bug that you have stated that the branch is faster than
> the conditional move. Faster code is a side effect of the fix in this
> particular case.

On the contrary: the faster code is pretty much the only reason this
can happen before the rest of the FENV support lands.

It's been said before, but I'll reiterate: LLVM IR does not model the
FENV on its instructions. CodeGen and other passes are free to
de-conditionalize exceptions, remove them, or add spurious ones just
for the giggles. What LLVM does now is not incorrect.

Tim.

Michael Clark via llvm-dev

2017-May-12 01:53 UTC

> On 12 May 2017, at 1:48 PM, Tim Northover <t.p.northover at gmail.com> wrote:
> 
> On 11 May 2017 at 18:30, Michael Clark via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> I note that on your bug that you have stated that the branch is faster than
>> the conditional move. Faster code is a side effect of the fix in this
>> particular case.
> 
> On the contrary: the faster code is pretty much the only reason this
> can happen before the rest of the FENV support lands.
> 
> It's been said before, but I'll reiterate: LLVM IR does not model the
> FENV on its instructions. CodeGen and other passes are free to
> de-conditionalize exceptions, remove them, or add spurious ones just
> for the giggles. What LLVM does now is not incorrect.

OK. So we are in fact lucky that the correct case is actually faster, and that
speculative execution with a conditional move being slower than a branch is a
bug in the predicate lowering.

I'm curious how the select lowering models the cost; I'll dig in once I figure
out where to look in the codebase…

Michael.

Michael Clark via llvm-dev

2017-May-12 02:23 UTC

> On 12 May 2017, at 1:53 PM, Michael Clark <michaeljclark at mac.com> wrote:
> 
> OK. So we are in fact lucky that the correct case is actually faster, and
> it’s a bug in the predicate lowering i.e. speculative execution and
> conditional move being slower than a branch.
> 
> I’m curious how the select lowering models the cost, when I figure out
> where to look in the codebase…

Here are a few data points on the x86 branch predictor.

I have 6 small integer benchmarks that I am using to test a RISC-V to x86 binary
translator, and I was using perf last night to read the performance counters. I
had these stats in my command line history, as I was curious about branch
predictor accuracy. Branch prediction accuracy is above 99% in all of my
experiments. Note that the test programs are compiled by RISC-V GCC. RISC-V has
no conditional moves, and the branch mispredict latency is only 3 cycles on
Rocket, so it's also an architecture that prefers branches over predication. We
translate RISC-V branches to x86 branches and don't use conditional moves in any
of our translations. I believe a correctly predicted branch has just 1 cycle of
latency on x86. Here is the translator: http://rv8.io/

(BTW, the RISC-V interpreter rv-sim seems to be a pathological test case for the
Clang/LLVM optimiser: the Clang/LLVM-compiled interpreter runs at just over half
the speed of the GCC-compiled one. Of course, the translator is not really
affected by the speed of Clang, as we spend most of the time in the JIT code.)


$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-sha512

 Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-sha512':

     2,386,668,826      cycles
     8,226,368,806      instructions              #    3.45  insn per cycle
       556,426,385      branches
         1,120,630      branch-misses             #    0.20% of all branches

       0.766480608 seconds time elapsed

$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-aes

 Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-aes':

     3,390,012,091      cycles
     8,165,055,539      instructions              #    2.41  insn per cycle
       166,612,327      branches
           393,687      branch-misses             #    0.24% of all branches

       0.999783799 seconds time elapsed

$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-primes

 Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-primes':

       585,513,229      cycles
     1,570,274,312      instructions              #    2.68  insn per cycle
       199,550,674      branches
         1,373,005      branch-misses             #    0.69% of all branches

       0.180905897 seconds time elapsed

$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-miniz

 Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-miniz':

     7,181,383,837      cycles
    12,171,106,005      instructions              #    1.69  insn per cycle
     1,309,704,230      branches
        10,246,710      branch-misses             #    0.78% of all branches

       2.120649526 seconds time elapsed

$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-dhrystone

 Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-dhrystone':

     1,705,866,284      cycles
     5,902,622,960      instructions              #    3.46  insn per cycle
       852,430,738      branches
            65,576      branch-misses             #    0.01% of all branches

       0.530201822 seconds time elapsed

$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-qsort

 Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-qsort':

       951,218,523      cycles
     2,060,457,742      instructions              #    2.17  insn per cycle
       432,171,433      branches
         3,844,290      branch-misses             #    0.89% of all branches

       0.288089656 seconds time elapsed
