Tim Northover via llvm-dev
2017-May-12 01:48 UTC
[llvm-dev] FENV_ACCESS and floating point LibFunc calls
On 11 May 2017 at 18:30, Michael Clark via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I note that on your bug that you have stated that the branch is faster than
> the conditional move. Faster code is a side effect of the fix in this
> particular case.

On the contrary: the faster code is pretty much the only reason this
can happen before the rest of the FENV support lands.

It's been said before, but I'll reiterate: LLVM IR does not model the
FENV on its instructions. CodeGen and other passes are free to
de-conditionalize exceptions, remove them, or add spurious ones just
for the giggles. What LLVM does now is not incorrect.

Tim.
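As a rough illustration of what "de-conditionalize exceptions" can mean, here is a minimal C sketch. The function names are hypothetical and the guarded division is an assumed example, not the code from the bug under discussion. Both functions return the same values; the point is that a compiler which does not model the floating-point environment may evaluate the division unconditionally and then select the result, so a floating-point exception flag can be raised even when the guard is false.

```c
/* Hypothetical helpers for illustration; neither comes from the thread. */

/* Source-level "select" form: guard and division share one expression. */
double guarded_div_select(double a, double b)
{
    /* A compiler that ignores the FP environment may compute a / b
     * unconditionally and pick the result with a conditional move.
     * For b == 0.0 that sets an exception flag, yet 0.0 is returned. */
    return (b != 0.0) ? a / b : 0.0;
}

/* Source-level "branch" form: the division sits on the guarded path only. */
double guarded_div_branch(double a, double b)
{
    /* Nothing stops an optimiser from turning this into the select form,
     * or the select form into this one; the returned values are equal. */
    if (b != 0.0)
        return a / b;
    return 0.0;
}
```

To observe the difference, one could call feclearexcept(FE_ALL_EXCEPT) from <fenv.h> before either function and fetestexcept(FE_DIVBYZERO) afterwards. Whether the flag is set depends on the compiler, the target, and the optimisation level, which is exactly the behaviour described above.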
Michael Clark via llvm-dev
2017-May-12 01:53 UTC
[llvm-dev] FENV_ACCESS and floating point LibFunc calls
> On 12 May 2017, at 1:48 PM, Tim Northover <t.p.northover at gmail.com> wrote:
>
> On 11 May 2017 at 18:30, Michael Clark via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> I note that on your bug that you have stated that the branch is faster than
>> the conditional move. Faster code is a side effect of the fix in this
>> particular case.
>
> On the contrary: the faster code is pretty much the only reason this
> can happen before the rest of the FENV support lands.
>
> It's been said before, but I'll reiterate: LLVM IR does not model the
> FENV on its instructions. CodeGen and other passes are free to
> de-conditionalize exceptions, remove them, or add spurious ones just
> for the giggles. What LLVM does now is not incorrect.

OK. So we are in fact lucky that the correct case is also the faster one, and the bug is in the predicate lowering, i.e. speculative execution plus a conditional move being slower than a branch.

I’m curious how the select lowering models the cost, once I figure out where to look in the codebase…

Michael.
Michael Clark via llvm-dev
2017-May-12 02:23 UTC
[llvm-dev] FENV_ACCESS and floating point LibFunc calls
> On 12 May 2017, at 1:53 PM, Michael Clark <michaeljclark at mac.com> wrote:
>
>
>> On 12 May 2017, at 1:48 PM, Tim Northover <t.p.northover at gmail.com> wrote:
>>
>> On 11 May 2017 at 18:30, Michael Clark via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> I note that on your bug that you have stated that the branch is faster than
>>> the conditional move. Faster code is a side effect of the fix in this
>>> particular case.
>>
>> On the contrary: the faster code is pretty much the only reason this
>> can happen before the rest of the FENV support lands.
>>
>> It's been said before, but I'll reiterate: LLVM IR does not model the
>> FENV on its instructions. CodeGen and other passes are free to
>> de-conditionalize exceptions, remove them, or add spurious ones just
>> for the giggles. What LLVM does now is not incorrect.
>
> OK. So we are in fact lucky that the correct case is actually faster, and it’s a bug in the predicate lowering i.e. speculative execution and conditional move being slower than a branch.
>
> I’m curious how the select lowering models the cost, when I figure out where to look in the codebase…

Here are a few data points on the x86 branch predictor. I have 6 small integer benchmarks that I am using to test a RISC-V to x86 binary translator, and I was using perf last night to read the performance counters. I had these stats in my command line history as I was curious about branch predictor accuracy. Branch prediction accuracy is > 99% in all of my experiments.

Note that the test programs are compiled by RISC-V GCC. RISC-V has no conditional moves and the branch mis-predict latency is only 3 cycles on Rocket, so it’s also an architecture that prefers branches over predication. We are translating RISC-V branches to x86 branches. We don’t use conditional moves in any of our translations. I believe a predicted branch is just 1 cycle latency on x86.
Here is the translator: http://rv8.io/

(BTW: the RISC-V interpreter rv-sim seems to be a pathological test case for the Clang/LLVM optimiser, with the Clang/LLVM code running at just over half the speed of the GCC generated code. Of course, the translator is not really affected by the speed of Clang, as we spend most of the time in the JIT code.)

$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-sha512

 Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-sha512':

     2,386,668,826      cycles
     8,226,368,806      instructions              #    3.45  insn per cycle
       556,426,385      branches
         1,120,630      branch-misses             #    0.20% of all branches

       0.766480608 seconds time elapsed

$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-aes

 Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-aes':

     3,390,012,091      cycles
     8,165,055,539      instructions              #    2.41  insn per cycle
       166,612,327      branches
           393,687      branch-misses             #    0.24% of all branches

       0.999783799 seconds time elapsed

$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-primes

 Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-primes':

       585,513,229      cycles
     1,570,274,312      instructions              #    2.68  insn per cycle
       199,550,674      branches
         1,373,005      branch-misses             #    0.69% of all branches

       0.180905897 seconds time elapsed

$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-miniz

 Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-miniz':

     7,181,383,837      cycles
    12,171,106,005      instructions              #    1.69  insn per cycle
     1,309,704,230      branches
        10,246,710      branch-misses             #    0.78% of all branches

       2.120649526 seconds time elapsed

$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-dhrystone

 Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-dhrystone':

     1,705,866,284      cycles
     5,902,622,960      instructions              #    3.46  insn per cycle
       852,430,738      branches
            65,576      branch-misses             #    0.01% of all branches

       0.530201822 seconds time elapsed

$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-qsort

 Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-qsort':

       951,218,523      cycles
     2,060,457,742      instructions              #    2.17  insn per cycle
       432,171,433      branches
         3,844,290      branch-misses             #    0.89% of all branches

       0.288089656 seconds time elapsed
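For anyone wanting to re-derive the percentages perf prints from the raw counters, the arithmetic can be sketched in C as below. The struct and function names are hypothetical and exist only for this illustration; the sample values are the test-sha512 counters quoted above.

```c
#include <stdint.h>

/* Raw counter values from one `perf stat` run (hypothetical type name). */
struct perf_counters {
    uint64_t cycles;
    uint64_t instructions;
    uint64_t branches;
    uint64_t branch_misses;
};

/* Counters reported above for the test-sha512 run. */
static const struct perf_counters sha512_run = {
    .cycles        = 2386668826ULL,
    .instructions  = 8226368806ULL,
    .branches      = 556426385ULL,
    .branch_misses = 1120630ULL,
};

/* Instructions retired per cycle ("insn per cycle" in perf's output). */
double insn_per_cycle(const struct perf_counters *c)
{
    return (double)c->instructions / (double)c->cycles;
}

/* Mispredicted branches as a percentage of all branches. */
double branch_miss_percent(const struct perf_counters *c)
{
    return 100.0 * (double)c->branch_misses / (double)c->branches;
}

/* Prediction accuracy is the complement of the miss percentage. */
double prediction_accuracy_percent(const struct perf_counters *c)
{
    return 100.0 - branch_miss_percent(c);
}
```

Calling insn_per_cycle(&sha512_run) and branch_miss_percent(&sha512_run) should reproduce, after rounding, the 3.45 and 0.20% figures in the first run, and the accuracy works out to well above 99%.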