Philip Reames
2014-Mar-26 21:34 UTC
[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double
On 03/26/2014 11:36 AM, Geoffrey Irving wrote:> I am trying to compute conservative lower and upper bounds for the > square of a double. I have set the rounding mode to FE_UPWARDS > elsewhere, so the code is > > struct Interval { > double nlo, hi; > }; > > Interval inspect_singleton_sqr(const double x) { > Interval s; > s.nlo = x * -x; > s.hi = x * x; > return s; > } > > Both multiplies are necessary, since they round in different > directions. However, clang does not know this, assumes that x * -x > -(x * x), and simplifies down to a single multiply: > > .LCPI1_0: > .quad -9223372036854775808 # double -0.000000e+00 > .quad -9223372036854775808 # double -0.000000e+00 > .text > .globl _Z21inspect_singleton_sqrd > .align 16, 0x90 > .type _Z21inspect_singleton_sqrd, at function > _Z21inspect_singleton_sqrd: # @_Z21inspect_singleton_sqrd > .cfi_startproc > # BB#0: > vmulsd %xmm0, %xmm0, %xmm1 > vxorpd .LCPI1_0(%rip), %xmm1, %xmm0 > ret > .Ltmp1: > .size _Z21inspect_singleton_sqrd, .Ltmp1-_Z21inspect_singleton_sqrd > .cfi_endproc > > I realize this is unsupported behavior, but it would be nice to still > be able to use clang to do numerical computation. Is there a way to > convince clang to not pull the multiplication outside addition? I > would like to avoid interfering with other optimizations in the > process: the problem is easy to solve by making a multiply function > and marking it never inline, but that would be terrible for > performance. I am happy to write clang specific code that turns on > only when __clang__ is defined. > > Thanks, > Geoffrey >Geoffrey - In the short term, if you marked this function as "/__attribute__((optnone))"/, that might have the effect you desire. See http://clang-developers.42468.n3.nabble.com/RFC-add-Function-Attribute-to-disable-optimization-td4032641.html As for the more general solution, I don't know. For the rest of this, I'm speaking only as an interested observer. None of this is particularly thought through. We really should have a good solution to this. I know we don't currently support floating point context and flags, but not being able to support standard idioms from numerical computation seems very problematic. Here's one proposal to get the conversation started - please don't take it too seriously. We could introduce an "fp_rounding_sensitive" annotation (flag?) on the instruction level which disables optimizations sensative to floating point rounding. This could be parsed by clang as either a function attribute or statement attribute. Alternatively, we could introduce an "fp_rounding_mode(x)" annotation with the semantics of specifying the desired rounding mode. The required work would be much the same. I'm picturing either of these being *really* course grained in their effect - at least to start. Something like disable all FP optimizations within their context. A toy implementation could even map the presence of such a flag to "optnone" as a starting point. We could improve incrementally over time. p.s. If we already have such a thing and I just don't remember, please feel free to point that out. :) Philip -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140326/ec811890/attachment.html>
Geoffrey Irving
2014-Mar-26 21:46 UTC
[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double
On Wed, Mar 26, 2014 at 2:34 PM, Philip Reames <listmail at philipreames.com> wrote:> > On 03/26/2014 11:36 AM, Geoffrey Irving wrote: > > I am trying to compute conservative lower and upper bounds for the > square of a double. I have set the rounding mode to FE_UPWARDS > elsewhere, so the code is > > struct Interval { > double nlo, hi; > }; > > Interval inspect_singleton_sqr(const double x) { > Interval s; > s.nlo = x * -x; > s.hi = x * x; > return s; > } > > Both multiplies are necessary, since they round in different > directions. However, clang does not know this, assumes that x * -x > -(x * x), and simplifies down to a single multiply: > > .LCPI1_0: > .quad -9223372036854775808 # double -0.000000e+00 > .quad -9223372036854775808 # double -0.000000e+00 > .text > .globl _Z21inspect_singleton_sqrd > .align 16, 0x90 > .type _Z21inspect_singleton_sqrd, at function > _Z21inspect_singleton_sqrd: # @_Z21inspect_singleton_sqrd > .cfi_startproc > # BB#0: > vmulsd %xmm0, %xmm0, %xmm1 > vxorpd .LCPI1_0(%rip), %xmm1, %xmm0 > ret > .Ltmp1: > .size _Z21inspect_singleton_sqrd, .Ltmp1-_Z21inspect_singleton_sqrd > .cfi_endproc > > I realize this is unsupported behavior, but it would be nice to still > be able to use clang to do numerical computation. Is there a way to > convince clang to not pull the multiplication outside addition? I > would like to avoid interfering with other optimizations in the > process: the problem is easy to solve by making a multiply function > and marking it never inline, but that would be terrible for > performance. I am happy to write clang specific code that turns on > only when __clang__ is defined. > > Thanks, > Geoffrey > > > Geoffrey - In the short term, if you marked this function as > "__attribute__((optnone))", that might have the effect you desire. See > http://clang-developers.42468.n3.nabble.com/RFC-add-Function-Attribute-to-disable-optimization-td4032641.html > > As for the more general solution, I don't know. > > For the rest of this, I'm speaking only as an interested observer. None of > this is particularly thought through. > > We really should have a good solution to this. I know we don't currently > support floating point context and flags, but not being able to support > standard idioms from numerical computation seems very problematic. > > Here's one proposal to get the conversation started - please don't take it > too seriously. > > We could introduce an "fp_rounding_sensitive" annotation (flag?) on the > instruction level which disables optimizations sensative to floating point > rounding. This could be parsed by clang as either a function attribute or > statement attribute. > > Alternatively, we could introduce an "fp_rounding_mode(x)" annotation with > the semantics of specifying the desired rounding mode. The required work > would be much the same. > > I'm picturing either of these being *really* course grained in their effect > - at least to start. Something like disable all FP optimizations within > their context. > > A toy implementation could even map the presence of such a flag to "optnone" > as a starting point. We could improve incrementally over time. > > p.s. If we already have such a thing and I just don't remember, please feel > free to point that out. :)An alternative way to think about fp_rounding_sensitive is related to one of the possible solutions: static inline double neg(const double x) { union { double x; uint64_t i; } u; u.x = x; u.i = u.i ^ (uint64_t(1)<<63); return u.x; } If I replace x * -x with x * neg(x), everything works since clang doesn't incorrectly apply the identity. Unfortunately, it produces significantly worse code, which I've included below. Instead of using vxorps to do the high bit flip in an SSE register, it moves data into (and then out of) an integer register and does the bit flip there. In other words, my problem would be solved if my neg() function had the same nice code generation as normal -x, but didn't participate in the incorrect identities. I'm not sure if this can be arranged today; I tried asm ("vxorps %1, %2, %0" : "=f" (nx) : "fm" (x), "fm" (y)); but it started spitting internal compiler errors at me: ./geode/exact/Interval.h:167:12: error: illegal "f" output constraint asm ("vxorps %1, %2, %0" : "=f" (nx) : "fm" (x), "fm" (y)); ^ fatal error: error in backend: Access past stack top! Geoffrey ------------------------------------------------- .globl _Z11inspect_sqrN5geode8IntervalE .align 16, 0x90 .type _Z11inspect_sqrN5geode8IntervalE, at function _Z11inspect_sqrN5geode8IntervalE: # @_Z11inspect_sqrN5geode8IntervalE .cfi_startproc # BB#0: vxorps %xmm2, %xmm2, %xmm2 vucomisd %xmm0, %xmm2 jbe .LBB0_2 # BB#1: vmovd %xmm0, %rax movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000 xorq %rax, %rcx vmovd %rcx, %xmm2 vmulsd %xmm0, %xmm2, %xmm2 vmulsd %xmm1, %xmm1, %xmm1 vmovaps %xmm2, %xmm0 ret .LBB0_2: vucomisd %xmm1, %xmm2 jbe .LBB0_4 # BB#3: vmovd %xmm1, %rax movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000 xorq %rax, %rcx vmovd %rcx, %xmm2 vmulsd %xmm1, %xmm2, %xmm2 vmulsd %xmm0, %xmm0, %xmm1 vmovaps %xmm2, %xmm0 ret .LBB0_4: vmaxsd %xmm0, %xmm1, %xmm0 vmulsd %xmm0, %xmm0, %xmm1 vxorps %xmm2, %xmm2, %xmm2 vmovaps %xmm2, %xmm0 ret .Ltmp0: .size _Z11inspect_sqrN5geode8IntervalE, .Ltmp0-_Z11inspect_sqrN5geode8IntervalE .cfi_endproc
Robinson, Paul
2014-Mar-27 15:46 UTC
[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double
While I did propose a patch to add this attribute to Clang, and it was approved, I never committed it because the LLVM work to fully support it has been dangling in nobody-wants-to-review-it limbo forever. And it seemed like I shouldn't put the attribute into Clang if LLVM didn't fully honor the contract. I'll redo the Clang part (Aaron Ballman has refactored the attribute-handling stuff about a hundred times since I did the original patch so it no longer applies) and hopefully get that into Clang soon. Thanks, --paulr Geoffrey - In the short term, if you marked this function as "__attribute__((optnone))", that might have the effect you desire. See http://clang-developers.42468.n3.nabble.com/RFC-add-Function-Attribute-to-disable-optimization-td4032641.html -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140327/da8e9d1e/attachment.html>
Philip Reames
2014-Mar-27 16:18 UTC
[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double
Paul, If any of those llvm changes are still pending, let me know. I'd be happy to review. Philip On 03/27/2014 08:46 AM, Robinson, Paul wrote:> > While I did propose a patch to add this attribute to Clang, and it was > approved, I never committed it because the LLVM work to fully support > it has been dangling in nobody-wants-to-review-it limbo forever. And > it seemed like I shouldn't put the attribute into Clang if LLVM didn't > fully honor the contract. > > I'll redo the Clang part (Aaron Ballman has refactored the > attribute-handling stuff about a hundred times since I did the > original patch so it no longer applies) and hopefully get that into > Clang soon. > > Thanks, > > --paulr > > Geoffrey - In the short term, if you marked this function as > "/__attribute__((optnone))"/, that might have the effect you desire. > See > http://clang-developers.42468.n3.nabble.com/RFC-add-Function-Attribute-to-disable-optimization-td4032641.html > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140327/34116d9c/attachment.html>
Possibly Parallel Threads
- [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
- [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
- [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
- AVX512 instruction generated when JIT compiling for an avx2 architecture
- AVX512 instruction generated when JIT compiling for an avx2 architecture