thr3ads.net - llvm dev - [LLVMdev] [cfe-dev] computing a conservatively rounded square of a double [Mar 2014]

If this information is useful, please help other people find it:
Share via:

Philip Reames

2014-Mar-26 21:34 UTC

[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double

On 03/26/2014 11:36 AM, Geoffrey Irving wrote:> I am trying to compute conservative lower and upper bounds for the
> square of a double.  I have set the rounding mode to FE_UPWARDS
> elsewhere, so the code is
>
> struct Interval {
>    double nlo, hi;
> };
>
> Interval inspect_singleton_sqr(const double x) {
>    Interval s;
>    s.nlo = x * -x;
>    s.hi = x * x;
>    return s;
> }
>
> Both multiplies are necessary, since they round in different
> directions.  However, clang does not know this, assumes that x * -x >
-(x * x), and simplifies down to a single multiply:
>
> .LCPI1_0:
>    .quad -9223372036854775808    # double -0.000000e+00
>    .quad -9223372036854775808    # double -0.000000e+00
>    .text
>    .globl  _Z21inspect_singleton_sqrd
>    .align  16, 0x90
>    .type _Z21inspect_singleton_sqrd, at function
> _Z21inspect_singleton_sqrd:             # @_Z21inspect_singleton_sqrd
>    .cfi_startproc
> # BB#0:
>    vmulsd  %xmm0, %xmm0, %xmm1
>    vxorpd  .LCPI1_0(%rip), %xmm1, %xmm0
>    ret
> .Ltmp1:
>    .size _Z21inspect_singleton_sqrd, .Ltmp1-_Z21inspect_singleton_sqrd
>    .cfi_endproc
>
> I realize this is unsupported behavior, but it would be nice to still
> be able to use clang to do numerical computation.  Is there a way to
> convince clang to not pull the multiplication outside addition?  I
> would like to avoid interfering with other optimizations in the
> process: the problem is easy to solve by making a multiply function
> and marking it never inline, but that would be terrible for
> performance.  I am happy to write clang specific code that turns on
> only when __clang__ is defined.
>
> Thanks,
> Geoffrey
>
Geoffrey - In the short term, if you marked this function as 
"/__attribute__((optnone))"/, that might have the effect you desire.  
See 
http://clang-developers.42468.n3.nabble.com/RFC-add-Function-Attribute-to-disable-optimization-td4032641.html

As for the more general solution, I don't know.

For the rest of this, I'm speaking only as an interested observer. None 
of this is particularly thought through.

We really should have a good solution to this.  I know we don't 
currently support floating point context and flags, but not being able 
to support standard idioms from numerical computation seems very 
problematic.

Here's one proposal to get the conversation started - please don't take 
it too seriously.

We could introduce an "fp_rounding_sensitive" annotation (flag?) on
the
instruction level which disables optimizations sensative to floating 
point rounding.  This could be parsed by clang as either a function 
attribute or statement attribute.

Alternatively, we could introduce an "fp_rounding_mode(x)" annotation 
with the semantics of specifying the desired rounding mode.  The 
required work would be much the same.

I'm picturing either of these being *really* course grained in their 
effect - at least to start.  Something like disable all FP optimizations 
within their context.

A toy implementation could even map the presence of such a flag to 
"optnone" as a starting point.  We could improve incrementally over
time.

p.s. If we already have such a thing and I just don't remember, please 
feel free to point that out.  :)

Philip

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140326/ec811890/attachment.html>

Geoffrey Irving

2014-Mar-26 21:46 UTC

head link

[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double

On Wed, Mar 26, 2014 at 2:34 PM, Philip Reames
<listmail at philipreames.com> wrote:>
> On 03/26/2014 11:36 AM, Geoffrey Irving wrote:
>
> I am trying to compute conservative lower and upper bounds for the
> square of a double.  I have set the rounding mode to FE_UPWARDS
> elsewhere, so the code is
>
> struct Interval {
>   double nlo, hi;
> };
>
> Interval inspect_singleton_sqr(const double x) {
>   Interval s;
>   s.nlo = x * -x;
>   s.hi = x * x;
>   return s;
> }
>
> Both multiplies are necessary, since they round in different
> directions.  However, clang does not know this, assumes that x * -x >
-(x * x), and simplifies down to a single multiply:
>
> .LCPI1_0:
>   .quad -9223372036854775808    # double -0.000000e+00
>   .quad -9223372036854775808    # double -0.000000e+00
>   .text
>   .globl  _Z21inspect_singleton_sqrd
>   .align  16, 0x90
>   .type _Z21inspect_singleton_sqrd, at function
> _Z21inspect_singleton_sqrd:             # @_Z21inspect_singleton_sqrd
>   .cfi_startproc
> # BB#0:
>   vmulsd  %xmm0, %xmm0, %xmm1
>   vxorpd  .LCPI1_0(%rip), %xmm1, %xmm0
>   ret
> .Ltmp1:
>   .size _Z21inspect_singleton_sqrd, .Ltmp1-_Z21inspect_singleton_sqrd
>   .cfi_endproc
>
> I realize this is unsupported behavior, but it would be nice to still
> be able to use clang to do numerical computation.  Is there a way to
> convince clang to not pull the multiplication outside addition?  I
> would like to avoid interfering with other optimizations in the
> process: the problem is easy to solve by making a multiply function
> and marking it never inline, but that would be terrible for
> performance.  I am happy to write clang specific code that turns on
> only when __clang__ is defined.
>
> Thanks,
> Geoffrey
>
>
> Geoffrey - In the short term, if you marked this function as
> "__attribute__((optnone))", that might have the effect you
desire.  See
>
http://clang-developers.42468.n3.nabble.com/RFC-add-Function-Attribute-to-disable-optimization-td4032641.html
>
> As for the more general solution, I don't know.
>
> For the rest of this, I'm speaking only as an interested observer. 
None of
> this is particularly thought through.
>
> We really should have a good solution to this.  I know we don't
currently
> support floating point context and flags, but not being able to support
> standard idioms from numerical computation seems very problematic.
>
> Here's one proposal to get the conversation started - please don't
take it
> too seriously.
>
> We could introduce an "fp_rounding_sensitive" annotation (flag?)
on the
> instruction level which disables optimizations sensative to floating point
> rounding.  This could be parsed by clang as either a function attribute or
> statement attribute.
>
> Alternatively, we could introduce an "fp_rounding_mode(x)"
annotation with
> the semantics of specifying the desired rounding mode.  The required work
> would be much the same.
>
> I'm picturing either of these being *really* course grained in their
effect
> - at least to start.  Something like disable all FP optimizations within
> their context.
>
> A toy implementation could even map the presence of such a flag to
"optnone"
> as a starting point.  We could improve incrementally over time.
>
> p.s. If we already have such a thing and I just don't remember, please
feel
> free to point that out.  :)
An alternative way to think about fp_rounding_sensitive is related to
one of the possible solutions:

static inline double neg(const double x) {
  union { double x; uint64_t i; } u;
  u.x = x;
  u.i = u.i ^ (uint64_t(1)<<63);
  return u.x;
}

If I replace x * -x with x * neg(x), everything works since clang
doesn't incorrectly apply the identity.  Unfortunately, it produces
significantly worse code, which I've included below.  Instead of using
vxorps to do the high bit flip in an SSE register, it moves data into
(and then out of) an integer register and does the bit flip there.

In other words, my problem would be solved if my neg() function had
the same nice code generation as normal -x, but didn't participate in
the incorrect identities.  I'm not sure if this can be arranged today;
I tried

   asm ("vxorps %1, %2, %0" : "=f" (nx) : "fm"
(x), "fm" (y));

but it started spitting internal compiler errors at me:

    ./geode/exact/Interval.h:167:12: error: illegal "f" output
constraint
          asm ("vxorps %1, %2, %0" : "=f" (nx) :
"fm" (x), "fm" (y));
               ^
    fatal error: error in backend: Access past stack top!

Geoffrey

-------------------------------------------------

  .globl  _Z11inspect_sqrN5geode8IntervalE
  .align  16, 0x90
  .type _Z11inspect_sqrN5geode8IntervalE, at function
_Z11inspect_sqrN5geode8IntervalE:       # @_Z11inspect_sqrN5geode8IntervalE
  .cfi_startproc
# BB#0:
  vxorps  %xmm2, %xmm2, %xmm2
  vucomisd  %xmm0, %xmm2
  jbe .LBB0_2
# BB#1:
  vmovd %xmm0, %rax
  movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
  xorq  %rax, %rcx
  vmovd %rcx, %xmm2
  vmulsd  %xmm0, %xmm2, %xmm2
  vmulsd  %xmm1, %xmm1, %xmm1
  vmovaps %xmm2, %xmm0
  ret
.LBB0_2:
  vucomisd  %xmm1, %xmm2
  jbe .LBB0_4
# BB#3:
  vmovd %xmm1, %rax
  movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
  xorq  %rax, %rcx
  vmovd %rcx, %xmm2
  vmulsd  %xmm1, %xmm2, %xmm2
  vmulsd  %xmm0, %xmm0, %xmm1
  vmovaps %xmm2, %xmm0
  ret
.LBB0_4:
  vmaxsd  %xmm0, %xmm1, %xmm0
  vmulsd  %xmm0, %xmm0, %xmm1
  vxorps  %xmm2, %xmm2, %xmm2
  vmovaps %xmm2, %xmm0
  ret
.Ltmp0:
  .size _Z11inspect_sqrN5geode8IntervalE,
.Ltmp0-_Z11inspect_sqrN5geode8IntervalE
  .cfi_endproc

Robinson, Paul

2014-Mar-27 15:46 UTC

head link

[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double

While I did propose a patch to add this attribute to Clang, and it was approved,
I never committed it because the LLVM work to fully support it has been dangling
in nobody-wants-to-review-it limbo forever.  And it seemed like I shouldn't
put the attribute into Clang if LLVM didn't fully honor the contract.

I'll redo the Clang part (Aaron Ballman has refactored the
attribute-handling stuff about a hundred times since I did the original patch so
it no longer applies) and hopefully get that into Clang soon.
Thanks,
--paulr

Geoffrey - In the short term, if you marked this function as
"__attribute__((optnone))", that might have the effect you desire. 
See
http://clang-developers.42468.n3.nabble.com/RFC-add-Function-Attribute-to-disable-optimization-td4032641.html


-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140327/da8e9d1e/attachment.html>

Philip Reames

2014-Mar-27 16:18 UTC

head link

[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double

Paul,

If any of those llvm changes are still pending, let me know.  I'd be 
happy to review.

Philip

On 03/27/2014 08:46 AM, Robinson, Paul wrote:>
> While I did propose a patch to add this attribute to Clang, and it was 
> approved, I never committed it because the LLVM work to fully support 
> it has been dangling in nobody-wants-to-review-it limbo forever.  And 
> it seemed like I shouldn't put the attribute into Clang if LLVM
didn't
> fully honor the contract.
>
> I'll redo the Clang part (Aaron Ballman has refactored the 
> attribute-handling stuff about a hundred times since I did the 
> original patch so it no longer applies) and hopefully get that into 
> Clang soon.
>
> Thanks,
>
> --paulr
>
> Geoffrey - In the short term, if you marked this function as 
> "/__attribute__((optnone))"/, that might have the effect you
desire.
> See 
>
http://clang-developers.42468.n3.nabble.com/RFC-add-Function-Attribute-to-disable-optimization-td4032641.html
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140327/34116d9c/attachment.html>

llvm dev - Mar 2014 - [LLVMdev] [cfe-dev] computing a conservatively rounded square of a double

[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double

[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double

[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double

[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double