thr3ads.net - llvm dev - [LLVMdev] [RFC] How to fix sqrt vs llvm.sqrt optimization asymmetry [Nov 2013]

If this information is useful, please help other people find it:
Share via:

Hal Finkel

2013-Nov-12 05:30 UTC

[LLVMdev] [RFC] How to fix sqrt vs llvm.sqrt optimization asymmetry

----- Original Message -----> Hi Hal, all.
> 
> I'm not sure why llvm.sqrt is 'special'.  Maybe because there
is a
> SSE
> packed sqrt instruction (SQRTPS) but not e.g. a packed sin
> instruction
> AFAIK.
This seems relevant:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-August/010248.html

Chris, et al., does the decision on how to treat sqrt predate our current way of
handling errno?

 -Hal
> 
> As mentioned in a recent mail to this list, I would like llvm.sqrt to
> be
> defined as NaN for argument x < 0.  I believe this would bring it
> more
> into line with the other intrinsics, and with the libm result, which
> is
> NaN for x < 0:
> http://pubs.opengroup.org/onlinepubs/007904975/functions/sqrt.html
> 
> Cheers,
>      Nick
> 
> On 10/11/2013 3:36 p.m., Hal Finkel wrote:
> > Hello everyone,
> >
> > The particular motivation for this e-mail is my desire for feedback
> > on how to fix PR17758; but there is a core design issue here, so
> > I'd like a wide audience.
> >
> > The underlying issue is that, because the semantics of llvm.sqrt
> > are purposefully defined to be different from libm sqrt (unlike
> > all of the other llvm.<libm function> intrinsics) (*), and
because
> > autovectorization relies on the vector forms of these intrinsics
> > when vectorizing function calls to libm math functions, we cannot
> > vectorize a libm sqrt() call into a vector llvm.sqrt call.
> > However, in fast-math mode, we'd like to vectorize calls to sqrt,
> > and so I modified Clang to emit calls to llvm.sqrt in fast-math
> > mode for sqrt (and sqrt[fl]). This makes it similar to the libm
> > pow and fma calls, which Clang always transforms into the llvm.pow
> > and llvm.fma intrinsics.
> >
> > Here's the problem: There is an InstCombine optimization for sqrt
> > (inside visitFPTrunc), and a bunch of optimizations inside
> > SimplifyLibCalls that apply only to the sqrt libm call, and not to
> > the intrinsics. The result, among other things, is PR17758, where
> > fast-math mode actually produces slower code for non-vectorized
> > sqrt calls.
> >
> > Some questions:
> >
> >   - Is the asymmetry between optimizations performed on libm calls
> >   and their corresponding llvm.<libm function> intrinsics
> >   intentional, or just due to a lack of motivation?
> >
> >   - Even if unintentional, is this asymmetry in any way desirable
> >   (for sqrt in particular, or in general)?
> >
> >   - I can refactor all existing optimizations to be libm-call vs.
> >   intrinsics agnostic, but is that the desired solution? If so,
> >   any advice on a particularly-nice way to do this would certainly
> >   be appreciated.
> >
> > For example, an alternative solution for PR17758 in particular
> > would be to revert the Clang change, introduce a new intrinsic for
> > sqrt that does match the libm semantics, and have vectorization
> > use that when available.
> >
> > Another alternative is to revert the Clang change and make
> > autovectorization of libm sqrt -> llvm.sqrt dependent on the
> > NoNaNsFPMath TargetOptions flag (this requires directly exposing
> > parts of TargetOptions to the IR level).
> >
> > I believe that both of these alternatives also require fixing the
> > inliner to deal properly with fast-math attributes during LTO
> > (unless I can just ignore this for now). This was the objection
> > raised to the TargetOptions solution when I first brought it up.
> >
> > (*) According to the language reference, the specific difference
> > is, "Unlike sqrt in libm, however, llvm.sqrt has undefined
> > behavior for negative numbers other than -0.0 (which allows for
> > better optimization, because there is no need to worry about errno
> > being set). llvm.sqrt(-0.0) is defined to return -0.0 like IEEE
> > sqrt."
> >
> > Thanks again,
> > Hal
> >
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Chris Lattner

2013-Nov-12 06:23 UTC

head link

[LLVMdev] [RFC] How to fix sqrt vs llvm.sqrt optimization asymmetry

On Nov 11, 2013, at 9:30 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> Hi Hal, all.
>> 
>> I'm not sure why llvm.sqrt is 'special'.  Maybe because
there is a
>> SSE
>> packed sqrt instruction (SQRTPS) but not e.g. a packed sin
>> instruction
>> AFAIK.
> 
> This seems relevant:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-August/010248.html
> 
> Chris, et al., does the decision on how to treat sqrt predate our current
way of handling errno?
The intention of llvm.sqrt is that it is the same as sqrt(), but doesn't
affect errno.  If it helps, I think it would be completely reasonable to change
this in langref.html:

"Unlike sqrt in libm, however, llvm.sqrt has undefined behavior for
..."

to "... produces an undefined value", with a link back to
##undefined-values.

Transformations that apply to sqrt() should generally apply to llvm.sqrt as
well.

-Chris

Tim Northover

2013-Nov-12 06:39 UTC

head link

[LLVMdev] [RFC] How to fix sqrt vs llvm.sqrt optimization asymmetry

> The intention of llvm.sqrt is that it is the same as sqrt(), but
doesn't affect errno.  If it helps, I think it would be completely
reasonable to change this in langref.html:
>
> "Unlike sqrt in libm, however, llvm.sqrt has undefined behavior for
..."
>
> to "... produces an undefined value", with a link back to
##undefined-values.
I don't think that helps the front-end writers significantly (they'll
still have to check the input to get a sensible total function).

CodeGen and backends can already do whatever they like with the
current semantics, but it looks like we could tighten those up to "...
produces a NaN" without inconveniencing any targets I've found. This
also matches at least OSX and Linux man-pages about the sqrt function.

Cheers.

Tim.

Hal Finkel

2013-Nov-12 06:44 UTC

head link

[LLVMdev] [RFC] How to fix sqrt vs llvm.sqrt optimization asymmetry

----- Original Message -----> 
> On Nov 11, 2013, at 9:30 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > ----- Original Message -----
> >> Hi Hal, all.
> >> 
> >> I'm not sure why llvm.sqrt is 'special'.  Maybe
because there is a
> >> SSE
> >> packed sqrt instruction (SQRTPS) but not e.g. a packed sin
> >> instruction
> >> AFAIK.
> > 
> > This seems relevant:
> > http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-August/010248.html
> > 
> > Chris, et al., does the decision on how to treat sqrt predate our
> > current way of handling errno?
> 
> The intention of llvm.sqrt is that it is the same as sqrt(), but
> doesn't affect errno.  If it helps, I think it would be completely
> reasonable to change this in langref.html:
> 
> "Unlike sqrt in libm, however, llvm.sqrt has undefined behavior for
> ..."
> 
> to "... produces an undefined value", with a link back to
> ##undefined-values.
I'm not sure that helps, because it will prevents sqrt + -fno-math-errno (a
readnone sqrt) -> llvm.sqrt -- and thus still prevents the use of llvm.sqrt
to vectorize sqrt. I think what will help is just saying something like this:

  "Unlike sqrt in libm, after calling llvm.sqrt with a negative argument
(except for -0.0), the state of errno, and any other portions of the
floating-point environment that are used to record errors, is undefined."

FWIW, we may want to make a similar change for the other llvm.<libm
function> intrinsics, at least for the vector versions (as vector ops on many
ISAs don't affect the FP state bits like the scalar ops do).
> 
> Transformations that apply to sqrt() should generally apply to
> llvm.sqrt as well.
And the intent is for this to be true for all of the other llvm.<libm
function> intrinsics as well, right?

Thanks again,
Hal
> 
> -Chris
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Nov 2013 - [LLVMdev] [RFC] How to fix sqrt vs llvm.sqrt optimization asymmetry

[LLVMdev] [RFC] How to fix sqrt vs llvm.sqrt optimization asymmetry

[LLVMdev] [RFC] How to fix sqrt vs llvm.sqrt optimization asymmetry

[LLVMdev] [RFC] How to fix sqrt vs llvm.sqrt optimization asymmetry

[LLVMdev] [RFC] How to fix sqrt vs llvm.sqrt optimization asymmetry

Possibly Parallel Threads