Hello everyone,

I'd like to propose the attached patch to form FMA intrinsics aggressively, but in order to do so I need some clarification on the intended semantics of the various FP precision-related TargetOptions. I've summarized the three relevant ones below:

UnsafeFPMath - Defaults to off. Enables "less precise" results than permitted by IEEE 754. Comments specifically reference using hardware FSIN/FCOS on X86.

NoExcessFPPrecision - Defaults to off (i.e. excess precision allowed). Enables higher-precision implementations than specified by IEEE 754. Comments reference FMA-like operations, and X87 without rounding all over the place.

LessPreciseFPMADOption - Defaults to off. Enables "less precise" FP multiply-add.

My general sense is that aggressive FMA formation is beyond the realm of what UnsafeFPMath allows, but I'm unclear on the relationship between NoExcessFPPrecision and LessPreciseFPMADOption. My understanding is that fused multiply-add operations are "more precise" (i.e. closer to the numerically true value) than the baseline, which would round between the multiply and the add. By that reasoning, it seems like FMA formation should be covered by !NoExcessFPPrecision.

However, that opens the question of what LessPreciseFPMADOption is intended to cover. Are there targets on which FMA is actually "less precise" than the baseline sequence? Or is the comment just poorly worded?

A related concern is that, while NoExcessFPPrecision seems applicable, it is the only one of the above that defaults to the more relaxed option. From testing my patch, I can say that it does change the behavior of a number of benchmarks in the LLVM test suite, and for that reason alone it seems like it should not be enabled by default.

Anyone more knowledgeable about FP than me have any ideas?

--Owen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fma.diff
Type: application/octet-stream
Size: 724 bytes
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120208/ee14b009/attachment.obj>
On Wed, 2012-02-08 at 10:11 -0800, Owen Anderson wrote:
> Hello everyone,
>
> [...]
>
> My general sense is that aggressive FMA formation is beyond the realm
> of what UnsafeFPMath allows, but I'm unclear on the relationship
> between NoExcessFPPrecision and LessPreciseFPMADOption. My
> understanding is that fused multiply-add operations are "more
> precise" (i.e. closer to the numerically true value) than the baseline
> (which would round between the multiply and the add). By that
> reasoning, it seems like it should be covered by !NoExcessFPPrecision.

I agree, and this is what the PPC backend does.

> However, that opens the question of what LessPreciseFPMADOption is
> intended to cover. Are there targets on which FMA is actually "less
> precise" than the baseline sequence? Or is the comment just poorly
> worded?
>
> A related concern is that, while NoExcessFPPrecision seems applicable,
> it is the only one of the above that defaults to the more-relaxed
> option. From testing my patch, I can say that it does change the
> behavior of a number of benchmarks in the LLVM test suite, and for
> that reason alone seems like it should not be enabled by default.

This does not surprise me; however, care is required here.
First, there has been a previous thread on this recently, and I specifically recommend that you read Stephen Canon's remarks:
http://permalink.gmane.org/gmane.comp.compilers.llvm.cvs/106578

In my experience, users of numerical codes expect that the compiler will use FMA instructions where it can, unless specifically asked to avoid doing so by the user. Even though this can sometimes produce a different result (*almost* always a better one), the performance gain is too large to be ignored by default. I highly recommend that we continue to enable FMA instruction generation by default (as is the current practice, not only here, but in most vendor compilers with which I am familiar). We should also implement the FP_CONTRACT pragma, but that is another matter.

 -Hal

> Anyone more knowledgeable about FP than me have any ideas?
>
> --Owen
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
Hi Owen,

Having looked into this due to Clang failing PlumHall with it recently, I can give an opinion...

I think !NoExcessFPPrecision covers FMA completely. There are indeed some algorithms that give incorrect results when FMA is enabled, examples being those that do floating-point comparisons, such as: a * b + c - d. If c == d, it is still possible for that result not to equal a*b, as "+ c" will have been fused with the multiply whereas "- d" won't.

I think Andy Trick (I think?!) gave a less contrived example a couple of weeks back.

Therefore, it shouldn't be enabled by default. I say that because the C standard defines a pragma to control it - #pragma STDC FP_CONTRACT - which is what Clang was failing with in PlumHall. This pragma delimits a code region in which FMA may or may not be enabled. If we lack the ability to pass that information through from the frontend to the backend (which, at the moment, we do), we should not enable the optimisation by default.

That said, I think we should enhance the IR to allow this information to be passed from front end to back end. An attribute on fadd, fmul, fdiv, frem and fsub in the same vein as "nsw" would be my suggestion.

Cheers,

James
________________________________________
From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] On Behalf Of Owen Anderson [resistor at mac.com]
Sent: 08 February 2012 18:11
To: List
Subject: [LLVMdev] Clarifying FMA-related TargetOptions

[...]

-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Feb 8, 2012, at 10:42 AM, Hal Finkel wrote:
> In my experience, users of numerical codes expect that the compiler will
> use FMA instructions where it can, unless specifically asked to avoid
> doing so by the user. Even though this can sometimes produce a different
> result (*almost* always a better one), the performance gain is too large
> to be ignored by default. I highly recommend that we continue to enable
> FMA instruction-generation by default (as is the current practice, not
> only here, but in most vendor compilers with which I am familiar). We
> should also implement the FP_CONTRACT pragma, but that is another
> matter.

The caveat I would add to this is that, when I tried enabling FMA by default on an ARM target, I saw a large number of testcases in the LLVM test suite that either failed their output comparisons, crashed, or failed to terminate (!!!). That seems pretty scary to me.

--Owen
On Feb 8, 2012, at 10:44 AM, James Molloy wrote:
> Hi Owen,
>
> Having looked into this due to Clang failing PlumHall with it recently I can give an opinion...
>
> I think !NoExcessFPPrecision covers FMA completely. There are indeed some algorithms which give incorrect results when FMA is enabled, examples being those that do floating point comparisons such as: a * b + c - d. If c == d, it is still possible for that result not to equal a*b, as "+ c" will have been fused with the multiply whereas "- d" won't.

I agree that !NoExcessFPPrecision seems like it should cover FMA, but if that is the case, what does LessPreciseFPMADOption cover?

--Owen
On Wed, 2012-02-08 at 18:44 +0000, James Molloy wrote:
> [...]
>
> Therefore, it shouldn't be enabled by default. I say that because the C
> standard defines a pragma to control it - #pragma FP_CONTRACT - which is
> what Clang was failing with in PlumHall. This pragma defines a code
> section where FMA may or may not be enabled. If we lack the ability to
> pass that information through from the frontend to the backend (which we
> do, at the moment), we should not enable the optimisation by default.

Fair enough.

> That said, I think we should enhance the IR to allow this information to
> be passed from front to back ends. An attribute on fadd, fmul, fdiv, frem
> and fsub in the same vein as "nsw" would be my suggestion.

I agree that this is a good idea. I think this will be easy to support if we end up defining some patterns in TableGen like fmul_combinable (I'm not actually recommending such a long name) and defining any FMA-like patterns in terms of those.
-Hal

> Cheers,
>
> James
>
> [...]

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
Owen Anderson <resistor at mac.com> writes:
> A related concern is that, while NoExcessFPPrecision seems applicable,
> it is the only one of the above that defaults to the more-relaxed
> option. From testing my patch, I can say that it does change the
> behavior of a number of benchmarks in the LLVM test suite, and for
> that reason alone seems like it should not be enabled by default.
>
> Anyone more knowledgeable about FP than me have any ideas?

FWIW, we've found that having a switch to turn off FMA explicitly is helpful for debugging. We don't expose the switch to users, but it has saved us a few times when trying to track down numerical differences.

Our FP switches are not so precisely named. We basically have fp0, fp1, fp2 and fp3, analogous to O0, O1, O2 and O3. The idea is that the higher the number, the less guarantee you have that your results will be the same as scalar code (or code without FMA) would give you. The tradeoff, of course, is faster execution. We don't say anything about precision directly.

-Dave