Hello everyone,

I'd like to propose the attached patch to form FMA intrinsics aggressively, but in order to do so I need some clarification on the intended semantics of the various FP precision-related TargetOptions. I've summarized the three relevant ones below:

UnsafeFPMath - Defaults to off. Enables "less precise" results than permitted by IEEE 754. Comments specifically reference using hardware FSIN/FCOS on X86.

NoExcessFPPrecision - Defaults to off (i.e. excess precision allowed). Enables higher-precision implementations than specified by IEEE 754. Comments reference FMA-like operations, and X87 without rounding all over the place.

LessPreciseFPMADOption - Defaults to off. Enables "less precise" FP multiply-add.

My general sense is that aggressive FMA formation is beyond the realm of what UnsafeFPMath allows, but I'm unclear on the relationship between NoExcessFPPrecision and LessPreciseFPMADOption. My understanding is that fused multiply-add operations are "more precise" (i.e. closer to the numerically true value) than the baseline, which would round between the multiply and the add. By that reasoning, it seems like FMA formation should be covered by !NoExcessFPPrecision.

However, that opens the question of what LessPreciseFPMADOption is intended to cover. Are there targets on which FMA is actually "less precise" than the baseline sequence? Or is the comment just poorly worded?

A related concern is that, while NoExcessFPPrecision seems applicable, it is the only one of the above that defaults to the more relaxed option. From testing my patch, I can say that it does change the behavior of a number of benchmarks in the LLVM test suite, and for that reason alone it seems like it should not be enabled by default.

Anyone more knowledgeable about FP than me have any ideas?

--Owen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fma.diff
Type: application/octet-stream
Size: 724 bytes
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120208/ee14b009/attachment.obj>
On Wed, 2012-02-08 at 10:11 -0800, Owen Anderson wrote:
> Hello everyone,
>
> [...]
>
> My general sense is that aggressive FMA formation is beyond the realm
> of what UnsafeFPMath allows, but I'm unclear on the relationship
> between NoExcessFPPrecision and LessPreciseFPMADOption. My
> understanding is that fused multiply-add operations are "more
> precise" (i.e. closer to the numerically true value) than the baseline
> (which would round between the multiply and the add). By that
> reasoning, it seems like it should be covered by !NoExcessFPPrecision.

I agree, and this is what the PPC backend does.

> However, that opens the question of what LessPreciseFPMADOption is
> intended to cover. Are there targets on which FMA is actually "less
> precise" than the baseline sequence? Or is the comment just poorly
> worded?
>
> A related concern is that, while NoExcessFPPrecision seems applicable,
> it is the only one of the above that defaults to the more-relaxed
> option. From testing my patch, I can say that it does change the
> behavior of a number of benchmarks in the LLVM test suite, and for
> that reason alone seems like it should not be enabled by default.

This does not surprise me; however, care is required here.
First, there has been a previous thread on this recently, and I specifically recommend that you read Stephen Canon's remarks:
http://permalink.gmane.org/gmane.comp.compilers.llvm.cvs/106578

In my experience, users of numerical codes expect that the compiler will use FMA instructions where it can, unless specifically asked to avoid doing so by the user. Even though this can sometimes produce a different result (*almost* always a better one), the performance gain is too large to be ignored by default. I highly recommend that we continue to enable FMA instruction generation by default (as is the current practice, not only here, but in most vendor compilers with which I am familiar). We should also implement the FP_CONTRACT pragma, but that is another matter.

 -Hal

> Anyone more knowledgeable about FP than me have any ideas?
>
> --Owen
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
Hi Owen,

Having looked into this due to Clang failing PlumHall with it recently, I can give an opinion...

I think !NoExcessFPPrecision covers FMA completely. There are indeed some algorithms that give incorrect results when FMA is enabled, examples being those that do floating-point comparisons, such as: a * b + c - d. If c == d, it is still possible for that result not to equal a*b, as "+ c" will have been fused with the multiply whereas "- d" won't.

I think Andy Trick (I think?!) gave a less contrived example a couple of weeks back.

Therefore, it shouldn't be enabled by default. I say that because the C standard defines a pragma to control it - #pragma STDC FP_CONTRACT - which is what Clang was failing with in PlumHall. This pragma delimits a code region in which FMA may or may not be enabled. If we lack the ability to pass that information through from the frontend to the backend (which, at the moment, we do), we should not enable the optimisation by default.

That said, I think we should enhance the IR to allow this information to be passed from front end to back end. An attribute on fadd, fmul, fdiv, frem and fsub in the same vein as "nsw" would be my suggestion.

Cheers,

James
________________________________________
From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] On Behalf Of Owen Anderson [resistor at mac.com]
Sent: 08 February 2012 18:11
To: List
Subject: [LLVMdev] Clarifying FMA-related TargetOptions

[...]

-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Feb 8, 2012, at 10:42 AM, Hal Finkel wrote:
> In my experience, users of numerical codes expect that the compiler will
> use FMA instructions where it can, unless specifically asked to avoid
> doing so by the user. Even though this can sometimes produce a different
> result (*almost* always a better one), the performance gain is too large
> to be ignored by default. I highly recommend that we continue to enable
> FMA instruction-generation by default (as is the current practice, not
> only here, but in most vendor compilers with which I am familiar). We
> should also implement the FP_CONTRACT pragma, but that is another
> matter.

The caveat I would add to this is that, when I tried enabling FMA by default on an ARM target, I saw a large number of testcases in the LLVM test suite that either failed their output comparisons, crashed, or failed to terminate (!!!). That seems pretty scary to me.

--Owen
On Feb 8, 2012, at 10:44 AM, James Molloy wrote:
> Hi Owen,
>
> Having looked into this due to Clang failing PlumHall with it recently I can give an opinion...
>
> I think !NoExcessFPPrecision covers FMA completely. There are indeed some algorithms which give incorrect results when FMA is enabled, examples being those that do floating point comparisons such as: a * b + c - d. If c == d, it is still possible for that result not to equal a*b, as "+ c" will have been fused with the multiply whereas "- d" won't.

I agree that !NoExcessFPPrecision seems like it should cover FMA, but if that is the case, what does LessPreciseFPMADOption cover?

--Owen
On Wed, 2012-02-08 at 18:44 +0000, James Molloy wrote:
> [...]
>
> Therefore, it shouldn't be enabled by default. I say that because the C
> standard defines a pragma to control it - #pragma FP_CONTRACT - which is
> what Clang was failing with in PlumHall. This pragma defines a code
> section where FMA may or may not be enabled. If we lack the ability to
> pass that information through from the frontend to the backend (which we
> do, at the moment), we should not enable the optimisation by default.

Fair enough.

> That said, I think we should enhance the IR to allow this information to
> be passed from front to back ends. An attribute on fadd, fmul, fdiv, frem
> and fsub in the same vein as "nsw" would be my suggestion.

I agree that this is a good idea. I think this will be easy to support if we end up defining some patterns in TableGen like fmul_combinable (I'm not actually recommending such a long name) and defining any FMA-like patterns in terms of those.
-Hal

> Cheers,
>
> James
>
> [...]

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
Owen Anderson <resistor at mac.com> writes:
> A related concern is that, while NoExcessFPPrecision seems applicable,
> it is the only one of the above that defaults to the more-relaxed
> option. From testing my patch, I can say that it does change the
> behavior of a number of benchmarks in the LLVM test suite, and for
> that reason alone seems like it should not be enabled by default.
>
> Anyone more knowledgeable about FP than me have any ideas?

FWIW, we've found that having a switch to turn off FMA explicitly is helpful for debugging. We don't expose the switch to users, but it has saved us a few times when trying to track down numerical differences.

Our FP switches are not so precisely named. We basically have fp0, fp1, fp2 and fp3, analogous to O0, O1, O2 and O3. The idea is that the higher the number, the less guarantee you have that your results will be the same as scalar code (or code without FMA) would give you. The tradeoff, of course, is faster execution. We don't say anything about precision directly.

-Dave