Samuel F Antao
2014-Jul-30 21:37 UTC
[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
Hi all,

The AllowFPOpFusion option passed to a target can currently take three values, Fast, Standard or Strict (TargetOptions.h), with Standard being the default. In the DAGCombiner, the combine that turns a multiply followed by an add/subtract into a multiply-and-add/subtract is only enabled when this option is Fast. This means that, by default, no multiply-and-add opcodes are generated. If I understand correctly, this is undesirable, given that multiply-and-add on targets like PPC (I am not sure about the other targets) does not pose any rounding problem and can even be more accurate than performing the two operations separately.

Also, in TargetOptions.h I read:

  Standard, // Only allow fusion of 'blessed' ops (currently just fmuladd)

which made me suspect that the check against Fast in the DAGCombiner is not correct.

I was wondering whether this should be fixed in the DAGCombiner, or whether the target should set a different option, checked by the DAGCombiner, saying that mul-add/subtract is okay.

Any comments?

Thanks in advance!
Samuel
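Whether a single rounding is "more accurate" depends on what the surrounding code expects. A minimal C sketch, assuming IEEE-754 binary32/binary64 arithmetic: it evaluates x*x - y*y with x == y once with both products rounded (as an fmul/fsub pair would) and once with x*x kept exact (as a fused multiply-subtract would). The function names are hypothetical, and the fused path is emulated in double rather than calling fmaf, so that no libm is needed.

```c
/* Separately rounded: each product is rounded to float before the subtraction,
   as with an fmul/fsub pair. The volatile stores keep the compiler from
   contracting the expression and force each product to be rounded to float. */
float diff_separate(float x, float y) {
    volatile float px = x * x;
    volatile float py = y * y;
    return px - py;
}

/* Fused-style: only y*y is rounded; x*x stays exact, as inside
   fmaf(x, x, -(y * y)). Emulated in double (an assumption of this sketch):
   a float*float product has at most 48 significand bits, so it is exact in
   double, and for x == y the final difference is exactly representable in
   float, so the result matches a real fused operation for that case. */
float diff_fused(float x, float y) {
    volatile float py = y * y;
    double exact_xx = (double)x * (double)x;
    return (float)(exact_xx - (double)py);
}
```

With x = y = 1 + 2^-12, the exact product 1 + 2^-11 + 2^-24 is a tie that rounds to 1 + 2^-11 in float, so diff_separate returns 0 while diff_fused returns 2^-24. The fused value is the exact residual x*x - fl(y*y), but the source expression promises 0 whenever x == y, and code that relies on that symmetry (for example, taking a square root of the difference) can misbehave once the operations are fused.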
Tim Northover
2014-Jul-31 13:54 UTC
[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
Hi Samuel,

On 30 July 2014 22:37, Samuel F Antao <sfantao at us.ibm.com> wrote:
> In the DAGCombiner, during the combination of mul and add/subtract into
> multiply-and-add/subtract, this option is expected to be Fast in order to
> enable the combine. This means, that by default no multiply-and-add opcodes
> are going to be generated. If I understand it correctly, this is undesirable
> given that multiply-and-add for targets like PPC (I am not sure about all
> the other targets) does not pose any rounding problem and it can even be
> more accurate than performing the two operations separately.

That extra precision is actually what we're being very careful to avoid unless specifically told we're allowed. It can be just as harmful to carefully written floating-point code as dropping precision would be.

> Also, in TargetOptions.h I read:
>
> Standard, // Only allow fusion of 'blessed' ops (currently just fmuladd)
>
> which made me suspect that the check against Fast in the DAGCombiner is not
> correct.

I think it's OK. In the IR there are 3 different ways to express mul + add:

1. fmul + fadd. This must not be fused into a single step without intermediate rounding (unless we're in Fast mode).
2. call @llvm.fmuladd. This *may* be fused or not, depending on profitability (unless we're in Strict mode, in which case it's separate).
3. call @llvm.fma. This must not be split into two operations (unless we're in Fast mode).

That middle one is there because C actually allows you to allow & disallow contraction within a limited region with "#pragma STDC FP_CONTRACT ON". So we need a way to represent the idea that it's not usually OK to fuse them (i.e. not Fast mode), but this particular one actually is OK.

Cheers.

Tim.
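The three IR forms and the three fusion modes amount to a small decision table. A minimal C sketch of that table: the enum and function names are hypothetical stand-ins rather than LLVM's API, `fused_is_profitable` stands in for a target's cost model, and the code only models what each form permits in each mode, not how the DAGCombiner is actually structured.

```c
#include <stdbool.h>

/* Hypothetical mirror of the Fast/Standard/Strict fusion modes. */
enum fusion_mode { MODE_FAST, MODE_STANDARD, MODE_STRICT };

/* The three IR spellings of "multiply, then add". */
enum mul_add_form {
    FORM_FMUL_FADD, /* separate fmul + fadd instructions */
    FORM_FMULADD,   /* call @llvm.fmuladd: fusion is optional */
    FORM_FMA        /* call @llvm.fma: a single rounding by definition */
};

enum lowering { LOWER_SEPARATE, LOWER_FUSED };

/* May this form be emitted as one fused, singly rounded operation? */
static bool fusion_permitted(enum mul_add_form form, enum fusion_mode mode) {
    switch (form) {
    case FORM_FMUL_FADD: return mode == MODE_FAST;   /* only when told so */
    case FORM_FMULADD:   return mode != MODE_STRICT; /* the 'blessed' op */
    case FORM_FMA:       return true;
    }
    return false;
}

/* May this form be emitted as two separately rounded operations? */
static bool split_permitted(enum mul_add_form form, enum fusion_mode mode) {
    switch (form) {
    case FORM_FMUL_FADD: return true;              /* already separate */
    case FORM_FMULADD:   return true;              /* fusion is optional */
    case FORM_FMA:       return mode == MODE_FAST; /* otherwise must stay fused */
    }
    return false;
}

/* Fuse when fusion is allowed and either profitable or the only legal choice. */
enum lowering choose_lowering(enum mul_add_form form, enum fusion_mode mode,
                              bool fused_is_profitable) {
    bool can_fuse = fusion_permitted(form, mode);
    bool can_split = split_permitted(form, mode);
    if (can_fuse && (fused_is_profitable || !can_split))
        return LOWER_FUSED;
    return LOWER_SEPARATE;
}
```

In this model, the default Standard mode keeps a plain fmul + fadd pair separate even when fusing would be profitable, which matches the DAGCombiner behaviour Samuel observed, while @llvm.fmuladd may still become a fused instruction.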
Samuel F Antao
2014-Jul-31 15:50 UTC
[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
Hi Tim,

Thanks for the thorough explanation. It makes perfect sense. I was not aware that, unless fast-math is enabled, the compiler is supposed to avoid using more precision than the standard specifies. I came across this issue while looking into the output of different compilers: the XL and Microsoft compilers seem to have fusion turned on by default. But I assume that clang follows what gcc does and has it turned off.

Thanks again,
Samuel