Abe Skolnik via llvm-dev
2016-Sep-09 21:17 UTC
[llvm-dev] defaults for FP contraction [e.g. fused multiply-add]: suggestion and patch to be slightly more aggressive and to make Clang's optimization settings closer to having the same meaning as when they are given to GCC [at least for "-O3"]
Dear all,

In the process of investigating a performance difference between Clang and GCC when both compile the same non-toolchain program while using the "same"* compiler flags, I have found something that may be worth changing in Clang, developed a patch, and confirmed that the patch has its intended effect.

*: "same" in quotes because the essence of the problem is that the _meaning_ of "-O3" on Clang differs from that of "-O3" on GCC in at least one way.

The specific problem here relates to the default settings for FP contraction, e.g. fused multiply-add. At -O2 and higher, GCC defaults FP contraction to "fast", i.e. always on. I'm not suggesting that Clang, LLVM, or both need to do the same, since Clang+LLVM has good support for "#pragma STDC FP_CONTRACT".

If we keep Clang's default for FP contraction at "on" [which really means "according to the pragma"] but change the default value of the _pragma_ [currently off] to on at -O3, then Clang will be more competitive with GCC at high optimization settings without resorting to the more brutish "fast by default" at plain -O3 [as opposed to "-Ofast", "-O3 -ffast-math", etc.].

Since I don't know what Objective-C [and Objective-C++] have to say about FP operations, I have made my patch very selective based on language. Also, I noticed that the CUDA front-end seems to already have its own defaults for FP contraction, so there's no need to change this for every language.

I needed to change one test case because it assumed that FP contraction is off by default when compiling with "-O3" but without any additional optimization-related flags.
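Why contraction matters numerically: contracting a*b + c into a fused multiply-add rounds once instead of twice, so the result can change. The sketch below is an editorial illustration, not part of the original patch; it assumes IEEE-754 binary64 `double`, uses `std::fma` from `<cmath>` as the single-rounding reference, and stores the product in a `volatile` so the compiler cannot contract the "unfused" version even under -ffp-contract=fast. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// a*b + c with two roundings (what "#pragma STDC FP_CONTRACT OFF" guarantees).
// Storing the product in a volatile forces it to be rounded to double and
// prevents the compiler from fusing the multiply and the add.
double unfused_mul_add(double a, double b, double c) {
    volatile double product = a * b;  // rounding #1
    return product + c;               // rounding #2
}

// a*b + c with a single rounding, i.e. what a contracted FMA computes.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Inputs whose exact product, 1 - 2^-104, is not representable as a double.
void check_contraction_changes_result() {
    const double eps = std::ldexp(1.0, -52);  // 2^-52
    const double a = 1.0 + eps;
    const double b = 1.0 - eps;
    // Unfused: a*b rounds to exactly 1.0, so adding -1.0 cancels to 0.
    assert(unfused_mul_add(a, b, -1.0) == 0.0);
    // Fused: the tiny residual -2^-104 survives the single rounding.
    assert(fused_mul_add(a, b, -1.0) == -std::ldexp(1.0, -104));
}
```

Both answers are "correct" under their respective contraction rules; the point is only that turning contraction on can change bit-level results, which is why its default is a policy question.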
Patch relative to upstream code with Git ID b0768e805d1d33d730e5bd44ba578df043dfbc66
------------------------------------------------------------------------------------

diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 619ea9c..d02d873 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -2437,6 +2437,13 @@ bool CompilerInvocation::CreateFromArgs(CompilerInvocation &Res,
   if (Arch == llvm::Triple::spir || Arch == llvm::Triple::spir64) {
     Res.getDiagnosticOpts().Warnings.push_back("spir-compat");
   }
+
+  // If there will ever be e.g. "LangOpts.C", replace "LangOpts.C11 || LangOpts.C99" with "LangOpts.C" on the next line.
+  if ( (LangOpts.C11 || LangOpts.C99 || LangOpts.CPlusPlus) // ...
+       /*...*/ && ( CodeGenOptions::FPC_On == Res.getCodeGenOpts().getFPContractMode() ) // ... // just being careful
+       /*...*/ && (Res.getCodeGenOpts().OptimizationLevel >= 3) )
+    LangOpts.DefaultFPContract = 1;
+
   return Success;
 }

diff --git a/clang/test/CodeGen/fp-contract-pragma.cpp b/clang/test/CodeGen/fp-contract-pragma.cpp
index 1c5921a..0949272 100644
--- a/clang/test/CodeGen/fp-contract-pragma.cpp
+++ b/clang/test/CodeGen/fp-contract-pragma.cpp
@@ -13,6 +13,7 @@ float fp_contract_2(float a, float b, float c) {
 // CHECK: _Z13fp_contract_2fff
 // CHECK: %[[M:.+]] = fmul float %a, %b
 // CHECK-NEXT: fadd float %[[M]], %c
+  #pragma STDC FP_CONTRACT OFF
   {
     #pragma STDC FP_CONTRACT ON
   }

Please give me any and all feedback you may have on this suggested change and this proposed patch.

Regards,

Abe
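The condition added to `CompilerInvocation::CreateFromArgs` can be read as a small predicate. The sketch below restates it outside Clang as an editorial aid; `LangFlags`, `FPContractMode`, and `patch_sets_pragma_default_on` are hypothetical stand-ins for the `LangOpts` flags and `CodeGenOptions` contract modes the patch reads, not Clang's real types.

```cpp
#include <cassert>

// Hypothetical stand-ins for the LangOpts flags the patch inspects.
struct LangFlags {
    bool c99 = false;
    bool c11 = false;
    bool cplusplus = false;
};

// Hypothetical mirror of the -ffp-contract modes: off, on (pragma-controlled), fast.
enum class FPContractMode { Off, On, Fast };

// Restates the patch's condition: the pragma's default flips to ON only for
// C99/C11/C++, only when -ffp-contract is "on", and only at -O3 or higher.
bool patch_sets_pragma_default_on(const LangFlags& lang, FPContractMode mode,
                                  unsigned opt_level) {
    const bool lang_ok = lang.c11 || lang.c99 || lang.cplusplus;
    return lang_ok && mode == FPContractMode::On && opt_level >= 3;
}

// Spot-checks the truth table described above.
void check_patch_condition() {
    LangFlags cxx;
    cxx.cplusplus = true;
    assert(patch_sets_pragma_default_on(cxx, FPContractMode::On, 3));
    assert(!patch_sets_pragma_default_on(cxx, FPContractMode::On, 2));    // -O2: unchanged
    assert(!patch_sets_pragma_default_on(cxx, FPContractMode::Fast, 3));  // mode is not "on"
    assert(!patch_sets_pragma_default_on(LangFlags{}, FPContractMode::On, 3));  // no C99/C11/C++ flag
}
```

Because the mode must be exactly "on", an explicit -ffp-contract=off or -ffp-contract=fast on the command line is left alone, which matches the "just being careful" comment in the patch.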
Stephen Canon via llvm-dev
2016-Sep-09 21:31 UTC
Gating this on -Owhatever is dangerous. We should simply default to the pragma “on” state universally.

– Steve
Abe Skolnik via llvm-dev
2016-Sep-09 22:21 UTC
On 09/09/2016 04:31 PM, Stephen Canon wrote:
> Gating this on -Owhatever is dangerous. We should simply default to the pragma “on” state universally.

Why so? [honestly asking, not arguing]

My guess: because we don't want programs to give different results when compiled at different "-O<...>" settings, with the exception of "-Ofast".

At any rate, the above change is trivial to apply to my recent proposed patch: just remove the "&& (Res.getCodeGenOpts().OptimizationLevel >= 3)" part of the condition.

Regards,

Abe
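A toy model of the concern guessed at above: if the pragma's default depends on the -O level, the same expression can evaluate differently in an -O2 build and an -O3 build. This is an editorial sketch, assuming a target where contraction actually produces a fused multiply-add (modeled here with `std::fma`) and assuming the language and -ffp-contract checks already pass; all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of a*b + c with the contraction pragma ON (one rounding,
// like an FMA) versus OFF (two roundings).
double eval_mul_add(double a, double b, double c, bool pragma_on) {
    if (pragma_on) {
        return std::fma(a, b, c);
    }
    volatile double product = a * b;  // volatile: force the separate rounding
    return product + c;
}

// Pragma default under the original patch (gated on -O3) and under the
// revision discussed in the thread (the OptimizationLevel clause removed).
bool pragma_default_gated(unsigned opt_level) { return opt_level >= 3; }
bool pragma_default_ungated(unsigned /*opt_level*/) { return true; }

void check_opt_level_dependence() {
    const double eps = std::ldexp(1.0, -52);
    const double a = 1.0 + eps, b = 1.0 - eps, c = -1.0;
    // Gated: -O2 and -O3 builds disagree on the same expression.
    assert(eval_mul_add(a, b, c, pragma_default_gated(2)) !=
           eval_mul_add(a, b, c, pragma_default_gated(3)));
    // Ungated: every -O level agrees.
    for (unsigned level = 0; level <= 3; ++level) {
        assert(eval_mul_add(a, b, c, pragma_default_ungated(level)) ==
               eval_mul_add(a, b, c, pragma_default_ungated(0)));
    }
}
```

With the gate removed, the numerical result no longer depends on the optimization level, which is one plausible reading of why gating on -O was called dangerous.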