Chris Lattner via llvm-dev
2016-Sep-10 02:40 UTC
[llvm-dev] defaults for FP contraction [e.g. fused multiply-add]: suggestion and patch to be slightly more aggressive and to make Clang`s optimization settings closer to having the same meaning as when they are given to GCC [at least for "-O3"]
> On Sep 9, 2016, at 3:27 PM, Steve Canon via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Sent from my iPhone
>
> On Sep 9, 2016, at 6:21 PM, Abe Skolnik <a.skolnik at samsung.com> wrote:
>
>> On 09/09/2016 04:31 PM, Stephen Canon wrote:
>>
>>> Gating this on -Owhatever is dangerous. We should simply default to the pragma “on” state universally.
>>
>> Why so? [honestly asking, not arguing]
>>
>> My guess: b/c we don`t want programs to give different results when compiled at different "-O<...>" settings with the exception of "-Ofast".
>
> Pretty much. In particular, imagine a user trying to debug an unexpected floating point result caused by conversion of a*b + c into fma(a, b, c).

I think that’s unavoidable, because of the way the optimization levels work. Even if fma contraction is on by default (something I’d like to see), at -O0, we wouldn't be doing contraction for:

    auto x = a*b;
    auto y = x+c;

but we would do that at -O2 since we do mem2reg on x.

-Chris
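To make the concern about converting a*b + c into fma(a, b, c) concrete, here is a minimal sketch of how the two forms can disagree. It assumes IEEE-754 binary64 doubles with round-to-nearest; the names `demo_inputs`, `mul_then_add`, and `fused` are hypothetical and exist only for this illustration.

```cpp
#include <cmath>

// Hypothetical helper type for this illustration.
struct Inputs { double a, b, c; };

// Inputs chosen so that the exact product a*b is 1 - 2^-54, which is not
// representable as a double and rounds (ties-to-even) to 1.0.
inline Inputs demo_inputs() {
    const double eps = std::ldexp(1.0, -27);  // 2^-27
    return {1.0 + eps, 1.0 - eps, -1.0};
}

// Multiply, round, then add: two roundings.
// The volatile store forces the product to be rounded to double and keeps
// the compiler from contracting the two operations on its own.
inline double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused multiply-add: the exact a*b + c is rounded once.
inline double fused(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

Under the stated assumptions, `mul_then_add` returns 0.0 for these inputs, because the product has already been rounded to 1.0 before the add, while `fused` returns -2^-54. This is the kind of discrepancy a user would see if one build contracted the expression and another did not.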
Steve Canon via llvm-dev
2016-Sep-10 10:33 UTC
[llvm-dev] defaults for FP contraction [e.g. fused multiply-add]: suggestion and patch to be slightly more aggressive and to make Clang`s optimization settings closer to having the same meaning as when they are given to GCC [at least for "-O3"]
Sent from my iPhone

> On Sep 9, 2016, at 10:40 PM, Chris Lattner <clattner at apple.com> wrote:
>
>> On Sep 9, 2016, at 3:27 PM, Steve Canon via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Sent from my iPhone
>>
>>> On Sep 9, 2016, at 6:21 PM, Abe Skolnik <a.skolnik at samsung.com> wrote:
>>>
>>>> On 09/09/2016 04:31 PM, Stephen Canon wrote:
>>>>
>>>> Gating this on -Owhatever is dangerous. We should simply default to the pragma “on” state universally.
>>>
>>> Why so? [honestly asking, not arguing]
>>>
>>> My guess: b/c we don`t want programs to give different results when compiled at different "-O<...>" settings with the exception of "-Ofast".
>>
>> Pretty much. In particular, imagine a user trying to debug an unexpected floating point result caused by conversion of a*b + c into fma(a, b, c).
>
> I think that’s unavoidable, because of the way the optimization levels work. Even if fma contraction is on by default (something I’d like to see), at -O0, we wouldn't be doing contraction for:
>
>     auto x = a*b;
>     auto y = x+c;
>
> but we would do that at -O2 since we do mem2reg on x.

In C, we don't contract (the equivalent of) this unless we're passed fp-contract=fast. The pragma only licenses contraction within a statement.

IIRC, the situation in C++ is somewhat different, and the standard allows contraction across statement boundaries, though I don't think we take advantage of it at present.

You're definitely correct that there will still be differences; e.g.:

    x = a*b + c;
    y = a*b;

It might be that at some optimization level we prove y is unused / constant / etc. When targeting a machine where fma is costlier than mul, we generate mul+add in one case and fma in the other. These cases are necessarily rarer than if we gate it on optimization level, however. (And we want the perf win for -O0 anyway).

TLDR: yeah, let's do this.
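The distinction drawn above, between contraction inside one statement and contraction across statements, can be sketched as follows. This is only an illustration: the function names are hypothetical, and it assumes a compiler that honours `#pragma STDC FP_CONTRACT` in C++ mode (the guard restricts the pragma to Clang, since other compilers may ignore it or warn). Whether an fma is actually emitted depends on the compiler, flags, and target.

```cpp
#include <cmath>

// Assumption: Clang accepts the standard C pragma in C++ translation units.
// The guard avoids unknown-pragma warnings on other compilers.
#if defined(__clang__)
#pragma STDC FP_CONTRACT ON
#endif

// Multiply and add in a single expression: with the pragma in the "on"
// state the compiler is permitted, but not required, to emit an fma here.
double within_expression(double a, double b, double c) {
    return a * b + c;
}

// The same arithmetic split over two statements: under the "on" state as
// described in the thread, this is not contracted. A more aggressive
// setting such as fp-contract=fast may still fuse it.
double across_statements(double a, double b, double c) {
    double x = a * b;
    double y = x + c;
    return y;
}
```

For inputs where the product is inexact, each function returns either the twice-rounded or the once-rounded result, and the two functions need not agree with each other. Which one you observe is a property of the build, not of the source.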
Chris Lattner via llvm-dev
2016-Sep-11 01:18 UTC
[llvm-dev] defaults for FP contraction [e.g. fused multiply-add]: suggestion and patch to be slightly more aggressive and to make Clang`s optimization settings closer to having the same meaning as when they are given to GCC [at least for "-O3"]
On Sep 10, 2016, at 3:33 AM, Steve Canon <scanon at apple.com> wrote:
>>> Pretty much. In particular, imagine a user trying to debug an unexpected floating point result caused by conversion of a*b + c into fma(a, b, c).
>>
>> I think that’s unavoidable, because of the way the optimization levels work. Even if fma contraction is on by default (something I’d like to see), at -O0, we wouldn't be doing contraction for:
>>
>>     auto x = a*b;
>>     auto y = x+c;
>>
>> but we would do that at -O2 since we do mem2reg on x.
>
> In C, we don't contract (the equivalent of) this unless we're passed fp-contract=fast. The pragma only licenses contraction within a statement.

Ah ok. What’s GCC’s policy on this?

> IIRC, the situation in C++ is somewhat different, and the standard allows contraction across statement boundaries, though I don't think we take advantage of it at present.

Is language standard pedanticism what we want to base our policies on? It’s great to not violate the standard of course, but it would be suboptimal for switching a .c file to .cpp to change its behavior. I’m not sure which way this cuts on this topic though, or if the cost is worth bearing.

> TLDR: yeah, let's do this.

Nice :-)

-Chris