Sebastian Pop via llvm-dev
2016-Oct-12 14:43 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Wed, Oct 12, 2016 at 10:28 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> From: "Renato Golin" <renato.golin at linaro.org>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias Braun" <matze at braunis.de>, "Clang Dev" <cfe-dev at lists.llvm.org>, "nd" <nd at arm.com>, "Abe Skolnik" <a.skolnik at samsung.com>, "Sebastian Pop" <sebpop.llvm at gmail.com>
>> Sent: Wednesday, October 12, 2016 9:16:39 AM
>> Subject: Re: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
>>
>> On 12 October 2016 at 15:05, Hal Finkel <hfinkel at anl.gov> wrote:
>> > This is something we need to understand. No, there's not always an
>> > error bar. With FMA formation and without non-IEEE-compliant
>> > optimizations (i.e. fast-math), the optimized answer should be
>> > identical to the non-optimized answer.
>>
>> What about architectures that this is never respected, like Darwin?
>>
>> In the general case, indeed, optimisation levels should not change the
>> IEEE representation and the tests should be deterministic.
>>
>> But we can't guarantee this will always be the case.
>>
>> > We still do see cross-system discrepancies sometimes because of
>> > differences in denormal handling, but on the same system that
>> > should be consistent (aside, perhaps, from compiler-level
>> > constant-folding issues).
>>
>> But the test-suite doesn't run on a single system, nor it has one
>> reference_output for each system.
>
> I agree and understand, and we may need a tolerance in practice to deal with differences from denormal handling, etc. However, if Sebastian is seeing differences on the same system, we should understand why. Is he running on an ARM Darwin system, or an x86 using fp80 arithmetic,

My dev machine is an x86_64-linux. This is where I ran all my reported results.
How do I determine whether I am using fp80 arithmetic?

Thanks, Sebastian
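
As background for the FMA formation discussed in the quoted text, the following is a minimal sketch of why a fused multiply-add can give a different bit pattern than a separate multiply and add. It is not taken from the thread or from Polybench: the function names and input values are hypothetical, and the fused operation is only emulated with exact rational arithmetic rather than executed on hardware.

```python
from fractions import Fraction


def mul_add_unfused(a: float, b: float, c: float) -> float:
    """Round a*b to double, then round the sum again (two roundings)."""
    return a * b + c


def mul_add_fused(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b+c exactly, round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # conversion to float rounds to nearest


# Hypothetical inputs chosen so that the exact product, 1 - 2**-54,
# is not representable as a double and rounds to 1.0.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

print("unfused:", mul_add_unfused(a, b, c))
print("fused:  ", mul_add_fused(a, b, c))
```

With these inputs the unfused form loses the low-order term when the product is rounded, while the single rounding of the fused form keeps it.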
Hal Finkel via llvm-dev
2016-Oct-12 14:53 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
----- Original Message -----
> From: "Sebastian Pop" <sebpop.llvm at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Renato Golin" <renato.golin at linaro.org>, "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias Braun" <matze at braunis.de>, "Clang Dev" <cfe-dev at lists.llvm.org>, "nd" <nd at arm.com>, "Abe Skolnik" <a.skolnik at samsung.com>
> Sent: Wednesday, October 12, 2016 9:43:37 AM
> Subject: Re: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
>
> On Wed, Oct 12, 2016 at 10:28 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> > [...]
> > I agree and understand, and we may need a tolerance in practice to
> > deal with differences from denormal handling, etc. However, if
> > Sebastian is seeing differences on the same system, we should
> > understand why. Is he running on an ARM Darwin system, or an x86
> > using fp80 arithmetic,
>
> My dev machine is an x86_64-linux. This is where I ran all my
> reported results.
> How do I determine whether I am using fp80 arithmetic?

I don't think that Clang/LLVM uses it by default on x86_64. If you're using -Ofast, however, that would explain it. I recommend looking at -O3 vs -O0 and make sure those are the same. -Ofast enables -ffast-math, which can legitimately cause differences.

-Hal

> Thanks,
> Sebastian

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
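
To illustrate how fast-math style optimizations can legitimately change results, here is a small sketch of one such effect, reassociation of floating-point additions. The values and the helper name are hypothetical and are not drawn from the Polybench kernels; the second call simply mimics a reordering that a compiler would be allowed to make under -ffast-math.

```python
def sum_left_to_right(values):
    """Add the values strictly in the given order, as IEEE semantics require."""
    total = 0.0
    for v in values:
        total += v
    return total


# Source order: ((1e16 + 1.0) - 1e16) + 1.0
# The first 1.0 is absorbed because doubles near 1e16 are spaced 2 apart.
in_order = sum_left_to_right([1e16, 1.0, -1e16, 1.0])

# A reassociated order: ((1e16 - 1e16) + 1.0) + 1.0
# The large terms cancel first, so both small terms survive.
reassociated = sum_left_to_right([1e16, -1e16, 1.0, 1.0])

print("in order:    ", in_order)
print("reassociated:", reassociated)
```

Both orders are mathematically equal, yet they round differently, which is why a comparison against output built without fast-math may need a tolerance.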
Sebastian Pop via llvm-dev
2016-Oct-12 15:29 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Wed, Oct 12, 2016 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> I don't think that Clang/LLVM uses it by default on x86_64. If you're using -Ofast, however, that would explain it. I recommend looking at -O3 vs -O0 and make sure those are the same. -Ofast enables -ffast-math, which can legitimately cause differences.

The following tests pass at "-O3" and at "-O3 -ffp-contract=on" when compared with FP_ABSTOLERANCE=1e-5 against "-O0 -ffp-contract=off":

polybench/linear-algebra/kernels/symm
polybench/linear-algebra/solvers/gramschmidt
polybench/stencils/seidel-2d

The output of these 3 tests from "-O0 -ffp-contract=off" also matches the reference output.

The following 2 tests still require an increased FP_ABSTOLERANCE for the comparison of "-O3" and "-O3 -ffp-contract=on" against "-O0 -ffp-contract=off" to pass:

polybench/medley/reg_detect, FP_ABSTOLERANCE=1e4
polybench/stencils/adi, FP_ABSTOLERANCE=1e4

The output of these two tests also does not match the reference output when compiled at "-O3" or "-O3 -ffp-contract=on".

When the test-suite is configured without specifying CFLAGS, Polybench is compiled without any optimization level.
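
For readers unfamiliar with what an absolute tolerance such as FP_ABSTOLERANCE=1e-5 means in a comparison, the following is a minimal sketch of the idea. It is an assumption-laden illustration, not the test-suite's actual comparison tool: the function name, the list-based interface, and the sample values are all hypothetical.

```python
def within_abs_tolerance(reference, candidate, abs_tol):
    """Return True if every value pair differs by at most abs_tol."""
    if len(reference) != len(candidate):
        return False
    return all(abs(r - c) <= abs_tol for r, c in zip(reference, candidate))


# Hypothetical outputs of the same kernel built with two flag sets.
reference = [0.5, 1.25, 2.0]
candidate = [0.500004, 1.25, 1.999998]

print(within_abs_tolerance(reference, candidate, 1e-5))  # looser tolerance
print(within_abs_tolerance(reference, candidate, 1e-6))  # tighter tolerance
```

An absolute tolerance ignores the magnitude of the values being compared, so a bound that is tight for values near 1 can be meaningless for very large or very small outputs.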
Matthias Braun via llvm-dev
2016-Oct-12 18:36 UTC
[llvm-dev] [cfe-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
> On Oct 12, 2016, at 7:53 AM, Hal Finkel via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> ----- Original Message -----
>> From: "Sebastian Pop" <sebpop.llvm at gmail.com>
>> Sent: Wednesday, October 12, 2016 9:43:37 AM
>> Subject: Re: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
>>
>> [...]
>> My dev machine is an x86_64-linux. This is where I ran all my
>> reported results.
>> How do I determine whether I am using fp80 arithmetic?
>
> I don't think that Clang/LLVM uses it by default on x86_64. If you're using -Ofast, however, that would explain it. I recommend looking at -O3 vs -O0 and make sure those are the same. -Ofast enables -ffast-math, which can legitimately cause differences.

On x86_64 we generally use the SSE units as much as possible because they are faster. The only exception to the rule is long double, which uses x87/fp80. (32-bit is a different story and generally uses more x87/fp80 because of ABI constraints.)

- Matthias