Renato Golin via llvm-dev
2016-Oct-12 12:49 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On 12 October 2016 at 13:04, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> The other problem is the reference output does not match
> at "-O0 -ffp-contract=off". It might be that the reference output was recorded
> at "-O3 -ffp-contract=off". I think that this hides either a compiler
> bug or a test bug.

Ah, yes! You mentioned this before and I forgot to reply; you're absolutely right.

If the tolerance is zero, then it's "ok" to "fail" at O0, because whatever O3 produces is "some" version of the expected value +- some delta. The error is in expecting the tolerance to be zero (or smaller than delta).

My point, since the beginning, has been to understand what the expected value is (with its inherent error bars), and make that the reference output. Only then will the test be meaningful *and* accurate.

But there are so many overloaded terms in this conversation that it's really hard to get a point across without going to great lengths to explain each one. :)

cheers,
--renato

PS: the term "accurate" above means "accurately test the expected error ranges the compiler is allowed to produce", not that the test will have a lower error bar. It demonstrates the term overloading quite well. :)
Sebastian Pop via llvm-dev
2016-Oct-12 13:26 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Wed, Oct 12, 2016 at 8:49 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 12 October 2016 at 13:04, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
>> The other problem is the reference output does not match
>> at "-O0 -ffp-contract=off". It might be that the reference output was recorded
>> at "-O3 -ffp-contract=off". I think that this hides either a compiler
>> bug or a test bug.
>
> Ah, yes! You mentioned before and I forgot to reply, you're absolutely right.
>
> If the tolerance is zero, then it's "ok" to "fail" at O0, because
> whatever O3 produces is "some" version of the expected value +- some
> delta. The error is expecting the tolerance to be zero (or smaller
> than delta).
>
> My point, since the beginning, has been to understand what the
> expected value (with its inherent error bars), and make that the
> reference output. Only then the test will be meaningful *and*
> accurate.

Correct me if I misunderstood: you would be ok with changing the reference output to exactly match the output of "-O0 -ffp-contract=off".

I am asking this for practical reasons: clang currently only supports __attribute__((optnone)) to compile a function at -O0; attributes for the other optimization levels are not yet supported. In the updated patch for Proposal 2:

https://reviews.llvm.org/D25346

we use that attribute together with #pragma STDC FP_CONTRACT OFF to compile the kernel_StrictFP() function at "-O0 -ffp-contract=off". The output of kernel_StrictFP is then used in exact matching against the reference output.

In polybench there are 5 benchmarks that need adjustment of the reference output to match the output of optnone:

polybench/linear-algebra/kernels/symm
polybench/linear-algebra/solvers/gramschmidt
polybench/medley/reg_detect
polybench/stencils/adi
polybench/stencils/seidel-2d

Thanks,
Sebastian
Renato Golin via llvm-dev
2016-Oct-12 13:35 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On 12 October 2016 at 14:26, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> Correct me if I misunderstood: you would be ok changing the
> reference output to exactly match the output of "-O0 -ffp-contract=off".

No, that's not at all what I said.

Matching identical outputs in FP tests makes no sense because there's *always* an error bar. The output of O0, O1, O2, O3, Ofast, Os, Oz should all be within the boundaries of an average and its associated error bar.

By understanding what the *expected* output is and its associated error range, we can accurately predict what the correct reference_output and the tolerance should be for each individual test.

Your solution 2 "works" because you're doing the matching yourself, in the code, and for that you pay the penalty of running it twice. But it's not easy to control the tolerance, nor is it stable for all platforms where we don't yet run the test-suite.

My original proposal, and what I'm still proposing here, is to understand the tests and make them right, by giving them proper references and tolerances. If the output is too large, reduce/sample it in a way that doesn't increase the error ranges too much, enough to keep the tolerance low, so we can still catch bugs in the FP transformations.

cheers,
--renato