thr3ads.net - llvm dev - [llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on" [Oct 2016]

Renato Golin via llvm-dev

2016-Oct-12 09:01 UTC

[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

On 12 October 2016 at 05:35, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> polybench/linear-algebra/solvers/gramschmidt/ exposes the same problems as
> symm.
> It does not match the reference output at -O0 -ffp-contract=off,
> and it only passes all elements comparisons for FP_ABSTOLERANCE=1 for
> "-Ofast" vs. "-O0 -ffp-contract=off".

I think we're going about this in a completely wrong way.

The current reference output is specific to fp-contract=off, and
making it work for fp-contract=on makes no sense at all.

For all we know, fp-contract=on generates *more accurate* results, not
less. But it may also have less predictable results *across* different
targets, thus the need for a tolerance.

FP_TOLERANCE is *not* about making the new results match an old
reference, but about showing the *real* uncertainties of FP
transformation on *different* targets.

So, if you want to fix this test for good, here are the steps you need to take:

1. Checkout the test-suite on different platforms, x86_64, ARM,
AArch64, PPC, MIPS. The more the merrier.
2. Enable fp-contract=on, run the tests on all platforms, record the
outputs, ignore the differences.
3. Collate each platform's output for each test and see how different they
are

To make it easier to compare, in the past, I've used this trick:

1. Run on one platform, e.g. x86_64, ignoring the reference
2. Copy the output of those tests back to the reference_output
3. Run on a different platform, tweaking the tolerance until it
"passes"
4. Run on yet another platform, making sure you don't need to tweak
the tolerance yet again

If the tolerance is "too high" for that test, we can further discuss
how to change it to make it better. If not, you found a solution.
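
A minimal sketch of the "tweak the tolerance until it passes" step, assuming the outputs of two platforms have already been parsed into arrays of doubles. The helper names (min_abs_tolerance, matches_within) are made up for illustration and are not part of the test-suite; NaN handling is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute difference, written out by hand so the sketch needs no libm. */
static double abs_diff(double a, double b) {
    return a > b ? a - b : b - a;
}

/* Hypothetical helper: the smallest absolute tolerance under which every
   element of `out` matches `ref`, i.e. the largest element-wise difference. */
double min_abs_tolerance(const double *ref, const double *out, size_t n) {
    assert(ref != NULL && out != NULL);
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = abs_diff(ref[i], out[i]);
        if (d > worst)
            worst = d;
    }
    return worst;
}

/* Returns 1 if all elements agree within abs_tol, 0 otherwise. */
int matches_within(const double *ref, const double *out, size_t n,
                   double abs_tol) {
    return min_abs_tolerance(ref, out, n) <= abs_tol;
}
```

Running min_abs_tolerance against the second platform gives the tolerance to try; matches_within on a third platform then tells you whether it needs tweaking yet again.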

If you want to make it even better, do some analysis on the
distribution of the results, per test, and pick the average as the
reference output and one or two standard deviations as the tolerance.
This should pass on most architectures.
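
As a rough sketch of that analysis, assuming you have collected the same output element from n platforms into one array: the function names are hypothetical, and the population variance is used here only as one possible choice. The k-sigma check compares squares, so no sqrt is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Mean and population variance of one output element as observed on
   n different platforms. */
void mean_and_variance(const double *samples, size_t n,
                       double *mean, double *variance) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    *mean = sum / (double)n;

    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = samples[i] - *mean;
        sq += d * d;
    }
    *variance = sq / (double)n;
}

/* Returns 1 if x lies within k standard deviations of the mean.
   (x - mean)^2 <= k^2 * variance is the same test with both sides squared. */
int within_k_sigma(double x, double mean, double variance, double k) {
    double d = x - mean;
    return d * d <= k * k * variance;
}
```

The mean would become the reference value for that element, and k = 1 or k = 2 gives the "one or two standard deviations" tolerance.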

To simplify the analysis, you can reduce the output into a single
number, say, adding all the results up. This will generate more
inaccuracies than comparing each value, and if that's too large an
error, then you reduce the number of samples.
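
One way to picture the reduction, as a sketch only: sum every stride-th element of the flattened output, where a larger stride means fewer samples. The name strided_sum and the flattened-array layout are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction: add up every `stride`-th element of a flattened
   output array. stride == 1 sums everything; a larger stride takes fewer
   samples, so fewer rounding errors accumulate in the total. */
double strided_sum(const double *data, size_t n, size_t stride) {
    assert(stride > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i += stride)
        sum += data[i];
    return sum;
}
```

The resulting single number is then compared against the reference with a tolerance, instead of comparing each element.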

For example, on cholesky, we sampled every 16th item of the array:

  for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++)
      print_element(A[i][j], j*16, printmat);
    fputs(printmat, stderr);
  }

using "print_element" because calling printf sucks.

These modifications are ok, because they neither change the tests nor
hide them from compiler changes.

cheers,
--renato

Sebastian Pop via llvm-dev

2016-Oct-12 12:04 UTC

On Wed, Oct 12, 2016 at 4:01 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 12 October 2016 at 05:35, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
>> polybench/linear-algebra/solvers/gramschmidt/ exposes the same problems
>> as symm.
>> It does not match the reference output at -O0 -ffp-contract=off,
>> and it only passes all elements comparisons for FP_ABSTOLERANCE=1 for
>> "-Ofast" vs. "-O0 -ffp-contract=off".
>
> I think we're going about this in a completely wrong way.
>
> The current reference output is specific to fp-contract=off, and
> making it work for fp-contract=on makes no sense at all.

Yes.

I want to mention that there are two problems: one is with the FP tolerance,
as you describe below. The other problem is that the reference output does
not match at "-O0 -ffp-contract=off". It might be that the reference output
was recorded at "-O3 -ffp-contract=off". I think that this hides either a
compiler bug or a test bug.

Sebastian
>
> For all we know, fp-contract=on generates *more accurate* results, not
> less. But it may also have less predictable results *across* different
> targets, thus the need for a tolerance.
>
> FP_TOLERANCE is *not* about making the new results match an old
> reference, but about showing the *real* uncertainties of FP
> transformation on *different* targets.
>
> So, if you want to fix this test for good, here are the steps you need to
> take:
>
> 1. Checkout the test-suite on different platforms, x86_64, ARM,
> AArch64, PPC, MIPS. The more the merrier.
> 2. Enable fp-contract=on, run the tests on all platforms, record the
> outputs, ignore the differences.
> 3. Collate each platform's output for each test and see how different
> they are
>
> To make it easier to compare, in the past, I've used this trick:
>
> 1. Run on one platform, e.g. x86_64, ignoring the reference
> 2. Copy the output of those tests back to the reference_output
> 3. Run on a different platform, tweaking the tolerance until it
> "passes"
> 4. Run on yet another platform, making sure you don't need to tweak
> the tolerance yet again
>
> If the tolerance is "too high" for that test, we can further discuss
> how to change it to make it better. If not, you found a solution.
>
> If you want to make it even better, do some analysis on the
> distribution of the results, per test, and pick the average as the
> reference output and one or two standard deviations as the tolerance.
> This should pass on most architectures.
>
> To simplify the analysis, you can reduce the output into a single
> number, say, adding all the results up. This will generate more
> inaccuracies than comparing each value, and if that's too large an
> error, then you reduce the number of samples.
>
> For example, on cholesky, we sampled every 16th item of the array:
>
>   for (i = 0; i < n; i++) {
>     for (j = 0; j < n; j++)
>       print_element(A[i][j], j*16, printmat);
>     fputs(printmat, stderr);
>   }
>
> using "print_element" because calling printf sucks.
>
> These modifications are ok, because they neither change the tests nor
> hide them from compiler changes.
>
> cheers,
> --renato

Renato Golin via llvm-dev

2016-Oct-12 12:49 UTC

On 12 October 2016 at 13:04, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> The other problem is the reference output does not match
> at "-O0 -ffp-contract=off". It might be that the reference output
> was recorded at "-O3 -ffp-contract=off". I think that this hides
> either a compiler bug or a test bug.

Ah, yes! You mentioned before and I forgot to reply, you're absolutely
right.

If the tolerance is zero, then it's "ok" to "fail" at O0, because
whatever O3 produces is "some" version of the expected value +- some
delta. The error is expecting the tolerance to be zero (or smaller
than delta).

My point, since the beginning, has been to understand what the
expected value is (with its inherent error bars), and make that the
reference output. Only then will the test be meaningful *and*
accurate.

But there are so many overloaded terms in this conversation that it's
really hard to get a point across without going to great lengths to
explain each one. :)

cheers,
--renato

PS: the term "accurate" above means "accurately test the
expected error ranges the compiler is allowed to produce", not that
the test will have a lower error bar. It demonstrates the term
overloading quite well. :)
