Renato Golin via llvm-dev
2016-Oct-12  09:01 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On 12 October 2016 at 05:35, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> polybench/linear-algebra/solvers/gramschmidt/ exposes the same problems as symm.
> It does not match the reference output at -O0 -ffp-contract=off,
> and it only passes all elements comparisons for FP_ABSTOLERANCE=1 for
> "-Ofast" vs. "-O0 -ffp-contract=off".

I think we're going about this in a completely wrong way.

The current reference output is specific to fp-contract=off, and
making it work for fp-contract=on makes no sense at all.

For all we know, fp-contract=on generates *more accurate* results, not
less. But it may also have less predictable results *across* different
targets, hence the need for a tolerance.

FP_TOLERANCE is *not* about making the new results match an old
reference, but about showing the *real* uncertainties of FP
transformation on *different* targets.

So, if you want to fix this test for good, here are the steps you need to take:

1. Check out the test-suite on different platforms: x86_64, ARM,
   AArch64, PPC, MIPS. The more the merrier.
2. Enable fp-contract=on, run the tests on all platforms, record the
   outputs, ignore the differences.
3. Collate each platform's output for each test and see how different they are.

To make it easier to compare, in the past, I've used this trick:

1. Run on one platform, e.g. x86_64, ignoring the reference
2. Copy the output of those tests back to the reference_output
3. Run on a different platform, tweaking the tolerance until it "passes"
4. Run on yet another platform, making sure you don't need to tweak
   the tolerance yet again

If the tolerance is "too high" for that test, we can further discuss
how to change it to make it better. If not, you found a solution.

If you want to make it even better, do some analysis on the
distribution of the results, per test, and pick the average as the
reference output and one or two standard deviations as the tolerance.
This should pass on most architectures.
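The "average as reference, one or two standard deviations as tolerance" idea can be sketched roughly as follows. This is only an illustration, not code from the test-suite: the names `fp_stats`, `fp_stats_of` and `within_k_stddev` are hypothetical, and it assumes you have already collected the value of one output element from each platform into an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of ONE output element as recorded on several
 * platforms (not part of the test-suite). */
struct fp_stats {
    double mean;      /* candidate reference value */
    double variance;  /* sample variance, i.e. stddev squared */
};

/* Compute mean and sample variance of the per-platform values. */
static struct fp_stats fp_stats_of(const double *values, size_t n)
{
    assert(n >= 2);  /* a sample variance needs at least two values */

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += values[i];
    double mean = sum / (double)n;

    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = values[i] - mean;
        sq += d * d;
    }

    struct fp_stats s = { mean, sq / (double)(n - 1) };
    return s;
}

/* Return 1 when x lies within k standard deviations of the mean.
 * Comparing squares avoids calling sqrt(). */
static int within_k_stddev(const struct fp_stats *s, double x, double k)
{
    double d = x - s->mean;
    return d * d <= k * k * s->variance;
}
```

With per-platform values 1.0, 2.0 and 3.0, the mean is 2.0 and the sample variance is 1.0, so a new result of 3.5 is accepted at k = 2 while 4.5 is rejected.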
To simplify the analysis, you can reduce the output into a single
number, say, by adding all the results up. This will generate more
inaccuracies than comparing each value, and if that's too large an
error, then you reduce the number of samples.

For example, on cholesky, we sampled every 16th item of the array:

    for (i = 0; i < n; i++) {
      for (j = 0; j < n; j++)
        print_element(A[i][j], j*16, printmat);
      fputs(printmat, stderr);
    }

using "print_element" because calling printf sucks.

These modifications are ok, because they neither change the tests nor
hide them from compiler changes.

cheers,
--renato
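The reduction described above (collapse the output into one number, and sample fewer elements if the accumulated error is too large) could look something like this. It is a sketch under assumptions: `sampled_sum` is a hypothetical helper, not a test-suite function, and the matrix is assumed to be stored as a flat array of `count` doubles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction: add up every `stride`-th element of a
 * flattened array. stride == 1 sums everything; a larger stride
 * reduces the number of samples and so the accumulated rounding error. */
static double sampled_sum(const double *a, size_t count, size_t stride)
{
    assert(stride > 0);  /* a zero stride would never advance */

    double sum = 0.0;
    for (size_t i = 0; i < count; i += stride)
        sum += a[i];
    return sum;
}
```

Because the additions happen in a fixed order, the single number still changes whenever the compiler changes any sampled element, so the check is not hidden from compiler changes.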
Sebastian Pop via llvm-dev
2016-Oct-12  12:04 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Wed, Oct 12, 2016 at 4:01 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 12 October 2016 at 05:35, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
>> polybench/linear-algebra/solvers/gramschmidt/ exposes the same problems as symm.
>> It does not match the reference output at -O0 -ffp-contract=off,
>> and it only passes all elements comparisons for FP_ABSTOLERANCE=1 for
>> "-Ofast" vs. "-O0 -ffp-contract=off".
>
> I think we're going about this in a completely wrong way.
>
> The current reference output is specific to fp-contract=off, and
> making it work for fp-contract=on makes no sense at all.

Yes. I want to mention that there are two problems: one is with the
FP tolerance, as you describe below.

The other problem is that the reference output does not match
at "-O0 -ffp-contract=off". It might be that the reference output was recorded
at "-O3 -ffp-contract=off". I think that this hides either a compiler
bug or a test bug.

Sebastian

>
> For all we know, fp-contract=on generates *more accurate* results, not
> less. But it may also have less predictable results *across* different
> targets, hence the need for a tolerance.
>
> FP_TOLERANCE is *not* about making the new results match an old
> reference, but about showing the *real* uncertainties of FP
> transformation on *different* targets.
>
> So, if you want to fix this test for good, here are the steps you need to take:
>
> 1. Check out the test-suite on different platforms: x86_64, ARM,
>    AArch64, PPC, MIPS. The more the merrier.
> 2. Enable fp-contract=on, run the tests on all platforms, record the
>    outputs, ignore the differences.
> 3. Collate each platform's output for each test and see how different they are.
>
> To make it easier to compare, in the past, I've used this trick:
>
> 1. Run on one platform, e.g. x86_64, ignoring the reference
> 2. Copy the output of those tests back to the reference_output
> 3. Run on a different platform, tweaking the tolerance until it "passes"
> 4. Run on yet another platform, making sure you don't need to tweak
>    the tolerance yet again
>
> If the tolerance is "too high" for that test, we can further discuss
> how to change it to make it better. If not, you found a solution.
>
> If you want to make it even better, do some analysis on the
> distribution of the results, per test, and pick the average as the
> reference output and one or two standard deviations as the tolerance.
> This should pass on most architectures.
>
> To simplify the analysis, you can reduce the output into a single
> number, say, by adding all the results up. This will generate more
> inaccuracies than comparing each value, and if that's too large an
> error, then you reduce the number of samples.
>
> For example, on cholesky, we sampled every 16th item of the array:
>
>     for (i = 0; i < n; i++) {
>       for (j = 0; j < n; j++)
>         print_element(A[i][j], j*16, printmat);
>       fputs(printmat, stderr);
>     }
>
> using "print_element" because calling printf sucks.
>
> These modifications are ok, because they neither change the tests nor
> hide them from compiler changes.
>
> cheers,
> --renato
Renato Golin via llvm-dev
2016-Oct-12  12:49 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On 12 October 2016 at 13:04, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> The other problem is the reference output does not match
> at "-O0 -ffp-contract=off". It might be that the reference output was recorded
> at "-O3 -ffp-contract=off". I think that this hides either a compiler
> bug or a test bug.

Ah, yes! You mentioned this before and I forgot to reply; you're absolutely right.

If the tolerance is zero, then it's "ok" to "fail" at O0, because
whatever O3 produces is "some" version of the expected value +- some
delta. The error is expecting the tolerance to be zero (or smaller
than delta).

My point, since the beginning, has been to understand what the
expected value is (with its inherent error bars), and make that the
reference output. Only then will the test be meaningful *and* accurate.

But there are so many overloaded terms in this conversation that it's
really hard to get a point across without going to great lengths to
explain each one. :)

cheers,
--renato

PS: the term "accurate" above is meant as "accurately test the
expected error ranges the compiler is allowed to produce", not that
the test will have a lower error bar. It demonstrates the term
overloading quite well. :)
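To make the "expected value +- some delta" point concrete, here is a minimal element-wise comparison against a reference with an absolute tolerance. It is a sketch only: `matches_reference` is a hypothetical name, and it is not a description of how the test-suite's own comparison tool works.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 when every out[i] lies within ref[i] +- delta, else 0.
 * Hypothetical helper for illustration only. */
static int matches_reference(const double *out, const double *ref,
                             size_t n, double delta)
{
    assert(delta >= 0.0);

    for (size_t i = 0; i < n; i++) {
        double diff = out[i] - ref[i];
        if (diff < 0.0)
            diff = -diff;
        if (!(diff <= delta))  /* written this way so NaN is rejected too */
            return 0;
    }
    return 1;
}
```

If the reference was recorded at one optimisation level and a run at another level differs by 0.25 in one element, the comparison fails with delta = 0 and passes with delta = 0.5. Neither run is wrong; the zero tolerance was the unrealistic expectation. (The 0.25 and 0.5 are made-up numbers for the example.)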