Sebastian Pop via llvm-dev
2016-Oct-12 15:29 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Wed, Oct 12, 2016 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> I don't think that Clang/LLVM uses it by default on x86_64. If you're using -Ofast, however, that would explain it. I recommend looking at -O3 vs -O0 and make sure those are the same. -Ofast enables -ffast-math, which can legitimately cause differences.
>

The following tests pass at "-O3" and "-O3 -ffp-contract=on" compared
with FP_ABSTOLERANCE=1e-5 against "-O0 -ffp-contract=off":

polybench/linear-algebra/kernels/symm
polybench/linear-algebra/solvers/gramschmidt
polybench/stencils/seidel-2d

The output of these 3 tests from "-O0 -ffp-contract=off" also matches
the reference output.

The following 2 tests still require increased FP_ABSTOLERANCE to pass
compare between "-O3", "-O3 -ffp-contract=on" vs. "-O0 -ffp-contract=off"

polybench/medley/reg_detect, FP_ABSTOLERANCE=1e4
polybench/stencils/adi, FP_ABSTOLERANCE=1e4

The reference output of these two is also not matching when compiled at
"-O3" or "-O3 -ffp-contract=on". When configuring the test-suite without
specifying CFLAGS, Polybench is compiled at no optimization level.
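The comparisons above amount to an element-wise absolute-difference check between two output arrays. A minimal sketch in C, assuming a flat array layout: `within_abs_tolerance` is a hypothetical helper written for illustration (it is not the test-suite's check_FP), and the fallback value of FP_ABSTOLERANCE only mirrors the 1e-5 quoted in the message.

```c
#include <stddef.h>

/* Assumption: the build may pass -DFP_ABSTOLERANCE=...; fall back to the
   1e-5 quoted in the thread when it does not. */
#ifndef FP_ABSTOLERANCE
#define FP_ABSTOLERANCE 1e-5
#endif

/* Hypothetical helper (not the test-suite's check_FP): returns 1 when every
   pair of elements differs by at most FP_ABSTOLERANCE, 0 otherwise. */
static int within_abs_tolerance(const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double diff = a[i] - b[i];
        if (diff < 0)
            diff = -diff;
        /* The negated comparison also rejects a NaN difference. */
        if (!(diff <= FP_ABSTOLERANCE))
            return 0;
    }
    return 1;
}
```

Raising the tolerance to 1e4, as needed for reg_detect and adi here, would make such a check accept almost any pair of outputs, so a pass at that setting says little about agreement.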
Sebastian Pop via llvm-dev
2016-Oct-14 13:59 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Wed, Oct 12, 2016 at 11:29 AM, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> The following 2 tests still require increased FP_ABSTOLERANCE to pass
> compare between "-O3", "-O3 -ffp-contract=on" vs. "-O0 -ffp-contract=off"
>
> polybench/medley/reg_detect, FP_ABSTOLERANCE=1e4
> polybench/stencils/adi, FP_ABSTOLERANCE=1e4
>
> The reference output of these two is also not matching when compiled at
> "-O3" or "-O3 -ffp-contract=on". When configuring the test-suite without
> specifying CFLAGS, Polybench is compiled at no optimization level.

For these two, I had an error in my patch when initializing the data
for the array_StrictFP.
With the following fix, they both pass with FP_ABSTOLERANCE=1e-5.

diff --git a/SingleSource/Benchmarks/Polybench/stencils/adi/adi.c b/SingleSource/Benchmarks/Polybench/stencils/adi/adi.c
index d491535..eb00da9 100644
--- a/SingleSource/Benchmarks/Polybench/stencils/adi/adi.c
+++ b/SingleSource/Benchmarks/Polybench/stencils/adi/adi.c
@@ -195,7 +195,7 @@ int main(int argc, char** argv)
   polybench_stop_instruments;
   polybench_print_instruments;
-  init_array (n, POLYBENCH_ARRAY(X), POLYBENCH_ARRAY(A), POLYBENCH_ARRAY(B));
+  init_array (n, POLYBENCH_ARRAY(X_StrictFP), POLYBENCH_ARRAY(A), POLYBENCH_ARRAY(B));
   kernel_adi (tsteps, n, POLYBENCH_ARRAY(X_StrictFP), POLYBENCH_ARRAY(A), POLYBENCH_ARRAY(B));
   if (!check_FP(n, POLYBENCH_ARRAY(X), POLYBENCH_ARRAY(X_StrictFP)))
diff --git a/SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect.c b/SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect.c
index 6f6fbaf..ce7d2c5 100644
--- a/SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect.c
+++ b/SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect.c
@@ -202,7 +202,7 @@ int main(int argc, char** argv)
   init_array (maxgrid,
               POLYBENCH_ARRAY(sum_tang),
               POLYBENCH_ARRAY(mean),
-              POLYBENCH_ARRAY(path));
+              POLYBENCH_ARRAY(path_StrictFP));
   kernel_reg_detect_StrictFP(niter, maxgrid, length,
                              POLYBENCH_ARRAY(sum_tang),
                              POLYBENCH_ARRAY(mean),
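The fix above makes sure that the buffer consumed by the StrictFP kernel is the one that gets initialized. A rough sketch of that pattern in C follows. All names (`init_data`, `scale_kernel`, `max_abs_diff`, `run_comparison`) are hypothetical stand-ins rather than Polybench functions, and a zero-filled array models the buffer that never received its input data.

```c
#include <stddef.h>

#define N 8

/* Hypothetical initializer standing in for init_array. */
static void init_data(double x[N])
{
    for (size_t i = 0; i < N; i++)
        x[i] = (double)(i + 1) / N;
}

/* Hypothetical kernel standing in for the benchmark kernel. */
static void scale_kernel(double x[N])
{
    for (size_t i = 0; i < N; i++)
        x[i] = 3.0 * x[i] + 1.0;
}

/* Largest element-wise absolute difference between two buffers. */
static double max_abs_diff(const double a[N], const double b[N])
{
    double worst = 0.0;
    for (size_t i = 0; i < N; i++) {
        double d = a[i] - b[i];
        if (d < 0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}

/* Runs the kernel on two buffers and compares the results. When
   init_strict is 0 the second buffer keeps its zero fill, which models
   the initialization mistake described in the message. */
static double run_comparison(int init_strict)
{
    double x[N] = {0};
    double x_strict[N] = {0};

    init_data(x);
    if (init_strict)
        init_data(x_strict);
    scale_kernel(x);
    scale_kernel(x_strict);
    return max_abs_diff(x, x_strict);
}
```

With both buffers initialized the difference is zero. Skipping the second initialization produces a difference that has nothing to do with floating-point rounding, which is presumably why only a very large tolerance let these two tests pass before the fix.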
Sebastian Pop via llvm-dev
2016-Oct-14 14:50 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Wed, Oct 12, 2016 at 11:29 AM, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> On Wed, Oct 12, 2016 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>> I don't think that Clang/LLVM uses it by default on x86_64. If you're using -Ofast, however, that would explain it. I recommend looking at -O3 vs -O0 and make sure those are the same. -Ofast enables -ffast-math, which can legitimately cause differences.
>>
>
> The following tests pass at "-O3" and "-O3 -ffp-contract=on" compared
> with FP_ABSTOLERANCE=1e-5 against "-O0 -ffp-contract=off":
>
> polybench/linear-algebra/kernels/symm
> polybench/linear-algebra/solvers/gramschmidt
> polybench/stencils/seidel-2d
>

These 3 tests are passing with the following configurations:
-O3 -ffp-contract=off
-O3 -ffp-contract=on
-O0 -ffp-contract=off
-O0 -ffp-contract=on

They are not passing at:
-Ofast -ffp-contract=on
-Ofast -ffp-contract=off

Using Abe's CMake/Makefile variables to detect the use of -ffast-math,
we could change the FP_ABSTOLERANCE at -Ofast: something like this

if(TEST_SUITE_USES_FAST_MATH)
  add_definitions(-DFP_ABSTOLERANCE=1e0)
else()
  add_definitions(-DFP_ABSTOLERANCE=1e-5)
endif()

The tests are passing at -Ofast with the following tolerances:

polybench/linear-algebra/kernels/symm, FP_ABSTOLERANCE=1e1
polybench/linear-algebra/solvers/gramschmidt, FP_ABSTOLERANCE=1e0
polybench/stencils/seidel-2d, FP_ABSTOLERANCE=1e-5

The 3 tests are currently not passing at -Ofast with these
FP_ABSTOLERANCE because the output of array_StrictFP does not match
the hash.
The cause may be related to a bug in handling -ffast-math and
__attribute__((optnone)):

$ clang -O3 -ffast-math f.c -S -o ofast.s
$ clang -O3 f.c -S -o o3.s
$ diff -u o3.s ofast.s
--- o3.s 2016-10-14 10:39:46.411567948 -0400
+++ ofast.s 2016-10-14 10:39:45.079567919 -0400
@@ -109,16 +109,16 @@
 	addq %rax, %rcx
 	movslq -64(%rsp), %rax
 	mulsd (%rcx,%rax,8), %xmm1
-	addsd %xmm0, %xmm1
-	movsd -24(%rsp), %xmm0 # xmm0 = mem[0],zero
-	mulsd -56(%rsp), %xmm0
-	addsd %xmm1, %xmm0
+	movsd -24(%rsp), %xmm2 # xmm2 = mem[0],zero
+	mulsd -56(%rsp), %xmm2
+	addsd %xmm0, %xmm2
+	addsd %xmm1, %xmm2
 	movq -32(%rsp), %rax
 	movslq -68(%rsp), %rcx
 	shlq $13, %rcx
 	addq %rax, %rcx
 	movslq -64(%rsp), %rax
-	movsd %xmm0, (%rcx,%rax,8)
+	movsd %xmm2, (%rcx,%rax,8)
 # BB#9: # %for.inc50
 # in Loop: Header=BB0_3 Depth=2
 	movl -64(%rsp), %eax

$ cat f.c
__attribute__((optnone))
void kernel_symm_StrictFP(int ni, int nj,
                          double alpha,
                          double beta,
                          double C[1024 + 0][1024 + 0],
                          double A[1024 + 0][1024 + 0],
                          double B[1024 + 0][1024 + 0])
{
#pragma STDC FP_CONTRACT OFF
  int i, j, k;
  double acc;
  for (i = 0; i < ni; i++)
    for (j = 0; j < nj; j++)
      {
        acc = 0;
        for (k = 0; k < j - 1; k++)
          {
            C[k][j] += alpha * A[k][i] * B[i][j];
            acc += B[k][j] * A[k][i];
          }
        C[i][j] = beta * C[i][j] + alpha * A[i][i] * B[i][j] + alpha * acc;
      }
}
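In the assembly diff above, the -O3 code adds %xmm0 and %xmm1 first and then adds the product loaded from the stack, while the -ffast-math code adds that product to %xmm0 first and %xmm1 last. Floating-point addition is not associative, so the two orders can round differently. A minimal C sketch of the effect, using made-up values rather than data from the benchmark (`sum_left` and `sum_right` are hypothetical names):

```c
/* Sums three terms as (a + b) + c. */
static double sum_left(double a, double b, double c)
{
    /* volatile forces the intermediate to be stored as a double, so the
       compiler cannot merge or reorder the two additions. */
    volatile double t = a + b;
    return t + c;
}

/* Sums the same three terms as a + (b + c). */
static double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

With IEEE 754 doubles, sum_left(1e20, -1e20, 1.0) yields 1.0 while sum_right(1e20, -1e20, 1.0) yields 0.0, because 1.0 is far below the rounding granularity of 1e20 and is lost when it is added to -1e20 first.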
Renato Golin via llvm-dev
2016-Oct-14 15:31 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On 14 October 2016 at 14:59, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> For these two, I had an error in my patch when initializing the data
> for the array_StrictFP.
> With the following fix, they both pass with FP_ABSTOLERANCE=1e-5.

That looks *a lot* better! :)

cheers,
--renato
Renato Golin via llvm-dev
2016-Oct-14 15:36 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On 14 October 2016 at 15:50, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> These 3 tests are passing with the following configurations:
> -O3 -ffp-contract=off
> -O3 -ffp-contract=on
> -O0 -ffp-contract=off
> -O0 -ffp-contract=on
>
> They are not passing at:
> -Ofast -ffp-contract=on
> -Ofast -ffp-contract=off

Let's separate completely FP-contract and fast-math. They're different
things and need different solutions.

> if(TEST_SUITE_USES_FAST_MATH)
>   add_definitions(-DFP_ABSTOLERANCE=1e0)
> else()
>   add_definitions(-DFP_ABSTOLERANCE=1e-5)
> endif()

This doesn't make sense. If my program decreased precision by 5 orders
of magnitude with -ffast-math, I'd be *very* worried.

I hope that fast-math in Clang isn't that broken, so that's probably
to do with the assumptions in the output reduction phase.

But, as I said, let's do Ofast *later*. One thing at a time.

cheers,
--renato
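One general point behind the concern about tolerances: an absolute tolerance only says something about precision once the magnitude of the compared values is known. A small C sketch of the distinction, with hypothetical helper names (`abs_error`, `rel_error`) and made-up values; it does not reproduce anything measured in this thread.

```c
/* Absolute error between a computed value and a reference value. */
static double abs_error(double computed, double reference)
{
    double d = computed - reference;
    return d < 0 ? -d : d;
}

/* Relative error. Assumption: the reference value is nonzero. */
static double rel_error(double computed, double reference)
{
    double r = reference < 0 ? -reference : reference;
    return abs_error(computed, reference) / r;
}
```

For example, abs_error(1000000.5, 1000000.0) and abs_error(1.5, 1.0) are both 0.5, yet the relative errors are 5e-7 and 0.5 respectively. Whether the Polybench outputs are large enough for this to matter is not established in the thread.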