thr3ads.net - llvm dev - [llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on" [Oct 2016]


Sebastian Pop via llvm-dev

2016-Oct-12 15:29 UTC

[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

On Wed, Oct 12, 2016 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> I don't think that Clang/LLVM uses it by default on x86_64. If you're
> using -Ofast, however, that would explain it. I recommend looking at
> -O3 vs -O0 and make sure those are the same. -Ofast enables -ffast-math,
> which can legitimately cause differences.

The following tests pass at "-O3" and "-O3 -ffp-contract=on" compared
with FP_ABSTOLERANCE=1e-5 against "-O0 -ffp-contract=off":

polybench/linear-algebra/kernels/symm
polybench/linear-algebra/solvers/gramschmidt
polybench/stencils/seidel-2d

The output of these 3 tests from "-O0 -ffp-contract=off" also matches
the reference output.

The following 2 tests still require an increased FP_ABSTOLERANCE for the
comparison of "-O3" or "-O3 -ffp-contract=on" against
"-O0 -ffp-contract=off" to pass:

polybench/medley/reg_detect, FP_ABSTOLERANCE=1e4
polybench/stencils/adi, FP_ABSTOLERANCE=1e4

The output of these two also does not match the reference output when
compiled at "-O3" or "-O3 -ffp-contract=on". When the test-suite is
configured without specifying CFLAGS, Polybench is compiled at no
optimization level.
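
To make the role of FP_ABSTOLERANCE concrete, an absolute-tolerance comparison can be sketched roughly as below. This is only an illustration under assumptions: the function name within_abs_tolerance and the flat-array signature are hypothetical and are not the test-suite's actual checking code, and the 1e-5 default simply mirrors the value quoted above.

```c
#include <math.h>
#include <stddef.h>

/* Assumed default; in the thread the value is passed as -DFP_ABSTOLERANCE=... */
#ifndef FP_ABSTOLERANCE
#define FP_ABSTOLERANCE 1e-5
#endif

/* Hypothetical helper: returns 1 when every element of `got` lies within
   FP_ABSTOLERANCE of the matching element of `ref`, and 0 otherwise. */
static int within_abs_tolerance(const double *ref, const double *got, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      double diff = fabs(ref[i] - got[i]);
      /* Written as !(diff <= tol) so that a NaN difference is rejected too. */
      if (!(diff <= FP_ABSTOLERANCE))
        return 0;
    }
  return 1;
}
```

With a check of this shape, a tolerance of 1e4 accepts almost any output of moderate magnitude, which is why needing it is a sign that something other than rounding is wrong.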

Sebastian Pop via llvm-dev

2016-Oct-14 13:59 UTC


On Wed, Oct 12, 2016 at 11:29 AM, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> The following 2 tests still require an increased FP_ABSTOLERANCE for the
> comparison of "-O3" or "-O3 -ffp-contract=on" against
> "-O0 -ffp-contract=off" to pass:
>
> polybench/medley/reg_detect, FP_ABSTOLERANCE=1e4
> polybench/stencils/adi, FP_ABSTOLERANCE=1e4
>
> The output of these two also does not match the reference output when
> compiled at "-O3" or "-O3 -ffp-contract=on". When the test-suite is
> configured without specifying CFLAGS, Polybench is compiled at no
> optimization level.

For these two, I had an error in my patch when initializing the data
for the array_StrictFP.
With the following fix, they both pass with FP_ABSTOLERANCE=1e-5.

diff --git a/SingleSource/Benchmarks/Polybench/stencils/adi/adi.c b/SingleSource/Benchmarks/Polybench/stencils/adi/adi.c
index d491535..eb00da9 100644
--- a/SingleSource/Benchmarks/Polybench/stencils/adi/adi.c
+++ b/SingleSource/Benchmarks/Polybench/stencils/adi/adi.c
@@ -195,7 +195,7 @@ int main(int argc, char** argv)
   polybench_stop_instruments;
   polybench_print_instruments;

-  init_array (n, POLYBENCH_ARRAY(X), POLYBENCH_ARRAY(A), POLYBENCH_ARRAY(B));
+  init_array (n, POLYBENCH_ARRAY(X_StrictFP), POLYBENCH_ARRAY(A), POLYBENCH_ARRAY(B));
   kernel_adi (tsteps, n, POLYBENCH_ARRAY(X_StrictFP),
              POLYBENCH_ARRAY(A), POLYBENCH_ARRAY(B));
   if (!check_FP(n, POLYBENCH_ARRAY(X), POLYBENCH_ARRAY(X_StrictFP)))
diff --git a/SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect.c b/SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect.c
index 6f6fbaf..ce7d2c5 100644
--- a/SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect.c
+++ b/SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect.c
@@ -202,7 +202,7 @@ int main(int argc, char** argv)
   init_array (maxgrid,
              POLYBENCH_ARRAY(sum_tang),
              POLYBENCH_ARRAY(mean),
-             POLYBENCH_ARRAY(path));
+             POLYBENCH_ARRAY(path_StrictFP));
   kernel_reg_detect_StrictFP(niter, maxgrid, length,
                              POLYBENCH_ARRAY(sum_tang),
                              POLYBENCH_ARRAY(mean),

Sebastian Pop via llvm-dev

2016-Oct-14 14:50 UTC


On Wed, Oct 12, 2016 at 11:29 AM, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> On Wed, Oct 12, 2016 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>> I don't think that Clang/LLVM uses it by default on x86_64. If you're
>> using -Ofast, however, that would explain it. I recommend looking at
>> -O3 vs -O0 and make sure those are the same. -Ofast enables -ffast-math,
>> which can legitimately cause differences.
>>
>
> The following tests pass at "-O3" and "-O3 -ffp-contract=on" compared
> with FP_ABSTOLERANCE=1e-5 against "-O0 -ffp-contract=off":
>
> polybench/linear-algebra/kernels/symm
> polybench/linear-algebra/solvers/gramschmidt
> polybench/stencils/seidel-2d
>
These 3 tests are passing with the following configurations:
-O3 -ffp-contract=off
-O3 -ffp-contract=on
-O0 -ffp-contract=off
-O0 -ffp-contract=on

They are not passing at:
-Ofast -ffp-contract=on
-Ofast -ffp-contract=off

Using Abe's CMake/Makefile variables to detect the use of -ffast-math,
we could change the FP_ABSTOLERANCE at -Ofast, with something like this:

if(TEST_SUITE_USES_FAST_MATH)
  add_definitions(-DFP_ABSTOLERANCE=1e0)
else()
  add_definitions(-DFP_ABSTOLERANCE=1e-5)
endif()
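
The same selection could in principle be sketched on the C side as well. GCC and Clang predefine the macro __FAST_MATH__ when -ffast-math is in effect, so a default could be chosen without a build-system variable. This is an assumption-laden illustration rather than what the test-suite does: the helper fp_abstolerance is hypothetical, and the two values are just the ones from the CMake fragment above.

```c
/* Sketch only: choose a default tolerance from the compiler's own
   fast-math indicator when the build system has not set one. */
#ifndef FP_ABSTOLERANCE
#  ifdef __FAST_MATH__
#    define FP_ABSTOLERANCE 1e0   /* looser bound under -ffast-math / -Ofast */
#  else
#    define FP_ABSTOLERANCE 1e-5  /* strict bound otherwise */
#  endif
#endif

/* Hypothetical helper so the selected value can be inspected at run time. */
static double fp_abstolerance(void)
{
  return FP_ABSTOLERANCE;
}
```

A value passed explicitly with -DFP_ABSTOLERANCE=... still takes precedence, because the fallback is wrapped in #ifndef.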

The tests are passing at -Ofast with the following tolerances:

polybench/linear-algebra/kernels/symm, FP_ABSTOLERANCE=1e1
polybench/linear-algebra/solvers/gramschmidt, FP_ABSTOLERANCE=1e0
polybench/stencils/seidel-2d, FP_ABSTOLERANCE=1e-5

Even with these FP_ABSTOLERANCE values, the 3 tests currently do not pass
at -Ofast, because the output of array_StrictFP does not match the hash.
The cause may be related to a bug in handling -ffast-math together with
__attribute__((optnone)):
$ clang -O3 -ffast-math f.c -S -o ofast.s
$ clang -O3 f.c -S -o o3.s
$ diff -u o3.s ofast.s
--- o3.s        2016-10-14 10:39:46.411567948 -0400
+++ ofast.s     2016-10-14 10:39:45.079567919 -0400
@@ -109,16 +109,16 @@
        addq    %rax, %rcx
        movslq  -64(%rsp), %rax
        mulsd   (%rcx,%rax,8), %xmm1
-       addsd   %xmm0, %xmm1
-       movsd   -24(%rsp), %xmm0        # xmm0 = mem[0],zero
-       mulsd   -56(%rsp), %xmm0
-       addsd   %xmm1, %xmm0
+       movsd   -24(%rsp), %xmm2        # xmm2 = mem[0],zero
+       mulsd   -56(%rsp), %xmm2
+       addsd   %xmm0, %xmm2
+       addsd   %xmm1, %xmm2
        movq    -32(%rsp), %rax
        movslq  -68(%rsp), %rcx
        shlq    $13, %rcx
        addq    %rax, %rcx
        movslq  -64(%rsp), %rax
-       movsd   %xmm0, (%rcx,%rax,8)
+       movsd   %xmm2, (%rcx,%rax,8)
 # BB#9:                                 # %for.inc50
                                         #   in Loop: Header=BB0_3 Depth=2
        movl    -64(%rsp), %eax

$ cat f.c
__attribute__((optnone))
void kernel_symm_StrictFP(int ni, int nj,
                          double alpha,
                          double beta,
                          double C[1024 + 0][1024 + 0],
                          double A[1024 + 0][1024 + 0],
                          double B[1024 + 0][1024 + 0])
{
#pragma STDC FP_CONTRACT OFF
  int i, j, k;
  double acc;
  for (i = 0; i < ni; i++)
    for (j = 0; j < nj; j++)
      {
        acc = 0;
        for (k = 0; k < j - 1; k++)
          {
            C[k][j] += alpha * A[k][i] * B[i][j];
            acc += B[k][j] * A[k][i];
          }
        C[i][j] = beta * C[i][j] + alpha * A[i][i] * B[i][j] + alpha * acc;
      }
}
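
The assembly diff above appears to show the three-term sum in the final statement being added in a different order under -ffast-math: (t0 + t1) + t2 at -O3 versus (t2 + t0) + t1 with fast-math. A minimal sketch of why such a reordering can change the result follows. The function names and the input values mentioned afterwards are illustrative assumptions, not taken from the test-suite, and the sketch itself must be compiled without -ffast-math so that the parentheses are honored.

```c
/* Three-term sum evaluated in source order: (t0 + t1) + t2. */
static double sum_source_order(double t0, double t1, double t2)
{
  return (t0 + t1) + t2;
}

/* The same three terms added in the order the -ffast-math assembly
   appears to use: (t2 + t0) + t1. */
static double sum_reassociated(double t0, double t1, double t2)
{
  return (t2 + t0) + t1;
}
```

With IEEE 754 doubles and the illustrative inputs t0 = 1e17, t1 = -1e17, t2 = 1.0, the first function returns 1.0 while the second returns 0.0, because 1.0 is absorbed when it is added to 1e17 first. The inputs are deliberately extreme; the point is only that reassociation alone is enough to make a bitwise or hash-based comparison of the StrictFP output fail.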

Renato Golin via llvm-dev

2016-Oct-14 15:31 UTC


On 14 October 2016 at 14:59, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> For these two, I had an error in my patch when initializing the data
> for the array_StrictFP.
> With the following fix, they both pass with FP_ABSTOLERANCE=1e-5.

That looks *a lot* better! :)

cheers,
--renato

Renato Golin via llvm-dev

2016-Oct-14 15:36 UTC


On 14 October 2016 at 15:50, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> These 3 tests are passing with the following configurations:
> -O3 -ffp-contract=off
> -O3 -ffp-contract=on
> -O0 -ffp-contract=off
> -O0 -ffp-contract=on
>
> They are not passing at:
> -Ofast -ffp-contract=on
> -Ofast -ffp-contract=off

Let's separate completely FP-contract and fast-math. They're different
things and need different solutions.

> if(TEST_SUITE_USES_FAST_MATH)
>   add_definitions(-DFP_ABSTOLERANCE=1e0)
> else()
>   add_definitions(-DFP_ABSTOLERANCE=1e-5)
> endif()

This doesn't make sense. If my program decreased precision by 5 orders
of magnitude with -ffast-math, I'd be *very* worried.

I hope that fast-math in Clang isn't that broken, so that's probably
to do with the assumptions in the output reduction phase.

But, as I said, let's do Ofast *later*. One thing at a time.

cheers,
--renato
