Sebastian Pop via llvm-dev
2016-Oct-10 14:10 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
Hi,

I would need some help to fix polybench/symm:

void kernel_symm(int ni, int nj,
                 DATA_TYPE alpha,
                 DATA_TYPE beta,
                 DATA_TYPE POLYBENCH_2D(C,NI,NJ,ni,nj),
                 DATA_TYPE POLYBENCH_2D(A,NJ,NJ,nj,nj),
                 DATA_TYPE POLYBENCH_2D(B,NI,NJ,ni,nj))
{
  int i, j, k;
  DATA_TYPE acc;

  /* C := alpha*A*B + beta*C, A is symetric */
  for (i = 0; i < _PB_NI; i++)
    for (j = 0; j < _PB_NJ; j++)
      {
        acc = 0;
        for (k = 0; k < j - 1; k++)
          {
            C[k][j] += alpha * A[k][i] * B[i][j];
            acc += B[k][j] * A[k][i];
          }
        C[i][j] = beta * C[i][j] + alpha * A[i][i] * B[i][j] + alpha * acc;
      }
}

Compiling this kernel with __attribute__((optnone)) and outputting the
contents of the C[][] array does not match the reference output.
Furthermore, compiling this kernel at -Ofast and comparing against -O0
only passes for FP_ABSTOLERANCE=10.
All 10 other polybench tests that I have transformed to check FP
are passing at FP_ABSTOLERANCE=1e-5 (and most likely they could pass
at an even lower tolerance).

The symm benchmark seems to accumulate all the errors, as it is a big
reduction from the first elements of the C[][] array into the last
elements.
I'm not sure we can rely on this benchmark to check FP correctness.

One option is to completely specify which optimization flags were
used to compute the reference output, and to compile this benchmark
only with those flags.

Please share your ideas on how to deal with this particular test.

Thanks,
Sebastian
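
The sensitivity of a long reduction to evaluation order can be illustrated with a minimal C sketch. It is not part of the test-suite; the function names and the input values are hypothetical, chosen only to show that summing the same numbers in a different order, as reassociation under -Ofast may do, can change the result.

```c
#include <assert.h>
#include <stddef.h>

/* Sum left to right, in the order the source code spells out.
   The volatile accumulator forces every partial sum to be rounded
   to double and keeps the compiler from reordering the additions. */
double sum_forward(const double *x, size_t n)
{
  assert(x != NULL || n == 0);
  volatile double acc = 0.0;
  for (size_t i = 0; i < n; i++)
    acc = acc + x[i];
  return acc;
}

/* Sum right to left: one of the many orders that a compiler allowed
   to reassociate floating-point additions could choose. */
double sum_backward(const double *x, size_t n)
{
  assert(x != NULL || n == 0);
  volatile double acc = 0.0;
  for (size_t i = n; i > 0; i--)
    acc = acc + x[i - 1];
  return acc;
}
```

With the hypothetical input {1e16, -1e16, 1.0}, the forward sum cancels the two large terms first and keeps the 1.0, while the backward sum absorbs the 1.0 into -1e16 before the cancellation and loses it.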
Hal Finkel via llvm-dev
2016-Oct-10 22:02 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
----- Original Message -----
> From: "Sebastian Pop" <sebpop.llvm at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias Braun"
> <matze at braunis.de>, "Clang Dev" <cfe-dev at lists.llvm.org>, "nd" <nd at arm.com>, "Abe Skolnik" <a.skolnik at samsung.com>,
> "Renato Golin" <renato.golin at linaro.org>
> Sent: Monday, October 10, 2016 9:10:01 AM
> Subject: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
>
> Hi,
>
> I would need some help to fix polybench/symm:
>
> void kernel_symm(int ni, int nj,
>                  DATA_TYPE alpha,
>                  DATA_TYPE beta,
>                  DATA_TYPE POLYBENCH_2D(C,NI,NJ,ni,nj),
>                  DATA_TYPE POLYBENCH_2D(A,NJ,NJ,nj,nj),
>                  DATA_TYPE POLYBENCH_2D(B,NI,NJ,ni,nj))
> {
>   int i, j, k;
>   DATA_TYPE acc;
>
>   /* C := alpha*A*B + beta*C, A is symetric */
>   for (i = 0; i < _PB_NI; i++)
>     for (j = 0; j < _PB_NJ; j++)
>       {
>         acc = 0;
>         for (k = 0; k < j - 1; k++)
>           {
>             C[k][j] += alpha * A[k][i] * B[i][j];
>             acc += B[k][j] * A[k][i];
>           }
>         C[i][j] = beta * C[i][j] + alpha * A[i][i] * B[i][j] + alpha
>         * acc;
>       }
> }
>
> Compiling this kernel with __attribute__((optnone)) and outputing the
> contents of the C[][] array does not match the reference output.

Why is this? What compiler are you using? Are we not using IEEE FP @ -O0 (e.g. using x87 floating point)? IEEE FP, without FMA, should be completely deterministic. Sounds like a bug.

> Furthermore, compiling this kernel at -Ofast and comparing against
> -O0
> only passes for FP_ABSTOLERANCE=10.
> All the 10 other polybench tests that I have transformed to check FP
> are passing at FP_ABSTOLERANCE=1e-5 (and most likely they could pass
> at an even more reduced tolerance.)
>
> The symm benchmark seems to accumulate all the errors as it is a big
> reduction from the first elements of the C[][] array into the last
> elements.
> I'm not sure we can rely on this benchmark to check FP correctness.
>
> One option is to completely specify which optimization flags have
> been
> used to compute the reference output and only use that to compile
> this
> benchmark.
>
> Please share your ideas on how to deal with this particular test.

If the test is not numerically stable, we can:

 1. Only test the non-FP-contracted output
 2. Run the FP-contracted test only for a very small size (so that we'll stay within some reasonable tolerance of the reference output)
 3. Change the matrix to something that will make the test numerically stable (it does not look like the matrix itself matters to the performance; where do the values come from?).

 -Hal

>
> Thanks,
> Sebastian
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
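
Staying "within some reasonable tolerance of the reference output" can be sketched in C as an absolute-tolerance comparison, in the spirit of the FP_ABSTOLERANCE setting mentioned in the thread. This is an illustration only, not the test-suite's comparison tool; the function names are hypothetical and the element type is assumed to be double.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute element-wise difference between a reference result
   and a result computed under different flags. NaN differences are not
   detected by this sketch, because comparisons with NaN are false. */
double max_abs_diff(const double *ref, const double *out, size_t n)
{
  assert(ref != NULL && out != NULL);
  double worst = 0.0;
  for (size_t i = 0; i < n; i++) {
    double d = ref[i] - out[i];
    if (d < 0)
      d = -d;
    if (d > worst)
      worst = d;
  }
  return worst;
}

/* Returns 1 when every element of out is within tol of ref, else 0. */
int within_abs_tolerance(const double *ref, const double *out,
                         size_t n, double tol)
{
  return max_abs_diff(ref, out, n) <= tol;
}
```

A check of this shape is only meaningful when the compared values are small relative to the tolerance, which is why options 2 and 3 above aim at keeping the magnitudes and the accumulated error down.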
Sebastian Pop via llvm-dev
2016-Oct-11 11:15 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Mon, Oct 10, 2016 at 5:02 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> From: "Sebastian Pop" <sebpop.llvm at gmail.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias Braun"
>> <matze at braunis.de>, "Clang Dev" <cfe-dev at lists.llvm.org>, "nd" <nd at arm.com>, "Abe Skolnik" <a.skolnik at samsung.com>,
>> "Renato Golin" <renato.golin at linaro.org>
>> Sent: Monday, October 10, 2016 9:10:01 AM
>> Subject: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
>>
>> Hi,
>>
>> I would need some help to fix polybench/symm:
>>
>> void kernel_symm(int ni, int nj,
>>                  DATA_TYPE alpha,
>>                  DATA_TYPE beta,
>>                  DATA_TYPE POLYBENCH_2D(C,NI,NJ,ni,nj),
>>                  DATA_TYPE POLYBENCH_2D(A,NJ,NJ,nj,nj),
>>                  DATA_TYPE POLYBENCH_2D(B,NI,NJ,ni,nj))
>> {
>>   int i, j, k;
>>   DATA_TYPE acc;
>>
>>   /* C := alpha*A*B + beta*C, A is symetric */
>>   for (i = 0; i < _PB_NI; i++)
>>     for (j = 0; j < _PB_NJ; j++)
>>       {
>>         acc = 0;
>>         for (k = 0; k < j - 1; k++)
>>           {
>>             C[k][j] += alpha * A[k][i] * B[i][j];
>>             acc += B[k][j] * A[k][i];
>>           }
>>         C[i][j] = beta * C[i][j] + alpha * A[i][i] * B[i][j] + alpha
>>         * acc;
>>       }
>> }
>>
>> Compiling this kernel with __attribute__((optnone)) and outputing the
>> contents of the C[][] array does not match the reference output.
>
> Why is this? What compiler are you using? Are we not using IEEE FP @ -O0 (e.g. using x87 floating point)? IEEE FP, without FMA, should be completely deterministic. Sounds like a bug.

This is with clang top of tree, on a x86_64-linux.
I created https://reviews.llvm.org/D25465 with the changes that I have
to the symm benchmark.

>
>> Furthermore, compiling this kernel at -Ofast and comparing against
>> -O0
>> only passes for FP_ABSTOLERANCE=10.
>> All the 10 other polybench tests that I have transformed to check FP
>> are passing at FP_ABSTOLERANCE=1e-5 (and most likely they could pass
>> at an even more reduced tolerance.)
>>
>> The symm benchmark seems to accumulate all the errors as it is a big
>> reduction from the first elements of the C[][] array into the last
>> elements.
>> I'm not sure we can rely on this benchmark to check FP correctness.
>>
>> One option is to completely specify which optimization flags have
>> been
>> used to compute the reference output and only use that to compile
>> this
>> benchmark.
>>
>> Please share your ideas on how to deal with this particular test.
>
> If the test is not numerically stable, we can:
>
> 1. Only test the non-FP-contracted output

Yes, this is what I'm doing.

> 2. Run the FP-contracted test only for a very small size (so that we'll stay within some reasonable tolerance of the reference output)
> 3. Change the matrix to something that will make the test numerically stable (it does not look like the matrix itself matters to the performance; where do the values come from?).
>

The values may be very large towards the end of the C array.
The test now passes with FP_ABSTOLERANCE=1e-5 when lowering the values
in the input arrays with this patch:

diff --git a/SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.c b/SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.c
index 0a1bdf3..7fc3cb1 100644
--- a/SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.c
+++ b/SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.c
@@ -35,12 +35,12 @@ void init_array(int ni, int nj,
   *beta = 2123;
   for (i = 0; i < ni; i++)
     for (j = 0; j < nj; j++) {
-      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i*j) / ni;
-      B[i][j] = ((DATA_TYPE) i*j) / ni;
+      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i-j) / ni;
+      B[i][j] = ((DATA_TYPE) i-j) / ni;
     }
   for (i = 0; i < nj; i++)
     for (j = 0; j < nj; j++)
-      A[i][j] = ((DATA_TYPE) i*j) / ni;
+      A[i][j] = ((DATA_TYPE) i-j) / ni;
 }

Of course we need to update the reference output hash if we decide to
use this patch.

Sebastian
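
The effect of the patch on the size of the input values can be sketched in C as follows. This is an illustration under assumptions: DATA_TYPE is taken to be double, and the two helper names are hypothetical. Each helper scans an ni x nj array and returns the largest magnitude that the corresponding initializer expression produces.

```c
#include <assert.h>

/* Absolute value without pulling in <math.h>. */
static double abs_d(double v)
{
  return v < 0 ? -v : v;
}

/* Largest |value| from the original initializer, ((DATA_TYPE) i*j) / ni. */
double max_init_product(int ni, int nj)
{
  assert(ni > 0);
  double m = 0.0;
  for (int i = 0; i < ni; i++)
    for (int j = 0; j < nj; j++) {
      double v = ((double) i * j) / ni;
      if (abs_d(v) > m)
        m = abs_d(v);
    }
  return m;
}

/* Largest |value| from the patched initializer, ((DATA_TYPE) i-j) / ni. */
double max_init_difference(int ni, int nj)
{
  assert(ni > 0);
  double m = 0.0;
  for (int i = 0; i < ni; i++)
    for (int j = 0; j < nj; j++) {
      double v = ((double) i - j) / ni;
      if (abs_d(v) > m)
        m = abs_d(v);
    }
  return m;
}
```

For a square array the product form grows roughly linearly with the array dimension, while the difference form stays below 1 in magnitude. Smaller inputs keep the accumulated values in C smaller, which makes a fixed absolute tolerance easier to meet.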