Hal Finkel via llvm-dev
2016-Oct-12 03:20 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Sebastian Pop" <sebpop.llvm at gmail.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>,
> "Matthias Braun" <matze at braunis.de>, "Clang Dev" <cfe-dev at lists.llvm.org>, "nd" <nd at arm.com>, "Abe Skolnik"
> <a.skolnik at samsung.com>
> Sent: Tuesday, October 11, 2016 6:33:43 AM
> Subject: Re: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
>
> On 11 October 2016 at 12:15, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> >> 1. Only test the non-FP-contracted output
> >
> > Yes, this is what I'm doing.
>
> If the whole test is about testing multiplications, what's the point
> of this?
>
> >> 2. Run the FP-contracted test only for a very small size (so that
> >>    we'll stay within some reasonable tolerance of the reference
> >>    output)
> >> 3. Change the matrix to something that will make the test
> >>    numerically stable (it does not look like the matrix itself
> >>    matters to the performance; where do the values come from?).
> >
> > 3 is more sound, 2 may be more practical.
> >
> > -      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i*j) / ni;
> > -      B[i][j] = ((DATA_TYPE) i*j) / ni;
> > +      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i-j) / ni;
> > +      B[i][j] = ((DATA_TYPE) i-j) / ni;
> >      }
> >    for (i = 0; i < nj; i++)
> >      for (j = 0; j < nj; j++)
> > -      A[i][j] = ((DATA_TYPE) i*j) / ni;
> > +      A[i][j] = ((DATA_TYPE) i-j) / ni;
>
> Changing from multiplication to subtraction changes completely the
> nature of the test and goes towards "return 0;", ie, fiddling with
> the code so that the compiler "behaves" better. This is *not* a solution.
>
> Hal,
>
> For large scale numerical programs, if fp-contract can result in
> large scale differences, we need to think about this approach by default.

Obviously a lot of people have done an awful lot of thinking about this over many years, and contractions-by-default is the reality on many systems. If you have a program that is numerically unstable, simulating a chaotic system, etc. then any difference, often no matter how small, will lead to large-scale differences in the output. As a result, there will be some tests that don't have a useful tolerance; sometimes these are badly-implemented tests, but sometimes the sensitivity represents an underlying physical reality of a simulated system (there's a lot of very-interesting mathematical theory behind this, e.g. https://en.wikipedia.org/wiki/Chaos_theory#Sensitivity_to_initial_coonditions).

From a user-experience perspective, this can be very unfortunate. It can be hard to understand why compiler optimizations, or different compilers, produce executables that produce different outputs for identical input configurations. It contributes to feelings that floating point is hard and confusing. However, not using the contractions also leads to equally-confusing performance discrepancies between our compiler and others (and between the observed and expected performance).

We have a classic "Damned if you do, damned if you don't" situation. However, I lean toward enabling the contractions by default because other compilers do it (so users need to learn about what's going on anyway - we can't shield them from this regardless of what we do) and it gives users the performance they expect (which increases our user base and makes many users happier).

 -Hal

>
> If the loop above cannot be contained in an 1e-8 range for double
> values over a large dataset, then I guess the transformation is going
> a bit too far.
>
> If not, we should be able to come up with a reasonable tolerance that
> makes the test still be relevant.
>
> cheers,
> --renato
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
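To make the effect concrete, here is a minimal C sketch, not taken from the thread or from the test-suite, of how skipping the intermediate rounding of a product can change a result. Both function names are hypothetical. The second function only imitates a contracted multiply-add for float operands by carrying the product in double; a real fused multiply-add rounds once, directly to the result type, so treat this as an approximation under the assumption of IEEE-754 float and double.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: multiply, round the product to float, then add.
   The volatile store keeps the compiler from fusing the two operations. */
float mul_add_separate(float a, float b, float c)
{
    volatile float product = a * b;   /* first rounding happens here */
    return product + c;               /* second rounding */
}

/* Hypothetical helper: imitate a contracted a*b+c. The product of two
   floats fits exactly in a double (24 + 24 <= 53 significand bits), so
   the product is not rounded before the addition. */
float mul_add_single_rounding(float a, float b, float c)
{
    assert(FLT_MANT_DIG * 2 <= DBL_MANT_DIG);
    double exact_product = (double)a * (double)b;
    return (float)(exact_product + (double)c);
}
```

With a = 1 + 2^-13, b = 1 - 2^-13 and c = -1, the exact product is 1 - 2^-26. Rounded to float it becomes 1.0f, so the first function returns 0, while the second returns -2^-26. Neither answer is a compiler bug; they simply differ, and in a long accumulation such differences can compound.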
Sebastian Pop via llvm-dev
2016-Oct-12 03:39 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Tue, Oct 11, 2016 at 10:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> From: "Renato Golin" <renato.golin at linaro.org>
>> To: "Sebastian Pop" <sebpop.llvm at gmail.com>
>> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>,
>> "Matthias Braun" <matze at braunis.de>, "Clang Dev" <cfe-dev at lists.llvm.org>, "nd" <nd at arm.com>, "Abe Skolnik"
>> <a.skolnik at samsung.com>
>> Sent: Tuesday, October 11, 2016 6:33:43 AM
>> Subject: Re: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
>>
>> On 11 October 2016 at 12:15, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
>> >> 1. Only test the non-FP-contracted output
>> >
>> > Yes, this is what I'm doing.
>>
>> If the whole test is about testing multiplications, what's the point
>> of this?
>>
>> >> 2. Run the FP-contracted test only for a very small size (so that
>> >>    we'll stay within some reasonable tolerance of the reference
>> >>    output)
>> >> 3. Change the matrix to something that will make the test
>> >>    numerically stable (it does not look like the matrix itself
>> >>    matters to the performance; where do the values come from?).
>> >
>> > 3 is more sound, 2 may be more practical.
>> >
>> > -      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i*j) / ni;
>> > -      B[i][j] = ((DATA_TYPE) i*j) / ni;
>> > +      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i-j) / ni;
>> > +      B[i][j] = ((DATA_TYPE) i-j) / ni;
>> >      }
>> >    for (i = 0; i < nj; i++)
>> >      for (j = 0; j < nj; j++)
>> > -      A[i][j] = ((DATA_TYPE) i*j) / ni;
>> > +      A[i][j] = ((DATA_TYPE) i-j) / ni;
>>
>> Changing from multiplication to subtraction changes completely the
>> nature of the test and goes towards "return 0;", ie, fiddling with
>> the code so that the compiler "behaves" better. This is *not* a solution.
>>
>> Hal,
>>
>> For large scale numerical programs, if fp-contract can result in
>> large scale differences, we need to think about this approach by default.
>
> Obviously a lot of people have done an awful lot of thinking about this over many years, and contractions-by-default is the reality on many systems. If you have a program that is numerically unstable, simulating a chaotic system, etc. then any difference, often no matter how small, will lead to large-scale differences in the output. As a result, there will be some tests that don't have a useful tolerance; sometimes these are badly-implemented tests, but sometimes the sensitivity represents an underlying physical reality of a simulated system (there's a lot of very-interesting mathematical theory behind this, e.g. https://en.wikipedia.org/wiki/Chaos_theory#Sensitivity_to_initial_coonditions).
>
> From a user-experience perspective, this can be very unfortunate. It can be hard to understand why compiler optimizations, or different compilers, produce executables that produce different outputs for identical input configurations. It contributes to feelings that floating point is hard and confusing. However, not using the contractions also leads to equally-confusing performance discrepancies between our compiler and others (and between the observed and expected performance). We have a classic "Damned if you do, damned if you don't" situation. However, I lean toward enabling the contractions by default because other compilers do it (so users need to learn about what's going on anyway - we can't shield them from this regardless of what we do) and it gives users the performance they expect (which increases our user base and makes many users happier).
>

Thanks Hal for the explanations and summary of why we need to fix this
in the compiler and in the test-suite.

For a "non FP expert" like myself, could one of you "FP experts" choose
from the proposed solutions on how to fix symm, and let me know what I
should implement?

To get polybench/symm off my todo list, the sooner the "FP experts" make
up their minds on what they would like the test-suite to look like, the
better. ;-)

Thanks,
Sebastian
Sebastian Pop via llvm-dev
2016-Oct-12 04:20 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Tue, Oct 11, 2016 at 10:39 PM, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> On Tue, Oct 11, 2016 at 10:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> ----- Original Message -----
>>> From: "Renato Golin" <renato.golin at linaro.org>
>>> To: "Sebastian Pop" <sebpop.llvm at gmail.com>
>>> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>,
>>> "Matthias Braun" <matze at braunis.de>, "Clang Dev" <cfe-dev at lists.llvm.org>, "nd" <nd at arm.com>, "Abe Skolnik"
>>> <a.skolnik at samsung.com>
>>> Sent: Tuesday, October 11, 2016 6:33:43 AM
>>> Subject: Re: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
>>>
>>> On 11 October 2016 at 12:15, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
>>> >> 1. Only test the non-FP-contracted output
>>> >
>>> > Yes, this is what I'm doing.
>>>
>>> If the whole test is about testing multiplications, what's the point
>>> of this?
>>>
>>> >> 2. Run the FP-contracted test only for a very small size (so that
>>> >>    we'll stay within some reasonable tolerance of the reference
>>> >>    output)
>>> >> 3. Change the matrix to something that will make the test
>>> >>    numerically stable (it does not look like the matrix itself
>>> >>    matters to the performance; where do the values come from?).
>>> >
>>> > 3 is more sound, 2 may be more practical.
>>> >
>>> > -      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i*j) / ni;
>>> > -      B[i][j] = ((DATA_TYPE) i*j) / ni;
>>> > +      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i-j) / ni;
>>> > +      B[i][j] = ((DATA_TYPE) i-j) / ni;
>>> >      }
>>> >    for (i = 0; i < nj; i++)
>>> >      for (j = 0; j < nj; j++)
>>> > -      A[i][j] = ((DATA_TYPE) i*j) / ni;
>>> > +      A[i][j] = ((DATA_TYPE) i-j) / ni;
>>>
>>> Changing from multiplication to subtraction changes completely the
>>> nature of the test and goes towards "return 0;", ie, fiddling with
>>> the code so that the compiler "behaves" better. This is *not* a solution.

It is not uncommon to see adjustments to the initial values in several
polybench tests:

  /*
  LLVM: This change ensures we do not calculate nan values, which are
        formatted differently on different platforms and which may also
        be optimized unexpectedly.
  Original code:
  for (i = 0; i < ni; i++)
    for (j = 0; j < nj; j++) {
      A[i][j] = ((DATA_TYPE) i*j) / ni;
      Q[i][j] = ((DATA_TYPE) i*(j+1)) / nj;
    }
  for (i = 0; i < nj; i++)
    for (j = 0; j < nj; j++)
      R[i][j] = ((DATA_TYPE) i*(j+2)) / nj;
  */
  for (i = 0; i < ni; i++)
    for (j = 0; j < nj; j++) {
      A[i][j] = ((DATA_TYPE) i*j+ni) / ni;
      Q[i][j] = ((DATA_TYPE) i*(j+1)+nj) / nj;
    }
  for (i = 0; i < nj; i++)
    for (j = 0; j < nj; j++)
      R[i][j] = ((DATA_TYPE) i*(j+2)+nj) / nj;

git grepping gives us:

linear-algebra/kernels/cholesky/cholesky.c: LLVM: This change ensures we do not calculate nan values, which are
linear-algebra/kernels/cholesky/cholesky.c: LLVM: This change ensures we do not calculate nan values, which are
linear-algebra/kernels/cholesky/cholesky.c: LLVM: This change ensures we do not calculate nan values, which are
linear-algebra/kernels/trisolv/trisolv.c: LLVM: This change ensures we do not calculate nan values, which are
linear-algebra/solvers/gramschmidt/gramschmidt.c: LLVM: This change ensures we do not calculate nan values, which are
linear-algebra/solvers/lu/lu.c: LLVM: This change ensures we do not calculate nan values, which are
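As a small illustration of the kind of adjustment quoted above, the following C sketch contrasts the original and the shifted initializer for a single matrix entry. The helper names are hypothetical and DATA_TYPE is assumed to be double. The only point is that the original formula yields zero for the whole first row and column, whereas the shifted one keeps every entry at or above 1, which is presumably what keeps later divisions away from 0/0.

```c
#include <assert.h>

typedef double DATA_TYPE;  /* assumption: the kernels are built with double */

/* Hypothetical helper mirroring the original initializer i*j / ni. */
DATA_TYPE init_original(int i, int j, int ni)
{
    assert(ni > 0);
    return ((DATA_TYPE) i * j) / ni;
}

/* Hypothetical helper mirroring the adjusted initializer (i*j + ni) / ni. */
DATA_TYPE init_shifted(int i, int j, int ni)
{
    assert(ni > 0);
    return ((DATA_TYPE) i * j + ni) / ni;
}
```

For i = 0 the original formula gives 0 for every j, while the shifted formula gives exactly 1. In general the shifted value is the original value plus 1, so the access pattern and the amount of arithmetic in the kernel are unchanged.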
Renato Golin via llvm-dev
2016-Oct-12 08:33 UTC
[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On 12 October 2016 at 04:20, Hal Finkel <hfinkel at anl.gov> wrote:
> Obviously a lot of people have done an awful lot of thinking about this over many years, and contractions-by-default is the reality on many systems. If you have a program that is numerically unstable, simulating a chaotic system, etc. then any difference, often no matter how small, will lead to large-scale differences in the output. As a result, there will be some tests that don't have a useful tolerance; sometimes these are badly-implemented tests, but sometimes the sensitivity represents an underlying physical reality of a simulated system (there's a lot of very-interesting mathematical theory behind this, e.g. https://en.wikipedia.org/wiki/Chaos_theory#Sensitivity_to_initial_coonditions).

Hi Hal,

I think we're crossing the wires, here.

There are three sources of uncertainties on chaotic systems:

1. Initial conditions, not affected by the compiler and "part of the
problem, part of the solution".
2. Evolution, affected by the compiler, not limited by FP-reordering
passes (UB can also play a role here).
3. Expectations, affected by the evolution and the nature of the
problem and too high level to be of any consequence to the compiler.

Initial conditions change in real life, but they must be the same in
tests. Same for evolution and expectation. You can't use an external
random number generator, you can't rely on different RNGs (that's why
I added hand-coded ones to some tests).

If the FP-contract pass affects (2), that's perfectly fine. But if it
affects (3), for example via changing the precision / errors / deltas,
then we have a problem.

From what I understand, FP-contraction actually makes calculations
*more* precise, by removing one rounding operation every two. This
means to me that whatever tolerance of a well designed *test* must be
kept as low as possible.

And this is the key: if the tolerance of a test needs to be
*increased* because of FP-contract, then the test is wrong. Either the
code, or the reference output, or how we get to the reference values
is wrong. Disabling the checks, or increasing the tolerance beyond
what's meaningful in this case will make an irrelevant test useless.

Right now, it may be irrelevant and non-representative, but it can
catch compiler FP errors. Adding a huge bracket or disabling
FP-contract will remove even that small benefit.

Right now, the tests have one value, which happens to be identical in
virtually all platforms. This means the compiler is pretty good at
keeping the semantics and lucky in keeping the same precision. But we
both know this is wrong.

And now we have a chance to make those tests better. Not more accurate
per se, but more accurately testing the compiler. There is a big
difference here.

If we change the semantics of the code (mul -> sub), we're killing the
original test. If we increase the tolerance without analysis or
disable default passes, we're killing any chance to spot compiler
problems in the future.

> From a user-experience perspective, this can be very unfortunate. It can be hard to understand why compiler optimizations, or different compilers, produce executables that produce different outputs for identical input configurations. It contributes to feelings that floating point is hard and confusing.

On the contrary, it serves as an education that FP is a finite
representation of real numbers, and people shouldn't be expecting to
get byte-exact values anyway.

I have strong reservations against scientific code that doesn't take
into account rounding issues, error calculations, and that takes the
results at face value. It's like running one single Monte Carlo
simulation and making international political decisions based on that
result.

> However, not using the contractions also leads to equally-confusing performance discrepancies between our compiler and others (and between the observed and expected performance).

Let's not mix conformance and performance. Different compilers have
different flags and behave differently. Enabling FP-contract in LLVM
has *nothing* to do with what GCC does, but to do with "what's a
better strategy for LLVM".

We have refrained from following GCC blindly for a number of years and
it would be really sad if we started now.

If FP-contract=on is a good decision for LLVM, on merits of precision,
performance and overall quality, then let's do it. If not, then let's
put it under some flag and tell all people comparing with GCC to use
that flag.

But if we do go with it, we need to make sure our current tests don't
just break apart and get hidden in a corner.

GCC compatibility isn't *that* important.

I'm not advocating against turning it on, I'm advocating against the
easy path of hiding the tests. We might just as well remove them.

I'll reply to Sebastian in a more practical way, but I wanted to make
it clear that we're talking about the test and not the transformation
itself, which needs to be analysed on its own merits, not on what GCC
does.

cheers,
--renato
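The tolerance question can be made concrete with a small C sketch of a relative-tolerance comparison against a reference value. This is a hypothetical helper, not the comparison tool the test-suite actually uses; the function name, the scaling by the reference value, and the fallback for a zero reference are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Absolute value without depending on libm. */
static double abs_value(double x)
{
    return x < 0.0 ? -x : x;
}

/* Hypothetical helper: true when `actual` lies within the relative
   tolerance `rel_tol` of `reference`. A zero reference falls back to an
   absolute comparison against rel_tol. A NaN in either value compares
   false and is therefore reported as a mismatch. */
bool within_relative_tolerance(double reference, double actual, double rel_tol)
{
    assert(rel_tol >= 0.0);
    double scale = abs_value(reference);
    if (scale == 0.0)
        scale = 1.0;
    return abs_value(actual - reference) <= rel_tol * scale;
}
```

With rel_tol = 1e-8, the figure mentioned earlier in the thread, a result of 1.0 + 1e-10 against a reference of 1.0 passes and 1.0 + 1e-6 fails. Widening rel_tol until a failing test passes is exactly the kind of change argued against above, since a bracket that wide would also hide genuine compiler errors.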