thr3ads.net - llvm dev - [llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on" [Oct 2016]

Hal Finkel via llvm-dev

2016-Oct-12 03:20 UTC

[llvm-dev] [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Sebastian Pop" <sebpop.llvm at gmail.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>,
> "Matthias Braun" <matze at braunis.de>, "Clang Dev" <cfe-dev at lists.llvm.org>, "nd" <nd at arm.com>, "Abe Skolnik"
> <a.skolnik at samsung.com>
> Sent: Tuesday, October 11, 2016 6:33:43 AM
> Subject: Re: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
> 
> On 11 October 2016 at 12:15, Sebastian Pop <sebpop.llvm at gmail.com>
> wrote:
> >>  1. Only test the non-FP-contracted output
> >
> > Yes, this is what I'm doing.
> 
> If the whole test is about testing multiplications, what's the point
> of this?
> 
> 
> >>  2. Run the FP-contracted test only for a very small size (so that
> >>  we'll stay within some reasonable tolerance of the reference
> >>  output)
> >>  3. Change the matrix to something that will make the test
> >>  numerically stable (it does not look like the matrix itself
> >>  matters to the performance; where do the values come from?).
> 
> 3 is more sound, 2 may be more practical.
> 
> 
> > -      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i*j) / ni;
> > -      B[i][j] = ((DATA_TYPE) i*j) / ni;
> > +      C_StrictFP[i][j] = C[i][j] = ((DATA_TYPE) i-j) / ni;
> > +      B[i][j] = ((DATA_TYPE) i-j) / ni;
> >      }
> >    for (i = 0; i < nj; i++)
> >      for (j = 0; j < nj; j++)
> > -      A[i][j] = ((DATA_TYPE) i*j) / ni;
> > +      A[i][j] = ((DATA_TYPE) i-j) / ni;
> 
> Changing from multiplication to subtraction changes completely the
> nature of the test and goes towards "return 0;", ie, fiddling with
> the code so that the compiler "behaves" better. This is *not* a
> solution.
> 
> Hal,
> 
> For large scale numerical programs, if fp-contract can result in
> large scale differences, we need to think about this approach by default.

Obviously a lot of people have done an awful lot of thinking about this over
many years, and contractions-by-default is the reality on many systems. If you
have a program that is numerically unstable, simulating a chaotic system, etc.,
then any difference, often no matter how small, will lead to large-scale
differences in the output. As a result, there will be some tests that don't
have a useful tolerance; sometimes these are badly-implemented tests, but
sometimes the sensitivity represents an underlying physical reality of a
simulated system (there's a lot of very interesting mathematical theory
behind this, e.g.
https://en.wikipedia.org/wiki/Chaos_theory#Sensitivity_to_initial_coonditions).

From a user-experience perspective, this can be very unfortunate. It can be
hard to understand why compiler optimizations, or different compilers, produce
executables that produce different outputs for identical input configurations.
It contributes to feelings that floating point is hard and confusing. However,
not using the contractions also leads to equally-confusing performance
discrepancies between our compiler and others (and between the observed and
expected performance). We have a classic "Damned if you do, damned if you
don't" situation. However, I lean toward enabling the contractions by
default because other compilers do it (so users need to learn about what's
going on anyway - we can't shield them from this regardless of what we do)
and it gives users the performance they expect (which increases our user base
and makes many users happier).

 -Hal

> 
> If the loop above cannot be contained in a 1e-8 range for double
> values over a large dataset, then I guess the transformation is going
> a bit too far.
> 
> If not, we should be able to come up with a reasonable tolerance that
> makes the test still be relevant.
> 
> cheers,
> --renato
> 
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
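
As an illustration of the sensitivity described in this message, here is a minimal C sketch that iterates the logistic map from two nearly identical starting values and reports the largest gap between the two trajectories. The function name max_divergence and the parameter choices (r = 3.9, a perturbation of 1e-15) are assumptions made for the example; nothing here comes from polybench/symm.

```c
#include <assert.h>

/* Iterate x -> r*x*(1-x) from two starting points and return the
 * largest |x_k - y_k| observed over `steps` iterations. */
double max_divergence(double x, double y, double r, int steps) {
    assert(steps >= 0);
    double max_gap = 0.0;
    for (int k = 0; k < steps; k++) {
        x = r * x * (1.0 - x);
        y = r * y * (1.0 - y);
        double gap = x > y ? x - y : y - x;  /* absolute difference */
        if (gap > max_gap)
            max_gap = gap;
    }
    return max_gap;
}
```

Calling max_divergence(0.3, 0.3 + 1e-15, 3.9, 200) shows how a perturbation on the order of a rounding error can grow to the size of the values themselves, which is why a fixed tolerance may not be meaningful for a test of this kind.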

Sebastian Pop via llvm-dev

2016-Oct-12 03:39 UTC

On Tue, Oct 11, 2016 at 10:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> Obviously a lot of people have done an awful lot of thinking about this over
> many years, and contractions-by-default is the reality on many systems. If you
> have a program that is numerically unstable, simulating a chaotic system, etc.,
> then any difference, often no matter how small, will lead to large-scale
> differences in the output. As a result, there will be some tests that don't
> have a useful tolerance; sometimes these are badly-implemented tests, but
> sometimes the sensitivity represents an underlying physical reality of a
> simulated system (there's a lot of very interesting mathematical theory
> behind this, e.g.
> https://en.wikipedia.org/wiki/Chaos_theory#Sensitivity_to_initial_coonditions).
>
> From a user-experience perspective, this can be very unfortunate. It can be
> hard to understand why compiler optimizations, or different compilers, produce
> executables that produce different outputs for identical input configurations.
> It contributes to feelings that floating point is hard and confusing. However,
> not using the contractions also leads to equally-confusing performance
> discrepancies between our compiler and others (and between the observed and
> expected performance). We have a classic "Damned if you do, damned if you
> don't" situation. However, I lean toward enabling the contractions by
> default because other compilers do it (so users need to learn about what's
> going on anyway - we can't shield them from this regardless of what we do)
> and it gives users the performance they expect (which increases our user base
> and makes many users happier).

Thanks Hal for the explanations and summary of why we need to fix this
in the compiler and in the test-suite.

For a "non FP expert" like myself, could one of you "FP experts"
choose from the proposed solutions on how to fix symm, and let me know
what I should implement?
To get polybench/symm out of my todo list, the sooner "FP experts"
make up their mind on what they would like the test-suite to look
like, the better.
;-)

Thanks,
Sebastian

Sebastian Pop via llvm-dev

2016-Oct-12 04:20 UTC

On Tue, Oct 11, 2016 at 10:39 PM, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> On Tue, Oct 11, 2016 at 10:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> ----- Original Message -----
>>> From: "Renato Golin" <renato.golin at linaro.org>
>>>
>>> Changing from multiplication to subtraction changes completely the
>>> nature of the test and goes towards "return 0;", ie, fiddling with
>>> the code so that the compiler "behaves" better. This is *not*
>>> a solution.

It is not uncommon to see in several polybench tests adjustments to
the initial values:

  /*
  LLVM: This change ensures we do not calculate nan values, which are
        formatted differently on different platforms and which may also
        be optimized unexpectedly.
  Original code:
  for (i = 0; i < ni; i++)
    for (j = 0; j < nj; j++) {
      A[i][j] = ((DATA_TYPE) i*j) / ni;
      Q[i][j] = ((DATA_TYPE) i*(j+1)) / nj;
    }
  for (i = 0; i < nj; i++)
    for (j = 0; j < nj; j++)
      R[i][j] = ((DATA_TYPE) i*(j+2)) / nj;
  */
  for (i = 0; i < ni; i++)
    for (j = 0; j < nj; j++) {
      A[i][j] = ((DATA_TYPE) i*j+ni) / ni;
      Q[i][j] = ((DATA_TYPE) i*(j+1)+nj) / nj;
    }
  for (i = 0; i < nj; i++)
    for (j = 0; j < nj; j++)
      R[i][j] = ((DATA_TYPE) i*(j+2)+nj) / nj;

git grepping gives us:

linear-algebra/kernels/cholesky/cholesky.c:  LLVM: This change ensures we do not calculate nan values, which are
linear-algebra/kernels/cholesky/cholesky.c:      LLVM: This change ensures we do not calculate nan values, which are
linear-algebra/kernels/cholesky/cholesky.c:      LLVM: This change ensures we do not calculate nan values, which are
linear-algebra/kernels/trisolv/trisolv.c:  LLVM: This change ensures we do not calculate nan values, which are
linear-algebra/solvers/gramschmidt/gramschmidt.c:  LLVM: This change ensures we do not calculate nan values, which are
linear-algebra/solvers/lu/lu.c:  LLVM: This change ensures we do not calculate nan values, which are

Renato Golin via llvm-dev

2016-Oct-12 08:33 UTC

On 12 October 2016 at 04:20, Hal Finkel <hfinkel at anl.gov> wrote:
> Obviously a lot of people have done an awful lot of thinking about this over
> many years, and contractions-by-default is the reality on many systems. If you
> have a program that is numerically unstable, simulating a chaotic system, etc.,
> then any difference, often no matter how small, will lead to large-scale
> differences in the output. As a result, there will be some tests that don't
> have a useful tolerance; sometimes these are badly-implemented tests, but
> sometimes the sensitivity represents an underlying physical reality of a
> simulated system (there's a lot of very interesting mathematical theory
> behind this, e.g.
> https://en.wikipedia.org/wiki/Chaos_theory#Sensitivity_to_initial_coonditions).

Hi Hal,

I think we're crossing the wires, here.

There are three sources of uncertainties on chaotic systems:

1. Initial conditions, not affected by the compiler and "part of the
problem, part of the solution".
2. Evolution, affected by the compiler, and not limited to FP-reordering
passes (UB can also play a role here).
3. Expectations, affected by the evolution and the nature of the
problem and too high level to be of any consequence to the compiler.

Initial conditions change in real life, but they must be the same in
tests. Same for evolution and expectation. You can't use an external
random number generator, you can't rely on different RNGs (that's why
I added hand-coded ones to some tests).
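
A hand-coded generator of the kind mentioned above can be as small as a linear congruential generator. The sketch below is only an illustration, not the generator used in any test-suite program: the type lcg32, the function names, and the multiplier/increment constants (a commonly published 32-bit LCG parameter pair) are all assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical self-contained generator: the same seed yields the same
 * sequence on every platform, independent of the C library's rand(). */
typedef struct { uint32_t state; } lcg32;

void lcg32_seed(lcg32 *g, uint32_t seed) {
    assert(g);
    g->state = seed;
}

uint32_t lcg32_next(lcg32 *g) {
    assert(g);
    /* Unsigned arithmetic wraps modulo 2^32, which is the LCG modulus. */
    g->state = g->state * 1664525u + 1013904223u;
    return g->state;
}

/* Map the top 24 bits to a double in [0, 1); exact, so no rounding
 * differences between platforms. */
double lcg32_unit(lcg32 *g) {
    return (double)(lcg32_next(g) >> 8) / 16777216.0;
}
```

Because only integer arithmetic and one exact division by a power of two are involved, the initial data produced this way does not depend on the compiler's floating-point options.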

If the FP-contract pass affects (2), that's perfectly fine. But if it
affects (3), for example via changing the precision / errors / deltas,
then we have a problem.

From what I understand, FP-contraction actually makes calculations
*more* precise, by removing one rounding operation every two. This
means to me that the tolerance of a well-designed *test* must be
kept as low as possible.

And this is the key: if the tolerance of a test needs to be
*increased* because of FP-contract, then the test is wrong. Either the
code, or the reference output, or how we get to the reference values
is wrong. Disabling the checks, or increasing the tolerance beyond
what's meaningful in this case will make an irrelevant test useless.
Right now, it may be irrelevant and non-representative, but it can
catch compiler FP errors. Adding a huge bracket or disabling
FP-contract will remove even that small benefit.
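
For reference, a tolerance check of the kind under discussion might look like the following C sketch. The helper within_tolerance and its parameters are hypothetical and are not the test-suite's actual comparison tool; the sketch accepts a value if it is within either an absolute or a relative bound of the reference.

```c
#include <assert.h>

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Returns 1 if `value` is within abs_tol of `reference`, or within
 * rel_tol * |reference|; returns 0 otherwise (including for NaN). */
int within_tolerance(double reference, double value,
                     double abs_tol, double rel_tol) {
    assert(abs_tol >= 0.0 && rel_tol >= 0.0);
    double diff = abs_d(reference - value);
    if (diff <= abs_tol)
        return 1;
    return diff <= rel_tol * abs_d(reference);
}
```

The relative bound scales with the magnitude of the reference value, so a single setting such as 1e-8 can be applied across outputs of very different sizes; the absolute bound covers reference values at or near zero.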

Right now, the tests have one value, which happens to be identical in
virtually all platforms. This means the compiler is pretty good at
keeping the semantics and lucky in keeping the same precision. But we
both know this is wrong.

And now we have a chance to make those tests better. Not more accurate
per se, but more accurately testing the compiler. There is a big
difference here.

If we change the semantics of the code (mul -> sub), we're killing the
original test. If we increase the tolerance without analysis or
disable default passes, we're killing any chance to spot compiler
problems in the future.

> From a user-experience perspective, this can be very unfortunate. It can be
> hard to understand why compiler optimizations, or different compilers, produce
> executables that produce different outputs for identical input configurations.
> It contributes to feelings that floating point is hard and confusing.

On the contrary, it serves as an education that FP is a finite
representation of real numbers, and people shouldn't be expecting to
get byte-exact values anyway. I have strong reservations against
scientific code that doesn't take into account rounding issues, error
calculations, and that takes the results at face value. It's like
running one single Monte Carlo simulation and taking international
politics decisions based on that result.

> However, not using the contractions also leads to equally-confusing
> performance discrepancies between our compiler and others (and between the
> observed and expected performance).

Let's not mix conformance and performance. Different compilers have
different flags and behave differently. Enabling FP-contract in LLVM
has *nothing* to do with what GCC does, but to do with "what's a
better strategy for LLVM". We have refrained from following GCC
blindly for a number of years and it would be really sad if we started
now.

If FP-contract=on is a good decision for LLVM, on merits of precision,
performance and overall quality, then let's do it. If not, then let's
put it under some flag and tell all people comparing with GCC to use
that flag.

But if we do go with it, we need to make sure our current tests don't
just break and get hidden in a corner. GCC compatibility
isn't *that* important.

I'm not advocating against turning it on, I'm advocating against the
easy path of hiding the tests. We might just as well remove them.

I'll reply to Sebastian in a more practical way, but I wanted to make
it clear that we're talking about the test and not the transformation
itself, which needs to be analysed on its own merits, not on what GCC
does.

cheers,
--renato
