thr3ads.net - llvm dev - [llvm-dev] Floating point variance in the test suite [Jun 2021]

Michael Kruse via llvm-dev

2021-Jun-23 19:49 UTC

[llvm-dev] Floating point variance in the test suite

Hi,

I would look forward to seeing the tests improved. I have already had to
fix the fpcmp program twice because of how it backtracked to find the
beginning of a number. If no tolerance is specified, it should be
nearly equivalent to a binary comparison. fpcmp is always used to compare
program output, even with no tolerance specified:

  if(REFERENCE_OUTPUT)
    set(DIFFPROG ${FPCMP})
    if(FP_TOLERANCE)
      set(DIFFPROG "${DIFFPROG} -r ${FP_TOLERANCE}")
    endif()
    if(FP_ABSTOLERANCE)
      set(DIFFPROG "${DIFFPROG} -a ${FP_ABSTOLERANCE}")
    endif()
    llvm_test_verify(WORKDIR ${CMAKE_CURRENT_BINARY_DIR}
      ${DIFFPROG} %o ${REFERENCE_OUTPUT}
    )
  endif()

That is, the only issue should be the misleading message returned by
fpcmp. I was thinking of adding a `-b` switch to fpcmp for a strict binary
comparison, but I don't think the false positives caused by `0` and `0.0`
being considered equal are much of a problem.
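
As a rough illustration of the comparison described above, here is a minimal Python sketch of a tolerance-aware text diff. It is not the fpcmp source: the names `tokens` and `outputs_match`, the regular expression, and the rule "fail only if a difference exceeds both the absolute and the relative tolerance" are assumptions made for the example. With both tolerances left at zero it behaves almost like a byte comparison, except that `0` and `0.0` parse to the same value.

```python
import re

# Hypothetical number pattern: optional sign, digits, fraction, exponent.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

def tokens(text):
    """Split text into ("num", float) and ("text", str) tokens."""
    out, pos = [], 0
    for m in NUMBER.finditer(text):
        if m.start() > pos:
            out.append(("text", text[pos:m.start()]))
        out.append(("num", float(m.group())))
        pos = m.end()
    if pos < len(text):
        out.append(("text", text[pos:]))
    return out

def outputs_match(actual, expected, rel_tol=0.0, abs_tol=0.0):
    """Return True if the two outputs agree within the given tolerances."""
    a, b = tokens(actual), tokens(expected)
    if len(a) != len(b):
        return False
    for (kind_a, val_a), (kind_b, val_b) in zip(a, b):
        if kind_a != kind_b:
            return False
        if kind_a == "text":
            # Non-numeric text must match exactly.
            if val_a != val_b:
                return False
            continue
        diff = abs(val_a - val_b)
        if diff <= abs_tol:
            continue
        # Relative check, scaled by the larger magnitude.
        if diff > rel_tol * max(abs(val_a), abs(val_b)):
            return False
    return True
```

For instance, `outputs_match("x = 0\n", "x = 0.0\n")` is true even with no tolerance, while `outputs_match("x = 1.0", "x = 1.0000001")` only becomes true once a tolerance such as `rel_tol=1e-6` is passed.
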

Add me as a reviewer if you are going to fix these.

Michael



On Wed, Jun 23, 2021 at 12:57, Kaylor, Andrew via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hi everyone,
>
> I’d like to restart an old discussion about how to handle floating point
> variance in the LLVM test suite. This was brought up years ago by
> Sebastian Pop
> (https://lists.llvm.org/pipermail/llvm-dev/2016-October/105730.html) and
> Sebastian did some work to fix things at that time, but I’ve discovered
> recently that a lot of things aren’t working the way they are supposed to,
> and some fundamental problems were never addressed.
>
> The basic issue is that at least some tests fail if the floating point
> calculations don’t exactly match the reference results. In 2016, Sebastian
> attempted to modify the Polybench tests to allow some tolerance, but that
> attempt was not entirely successful. Other tests don’t seem to have a way
> to handle this. I don’t know how many.
>
> Melanie Blower has been trying for some time now to commit a change that
> would make fp-contract=on the default setting for clang (as the
> documentation says it is), but this change caused failures in the test
> suite. Melanie recently committed a change
> (https://reviews.llvm.org/rT24550c3385e8e3703ed364e1ce20b06de97bbeee) which
> overrides the default and sets fp-contract=off for the failing tests, but
> this is not a good long-term solution, as it leaves fp contraction untested
> (though apparently it has been for quite some time).
>
> The test suite attempts to handle floating point tests in several different
> ways (depending on the test configuration):
>
> 1. Tests write floating point results to a text file and the fpcmp utility
>    is used to compare them against reference output. The method allows
>    absolute and relative tolerance to be specified, I think.
>
> 2. Tests write floating point results to a text file which is then hashed
>    and compared against a reference hash value.
>
> 3. (Sebastian’s 2016 change) Tests are run with two kernels, one which is
>    compiled with FMA explicitly disabled and one with the test suite
>    configured options. The results of these kernels are compared with a
>    specified tolerance, then the FMA-disabled results are written to a text
>    file, hashed and compared to a reference output.
>
> I’ve discovered a few problems with this.
>
> First, many of the tests are producing hashed results but using fpcmp to
> compare the hash values. I created
> https://bugs.llvm.org/show_bug.cgi?id=50818 to track this problem. I don’t
> know the configuration well enough to fix it, but it seems like it should
> be a simple problem. If the hash values match, it works (for the wrong
> reasons). It doesn’t allow any FP tolerance, but that seems to be expected
> (hashing and FP tolerance cannot be combined in the configuration files).
> Personally, I don’t like the use of hashing with floating point results,
> so I’d like to get rid of method 2.
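
To make the objection to hashing concrete, here is a small Python sketch. The choice of MD5 and the helper name `digest` are assumptions for illustration only; the two values are the ones from the symm output quoted later in this message. Two outputs that differ only in the last printed digits produce unrelated digests, so the comparison can only ever be exact.

```python
import hashlib

def digest(text):
    """Hash program output the way a hash-based check would (MD5 assumed)."""
    return hashlib.md5(text.encode()).hexdigest()

# Two outputs that differ only in the last two printed digits.
reference = "85644607039.746628\n"
contracted = "85644607039.746643\n"

ref_hash = digest(reference)
new_hash = digest(contracted)

# The digests carry no information about how close the values were.
print(ref_hash)
print(new_hash)
print("match" if ref_hash == new_hash else "mismatch")
```

Once the output has been reduced to a digest, no tolerance can be applied, which is consistent with the configuration not allowing hashing and FP tolerance to be combined.
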
>
> Second, when the third method is used, FMA is being disabled using the STDC
> FP_CONTRACT pragma. Currently, LLVM does not respect this pragma when
> fp-contract=fast is used. This seems to be accepted behavior, but in my
> opinion it’s obviously wrong. A new setting was added recently for
> fp-contract=fast-honor-pragmas. I would very much like to see this work the
> other way -- by default fp-contract=fast should honor the pragmas, and if
> someone needs a setting that doesn’t, that can be added. In any event, the
> relevant information here is that Sebastian’s FMA disabling solution doesn’t
> work for fp-contract=fast. Both kernels are compiled with FMA enabled, so
> their results match, but the test fails the hash comparison because the
> “FMA disabled” kernel really had FMA enabled.
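
As background on why a contracted kernel cannot reproduce a reference generated without FMA, the following Python sketch emulates the two evaluation orders. The function names and input values are chosen for illustration and are not taken from any test; the fused result is emulated with exact rational arithmetic followed by a single rounding, which is what a hardware FMA is specified to do.

```python
from fractions import Fraction

def unfused(a, b, c):
    # Two roundings: a*b is rounded to a double, then the sum is rounded.
    return a * b + c

def fused(a, b, c):
    # One rounding: a*b+c is computed exactly and rounded once at the end.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# Inputs chosen so that the rounding of the product discards 2**-54.
a = b = 1.0 + 2.0**-27
c = -(1.0 + 2.0**-26)

print(unfused(a, b, c))  # 0.0
print(fused(a, b, c))    # 5.551115123125783e-17, i.e. 2**-54
```

Both results are legitimate for the same source expression, so an exact-match or hash-based reference is only valid for one contraction setting.
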
>
> Third, when the third method is used, it’s checking the intermediate
> results using “FP_ABSTOLERANCE” and in some cases FMA exceeds the tolerance
> currently configured. For example, in the Polybench symm test I got this
> output:
>
> A[5][911] = 85644607039.746628 and B[5][911] = 85644607039.746643
> differ more than FP_ABSTOLERANCE = 0.000010
>
> The difference there looks pretty reasonable to me, but because we’re
> looking for a minimal absolute difference, the test failed. Incidentally,
> the test failed with a message that said “fpcmp-target: FP Comparison
> failed, not a numeric difference between '0' and 'b'” because the above
> output got hashed and compared to a reference hash value using the tool
> that expected both hash values to be floating point values. The LNT bots
> don’t even give you this much information, though; as far as I can tell,
> they just tell you the test failed. But I digress.
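
The numbers in the quoted output can be checked directly. The short Python sketch below parses the two printed values as doubles and compares the absolute and relative differences; only the variable names are introduced here. At this magnitude (between 2**36 and 2**37) adjacent doubles are 2**-16, about 1.5e-5, apart, so any two distinct results already exceed an absolute tolerance of 1e-5.

```python
# Values and tolerance copied from the symm output quoted above.
a = float("85644607039.746628")
b = float("85644607039.746643")
FP_ABSTOLERANCE = 0.000010

abs_diff = abs(a - b)
rel_diff = abs_diff / max(abs(a), abs(b))

print(abs_diff)                     # 1.52587890625e-05, exactly 2**-16
print(abs_diff > FP_ABSTOLERANCE)   # True: the absolute check fails
print(rel_diff < 1e-15)             # True: the values agree to about 16 digits
```

A relative tolerance, or an absolute tolerance scaled to the magnitude of the data, would accept this pair, while the fixed absolute tolerance rejects a difference of a single unit in the last place.
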
>
> Finally, a few of the Polybench tests are intending to use the third method
> above but aren’t actually calling the “strict” kernel so they fail with
> fp-contract=on.
>
> So, that’s what I know. There is an additional problem that has never been
> addressed regarding running the test suite with fast-math enabled.
>
> Now I suppose I should say what kind of feedback I’m looking for.
>
> I guess the first thing I want is to find out who is interested in the
> results and behavior of these tests. I assume there are people who care,
> but they get touched so infrequently and are in such a bad state that I
> don’t know if anyone is even paying attention beyond trying to keep the
> bots green. I don’t want to spend a lot of time cleaning up tests that
> aren’t useful to anyone just because they happen to be in the test suite.
>
> Beyond that, I’d like to get general feedback on the strategy that we
> should be adopting in the test suite. In the earlier discussion of this
> issue, the consensus seemed to be that the problems should be addressed on
> a case-by-case basis, doing the “right thing” for each test. This kind of
> implies a level of test ownership. I would like input for each test from
> someone who cares about that test.
>
> Finally, I’d like to hear some general opinions on whether the tests we
> have now are useful and sufficient for floating point behavior.
>
> Thanks,
>
> Andy

Kaylor, Andrew via llvm-dev

2021-Jun-24 15:05 UTC

[llvm-dev] Floating point variance in the test suite

> fpcmp is always used to compare program output, even with no tolerance
> specified

Are you saying fpcmp is used even when the test produces integer or non-numeric
output that has to be compared?

