thr3ads.net - llvm dev - [llvm-dev] [test-suite] making the test-suite succeed with "-Ofast" and "-ffp-contract=on" [Oct 2016]

Sebastian Pop via llvm-dev

2016-Oct-08 00:34 UTC

[llvm-dev] [test-suite] making the test-suite succeed with "-Ofast" and "-ffp-contract=on"

Hi,

I would like to provide a summary of the different proposals on how to
fix the test-suite to make it succeed when specifying extra CFLAGS
"-Ofast" and "-ffp-contract=on".  I would like to expose the issue and
the proposed ways to fix it to other potential reviewers who could
provide extra feedback.  We also need to decide which proposal (or
combination of proposals) to implement and commit.

Proposal 1: https://reviews.llvm.org/D25277
Modify the CMake files to compile and run each of these benchmarks
twice: once with the user-specified CFLAGS, and once with
-ffp-contract=off added.  Record on disk the full output of both runs
and compare them with FP_TOLERANCE.  Hash the output of the run with
-ffp-contract=off and match it exactly against the reference output.
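
The two checks described above can be sketched as follows. This is only
an illustration in Python, not the CMake change under review: the
function names are hypothetical, math.isclose stands in for fpcmp, and
the outputs are assumed to be whitespace-separated numbers.

```python
import hashlib
import math


def parse_numbers(text):
    """Split a benchmark's textual output into a flat list of floats."""
    return [float(tok) for tok in text.split()]


def within_tolerance(out_a, out_b, rel_tol, abs_tol=0.0):
    """Tolerance comparison of two outputs (a stand-in for fpcmp)."""
    a, b = parse_numbers(out_a), parse_numbers(out_b)
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
               for x, y in zip(a, b))


def matches_reference_hash(output, reference_md5):
    """Exact check: the MD5 of the output must equal the recorded hash."""
    return hashlib.md5(output.encode()).hexdigest() == reference_md5


def check_benchmark(out_user_flags, out_contract_off, reference_md5, rel_tol):
    """Hypothetical driver for one benchmark under Proposal 1.

    The -ffp-contract=off run must hash to the reference, and the run
    with the user's CFLAGS must agree with it within the tolerance.
    """
    return (matches_reference_hash(out_contract_off, reference_md5)
            and within_tolerance(out_user_flags, out_contract_off, rel_tol))
```

Only the -ffp-contract=off output is hashed; the output produced with
the user's CFLAGS is never expected to match bit for bit.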

The good for Proposal 1:
- changes contained in the build system: no change to the code of the benchmarks
- runs benchmarks under an extra configuration with CFLAGS += -ffp-contract=off

The bad for Proposal 1:
- compilation time will double
- running time on the device will double
- build system is more complex
- the build directory goes from 300M to 1.2G due to the extra
reference outputs recorded under -ffp-contract=off
- when running the test-suite on small devices, it will cost 1G more
transfer over the network

Proposal 2: https://reviews.llvm.org/D25346
Like Proposal 1, except that no extra files are written to disk (and
so none are transferred over the network from the device to the host
that does the fpcmp and hashing).  The outputs of both the normal
compilation and the kernel compiled under "#pragma STDC FP_CONTRACT OFF"
are computed and compared on the device running the benchmark.  The
output of -ffp-contract=off is written to disk and, as currently done
in the test-suite, hashed and matched exactly against the reference
output.
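
A rough sketch of the in-process comparison follows, written in Python
for brevity. Python has no equivalent of "#pragma STDC FP_CONTRACT", so
the contracted kernel is emulated here by computing a*b + c exactly with
rationals and rounding once, which is what a fused multiply-add does.
The kernel and all names are hypothetical and are not taken from D25346.

```python
from fractions import Fraction
import math


def fused_multiply_add(a, b, c):
    """Emulate an FMA: compute a*b + c exactly, then round once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def kernel(a, b, c, contract):
    """Hypothetical element-wise kernel: out[i] = a[i] * b[i] + c[i].

    contract=True models the contracted build (one rounding per element);
    contract=False models FP_CONTRACT OFF (the product is rounded first).
    """
    if contract:
        return [fused_multiply_add(x, y, z) for x, y, z in zip(a, b, c)]
    return [x * y + z for x, y, z in zip(a, b, c)]


def outputs_agree(out, ref, rel_tol, abs_tol):
    """Extra loop over both output arrays, comparing with a tolerance."""
    return len(out) == len(ref) and all(
        math.isclose(p, q, rel_tol=rel_tol, abs_tol=abs_tol)
        for p, q in zip(out, ref))
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the uncontracted kernel
returns exactly 0.0 while the fused one returns -2^-54. A bit-exact
comparison therefore fails, but a comparison with a small absolute
tolerance passes.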

The good for Proposal 2:
- no modifications to CMake and Makefiles
- no extra space to store the extra reference output
- tests both the mode specified by the user's CFLAGS (e.g., fast-math)
and -ffp-contract=off

The bad for Proposal 2:
- compilation time will double: e.g., Polly will optimize both kernels
- memory requirements on the device will almost double: one extra
output array is added; the input arrays are not modified, so there is
no need to duplicate them
- compute time on the device will more than double: the kernel runs
twice, plus an extra loop over both outputs to compare them with
FP_TOLERANCE
- requires modifications to the code of the benchmarks: some
benchmarks may not be easily modified and will need to be run only
under -ffp-contract=off (as in Proposal 3)

Proposal 3: https://reviews.llvm.org/D25351
Modify the Makefiles and CMake files to explicitly specify the flags
under which the results will match the recorded reference output.

The good for Proposal 3:
- no modifications to the benchmarks
- minimal modifications to the build system

The bad for Proposal 3:
- these benchmarks will not be tested with -ffp-contract=on: exact
matching of the reference output requires -ffp-contract=off
- adding more tests (as in Proposals 1 and 2) is actually a good thing
for the test-suite

I would like to invite other people to review the above proposals and
suggest a way forward on fixing the current state of the test-suite
when running under CFLAGS="-Ofast" and "-ffp-contract=on".  Once
consensus is achieved, I am willing to implement and follow up with
addressing all reviews necessary to commit the change to the
test-suite.

Thank you,
Sebastian

Hal Finkel via llvm-dev

2016-Oct-08 00:56 UTC


I prefer proposal 1 (although, to be fair, it was something I suggested). Being
in the business of trying to heavily modify every benchmark that does
floating-point computation, as in proposal 2, does not seem to scale well, and
can't always be done regardless.

We can make some effort to reduce the size of the problems being computed by
some of the benchmarks (e.g. Polybench); I think that is reasonable and will
help with the extra space requirements. That having been said, functionally
speaking, our test suite is at least an order of magnitude too small, and so my
sympathy is somewhat limited. We're going to have to find a way to execute
the test suite in stages on smaller devices to limit the peak usage, if not
because of this then because we will have added a lot more test applications
and benchmarks in the future.

 -Hal
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Matthias Braun via llvm-dev

2016-Oct-08 01:28 UTC


- First: I don't think we can find a 100% solution for the -ffp-contract=on
differences; fpcmp with tolerances won't work on the output of oggenc.
Luckily this seems to be the only problematic benchmark today. But at least for
that one I see no better solution than adding the -ffp-contract=off switch.

- We should consider Polybench to be the problem here! Benchmarks that just run
for a few seconds and produce hundreds of megabytes of output are useless as
compiler/CPU benchmarks (time is really spent in libc, the kernel and waiting
for disks). In the case of well-behaved benchmarks Proposal 1 is unnecessary: we
can just ship the reference results together with the benchmark and use fpcmp
with tolerances, as we do with most other benchmarks today. We just don't
really want to do that in the case of Polybench because the output is so huge,
so instead we went for just shipping an md5sum of the output, which now fails in
combination with floating-point accuracy swings, starting this whole
discussion...

- Because of the nature of Polybench I'd rather see Proposal 2 implemented.
Compilation time of Polybench is small compared to many of the other benchmarks,
and if we run into memory issues we can reduce the size of the arrays to
normalize runtimes (so far I have no reason to believe we do though; looking at
some random Polybench benchmarks, by default they seem to create a 1024*1024
array of doubles, which should only be 8 MB per array). And hey, if we modify
the benchmarks anyway we could also add some checksumming to the code (maybe
bitcasting the doubles to integers, adjusting for endianness and XOR'ing them
together is enough?) and avoid all the I/O.
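
The checksumming idea in the last sentence could look roughly like this.
It is a sketch in Python rather than in the benchmarks' C, and
xor_checksum is a hypothetical name. Packing with an explicit
little-endian format plays the role of "adjusting for endianness".

```python
import struct


def xor_checksum(values):
    """XOR together the IEEE-754 binary64 bit patterns of `values`."""
    checksum = 0
    for v in values:
        # "<d" writes the double as 8 little-endian bytes on any host;
        # "<Q" reads those same bytes back as an unsigned 64-bit integer.
        (bits,) = struct.unpack("<Q", struct.pack("<d", v))
        checksum ^= bits
    return checksum
```

Two caveats apply. XOR is order-independent and two identical values
cancel each other, so this is a weaker check than a hash of the full
output. It is also still a bit-exact check, so it changes whenever
contraction changes a single output bit, just as the md5sum does.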

- I personally could live with Proposal 3 on the grounds of just declaring
Polybench a problematic benchmark, so -ffp-contract=off is fine as a stopgap
measure, relying on the fact that we have several other benchmarks that have
smaller reference outputs and use fpcmp correctly. Of course Proposal 2 is the
saner solution here.

- Matthias

Hal Finkel via llvm-dev

2016-Oct-08 01:46 UTC


Another aspect to this is that we should have this kind of infrastructure for
other purposes as well. We have a similar lack of testing for -ffast-math. We
don't even do a good job of running the test suite at -O1 (i.e. we lack good
buildbot coverage for it). -O2 and -O3 are much better tested. More regular
testing at -O0 (especially with -g to pick up crashes in our debug-info
generation logic) is needed as well.

 -Hal
