Abe Skolnik via llvm-dev
2016-Oct-05 22:29 UTC
[llvm-dev] test-suite: a new proposal for how to move forward to make "test-suite" more automatic, more flexible, and more maintainable, especially WRT reference outputs
Dear all,
Today I had an idea that might satisfy all the needs for improvement we
currently have "on the plate" WRT the in-repository sizes of reference outputs
and the issues surrounding FP optimizations: how to allow them while still
allowing test programs in "test-suite" whose output[s] depend upon FP
computations [and for which relatively small changes in FP accuracy, whether
up/more-accurate or down/less-accurate, change the actual observed output].
For non-FP-dependent, fully-deterministic programs, we can choose the shortest
[in # of bytes as reported by "ls"] of the following:
* hash
* compressed output
* raw output
[in increasing order of "likely" size]
... or we can establish some minimum differentiating factors, e.g. "compressed
output must be at least 2x smaller than raw output, otherwise stick to raw
output" and "hash must be at least 10x smaller than compressed output,
otherwise stick to compressed output". If needed/{strongly desired}, the rules
can even be a little more complicated than that, e.g. "compressed output must
be at least 2x smaller than raw output OR at least 4096 bytes smaller than raw
output, otherwise stick to raw output".
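
The selection rules above can be sketched as a small helper. This is only an illustration, not part of the proposal: the function name, the use of gzip and SHA-256, and the choice to compare the hash against whichever representation survived the first rule are all assumptions; only the 2x, 4096-byte, and 10x thresholds come from the text.

```python
import gzip
import hashlib


def choose_reference_format(raw: bytes,
                            min_ratio: float = 2.0,
                            min_saving: int = 4096,
                            hash_ratio: float = 10.0) -> str:
    """Return "hash", "compressed", or "raw" for a deterministic program's output.

    Hypothetical helper: gzip and SHA-256 are stand-ins for whatever
    compressor and hash the community would actually pick.
    """
    compressed = gzip.compress(raw)
    # Size of the hash as it would be stored: a hex string in a text file.
    digest_size = len(hashlib.sha256(raw).hexdigest())

    # Rule from the proposal: compressed must be at least 2x smaller than raw
    # OR at least 4096 bytes smaller than raw, otherwise stick to raw.
    compressed_wins = (len(compressed) * min_ratio <= len(raw)
                       or len(raw) - len(compressed) >= min_saving)
    choice, size = (("compressed", len(compressed)) if compressed_wins
                    else ("raw", len(raw)))

    # Rule from the proposal: hash must be at least 10x smaller, otherwise keep
    # the previous choice. Comparing against "raw" when compression lost is an
    # assumption; the proposal only states the compressed case.
    if digest_size * hash_ratio <= size:
        return "hash"
    return choice
```

With these thresholds, a few bytes of output stay raw, a couple of kilobytes of repetitive output are stored compressed, and only large outputs fall back to a hash.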
For programs that _are_ either FP-dependent, not-fully-deterministic, or both, I
propose that we shall only choose from the set {compressed output, raw output}
because:
1) small-enough variation in the result is expected, normal, and tolerated
and
2) the raw reference output will then be available on the "lit"-running host
[after decompression, if needed], so the "fpcmp" program can be told how much
tolerance to allow for each run [a hash, by contrast, can only be checked for
exact equality].
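
To make point 2 concrete, here is a minimal sketch of a tolerance-based comparison. It is not the actual "fpcmp" implementation: the function name, the whitespace tokenization, and the parameter names are assumptions made for illustration. It only shows why the raw reference text has to be available for a tolerance to be applied.

```python
import math


def outputs_match(reference: str, actual: str,
                  rel_tol: float = 0.0, abs_tol: float = 0.0) -> bool:
    """Compare two outputs token by token, allowing slack on numeric tokens.

    Hypothetical stand-in for a tolerance-aware diff; tokens are split on
    whitespace, which is an assumption rather than fpcmp's real behaviour.
    """
    ref_tokens = reference.split()
    act_tokens = actual.split()
    if len(ref_tokens) != len(act_tokens):
        return False
    for ref, act in zip(ref_tokens, act_tokens):
        try:
            ref_val, act_val = float(ref), float(act)
        except ValueError:
            # At least one token is not a number: require an exact match.
            if ref != act:
                return False
            continue
        if not math.isclose(ref_val, act_val, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True
```

For example, `outputs_match("sum = 1.000000", "sum = 1.000001", rel_tol=1e-5)` accepts the small drift, while the same call with the default zero tolerance rejects it.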
If we only choose from the set {compressed ref. output, raw ref. output} for
these tests, then it should be relatively easy to run some tests with
output-changing FP optimizations enabled, since those runs won't depend on the
{no-output-changing-FP-optimizations} build having run first. Although Hal's
suggestion to have the {no-output-changing-FP-optimizations} build produce the
output that will be analyzed by the {output-changing FP optimizations enabled}
builds is an excellent one, implementing it in the context of "lit" seems to be
considerably more difficult than we had hoped. If anybody reading this knows
how to make "lit" start one test only after another one has finished, please
chime in.
If compressed ref. outputs are accepted by the community, then please let me
know which of the following formats it would be acceptable for "test-suite" to
depend on being able to decompress:
* bz2
* gzip
* xz
I'm perfectly willing to write [a] wrapper[s] that will probe the system for
programs that can decompress the chosen format and will choose the best one.
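
One possible shape for such a wrapper is sketched below. Everything in it is an assumption made for illustration: the function name, the mapping from file suffix to tool, and the fallback to Python's standard-library decompressors when no external program is found on the PATH.

```python
import bz2
import gzip
import lzma
import shutil
import subprocess
from pathlib import Path

# File suffix -> (external tool to probe for, Python stdlib fallback).
# The suffix convention is hypothetical, not something test-suite defines.
DECOMPRESSORS = {
    ".gz": ("gzip", gzip.decompress),
    ".bz2": ("bzip2", bz2.decompress),
    ".xz": ("xz", lzma.decompress),
}


def read_reference(path: Path) -> bytes:
    """Return the raw reference output stored at `path`, decompressing if needed."""
    entry = DECOMPRESSORS.get(path.suffix)
    if entry is None:
        # No known compression suffix: treat the file as a raw reference output.
        return path.read_bytes()
    tool, fallback = entry
    exe = shutil.which(tool)  # probe the system for the external program
    if exe is not None:
        # -d decompresses, -c writes to stdout and leaves the file in place.
        result = subprocess.run([exe, "-dc", str(path)],
                                check=True, stdout=subprocess.PIPE)
        return result.stdout
    # No external tool available: fall back to the standard library.
    return fallback(path.read_bytes())
```

Because "lit" itself runs under Python, the standard-library fallback would keep the suite usable on hosts without gzip, bzip2, or xz installed.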
Regards,
Abe
Kristof Beyls via llvm-dev
2016-Oct-06 09:02 UTC
Hi Abe,
My 2 cents:
I have been using the test-suite mainly in benchmarking mode as a convenient way
to track performance changes in top-of-trunk.
I've observed that some of the programs (IIRC, especially the ones in
SingleSource/Benchmarks/Polybench/) produce a lot of output (megabytes).
This caused a lot of noise in performance measurements, as the execution time
was dominated by printing out the data, rather than the actual useful
computations. Renato removed the worst noise in http://reviews.llvm.org/D10991.
That experience made me think that for the programs in the test-suite, ideally
they should print out only a small amount of output to be checked.
For example, individual programs that output a lot of data could be adapted to
print only a summary/aggregate of the data, one that is likely to change if a
miscomputation happens.
If we could go in that direction, I don't see much need for storing hashes
or even compressed output as reference data.
I think that needing compressed reference data may make the test-suite ever so
slightly harder to set up: another dependency on an external tool. Not that I
can imagine that having a dependency on e.g. gzip would be problematic on any
platform.
Anyway, I thought I'd just share my opinion that, ideally, the programs in the
test-suite would only produce small outputs, to avoid noisy benchmark results.
If that is a direction we could go in, there may not be much need for storing
hashes or compressed reference output.
Thanks,
Kristof
Sebastian Pop via llvm-dev
2016-Oct-06 15:11 UTC
Kristof, I agree with your point of view.
There is a very easy way to output only one double from each Polybench program:
- compile the kernel with -ffp-contract=off and -fno-fast-math
- add a "+" reduction loop over all the elements in the output array (also
  compiled with strict FP computations, so that the output is deterministic)
- print the result of the reduction instead of printing the full array.
Thanks,
Sebastian
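
The reduction step described above can be illustrated as follows. The real change would be made in C inside each Polybench kernel; this Python version, including the function name and the nested-list layout of the output array, is only an assumed sketch of the idea.

```python
def checksum(output_array):
    """Sum every element of a 2-D output array in a fixed row-major order.

    Hypothetical illustration of the "+" reduction; the accumulation order is
    fixed so that the same inputs always produce the same floating-point sum.
    """
    total = 0.0
    for row in output_array:
        for value in row:
            total += value
    return total


# Instead of printing every element, a program would print a single value.
result = [[1.0, 2.0], [3.0, 4.5]]
print(f"{checksum(result):.6f}")
```

A plain sum is cheap and keeps the reference output to one line, but errors that cancel each other out would go unnoticed, so it is a trade-off rather than a complete check.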