thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] test-suite: a new proposal for how to move forward to make "test-suite" more automatic, more flexible, and more maintainable, especially WRT reference outputs [Oct 2016]

Sebastian Pop via llvm-dev

2016-Oct-06 15:11 UTC

[llvm-dev] test-suite: a new proposal for how to move forward to make "test-suite" more automatic, more flexible, and more maintainable, especially WRT reference outputs

On Thu, Oct 6, 2016 at 5:02 AM, Kristof Beyls via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hi Abe,
>
> My 2 cents:
> I have been using the test-suite mainly in benchmarking mode as a convenient
> way to track performance changes in top-of-trunk.
> I've observed that some of the programs (IIRC, especially the ones in
> SingleSource/Benchmarks/Polybench/) produce a lot of output (megabytes).
> This caused a lot of noise in performance measurements, as the execution
> time was dominated by printing out the data, rather than the actual useful
> computations. Renato removed the worst noise in
> http://reviews.llvm.org/D10991.
>
> That experience made me think that for the programs in the test-suite,
> ideally they should print out only a small amount of output to be checked.
> For example, by adapting individual programs that output a lot of data to
> only print a summary/aggregate of the data, that somehow is likely to change
> when a miscomputation happened.
>
> If we could go in that direction, I don't see much need for storing hashes
> or even compressed output as reference data.
> I think that needing compressed reference data may make the test-suite ever
> so slightly harder to set up: another dependency on an external tool. Not
> that I can imagine that having a dependency on e.g. gzip would be
> problematic on any platform.
>
> Anyway, I thought I'd just share my opinion of it being ideal that the
> programs in the test-suite would only produce small outputs, to avoid noisy
> benchmark results. If that would be a direction we could go into, there may
> not be much needed for storing hashes or compressed reference output.
>
Kristof, I agree with your point of view.

There is a very easy way to output only one double from the polybench:
- compile the kernel with fp-contract=off and -fno-fast-math
- add a "+" reduction loop of all the elements in the output array
(also compiled with strict FP computations such that the output is
deterministic)
- print the result of the reduction instead of printing the full array.

Thanks,
Sebastian
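
[Editor's sketch] The reduction described in the message above could look roughly like the following C fragment. This is only an illustration, not code from the test-suite: the names reduce_sum and print_checksum are hypothetical, and the output array is assumed to be reachable as a flat, row-major block of doubles. The sum is reproducible only if the summation order is fixed and the compiler is not allowed to reassociate or contract the floating-point operations, which is why the message asks for strict FP flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: add up all elements in a fixed, sequential order.
   The fixed order matters because floating-point addition is not
   associative; a different order can give a different result. */
double reduce_sum(const double *data, size_t count)
{
    assert(data != NULL || count == 0);
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += data[i];
    return sum;
}

/* Hypothetical replacement for a routine that dumps the whole array:
   print one value with enough digits to round-trip a double. */
void print_checksum(const double *data, size_t count)
{
    printf("%.17g\n", reduce_sum(data, count));
}
```

A benchmark would call print_checksum on its result array where it previously printed every element, so the reference output shrinks to a single line.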

Sebastian Pop via llvm-dev

2016-Oct-06 15:15 UTC

[llvm-dev] test-suite: a new proposal for how to move forward to make "test-suite" more automatic, more flexible, and more maintainable, especially WRT reference outputs

Adding Tobi in CC to get his review about the proposed change to Polybench.

Thanks,
Sebastian

On Thu, Oct 6, 2016 at 11:11 AM, Sebastian Pop <sebpop.llvm at gmail.com> wrote:
> On Thu, Oct 6, 2016 at 5:02 AM, Kristof Beyls via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Hi Abe,
>>
>> My 2 cents:
>> I have been using the test-suite mainly in benchmarking mode as a convenient
>> way to track performance changes in top-of-trunk.
>> I've observed that some of the programs (IIRC, especially the ones in
>> SingleSource/Benchmarks/Polybench/) produce a lot of output (megabytes).
>> This caused a lot of noise in performance measurements, as the execution
>> time was dominated by printing out the data, rather than the actual useful
>> computations. Renato removed the worst noise in
>> http://reviews.llvm.org/D10991.
>>
>> That experience made me think that for the programs in the test-suite,
>> ideally they should print out only a small amount of output to be checked.
>> For example, by adapting individual programs that output a lot of data to
>> only print a summary/aggregate of the data, that somehow is likely to change
>> when a miscomputation happened.
>>
>> If we could go in that direction, I don't see much need for storing hashes
>> or even compressed output as reference data.
>> I think that needing compressed reference data may make the test-suite ever
>> so slightly harder to set up: another dependency on an external tool. Not
>> that I can imagine that having a dependency on e.g. gzip would be
>> problematic on any platform.
>>
>> Anyway, I thought I'd just share my opinion of it being ideal that the
>> programs in the test-suite would only produce small outputs, to avoid noisy
>> benchmark results. If that would be a direction we could go into, there may
>> not be much needed for storing hashes or compressed reference output.
>>
>
> Kristof, I agree with your point of view.
>
> There is a very easy way to output only one double from the polybench:
> - compile the kernel with fp-contract=off and -fno-fast-math
> - add a "+" reduction loop of all the elements in the output array
> (also compiled with strict FP computations such that the output is
> deterministic)
> - print the result of the reduction instead of printing the full array.
>
> Thanks,
> Sebastian

Renato Golin via llvm-dev

2016-Oct-06 15:38 UTC

[llvm-dev] [cfe-dev] test-suite: a new proposal for how to move forward to make "test-suite" more automatic, more flexible, and more maintainable, especially WRT reference outputs

On 6 October 2016 at 16:11, Sebastian Pop via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
> There is a very easy way to output only one double from the polybench:
> - compile the kernel with fp-contract=off and -fno-fast-math

Sebastian, please stop crossing the wires. This is a separate discussion.

> - add a "+" reduction loop of all the elements in the output array
> (also compiled with strict FP computations such that the output is
> deterministic)

addition can saturate/overflow and lose precision, especially if we
have hundreds of thousands of results or if the type is float, not
double. Whatever the aggregation function we use has to be meaningful.

One approach I used in the past was to aggregate in blocks, sized so that the
results weren't likely to saturate/overflow/lose precision, i.e. the end
result had a similar magnitude to the individual results.

This gave us huge benefits in I/O and comparison times, and can work
with polybench, but someone will have to go through it and make sure
the aggregated numbers are not orders of magnitude greater than the
individual results.

cheers,
--renato
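
[Editor's sketch] One way to read the block-wise aggregation described in the message above is to emit one partial sum per fixed-size block instead of a single grand total. The C fragment below is a sketch under that assumption only; the name reduce_blocks, the flat array layout, and the idea of a caller-chosen block size are all hypothetical and not taken from the thread or from the test-suite. Choosing a block size that keeps each partial sum close in magnitude to the individual elements is still a per-benchmark judgment call, as the message points out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write one partial sum per block of `block`
   consecutive elements into `out`, and return how many sums were written.
   `out` must have room for ceil(count / block) values. The last block may
   be shorter than `block`. */
size_t reduce_blocks(const double *data, size_t count, size_t block,
                     double *out)
{
    assert(block > 0);
    size_t nblocks = 0;
    for (size_t start = 0; start < count; start += block) {
        /* Clamp the end of the final block to the end of the array. */
        size_t end = (count - start < block) ? count : start + block;
        double sum = 0.0;
        for (size_t i = start; i < end; i++)
            sum += data[i];
        out[nblocks++] = sum;
    }
    return nblocks;
}
```

The program would then print the partial sums rather than every element, which cuts the output by roughly a factor of the block size while keeping each printed value at a scale comparable to its inputs.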

Sebastian Pop via llvm-dev

2016-Oct-06 18:17 UTC

[llvm-dev] [cfe-dev] test-suite: a new proposal for how to move forward to make "test-suite" more automatic, more flexible, and more maintainable, especially WRT reference outputs

On Thu, Oct 6, 2016 at 11:38 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 6 October 2016 at 16:11, Sebastian Pop via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>> There is a very easy way to output only one double from the polybench:
>> - compile the kernel with fp-contract=off and -fno-fast-math
>
> Sebastian, please stop crossing the wires. This is a separate discussion.

We need to get deterministic output for all possible combinations of
CFLAGS the users will compile the test-suite with.

>> - add a "+" reduction loop of all the elements in the output array
>> (also compiled with strict FP computations such that the output is
>> deterministic)
>
> addition can saturate/overflow and lose precision, especially if we
> have hundreds of thousands of results or if the type is float, not
> double. Whatever the aggregation function we use has to be meaningful.

Agreed.
I'm also fine with using any stable hashing function and linking the
polybench tests against it.
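
[Editor's sketch] The thread does not name a particular hash. As one illustration of a "stable hashing function", the C fragment below applies 64-bit FNV-1a to the bit patterns of the output values. The function name hash_doubles is hypothetical, and the sketch assumes 64-bit IEEE 754 doubles. Bytes are fed in a fixed order so the result does not depend on host endianness. Note that a hash only matches when the results are bit-identical, so values such as 0.0 and -0.0 hash differently and any rounding difference changes the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 64-bit FNV-1a over the bit patterns of `count`
   doubles, least significant byte of each value first. */
uint64_t hash_doubles(const double *data, size_t count)
{
    assert(sizeof(double) == sizeof(uint64_t));
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < count; i++) {
        uint64_t bits;
        memcpy(&bits, &data[i], sizeof bits); /* well-defined type punning */
        for (int b = 0; b < 8; b++) {
            h ^= (bits >> (8 * b)) & 0xffu;
            h *= 0x100000001b3ULL;            /* FNV 64-bit prime */
        }
    }
    return h;
}
```

A test would print the returned value (for example in hexadecimal) as its only output, and the reference file would hold that single number.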
