Hi,

I'm trying to run the benchmark suite:
http://llvm.org/docs/TestingGuide.html#test-suite-quickstart

I'm doing it the LNT way, as described at:
http://llvm.org/docs/lnt/quickstart.html

I don't know what to expect, but the results seem to be quite noisy and unstable. E.g. I've done two runs on two different commits that only differ by a space in CODE_OWNERS.txt, on my 12-core Ubuntu 14.04 machine, with:

  lnt runtest nt --sandbox SANDBOX --cc <path-to-my-clang> --test-suite /data/repo/test-suite -j 8

I then get the following top execution-time regressions:
http://i.imgur.com/sv1xzlK.png

The numbers bounce around a lot if I do more runs.

Given the amount of noise I see here, I don't know how I would sort out significant regressions when I actually make a real change in the compiler.

Are the above results expected?

How is this supposed to be used?

As a bonus question, if I instead run the benchmarks with an added -m32:

  lnt runtest nt --sandbox SANDBOX --cflag=-m32 --cc <path-to-my-clang> --test-suite /data/repo/test-suite -j 8

I get three failures:

--- Tested: 2465 tests --
FAIL: MultiSource/Applications/ClamAV/clamscan.compile_time (1 of 2465)
FAIL: MultiSource/Applications/ClamAV/clamscan.execution_time (494 of 2465)
FAIL: MultiSource/Benchmarks/DOE-ProxyApps-C/XSBench/XSBench.execution_time (495 of 2465)

Is this known/expected, or am I doing something stupid?

Thanks,
Mikael
Hi Mikael,

Some noisiness in benchmark results is expected, but the numbers you see seem to be higher than I'd expect.
A number of tricks people use to get lower-noise results are (with the lnt runtest nt command-line options to enable them in brackets):

* Build the benchmarks in parallel, but run the benchmark programs at most one at a time (--threads 1 --build-threads 6).
* Make LNT use Linux perf to get more accurate timings for short-running benchmarks (--use-perf=1).
* Pin the running benchmark to a specific core, so the OS doesn't move the benchmark process from core to core (--make-param=RUNUNDER=taskset -c 1).
* Only run the programs that are marked as benchmarks; some of the tests in the test-suite are not intended to be used as benchmarks (--benchmarking-only).
* Make sure each program gets run multiple times, so that LNT has a higher chance of recognizing which programs are inherently noisy (--multisample=3).

I hope this is the kind of answer you were looking for?
Do the above measures reduce the noisiness to acceptable levels for your setup?

Thanks,

Kristof
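Putting the flags above together, a single invocation would look roughly like this (a sketch only; the thread counts, the core passed to taskset, and the shell quoting of the RUNUNDER make parameter may need adjusting for your machine):

  lnt runtest nt \
      --sandbox SANDBOX \
      --cc <path-to-my-clang> \
      --test-suite /data/repo/test-suite \
      --threads 1 --build-threads 6 \
      --use-perf=1 \
      --make-param="RUNUNDER=taskset -c 1" \
      --benchmarking-only \
      --multisample=3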
Two other things:

1) I get massively more stable execution times on 16.04 than on 14.04, on both x86 and ARM, because 16.04 does far fewer gratuitous moves of the process from one core to another, even without explicit pinning.

2) Turn off ASLR: "echo 0 > /proc/sys/kernel/randomize_va_space". As well as giving stable addresses for debugging repeatability, it also reduces execution-time variability caused by "random" conflicts in caches, hash collisions in branch prediction or the BTB, and maybe even the uop cache.
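The randomize_va_space switch above is global and needs root; for reference, the usual alternatives on Linux (generic kernel/util-linux knobs, not something specific to this thread) are:

  # same global switch via sysctl; the common default to restore afterwards is 2
  sysctl -w kernel.randomize_va_space=0

  # or disable ASLR for a single process only (util-linux setarch);
  # ./run_my_benchmark is a placeholder for whatever you actually run
  setarch $(uname -m) --addr-no-randomize ./run_my_benchmark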
In addition to all the good points given in this thread:

- Nowadays I'd recommend using 'lnt runtest test-suite' instead of 'nt', to use the cmake/lit-based variant.
- Alternatively, if you just need an A/B comparison, run the benchmarks directly as described in http://www.llvm.org/docs/TestSuiteMakefileGuide.html#running-the-test-suite-via-cmake and use test-suite/utils/compare.py (a rough sketch of this flow is shown below this message).
- Use --benchmarking-only (lnt) / -DTEST_SUITE_BENCHMARKING_ONLY (cmake) to remove a number of tests that are useless for performance testing (like all the unit tests in there).
- I created a blacklist of benchmarks that are noisy for my target, by rerunning the test-suite a few times with the same compiler. I can feed this blacklist to `utils/compare.py --filter-blacklist`.
- As we are on the topic, I recommend this talk from last year's dev meeting to dampen the expectation that every good compiler transformation must lead to better (or at least neutral) performance: https://www.youtube.com/watch?v=IX16gcX4vDQ&t=24s I think one lesson we should draw from it is that benchmarking can point us at problems, but there is no way around manually checking the assembly differences for the cases where we measured a performance change.

- Matthias
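For the A/B-comparison route, the flow looks roughly like the following (a sketch from memory of the cmake-based test-suite docs; details such as the O3.cmake cache file name and the llvm-lit JSON output flag should be double-checked against the guide linked above):

  # build and run once per compiler, in separate build directories
  mkdir build_a && cd build_a
  cmake -DCMAKE_C_COMPILER=<path-to-clang-A> \
        -DTEST_SUITE_BENCHMARKING_ONLY=ON \
        -C ../test-suite/cmake/caches/O3.cmake \
        ../test-suite
  make -j8
  llvm-lit -j 1 -o results_a.json .

  # repeat with compiler B into build_b/results_b.json, then compare the two runs:
  ../test-suite/utils/compare.py results_a.json results_b.json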
Hi,

On 02/27/2017 10:36 AM, Kristof Beyls wrote:
> I hope this is the kind of answer you were looking for?

Spot on! Thanks!

> Do the above measures reduce the noisiness to acceptable levels for your setup?

I ran with all your suggestions above, and now I have:

regressions:  http://i.imgur.com/kjA2WpG.png
improvements: http://i.imgur.com/WmRlHka.png

for two runs on two commits that only differ by a whitespace change.

Is this as stable as you normally get it? I suppose it's enough to be able to see whether my "real" change messes something up or not.

Thanks,
Mikael
> On Feb 27, 2017, at 1:36 AM, Kristof Beyls via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> * Only build the benchmarks in parallel, but do the actual running of the benchmark code at most one at a time. (--threads 1 --build-threads 6).

This seems critical; I always do that.

> * Make sure each program gets run multiple times, so that LNT has a higher chance of recognizing which programs are inherently noisy (--multisample=3)

This as well, usually with 5 multisamples.

I'd add one thing to this good list: disable frequency scaling / turbo boost. In case of thermal throttling it can skew the results.

-- Mehdi
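One way to do that on Linux (not from the thread; this assumes the cpupower tool is installed and the machine uses the intel_pstate driver):

  # keep all cores at a fixed, maximum frequency (needs root)
  cpupower frequency-set --governor performance

  # disable turbo boost when the intel_pstate driver is in use
  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo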
> On Feb 27, 2017, at 11:42 AM, Matthias Braun via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> - Nowadays I'd recommend using 'lnt runtest test-suite' instead of 'nt', to use the cmake/lit-based variant.
> - Alternatively, if you just need an A/B comparison, run the benchmarks directly as described in http://www.llvm.org/docs/TestSuiteMakefileGuide.html#running-the-test-suite-via-cmake and use test-suite/utils/compare.py

I'm interested whether you can get multi-sample runs with this?

-- Mehdi
Hi,

Thank you to everyone that responded. Good hints!

Now, I'm sure I haven't read every piece of documentation about the test-suite, but don't you think the tips and tricks you've given here should make it into the quick-start web page, to help the next test-suite newbie who wants to run this and get stable results?

E.g. at http://llvm.org/docs/lnt/quickstart.html for the LNT way. There could be a bullet 3 under "Running Tests", or just some extra proposed flags under "2", describing a few things one could do if the results bounce around a lot.

Thanks again,
Mikael
On 1 Mar 2017, at 08:36, Mikael Holmén via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Now, I'm sure I haven't read every piece of documentation about the test-suite, but don't you think the tips and tricks you've given here should make it into the quick-start web page, to help the next test-suite newbie who wants to run this and get stable results?

It definitely should. It's sometimes hard for the non-newbies to figure out what documentation is missing the most, so thank you very much for pointing this out!

I've added some documentation in the patch under review at https://reviews.llvm.org/D30488. Please have a look and leave your comments. I'll leave the patch in review until the end of the week before committing it.

Thanks,

Kristof