thr3ads.net - llvm dev - [LLVMdev] New -O3 Performance tester - Use hardware to get reliable numbers [Jan 2014]


Tobias Grosser

2014-Jan-07 18:06 UTC

[LLVMdev] New -O3 Performance tester - Use hardware to get reliable numbers

Hi,

I would like to announce a new set of LNT -O3 performance testers.

In a discussion titled "Question about results reliability in LNT
infrustructure" Anton suggested that one way to get statistically
reliable test results from the LNT infrastructure is to use a larger
sample size (5-10) as well as a more robust statistical test
(Wilcoxon/Mann-Whitney). Another requirement to make the performance
results we get from our testers useful is to have a per-commit
performance run.
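
A minimal sketch of the Mann-Whitney U test mentioned above, using only the Python standard library. It is an illustration, not LNT code: the function names and the sample timings are hypothetical, and the p-value is computed exactly by enumerating every way of splitting the pooled ranks, which is only practical for small samples such as the 5-10 runs discussed here.

```python
from itertools import combinations


def rank_with_ties(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_exact(before, after):
    """Return (U, p) for a two-sided exact Mann-Whitney U test."""
    n1, n2 = len(before), len(after)
    ranks = rank_with_ties(list(before) + list(after))
    mean_u = n1 * n2 / 2

    def u_stat(rank_sum):
        return rank_sum - n1 * (n1 + 1) / 2

    u = u_stat(sum(ranks[:n1]))
    observed = abs(u - mean_u)
    extreme = total = 0
    # Under the null hypothesis every split of the pooled ranks is equally likely.
    for idx in combinations(range(n1 + n2), n1):
        total += 1
        if abs(u_stat(sum(ranks[i] for i in idx)) - mean_u) >= observed - 1e-9:
            extreme += 1
    return u, extreme / total


# Hypothetical execution times (seconds) of one test before and after a commit.
before = [1.00, 1.01, 0.99, 1.02, 1.00]
after = [1.10, 1.12, 1.09, 1.11, 1.13]
u, p = mann_whitney_exact(before, after)
```

With 5 samples per side the smallest two-sided p-value this test can produce is 2/252 (about 0.008), which is one reason a sample size in the 5-10 range matters.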

I would like to announce that I set up 4 identical machines* that
publicly report LNT results for 'clang -O3' at:

http://llvm.org/perf/db_default/v4/nts/machine/34

On average, each run currently covers a group of 3-5 commits. As most
commits do not impact performance, this seems to be enough to track
down performance regressions/changes easily.

The results that have been reported so far seem to provide sufficient
information to catch performance changes. Specifically, when setting the
aggregation function to median, most runs are shown to not impact
performance:

e.g:
http://llvm.org/perf/db_default/v4/nts/19939?num_comparison_runs=10&test_filter=&test_min_value_filter=&aggregation_fn=median&compare_to=19934&submit=Update
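
As a rough illustration of why median aggregation suppresses noise, the sketch below compares two runs by the median of their samples per test. This is not LNT's implementation: the function name, the dictionary layout, and the 1% threshold are assumptions made for the example.

```python
from statistics import median


def compare_runs(baseline_samples, current_samples, threshold=0.01):
    """Map test name -> relative change of the median, for changes above threshold."""
    changes = {}
    for test, base in baseline_samples.items():
        cur = current_samples.get(test)
        if not cur:
            continue  # test missing from the current run
        base_med, cur_med = median(base), median(cur)
        if base_med == 0:
            continue  # avoid dividing by zero for degenerate timings
        delta = (cur_med - base_med) / base_med
        if abs(delta) > threshold:
            changes[test] = delta
    return changes


# Hypothetical samples: test "a" has one noisy outlier, test "b" really got slower.
baseline = {"a": [1.0, 1.0, 5.0], "b": [2.0, 2.0, 2.0]}
current = {"a": [1.0, 1.0, 1.0], "b": [2.2, 2.2, 2.2]}
result = compare_runs(baseline, current)
```

Here the single 5.0 outlier in test "a" does not move the median, so only test "b" is reported; a mean-based comparison would have flagged "a" as well.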

We still have a couple of runs that report performance differences, but
where looking at the performance graph of the changed test cases makes
it very clear that those are false positives due to test case noise.

Here comes the point of this mail. I am currently not sure when I will find
time to improve the LNT infrastructure to take advantage of the data
provided. So if someone else would like to have a look and, e.g., add
the Wilcoxon/Mann-Whitney test, this would be highly appreciated.

I also have a couple more machines. Hence, once the LNT infrastructure
is in place, we can use them to increase the reliability of the results
even more.

Cheers,
Tobias

* They also have sufficiently close performance characteristics when
running LNT tests for the same version

Sean Silva

2014-Jan-08 01:48 UTC

On Tue, Jan 7, 2014 at 11:06 AM, Tobias Grosser <tobias at grosser.es> wrote:
> Hi,
>
> I would like to announce a new set of LNT -O3 performance testers.
>
> In a discussion titled "Question about results reliability in LNT
> infrustructure" Anton suggested that one way to get statistically reliable
> test results from the LNT infrastructure is to use a larger sample size
> (5-10) as well as a more robust statistical test (Wilcoxon/Mann-Whitney).
> Another requirement to make the performance results we get from our testers
> useful is to have a per-commit performance run.
>
> I would like to announce that I set up 4 identical machines* that publicly
> report LNT results for 'clang -O3' at:
>
> http://llvm.org/perf/db_default/v4/nts/machine/34
>
> We currently catch in average groups of 3-5 commits. As most commits
> obviously do not impact performance this seems to be enough to track down
> performance regressions/changes easily.
>
If possible, I think it would be a good idea to filter out commits that
don't affect code generation. This would allow machine resources to be
better used.

Is there some way we can easily filter commits based on whether they affect
code generation or not? Would it be reliable enough to check if the commit
touches any of our integration tests?

As a rough estimate:

sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' | wc -l
706
sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' ./test | wc -l
317

So if this filter is reasonable, we could roughly double our performance
testing coverage: only 317 of the 706 commits from the last month touched
./test.
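
The filter described above could be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the input is assumed to be the text produced by something like `git log --name-only --pretty=format:'commit %H'`, where each commit starts with a `commit <hash>` line followed by the paths it changed, relative to the repository root.

```python
def commits_touching(log_text, prefix="test/"):
    """Return hashes of commits with at least one changed path under prefix."""
    selected = []
    current = None  # hash of the commit currently being scanned
    hit = False     # whether that commit touched a path under prefix
    for line in log_text.splitlines():
        if line.startswith("commit "):
            if current is not None and hit:
                selected.append(current)
            current, hit = line.split(" ", 1)[1], False
        elif line.strip() and line.startswith(prefix):
            hit = True
    if current is not None and hit:
        selected.append(current)
    return selected


# Hypothetical log excerpt in the assumed format.
sample_log = """commit aaa
lib/CodeGen/Foo.cpp
test/CodeGen/foo.ll

commit bbb
docs/index.rst

commit ccc
test/Transforms/bar.ll
"""
to_test = commits_touching(sample_log)
```

With the numbers above, testing only the selected commits would cut the load by a factor of 706/317, or roughly 2.2.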

-- Sean Silva


Tobias Grosser

2014-Jan-08 02:02 UTC

On 01/08/2014 02:48 AM, Sean Silva wrote:
> On Tue, Jan 7, 2014 at 11:06 AM, Tobias Grosser <tobias at grosser.es> wrote:
>
>> Hi,
>>
>> I would like to announce a new set of LNT -O3 performance testers.
>>
>> In a discussion titled "Question about results reliability in LNT
>> infrustructure" Anton suggested that one way to get statistically reliable
>> test results from the LNT infrastructure is to use a larger sample size
>> (5-10) as well as a more robust statistical test (Wilcoxon/Mann-Whitney).
>> Another requirement to make the performance results we get from our testers
>> useful is to have a per-commit performance run.
>>
>> I would like to announce that I set up 4 identical machines* that publicly
>> report LNT results for 'clang -O3' at:
>>
>> http://llvm.org/perf/db_default/v4/nts/machine/34
>>
>> We currently catch in average groups of 3-5 commits. As most commits
>> obviously do not impact performance this seems to be enough to track down
>> performance regressions/changes easily.
>>
>
> If possible, I think it would be a good idea to filter out commits that
> don't affect code generation. This would allow machine resources to be
> better used.
>
> Is there some way we can easily filter commits based on whether they affect
> code generation or not? Would it be reliable enough to check if the commit
> touches any of our integration tests?
>
> As a rough estimate:
>
> sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' | wc -l
> 706
> sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' ./test | wc -l
> 317
>
> So it seems like if this is reasonable we can effectively double our
> performance testing coverage by filtering like this.

Hi Sean,

This is a very interesting idea, though I have no idea whether checking
for changes in 'test/' will be enough. If we keep the performance tester
running for a while, we can probably validate this assumption by
checking whether runs that contain no commits touching the integration
tests showed performance changes (and what kind of changes).
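
One way to picture that validation, as a sketch only: given the recorded runs and the set of commits that touched the integration tests, list the runs that changed performance even though the filter would have skipped them. The function name and the record layout (`id`, `commits`, `perf_changed`) are hypothetical and do not correspond to LNT's data model.

```python
def filter_misses(runs, test_touching):
    """Return ids of runs that changed performance without any test-touching commit."""
    misses = []
    for run in runs:
        has_test_commit = any(c in test_touching for c in run["commits"])
        if run["perf_changed"] and not has_test_commit:
            misses.append(run["id"])
    return misses


# Hypothetical history: run 2 changed performance but contains no test-touching commit.
runs = [
    {"id": 1, "commits": ["aaa", "bbb"], "perf_changed": True},
    {"id": 2, "commits": ["ddd"], "perf_changed": True},
    {"id": 3, "commits": ["eee"], "perf_changed": False},
]
missed = filter_misses(runs, test_touching={"aaa", "ccc"})
```

An empty result over a long enough period would support the filter; any listed run is a case the filter would have missed.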

As said before, I would be glad if I could get help with further 
improvements on the software side.

Cheers,
Tobias

Diego Novillo

2014-Jan-08 14:58 UTC

On Tue, Jan 7, 2014 at 8:48 PM, Sean Silva <chisophugis at gmail.com> wrote:
> sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' | wc -l
> 706
> sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' ./test | wc -l
> 317
Wouldn't this also catch commits to code generation that added tests as
well?


Diego.
