Hi Tobias, Renato,
Thanks for your attention to my RFC.
On 30 April 2014 07:50, Tobias Grosser <tobias at grosser.es> wrote:
> We had a longer discussion here on llvmdev named 'Questions about
> results reliability in LNT infrastructure'. Anton suggested doing the
> following:
>
> 1. Get 5-10 samples per run
> 2. Do the Wilcoxon/Mann-Whitney test
My current analysis uses Student's t-test, assuming that programs with
similar run times have similar standard deviations. Having gone through
your previous discussion about results reliability, that assumption
looks like an over-simplification. I will go ahead and implement the
Mann-Whitney test and see if it produces better results.
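For concreteness, this is roughly the shape of the check I have in mind
(a minimal sketch, not actual LNT code; the 0.05 threshold and the
sample values below are placeholders for illustration):

    from scipy.stats import mannwhitneyu

    def is_significant_change(baseline_samples, current_samples, alpha=0.05):
        # Compare the two sample sets without assuming normality or
        # equal variance, unlike Student's t-test.
        _, p_value = mannwhitneyu(baseline_samples, current_samples,
                                  alternative='two-sided')
        return p_value < alpha

    # 5-10 samples per run, as suggested in the thread.
    baseline = [1.02, 1.01, 1.03, 1.02, 1.04]
    current = [1.10, 1.09, 1.11, 1.12, 1.10]
    print(is_significant_change(baseline, current))  # True: a clear shift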
On 30 April 2014 07:50, Tobias Grosser <tobias at grosser.es> wrote:
>> - Show and graph total compile time
>> There is no obvious way to scale up the compile time of
>> individual benchmarks, so total time is the best thing we can do to
>> minimize error.
>> LNT: [PATCH 1/3] Add Total to run view and graph plot
>
> I did not see the effect of these changes in your images and also
> honestly do not fully understand what you are doing. What is the
> total compile time? Don't we already show the compile time in run
> view? How is the total time different to this compile time?
With a large number of tests, it is hard to distinguish minor
improvements or regressions from independent machine noise. So I added a
"total time" metric to the run report and made its trend graphable,
hoping that per-benchmark noise cancels out and overall shifts become
easy to spot. (Screenshot attached.)
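The computation itself is trivial; as a sketch (the run dictionaries
below are made up, not LNT's actual data model):

    def total_compile_time(run_samples):
        # run_samples: benchmark name -> compile time in seconds.
        return sum(run_samples.values())

    previous_run = {'bench_a': 12.1, 'bench_b': 8.4, 'bench_c': 30.2}
    current_run = {'bench_a': 12.3, 'bench_b': 8.5, 'bench_c': 30.4}

    delta = total_compile_time(current_run) - total_compile_time(previous_run)
    print('total compile time delta: %+.2fs' % delta)  # +0.50s

A small per-benchmark drift that is invisible in any single test shows
up in the sum, since independent noise terms partially cancel while a
systematic change accumulates.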
On 30 April 2014 07:50, Tobias Grosser <tobias at grosser.es> wrote:
> I am a little sceptical of this. Machines should generally not be
> noisy. However, if for some reason there is noise on the machine, the
> noise is as likely to appear during this pre-run noise test as during
> the actual benchmark runs, maybe during both, but maybe also only
> during the benchmark. So I am afraid we might often run into the
> situation where this test says OK but the later run still suffers
> from noise.
I agree that measuring noise before each run may not be very useful for
rejecting noisy runs. Its main purpose is adaptive problem scaling.
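To illustrate what I mean by adaptive problem scaling, a hypothetical
sketch (none of these names come from LNT): time a short probe first,
then choose an iteration count so that each sample runs long enough to
dominate timer noise.

    import time

    def calibrate_iterations(workload, target_seconds=1.0, probe_iters=10):
        start = time.perf_counter()
        for _ in range(probe_iters):
            workload()
        per_iter = max((time.perf_counter() - start) / probe_iters, 1e-9)
        # Scale so the measured region lasts about target_seconds.
        return max(1, int(target_seconds / per_iter))

    def workload():
        sum(i * i for i in range(10000))  # stand-in for a benchmark kernel

    print('iterations per sample:', calibrate_iterations(workload))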
On 30 April 2014 07:50, Tobias Grosser <tobias at grosser.es> wrote:
> In general, I see such changes as a second step. First, we want to
> have a system in place that allows us to reliably detect whether a
> benchmark is noisy; second, we want to increase the number of
> benchmarks that are not noisy and whose results we can use.
Ok.
Cheers,
Yi Kong
(Attachment: total_time_graph.png,
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140430/30660369/attachment.png>)