thr3ads.net - llvm dev - [LLVMdev] [LNT] Question about results reliability in LNT infrustructure [Jun 2013]

Tobias Grosser

2013-Jun-30 16:19 UTC

[LLVMdev] [LNT] Question about results reliability in LNT infrustructure

On 06/30/2013 02:14 AM, Anton Korobeynikov wrote:
> Hi Tobi,
>
> First of all, all this is http://llvm.org/bugs/show_bug.cgi?id=1367 :)
>
>> The statistical test ministat is performing seems simple and pretty
>> standard. Is there any reason we could not do something similar? Or are we
>> doing it already and it just does not work as expected?
>
> The main problem with this sort of test is that we cannot trust it,
> unless:
> 1. The data has the normal distribution
> 2. The sample size is large (say, > 50)
>
> Here we have only 3 points and, no, I won't trust the ministat's
> t-test and normal-approximation based confidence bounds. They are *too
> short* (=the real confidence level is not 99.5%, but, actually 40-50%,
> for example).

Hi Anton,

I trust your knowledge about statistics, but am wondering why ministat
(and its t-test) is promoted as a statistically sane tool for
benchmarking results. Is the use of the t-test for benchmark results a
bad idea in general? Would ministat be a better tool if it implemented
the Wilcoxon/Mann-Whitney test?

> I'd ask for:
>
> 1. Increasing sample size to at least 5-10
> 2. Do the Wilcoxon/Mann-Whitney test

Reading about the Wilcoxon/Mann-Whitney test, it seems to be a more
robust test that frees us from the normal-approximation assumption. As
its implementation also does not look overly complicated, it may be a
good choice.
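
To give a feel for how little code this needs, here is a minimal sketch in
Python of an exact (permutation-based) two-sided Mann-Whitney test. The
names `u_statistic` and `mann_whitney_exact` are made up for illustration
and are not part of LNT or ministat. Enumerating every split of the pooled
data is only practical for small samples, which is the case discussed here.

```python
from itertools import combinations


def u_statistic(a, b):
    """Count pairs (x, y), x from a and y from b, with x < y; ties count 1/2."""
    return sum(1.0 if x < y else 0.5 if x == y else 0.0 for x in a for y in b)


def mann_whitney_exact(a, b):
    """Return (U, two-sided exact p-value) for samples a and b.

    The p-value is the fraction of all ways to split the pooled data into
    groups of the original sizes whose U is at least as far from its
    null expectation n*m/2 as the observed U.
    """
    pooled = list(a) + list(b)
    n, m = len(a), len(b)
    center = n * m / 2.0
    u_observed = u_statistic(a, b)
    distance = abs(u_observed - center)
    hits = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        # Small tolerance guards against float rounding in the comparison.
        if abs(u_statistic(xs, ys) - center) >= distance - 1e-12:
            hits += 1
    return u_observed, hits / total


# Three runs per side, perfectly separated:
print(mann_whitney_exact([1, 2, 3], [4, 5, 6]))  # (9.0, 0.1)
```

With 3 samples per side there are only 20 possible splits, so the smallest
two-sided p-value this test can ever report is 2/20 = 0.1. With 5 per side
and no ties it drops to 2/252, roughly 0.008.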

Regarding the number of samples. I think the most important point is 
that we get some measurement of confidence by which we can sort our 
results and make it visible in the UI. For different use cases we can 
adapt the number of samples based on the required confidence and the 
amount of noise/lost regressions we can accept. This may also be a great 
use for the adaptive sampling that Chris suggested.

Is there anything stopping us from implementing such a test and exposing 
its results in the UI?

Cheers,
Tobi

Anton Korobeynikov

2013-Jun-30 19:05 UTC

Hi Tobias,

> I trust your knowledge about statistics, but am wondering why ministat (and
> its t-test) is promoted as a statistically sane tool for benchmarking
> results.

I do not know... Ask the author of ministat?

> Is the use of the t-test for benchmark results a bad idea in
> general?

No, not in general. But one should be aware of the assumptions of the
underlying theory. The t-test is fine as long as our data follows the
normal distribution (and hence the test would be exact) or the sample
size is large (then we have the asymptotic normality of the mean due
to the CLT).
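
A small sketch of the first case, assuming exactly normal data and n = 3
(the helper names are hypothetical). For an interval of the form
mean +/- c * s / sqrt(3), the standardized mean follows Student's t with
2 degrees of freedom, so the true coverage has a closed form. This only
illustrates what happens when a normal quantile is used in place of the t
quantile; it does not reproduce the 40-50% figure mentioned earlier in the
thread and says nothing about non-normal data.

```python
import math
import random
import statistics


def coverage_n3(critical):
    """Exact coverage of mean +/- critical * s / sqrt(3) for normal data, n = 3.

    For Student's t with 2 degrees of freedom, P(|T| <= c) = c / sqrt(2 + c^2).
    """
    return critical / math.sqrt(2.0 + critical * critical)


def simulated_coverage(critical, trials=20000, seed=1):
    """Monte Carlo check: how often does the interval contain the true mean 0?"""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(3)]
        half_width = critical * statistics.stdev(xs) / math.sqrt(3)
        if abs(statistics.mean(xs)) <= half_width:
            hits += 1
    return hits / trials


print(round(coverage_n3(1.96), 3))   # normal quantile: about 0.811, not 0.95
print(round(coverage_n3(4.303), 3))  # t quantile for 2 dof: about 0.95
```

So even with perfectly normal data, a nominal 95% interval built from the
normal quantile 1.96 covers the true mean only about 81% of the time when
n = 3. The t quantile 4.303 restores the nominal level, at the price of an
interval more than twice as wide.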

> Would ministat be a better tool if it implemented the
> Wilcoxon/Mann-Whitney test?

The precision would be much better for small sample sizes (say, in the range 10-50).

But in any case, never trust someone who will claim he can reliably
estimate the variance from 3 data points.
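
To put a number on this, here is a minimal sketch assuming normally
distributed data (function names are illustrative only). With n = 3,
2 * s^2 / sigma^2 is chi-square with 2 degrees of freedom, which makes the
ratio s^2 / sigma^2 exponentially distributed with mean 1, so its quantiles
are plain logarithms.

```python
import math
import random
import statistics


def variance_ratio_interval(coverage=0.95):
    """Central interval for s^2 / sigma^2 with n = 3 normal observations.

    The ratio is Exponential(1), whose quantile function is -ln(1 - p).
    """
    tail = (1.0 - coverage) / 2.0
    return -math.log(1.0 - tail), -math.log(tail)


def simulate_fraction_inside(lo, hi, trials=20000, seed=0):
    """Monte Carlo check using unit-variance normal samples of size 3."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(3)]
        if lo <= statistics.variance(sample) <= hi:
            inside += 1
    return inside / trials


lo, hi = variance_ratio_interval()
print(f"95% of estimates fall in [{lo:.3f}, {hi:.3f}] times the true variance")
print(f"simulated fraction inside: {simulate_fraction_inside(lo, hi):.3f}")
```

Under these assumptions the 95% range runs from about 0.025 to about 3.69
times the true variance, so two honest estimates from 3 points each can
differ by more than a factor of 100.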

> Is there anything stopping us from implementing such a test and exposing
> its results in the UI?

I do not think so...

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

Chris Matthews

2013-Jul-01 01:04 UTC

I think we need to be using tests with the fewest assumptions possible.  I don’t
think there are many assumptions that would hold for all the benchmarks.

Chris Matthews
chris.matthews at apple.com
phone: 36335


Renato Golin

2013-Jul-01 07:37 UTC

On 30 June 2013 20:05, Anton Korobeynikov <anton at korobeynikov.info> wrote:
> But in any case, never trust someone who will claim he can reliably
> estimate the variance from 3 data points.

One cannot stress this enough. ;)

cheers,
--renato