thr3ads.net - llvm dev - [LLVMdev] RFC:LNT Improvements [Apr 2014]

Renato Golin

2014-Apr-30 09:37 UTC

[LLVMdev] RFC:LNT Improvements

On 30 April 2014 10:21, Tobias Grosser <tobias at grosser.es> wrote:
> To my understanding, the first patches should just improve LNT to report how
> reliable the results it reports are. So there is no way that this can affect
> the test suite runs, which means I do not see why we would want to delay
> such changes.
>
> In fact, if we have a good idea which kernels are reliable and which ones
> are not, we can probably use this information to actually mark benchmarks
> that are known to be noisy.
Right, yes, that'd be a good first step. I just wanted to make sure
that we don't just assume 10 runs is ok for everyone and consider it
done.

> Reporting numbers that are not 100% reliable makes the results useless as
> well. As ARM boards are cheap, you could just put 5 boxes in place and we
> get the samples we need. Even if this is not yet feasible, I would rather
> run 5 samples of the benchmarks you really care about than run everything
> once and get unreliable numbers.
That'd be another source of noise. You can't consider 5 boards'
results to be the same as 5 results in 1 board.

They're cheap (as in quality) and different boards (of the same brand
and batch) have different manufacturing defects that are only exposed
when we crush them to death with compiler tests and benchmarks. Nobody
in the factory has ever tested for that, since they only expect you to
run light stuff like media players, web servers, routers.

> Let me rephrase. "Machines on which you would like to run benchmarks should
> have a consistent and low enough level of noise"
No 32-bit ARM machine I have tested so far fits that bill.

> So do you think the benchmark.sh script proposed by Yi Kong is useful for
> ARM?
I'm also sceptical about that. I don't think that the noise on setup
will be any better or worse than noise during tests.

The only way to be sure is to run it every time and to understand the
curve, find a cut and warn on every noise level above the cut. Mind
you, this cut will be dynamic as the number of results grows, but once
we have a few dozen runs, it should stabilise.
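
A rough sketch of what such a cut could look like, in Python. This is not LNT code: the function names, the choice of coefficient of variation as the noise level, the median plus k times MAD rule, and the `min_history` default are all assumptions made for illustration.

```python
import statistics


def noise_level(samples):
    """Noise of one run: coefficient of variation (stdev / mean).

    Assumes at least two timing samples with a positive mean.
    """
    return statistics.stdev(samples) / statistics.mean(samples)


def noise_cut(history, k=3.0):
    """Cut derived from past noise levels: median + k * MAD.

    Recomputed from the whole history, so it moves while few runs
    are recorded and settles as more results come in.
    """
    med = statistics.median(history)
    mad = statistics.median([abs(x - med) for x in history])
    return med + k * mad


def is_too_noisy(samples, history, min_history=24, k=3.0):
    """Warn only once the history is long enough to trust the cut."""
    if len(history) < min_history:
        return False
    return noise_level(samples) > noise_cut(history, k)
```

A caller would append `noise_level(samples)` to `history` after every run and flag the run when `is_too_noisy` returns `True`. The median and MAD are used here because a few very noisy runs in the history should not drag the cut upwards; any other robust estimate would do.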

But that is not a replacement for running the test multiple times or
for longer times. We need statistical significance.

cheers,
--renato

Tobias Grosser

2014-Apr-30 09:50 UTC

[LLVMdev] RFC:LNT Improvements

On 30/04/2014 11:37, Renato Golin wrote:
> On 30 April 2014 10:21, Tobias Grosser <tobias at grosser.es> wrote:
>> To my understanding, the first patches should just improve LNT to report how
>> reliable the results it reports are. So there is no way that this can affect
>> the test suite runs, which means I do not see why we would want to delay
>> such changes.
>>
>> In fact, if we have a good idea which kernels are reliable and which ones
>> are not, we can probably use this information to actually mark benchmarks
>> that are known to be noisy.
>
> Right, yes, that'd be a good first step. I just wanted to make sure
> that we don't just assume 10 runs is ok for everyone and consider it
> done.
Right.
>> Reporting numbers that are not 100% reliable makes the results useless as
>> well. As ARM boards are cheap, you could just put 5 boxes in place and we
>> get the samples we need. Even if this is not yet feasible, I would rather
>> run 5 samples of the benchmarks you really care about than run everything
>> once and get unreliable numbers.
>
> That'd be another source of noise. You can't consider 5 boards'
> results to be the same as 5 results in 1 board.
>
> They're cheap (as in quality) and different boards (of the same brand
> and batch) have different manufacturing defects that are only exposed
> when we crush them to death with compiler tests and benchmarks. Nobody
> in the factory has ever tested for that, since they only expect you to
> run light stuff like media players, web servers, routers.
I see the point. There are ways around this, e.g. by running different
benchmarks on different boards, but what it boils down to is that we
first need to reliably measure and report the quality of the results.
Only then can we judge the effects of changes that are aimed at increasing
the quality.
>> Let me rephrase. "Machines on which you would like to run benchmarks should
>> have a consistent and low enough level of noise"
>
> No 32-bit ARM machine I have tested so far fits that bill.
>
>
>> So do you think the benchmark.sh script proposed by Yi Kong is useful for
>> ARM?
>
> I'm also sceptical about that. I don't think that the noise on setup
> will be any better or worse than noise during tests.
>
> The only way to be sure is to run it every time and to understand the
> curve, find a cut and warn on every noise level above the cut. Mind
> you, this cut will be dynamic as the number of results grows, but once
> we have a few dozen runs, it should stabilise.
>
> But that is not a replacement for running the test multiple times or
> for longer times. We need statistical significance.
Agreed.

My proposal is to do this right away. As there is enough data from the
public X86 -O3 runs (10 samples each run, with 3-5 commits between each 
run), the only missing piece seems to be the LNT changes to report on
the statistical significance. Yi Kong started to hack on this already
and might be able to adjust his changes. Let's wait for his opinion.
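
To make "report on the statistical significance" concrete, here is a minimal Python sketch of one way to compare the samples of two runs. It is an assumption for illustration, not what Yi Kong's patches or LNT do: the function name is hypothetical, and an exact permutation test on the difference of means is just one possible choice of test.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(before, after):
    """Exact two-sided permutation test on the difference of means.

    Enumerates every way of splitting the pooled samples into two
    groups of the original sizes and returns the fraction of splits
    whose mean difference is at least as large as the observed one.
    """
    pooled = list(before) + list(after)
    n, m = len(before), len(after)
    total = sum(pooled)
    observed = abs(mean(before) - mean(after))
    extreme = 0
    count = 0
    for idx in combinations(range(n + m), n):
        s = sum(pooled[i] for i in idx)
        diff = abs(s / n - (total - s) / m)
        # Small tolerance so rounding does not exclude ties.
        if diff >= observed - 1e-9:
            extreme += 1
        count += 1
    return extreme / count
```

With 10 samples per run there are 184756 possible splits, which is small enough to enumerate exactly. A report could then mark a change between two runs as significant only when the returned value falls below a chosen threshold such as 0.05.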

Cheers,
Tobias

Renato Golin

2014-Apr-30 10:18 UTC

[LLVMdev] RFC:LNT Improvements

On 30 April 2014 10:50, Tobias Grosser <tobias at grosser.es> wrote:
> Only then can we judge the effects of changes that are aimed at increasing
> the quality.
Agreed.

> My proposal is to do this right away. As there is enough data from the
> public X86 -O3 runs (10 samples each run, with 3-5 commits between each
> run), the only missing piece seems to be the LNT changes to report on
> the statistical significance. Yi Kong started to hack on this already
> and might be able to adjust his changes. Let's wait for his opinion.
Ok.

--renato
