thr3ads.net - llvm dev - [LLVMdev] [LNT] Question about results reliability in LNT infrustructure [Jul 2013]


Renato Golin

2013-Jul-01 16:41 UTC

[LLVMdev] [LNT] Question about results reliability in LNT infrustructure

On 1 July 2013 02:02, Chris Matthews <chris.matthews at apple.com> wrote:
> One thing that LNT is doing to help “smooth” the results for you is by
> presenting the min of the data at a particular revision, which (hopefully)
> is approximating the actual runtime without noise.
>
That's an interesting idea, as you said, if you run multiple times on every
revision.

On ARM, every run takes *at least* 1h, other architectures might be a lot
worse. It'd be very important on those architectures if you could extract
point information from group data, and min doesn't fit in that model. You
could take min from a group of runs, but again, that's no different than
moving averages. Though, "moving mins" might make more sense than "moving
averages" for the reasons you exposed.
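To make the "moving mins" idea concrete, here is a minimal sketch comparing a trailing moving minimum with a trailing moving average over a series that has one run per revision. The function names and the window size are hypothetical, not part of LNT. The underlying assumption is that noise on a benchmark machine mostly adds time, so the minimum of a window stays closer to the true runtime than the average does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// "Best of N" for a single revision: the sample least disturbed by noise,
// assuming noise only ever adds time.
double revisionMin(const std::vector<double>& samples) {
    assert(!samples.empty());
    return *std::min_element(samples.begin(), samples.end());
}

// First index of a trailing window of `window` revisions ending at `i`,
// clamped to the start of the series.
static std::size_t windowBegin(std::size_t i, std::size_t window) {
    return (i + 1 >= window) ? i + 1 - window : 0;
}

// Trailing moving minimum over a series with one value per revision.
std::vector<double> movingMin(const std::vector<double>& series, std::size_t window) {
    assert(window > 0);
    std::vector<double> out;
    for (std::size_t i = 0; i < series.size(); ++i) {
        auto first = series.begin() + static_cast<std::ptrdiff_t>(windowBegin(i, window));
        auto last = series.begin() + static_cast<std::ptrdiff_t>(i + 1);
        out.push_back(*std::min_element(first, last));
    }
    return out;
}

// Trailing moving average over the same windows, for comparison.
std::vector<double> movingAverage(const std::vector<double>& series, std::size_t window) {
    assert(window > 0);
    std::vector<double> out;
    for (std::size_t i = 0; i < series.size(); ++i) {
        const std::size_t b = windowBegin(i, window);
        auto first = series.begin() + static_cast<std::ptrdiff_t>(b);
        auto last = series.begin() + static_cast<std::ptrdiff_t>(i + 1);
        out.push_back(std::accumulate(first, last, 0.0) / static_cast<double>(i + 1 - b));
    }
    return out;
}
```

The trade-off is latency: a genuine regression only shows up in the moving min once every revision in the window includes it, so with a window of 3 it surfaces two revisions after it lands.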

Also, on tests that take as long as noise to run (0.010s or less on A15),
the minimum is not relevant, since runtime will flatten everything under
0.010 onto 0.010, making your test always report 0.010, even when there are
regressions.

I really cannot see how you can statistically enhance data in a scenario
where the measuring rod is larger than the signal. We need to change the
wannabe-benchmarks to behave like proper benchmarks, and move everything
else into "Applications" for correctness and specifically NOT time them.
Less is more.
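A minimal model of the flattening described above, assuming a timer that reports anything at or below its effective resolution as the resolution itself. `reportedTime`, `isTimeable` and the 100x threshold are illustrative assumptions, not LNT settings. It shows why taking the min cannot help here: two different runtimes below the floor produce identical reports, so only tests running well above the resolution carry a timing signal.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical timer model: any runtime at or below the resolution is
// reported as the resolution itself (e.g. 0.010 s on the A15 boards).
double reportedTime(double trueSeconds, double resolutionSeconds) {
    assert(resolutionSeconds > 0.0);
    return std::max(trueSeconds, resolutionSeconds);
}

// A test is only worth timing if it runs well above the resolution.
// The default factor of 100x is an arbitrary assumption for illustration;
// tests that fail this check would be run for correctness only.
bool isTimeable(double trueSeconds, double resolutionSeconds, double minFactor = 100.0) {
    return trueSeconds >= minFactor * resolutionSeconds;
}
```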

> That works well with a lot of samples per revision, but not for across
> revisions, where we really need the smoothing.   One way to explore this is
> to turn
>
I was really looking forward to hearing the end of that sentence... ;)

> We also lack any way to coordinate or annotate regressions, that is a whole
> separate problem though.
>
Yup. I'm having visions of tag clouds, bugzilla integration, cross
architectural regression detection, etc. But I'll ignore that for now,
let's solve one big problem at a time. ;)

cheers,
--renato

Tobias Grosser

2013-Jul-02 05:11 UTC


[LLVMdev] [LNT] Question about results reliability in LNT infrustructure

On 07/01/2013 09:41 AM, Renato Golin wrote:
> On 1 July 2013 02:02, Chris Matthews <chris.matthews at apple.com> wrote:
>
>> One thing that LNT is doing to help “smooth” the results for you is by
>> presenting the min of the data at a particular revision, which (hopefully)
>> is approximating the actual runtime without noise.
>>
>
> That's an interesting idea, as you said, if you run multiple times on every
> revision.
>
> On ARM, every run takes *at least* 1h, other architectures might be a lot
> worse. It'd be very important on those architectures if you could extract
> point information from group data, and min doesn't fit in that model. You
> could take min from a group of runs, but again, that's no different than
> moving averages. Though, "moving mins" might make more sense than "moving
> averages" for the reasons you exposed.
I get your point. On the other hand, it may be worth first getting
statistically reliable, noise-free numbers at a lower resolution in
terms of commits. Given those reliable numbers, we can then work on
improving the resolution (without introducing noise). Also, multiple
runs per revision should be easy to parallelize across different machines,
so confidence in the results seems to be a problem that can be
solved with additional hardware.
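A sketch of the "more hardware" approach: pool the runs of one revision from several machines (assumed identical and equally loaded) and report the mean with an approximate 95% confidence interval. The `Summary` struct and `summarize` function are hypothetical, and the 1.96 factor is the large-sample normal approximation; a Student-t quantile would be more appropriate for only a handful of runs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Summary {
    double mean;
    double halfWidth95;  // approximate 95% confidence half-width
};

// Pools the samples of one revision gathered on several machines and
// summarizes them as mean +/- 1.96 * s / sqrt(n).
Summary summarize(const std::vector<std::vector<double>>& perMachineSamples) {
    std::vector<double> pooled;
    for (const auto& machine : perMachineSamples)
        pooled.insert(pooled.end(), machine.begin(), machine.end());
    assert(pooled.size() >= 2);

    double sum = 0.0;
    for (double x : pooled) sum += x;
    const double n = static_cast<double>(pooled.size());
    const double mean = sum / n;

    double sq = 0.0;
    for (double x : pooled) sq += (x - mean) * (x - mean);
    const double stddev = std::sqrt(sq / (n - 1.0));  // sample standard deviation

    return {mean, 1.96 * stddev / std::sqrt(n)};
}
```

Because the half-width scales with 1/sqrt(n), quadrupling the number of runs (for example, by using four machines instead of one) roughly halves it, which is where extra hardware pays off.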
> Also, on tests that take as long as noise to run (0.010s or less on A15),
> the minimum is not relevant, since runtime will flatten everything under
> 0.010 onto 0.010, making your test always report 0.010, even when there are
> regressions.
>
> I really cannot see how you can statistically enhance data in a scenario
> where the measuring rod is larger than the signal. We need to change the
> wannabe-benchmarks to behave like proper benchmarks, and move everything
> else into "Applications" for correctness and specifically NOT time them.
> Less is more.
There is no question that we cannot improve the existing data, but it
would be great to at least reliably detect when some data is just plain
noise.
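One standard way to flag a change between two revisions as probably noise is to compare the difference in means with the run-to-run spread, for example with Welch's t statistic. This is a sketch with hypothetical names; the cutoff of 2 is an assumed rule of thumb (roughly a 5% two-sided test for large samples), not an LNT setting.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

namespace {
double mean(const std::vector<double>& xs) {
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / static_cast<double>(xs.size());
}

double sampleVariance(const std::vector<double>& xs) {
    const double m = mean(xs);
    double sq = 0.0;
    for (double x : xs) sq += (x - m) * (x - m);
    return sq / static_cast<double>(xs.size() - 1);
}
}  // namespace

// Welch's t statistic for the change in mean runtime from `before` to `after`
// (unequal variances allowed). Both revisions need at least two runs.
double welchT(const std::vector<double>& before, const std::vector<double>& after) {
    assert(before.size() >= 2 && after.size() >= 2);
    const double diff = mean(after) - mean(before);
    const double se = std::sqrt(sampleVariance(before) / static_cast<double>(before.size()) +
                                sampleVariance(after) / static_cast<double>(after.size()));
    if (se == 0.0) {
        // No spread at all (e.g. every run flattened onto the timer floor).
        if (diff == 0.0) return 0.0;
        return diff > 0.0 ? std::numeric_limits<double>::infinity()
                          : -std::numeric_limits<double>::infinity();
    }
    return diff / se;
}

// Hypothetical rule of thumb: |t| below `threshold` means the change is
// indistinguishable from run-to-run noise.
bool looksLikeNoise(const std::vector<double>& before, const std::vector<double>& after,
                    double threshold = 2.0) {
    return std::fabs(welchT(before, after)) < threshold;
}
```

Two sets of runs flattened onto the same timer floor give t = 0, so this check reports "no detectable change" rather than uncovering a hidden regression; it can only tell you that the data carries no signal.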
>> That works well with a lot of samples per revision, but not for across
>> revisions, where we really need the smoothing.   One way to explore this is
>> to turn
>>
>
> I was really looking forward to hearing the end of that sentence... ;)
>
>> We also lack any way to coordinate or annotate regressions, that is a whole
>> separate problem though.
>>
>
> Yup. I'm having visions of tag clouds, bugzilla integration, cross
> architectural regression detection, etc. But I'll ignore that for now,
> let's solve one big problem at a time. ;)
Yes, there is a lot of stuff that would really help.

Tobi

Sergei Larin

2013-Jul-03 17:29 UTC


[LLVMdev] [LNT] Question about results reliability in LNT infrustructure

Tobias,

  I seem to trigger an assert in Polly lib/Analysis/TempScopInfo.cpp

void TempScopInfo::buildAffineCondition(Value &V, bool inverted,
                                        Comparison **Comp) const {
...
  ICmpInst *ICmp = dyn_cast<ICmpInst>(&V);
  assert(ICmp && "Only ICmpInst of constant as condition supported!");
...

  The code it chokes on looks like this (see below). The problem is this
OR-ed compare result:

  %cmp3 = icmp sgt i32 %j.0, 2
  %cmp5 = icmp eq i32 %j.0, 1
  %or.cond13 = or i1 %cmp3, %cmp5  
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
=Value V


My question: is this a bug or a (missing) feature? And how should it be
handled in theory?

Thanks.

Sergei

define i32 @main() #0 {
entry:
  %j.0.lcssa.reg2mem = alloca i32, align 8
  br label %entry.split

entry.split:                                      ; preds = %entry
  %call = tail call i32 @foo(i32 0, i32 0) #2
  %call1 = tail call i32 @foo(i32 %call, i32 %call) #2
  br label %for.cond2

for.cond2:                                        ; preds = %for.inc, %entry.split
  %j.0 = phi i32 [ 0, %entry.split ], [ %inc, %for.inc ]
  %cmp3 = icmp sgt i32 %j.0, 2
  %cmp5 = icmp eq i32 %j.0, 1
  %or.cond13 = or i1 %cmp3, %cmp5
  store i32 %j.0, i32* %j.0.lcssa.reg2mem, align 8
  br i1 %or.cond13, label %for.end8, label %for.inc, !llvm.listen.preserve.while.opt !0

for.inc:                                          ; preds = %for.cond2
  %inc = add nsw i32 %j.0, 1
  br label %for.cond2

for.end8:                                         ; preds = %for.cond2
  %j.0.lcssa.reload = load i32* %j.0.lcssa.reg2mem, align 8
  %cmp10 = icmp eq i32 %j.0.lcssa.reload, 1
  %add = add nsw i32 %j.0.lcssa.reload, 1
  %retval.0 = select i1 %cmp10, i32 1, i32 %add
  ret i32 %retval.0
}
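On the "in theory" part of the question: the exit condition `%or.cond13` is `j > 2 || j == 1`, a union of two affine sets rather than a single affine comparison. One way to reason about it is to rewrite an `or`/`and` tree of comparisons into disjunctive normal form, so that each conjunction is a plain set of affine constraints that an analysis limited to single comparisons could handle one at a time. The sketch below is a hypothetical toy model, not Polly's code or API; `Cond`, `toDNF` and the other names are made up for illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy model: each leaf compares the induction variable j against a constant,
// as %cmp3 (sgt) and %cmp5 (eq) do in the IR.
enum class Pred { SGT, EQ, SLT };

struct Cmp {
    Pred pred;
    int rhs;
    bool eval(int j) const {
        switch (pred) {
            case Pred::SGT: return j > rhs;
            case Pred::EQ:  return j == rhs;
            case Pred::SLT: return j < rhs;
        }
        return false;
    }
};

// An i1 condition built from comparisons with `or` / `and`.
struct Cond {
    enum Kind { Leaf, Or, And } kind;
    Cmp cmp;                         // used when kind == Leaf
    std::shared_ptr<Cond> lhs, rhs;  // used when kind == Or or And
};

std::shared_ptr<Cond> leaf(Pred p, int rhs) {
    return std::make_shared<Cond>(Cond{Cond::Leaf, Cmp{p, rhs}, nullptr, nullptr});
}
std::shared_ptr<Cond> orOf(std::shared_ptr<Cond> a, std::shared_ptr<Cond> b) {
    return std::make_shared<Cond>(Cond{Cond::Or, Cmp{Pred::EQ, 0}, a, b});
}
std::shared_ptr<Cond> andOf(std::shared_ptr<Cond> a, std::shared_ptr<Cond> b) {
    return std::make_shared<Cond>(Cond{Cond::And, Cmp{Pred::EQ, 0}, a, b});
}

// DNF: the condition holds if ANY conjunction holds; a conjunction holds if
// ALL of its comparisons hold. Each conjunction of affine comparisons is a
// convex set, so the whole condition is a union of such sets.
using Conjunction = std::vector<Cmp>;
using DNF = std::vector<Conjunction>;

DNF toDNF(const Cond& c) {
    switch (c.kind) {
        case Cond::Leaf:
            return DNF{Conjunction{c.cmp}};
        case Cond::Or: {  // union: concatenate both lists of conjunctions
            assert(c.lhs && c.rhs);
            DNF out = toDNF(*c.lhs);
            DNF r = toDNF(*c.rhs);
            out.insert(out.end(), r.begin(), r.end());
            return out;
        }
        case Cond::And: {  // intersection: distribute over every pair
            assert(c.lhs && c.rhs);
            DNF out;
            const DNF l = toDNF(*c.lhs), r = toDNF(*c.rhs);
            for (const Conjunction& a : l)
                for (const Conjunction& b : r) {
                    Conjunction both = a;
                    both.insert(both.end(), b.begin(), b.end());
                    out.push_back(both);
                }
            return out;
        }
    }
    return {};
}

bool evalDNF(const DNF& d, int j) {
    for (const Conjunction& conj : d) {
        bool all = true;
        for (const Cmp& cmp : conj) all = all && cmp.eval(j);
        if (all) return true;
    }
    return false;
}
```

Stepping the loop of `@main` against this condition, j goes from 0 to 1 and the loop exits at j == 1, matching the IR. Whether Polly should model such unions directly or reject them up front is a separate design decision; the sketch only shows that the condition decomposes cleanly into affine pieces.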


---
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation
