thr3ads.net - llvm dev - [LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives [May 2015]


Renato Golin

2015-May-21 08:43 UTC

[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives

On 20 May 2015 at 23:31, Sean Silva <chisophugis at gmail.com> wrote:
> In the last 10,000 revisions of LLVM+Clang, only 10 revisions actually
> caused the binary of MultiSource/Benchmarks/BitBench/five11 to change. So
> if we just store a hash of the binary in the database, we should be able to
> pool all samples we have collected while the binary is the same as it
> currently is, which will let us use significantly more datapoints for the
> reference.
+1

> Also, we can trivially eliminate running the regression detection algorithm
> if the binary hasn't changed.
+2!

--renato
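A minimal Python sketch of the two ideas quoted above, assuming each LNT run is available as a (revision, binary hash, samples) tuple in revision order. The names `binary_hash`, `pool_samples` and `should_run_detection` are hypothetical, not part of LNT's schema or API, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
from collections import defaultdict


def binary_hash(path):
    """Return the SHA-256 hex digest of a compiled test binary."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large binaries need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def pool_samples(history):
    """Group every timing sample by the hash of the binary that produced it.

    history: list of (revision, binary_hash, samples) tuples in revision order.
    """
    pooled = defaultdict(list)
    for _revision, digest, samples in history:
        pooled[digest].extend(samples)
    return dict(pooled)


def should_run_detection(history):
    """Only look for a regression when the newest binary differs from the previous one."""
    if len(history) < 2:
        return False
    return history[-1][1] != history[-2][1]
```

When `should_run_detection(history)` is true, the new samples could be compared against `pool_samples(history)[history[-2][1]]`, i.e. every sample ever collected for the previous binary, rather than only the last few runs.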

Chris Matthews

2015-May-21 18:24 UTC


[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives

I agree this is a great idea.  I think it needs to be fleshed out a little
though.

It would still be wise to run the regression detection algorithm, because the
test suite changes and the machines change, and the algorithm is not perfect
yet.  It would be a valuable source of information though.

This is not a small change to how LNT works, so I think some due diligence is
necessary.  Is clang *really* that deterministic, especially over successive
revs?  I know it is supposed to be.  Does anyone have any data to show this is
going to be an effective approach?  It seems like there are benchmarks in the
test-suite which use __DATE__ and __TIME__ in them. I assume that will be a
problem?
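One way such data could be gathered, sketched in Python under the assumption that a hash of each benchmark binary has been recorded for every tested revision (as in the five11 example): count the revisions at which the hash actually changed. The function names are hypothetical; building the same revision twice and comparing hashes would be the more direct test of determinism.

```python
def count_binary_changes(hashes):
    """Number of consecutive revision pairs whose binary hashes differ.

    hashes: binary hashes for one benchmark, one per tested revision, oldest first.
    """
    return sum(1 for prev, cur in zip(hashes, hashes[1:]) if prev != cur)


def fraction_unchanged(hashes):
    """Fraction of revision-to-revision steps that left the binary identical."""
    steps = len(hashes) - 1
    if steps <= 0:
        return 1.0
    return 1.0 - count_binary_changes(hashes) / steps
```

A benchmark whose hash changes at nearly every revision (for example because it embeds `__DATE__` or `__TIME__`) would stand out immediately as an outlier in such a survey.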

> On May 21, 2015, at 1:43 AM, Renato Golin <renato.golin at linaro.org> wrote:
> 
> On 20 May 2015 at 23:31, Sean Silva <chisophugis at gmail.com> wrote:
>> In the last 10,000 revisions of LLVM+Clang, only 10 revisions actually
>> caused the binary of MultiSource/Benchmarks/BitBench/five11 to change. So
>> if we just store a hash of the binary in the database, we should be able to
>> pool all samples we have collected while the binary is the same as it
>> currently is, which will let us use significantly more datapoints for the
>> reference.
> 
> +1
> 
>> Also, we can trivially eliminate running the regression detection
>> algorithm if the binary hasn't changed.
> 
> +2!
> 
> --renato
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Renato Golin

2015-May-21 18:30 UTC


[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives

We also need to consider tests that involve multiple binaries, for instance
shared libraries that we compile ourselves, such as libc++.

Maybe a simple binary diff + ldd + binary diff on all deps would work...
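A rough Python sketch of the "hash the binary plus everything `ldd` says it loads" idea, assuming a Linux host with glibc-style `ldd` output (macOS has no `ldd`). The helper names are hypothetical. Hashing the resolved system libraries too means a libc or machine update also changes the combined hash.

```python
import hashlib
import subprocess


def parse_ldd_output(text):
    """Extract resolved shared-library paths from glibc-style `ldd` output.

    Entries without a path (e.g. linux-vdso) and "=> not found" entries are skipped.
    """
    paths = []
    for line in text.splitlines():
        line = line.strip()
        # "libfoo.so => /path/libfoo.so (0x...)" or "/lib64/ld-linux... (0x...)"
        target = line.split("=>", 1)[1] if "=>" in line else line
        tokens = target.split()
        if tokens and tokens[0].startswith("/"):
            paths.append(tokens[0])
    return paths


def combined_hash(binary, deps):
    """Hash a binary together with its dependencies, in a stable order."""
    h = hashlib.sha256()
    for path in [binary] + sorted(deps):
        with open(path, "rb") as f:
            # Hash each file separately so file boundaries are unambiguous.
            h.update(hashlib.sha256(f.read()).digest())
    return h.hexdigest()


def hash_with_deps(binary):
    """Run `ldd` on the binary and hash it plus every library it resolves to."""
    # check=False: ldd may exit non-zero (e.g. for static binaries); treat as no deps.
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=False)
    return combined_hash(binary, parse_ldd_output(result.stdout))
```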

Cheers,
Renato
On 21 May 2015 7:24 pm, "Chris Matthews" <chris.matthews at apple.com> wrote:
> I agree this is a great idea.  I think it needs to be fleshed out a little
> though.
>
> It would still be wise to run the regression detection algorithm, because
> the test suite changes and the machines change, and the algorithm is not
> perfect yet.  It would be a valuable source of information though.
>
> This is not a small change to how LNT works, so I think some due diligence
> is necessary.  Is clang *really* that deterministic, especially over
> successive revs?  I know it is supposed to be.  Does anyone have any data
> to show this is going to be an effective approach?  It seems like there are
> benchmarks in the test-suite which use __DATE__ and __TIME__ in them. I
> assume that will be a problem?
>
> > On May 21, 2015, at 1:43 AM, Renato Golin <renato.golin at linaro.org> wrote:
> >
> > On 20 May 2015 at 23:31, Sean Silva <chisophugis at gmail.com> wrote:
> >> In the last 10,000 revisions of LLVM+Clang, only 10 revisions actually
> >> caused the binary of MultiSource/Benchmarks/BitBench/five11 to change.
> >> So if we just store a hash of the binary in the database, we should be
> >> able to pool all samples we have collected while the binary is the same
> >> as it currently is, which will let us use significantly more datapoints
> >> for the reference.
> >
> > +1
> >
> >> Also, we can trivially eliminate running the regression detection
> >> algorithm if the binary hasn't changed.
> >
> > +2!
> >
> > --renato

Sean Silva

2015-May-21 21:13 UTC


[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives

On Thu, May 21, 2015 at 11:24 AM, Chris Matthews <chris.matthews at apple.com> wrote:
> I agree this is a great idea.  I think it needs to be fleshed out a little
> though.
>
> It would still be wise to run the regression detection algorithm, because
> the test suite changes and the machines change, and the algorithm is not
> perfect yet.  It would be a valuable source of information though.
>
How would running it as part of regular testing change anything? Presumably
the only purpose it would serve is going back retrospectively and looking at
false positives in aggregate. But if we are already doing offline analysis,
we can run the regression detection algorithm (or any prospective
replacement) offline on the raw data; it doesn't take that long.

>
> This is not a small change to how LNT works, so I think some due diligence
> is necessary.  Is clang *really* that deterministic, especially over
> successive revs?

Yes. Google's build system actually depends on this for its caching
strategy to work, so the Google guys are usually on top of any issues in
this respect (thanks, Google guys!).


> I know it is supposed to be.  Does anyone have any data to show this is
> going to be an effective approach?  It seems like there are benchmarks in
> the test-suite which use __DATE__ and __TIME__ in them. I assume that will
> be a problem?
>
__DATE__ and __TIME__ should be easy to solve by modifying the benchmarks,
or by teaching clang to always return a fixed value for them (maybe we
already have this? IIRC Google's build system does something like this; or
maybe they do it at the OS level).
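For the "modify the benchmarks" option, the first step is finding them. A rough Python sketch, assuming a local checkout of the test-suite whose root you pass in; `files_using_timestamps` is a hypothetical helper and the search is purely textual.

```python
import os
import re

# __TIMESTAMP__ is included too: it expands to the source file's last-modification
# time, which can also differ between checkouts.
TIMESTAMP_MACROS = re.compile(r"\b__(?:DATE|TIME|TIMESTAMP)__\b")
SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".cxx", ".h", ".hpp")


def files_using_timestamps(root):
    """List source files under root that mention __DATE__, __TIME__ or __TIMESTAMP__.

    This is a plain text search, so matches inside comments or strings are
    reported too; treat the result as a list of candidates to inspect.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(SOURCE_SUFFIXES):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    if TIMESTAMP_MACROS.search(f.read()):
                        hits.append(path)
    return sorted(hits)
```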

-- Sean Silva

>
> > On May 21, 2015, at 1:43 AM, Renato Golin <renato.golin at linaro.org> wrote:
> >
> > On 20 May 2015 at 23:31, Sean Silva <chisophugis at gmail.com> wrote:
> >> In the last 10,000 revisions of LLVM+Clang, only 10 revisions actually
> >> caused the binary of MultiSource/Benchmarks/BitBench/five11 to change.
> >> So if we just store a hash of the binary in the database, we should be
> >> able to pool all samples we have collected while the binary is the same
> >> as it currently is, which will let us use significantly more datapoints
> >> for the reference.
> >
> > +1
> >
> >> Also, we can trivially eliminate running the regression detection
> >> algorithm if the binary hasn't changed.
> >
> > +2!
> >
> > --renato

Kristof Beyls via llvm-dev

2015-Oct-02 07:44 UTC


[llvm-dev] [LLVMdev] Proposal: change LNT's regression detection algorithm and how it is used to reduce false positives

FWIW, the patch to record the hash of test-suite binaries in the LNT
database finally landed yesterday; see r249026, r249034, and r249035.

So far, LNT only records the hashes in its database; it doesn't use them
in any analysis or chart yet.
If you upgrade your LNT instance now, hashes will start being recorded.
Future LNT analyses will be able to use historical hashes from the point
at which you started running the current top-of-trunk LNT.

One idea for using the data, besides the automatic noise analysis
algorithm, is to color chart backgrounds by hash value, so that it's
immediately visible during which time periods the binary remained the
same. At least for the sparklines on the daily report page, this
shouldn't be too hard to do.
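A rough Python sketch of the grouping step such coloring would need, assuming each chart point carries its recorded binary hash. `hash_segments` and `segment_colors` are hypothetical helpers, not existing LNT code, and the palette values are arbitrary.

```python
from itertools import groupby


def hash_segments(points):
    """Collapse consecutive points with the same binary hash into segments.

    points: list of (x, binary_hash) pairs ordered by x (e.g. revision or date).
    Returns a list of (x_start, x_end, binary_hash) tuples.
    """
    segments = []
    for digest, group in groupby(points, key=lambda point: point[1]):
        xs = [x for x, _ in group]
        segments.append((xs[0], xs[-1], digest))
    return segments


def segment_colors(segments, palette=("#ffffff", "#e8e8e8")):
    """Alternate background colors so adjacent segments are always distinguishable."""
    return [palette[i % len(palette)] for i in range(len(segments))]
```

Alternating colors guarantees every boundary is visible. Coloring by hash value instead would let a reverted binary reuse its earlier color, at the risk of two adjacent hashes mapping to similar colors.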

We should also upgrade the LNT instance running at llvm.org/perf,
but I'm still a bit unsure who knows how to do that. Tanya or
Daniel, could you do that?

Thanks,

Kristof
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Chris Matthews
> Sent: 21 May 2015 19:25
> To: Renato Golin
> Cc: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] Proposal: change LNT's regression detection
> algorithm and how it is used to reduce false positives
> 
> I agree this is a great idea.  I think it needs to be fleshed out a
> little though.
> 
> It would still be wise to run the regression detection algorithm,
> because the test suite changes and the machines change, and the
> algorithm is not perfect yet.  It would be a valuable source of
> information though.
> 
> This is not a small change to how LNT works, so I think some due
> diligence is necessary.  Is clang *really* that deterministic,
> especially over successive revs?  I know it is supposed to be.  Does
> anyone have any data to show this is going to be an effective approach?
> It seems like there are benchmarks in the test-suite which use __DATE__
> and __TIME__ in them. I assume that will be a problem?
> 
> > On May 21, 2015, at 1:43 AM, Renato Golin <renato.golin at linaro.org> wrote:
> >
> > On 20 May 2015 at 23:31, Sean Silva <chisophugis at gmail.com> wrote:
> >> In the last 10,000 revisions of LLVM+Clang, only 10 revisions
> >> actually caused the binary of MultiSource/Benchmarks/BitBench/five11
> >> to change. So if we just store a hash of the binary in the database,
> >> we should be able to pool all samples we have collected while the
> >> binary is the same as it currently is, which will let us use
> >> significantly more datapoints for the reference.
> >
> > +1
> >
> >> Also, we can trivially eliminate running the regression detection
> >> algorithm if the binary hasn't changed.
> >
> > +2!
> >
> > --renato
