thr3ads.net - llvm dev - [LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives [May 2015]

Chris Matthews

2015-May-19 04:02 UTC

[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives

The reruns flag already does that. It helps a bit, but only as long as the
benchmark is flagged as regressed.

> On May 18, 2015, at 8:28 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> On Mon, May 18, 2015 at 11:24 AM, Mikhail Zolotukhin
> <mzolotukhin at apple.com> wrote:
>> Hi Chris and others!
>>
>> I totally support any work in this direction.
>>
>> In its current state, LNT’s regression detection system is too noisy,
>> which makes it almost impossible to use in some cases. If after each run
>> a developer gets a dozen ‘regressions’, none of which turns out to be
>> real, he/she won’t care about such reports after a while. We clearly need
>> to filter out as much noise as we can - and as it turns out, even the
>> simplest techniques could help here. For example, the technique I used
>> (which you mentioned earlier) takes ~15 lines of code to implement and
>> filters out almost all noise in our internal data-sets. It’d be really
>> cool to have something more scientifically proven, though :)
>>
>> One thing to add from me - I think we should try to do our best under the
>> assumption that we don’t have enough samples. Of course, the more data we
>> have, the better, but in many cases we can’t (or don’t want to) increase
>> the number of samples, since it dramatically increases testing time.
>
> Why not just start out with only a few samples, then collect more for
> benchmarks that appear to have changed?
>
> -- Sean Silva
>
>> That’s not to discourage anyone from increasing the number of samples, or
>> adding techniques relying on a significant number of samples, but rather
>> to try mining as many ‘samples’ as possible from the data we have - e.g.
>> I absolutely agree with your idea to pass more than 1 previous run.
>>
>> Thanks,
>> Michael
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
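
The suggestion quoted above (start with a few samples, then collect more only for benchmarks that appear to have changed) could look roughly like the sketch below. It is only an illustration and not LNT's reruns implementation: `run_benchmark`, `appears_changed`, `adaptive_samples`, the median comparison and the 5% threshold are all assumptions made for the example.

```python
import statistics


def appears_changed(baseline, current, threshold=0.05):
    """Return True if the two medians differ by more than `threshold` (relative).

    Hypothetical check; the threshold and the use of the median are assumptions.
    """
    base = statistics.median(baseline)
    cur = statistics.median(current)
    return abs(cur - base) / base > threshold


def adaptive_samples(run_benchmark, baseline, initial=3, extra=7, threshold=0.05):
    """Take a few samples first; take more only if the result looks changed.

    `run_benchmark` is a hypothetical callable returning one timing per call.
    Returns (samples, still_looks_changed).
    """
    samples = [run_benchmark() for _ in range(initial)]
    if appears_changed(baseline, samples, threshold):
        # Only benchmarks that look suspicious pay for the extra runs.
        samples.extend(run_benchmark() for _ in range(extra))
    return samples, appears_changed(baseline, samples, threshold)
```

The extra runs are only paid for by benchmarks that already look suspicious, so a benchmark that never crosses the threshold never gets more samples, which matches the limitation Chris describes for the reruns flag.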

Chris Matthews

2015-May-19 04:08 UTC

In r237661 I committed an initial set of tests. All the failing tests are
commented out right now (python 2.7 does not have xfail).  I extracted the data
sets to the top of the file so they are easy to paste into ipython etc.

Sean Silva

2015-May-20 22:31 UTC

I found an interesting datapoint:

In the last 10,000 revisions of LLVM+Clang, only 10 revisions actually
caused the binary of MultiSource/Benchmarks/BitBench/five11 to change. So
if we just store a hash of the binary in the database, we should be able to
pool all samples we have collected while the binary is the same as it
currently is, which will let us use significantly more datapoints for the
reference.

Also, we can trivially eliminate running the regression detection algorithm
if the binary hasn't changed.

-- Sean Silva
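
As a rough illustration of the idea above (not LNT code): samples are keyed by a hash of the benchmark binary, so every run that produced an identical binary contributes to the same reference pool, and regression detection is only requested when the hash differs from the previous run. `binary_hash`, `SamplePool` and the in-memory storage are hypothetical names chosen for this sketch, and SHA-1 is assumed as the hash.

```python
import hashlib
from collections import defaultdict


def binary_hash(path):
    """Return the SHA-1 hex digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


class SamplePool:
    """Hypothetical in-memory stand-in for the database described above."""

    def __init__(self):
        self._samples = defaultdict(list)  # (benchmark, digest) -> timings
        self._last_digest = {}             # benchmark -> digest of previous run

    def record(self, benchmark, digest, samples):
        """Store samples; return True if the binary changed since the last run.

        The first run of a benchmark returns False because there is nothing
        to compare against (a design choice made for this sketch).
        """
        previous = self._last_digest.get(benchmark)
        self._last_digest[benchmark] = digest
        self._samples[(benchmark, digest)].extend(samples)
        return previous is not None and previous != digest

    def reference(self, benchmark, digest):
        """All samples ever collected for this exact binary."""
        return list(self._samples.get((benchmark, digest), []))
```

A caller would run the detection algorithm only when `record` returns True, comparing the new samples against `reference(benchmark, old_digest)`.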

Smith, Kevin B

2015-May-20 22:53 UTC

> Also, we can trivially eliminate running the regression detection algorithm
> if the binary hasn't changed.

Strongly agree. Internal performance regression testing for the Intel compiler
uses this method to help eliminate noise. It is a great first-line method for
cutting down on developer time wasted chasing phantoms.

Kevin Smith

Renato Golin

2015-May-21 08:43 UTC

On 20 May 2015 at 23:31, Sean Silva <chisophugis at gmail.com> wrote:
> In the last 10,000 revisions of LLVM+Clang, only 10 revisions actually
> caused the binary of MultiSource/Benchmarks/BitBench/five11 to change. So
> if we just store a hash of the binary in the database, we should be able to
> pool all samples we have collected while the binary is the same as it
> currently is, which will let us use significantly more datapoints for the
> reference.

+1

> Also, we can trivially eliminate running the regression detection algorithm
> if the binary hasn't changed.

+2!

--renato

Sean Silva

2015-May-27 02:05 UTC

Update: in that same block of 10,000 LLVM/Clang revisions, this is the number
of distinct SHA1 hashes for the binaries of the following benchmarks:

7 MultiSource/Applications/aha/aha
2 MultiSource/Benchmarks/BitBench/drop3/drop3
10 MultiSource/Benchmarks/BitBench/five11/five11
7 MultiSource/Benchmarks/BitBench/uudecode/uudecode
3 MultiSource/Benchmarks/BitBench/uuencode/uuencode
5 MultiSource/Benchmarks/Trimaran/enc-rc4/rc4
11 SingleSource/Benchmarks/BenchmarkGame/n-body
2 SingleSource/Benchmarks/Shootout/ackermann

Let me know if there are any specific benchmarks you would like me to test.

-- Sean Silva
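
A count like the one above could be produced along these lines, assuming one already has a (revision, benchmark, SHA1 digest) record per build. The record layout and the name `distinct_hash_counts` are assumptions for illustration; the script actually used to gather the numbers is not shown in this thread.

```python
from collections import defaultdict


def distinct_hash_counts(records):
    """Count distinct binary digests per benchmark.

    `records` is an iterable of (revision, benchmark, digest) tuples;
    this layout is hypothetical.
    """
    seen = defaultdict(set)
    for _revision, benchmark, digest in records:
        seen[benchmark].add(digest)
    return {benchmark: len(digests) for benchmark, digests in seen.items()}
```

Note that this counts distinct digests, so a binary that changes and later reverts to an earlier form is counted once for that form rather than once per change.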

