Sean Silva
2015-May-21 21:13 UTC
[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives
On Thu, May 21, 2015 at 11:24 AM, Chris Matthews <chris.matthews at apple.com> wrote:

> I agree this is a great idea. I think it needs to be fleshed out a little
> though.
>
> It would still be wise to run the regression detection algorithm, because
> the test suite changes and the machines change, and the algorithm is not
> perfect yet. It would be a valuable source of information though.

How would running it as part of regular testing change anything? Presumably the only purpose it would serve is retrospectively going back and seeing false positives in the aggregate. But if we are already doing offline analysis, we can run the regression detection algorithm (or any prospective ones) offline on the raw data; it doesn't take that long.

> This is not a small change to how LNT works, so I think some due diligence
> is necessary. Is clang *really* that deterministic, especially over
> successive revs?

Yes. Actually, Google's build system depends on this for its caching strategy to work, so the Google guys are usually on top of any issues in this respect (thanks, Google guys!).

> I know it is supposed to be. Does anyone have any data to show this is
> going to be an effective approach? It seems like there are benchmarks in
> the test-suite which use __DATE__ and __TIME__ in them. I assume that will
> be a problem?

__DATE__ and __TIME__ should be easy to solve by modifying the benchmark, or by teaching clang to always return a fixed value for them (maybe we already have this? IIRC Google's build system does something like this; or maybe they do it at the OS level).

-- Sean Silva

> > On May 21, 2015, at 1:43 AM, Renato Golin <renato.golin at linaro.org> wrote:
> >
> > On 20 May 2015 at 23:31, Sean Silva <chisophugis at gmail.com> wrote:
> >> In the last 10,000 revisions of LLVM+Clang, only 10 revisions actually
> >> caused the binary of MultiSource/Benchmarks/BitBench/five11 to change.
> >> So if we just store a hash of the binary in the database, we should be
> >> able to pool all samples we have collected while the binary is the same
> >> as it currently is, which will let us use significantly more datapoints
> >> for the reference.
> >
> > +1
> >
> >> Also, we can trivially eliminate running the regression detection
> >> algorithm if the binary hasn't changed.
> >
> > +2!
> >
> > --renato
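The hash-and-pool idea quoted above can be sketched roughly as follows. This is only an illustration, not LNT code: SampleStore, binary_hash, record and reference are hypothetical names, the in-memory dictionaries stand in for the database, and SHA-256 is an assumed choice (the proposal only says "a hash").

```python
import hashlib
from collections import defaultdict


def binary_hash(data: bytes) -> str:
    """Return a hex digest identifying one exact build of a benchmark binary."""
    # SHA-256 is an assumption; any stable content hash would serve.
    return hashlib.sha256(data).hexdigest()


class SampleStore:
    """Hypothetical in-memory stand-in for the database (not LNT's schema)."""

    def __init__(self):
        self._samples = defaultdict(list)  # (test name, binary hash) -> timings
        self._latest = {}  # test name -> hash of the most recently seen binary

    def record(self, test: str, binary: bytes, seconds: float) -> bool:
        """Store one timing; return True if the binary differs from the last run."""
        digest = binary_hash(binary)
        previous = self._latest.get(test)
        changed = previous is not None and previous != digest
        self._latest[test] = digest
        self._samples[(test, digest)].append(seconds)
        return changed

    def reference(self, test: str) -> list:
        """All timings collected while the binary was identical to the current one."""
        return list(self._samples[(test, self._latest[test])])
```

With something like this, regression detection would only need to run when record() returns True, and reference() supplies the pooled datapoints for the current binary.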
Smith, Kevin B
2015-May-26 16:53 UTC
[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives
Intel has a binary comparator tool that we have been using for several years for comparing output binaries to see if the code within them is considered identical. We use it to eliminate runs (and therefore some performance noise) from our own performance tracking tools.

We are willing to contribute the source code for this to the LLVM community if there is interest.

There are two programs involved: getdep, which displays the list of DLL/.so dependencies of the image in question, and cmpimage itself, which does the comparison ignoring the parts not contributed by the compiler. The cmpimage program is also almost completely derived from the published object format descriptions.

Let me know if there is interest in these pieces of tooling, and if so, what you think next steps should be.

Kevin B. Smith
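To make "comparison ignoring the parts not contributed by the compiler" a little more concrete, here is a toy sketch. It is not how cmpimage works (its source is not shown in this thread); it assumes the caller already knows which byte ranges to ignore, for example a timestamp field in a header, whereas a real tool would derive those ranges from the object format. masked and same_code are hypothetical names.

```python
def masked(data: bytes, ignore: list) -> bytes:
    """Return a copy of data with every (start, end) byte range zeroed out."""
    buf = bytearray(data)
    for start, end in ignore:
        # Zero exactly as many bytes as the (clamped) slice covers,
        # so the overall length never changes.
        buf[start:end] = bytes(len(buf[start:end]))
    return bytes(buf)


def same_code(a: bytes, b: bytes, ignore: list) -> bool:
    """True if two images are identical outside the ignored byte ranges."""
    return len(a) == len(b) and masked(a, ignore) == masked(b, ignore)
```

For example, two images that differ only in one ignored header byte compare as equal, while the same pair with an empty ignore list does not.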
Chris Matthews
2015-May-26 18:46 UTC
[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives
I’d love to experiment with this approach, and any tool I don’t have to write myself is a bonus! OSX support too?
Philip Reames
2015-May-28 17:14 UTC
[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives
I'd love to see this tool contributed, even if it isn't used for regression detection work. I've got a couple of hacked-up scripts which do similar things, and having a robust tool available for this would be very useful.

Philip