Yi Kong
2014-May-20 16:20 UTC
[LLVMdev] Use perf tool for more accurate time measuring on Linux
On 20 May 2014 16:40, Tobias Grosser <tobias at grosser.es> wrote:
> On 20/05/2014 16:01, Yi Kong wrote:
>>
>> I've set up a public LNT server to show the result of perf stat. There
>> is a huge improvement compared with timeit tool.
>> http://parkas16.inria.fr:8000/
>
> Hi Yi Kong,
>
> thanks for testing these changes.
>
>> Patch is updated to pin the process to a single core, the readings are
>> even more accurate. It's hard coded to run everything on core 0, so
>> don't run parallel testing with it for now. The tool now depends on
>> Linux perf and schedtool.
>
> I think this sounds like a very good direction.
>
> How did you evaluate the improvements exactly? The following run shows e.g
> two execution time changes:

I sent a screenshot of the original results in the previous mail. We used to have lots of noisy readings, caused both by small machine background noise and by large noise from the timing tool. Now the noise from the timing tool is eliminated and only a little machine background noise is left. This makes manual investigation possible.

> http://parkas16.inria.fr:8000/db_default/v4/nts/9
>
> Are they expected? If I change e.g. the aggregation function to median
> they disappear. Similarly the graph for one of them does not suggest an
> actual performance change:

Yes, some false positives due to machine noise are expected. The median is more tolerant of machine noise, therefore they disappear. As suggested by Chandler, we should also lock the CPU frequency to further reduce machine noise.

> http://parkas16.inria.fr:8000/db_default/v4/nts/graph?show_all_points=yes&moving_window_size=10&plot.0=1.428.3&submit=Update
>
> Cheers,
> Tobias
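For readers who want to try the idea of pinning a benchmark to one core and timing it with perf stat, here is a minimal sketch of how such a command line could be assembled. It is not the patch discussed in this thread: the function name pinned_perf_command and the ./benchmark path are hypothetical, and taskset is used as a stand-in for schedtool, whose exact invocation the thread does not show. The code only builds the argument list; it does not run anything.

```python
import shlex


def pinned_perf_command(benchmark_cmd, core=0, repeats=5):
    """Build an argv list that runs a benchmark on one core under perf stat.

    Hypothetical helper for illustration only; the thread's patch uses
    schedtool, while this sketch substitutes taskset for the pinning step.
    """
    if not benchmark_cmd:
        raise ValueError("benchmark_cmd must not be empty")
    return [
        "taskset", "-c", str(core),  # restrict the process to a single CPU
        "perf", "stat",
        "-r", str(repeats),          # repeat the run; perf reports mean and variation
        "-e", "task-clock",          # count CPU time consumed by the task
        "--",                        # everything after this is the benchmark itself
    ] + list(benchmark_cmd)


# "./benchmark" is a placeholder program name, not something from the thread.
cmd = pinned_perf_command(["./benchmark", "--size", "large"])
print(shlex.join(cmd))
```

Because every run is forced onto the same core, running several of these commands in parallel would make them compete for that core, which matches the warning above about not using parallel testing for now.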
Tobias Grosser
2014-May-20 16:55 UTC
[LLVMdev] Use perf tool for more accurate time measuring on Linux
On 20/05/2014 18:20, Yi Kong wrote:
> On 20 May 2014 16:40, Tobias Grosser <tobias at grosser.es> wrote:
>> On 20/05/2014 16:01, Yi Kong wrote:
>>>
>>> I've set up a public LNT server to show the result of perf stat. There
>>> is a huge improvement compared with timeit tool.
>>> http://parkas16.inria.fr:8000/
>>
>> Hi Yi Kong,
>>
>> thanks for testing these changes.
>>
>>> Patch is updated to pin the process to a single core, the readings are
>>> even more accurate. It's hard coded to run everything on core 0, so
>>> don't run parallel testing with it for now. The tool now depends on
>>> Linux perf and schedtool.
>>
>> I think this sounds like a very good direction.
>>
>> How did you evaluate the improvements exactly? The following run shows e.g
>> two execution time changes:
>
> I sent a screenshot of original results in the previous mail. We used
> to have lots of noise readings, both from small machine background
> noise and large noise from the timing tool. Now noise from timing tool
> is eliminated and only few machine background noise is left. This
> makes manual investigation possible.

I think we need to get this down to zero, even at the cost of missing regressions. We have many commits and runs per day; having one or two noisy results per run means people will still not look at performance changes.

>> http://parkas16.inria.fr:8000/db_default/v4/nts/9
>>
>> Are they expected? If I change e.g. the aggregation function to median
>> they disappear. Similarly the graph for one of them does not suggest an
>> actual performance change:
>
> Yes, some false positives due to machine noise is expected. Median is
> more tolerant to machine noise, therefore they disappear.

Right.

What I find interesting is that this change filters several results that seem not to be filtered out by our statistical test. Is this right?

In the optimal case, we should be able to set the confidence level we require high enough to filter out these results as well. Is this right?

Is there currently anything that blocks us from increasing the confidence level, further reducing the noise level at the cost of some missed regressions?

> As suggested by Chandler, we should also lock the CPU frequency to
> further reduce machine noise.

I set it to 2667.000 MHz on parkas16. You can check whether this improves anything.

Cheers,
Tobias
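The observation that the reported changes disappear under median aggregation can be illustrated with a small sketch. The sample values below are invented for illustration and are not taken from the parkas16 runs; relative_change is a hypothetical helper, not an LNT function. One disturbed sample is enough to shift the mean noticeably while leaving the median untouched.

```python
from statistics import mean, median


def relative_change(before, after, aggregate):
    """Relative difference between two runs under a given aggregation function."""
    old = aggregate(before)
    new = aggregate(after)
    return (new - old) / old


# Invented execution times in seconds; the last sample of `current`
# stands for a single reading disturbed by machine background noise.
baseline = [1.00, 1.01, 0.99, 1.00, 1.00]
current = [1.00, 1.01, 0.99, 1.00, 1.60]

mean_change = relative_change(baseline, current, mean)
median_change = relative_change(baseline, current, median)

print(f"change by mean:   {mean_change:.1%}")
print(f"change by median: {median_change:.1%}")
```

With the mean, the single outlier looks like a regression of roughly twelve percent; with the median, the two runs are indistinguishable.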
Yi Kong
2014-May-20 20:00 UTC
[LLVMdev] Use perf tool for more accurate time measuring on Linux
On 20 May 2014 17:55, Tobias Grosser <tobias at grosser.es> wrote:
> On 20/05/2014 18:20, Yi Kong wrote:
>>
>> On 20 May 2014 16:40, Tobias Grosser <tobias at grosser.es> wrote:
>>>
>>> On 20/05/2014 16:01, Yi Kong wrote:
>>>>
>>>> I've set up a public LNT server to show the result of perf stat. There
>>>> is a huge improvement compared with timeit tool.
>>>> http://parkas16.inria.fr:8000/
>>>
>>> Hi Yi Kong,
>>>
>>> thanks for testing these changes.
>>>
>>>> Patch is updated to pin the process to a single core, the readings are
>>>> even more accurate. It's hard coded to run everything on core 0, so
>>>> don't run parallel testing with it for now. The tool now depends on
>>>> Linux perf and schedtool.
>>>
>>> I think this sounds like a very good direction.
>>>
>>> How did you evaluate the improvements exactly? The following run shows
>>> e.g
>>> two execution time changes:
>>
>> I sent a screenshot of original results in the previous mail. We used
>> to have lots of noise readings, both from small machine background
>> noise and large noise from the timing tool. Now noise from timing tool
>> is eliminated and only few machine background noise is left. This
>> makes manual investigation possible.
>
> I think we need to get this down to zero even at the cost of missing
> regressions. We have many commits and runs per day, having one or two noisy
> results per run means people will still not look at performance changes.
>
>>> http://parkas16.inria.fr:8000/db_default/v4/nts/9
>>>
>>> Are they expected? If I change e.g. the aggregation function to median
>>> they disappear. Similarly the graph for one of them does not suggest an
>>> actual performance change:
>>
>> Yes, some false positives due to machine noise is expected. Median is
>> more tolerant to machine noise, therefore they disappear.
>
> Right.
>
> What I find interesting is that this change filters several results that
> seem to not be filtered out by our statistical test. Is this right?

Yes. The MWU test is nonparametric: it examines the order rather than the actual values of the samples. Eliminating with the median, however, uses the actual values (if the medians of two samples are close enough, we treat them as equal).

> In the optimal case, we should be able to set the confidence level we
> require high enough to filter out these results as well. Is this right?

Yes. The lowest confidence level we can set is still quite high (90%). We can certainly add a lower confidence option, but I can't find any MWU table lower than that on the Internet. Also, we should modify the value analysis (based on how close the medians/minimums are) to vary according to the confidence level as well. However, this analysis is parametric: we need to know how the data is actually distributed for every test. I don't think there is a nonparametric test which does the same thing.

> Is there currently anything that blocks us from increasing the confidence
> level further reducing the noise level at the cost of some missed
> regressions?
>
>> As suggested by Chandler, we should also lock the CPU frequency to
>> further reduce machine noise.
>
> I set it to 2667.000 Mhz on parkas16. You can try if this improves
> something.

Sure. I'm shutting down the server to run the tests.

> Cheers,
> Tobias
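The difference between the rank-based MWU (Mann-Whitney U) test and the value-based median comparison can be sketched as follows. This is an illustration only, not LNT's implementation: the helper names, the sample values and the 1% tolerance are all assumptions. The two samples are completely separated in rank order, so U takes its most extreme value, yet their medians differ by only about half a percent.

```python
from statistics import median


def mann_whitney_u(a, b):
    """Return the smaller of the two U statistics, counting ties as one half.

    U depends only on the ordering of the samples, not on how far apart
    the values are.
    """
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


def medians_close(a, b, tolerance=0.01):
    """Value-based filter: treat samples as equal if medians are within tolerance.

    The 1% default is a hypothetical threshold chosen for this sketch.
    """
    ma, mb = median(a), median(b)
    return abs(ma - mb) <= tolerance * min(ma, mb)


# Invented timings: every `after` value is larger than every `before` value,
# but only by a few thousandths of a second.
before = [1.000, 1.001, 1.002, 1.003, 1.004]
after = [1.005, 1.006, 1.007, 1.008, 1.009]

u = mann_whitney_u(before, after)
close = medians_close(before, after)

print("U statistic:", u)
print("medians within tolerance:", close)
```

Here the rank test sees the strongest possible separation, while the median filter discards the change as too small to matter, which is why the two checks can disagree on the same data.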
Yi Kong
2014-May-21 20:13 UTC
[LLVMdev] Use perf tool for more accurate time measuring on Linux
On 20 May 2014 17:55, Tobias Grosser <tobias at grosser.es> wrote:
> I set it to 2667.000 Mhz on parkas16. You can try if this improves
> something.

I don't see any sign of improvement. Since we can't get raw data any better, I think we should get this merged and tested on build bots for a while and see how it goes.

Cheers,
Yi Kong