Yi Kong
2014-May-20 14:01 UTC
[LLVMdev] Use perf tool for more accurate time measuring on Linux
I've set up a public LNT server to show the results of perf stat. There is a huge improvement compared with the timeit tool.
http://parkas16.inria.fr:8000/

The patch is updated to pin the process to a single core, which makes the readings even more accurate. It's hard-coded to run everything on core 0, so don't run parallel testing with it for now. The tool now depends on Linux perf and schedtool.

I'm running tests on ARM Cortex boards to verify the improvements. Please also check whether this works on your system.

Thanks,
Yi

On 16 May 2014 20:51, Chandler Carruth <chandlerc at google.com> wrote:
>
> On Fri, May 16, 2014 at 12:45 PM, Yi Kong <kongy.dev at gmail.com> wrote:
>>
>> On 16 May 2014 18:40, "Chandler Carruth" <chandlerc at google.com> wrote:
>> >
>> > Why not use the cycle count which perf exposes from hardware? That would
>> > seem even better to me, but data would be better. =]
>>
>> That's an interesting idea. However I'm concerned if that will miss some
>> aspects of compiler optimization. For example frequent cache misses would
>> have much smaller impact on the result if the processor goes to lower
>> frequency during the stall period. Nonetheless it's definitely worth to try
>> out.
>
> Sure, but we should disable frequency throttling on any machine from which
> we want numbers that look *remotely* stable.
>
> The other thing you might try doing while you're wrapping these tools is to
> use schedtool to pin the process to a single core. On most modern x86
> machines you can see 2-3% swing in lots of small details, and when the
> process migrates between cores this makes the numbers very hard to analyze.
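
For readers who want to try the pinning setup described above outside of LNT, here is a minimal sketch of how a benchmark command could be wrapped with schedtool and perf stat. It is not the patch itself: the helper names `pinned_perf_cmd` and `run_pinned`, the `task-clock` event, and the example binary `./a.out` are assumptions made for illustration. It relies only on `schedtool -a <mask> -e <command>` and `perf stat -x <separator> -e <event> -- <command>`.

```python
import shlex
import subprocess


def pinned_perf_cmd(benchmark_cmd, core=0, event="task-clock"):
    """Build an argv list that pins a benchmark to one core under perf stat.

    Hypothetical helper, not part of LNT. schedtool sets the CPU affinity
    and then executes perf, which in turn runs the benchmark; the affinity
    is inherited by the child processes.
    """
    affinity_mask = hex(1 << core)  # bit N set means CPU N is allowed
    return [
        "schedtool", "-a", affinity_mask, "-e",
        "perf", "stat", "-x", ",", "-e", event, "--",
    ] + list(benchmark_cmd)


def run_pinned(benchmark_cmd, core=0):
    """Run the pinned command and return perf stat's report as text.

    perf stat writes its counters to stderr by default, so that is the
    stream returned here. Requires perf and schedtool to be installed.
    """
    result = subprocess.run(
        pinned_perf_cmd(benchmark_cmd, core),
        capture_output=True,
        text=True,
    )
    return result.stderr


if __name__ == "__main__":
    # Only print the command line; "./a.out" is a placeholder binary.
    print(shlex.join(pinned_perf_cmd(["./a.out", "input.txt"])))
```

Because the core is a parameter here rather than hard-coded to 0, a parallel harness could in principle hand each worker its own core, although that is beyond what the patch currently does.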
Tobias Grosser
2014-May-20 15:40 UTC
[LLVMdev] Use perf tool for more accurate time measuring on Linux
On 20/05/2014 16:01, Yi Kong wrote:
> I've set up a public LNT server to show the result of perf stat. There
> is a huge improvement compared with timeit tool.
> http://parkas16.inria.fr:8000/

Hi Yi Kong,

thanks for testing these changes.

> Patch is updated to pin the process to a single core, the readings are
> even more accurate. It's hard coded to run everything on core 0, so
> don't run parallel testing with it for now. The tool now depends on
> Linux perf and schedtool.

I think this sounds like a very good direction.

How did you evaluate the improvements exactly? The following run shows, e.g., two execution time changes:

http://parkas16.inria.fr:8000/db_default/v4/nts/9

Are they expected? If I change, e.g., the aggregation function to median, they disappear. Similarly, the graph for one of them does not suggest an actual performance change:

http://parkas16.inria.fr:8000/db_default/v4/nts/graph?show_all_points=yes&moving_window_size=10&plot.0=1.428.3&submit=Update

Cheers,
Tobias
Yi Kong
2014-May-20 16:20 UTC
[LLVMdev] Use perf tool for more accurate time measuring on Linux
On 20 May 2014 16:40, Tobias Grosser <tobias at grosser.es> wrote:
> On 20/05/2014 16:01, Yi Kong wrote:
>>
>> I've set up a public LNT server to show the result of perf stat. There
>> is a huge improvement compared with timeit tool.
>> http://parkas16.inria.fr:8000/
>
>
> Hi Yi Kong,
>
> thanks for testing these changes.
>
>
>> Patch is updated to pin the process to a single core, the readings are
>> even more accurate. It's hard coded to run everything on core 0, so
>> don't run parallel testing with it for now. The tool now depends on
>> Linux perf and schedtool.
>
>
> I think this sounds like a very good direction.
>
> How did you evaluate the improvements exactly? The following run shows e.g
> two execution time changes:

I sent a screenshot of the original results in the previous mail. We used to have lots of noisy readings, with both small noise from machine background activity and large noise from the timing tool. Now the noise from the timing tool is eliminated and only a little machine background noise is left. This makes manual investigation possible.

>
> http://parkas16.inria.fr:8000/db_default/v4/nts/9
>
> Are they expected? If I change e.g. the aggregation function to median
> they disappear. Similarly the graph for one of them does not suggest an
> actual performance change:

Yes, some false positives due to machine noise are expected. The median is more tolerant of machine noise, which is why they disappear. As suggested by Chandler, we should also lock the CPU frequency to further reduce machine noise.

>
> http://parkas16.inria.fr:8000/db_default/v4/nts/graph?show_all_points=yes&moving_window_size=10&plot.0=1.428.3&submit=Update
>
> Cheers,
> Tobias
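
To illustrate why a single noisy sample can show up as a change under one aggregation function and vanish under the median, here is a small sketch. The sample values are made up for illustration and are not taken from the LNT server, the mean is used only as a stand-in for an outlier-sensitive aggregate, and `relative_change` is a hypothetical helper rather than an LNT function.

```python
from statistics import mean, median


def relative_change(before, after, aggregate):
    """Relative change between two runs after aggregating each run's samples."""
    base = aggregate(before)
    return (aggregate(after) - base) / base


# Made-up execution times in seconds, five samples per run.
baseline = [1.00, 1.01, 0.99, 1.00, 1.00]
current = [1.00, 1.01, 0.99, 1.00, 1.30]  # last sample hit by machine noise

# The mean is pulled up by the single outlier and reports a regression.
print(f"mean:   {relative_change(baseline, current, mean):+.1%}")

# The median ignores the outlier, so the reported change disappears.
print(f"median: {relative_change(baseline, current, median):+.1%}")
```

With these numbers the mean reports roughly a 6% slowdown while the median reports none, which matches the behaviour described above for noise-induced false positives.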