thr3ads.net - llvm dev - [LLVMdev] Use perf tool for more accurate time measuring on Linux [May 2014]


Yi Kong

2014-May-16 18:45 UTC

[LLVMdev] Use perf tool for more accurate time measuring on Linux

On 16 May 2014 18:40, "Chandler Carruth" <chandlerc at google.com> wrote:
>
> Why not use the cycle count which perf exposes from hardware? That would
> seem even better to me, but data would be better. =]

That's an interesting idea. However, I'm concerned that it will miss some
aspects of compiler optimization. For example, frequent cache misses would
have a much smaller impact on the result if the processor drops to a lower
frequency during the stall period. Nonetheless, it's definitely worth trying
out.
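
To make the concern concrete: a cycle count and the task-clock time agree only while the clock frequency stays constant, so the ratio of the two shows how much the frequency moved during a run. A minimal sketch, assuming `perf stat -x,` CSV output in which the first field is the counter value, the event name appears in a later field, and task-clock is reported in milliseconds (the exact columns differ between perf versions). `effective_ghz` is a hypothetical helper name, not part of the patch.

```shell
#!/bin/sh
# Hypothetical helper: average clock frequency derived from perf CSV output.
# Assumes lines such as "<value>,...,cycles,..." and "<value>,msec,task-clock,...".
effective_ghz() {
    LC_ALL=C awk -F, '
        /task-clock/ { msec = $1 }      # task-clock value, assumed to be milliseconds
        /cycles/     { cycles = $1 }    # hardware cycle count
        END {
            # cycles / seconds / 1e9  ==  cycles / (msec * 1e6)
            if (msec > 0) printf "%.3f\n", cycles / (msec * 1e6)
            else          print "n/a"
        }'
}

# Possible usage (perf writes its counters to stderr by default):
#   perf stat -x, -e cycles,task-clock ./benchmark 2>&1 >/dev/null | effective_ghz
```

If this figure differs noticeably between runs of the same binary, cycle counts and wall time will rank those runs differently.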
>
>
> On Fri, May 16, 2014 at 11:17 AM, Yi Kong <kongy.dev at gmail.com> wrote:
>>
>> On 16 May 2014 18:08, Hal Finkel <hfinkel at anl.gov> wrote:
>> > ----- Original Message -----
>> >> From: "Yi Kong" <kongy.dev at gmail.com>
>> >> To: "Hal Finkel" <hfinkel at anl.gov>, "Renato Golin" <renato.golin at linaro.org>, "Tobias Grosser" <tobias at grosser.es>
>> >> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>
>> >> Sent: Friday, May 16, 2014 11:37:28 AM
>> >> Subject: Use perf tool for more accurate time measuring on Linux
>> >>
>> >> Hi all,
>> >>
>> >> The LLVM benchmarking system produces very noisy results even on
>> >> quiet machines. One of the sources of inaccuracy is the timing tool
>> >> we are using. Because it is a user-space tool, the OS can context
>> >> switch it and we will get an outlier result. Perf stat uses the
>> >> SW_TASK_CLOCK counter in the kernel to measure time and is therefore
>> >> more accurate. It also does not get context switched.
>> >>
>> >> I've implemented a wrapper script over perf stat which mimics the
>> >> behaviour of the timeit tool in the test suite, so that nothing else
>> >> needs to be modified. The script is not yet as feature complete as
>> >> timeit, but it is enough to run nightly tests.
>> >>
>> >> I carried out experiments on several machines and saw different levels
>> >> of improvement. I am no longer seeing outlier results, and the MAD is
>> >> considerably lower. The run-by-run changes reported over the same
>> >> revision shrank from around 10 to only 2-3. The MAD fell from
>> >> around 0.01 to 0.003 on a quiet machine.
>> >
>> > That sounds great, thanks for working on this!
>> >
>> > First, we'd definitely need more documentation on what perf is and how
>> > to get it. The testing guide (and lnt dependencies), at least, need to be
>> > updated. FWIW, I don't have any machines on which this is already installed
>> > (and so it certainly is not installed by default). On Ubuntu, it is claimed
>> > that both linux-base and linux-tools-common provide a 'perf' utility, and
>> > on rpm systems it looks like there is a perf package.
>> >
>> > +# FIXME: How to measure sys time?
>> > +echo sys 0.0000 >> $REPORT
>> >
>> > Is this just the difference between the real time and the task time
>> > (or would that be a reasonable approximation)?
>>
>> Not on some occasions. But for most programs, that should be a reasonable
>> approximation. Since the test suite does not care about the sys time, we
>> can leave it for now.
>> $ time -p sleep 5
>> real 5.00
>> user 0.00
>> sys 0.00
>>
>> >
>> > Thanks again,
>> > Hal
>> >
>> >>
>> >> I've attached the patch and please experiment with it.
>> >>
>> >> Cheers,
>> >> Yi Kong
>> >>
>> >
>> > --
>> > Hal Finkel
>> > Assistant Computational Scientist
>> > Leadership Computing Facility
>> > Argonne National Laboratory
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Chandler Carruth

2014-May-16 19:51 UTC


[LLVMdev] Use perf tool for more accurate time measuring on Linux

On Fri, May 16, 2014 at 12:45 PM, Yi Kong <kongy.dev at gmail.com> wrote:
> On 16 May 2014 18:40, "Chandler Carruth" <chandlerc at google.com> wrote:
> >
> > Why not use the cycle count which perf exposes from hardware? That would
> > seem even better to me, but data would be better. =]
>
> That's an interesting idea. However, I'm concerned that it will miss some
> aspects of compiler optimization. For example, frequent cache misses would
> have a much smaller impact on the result if the processor drops to a lower
> frequency during the stall period. Nonetheless, it's definitely worth trying
> out.
>

Sure, but we should disable frequency throttling on any machine from which
we want numbers that look *remotely* stable.
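
One quick check before a benchmark run is to read the cpufreq governor from sysfs. A minimal sketch, assuming the usual Linux cpufreq layout under `/sys/devices/system/cpu`; `cpu_governor` is a hypothetical helper, and its optional second argument exists only so the sysfs root can be overridden.

```shell
#!/bin/sh
# Hypothetical helper: print the cpufreq governor of one core, or "unknown"
# when the kernel does not expose cpufreq for it.
cpu_governor() {
    cpu=$1
    root=${2:-/sys/devices/system/cpu}   # assumed standard sysfs location
    file="$root/cpu$cpu/cpufreq/scaling_governor"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo unknown
    fi
}

# Possible usage: warn when core 0 is not held at a fixed policy.
#   [ "$(cpu_governor 0)" = performance ] || echo "warning: frequency scaling active" >&2
```

Changing the governor normally needs root, for example `cpupower frequency-set -g performance` on systems where cpupower is installed. Whether that fully removes throttling on a given machine is a separate question.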

The other thing you might try doing while you're wrapping these tools is to
use schedtool to pin the process to a single core. On most modern x86
machines you can see 2-3% swing in lots of small details, and when the
process migrates between cores this makes the numbers very hard to analyze.
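
The pinning step can be sketched as a small prefix chooser. This assumes schedtool accepts a hexadecimal affinity mask via `-a` and runs a command via `-e`, and falls back to `taskset -c` from util-linux when schedtool is missing. `pin_prefix` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print a command prefix that pins a process to one core.
pin_prefix() {
    cpu=$1
    if command -v schedtool >/dev/null 2>&1; then
        # Convert the core number to a one-bit affinity mask, e.g. core 2 -> 0x4.
        printf 'schedtool -a 0x%x -e\n' $((1 << cpu))
    elif command -v taskset >/dev/null 2>&1; then
        echo "taskset -c $cpu"
    else
        echo ""                          # no pinning tool found: run unpinned
    fi
}

# Possible usage (left unquoted on purpose so the prefix splits into words):
#   $(pin_prefix 0) ./benchmark
```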

Yi Kong

2014-May-20 14:01 UTC


[LLVMdev] Use perf tool for more accurate time measuring on Linux

I've set up a public LNT server to show the results of perf stat. There
is a huge improvement compared with the timeit tool.
http://parkas16.inria.fr:8000/

The patch is updated to pin the process to a single core, and the readings
are even more accurate. It's hard-coded to run everything on core 0, so
don't run parallel testing with it for now. The tool now depends on
Linux perf and schedtool.
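
For readers without the attachment, the conversion step of such a wrapper might look roughly like the sketch below. It is not the attached patch. It assumes `perf stat -x,` CSV output whose first field is the task-clock value in milliseconds, and a report layout modelled on the `sys 0.0000` line from the patch fragment quoted earlier in the thread. Reporting task-clock under `user` is an assumption of this sketch, and `task_clock_report` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: turn perf CSV output (stdin) into timeit-style lines.
task_clock_report() {
    LC_ALL=C awk -F, '
        /task-clock/ { found = 1; printf "user %.4f\n", $1 / 1000 }  # msec -> seconds
        END {
            if (!found) exit 1           # no task-clock line: signal failure
            print "sys 0.0000"           # sys time is not measured, as in the patch fragment
        }'
}

# Possible usage, pinned to core 0 with an affinity mask:
#   perf stat -x, -e task-clock -o perf.csv schedtool -a 0x1 -e ./benchmark
#   task_clock_report < perf.csv >> "$REPORT"
```

Matching on the event name instead of a fixed column keeps the helper working whether or not the perf version in use emits a unit field.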

I'm running tests on ARM Cortex boards to verify the improvements.
Please also check if this works on your system.

Thanks,
Yi

On 16 May 2014 20:51, Chandler Carruth <chandlerc at google.com> wrote:
>
> On Fri, May 16, 2014 at 12:45 PM, Yi Kong <kongy.dev at gmail.com> wrote:
>>
>> On 16 May 2014 18:40, "Chandler Carruth" <chandlerc at google.com> wrote:
>> >
>> > Why not use the cycle count which perf exposes from hardware? That would
>> > seem even better to me, but data would be better. =]
>>
>> That's an interesting idea. However, I'm concerned that it will miss some
>> aspects of compiler optimization. For example, frequent cache misses would
>> have a much smaller impact on the result if the processor drops to a lower
>> frequency during the stall period. Nonetheless, it's definitely worth trying
>> out.
>
> Sure, but we should disable frequency throttling on any machine from which
> we want numbers that look *remotely* stable.
>
> The other thing you might try doing while you're wrapping these tools is to
> use schedtool to pin the process to a single core. On most modern x86
> machines you can see 2-3% swing in lots of small details, and when the
> process migrates between cores this makes the numbers very hard to analyze.
