Mircea Trofin via llvm-dev
2021-Jul-19 14:36 UTC
[llvm-dev] Questions About LLVM Test Suite: Time Units, Re-running benchmarks
On Sun, Jul 18, 2021 at 8:58 PM Michael Kruse via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Sun, Jul 18, 2021 at 11:14 Stefanos Baziotis via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Now, to the questions. First, there doesn't seem to be a common time unit for
> > "exec_time" among the different tests. For instance, SingleSource/ seem to use
> > seconds while MicroBenchmarks seem to use μs. So, we can't reliably judge
> > changes. Although I get the fact that micro-benchmarks are different in nature
> > than Single/MultiSource benchmarks, so maybe one should focus only on
> > the one or the other, depending on what they're interested in.
>
> Usually one does not compare executions of the entire test-suite, but
> looks for which programs have regressed. In this scenario only relative
> changes between programs matter, so μs are only compared to μs and
> seconds only compared to seconds.
>
> > In any case, it would at least be great if the JSON data contained the
> > time unit per test, but that is not happening either.
>
> What do you mean? Don't you get the exec_time per program?
>
> > Do you think that the lack of time unit info is a problem? If yes, do you like the
> > solution of adding the time unit to the JSON, or do you want to propose an alternative?
>
> You could also normalize the time unit that is emitted to JSON to s or ms.
>
> > The second question has to do with re-running the benchmarks: I do
> > cmake + make + llvm-lit -v -j 1 -o out.json .
> > but if I try to do the latter another time, it just does/shows nothing. Is there any reason
> > that the benchmarks can't be run a second time? Could I somehow run it a second time?
>
> Running the programs a second time did work for me in the past.
> Remember to change the output to another file or the previous .json
> will be overwritten.
>
> > Lastly, slightly off-topic, but while we're on the subject of benchmarking,
> > do you think it's reliable to run with -j <number of cores>? I'm a little bit afraid of
> > the shared caches (because misses should be counted in the CPU time, which
> > is what is measured in "exec_time", AFAIU) and any potential multi-threading
> > that the tests may use.
>
> It depends. You can run in parallel, but then you should increase the
> number of samples (executions) appropriately to counter the increased
> noise. Depending on how many cores your system has, it might not be
> worth it; instead, try to make the system as deterministic as
> possible (single thread, thread affinity, avoid background processes,
> use perf instead of timeit, avoid context switches, etc.). To avoid
> systematic bias because always the same cache-sensitive programs run
> in parallel, use the --shuffle option.
>
> Michael

Also, depending on what you are trying to achieve (and what your platform
target is), you could enable perfcounter collection
<https://github.com/google/benchmark/blob/main/docs/perf_counters.md>;
if instruction counts are sufficient (for example), the value will probably
not vary much with multi-threading.

...but it's probably best to avoid system noise altogether. On Intel, AFAIK
that includes disabling turbo boost and hyperthreading, along with
Michael's recommendations.
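
As a concrete illustration of Michael's normalization suggestion, here is a
rough Python sketch for post-processing the JSON that llvm-lit -o emits. It
assumes lit's usual layout (a top-level "tests" list whose entries carry a
"metrics" dict with "exec_time") and the convention discussed above
(MicroBenchmarks in μs, SingleSource/MultiSource in seconds); both are
assumptions you would want to verify against your own output.

import json
import sys

def normalize_to_seconds(test_name, exec_time):
    # Assumption (from the discussion above): MicroBenchmarks report
    # microseconds, while SingleSource/MultiSource report seconds.
    if "MicroBenchmarks" in test_name:
        return exec_time / 1e6
    return exec_time

def main(path):
    with open(path) as f:
        results = json.load(f)
    for test in results.get("tests", []):
        metrics = test.get("metrics", {})
        if "exec_time" not in metrics:
            continue
        seconds = normalize_to_seconds(test["name"], metrics["exec_time"])
        print("%s: %.6f s" % (test["name"], seconds))

if __name__ == "__main__":
    main(sys.argv[1])  # e.g. the out.json produced by `llvm-lit -o out.json`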
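
In the same spirit, since only the per-program relative changes matter anyway
(and assuming you re-ran with a different output file, say out2.json, so the
first out.json isn't overwritten), a comparison between two runs could look
roughly like the sketch below; again, the JSON layout is an assumption, not a
guarantee.

import json
import sys

def load_times(path):
    # Map test name -> exec_time from a lit JSON results file.
    with open(path) as f:
        data = json.load(f)
    return {t["name"]: t["metrics"]["exec_time"]
            for t in data.get("tests", [])
            if "exec_time" in t.get("metrics", {})}

def main(baseline_json, new_json):
    base = load_times(baseline_json)
    new = load_times(new_json)
    changes = []
    for name, old_time in base.items():
        if name in new and old_time > 0:
            changes.append((new[name] / old_time - 1.0, name))
    # Largest regressions first; units cancel out because each program
    # is only compared with itself.
    for rel, name in sorted(changes, reverse=True):
        print("%+7.2f%%  %s" % (rel * 100.0, name))

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])  # e.g. out.json out2.json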
Stefanos Baziotis via llvm-dev
2021-Jul-19 19:46 UTC
[llvm-dev] Questions About LLVM Test Suite: Time Units, Re-running benchmarks
Hi,

> Usually one does not compare executions of the entire test-suite, but
> looks for which programs have regressed. In this scenario only relative
> changes between programs matter, so μs are only compared to μs and
> seconds only compared to seconds.

That's true, but there are different insights one can get from, say, a 30%
increase in a program that initially took 100μs and in one that initially
took 10s.

> What do you mean? Don't you get the exec_time per program?

Yes, but the JSON file does not include the time _unit_. (Actually, I think
the correct phrasing is "unit of time", not "time unit"; my bad.) In any case,
I mean that you get e.g. "exec_time": 4, but you don't know whether this 4 is
4 seconds, 4 μs, or some other unit of time. For example, the only reason it
seems that MultiSource/ uses seconds is that I ran a bunch of them manually
(and because some outputs saved by llvm-lit, which measure in seconds, match
the numbers in the JSON). If we knew the unit of time per test case (or per X
grouping of tests, for that matter), we could then, e.g., normalize the times,
as you suggest, or at least know the unit of time and act accordingly.

> Running the programs a second time did work for me in the past.

OK, it seems to work for me if I wait, but it behaves differently the second
time. Anyway, not important.

> It depends. You can run in parallel, but then you should increase the
> number of samples (executions) appropriately to counter the increased
> noise. Depending on how many cores your system has, it might not be
> worth it; instead, try to make the system as deterministic as
> possible (single thread, thread affinity, avoid background processes,
> use perf instead of timeit, avoid context switches, etc.). To avoid
> systematic bias because always the same cache-sensitive programs run
> in parallel, use the --shuffle option.

I see, thanks. I didn't know about the --shuffle option, interesting.

Btw, when using perf (i.e., using TEST_SUITE_USE_PERF in cmake), it seems
that perf runs both during the build (i.e., make) and the run (i.e.,
llvm-lit) of the tests. It's not important, but do you happen to know why
this happens?

> Also, depending on what you are trying to achieve (and what your platform
> target is), you could enable perfcounter collection
> <https://github.com/google/benchmark/blob/main/docs/perf_counters.md>;

Thanks, that can be useful in a bunch of cases. I should note that perf stats
are not included in the JSON file. Is the "canonical" way to access them to
follow the pattern CMakeFiles/<benchmark name>.dir/<benchmark name>.time.perfstats ?
For example, let's say I want the perf stats for
test-suite/SingleSource/Benchmarks/Adobe-C++/loop_unroll.cpp
To find them, I should go to the same path but in the build directory, i.e.,
test-suite-build/SingleSource/Benchmarks/Adobe-C++/
and then follow the pattern above, so the .perfstats file will be in:
test-suite-build/SingleSource/Benchmarks/Adobe-C++/CMakeFiles/loop_unroll.dir/loop_unroll.cpp.time.perfstats

Sorry for the long path strings, but I couldn't make it clear otherwise.

Thanks to both,
Stefanos
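
P.S. To make the path mapping concrete, here is roughly the lookup I have in
mind, as a small Python sketch. It just hard-codes the layout I described
above (the build tree mirroring the source tree, plus the
CMakeFiles/<name>.dir/<source file>.time.perfstats pattern), so please treat
that pattern as my assumption rather than a documented interface.

import os

def perfstats_path(build_root, rel_source):
    # rel_source is relative to the test-suite checkout, e.g.
    # "SingleSource/Benchmarks/Adobe-C++/loop_unroll.cpp".
    # Assumed layout: the build tree mirrors the source tree, and perf
    # writes <dir>/CMakeFiles/<name>.dir/<source file>.time.perfstats.
    rel_dir, source_file = os.path.split(rel_source)
    name, _ext = os.path.splitext(source_file)
    return os.path.join(build_root, rel_dir, "CMakeFiles",
                        name + ".dir", source_file + ".time.perfstats")

# Example:
#   perfstats_path("test-suite-build",
#                  "SingleSource/Benchmarks/Adobe-C++/loop_unroll.cpp")
# -> test-suite-build/SingleSource/Benchmarks/Adobe-C++/CMakeFiles/
#    loop_unroll.dir/loop_unroll.cpp.time.perfstats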