Michael Kruse via llvm-dev
2021-Jul-19 03:57 UTC
[llvm-dev] Questions About LLVM Test Suite: Time Units, Re-running benchmarks
On Sun, Jul 18, 2021 at 11:14 AM Stefanos Baziotis via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Now, to the questions. First, there doesn't seem to be a common time unit for
> "exec_time" among the different tests. For instance, SingleSource/ seems to use
> seconds while MicroBenchmarks seem to use μs. So, we can't reliably judge
> changes. Although I get the fact that micro-benchmarks are different in nature
> than Single/MultiSource benchmarks, so maybe one should focus only on
> the one or the other depending on what they're interested in.

Usually one does not compare executions of the entire test-suite, but looks for which programs have regressed. In that scenario only relative changes per program matter, so μs are only compared to μs and seconds only to seconds.

> In any case, it would at least be great if the JSON data contained the time unit per test,
> but that is not happening either.

What do you mean? Don't you get the exec_time per program?

> Do you think that the lack of time unit info is a problem? If yes, do you like the
> solution of adding the time unit to the JSON, or do you want to propose an alternative?

You could also normalize the time unit that is emitted to the JSON to s or ms.

> The second question has to do with re-running the benchmarks: I do
>   cmake + make + llvm-lit -v -j 1 -o out.json .
> but if I try to do the latter a second time, it just does/shows nothing. Is there any reason
> that the benchmarks can't be run a second time? Could I somehow run them a second time?

Running the programs a second time did work for me in the past. Remember to change the output to another file, or the previous .json will be overwritten.

> Lastly, slightly off-topic, but while we're on the subject of benchmarking,
> do you think it's reliable to run with -j <number of cores>? I'm a little bit afraid of
> the shared caches (because misses should be counted in the CPU time, which
> is what is measured in "exec_time", AFAIU) and of any potential multi-threading that the tests may use.

It depends. You can run in parallel, but then you should increase the number of samples (executions) appropriately to counter the increased noise. Depending on how many cores your system has, it might not be worth it; instead, try to make the system as deterministic as possible (single thread, thread affinity, no background processes, perf instead of timeit, no context switches, etc.). To avoid systematic bias because the same cache-sensitive programs always run in parallel, use the --shuffle option.

Michael
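For illustration, a minimal sketch of that kind of normalization as a post-processing step, assuming the report written by "llvm-lit -o" has a top-level "tests" list with a per-test "metrics" dictionary, and that tests under MicroBenchmarks/ report exec_time in microseconds while the other suites report seconds (both are assumptions here, not something the test-suite guarantees):

#!/usr/bin/env python3
# normalize_exec_time.py -- hypothetical helper, not part of the test-suite.
# Rewrites exec_time in a report produced by `llvm-lit -o report.json`
# so that every test uses seconds, and records the unit explicitly.
import json
import sys

def normalize(in_path, out_path):
    with open(in_path) as f:
        report = json.load(f)
    for test in report.get("tests", []):
        metrics = test.get("metrics", {})
        if "exec_time" not in metrics:
            continue
        # Assumption: Google Benchmark-based MicroBenchmarks report microseconds.
        if "MicroBenchmarks" in test.get("name", ""):
            metrics["exec_time"] /= 1e6
        metrics["exec_time_unit"] = "s"  # make the unit explicit in the JSON
    with open(out_path, "w") as f:
        json.dump(report, f, indent=2)

if __name__ == "__main__":
    normalize(sys.argv[1], sys.argv[2])

Invoked as, e.g., "python3 normalize_exec_time.py out.json out-normalized.json".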
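And since only relative, per-program changes matter, two such reports (say out1.json and out2.json from two llvm-lit runs written to different files) can be diffed along these lines; same assumptions about the report layout as above, and the 10% threshold is arbitrary (the test-suite's own utils/compare.py does this more thoroughly):

#!/usr/bin/env python3
# compare_runs.py -- hypothetical helper, not part of the test-suite.
# Prints programs whose exec_time changed by more than a threshold
# between two llvm-lit JSON reports. Units cancel out per program, so
# mixed s/us reports are fine as long as each test uses the same unit in both runs.
import json
import sys

def exec_times(path):
    with open(path) as f:
        report = json.load(f)
    return {t["name"]: t["metrics"]["exec_time"]
            for t in report.get("tests", [])
            if "exec_time" in t.get("metrics", {})}

def compare(before_path, after_path, threshold=0.10):
    before = exec_times(before_path)
    after = exec_times(after_path)
    for name in sorted(before.keys() & after.keys()):
        old, new = before[name], after[name]
        if old == 0:
            continue  # skip empty/zero measurements to avoid division by zero
        change = (new - old) / old
        if abs(change) >= threshold:
            print(f"{change:+7.1%}  {name}")

if __name__ == "__main__":
    compare(sys.argv[1], sys.argv[2])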
Mircea Trofin via llvm-dev
2021-Jul-19 14:36 UTC
[llvm-dev] Questions About LLVM Test Suite: Time Units, Re-running benchmarks
On Sun, Jul 18, 2021 at 8:58 PM Michael Kruse via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
>
> It depends. You can run in parallel, but then you should increase the
> number of samples (executions) appropriately to counter the increased
> noise. Depending on how many cores your system has, it might not be
> worth it; instead, try to make the system as deterministic as possible
> (single thread, thread affinity, no background processes, perf instead
> of timeit, no context switches, etc.). To avoid systematic bias because
> the same cache-sensitive programs always run in parallel, use the
> --shuffle option.
>
> Michael

Also, depending on what you are trying to achieve (and what your platform/target is), you could enable perf counter collection
<https://github.com/google/benchmark/blob/main/docs/perf_counters.md>;
if instruction counts are sufficient (for example), the value will probably not vary much with multi-threading.

...but it's probably best to avoid system noise altogether. On Intel, AFAIK that includes disabling turbo boost and hyperthreading, along with Michael's recommendations.
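As a concrete starting point, a small Linux-specific sketch that checks whether turbo boost, SMT/hyperthreading, and the frequency governor are set up for a low-noise run; the sysfs paths are assumptions that depend on the kernel and cpufreq driver (intel_pstate here), so adjust them to your machine:

#!/usr/bin/env python3
# check_benchmark_env.py -- hypothetical pre-run check, Linux + intel_pstate only.
# Warns if the machine does not look configured for low-noise benchmarking.
from pathlib import Path

# sysfs knobs (assumed paths; they vary with kernel version and cpufreq driver)
CHECKS = [
    ("/sys/devices/system/cpu/intel_pstate/no_turbo", "1",
     "turbo boost is still enabled"),
    ("/sys/devices/system/cpu/smt/control", "off",
     "SMT/hyperthreading is still enabled"),
    ("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor", "performance",
     "CPU frequency governor is not 'performance'"),
]

def main():
    ok = True
    for path, expected, message in CHECKS:
        p = Path(path)
        if not p.exists():
            print(f"note: {path} not present on this system, skipping")
            continue
        value = p.read_text().strip()
        if value != expected:
            print(f"warning: {message} ({path} = {value!r}, want {expected!r})")
            ok = False
    if ok:
        print("environment looks reasonable for benchmarking")

if __name__ == "__main__":
    main()

A run pinned to one core, e.g. "taskset -c 2 llvm-lit -v -j 1 --shuffle -o out2.json .", then covers the thread-affinity and --shuffle points from above.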