Mircea Trofin via llvm-dev
2021-Jul-19 14:36 UTC
[llvm-dev] Questions About LLVM Test Suite: Time Units, Re-running benchmarks
On Sun, Jul 18, 2021 at 8:58 PM Michael Kruse via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Sun, Jul 18, 2021 at 11:14 Stefanos Baziotis via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Now, to the questions. First, there doesn't seem to be a common time unit for
> > "exec_time" among the different tests. For instance, SingleSource/ seem to use
> > seconds while MicroBenchmarks seem to use μs. So, we can't reliably judge
> > changes. Although I get the fact that micro-benchmarks are different in nature
> > than Single/MultiSource benchmarks, so maybe one should focus only on
> > the one or the other, depending on what they're interested in.
>
> Usually one does not compare executions of the entire test-suite, but
> looks for which programs have regressed. In this scenario only relative
> changes between programs matter, so μs are only compared to μs and
> seconds only compared to seconds.
>
> > In any case, it would at least be great if the JSON data contained the
> > time unit per test, but that is not happening either.
>
> What do you mean? Don't you get the exec_time per program?
>
> > Do you think that the lack of time unit info is a problem? If yes, do you like the
> > solution of adding the time unit to the JSON, or do you want to propose an alternative?
>
> You could also normalize the time unit that is emitted to JSON to s or ms.
>
> > The second question has to do with re-running the benchmarks: I do
> > cmake + make + llvm-lit -v -j 1 -o out.json .
> > but if I try to do the latter another time, it just does/shows nothing. Is there any reason
> > that the benchmarks can't be run a second time? Could I somehow run it a second time?
>
> Running the programs a second time did work for me in the past.
> Remember to change the output to another file or the previous .json
> will be overwritten.
>
> > Lastly, slightly off-topic, but while we're on the subject of benchmarking,
> > do you think it's reliable to run with -j <number of cores>? I'm a little bit afraid of
> > the shared caches (because misses should be counted in the CPU time, which
> > is what is measured in "exec_time", AFAIU) and any potential multi-threading
> > that the tests may use.
>
> It depends. You can run in parallel, but then you should increase the
> number of samples (executions) appropriately to counter the increased
> noise. Depending on how many cores your system has, it might not be
> worth it; instead, try to make the system as deterministic as
> possible (single thread, thread affinity, avoid background processes,
> use perf instead of timeit, avoid context switches, etc.). To avoid
> systematic bias because always the same cache-sensitive programs run
> in parallel, use the --shuffle option.
>
> Michael

Also, depending on what you are trying to achieve (and what your platform
target is), you could enable perfcounter collection
<https://github.com/google/benchmark/blob/main/docs/perf_counters.md>;
if instruction counts are sufficient (for example), the value will probably
not vary much with multi-threading.

...but it's probably best to avoid system noise altogether. On Intel, AFAIK
that includes disabling turbo boost and hyperthreading, along with
Michael's recommendations.
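
As a concrete illustration of Michael's normalization suggestion, here is a
rough Python sketch for post-processing the JSON that llvm-lit -o emits. It
assumes lit's usual layout (a top-level "tests" list whose entries carry a
"metrics" dict with "exec_time") and the convention discussed above
(MicroBenchmarks in μs, SingleSource/MultiSource in seconds); both are
assumptions you would want to verify against your own output.

import json
import sys

def normalize_to_seconds(test_name, exec_time):
    # Assumption (from the discussion above): MicroBenchmarks report
    # microseconds, while SingleSource/MultiSource report seconds.
    if "MicroBenchmarks" in test_name:
        return exec_time / 1e6
    return exec_time

def main(path):
    with open(path) as f:
        results = json.load(f)
    for test in results.get("tests", []):
        metrics = test.get("metrics", {})
        if "exec_time" not in metrics:
            continue
        seconds = normalize_to_seconds(test["name"], metrics["exec_time"])
        print("%s: %.6f s" % (test["name"], seconds))

if __name__ == "__main__":
    main(sys.argv[1])  # e.g. the out.json produced by `llvm-lit -o out.json`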
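
In the same spirit, since only the per-program relative changes matter anyway
(and assuming you re-ran with a different output file, say out2.json, so the
first out.json isn't overwritten), a comparison between two runs could look
roughly like the sketch below; again, the JSON layout is an assumption, not a
guarantee.

import json
import sys

def load_times(path):
    # Map test name -> exec_time from a lit JSON results file.
    with open(path) as f:
        data = json.load(f)
    return {t["name"]: t["metrics"]["exec_time"]
            for t in data.get("tests", [])
            if "exec_time" in t.get("metrics", {})}

def main(baseline_json, new_json):
    base = load_times(baseline_json)
    new = load_times(new_json)
    changes = []
    for name, old_time in base.items():
        if name in new and old_time > 0:
            changes.append((new[name] / old_time - 1.0, name))
    # Largest regressions first; units cancel out because each program
    # is only compared with itself.
    for rel, name in sorted(changes, reverse=True):
        print("%+7.2f%%  %s" % (rel * 100.0, name))

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])  # e.g. out.json out2.json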
Stefanos Baziotis via llvm-dev
2021-Jul-19 19:46 UTC
[llvm-dev] Questions About LLVM Test Suite: Time Units, Re-running benchmarks
Hi,

> Usually one does not compare executions of the entire test-suite, but
> looks for which programs have regressed. In this scenario only relative
> changes between programs matter, so μs are only compared to μs and
> seconds only compared to seconds.

That's true, but there are different insights one can get from, say, a 30%
increase in a program that initially took 100μs and in one that initially
took 10s.

> What do you mean? Don't you get the exec_time per program?

Yes, but the JSON file does not include the time _unit_. (Actually, I think
the correct phrasing is "unit of time", not "time unit"; my bad.) In any case,
I mean that you get e.g. "exec_time": 4, but you don't know whether this 4 is
4 seconds, 4 μs, or some other unit of time. For example, the only reason it
seems that MultiSource/ uses seconds is that I ran a bunch of them manually
(and because some outputs saved by llvm-lit, which measure in seconds, match
the numbers in the JSON). If we knew the unit of time per test case (or per X
grouping of tests, for that matter), we could then, e.g., normalize the times,
as you suggest, or at least know the unit of time and act accordingly.

> Running the programs a second time did work for me in the past.

OK, it seems to work for me if I wait, but it behaves differently the second
time. Anyway, not important.

> It depends. You can run in parallel, but then you should increase the
> number of samples (executions) appropriately to counter the increased
> noise. Depending on how many cores your system has, it might not be
> worth it; instead, try to make the system as deterministic as
> possible (single thread, thread affinity, avoid background processes,
> use perf instead of timeit, avoid context switches, etc.). To avoid
> systematic bias because always the same cache-sensitive programs run
> in parallel, use the --shuffle option.

I see, thanks. I didn't know about the --shuffle option, interesting.

Btw, when using perf (i.e., using TEST_SUITE_USE_PERF in cmake), it seems
that perf runs both during the build (i.e., make) and the run (i.e.,
llvm-lit) of the tests. It's not important, but do you happen to know why
this happens?

> Also, depending on what you are trying to achieve (and what your platform
> target is), you could enable perfcounter collection
> <https://github.com/google/benchmark/blob/main/docs/perf_counters.md>;

Thanks, that can be useful in a bunch of cases. I should note that perf stats
are not included in the JSON file. Is the "canonical" way to access them to
follow the pattern CMakeFiles/<benchmark name>.dir/<benchmark name>.time.perfstats ?
For example, let's say I want the perf stats for
test-suite/SingleSource/Benchmarks/Adobe-C++/loop_unroll.cpp
To find them, I should go to the same path but in the build directory, i.e.,
test-suite-build/SingleSource/Benchmarks/Adobe-C++/
and then follow the pattern above, so the .perfstats file will be in:
test-suite-build/SingleSource/Benchmarks/Adobe-C++/CMakeFiles/loop_unroll.dir/loop_unroll.cpp.time.perfstats

Sorry for the long path strings, but I couldn't make it clear otherwise.

Thanks to both,
Stefanos
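
P.S. To make the path mapping concrete, here is roughly the lookup I have in
mind, as a small Python sketch. It just hard-codes the layout I described
above (the build tree mirroring the source tree, plus the
CMakeFiles/<name>.dir/<source file>.time.perfstats pattern), so please treat
that pattern as my assumption rather than a documented interface.

import os

def perfstats_path(build_root, rel_source):
    # rel_source is relative to the test-suite checkout, e.g.
    # "SingleSource/Benchmarks/Adobe-C++/loop_unroll.cpp".
    # Assumed layout: the build tree mirrors the source tree, and perf
    # writes <dir>/CMakeFiles/<name>.dir/<source file>.time.perfstats.
    rel_dir, source_file = os.path.split(rel_source)
    name, _ext = os.path.splitext(source_file)
    return os.path.join(build_root, rel_dir, "CMakeFiles",
                        name + ".dir", source_file + ".time.perfstats")

# Example:
#   perfstats_path("test-suite-build",
#                  "SingleSource/Benchmarks/Adobe-C++/loop_unroll.cpp")
# -> test-suite-build/SingleSource/Benchmarks/Adobe-C++/CMakeFiles/
#    loop_unroll.dir/loop_unroll.cpp.time.perfstats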