Stefanos Baziotis via llvm-dev
2021-Jul-18 16:14 UTC
[llvm-dev] Questions About LLVM Test Suite: Time Units, Re-running benchmarks
Hi,

I'm not very familiar with the LLVM test suite and I'd like to ask some questions. I wanted to get a feeling for the impact on runtime performance of some changes inside LLVM, so I thought of running the llvm test-suite benchmarks.

Build options: O3.cmake + DTEST_SUITE_BENCHMARKING_ONLY=True
Run: llvm-lit -v -j 1 -o out.json .

What I think I should be looking for is the "metrics" -> "exec_time" entry in the JSON file.

Now, to the questions. First, there doesn't seem to be a common time unit for "exec_time" among the different tests. For instance, SingleSource/ seems to use seconds while MicroBenchmarks seem to use μs. So we can't reliably judge changes. I get that micro-benchmarks are different in nature from Single/MultiSource benchmarks, so maybe one should focus on one or the other depending on what they're interested in. In any case, it would at least be great if the JSON data contained the time unit per test, but that is not happening either. Do you think that the lack of time-unit info is a problem? If yes, do you like the solution of adding the time unit to the JSON, or do you want to propose an alternative?

The second question has to do with re-running the benchmarks: I do
cmake + make + llvm-lit -v -j 1 -o out.json .
but if I try to do the latter another time, it just does/shows nothing. Is there any reason the benchmarks can't be run a second time? Could I somehow run them a second time?

Lastly, slightly off-topic, but while we're on the subject of benchmarking: do you think it's reliable to run with -j <number of cores>? I'm a little bit afraid of the shared caches (because misses should be counted in the CPU time, which is what is measured in "exec_time", AFAIU) and any potential multi-threading the tests may use.

Best,
Stefanos
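For reference, a minimal sketch of how one might pull the per-test numbers out of out.json, assuming the usual lit JSON layout (a top-level "tests" array whose entries carry a "metrics" dictionary). Treating MicroBenchmarks values as microseconds is only a guess based on the observation above; the unit is not stored in the JSON itself.

import json
import sys

def load_exec_times(path):
    # Read a lit result file produced with "llvm-lit ... -o out.json".
    with open(path) as f:
        data = json.load(f)
    times = {}
    for test in data.get("tests", []):
        metrics = test.get("metrics", {})
        if "exec_time" not in metrics:
            continue
        value = metrics["exec_time"]
        # Assumption: MicroBenchmarks report microseconds; normalize to seconds.
        if "MicroBenchmarks" in test["name"]:
            value *= 1e-6
        times[test["name"]] = value
    return times

if __name__ == "__main__":
    for name, seconds in sorted(load_exec_times(sys.argv[1]).items()):
        print(f"{seconds:12.6f}s  {name}")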
Michael Kruse via llvm-dev
2021-Jul-19 03:57 UTC
[llvm-dev] Questions About LLVM Test Suite: Time Units, Re-running benchmarks
On Sun, 18 Jul 2021 at 11:14, Stefanos Baziotis via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Now, to the questions. First, there doesn't seem to be a common time unit for
> "exec_time" among the different tests. For instance, SingleSource/ seems to use
> seconds while MicroBenchmarks seem to use μs. So we can't reliably judge
> changes. I get that micro-benchmarks are different in nature from
> Single/MultiSource benchmarks, so maybe one should focus on one or the other
> depending on what they're interested in.

Usually one does not compare executions of the entire test-suite, but looks for which programs have regressed. In this scenario only relative changes within each program matter, so μs are only compared to μs and seconds only to seconds.

> In any case, it would at least be great if the JSON data contained the time unit
> per test, but that is not happening either.

What do you mean? Don't you get the exec_time per program?

> Do you think that the lack of time-unit info is a problem? If yes, do you like the
> solution of adding the time unit to the JSON, or do you want to propose an alternative?

You could also normalize the time unit that is emitted to JSON to s or ms.

> The second question has to do with re-running the benchmarks: I do
> cmake + make + llvm-lit -v -j 1 -o out.json .
> but if I try to do the latter another time, it just does/shows nothing. Is there any reason
> the benchmarks can't be run a second time? Could I somehow run them a second time?

Running the programs a second time has worked for me in the past. Remember to write the output to another file, or the previous .json will be overwritten.

> Lastly, slightly off-topic, but while we're on the subject of benchmarking: do you
> think it's reliable to run with -j <number of cores>? I'm a little bit afraid of
> the shared caches (because misses should be counted in the CPU time, which
> is what is measured in "exec_time", AFAIU) and any potential multi-threading
> the tests may use.

It depends. You can run in parallel, but then you should increase the number of samples (executions) appropriately to counter the increased noise. Depending on how many cores your system has, it might not be worth it; instead, try to make the system as deterministic as possible (single thread, thread affinity, no background processes, use perf instead of timeit, avoid context switches, etc.). To avoid systematic bias from always running the same cache-sensitive programs in parallel, use the --shuffle option.

Michael
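For reference, a minimal sketch of the per-program comparison described above, assuming two lit result files saved with different -o arguments (the names baseline.json and patched.json are placeholders for a run before and after the change). Because each program is only compared against itself, the per-suite time unit (seconds vs. microseconds) cancels out of the ratio.

import json
import sys

def exec_times(path):
    # Map test name -> "exec_time" metric from a lit JSON result file.
    with open(path) as f:
        data = json.load(f)
    return {t["name"]: t["metrics"]["exec_time"]
            for t in data.get("tests", [])
            if "exec_time" in t.get("metrics", {})}

def main(baseline_path, patched_path):
    base = exec_times(baseline_path)
    new = exec_times(patched_path)
    # Report the relative change for every program present in both runs.
    for name in sorted(base.keys() & new.keys()):
        if base[name] == 0:
            continue  # skip zero/empty measurements
        change = (new[name] - base[name]) / base[name] * 100.0
        print(f"{change:+7.2f}%  {name}")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])  # e.g. baseline.json patched.json

The test-suite also ships utils/compare.py, which performs this kind of per-program comparison on lit result files.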