While porting my backends to llvm-3.2, I found a few places where the
optimizers could have performed better. I believe the mainstream targets can
also benefit from my tweaks. But before upstreaming my changes, I would like
to quantify their merits on other applications, not just my domain-specific
codes. In short, it seemed like the right time for me to start using LNT :)
I followed the LNT quickstart guide and somehow got some results, but I am
having trouble using them. How do other developers use LNT? What is the
workflow for comparing a patched llvm to the baseline version? I am mainly
interested in the tests' execution time.
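To give an idea of where I am, here is the workflow I have pieced together
from the quickstart for comparing two compilers. This is only my guess at the
intended usage: the instance path ($LNT_TOP/mydb) and the $BASELINE_BIN /
$PATCHED_BIN variables below are hypothetical names of mine, and the
report.json location is what I see in my own sandboxes.
# create a local LNT instance and serve it, as in the quickstart
$LNT_TOP/sandbox/bin/lnt create $LNT_TOP/mydb
$LNT_TOP/sandbox/bin/lnt runserver $LNT_TOP/mydb
# run the suite once per compiler, then import both reports into the
# same instance so the web UI can compare the two runs
$LNT_TOP/sandbox/bin/lnt runtest nt \
> --sandbox=$LNT_TOP/SANDBOX_BASE \
> --cc=$BASELINE_BIN/clang --cxx=$BASELINE_BIN/clang++ \
> --test-suite=$LLVM_SRCS/projects/test-suite \
> --llvm-src=$LLVM_SRCS --llvm-obj=$LLVM_BUILD
$LNT_TOP/sandbox/bin/lnt import $LNT_TOP/mydb \
> $LNT_TOP/SANDBOX_BASE/test-*/report.json
$LNT_TOP/sandbox/bin/lnt runtest nt \
> --sandbox=$LNT_TOP/SANDBOX_PATCHED \
> --cc=$PATCHED_BIN/clang --cxx=$PATCHED_BIN/clang++ \
> --test-suite=$LLVM_SRCS/projects/test-suite \
> --llvm-src=$LLVM_SRCS --llvm-obj=$LLVM_BUILD
$LNT_TOP/sandbox/bin/lnt import $LNT_TOP/mydb \
> $LNT_TOP/SANDBOX_PATCHED/test-*/report.json
Is that the expected way to do it, or is there a better supported path?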
1. I am a bit confused by the 'lnt runtest' output:
$LNT_TOP/sandbox/bin/lnt runtest nt \
> --sandbox=$LNT_TOP/SANDBOX3 \
> --cc=$LLVM_BIN/clang \
> --cxx=$LLVM_BIN/clang++ \
> --test-suite=$LLVM_SRCS/projects/test-suite \
> --llvm-src=$LLVM_SRCS \
> --llvm-obj=$LLVM_BUILD
2013-03-11 18:41:22: checking source versions
2013-03-11 18:41:25: scanning for LNT-based test modules
2013-03-11 18:41:25: found 0 LNT-based test modules
2013-03-11 18:41:25: using nickname: 'm0__clang_DEV__x86_64'
2013-03-11 18:41:25: starting test in
'.../LLVM/LNT/SANDBOX3/test-2013-03-11_18-41-22'
2013-03-11 18:41:25: configuring...
2013-03-11 18:41:34: executing "nightly tests" with -j1...
2013-03-11 19:00:42: executing test modules
2013-03-11 19:00:42: loading nightly test data...
2013-03-11 19:00:42: capturing machine information
2013-03-11 19:00:42: generating report:
'.../LLVM/LNT/SANDBOX3/test-2013-03-11_18-41-22/report.json'
2013-03-11 18:41:22: submitting result to dummy instance
No handlers could be found for logger "lnt.server.db.migrate"
Importing 'tmpItDGOn.json'
Import succeeded.
Processing Times
----------------
Load : 0.00s
Import : 0.22s
Report : 0.48s
Total : 0.48s
Imported Data
-------------
Added Machines: 1
Added Runs : 1
Added Tests : 493
--- Tested: 986 tests --
FAIL: MultiSource/Applications/Burg/burg.execution_time (494 of 986)
FAIL: MultiSource/Applications/ClamAV/clamscan.execution_time (495 of 986)
FAIL: MultiSource/Applications/lemon/lemon.execution_time (496 of 986)
FAIL: MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount.execution_time (497 of 986)
FAIL: MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft.execution_time (498 of 986)
FAIL: MultiSource/Benchmarks/Olden/voronoi/voronoi.execution_time (499 of 986)
FAIL: MultiSource/Benchmarks/Ptrdist/anagram/anagram.execution_time (500 of 986)
FAIL: MultiSource/Benchmarks/mafft/pairlocalalign.execution_time (501 of 986)
FAIL: SingleSource/Benchmarks/BenchmarkGame/puzzle.execution_time (502 of 986)
...
How many tests are there really: 493 or 986? Or does 493 refer to the number
of test programs built, with compilation time and execution time counted as
two separate tests per program (493 x 2 = 986)?
2. Running lnt several times on the same unmodified clang+llvm binaries gives
different results (execution times can vary wildly, by up to ~200%). I am
running lnt on linux/x86_64. I tried disabling cpufreq scaling, and the
machine was otherwise unloaded, but this did not change anything. Is there a
way to run the tests multiple times and use statistics to get reproducible
numbers (within a confidence interval)? Or to handle the stats at the 'lnt
import' stage, or in the web display? And in the web display, how do I use
the 'run' and 'order' fields?
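For instance, is something along these lines supposed to work? I am guessing
here: --multisample is the only option I spotted that looks related, and I
may be misreading what it does.
$LNT_TOP/sandbox/bin/lnt runtest nt \
> --sandbox=$LNT_TOP/SANDBOX3 \
> --cc=$LLVM_BIN/clang \
> --cxx=$LLVM_BIN/clang++ \
> --test-suite=$LLVM_SRCS/projects/test-suite \
> --llvm-src=$LLVM_SRCS \
> --llvm-obj=$LLVM_BUILD \
> --multisample=5
If that is the right knob, does the reporting then aggregate the samples, or
do I need to post-process the reports myself?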
Thanks for any hints.
Cheers,
--
Arnaud de Grandmaison