Johannes Doerfert via llvm-dev
2021-May-19 00:25 UTC
[llvm-dev] LLVM LNT floating point performance tests on X86 - using the llvm-test-suite benchmarks
You can run the LNT tests locally and I would assume the tests to be
impacted (on X86).

The Polybench benchmarks, and probably some others, have hashed result
files. Thus, any change to the output is flagged, regardless of how minor.
I'd run it without and with this patch and compare the results. If they
are within the expected tolerance, I'd recreate the hash files for them
and create a dependent commit for the LLVM test suite.

Does that make sense?

~ Johannes

On 5/18/21 3:32 PM, Blower, Melanie I wrote:
> Hello.
> I have a patch to commit to community
> https://reviews.llvm.org/D74436?id=282577 that changes command line
> settings for floating point. When I committed it previously, it was
> ultimately rolled back due to bot failures with LNT.
>
> Looking for suggestions on how to use the llvm-test-suite benchmarks to
> analyze this issue so I can commit this change.
>
> We think the key difference in the tests that regressed when I tried to
> commit the change was caused by differences in unrolling decisions when
> the fmuladd intrinsic was present.
>
> As far as I can tell, the LNT bots aren't currently running on any x86
> systems, so I have no idea what settings the bots used when they were
> running. I'm really not sure how to proceed.
>
> It seems to me that FMA should give better performance on systems that
> support it on any non-trivial benchmark.
>
> Thanks!
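[A minimal sketch of why a contraction-default change trips the hashed Polybench outputs; the toy kernel madd.c and the explicit -ffp-contract values are illustrative assumptions, not taken from D74436. Fusing a*b + c into one fmuladd rounds once instead of twice, so the printed results, and therefore the stored output hashes, can drift slightly.]

# Compile a*b + c with and without FP contraction and compare the emitted IR.
printf 'double madd(double a, double b, double c) { return a * b + c; }\n' > madd.c

# Contraction off: a separate fmul and fadd, two roundings.
clang -O2 -ffp-contract=off -S -emit-llvm madd.c -o madd-off.ll

# Contraction on: a single llvm.fmuladd intrinsic, one rounding, which can
# later lower to an FMA instruction on targets that have one.
clang -O2 -ffp-contract=on -S -emit-llvm madd.c -o madd-on.ll

diff madd-off.ll madd-on.ll   # the llvm.fmuladd call appears only in madd-on.ll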
Blower, Melanie I via llvm-dev
2021-May-19 14:18 UTC
[llvm-dev] LLVM LNT floating point performance tests on X86 - using the llvm-test-suite benchmarks
Thank you, I have more questions.

I am using a shared Linux system (Intel(R) Xeon(R) Platinum 8260M CPU @
2.40GHz) to build and run the llvm-test-suite. Do I need to execute the
tests on a quiescent system? I tried running a "null check", i.e. executing
and collecting results from two llvm-lit runs using the same set of test
executables, and the differences between the two runs (which ideally would
be zero, since it's the same test executable) ranged from +14% to -18%.
What is the acceptable tolerance?

I work in clang, not backend optimization, so I am not familiar with the
analysis techniques needed to understand what optimization transformations
occurred due to my patch. Do you have any tips about that?

Using the real test (unpatched compiler versus patched compiler), I
compared the assembly for symm.test, since it's SingleSource, compiling
with the two different compilers:

test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.test   10.88   9.93   -8.7%

Here’s the only difference in the .s file; it seems unlikely that this
would account for an 8% difference in time.

.LBB6_21:                               # in Loop: Header=BB6_18 Depth=2
        leaq    (%r12,%r10), %rdx
        movsd   (%rdx,%rax,8), %xmm3    # xmm3 = mem[0],zero
        mulsd   %xmm1, %xmm3
        movsd   (%r9), %xmm4            # xmm4 = mem[0],zero
        mulsd   %xmm0, %xmm4
        mulsd   (%rsi), %xmm4
        addsd   %xmm3, %xmm4

With the patch for contract changes:

.LBB6_21:                               # in Loop: Header=BB6_18 Depth=2
        movsd   (%r9), %xmm3            # xmm3 = mem[0],zero
        mulsd   %xmm0, %xmm3
        mulsd   (%rsi), %xmm3
        leaq    (%r12,%r10), %rdx
        movsd   (%rdx,%rax,8), %xmm4    # xmm4 = mem[0],zero
        mulsd   %xmm1, %xmm4
        addsd   %xmm3, %xmm4

The difference for test flops-5 was 25%, but the code differences are
bigger. I can try dump-after-all as a first step.

I'm nowhere near "generating new hash values", but won't the hash value be
relative to the target microarchitecture? So if my system is a different
arch than the bot's, the hash value I compute here wouldn't compare equal
to the bot's hash?

This is how I tested. Is the build line correct for this purpose
(caches/O3.cmake), or should I use different options when creating the
test executables?

git clone https://github.com/llvm/llvm-test-suite.git test-suite
cmake -DCMAKE_C_COMPILER=/iusers/sandbox/llorg-ContractOn/deploy/linux_prod/bin/clang \
    -DTEST_SUITE_BENCHMARKING_ONLY=true -DTEST_SUITE_RUN_BENCHMARKS=true \
    -C/iusers/test-suite/cmake/caches/O3.cmake \
    /iusers/test-suite
make
llvm-lit -v -j 1 -o results.json .
(Repeat in a different build directory using the unmodified compiler.)
python3 test-suite/utils/compare.py -f --filter-short \
    build-llorg-default/results.json build-llorg-Contract/results.json >& my-result.txt

> -----Original Message-----
> From: Johannes Doerfert <johannesdoerfert at gmail.com>
> Sent: Tuesday, May 18, 2021 8:25 PM
> To: Blower, Melanie I <melanie.blower at intel.com>; hal.finkle.llvm at gmail.com;
> spatel+llvm at rotateright.com; llvm-dev <llvm-dev at lists.llvm.org>;
> florian_hahn at apple.com
> Subject: Re: LLVM LNT floating point performance tests on X86 - using the
> llvm-test-suite benchmarks
>
> You can run the LNT tests locally and I would assume the tests to be
> impacted (on X86).
>
> The Polybench benchmarks, probably some others, have hashed result files.
> Thus, any change to the output is flagged regardless of how minor. I'd run
> it without and with this patch and compare the results. If they are within
> the expected tolerance I'd recreate the hash files for them and create a
> dependent commit for the LLVM test suite.
>
> Does that make sense?
>
> ~ Johannes
>
> On 5/18/21 3:32 PM, Blower, Melanie I wrote:
> > Hello.
> > I have a patch to commit to community
> > https://reviews.llvm.org/D74436?id=282577 that changes command line
> > settings for floating point. When I committed it previously, it was
> > ultimately rolled back due to bot failures with LNT.
> >
> > Looking for suggestions on how to use the llvm-test-suite benchmarks to
> > analyze this issue so I can commit this change.
> >
> > We think the key difference in the tests that regressed when I tried to
> > commit the change was caused by differences in unrolling decisions when
> > the fmuladd intrinsic was present.
> >
> > As far as I can tell, the LNT bots aren't currently running on any x86
> > systems, so I have no idea what settings the bots used when they were
> > running. I'm really not sure how to proceed.
> >
> > It seems to me that FMA should give better performance on systems that
> > support it on any non-trivial benchmark.
> >
> > Thanks!
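[A hedged sketch of how the unrolling/fmuladd divergence discussed above could be chased down. "old-clang" and "new-clang" are stand-ins for the unpatched and patched compilers, and the symm.c path and include directory are assumptions based on the symm.test name in the thread; the exact per-file compile command (include paths, macros) is best copied from a "make VERBOSE=1" build. The -Rpass remarks and -mllvm -print-after-all are standard clang/LLVM options, and taskset is one way to cut the run-to-run noise seen in the null check.]

OLD=old-clang    # stand-in for the unpatched compiler
NEW=new-clang    # stand-in for the patched compiler
SRC=test-suite/SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.c
INC=test-suite/SingleSource/Benchmarks/Polybench/utilities   # assumed location of polybench.h

# Optimization remarks report which loops were unrolled or vectorized and why;
# diffing them between the two compilers shows where the decisions diverge.
$OLD -O3 -I "$INC" -c "$SRC" -Rpass=loop-unroll -Rpass=loop-vectorize 2> remarks-old.txt
$NEW -O3 -I "$INC" -c "$SRC" -Rpass=loop-unroll -Rpass=loop-vectorize 2> remarks-new.txt
diff remarks-old.txt remarks-new.txt

# Full IR dumps after every pass ("dump-after-all"), to see where the pipelines diverge.
$OLD -O3 -I "$INC" -c "$SRC" -mllvm -print-after-all 2> dump-old.txt
$NEW -O3 -I "$INC" -c "$SRC" -mllvm -print-after-all 2> dump-new.txt

# Pinning the benchmark run to one core reduces run-to-run noise on a shared box.
taskset -c 2 llvm-lit -v -j 1 -o results.json .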