Johannes Doerfert via llvm-dev
2021-May-19 15:30 UTC
[llvm-dev] LLVM LNT floating point performance tests on X86 - using the llvm-test-suite benchmarks
On 5/19/21 9:18 AM, Blower, Melanie I wrote:
> Thank you, I have more questions.
>
> I am using a shared Linux system (Intel(R) Xeon(R) Platinum 8260M CPU @ 2.40GHz) to build and run the llvm-test-suite. Do I need to execute the tests on a quiescent system? I tried running a "null check", i.e. execute and collect results from an llvm-lit run using the same set of test executables, and the differences between the two runs (ideally zero, since it's the same test executable) ranged from +14% to -18%.
>
> What is the acceptable tolerance?

I'm not following what the "results" are here.

> I work in clang, not backend optimization, so I am not familiar with analysis techniques to understand what optimization transformations occurred due to my patch. Do you have any tips about that?

It doesn't necessarily matter. If you want to know without any other information, you could compare the outputs of -mllvm -print-after-all w/ and w/o your patch. I don't think it is strictly necessary if the tests are not impacted too much.

> Using the real test (unpatched compiler versus patched compiler), I compared the assembly for symm.test since it's SingleSource, compiling with the 2 different compilers:
>
> test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.test   10.88   9.93   -8.7%
>
> Here's the only difference in the .s file; it seems unlikely that this would account for an 8% difference in time.
>
> .LBB6_21:                        # in Loop: Header=BB6_18 Depth=2
>     leaq    (%r12,%r10), %rdx
>     movsd   (%rdx,%rax,8), %xmm3 # xmm3 = mem[0],zero
>     mulsd   %xmm1, %xmm3
>     movsd   (%r9), %xmm4         # xmm4 = mem[0],zero
>     mulsd   %xmm0, %xmm4
>     mulsd   (%rsi), %xmm4
>     addsd   %xmm3, %xmm4
>
> With the patch for contract changes:
> .LBB6_21:                        # in Loop: Header=BB6_18 Depth=2
>     movsd   (%r9), %xmm3         # xmm3 = mem[0],zero
>     mulsd   %xmm0, %xmm3
>     mulsd   (%rsi), %xmm3
>     leaq    (%r12,%r10), %rdx
>     movsd   (%rdx,%rax,8), %xmm4 # xmm4 = mem[0],zero
>     mulsd   %xmm1, %xmm4
>     addsd   %xmm3, %xmm4
>
> The difference for test flops-5 was 25%, but the code differences are bigger. I can try -print-after-all as a first step.

I'm not sure we need to look at this right now.

> I'm nowhere near "generating new hash values", but won't the hash value be relative to the target microarchitecture? So if my system is a different arch than the bot, the hash value I compute here wouldn't compare equal to the bot hash?
>
> This is how I tested. Is the build line correct for this purpose (caches/O3.cmake), or should I use different options when creating the test executables?
>
> git clone https://github.com/llvm/llvm-test-suite.git test-suite
> cmake -DCMAKE_C_COMPILER=/iusers/sandbox/llorg-ContractOn/deploy/linux_prod/bin/clang \
>       -DTEST_SUITE_BENCHMARKING_ONLY=true -DTEST_SUITE_RUN_BENCHMARKS=true \
>       -C/iusers/test-suite/cmake/caches/O3.cmake \
>       /iusers/test-suite
> make
> llvm-lit -v -j 1 -o results.json .
> (Repeat in a different build directory using the unmodified compiler)
> python3 test-suite/utils/compare.py -f --filter-short build-llorg-default/results.json build-llorg-Contract/results.json >& my-result.txt

I'm a little confused about what you are doing, or trying to do. I was expecting you to run the symm executable compiled w/ and w/o your patch, then look at the numbers that are printed at the end. So compare the program results, not the compile or execution time. If the results are pretty much equivalent, we can use the results w/ patch to create a new hash file. If not, we need to investigate why. Does that make sense?
~ Johannes

>> -----Original Message-----
>> From: Johannes Doerfert <johannesdoerfert at gmail.com>
>> Sent: Tuesday, May 18, 2021 8:25 PM
>> To: Blower, Melanie I <melanie.blower at intel.com>; hal.finkle.llvm at gmail.com;
>> spatel+llvm at rotateright.com; llvm-dev <llvm-dev at lists.llvm.org>;
>> florian_hahn at apple.com
>> Subject: Re: LLVM LNT floating point performance tests on X86 - using the llvm-test-suite benchmarks
>>
>> You can run the LNT tests locally and I would assume the tests to be impacted (on X86).
>>
>> The Polybench benchmarks, and probably some others, have hashed result files. Thus, any change to the output is flagged, regardless of how minor. I'd run it without and with this patch and compare the results. If they are within the expected tolerance, I'd recreate the hash files for them and create a dependent commit for the LLVM test suite.
>>
>> Does that make sense?
>>
>> ~ Johannes
>>
>>
>> On 5/18/21 3:32 PM, Blower, Melanie I wrote:
>>> Hello.
>>> I have a patch to commit to community, https://reviews.llvm.org/D74436?id=282577, that changes command line settings for floating point. When I committed it previously, it was ultimately rolled back due to bot failures with LNT.
>>> Looking for suggestions on how to use the llvm-test-suite benchmarks to analyze this issue so I can commit this change.
>>> We think the key difference in the tests that regressed when I tried to commit the change was caused by differences in unrolling decisions when the fmuladd intrinsic was present.
>>> As far as I can tell, the LNT bots aren't currently running on any x86 systems, so I have no idea what settings the bots used when they were running. I'm really not sure how to proceed.
>>> It seems to me that FMA should give better performance on systems that support it on any non-trivial benchmark.
>>> Thanks!
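A minimal sketch of the check described above: compare the program output of symm built with and without the patch, rather than its run time. This assumes the two build trees from the cmake steps quoted in this thread (build-llorg-default and build-llorg-Contract), that the executable sits at the path mirroring the source layout (it may differ locally), and that md5sum is a reasonable stand-in for whatever hashing the Polybench reference-output files actually use.

    # Run the two symm binaries and capture everything they print (the
    # program result, not the timing reported by lit).
    BIN=SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm
    ./build-llorg-default/$BIN  > symm.default.out  2>&1
    ./build-llorg-Contract/$BIN > symm.contract.out 2>&1

    # If the printed numbers match, or differ only within an acceptable
    # floating-point tolerance, the reference hash can be regenerated from
    # the patched output; otherwise the difference needs explaining first.
    diff symm.default.out symm.contract.out
    md5sum symm.default.out symm.contract.out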
Blower, Melanie I via llvm-dev
2021-May-19 16:08 UTC
[llvm-dev] LLVM LNT floating point performance tests on X86 - using the llvm-test-suite benchmarks
What I'm trying to do is to determine whether the patch I'm submitting is going to cause benchmarking problems that force the patch to be reverted--since that happened the last time I committed the patch (several months ago). Since my patch did cause problems last time, I want to run the tests now and develop an explanation for any regressions.

I thought that "compare.py" would tell me what I need to know. I assumed the summary line for symm was telling me that symm had improved by 8.7% (where 8.7% describes the difference between 10.88 and 9.93). But I looked at the lines in results.json for the 2 different executions (unpatched and patched) and the "hash" is the same for symm. Can I find what I need to know from results.json? For both runs there are 736 lines in results.json showing "code" : "PASS". Does that mean it's all OK and I just need to see if the hash value is the same?

I also put one reply below. Thanks a lot.

> -----Original Message-----
> From: Johannes Doerfert <johannesdoerfert at gmail.com>
> Sent: Wednesday, May 19, 2021 11:31 AM
> To: Blower, Melanie I <melanie.blower at intel.com>; spatel+llvm at rotateright.com;
> llvm-dev <llvm-dev at lists.llvm.org>; florian_hahn at apple.com; hal.finkel.llvm at gmail.com
> Subject: Re: LLVM LNT floating point performance tests on X86 - using the llvm-test-suite benchmarks
>
> On 5/19/21 9:18 AM, Blower, Melanie I wrote:
> > Thank you, I have more questions.
> >
> > I am using a shared Linux system (Intel(R) Xeon(R) Platinum 8260M CPU @ 2.40GHz) to build and run the llvm-test-suite. Do I need to execute the tests on a quiescent system? I tried running a "null check", i.e. execute and collect results from an llvm-lit run using the same set of test executables, and the differences between the two runs (ideally zero, since it's the same test executable) ranged from +14% to -18%.
> >
> > What is the acceptable tolerance?
>
> I'm not following what the "results" are here.

[Blower, Melanie] I mean the results.json file which is created by the llvm-lit run. One of the results.json files was execution results from the unpatched compiler, and the other results.json file was execution results from the patched compiler. (I also did the "null check" but let's ignore that.)
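As a sketch of one way to answer the results.json question directly: pull the per-test "hash" metric out of the two files and compare it. This assumes the lit JSON layout suggested by the fields quoted above (a top-level "tests" list whose entries carry "name", "code", and a "metrics" dict containing "hash" and "exec_time"); adjust the key names if a local file differs.

    python3 -c '
    import json, sys
    for path in sys.argv[1:]:
        for t in json.load(open(path))["tests"]:
            if "symm" in t["name"]:
                m = t.get("metrics", {})
                print(path, t["code"], m.get("hash"), m.get("exec_time"))
    ' build-llorg-default/results.json build-llorg-Contract/results.json

If the hash metric matches between the unpatched and patched runs, the program output is unchanged as far as that hash can tell, which is the equivalence Johannes asks about; the exec_time deltas are a separate, and much noisier, question.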
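On the earlier question about the +14% to -18% run-to-run noise on a shared machine, a sketch of one common mitigation: collect several samples per configuration and let compare.py merge them, rather than judging from a single pair of runs. The "vs" separator for merging multiple result files per side is described in the test-suite documentation; check compare.py --help on the local checkout if it behaves differently, and the taskset core number below is arbitrary.

    # Take three samples in each build tree, pinning the run to one core
    # to reduce interference from other users of the machine.
    cd build-llorg-default
    for i in 1 2 3; do taskset -c 2 llvm-lit -j 1 -o results.$i.json . ; done
    cd ../build-llorg-Contract
    for i in 1 2 3; do taskset -c 2 llvm-lit -j 1 -o results.$i.json . ; done
    cd ..

    # Compare the merged samples; percentages computed this way are far less
    # sensitive to a single noisy run.
    python3 test-suite/utils/compare.py --filter-short \
        build-llorg-default/results.1.json build-llorg-default/results.2.json build-llorg-default/results.3.json \
        vs \
        build-llorg-Contract/results.1.json build-llorg-Contract/results.2.json build-llorg-Contract/results.3.json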