Jingu Kang via llvm-dev
2021-Jun-18 12:13 UTC
[llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
I appreciate your replies. I have collected the performance data below.

For AArch64, the performance data from llvm-test-suite is as below.

Metric: exec_time

Program                                      results_base  results_loop_dist  diff
test-suite...ications/JM/lencod/lencod.test          3.95               4.29  8.8%
test-suite...emCmp<5, GreaterThanZero, Mid>       1456.09            1574.29  8.1%
test-suite...st:BM_BAND_LIN_EQ_LAMBDA/44217         22.83              24.50  7.3%
test-suite....test:BM_BAND_LIN_EQ_RAW/44217         23.00              24.17  5.1%
test-suite...st:BM_INT_PREDICT_LAMBDA/44217        589.54             616.70  4.6%
test-suite...t:BENCHMARK_asin_novec_double_        330.25             342.17  3.6%
test-suite...ow-dbl/GlobalDataFlow-dbl.test          2.58               2.67  3.3%
test-suite...da.test:BM_PIC_2D_LAMBDA/44217        781.30             806.36  3.2%
test-suite...est:BM_ENERGY_CALC_LAMBDA/5001         63.02              64.93  3.0%
test-suite...gebra/kernels/syr2k/syr2k.test          6.53               6.73  3.0%
test-suite...t/StatementReordering-flt.test          2.33               2.40  2.8%
test-suite...sCRaw.test:BM_PIC_2D_RAW/44217        789.90             810.05  2.6%
test-suite...s/gramschmidt/gramschmidt.test          1.44               1.48  2.5%
test-suite...Raw.test:BM_HYDRO_1D_RAW/44217         38.42              39.37  2.5%
test-suite....test:BM_INT_PREDICT_RAW/44217        597.73             612.34  2.4%
Geomean difference                                                            -0.0%

        results_base  results_loop_dist        diff
count     584.000000         584.000000  584.000000
mean     2761.681991        2759.451499   -0.000020
std     30145.555650       30124.858004    0.011093
min         0.608782           0.608729   -0.116286
25%         3.125425           3.106625   -0.000461
50%       130.212207         130.582658    0.000004
75%       602.708659         612.931769    0.000438
max    511340.880000      511059.980000    0.087630

For AArch64, the performance data from SPEC benchmark is as below.

SPEC2006
Benchmark        Improvement(%)
400.perlbench    -1.786911228
401.bzip2        -3.174199894
403.gcc           0.717990522
429.mcf           2.053027806
445.gobmk         0.775388165
456.hmmer        43.39308377
458.sjeng         0.133933093
462.libquantum    4.647923489
464.h264ref      -0.059568786
471.omnetpp       1.352515266
473.astar         0.362752409
483.xalancbmk     0.746580249

SPEC2017
Benchmark        Improvement(%)
500.perlbench_r   0.415424516
502.gcc_r        -0.112915812
505.mcf_r         0.238633706
520.omnetpp_r     0.114830748
523.xalancbmk_r   0.460107636
525.x264_r       -0.401915964
531.deepsjeng_r   0.010064227
541.leela_r       0.394797504
557.xz_r          0.111781366

Thanks
JinGu Kang

> -----Original Message-----
> From: Michael Kruse <llvmdev at meinersbur.de>
> Sent: 17 June 2021 19:13
> To: Jingu Kang <Jingu.Kang at arm.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] Enabling Loop Distribution Pass as default in the
> pipeline of new pass manager
>
> The LoopDistribute pass doesn't do anything unless it sees
> llvm.loop.distribute.enable (`#pragma clang loop distribute(enable)`)
> because it does not have a profitability heuristic. It cannot say whether
> loop distribution is good for performance or not. What makes it improve
> hmmer is that the distributed loops can be vectorized.
> However, LoopDistribute is located before the vectorizer and cannot say in
> advance whether a distributed loop will be vectorized or not.
> If not, then it potentially only increased loop overhead.
>
> To make -enable-loop-distribute on by default would mean that we could
> consider loop distribution to be usually beneficial without causing major
> regressions. We need a lot more data to support that conclusion.
>
> Alternatively, we could consider loop-distribution a canonicalization.
> A later LoopFuse would do the profitability heuristic to re-fuse loops
> again if loop distribution did not gain anything.
>
> Michael
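
For readers unfamiliar with the transformation under discussion, the following is a minimal sketch of what loop distribution does at the source level. The function names, parameter names, and the loop body are hypothetical and are not taken from hmmer or from the thread. The first function keeps a loop-carried recurrence and an independent statement in one loop, and requests distribution with the pragma quoted above. The second function shows by hand the shape the pass is meant to produce: the recurrence stays in a scalar loop, while the independent statement gets its own loop that a later vectorizer may be able to handle.

```c
#include <stddef.h>

/* Hypothetical example: one loop mixing a loop-carried recurrence on a[]
 * with an independent element-wise product into c[]. The pragma is only a
 * request to the compiler; it is guarded so other compilers ignore it. */
void single_loop(float *restrict a, const float *restrict b,
                 float *restrict c, const float *restrict d,
                 const float *restrict e, size_t n)
{
#ifdef __clang__
#pragma clang loop distribute(enable)
#endif
    for (size_t i = 0; i + 1 < n; i++) {
        a[i + 1] = a[i] * b[i]; /* depends on the previous iteration */
        c[i] = d[i] * e[i];     /* independent across iterations */
    }
}

/* Hand-written equivalent of the distributed form: two loops over the same
 * range. Only the second loop is free of loop-carried dependences. */
void distributed_loops(float *restrict a, const float *restrict b,
                       float *restrict c, const float *restrict d,
                       const float *restrict e, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        a[i + 1] = a[i] * b[i]; /* recurrence stays scalar */

    for (size_t i = 0; i + 1 < n; i++)
        c[i] = d[i] * e[i];     /* candidate for vectorization */
}
```

Both functions compute the same results when the arrays do not overlap, which is what `restrict` asserts here. The sketch also illustrates the cost mentioned above: if the second loop ends up not being vectorized, the distributed form only adds a second set of loop-control instructions. With clang, the option named in the thread is typically passed as `-mllvm -enable-loop-distribute`.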
Jingu Kang via llvm-dev
2021-Jun-21 13:27 UTC
[llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
Regarding treating the LoopDistribute pass as a canonicalization that relies on the profitability heuristic of the LoopFuse pass: it looks like the LoopFuse pass does not have a proper profitability function either.

If possible, I would like to enable the LoopDistribute pass based on the performance data. As you can see in the previous email, the geomean difference on llvm-test-suite is -0.0%. On the SPEC benchmarks, we see a 43% performance improvement on 456.hmmer of SPEC2006. Based on this data, I think we could say the pass is usually beneficial without causing major regressions.

What do you think?

Thanks
JinGu Kang