Michael Kruse via llvm-dev
2021-Jun-17 18:13 UTC
[llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
The LoopDistribute pass doesn't do anything unless it sees llvm.loop.distribute.enable (`#pragma clang loop distribute(enable)`) because it does not have a profitability heuristic. It cannot say whether loop distribution is good for performance or not. What makes it improve hmmer is that the distributed loops can be vectorized. However, LoopDistribute runs before the vectorizer and cannot say in advance whether a distributed loop will be vectorized or not. If it is not, then distribution has potentially only increased loop overhead.

To make -enable-loop-distribute on by default would mean that we could consider loop distribution to be usually beneficial without causing major regressions. We need a lot more data to support that conclusion.

Alternatively, we could consider loop distribution a canonicalization. A later LoopFuse would apply the profitability heuristic and re-fuse the loops if loop distribution did not gain anything.

Michael
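To illustrate the pragma mentioned above, here is a minimal sketch in C. The function names, array names, and loop body are hypothetical and are not taken from hmmer. The first function marks a loop for distribution. The second shows, written by hand, the shape that distribution is assumed to produce: the statement with a loop-carried dependence is isolated in its own loop, so the remaining loop has no cross-iteration dependence and becomes a candidate for vectorization. Whether the pass actually splits the loop, and whether the vectorizer then succeeds, depends on the compiler version and target.

```c
#include <stddef.h>

/* Hypothetical kernel. S1 has a loop-carried dependence (a[i + 1] needs
 * a[i] from the previous iteration); S2 is independent across iterations. */
void kernel(int *restrict a, const int *restrict b,
            int *restrict c, const int *restrict d,
            const int *restrict e, size_t n)
{
/* Requests distribution; Clang attaches llvm.loop.distribute.enable. */
#pragma clang loop distribute(enable)
    for (size_t i = 0; i + 1 < n; i++) {
        a[i + 1] = a[i] * b[i]; /* S1: sequential */
        c[i] = d[i] * e[i];     /* S2: vectorizable in isolation */
    }
}

/* Hand-written equivalent of the distributed form (an assumption about
 * the result, shown only for illustration). */
void kernel_distributed(int *restrict a, const int *restrict b,
                        int *restrict c, const int *restrict d,
                        const int *restrict e, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        a[i + 1] = a[i] * b[i]; /* S1 stays a scalar loop */

    for (size_t i = 0; i + 1 < n; i++)
        c[i] = d[i] * e[i];     /* S2 can now be vectorized on its own */
}
```

If the second loop is not vectorized after all, the distributed form only pays for two loops instead of one, which is the overhead concern raised above.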
Jingu Kang via llvm-dev
2021-Jun-18 12:13 UTC
[llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
I appreciate your replies. I have seen the following performance data.

For AArch64, the performance data from llvm-test-suite is as below.

Metric: exec_time

Program                                       results_base  results_loop_dist  diff
test-suite...ications/JM/lencod/lencod.test           3.95               4.29  8.8%
test-suite...emCmp<5, GreaterThanZero, Mid>        1456.09            1574.29  8.1%
test-suite...st:BM_BAND_LIN_EQ_LAMBDA/44217          22.83              24.50  7.3%
test-suite....test:BM_BAND_LIN_EQ_RAW/44217          23.00              24.17  5.1%
test-suite...st:BM_INT_PREDICT_LAMBDA/44217         589.54             616.70  4.6%
test-suite...t:BENCHMARK_asin_novec_double_         330.25             342.17  3.6%
test-suite...ow-dbl/GlobalDataFlow-dbl.test           2.58               2.67  3.3%
test-suite...da.test:BM_PIC_2D_LAMBDA/44217         781.30             806.36  3.2%
test-suite...est:BM_ENERGY_CALC_LAMBDA/5001          63.02              64.93  3.0%
test-suite...gebra/kernels/syr2k/syr2k.test           6.53               6.73  3.0%
test-suite...t/StatementReordering-flt.test           2.33               2.40  2.8%
test-suite...sCRaw.test:BM_PIC_2D_RAW/44217         789.90             810.05  2.6%
test-suite...s/gramschmidt/gramschmidt.test           1.44               1.48  2.5%
test-suite...Raw.test:BM_HYDRO_1D_RAW/44217          38.42              39.37  2.5%
test-suite....test:BM_INT_PREDICT_RAW/44217         597.73             612.34  2.4%
Geomean difference                                                             -0.0%

        results_base  results_loop_dist        diff
count     584.000000         584.000000  584.000000
mean     2761.681991        2759.451499   -0.000020
std     30145.555650       30124.858004    0.011093
min         0.608782           0.608729   -0.116286
25%         3.125425           3.106625   -0.000461
50%       130.212207         130.582658    0.000004
75%       602.708659         612.931769    0.000438
max    511340.880000      511059.980000    0.087630

For AArch64, the performance data from the SPEC benchmarks is as below.
SPEC2006
Benchmark        Improvement(%)
400.perlbench    -1.786911228
401.bzip2        -3.174199894
403.gcc           0.717990522
429.mcf           2.053027806
445.gobmk         0.775388165
456.hmmer        43.39308377
458.sjeng         0.133933093
462.libquantum    4.647923489
464.h264ref      -0.059568786
471.omnetpp       1.352515266
473.astar         0.362752409
483.xalancbmk     0.746580249

SPEC2017
Benchmark        Improvement(%)
500.perlbench_r   0.415424516
502.gcc_r        -0.112915812
505.mcf_r         0.238633706
520.omnetpp_r     0.114830748
523.xalancbmk_r   0.460107636
525.x264_r       -0.401915964
531.deepsjeng_r   0.010064227
541.leela_r       0.394797504
557.xz_r          0.111781366

Thanks
JinGu Kang

> -----Original Message-----
> From: Michael Kruse <llvmdev at meinersbur.de>
> Sent: 17 June 2021 19:13
> To: Jingu Kang <Jingu.Kang at arm.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] Enabling Loop Distribution Pass as default in the pipeline
> of new pass manager
>
> The LoopDistribute pass doesn't do anything unless it sees
> llvm.loop.distribute.enable (`#pragma clang loop distribute(enable)`) because it
> does not have a profitability heuristic. It cannot say whether loop distribution is
> good for performance or not. What makes it improve hmmer is that the
> distributed loops can be vectorized.
> However, LoopDistribute is located before the vectorizer and cannot say in
> advance whether a distributed loop will be vectorized or not.
> If not, then it potentially only increased loop overhead.
>
> To make -enable-loop-distribute on by default would mean that we could
> consider loop distribution to be usually beneficial without causing major
> regressions. We need a lot more data to support that conclusion.
>
> Alternatively, we could consider loop-distribution a canonicalization.
> A later LoopFuse would do the profitability heuristic to re-fuse loops again if
> loop distribution did not gain anything.
>
> Michael