Michael Kruse via llvm-dev
2021-Jun-17 18:13 UTC
[llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
The LoopDistribute pass doesn't do anything unless it sees llvm.loop.distribute.enable (`#pragma clang loop distribute(enable)`) because it does not have a profitability heuristic. It cannot say whether loop distribution is good for performance or not. What makes it improve hmmer is that the distributed loops can be vectorized. However, LoopDistribute runs before the vectorizer and cannot say in advance whether a distributed loop will be vectorized or not. If not, then it has potentially only increased loop overhead.

Turning -enable-loop-distribute on by default would mean that we consider loop distribution to be usually beneficial without causing major regressions. We need a lot more data to support that conclusion.

Alternatively, we could consider loop distribution a canonicalization. A later LoopFuse would apply the profitability heuristic and re-fuse loops if loop distribution did not gain anything.

Michael
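To make the transformation under discussion concrete, here is a minimal sketch in C. It is not taken from hmmer or from the LLVM sources; the function names `fused` and `distributed` and the loop body are made up for illustration, and the arrays are assumed not to overlap. The first function marks a loop with the pragma mentioned above; the second is a hand-written version of what distribution would be expected to produce, where the second loop has no loop-carried dependence and is therefore a candidate for the vectorizer.

```c
#include <stddef.h>

/* Hypothetical example loop. The statement on a[] carries a dependence
 * from one iteration to the next, which keeps the loop as a whole from
 * being vectorized. The pragma asks clang to try distributing it; other
 * compilers ignore the pragma. Arrays are assumed not to alias. */
void fused(int *a, const int *b, int *c, const int *d, const int *e,
           size_t n)
{
#pragma clang loop distribute(enable)
    for (size_t i = 0; i + 1 < n; ++i) {
        a[i + 1] = a[i] + b[i]; /* loop-carried dependence on a[] */
        c[i] = d[i] * e[i];     /* independent of the line above */
    }
}

/* Hand-distributed equivalent: same result, but two loop headers.
 * If the second loop is not vectorized afterwards, the split has only
 * added loop overhead, which is the concern raised above. */
void distributed(int *a, const int *b, int *c, const int *d, const int *e,
                 size_t n)
{
    for (size_t i = 0; i + 1 < n; ++i)
        a[i + 1] = a[i] + b[i]; /* stays scalar */

    for (size_t i = 0; i + 1 < n; ++i)
        c[i] = d[i] * e[i];     /* candidate for vectorization */
}
```

Both functions compute the same values for non-overlapping arrays; whether the distributed form is actually faster depends on what the later passes do with the second loop.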
Jingu Kang via llvm-dev
2021-Jun-18 12:13 UTC
[llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
I appreciate your replies. I have collected the performance data below.
For AArch64, the performance data from llvm-test-suite is as follows.
Metric: exec_time

Program                                      results_base  results_loop_dist   diff
test-suite...ications/JM/lencod/lencod.test          3.95               4.29   8.8%
test-suite...emCmp<5, GreaterThanZero, Mid>       1456.09            1574.29   8.1%
test-suite...st:BM_BAND_LIN_EQ_LAMBDA/44217         22.83              24.50   7.3%
test-suite....test:BM_BAND_LIN_EQ_RAW/44217         23.00              24.17   5.1%
test-suite...st:BM_INT_PREDICT_LAMBDA/44217        589.54             616.70   4.6%
test-suite...t:BENCHMARK_asin_novec_double_        330.25             342.17   3.6%
test-suite...ow-dbl/GlobalDataFlow-dbl.test          2.58               2.67   3.3%
test-suite...da.test:BM_PIC_2D_LAMBDA/44217        781.30             806.36   3.2%
test-suite...est:BM_ENERGY_CALC_LAMBDA/5001         63.02              64.93   3.0%
test-suite...gebra/kernels/syr2k/syr2k.test          6.53               6.73   3.0%
test-suite...t/StatementReordering-flt.test          2.33               2.40   2.8%
test-suite...sCRaw.test:BM_PIC_2D_RAW/44217        789.90             810.05   2.6%
test-suite...s/gramschmidt/gramschmidt.test          1.44               1.48   2.5%
test-suite...Raw.test:BM_HYDRO_1D_RAW/44217         38.42              39.37   2.5%
test-suite....test:BM_INT_PREDICT_RAW/44217        597.73             612.34   2.4%
Geomean difference                                                            -0.0%

        results_base  results_loop_dist        diff
count     584.000000         584.000000  584.000000
mean     2761.681991        2759.451499   -0.000020
std     30145.555650       30124.858004    0.011093
min         0.608782           0.608729   -0.116286
25%         3.125425           3.106625   -0.000461
50%       130.212207         130.582658    0.000004
75%       602.708659         612.931769    0.000438
max    511340.880000      511059.980000    0.087630

For AArch64, the performance data from the SPEC benchmarks is as follows.
SPEC2006
Benchmark Improvement(%)
400.perlbench -1.786911228
401.bzip2 -3.174199894
403.gcc 0.717990522
429.mcf 2.053027806
445.gobmk 0.775388165
456.hmmer 43.39308377
458.sjeng 0.133933093
462.libquantum 4.647923489
464.h264ref -0.059568786
471.omnetpp 1.352515266
473.astar 0.362752409
483.xalancbmk 0.746580249
SPEC2017
Benchmark Improvement(%)
500.perlbench_r 0.415424516
502.gcc_r -0.112915812
505.mcf_r 0.238633706
520.omnetpp_r 0.114830748
523.xalancbmk_r 0.460107636
525.x264_r -0.401915964
531.deepsjeng_r 0.010064227
541.leela_r 0.394797504
557.xz_r 0.111781366
Thanks
JinGu Kang