thr3ads.net - llvm dev - [llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager [Jun 2021]

Michael Kruse via llvm-dev

2021-Jun-17 18:13 UTC

[llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager

The LoopDistribute pass doesn't do anything unless it sees
llvm.loop.distribute.enable (`#pragma clang loop distribute(enable)`)
because it does not have a profitability heuristic. It cannot say
whether loop distribution is good for performance or not. What makes
it improve hmmer is that the distributed loops can be vectorized.
However, LoopDistribute is located before the vectorizer and cannot
know in advance whether a distributed loop will be vectorized or not.
If not, then it has potentially only increased loop overhead.
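The hmmer point can be illustrated with a minimal C sketch. All array and function names here are hypothetical, and the sketch assumes the arrays do not overlap. The pass itself would have to establish that, for example with runtime checks. In the loop below, the first statement reads a value written by the previous iteration, so it cannot be vectorized. The second statement is independent of it. `fused` carries the `#pragma clang loop distribute(enable)` hint mentioned above. `distributed` is a hand-written version of the split the pass would aim for: a scalar loop for the recurrence, plus a separate loop the vectorizer could handle. `versions_agree` only checks that both forms compute identical results.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 64

/* Original loop with the distribution hint. Statement 1 has a
 * loop-carried dependence (A[i + 1] depends on A[i] from the previous
 * iteration), which blocks vectorization of the whole loop.
 * Statement 2 is independent and would vectorize on its own. */
static void fused(int n, float *A, const float *B, float *C,
                  const float *D, const float *E) {
#pragma clang loop distribute(enable)
  for (int i = 0; i < n; i++) {
    A[i + 1] = A[i] * B[i];
    C[i] = D[i] * E[i];
  }
}

/* Hand-distributed equivalent (assumes A does not alias C, D or E):
 * the recurrence stays in its own scalar loop, and the independent
 * statement gets a loop of its own that is a vectorization candidate. */
static void distributed(int n, float *A, const float *B, float *C,
                        const float *D, const float *E) {
  for (int i = 0; i < n; i++)
    A[i + 1] = A[i] * B[i];
  for (int i = 0; i < n; i++)
    C[i] = D[i] * E[i];
}

/* Returns 1 if both versions produce identical A and C arrays. */
int versions_agree(int n) {
  assert(n >= 0 && n <= MAX_N);
  float A1[MAX_N + 1], A2[MAX_N + 1];
  float B[MAX_N], C1[MAX_N], C2[MAX_N], D[MAX_N], E[MAX_N];
  A1[0] = A2[0] = 1.0f;
  for (int i = 0; i < n; i++) {
    B[i] = 1.0f + (float)(i % 3) * 0.5f;
    D[i] = (float)i;
    E[i] = 2.0f;
  }
  fused(n, A1, B, C1, D, E);
  distributed(n, A2, B, C2, D, E);
  return memcmp(A1, A2, (size_t)(n + 1) * sizeof(float)) == 0 &&
         memcmp(C1, C2, (size_t)n * sizeof(float)) == 0;
}
```

Compiling with something like `clang -O3 -Rpass=loop-distribute -Rpass=loop-vectorize` should show whether the pass fired on `fused` and which of the resulting loops was vectorized. The exact remarks depend on the target and LLVM version.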

Turning -enable-loop-distribute on by default would mean assuming that
loop distribution is usually beneficial and does not cause major
regressions. We need a lot more data to support that conclusion.

Alternatively, we could consider loop distribution a canonicalization.
A later LoopFuse pass would then apply the profitability heuristic and
re-fuse the loops if loop distribution did not gain anything.
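The canonicalization idea is easiest to see with a loop where distribution gains nothing. In this hypothetical C sketch, both statements are first-order recurrences, so neither loop would vectorize after a split. The split form only adds a second loop and a second pass over memory. That is the kind of result a later fusion pass would be expected to merge back. The function names are illustrative only, and `recurrence_versions_agree` just checks that the two forms compute the same values.

```c
#include <assert.h>
#include <string.h>

#define LEN 32

/* Both statements depend on the previous iteration, so the loop is not
 * vectorizable whether or not it is distributed. */
static void fused_recurrences(int n, float *a, const float *b,
                              float *c, const float *d) {
  for (int i = 1; i < n; i++) {
    a[i] = a[i - 1] + b[i];
    c[i] = c[i - 1] * d[i];
  }
}

/* Distributed form: two scalar loops with twice the loop control and
 * two passes over memory, and no vectorization gained in return.
 * This is the shape a fusion pass could merge back into one loop. */
static void split_recurrences(int n, float *a, const float *b,
                              float *c, const float *d) {
  for (int i = 1; i < n; i++)
    a[i] = a[i - 1] + b[i];
  for (int i = 1; i < n; i++)
    c[i] = c[i - 1] * d[i];
}

/* Returns 1 if the fused and split forms produce identical arrays. */
int recurrence_versions_agree(int n) {
  assert(n >= 1 && n <= LEN);
  float a1[LEN], a2[LEN], b[LEN], c1[LEN], c2[LEN], d[LEN];
  for (int i = 0; i < n; i++) {
    b[i] = (float)i;
    d[i] = 1.0f + (float)(i % 2);
  }
  a1[0] = a2[0] = 0.0f;
  c1[0] = c2[0] = 1.0f;
  fused_recurrences(n, a1, b, c1, d);
  split_recurrences(n, a2, b, c2, d);
  return memcmp(a1, a2, (size_t)n * sizeof(float)) == 0 &&
         memcmp(c1, c2, (size_t)n * sizeof(float)) == 0;
}
```

Both forms are semantically equivalent. Under this view, the decision of which form to keep moves from the distribution pass to the later fusion step.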

Michael

Jingu Kang via llvm-dev

2021-Jun-18 12:13 UTC

[llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager

I appreciate your replies. I have seen the performance data below.

For AArch64, the performance data from llvm-test-suite is as follows.

Metric: exec_time

Program                                      results_base  results_loop_dist    diff
test-suite...ications/JM/lencod/lencod.test          3.95               4.29    8.8%
test-suite...emCmp<5, GreaterThanZero, Mid>       1456.09            1574.29    8.1%
test-suite...st:BM_BAND_LIN_EQ_LAMBDA/44217         22.83              24.50    7.3%
test-suite....test:BM_BAND_LIN_EQ_RAW/44217         23.00              24.17    5.1%
test-suite...st:BM_INT_PREDICT_LAMBDA/44217        589.54             616.70    4.6%
test-suite...t:BENCHMARK_asin_novec_double_        330.25             342.17    3.6%
test-suite...ow-dbl/GlobalDataFlow-dbl.test          2.58               2.67    3.3%
test-suite...da.test:BM_PIC_2D_LAMBDA/44217        781.30             806.36    3.2%
test-suite...est:BM_ENERGY_CALC_LAMBDA/5001         63.02              64.93    3.0%
test-suite...gebra/kernels/syr2k/syr2k.test          6.53               6.73    3.0%
test-suite...t/StatementReordering-flt.test          2.33               2.40    2.8%
test-suite...sCRaw.test:BM_PIC_2D_RAW/44217        789.90             810.05    2.6%
test-suite...s/gramschmidt/gramschmidt.test          1.44               1.48    2.5%
test-suite...Raw.test:BM_HYDRO_1D_RAW/44217         38.42              39.37    2.5%
test-suite....test:BM_INT_PREDICT_RAW/44217        597.73             612.34    2.4%
Geomean difference                                                             -0.0%
        results_base  results_loop_dist        diff
count  584.000000     584.000000         584.000000
mean   2761.681991    2759.451499       -0.000020  
std    30145.555650   30124.858004       0.011093  
min    0.608782       0.608729          -0.116286  
25%    3.125425       3.106625          -0.000461  
50%    130.212207     130.582658         0.000004  
75%    602.708659     612.931769         0.000438  
max    511340.880000  511059.980000      0.087630

For AArch64, the performance data from the SPEC benchmarks is as follows.

SPEC2006
Benchmark         Improvement(%)
400.perlbench       -1.786911228
401.bzip2           -3.174199894
403.gcc              0.717990522
429.mcf              2.053027806
445.gobmk            0.775388165
456.hmmer            43.39308377
458.sjeng            0.133933093
462.libquantum       4.647923489
464.h264ref         -0.059568786
471.omnetpp          1.352515266
473.astar            0.362752409
483.xalancbmk        0.746580249

SPEC2017
Benchmark         Improvement(%)
500.perlbench_r      0.415424516
502.gcc_r           -0.112915812
505.mcf_r            0.238633706
520.omnetpp_r        0.114830748
523.xalancbmk_r      0.460107636
525.x264_r          -0.401915964
531.deepsjeng_r      0.010064227
541.leela_r          0.394797504
557.xz_r             0.111781366

Thanks
JinGu Kang
> -----Original Message-----
> From: Michael Kruse <llvmdev at meinersbur.de>
> Sent: 17 June 2021 19:13
> To: Jingu Kang <Jingu.Kang at arm.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] Enabling Loop Distribution Pass as default in the
> pipeline of new pass manager
> 
> The LoopDistribute pass doesn't do anything unless it sees
> llvm.loop.distribute.enable (`#pragma clang loop distribute(enable)`)
> because it does not have a profitability heuristic. It cannot say
> whether loop distribution is good for performance or not. What makes
> it improve hmmer is that the distributed loops can be vectorized.
> However, LoopDistribute is located before the vectorizer and cannot
> know in advance whether a distributed loop will be vectorized or not.
> If not, then it has potentially only increased loop overhead.
> 
> Turning -enable-loop-distribute on by default would mean assuming that
> loop distribution is usually beneficial and does not cause major
> regressions. We need a lot more data to support that conclusion.
> 
> Alternatively, we could consider loop distribution a canonicalization.
> A later LoopFuse pass would then apply the profitability heuristic and
> re-fuse the loops if loop distribution did not gain anything.
> 
> Michael
