thr3ads.net - llvm dev - [llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager [Jun 2021]


Jingu Kang via llvm-dev

2021-Jun-21 17:54 UTC

[llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager

The compile-time data is below. There may be some noise, but there appears to be
no significant compile-time regression.
From llvm-test-suite:
Metric: compile_time

Program                                      results_base  results_loop_dist  diff
test-suite...arks/VersaBench/dbms/dbms.test          0.94               0.95  1.6%
test-suite...s/MallocBench/cfrac/cfrac.test          0.89               0.90  1.5%
test-suite...ks/Prolangs-C/gnugo/gnugo.test          0.72               0.73  1.4%
test-suite...yApps-C++/PENNANT/PENNANT.test          8.65               8.75  1.2%
test-suite...marks/Ptrdist/yacr2/yacr2.test          0.84               0.85  1.1%
test-suite.../Builtins/Int128/Builtins.test          0.86               0.87  1.0%
test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test          0.69               0.70  1.0%
test-suite...decode/alacconvert-decode.test          1.16               1.17  0.9%
test-suite...encode/alacconvert-encode.test          1.16               1.17  0.9%
test-suite...peg2/mpeg2dec/mpeg2decode.test          1.71               1.72  0.9%
test-suite.../Applications/spiff/spiff.test          0.88               0.89  0.9%
test-suite...terpolation/Interpolation.test          0.96               0.97  0.9%
test-suite...chmarks/MallocBench/gs/gs.test          4.58               4.62  0.9%
test-suite...-C++/stepanov_abstraction.test          0.69               0.70  0.8%
test-suite...marks/7zip/7zip-benchmark.test         52.35              52.74  0.7%
Geomean difference                                                            nan%

       results_base  results_loop_dist        diff
count  117.000000    118.000000         117.000000
mean   4.636126      4.616575           0.002171
std    7.725991      7.737663           0.006310
min    0.607300      0.602200          -0.041930
25%    1.345700      1.313650          -0.001577
50%    1.887000      1.888800           0.002463
75%    4.340800      4.343275           0.005754
max    52.351200     52.736000          0.015861

From SPEC2017:

Benchmark        baseline  enable-loop-distribute  diff (seconds)
500.perlbench_r  00:01:06  00:01:04                -2
502.gcc_r        00:05:24  00:05:25                 1
505.mcf_r        00:00:02  00:00:02                 0
520.omnetpp_r    00:00:58  00:00:58                 0
523.xalancbmk_r  00:02:30  00:02:30                 0
525.x264_r       00:00:32  00:00:31                -1
531.deepsjeng_r  00:00:04  00:00:04                 0
541.leela_r      00:00:06  00:00:06                 0
557.xz_r         00:00:05  00:00:05                 0
999.specrand_ir  00:00:01  00:00:00                 1

From SPEC2006:

Benchmark        baseline  enable-loop-distribute  diff (seconds)
400.perlbench    00:00:29  00:00:29                 0
401.bzip2        00:00:04  00:00:03                -1
403.gcc          00:01:28  00:01:26                -2
429.mcf          00:00:01  00:00:01                 0
445.gobmk        00:00:24  00:00:24                 0
456.hmmer        00:00:06  00:00:06                 0
458.sjeng        00:00:03  00:00:03                 0
462.libquantum   00:00:03  00:00:02                -1
464.h264ref      00:00:29  00:00:29                 0
471.omnetpp      00:00:23  00:00:24                 1
473.astar        00:00:02  00:00:02                 0
483.xalancbmk    00:02:07  00:02:06                -1
999.specrand     00:00:01  00:00:01                 0

Thanks
JinGu Kang

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Sjoerd Meijer via llvm-dev
Sent: 21 June 2021 14:36
To: Jingu Kang <Jingu.Kang at arm.com>; Michael Kruse <llvmdev at meinersbur.de>; Kyrylo Tkachov <Kyrylo.Tkachov at arm.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
> Based on this data, I think we could say the pass is usually beneficial
> without causing major regression.
I think we need to look at compile times too before we can draw that conclusion,
i.e. we need to justify that it's worth spending extra compile time on
optimising a few cases. Hopefully loop distribution is a cheap pass to run (also
when it runs but does not trigger), but I think that needs to be checked.
________________________________
From: Jingu Kang <Jingu.Kang at arm.com>
Sent: 21 June 2021 14:27
To: Michael Kruse <llvmdev at meinersbur.de>; Kyrylo Tkachov <Kyrylo.Tkachov at arm.com>; Sjoerd Meijer <Sjoerd.Meijer at arm.com>
Cc: llvm-dev at lists.llvm.org
Subject: RE: [llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager

Regarding treating the LoopDistribute pass as a canonicalization and relying on
the LoopFuse pass's profitability heuristic: it looks like LoopFuse does not
have a proper profitability function either.

If possible, I would like to enable the LoopDistribute pass based on the
performance data.

As you can see in the previous email, the geomean difference on llvm-test-suite
is -0.0%. On the SPEC benchmarks, we see a 43% performance improvement on
456.hmmer from SPEC2006. Based on this data, I think we could say the pass is
usually beneficial without causing major regression.

What do you think?

Thanks
JinGu Kang
> -----Original Message-----
> From: Jingu Kang
> Sent: 18 June 2021 13:13
> To: Michael Kruse <llvmdev at meinersbur.de>; Kyrylo Tkachov
> <Kyrylo.Tkachov at arm.com>; Sjoerd Meijer <Sjoerd.Meijer at arm.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: RE: [llvm-dev] Enabling Loop Distribution Pass as default in the pipeline
> of new pass manager
>
> Thank you for your replies. I have collected the performance data below.
>
> For AArch64, the performance data from llvm-test-suite is as follows.
>
> Metric: exec_time
>
> Program                                      results_base  results_loop_dist   diff
> test-suite...ications/JM/lencod/lencod.test          3.95               4.29   8.8%
> test-suite...emCmp<5, GreaterThanZero, Mid>       1456.09            1574.29   8.1%
> test-suite...st:BM_BAND_LIN_EQ_LAMBDA/44217         22.83              24.50   7.3%
> test-suite....test:BM_BAND_LIN_EQ_RAW/44217         23.00              24.17   5.1%
> test-suite...st:BM_INT_PREDICT_LAMBDA/44217        589.54             616.70   4.6%
> test-suite...t:BENCHMARK_asin_novec_double_        330.25             342.17   3.6%
> test-suite...ow-dbl/GlobalDataFlow-dbl.test          2.58               2.67   3.3%
> test-suite...da.test:BM_PIC_2D_LAMBDA/44217        781.30             806.36   3.2%
> test-suite...est:BM_ENERGY_CALC_LAMBDA/5001         63.02              64.93   3.0%
> test-suite...gebra/kernels/syr2k/syr2k.test          6.53               6.73   3.0%
> test-suite...t/StatementReordering-flt.test          2.33               2.40   2.8%
> test-suite...sCRaw.test:BM_PIC_2D_RAW/44217        789.90             810.05   2.6%
> test-suite...s/gramschmidt/gramschmidt.test          1.44               1.48   2.5%
> test-suite...Raw.test:BM_HYDRO_1D_RAW/44217         38.42              39.37   2.5%
> test-suite....test:BM_INT_PREDICT_RAW/44217        597.73             612.34   2.4%
> Geomean difference                                                            -0.0%
>         results_base  results_loop_dist        diff
> count  584.000000     584.000000         584.000000
> mean   2761.681991    2759.451499       -0.000020
> std    30145.555650   30124.858004       0.011093
> min    0.608782       0.608729          -0.116286
> 25%    3.125425       3.106625          -0.000461
> 50%    130.212207     130.582658         0.000004
> 75%    602.708659     612.931769         0.000438
> max    511340.880000  511059.980000      0.087630
>
> For AArch64, the performance data from the SPEC benchmarks is as follows.
>
> SPEC2006
> Benchmark         Improvement (%)
> 400.perlbench     -1.786911228
> 401.bzip2         -3.174199894
> 403.gcc            0.717990522
> 429.mcf            2.053027806
> 445.gobmk          0.775388165
> 456.hmmer         43.39308377
> 458.sjeng          0.133933093
> 462.libquantum     4.647923489
> 464.h264ref       -0.059568786
> 471.omnetpp        1.352515266
> 473.astar          0.362752409
> 483.xalancbmk      0.746580249
>
> SPEC2017
> Benchmark         Improvement (%)
> 500.perlbench_r    0.415424516
> 502.gcc_r         -0.112915812
> 505.mcf_r          0.238633706
> 520.omnetpp_r      0.114830748
> 523.xalancbmk_r    0.460107636
> 525.x264_r        -0.401915964
> 531.deepsjeng_r    0.010064227
> 541.leela_r        0.394797504
> 557.xz_r           0.111781366
>
> Thanks
> JinGu Kang
>
> > -----Original Message-----
> > From: Michael Kruse <llvmdev at meinersbur.de>
> > Sent: 17 June 2021 19:13
> > To: Jingu Kang <Jingu.Kang at arm.com>
> > Cc: llvm-dev at lists.llvm.org
> > Subject: Re: [llvm-dev] Enabling Loop Distribution Pass as default in
> > the pipeline of new pass manager
> >
> > The LoopDistribute pass doesn't do anything unless it sees
> > llvm.loop.distribute.enable (`#pragma clang loop distribute(enable)`)
> > because it does not have a profitability heuristic. It cannot say
> > whether loop distribution is good for performance or not. What makes
> > it improve hmmer is that the distributed loops can be vectorized.
> > However, LoopDistribute runs before the vectorizer and cannot
> > know in advance whether a distributed loop will be vectorized or not.
> > If it is not, distribution has potentially only increased loop overhead.
> >
> > Making -enable-loop-distribute the default would mean that we
> > consider loop distribution to be usually beneficial without causing
> > major regressions. We need a lot more data to support that conclusion.
> >
> > Alternatively, we could consider loop distribution a canonicalization.
> > A later LoopFuse pass would then apply a profitability heuristic and
> > re-fuse the loops if loop distribution did not gain anything.
> >
> > Michael
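Michael notes that the pass only acts on loops that carry llvm.loop.distribute.enable, while the proposal is to turn on -enable-loop-distribute by default. A minimal sketch of how the two settings might combine, assuming (based on the LoopDistribute.cpp source, not stated in this thread) that an explicit per-loop hint overrides the global option. The enum and function names are hypothetical and are not part of LLVM's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-loop hint that clang attaches as
 * llvm.loop.distribute.enable metadata: absent, enable, or disable. */
typedef enum {
  HINT_ABSENT,  /* no pragma on the loop */
  HINT_ENABLE,  /* #pragma clang loop distribute(enable)  */
  HINT_DISABLE  /* #pragma clang loop distribute(disable) */
} distribute_hint;

/* Decides whether the pass would even attempt to distribute a loop.
 * An explicit per-loop hint wins; otherwise the global default
 * (modeling the -enable-loop-distribute option) decides. Attempting is
 * not the same as succeeding: the real pass still runs its own
 * dependence and legality checks afterwards. */
bool attempts_distribution(distribute_hint hint, bool global_default) {
  assert(hint == HINT_ABSENT || hint == HINT_ENABLE || hint == HINT_DISABLE);
  if (hint == HINT_ENABLE)
    return true;
  if (hint == HINT_DISABLE)
    return false;
  return global_default;
}
```

Under this model, flipping the default only changes the outcome for loops without a pragma; a loop annotated with `distribute(disable)` would still be left alone.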
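Michael's point that distribution mainly pays off when it lets part of a loop vectorize can be illustrated with a classic two-statement loop. This is a hand-written, hypothetical sketch (function names, array sizes and data are illustrative): `original_loop` requests distribution with the `#pragma clang loop distribute(enable)` quoted in the thread, and `distributed_loops` shows roughly the shape the transformation aims for. Whether clang actually distributes the loop and then vectorizes the second half depends on the target, alias information and the vectorizer's cost model, so treat it as an illustration of the idea, not as the pass's output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* S1 carries a dependence from iteration i to iteration i + 1, which
 * blocks vectorizing the loop as a whole; S2 has no cross-iteration
 * dependence. The pragma is only seen by clang; other compilers skip it. */
void original_loop(float *restrict a, const float *restrict b,
                   const float *restrict c, float *restrict d,
                   const float *restrict e, size_t n) {
#ifdef __clang__
#pragma clang loop distribute(enable)
#endif
  for (size_t i = 0; i < n; i++) {
    a[i + 1] = a[i] * b[i]; /* S1: loop-carried recurrence */
    d[i] = c[i] * e[i];     /* S2: independent per iteration */
  }
}

/* The same computation split by hand into two loops. The first loop
 * stays scalar; the second is a candidate for the vectorizer. */
void distributed_loops(float *restrict a, const float *restrict b,
                       const float *restrict c, float *restrict d,
                       const float *restrict e, size_t n) {
  for (size_t i = 0; i < n; i++)
    a[i + 1] = a[i] * b[i];
  for (size_t i = 0; i < n; i++)
    d[i] = c[i] * e[i];
}

#define MAX_N 64

/* Runs both versions on identical inputs and reports whether the
 * written parts of the outputs are bit-for-bit identical. */
int versions_agree(size_t n) {
  assert(n <= MAX_N);
  float a1[MAX_N + 1], a2[MAX_N + 1], d1[MAX_N], d2[MAX_N];
  float b[MAX_N], c[MAX_N], e[MAX_N];
  for (size_t i = 0; i < n; i++) {
    b[i] = 1.0f + (float)(i % 3) * 0.5f;
    c[i] = (float)i;
    e[i] = 2.0f;
  }
  a1[0] = a2[0] = 1.0f;
  original_loop(a1, b, c, d1, e, n);
  distributed_loops(a2, b, c, d2, e, n);
  return memcmp(a1, a2, (n + 1) * sizeof(float)) == 0 &&
         memcmp(d1, d2, n * sizeof(float)) == 0;
}
```

If the second loop does not end up vectorized, the split version simply iterates twice over the same range, which is the extra loop overhead Michael describes.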

Michael Kruse via llvm-dev

2021-Jun-21 18:12 UTC


[adding nikc to CC]

@nikc Would you consider this amount of regression acceptable?



