Michael Kruse via llvm-dev
2021-Jun-21 18:12 UTC
[llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
[adding nikic to CC]

@nikic Would you consider this amount of regression acceptable?

On Mon, 21 June 2021 at 12:54, Jingu Kang <Jingu.Kang at arm.com> wrote:

> The compile time data is as below. There could be a bit of noise, but it
> looks like there is no big compile-time regression.
>
> From llvm-test-suite
>
> Metric: compile_time
>
> Program                                      results_base  results_loop_dist   diff
> test-suite...arks/VersaBench/dbms/dbms.test          0.94               0.95   1.6%
> test-suite...s/MallocBench/cfrac/cfrac.test          0.89               0.90   1.5%
> test-suite...ks/Prolangs-C/gnugo/gnugo.test          0.72               0.73   1.4%
> test-suite...yApps-C++/PENNANT/PENNANT.test          8.65               8.75   1.2%
> test-suite...marks/Ptrdist/yacr2/yacr2.test          0.84               0.85   1.1%
> test-suite.../Builtins/Int128/Builtins.test          0.86               0.87   1.0%
> test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test          0.69               0.70   1.0%
> test-suite...decode/alacconvert-decode.test          1.16               1.17   0.9%
> test-suite...encode/alacconvert-encode.test          1.16               1.17   0.9%
> test-suite...peg2/mpeg2dec/mpeg2decode.test          1.71               1.72   0.9%
> test-suite.../Applications/spiff/spiff.test          0.88               0.89   0.9%
> test-suite...terpolation/Interpolation.test          0.96               0.97   0.9%
> test-suite...chmarks/MallocBench/gs/gs.test          4.58               4.62   0.9%
> test-suite...-C++/stepanov_abstraction.test          0.69               0.70   0.8%
> test-suite...marks/7zip/7zip-benchmark.test         52.35              52.74   0.7%
> Geomean difference                                                             nan%
>
>        results_base  results_loop_dist        diff
> count    117.000000         118.000000  117.000000
> mean       4.636126           4.616575    0.002171
> std        7.725991           7.737663    0.006310
> min        0.607300           0.602200   -0.041930
> 25%        1.345700           1.313650   -0.001577
> 50%        1.887000           1.888800    0.002463
> 75%        4.340800           4.343275    0.005754
> max       52.351200          52.736000    0.015861
>
> From SPEC2017
>
> benchmarks       baseline  enable-loop-distribute  diff (seconds)
> 500.perlbench_r  00:01:06  00:01:04                -2
> 502.gcc_r        00:05:24  00:05:25                1
> 505.mcf_r        00:00:02  00:00:02                0
> 520.omnetpp_r    00:00:58  00:00:58                0
> 523.xalancbmk_r  00:02:30  00:02:30                0
> 525.x264_r       00:00:32  00:00:31                -1
> 531.deepsjeng_r  00:00:04  00:00:04                0
> 541.leela_r      00:00:06  00:00:06                0
> 557.xz_r         00:00:05  00:00:05                0
> 999.specrand_ir  00:00:01  00:00:00                1
>
> From SPEC2006 (number is seconds)
>
> benchmarks      baseline  enable-loop-distribute  diff (seconds)
> 400.perlbench   00:00:29  00:00:29                0
> 401.bzip2       00:00:04  00:00:03                -1
> 403.gcc         00:01:28  00:01:26                -2
> 429.mcf         00:00:01  00:00:01                0
> 445.gobmk       00:00:24  00:00:24                0
> 456.hmmer       00:00:06  00:00:06                0
> 458.sjeng       00:00:03  00:00:03                0
> 462.libquantum  00:00:03  00:00:02                -1
> 464.h264ref     00:00:29  00:00:29                0
> 471.omnetpp     00:00:23  00:00:24                1
> 473.astar       00:00:02  00:00:02                0
> 483.xalancbmk   00:02:07  00:02:06                -1
> 999.specrand    00:00:01  00:00:01                0
>
> Thanks
> JinGu Kang
>
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Sjoerd Meijer via llvm-dev
> Sent: 21 June 2021 14:36
> To: Jingu Kang <Jingu.Kang at arm.com>; Michael Kruse <llvmdev at meinersbur.de>; Kyrylo Tkachov <Kyrylo.Tkachov at arm.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
>
> > Based on this data, I think we could say the pass is usually beneficial
> > without causing major regression.
>
> I think we need to look at compile-times too before we can draw that
> conclusion, i.e. we need to justify it's worth spending extra compile-time
> for optimising a few cases. Hopefully loop distribution is a cheap pass to
> run (also when it is running but not triggering), but that's something that
> needs to be checked I think.
>
> ------------------------------
>
> From: Jingu Kang <Jingu.Kang at arm.com>
> Sent: 21 June 2021 14:27
> To: Michael Kruse <llvmdev at meinersbur.de>; Kyrylo Tkachov <Kyrylo.Tkachov at arm.com>; Sjoerd Meijer <Sjoerd.Meijer at arm.com>
> Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
> Subject: RE: [llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
>
> As for treating the LoopDistribute pass as a canonicalization and relying
> on the profitability heuristic of the LoopFuse pass: it looks like the
> LoopFuse pass does not have a proper profitability function either.
>
> If possible, I would like to enable the LoopDistribute pass based on the
> performance data.
>
> As you can see in the previous email, the geomean difference from
> llvm-test-suite is -0.0%. From the SPEC benchmarks, we can see a 43%
> performance improvement on 456.hmmer of SPEC2006. Based on this data, I
> think we could say the pass is usually beneficial without causing major
> regression.
>
> What do you think about it?
>
> Thanks
> JinGu Kang
>
> > -----Original Message-----
> > From: Jingu Kang
> > Sent: 18 June 2021 13:13
> > To: Michael Kruse <llvmdev at meinersbur.de>; Kyrylo Tkachov <Kyrylo.Tkachov at arm.com>; Sjoerd Meijer <Sjoerd.Meijer at arm.com>
> > Cc: llvm-dev at lists.llvm.org
> > Subject: RE: [llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
> >
> > I appreciate your replies. I have seen the performance data below.
> >
> > For AArch64, the performance data from llvm-test-suite is as below.
> >
> > Metric: exec_time
> >
> > Program                                      results_base  results_loop_dist   diff
> > test-suite...ications/JM/lencod/lencod.test          3.95               4.29   8.8%
> > test-suite...emCmp<5, GreaterThanZero, Mid>       1456.09            1574.29   8.1%
> > test-suite...st:BM_BAND_LIN_EQ_LAMBDA/44217         22.83              24.50   7.3%
> > test-suite....test:BM_BAND_LIN_EQ_RAW/44217         23.00              24.17   5.1%
> > test-suite...st:BM_INT_PREDICT_LAMBDA/44217        589.54             616.70   4.6%
> > test-suite...t:BENCHMARK_asin_novec_double_        330.25             342.17   3.6%
> > test-suite...ow-dbl/GlobalDataFlow-dbl.test          2.58               2.67   3.3%
> > test-suite...da.test:BM_PIC_2D_LAMBDA/44217        781.30             806.36   3.2%
> > test-suite...est:BM_ENERGY_CALC_LAMBDA/5001         63.02              64.93   3.0%
> > test-suite...gebra/kernels/syr2k/syr2k.test          6.53               6.73   3.0%
> > test-suite...t/StatementReordering-flt.test          2.33               2.40   2.8%
> > test-suite...sCRaw.test:BM_PIC_2D_RAW/44217        789.90             810.05   2.6%
> > test-suite...s/gramschmidt/gramschmidt.test          1.44               1.48   2.5%
> > test-suite...Raw.test:BM_HYDRO_1D_RAW/44217         38.42              39.37   2.5%
> > test-suite....test:BM_INT_PREDICT_RAW/44217        597.73             612.34   2.4%
> > Geomean difference                                                            -0.0%
> >
> >        results_base  results_loop_dist        diff
> > count    584.000000         584.000000  584.000000
> > mean    2761.681991        2759.451499   -0.000020
> > std    30145.555650       30124.858004    0.011093
> > min        0.608782           0.608729   -0.116286
> > 25%        3.125425           3.106625   -0.000461
> > 50%      130.212207         130.582658    0.000004
> > 75%      602.708659         612.931769    0.000438
> > max   511340.880000      511059.980000    0.087630
> >
> > For AArch64, the performance data from SPEC benchmark is as below.
> >
> > SPEC2006
> > Benchmark       Improvement(%)
> > 400.perlbench   -1.786911228
> > 401.bzip2       -3.174199894
> > 403.gcc         0.717990522
> > 429.mcf         2.053027806
> > 445.gobmk       0.775388165
> > 456.hmmer       43.39308377
> > 458.sjeng       0.133933093
> > 462.libquantum  4.647923489
> > 464.h264ref     -0.059568786
> > 471.omnetpp     1.352515266
> > 473.astar       0.362752409
> > 483.xalancbmk   0.746580249
> >
> > SPEC2017
> > Benchmark        Improvement(%)
> > 500.perlbench_r  0.415424516
> > 502.gcc_r        -0.112915812
> > 505.mcf_r        0.238633706
> > 520.omnetpp_r    0.114830748
> > 523.xalancbmk_r  0.460107636
> > 525.x264_r       -0.401915964
> > 531.deepsjeng_r  0.010064227
> > 541.leela_r      0.394797504
> > 557.xz_r         0.111781366
> >
> > Thanks
> > JinGu Kang
> >
> > > -----Original Message-----
> > > From: Michael Kruse <llvmdev at meinersbur.de>
> > > Sent: 17 June 2021 19:13
> > > To: Jingu Kang <Jingu.Kang at arm.com>
> > > Cc: llvm-dev at lists.llvm.org
> > > Subject: Re: [llvm-dev] Enabling Loop Distribution Pass as default in
> > > the pipeline of new pass manager
> > >
> > > The LoopDistribute pass doesn't do anything unless it sees
> > > llvm.loop.distribute.enable (`#pragma clang loop distribute(enable)`)
> > > because it does not have a profitability heuristic. It cannot say
> > > whether loop distribution is good for performance or not. What makes
> > > it improve hmmer is that the distributed loops can be vectorized.
> > > However, LoopDistribute is located before the vectorizer and cannot
> > > say in advance whether a distributed loop will be vectorized or not.
> > > If not, then it potentially only increased loop overhead.
> > >
> > > To make -enable-loop-distribute on by default would mean that we could
> > > consider loop distribution to be usually beneficial without causing
> > > major regressions. We need a lot more data to support that conclusion.
> > >
> > > Alternatively, we could consider loop-distribution a canonicalization.
> > > A later LoopFuse would do the profitability heuristic to re-fuse loops
> > > again if loop distribution did not gain anything.
> > >
> > > Michael
Jingu Kang via llvm-dev
2021-Jun-22 17:10 UTC
[llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
@nikic (nikic at php.net) If you need more information about the loop
distribute pass, please let me know.
Thanks
JinGu Kang
From: Michael Kruse <llvmdev at meinersbur.de>
Sent: 21 June 2021 19:12
To: Jingu Kang <Jingu.Kang at arm.com>
Cc: Sjoerd Meijer <Sjoerd.Meijer at arm.com>; Michael Kruse <llvmdev at meinersbur.de>; Kyrylo Tkachov <Kyrylo.Tkachov at arm.com>; llvm-dev at lists.llvm.org; nikic at php.net
Subject: Re: [llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
[adding nikic to CC]
@nikic Would you consider this amount of regression acceptable?
On Mon, 21 June 2021 at 12:54, Jingu Kang <Jingu.Kang at arm.com> wrote:
The compile time data is as below. There could be a bit of noise, but it looks
like there is no big compile-time regression.
From llvm-test-suite
Metric: compile_time
Program                                      results_base  results_loop_dist   diff
test-suite...arks/VersaBench/dbms/dbms.test          0.94               0.95   1.6%
test-suite...s/MallocBench/cfrac/cfrac.test          0.89               0.90   1.5%
test-suite...ks/Prolangs-C/gnugo/gnugo.test          0.72               0.73   1.4%
test-suite...yApps-C++/PENNANT/PENNANT.test          8.65               8.75   1.2%
test-suite...marks/Ptrdist/yacr2/yacr2.test          0.84               0.85   1.1%
test-suite.../Builtins/Int128/Builtins.test          0.86               0.87   1.0%
test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test          0.69               0.70   1.0%
test-suite...decode/alacconvert-decode.test          1.16               1.17   0.9%
test-suite...encode/alacconvert-encode.test          1.16               1.17   0.9%
test-suite...peg2/mpeg2dec/mpeg2decode.test          1.71               1.72   0.9%
test-suite.../Applications/spiff/spiff.test          0.88               0.89   0.9%
test-suite...terpolation/Interpolation.test          0.96               0.97   0.9%
test-suite...chmarks/MallocBench/gs/gs.test          4.58               4.62   0.9%
test-suite...-C++/stepanov_abstraction.test          0.69               0.70   0.8%
test-suite...marks/7zip/7zip-benchmark.test         52.35              52.74   0.7%
Geomean difference                                                             nan%

       results_base  results_loop_dist        diff
count    117.000000         118.000000  117.000000
mean       4.636126           4.616575    0.002171
std        7.725991           7.737663    0.006310
min        0.607300           0.602200   -0.041930
25%        1.345700           1.313650   -0.001577
50%        1.887000           1.888800    0.002463
75%        4.340800           4.343275    0.005754
max       52.351200          52.736000    0.015861
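
A likely explanation for the `nan%` geomean above, offered as an assumption rather than something stated in the thread: the counts differ (117 baseline results against 118 with loop distribution), and a single program without a usable baseline time is enough to turn a geometric mean of ratios into NaN. Below is a minimal Python sketch that compares a naive geomean with one that skips unpaired programs. The function names and the `extra.test` entry are made up for illustration and this is not the test-suite's own comparison script; the three timed programs come from the table.

```python
import math

def geomean_ratio_naive(base, new):
    """Geomean of new/base over every program in `new`.

    A program missing from `base` is treated as NaN, which poisons the result.
    """
    logs = [math.log(new[name] / base.get(name, float("nan"))) for name in new]
    return math.exp(sum(logs) / len(logs))

def geomean_ratio_paired(base, new):
    """Geomean of new/base over programs with a positive time in both runs."""
    logs = []
    for name in base.keys() & new.keys():
        b, n = base[name], new[name]
        if b > 0 and n > 0:  # also filters NaN, because NaN > 0 is False
            logs.append(math.log(n / b))
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")

# Compile times in seconds from the table; "extra.test" is a hypothetical
# program that only has a result in the loop-distribute run.
base = {"dbms.test": 0.94, "cfrac.test": 0.89, "gnugo.test": 0.72}
new = {"dbms.test": 0.95, "cfrac.test": 0.90, "gnugo.test": 0.73, "extra.test": 1.00}

print("naive geomean ratio: ", geomean_ratio_naive(base, new))
print("paired geomean ratio:", geomean_ratio_paired(base, new))
```

If this is what happened, rerunning the comparison on the 117 programs present in both result sets should give a finite geomean.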
From SPEC2017
benchmarks       baseline  enable-loop-distribute  diff (seconds)
500.perlbench_r  00:01:06  00:01:04                -2
502.gcc_r        00:05:24  00:05:25                1
505.mcf_r        00:00:02  00:00:02                0
520.omnetpp_r    00:00:58  00:00:58                0
523.xalancbmk_r  00:02:30  00:02:30                0
525.x264_r       00:00:32  00:00:31                -1
531.deepsjeng_r  00:00:04  00:00:04                0
541.leela_r      00:00:06  00:00:06                0
557.xz_r         00:00:05  00:00:05                0
999.specrand_ir  00:00:01  00:00:00                1
From SPEC2006 (number is seconds)
benchmarks      baseline  enable-loop-distribute  diff (seconds)
400.perlbench   00:00:29  00:00:29                0
401.bzip2       00:00:04  00:00:03                -1
403.gcc         00:01:28  00:01:26                -2
429.mcf         00:00:01  00:00:01                0
445.gobmk       00:00:24  00:00:24                0
456.hmmer       00:00:06  00:00:06                0
458.sjeng       00:00:03  00:00:03                0
462.libquantum  00:00:03  00:00:02                -1
464.h264ref     00:00:29  00:00:29                0
471.omnetpp     00:00:23  00:00:24                1
473.astar       00:00:02  00:00:02                0
483.xalancbmk   00:02:07  00:02:06                -1
999.specrand    00:00:01  00:00:01                0
Thanks
JinGu Kang
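
Judging by the rows, the diff column in the two SPEC compile-time tables above is the build time with loop distribution minus the baseline, in seconds, so a negative value means a faster build. Here is a small Python sketch for recomputing it from the hh:mm:ss strings; the helper names are illustrative and not part of any SPEC tooling. Recomputing the 999.specrand_ir row (00:00:01 against 00:00:00) this way gives -1 rather than the listed 1, so that row may contain a typo, although at one-second resolution it is within the noise either way.

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def diff_seconds(baseline, enabled):
    """Build time with loop distribution minus baseline; negative means faster."""
    return to_seconds(enabled) - to_seconds(baseline)

# A few rows copied from the tables: (baseline, enable-loop-distribute).
rows = {
    "500.perlbench_r": ("00:01:06", "00:01:04"),
    "502.gcc_r": ("00:05:24", "00:05:25"),
    "403.gcc": ("00:01:28", "00:01:26"),
    "999.specrand_ir": ("00:00:01", "00:00:00"),
}

for name, (baseline, enabled) in rows.items():
    print(f"{name:16s} {diff_seconds(baseline, enabled):+d}")
```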
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Sjoerd Meijer via llvm-dev
Sent: 21 June 2021 14:36
To: Jingu Kang <Jingu.Kang at arm.com>; Michael Kruse <llvmdev at meinersbur.de>; Kyrylo Tkachov <Kyrylo.Tkachov at arm.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
> Based on this data, I think we could say the pass is usually beneficial
> without causing major regression.
I think we need to look at compile-times too before we can draw that conclusion,
i.e. we need to justify it's worth spending extra compile-time for
optimising a few cases. Hopefully loop distribution is a cheap pass to run (also
when it is running but not triggering), but that's something that needs to
be checked I think.
________________________________
From: Jingu Kang <Jingu.Kang at arm.com>
Sent: 21 June 2021 14:27
To: Michael Kruse <llvmdev at meinersbur.de>; Kyrylo Tkachov <Kyrylo.Tkachov at arm.com>; Sjoerd Meijer <Sjoerd.Meijer at arm.com>
Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
Subject: RE: [llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
As for treating the LoopDistribute pass as a canonicalization and relying on
the profitability heuristic of the LoopFuse pass: it looks like the LoopFuse
pass does not have a proper profitability function either.
If possible, I would like to enable the LoopDistribute pass based on the
performance data.
As you can see in the previous email, the geomean difference from
llvm-test-suite is -0.0%. From the SPEC benchmarks, we can see a 43% performance
improvement on 456.hmmer of SPEC2006. Based on this data, I think we could say
the pass is usually beneficial without causing major regression.
What do you think about it?
Thanks
JinGu Kang
> -----Original Message-----
> From: Jingu Kang
> Sent: 18 June 2021 13:13
> To: Michael Kruse <llvmdev at meinersbur.de>; Kyrylo Tkachov <Kyrylo.Tkachov at arm.com>; Sjoerd Meijer <Sjoerd.Meijer at arm.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: RE: [llvm-dev] Enabling Loop Distribution Pass as default in the pipeline of new pass manager
>
> I appreciate your replies. I have seen the performance data below.
>
> For AArch64, the performance data from llvm-test-suite is as below.
>
> Metric: exec_time
>
> Program                                      results_base  results_loop_dist   diff
> test-suite...ications/JM/lencod/lencod.test          3.95               4.29   8.8%
> test-suite...emCmp<5, GreaterThanZero, Mid>       1456.09            1574.29   8.1%
> test-suite...st:BM_BAND_LIN_EQ_LAMBDA/44217         22.83              24.50   7.3%
> test-suite....test:BM_BAND_LIN_EQ_RAW/44217         23.00              24.17   5.1%
> test-suite...st:BM_INT_PREDICT_LAMBDA/44217        589.54             616.70   4.6%
> test-suite...t:BENCHMARK_asin_novec_double_        330.25             342.17   3.6%
> test-suite...ow-dbl/GlobalDataFlow-dbl.test          2.58               2.67   3.3%
> test-suite...da.test:BM_PIC_2D_LAMBDA/44217        781.30             806.36   3.2%
> test-suite...est:BM_ENERGY_CALC_LAMBDA/5001         63.02              64.93   3.0%
> test-suite...gebra/kernels/syr2k/syr2k.test          6.53               6.73   3.0%
> test-suite...t/StatementReordering-flt.test          2.33               2.40   2.8%
> test-suite...sCRaw.test:BM_PIC_2D_RAW/44217        789.90             810.05   2.6%
> test-suite...s/gramschmidt/gramschmidt.test          1.44               1.48   2.5%
> test-suite...Raw.test:BM_HYDRO_1D_RAW/44217         38.42              39.37   2.5%
> test-suite....test:BM_INT_PREDICT_RAW/44217        597.73             612.34   2.4%
> Geomean difference                                                            -0.0%
>
>        results_base  results_loop_dist        diff
> count    584.000000         584.000000  584.000000
> mean    2761.681991        2759.451499   -0.000020
> std    30145.555650       30124.858004    0.011093
> min        0.608782           0.608729   -0.116286
> 25%        3.125425           3.106625   -0.000461
> 50%      130.212207         130.582658    0.000004
> 75%      602.708659         612.931769    0.000438
> max   511340.880000      511059.980000    0.087630
>
> For AArch64, the performance data from SPEC benchmark is as below.
>
> SPEC2006
> Benchmark Improvement(%)
> 400.perlbench -1.786911228
> 401.bzip2 -3.174199894
> 403.gcc 0.717990522
> 429.mcf 2.053027806
> 445.gobmk 0.775388165
> 456.hmmer 43.39308377
> 458.sjeng 0.133933093
> 462.libquantum 4.647923489
> 464.h264ref -0.059568786
> 471.omnetpp 1.352515266
> 473.astar 0.362752409
> 483.xalancbmk 0.746580249
>
> SPEC2017
> Benchmark Improvement(%)
> 500.perlbench_r 0.415424516
> 502.gcc_r -0.112915812
> 505.mcf_r 0.238633706
> 520.omnetpp_r 0.114830748
> 523.xalancbmk_r 0.460107636
> 525.x264_r -0.401915964
> 531.deepsjeng_r 0.010064227
> 541.leela_r 0.394797504
> 557.xz_r 0.111781366
>
> Thanks
> JinGu Kang
>
> > -----Original Message-----
> > From: Michael Kruse <llvmdev at meinersbur.de>
> > Sent: 17 June 2021 19:13
> > To: Jingu Kang <Jingu.Kang at arm.com>
> > Cc: llvm-dev at lists.llvm.org
> > Subject: Re: [llvm-dev] Enabling Loop Distribution Pass as default in
> > the pipeline of new pass manager
> >
> > The LoopDistribute pass doesn't do anything unless it sees
> > llvm.loop.distribute.enable (`#pragma clang loop distribute(enable)`)
> > because it does not have a profitability heuristic. It cannot say
> > whether loop distribution is good for performance or not. What makes
> > it improve hmmer is that the distributed loops can be vectorized.
> > However, LoopDistribute is located before the vectorizer and cannot
> > say in advance whether a distributed loop will be vectorized or not.
> > If not, then it potentially only increased loop overhead.
> >
> > To make -enable-loop-distribute on by default would mean that we could
> > consider loop distribution to be usually beneficial without causing
> > major regressions. We need a lot more data to support that conclusion.
> >
> > Alternatively, we could consider loop-distribution a canonicalization.
> > A later LoopFuse would do the profitability heuristic to re-fuse loops
> > again if loop distribution did not gain anything.
> >
> > Michael
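
To make the vectorization argument in the quoted message concrete, here is a toy illustration of what loop distribution does. It is written in Python purely to show the semantics: the real pass works on LLVM IR, and the arrays and function names are invented, not taken from hmmer. The fused loop mixes a statement with a loop-carried dependence (`a[i]` needs `a[i - 1]`) with an independent one. After distribution the independent statement sits in its own loop, which is the kind of loop a vectorizer can handle, at the price of running the loop control twice.

```python
def fused(a, b, c, d):
    """Original loop: both statements share one loop body."""
    a, c = list(a), list(c)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]  # loop-carried dependence on the previous iteration
        c[i] = b[i] * d[i]      # independent across iterations
    return a, c

def distributed(a, b, c, d):
    """Same computation after distribution into two loops."""
    a, c = list(a), list(c)
    for i in range(1, len(a)):  # loop 1: keeps the serial recurrence
        a[i] = a[i - 1] + b[i]
    for i in range(1, len(c)):  # loop 2: no cross-iteration dependence
        c[i] = b[i] * d[i]
    return a, c

a = [1, 0, 0, 0]
b = [0, 2, 3, 4]
c = [0, 0, 0, 0]
d = [5, 6, 7, 8]

# Distribution is legal here because the two statements do not depend on
# each other, so both versions compute the same arrays.
print(fused(a, b, c, d) == distributed(a, b, c, d))
```

If the second loop ends up not being vectorized, the distributed form only pays the extra loop overhead, which is the risk the message describes.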