thr3ads.net - llvm dev - [llvm-dev] Saving Compile Time in InstCombine [Mar 2017]


Mikhail Zolotukhin via llvm-dev

2017-Mar-17 18:50 UTC

[llvm-dev] Saving Compile Time in InstCombine

Hi,

One of the most time-consuming passes in the LLVM middle-end is InstCombine (see
e.g. [1]). It is a very powerful pass capable of doing all the crazy stuff, and
new patterns are constantly being introduced there. The problem is that we often
use it just as a clean-up pass: it's scheduled 6 times in the current pass
pipeline, and each time it's invoked it checks all known patterns. That sounds
OK for O3, where we try to squeeze out as much performance as possible, but it is
excessive for other opt-levels. InstCombine has an ExpensiveCombines
parameter to address that - but I think it's underused at the moment.

To find out which patterns are important and which are rare, I profiled
clang using CTMark and got the following coverage report:
<InstCombine_covreport.html>
(beware, the file is ~6MB).

Guided by this profile, I moved some patterns under the "if
(ExpensiveCombines)" check, which, as expected, turned out to be neutral for
runtime performance but improved compile time. The testing results are below
(measured for Os).

Performance Improvements - Compile Time             Δ        Previous  Current   σ
CTMark/sqlite3/sqlite3                              -1.55%   6.8155    6.7102    0.0081
CTMark/mafft/pairlocalalign                         -1.05%   8.0407    7.9559    0.0193
CTMark/ClamAV/clamscan                              -1.02%   11.3893   11.2734   0.0081
CTMark/lencod/lencod                                -1.01%   12.8763   12.7461   0.0244
CTMark/SPASS/SPASS                                  -1.01%   12.5048   12.3791   0.0340

Performance Improvements - Compile Time             Δ        Previous  Current   σ
External/SPEC/CINT2006/403.gcc/403.gcc              -1.64%   54.0801   53.1930   -
External/SPEC/CINT2006/400.perlbench/400.perlbench  -1.25%   19.1481   18.9091   -
External/SPEC/CINT2006/445.gobmk/445.gobmk          -1.01%   15.2819   15.1274   -
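
As a sanity check on the two tables, the Δ column looks like the relative change from Previous to Current. The sketch below recomputes it under that assumption; the formula is an inference from the numbers and is not stated in the thread, and the function names are made up for illustration.

```python
# Recompute the Δ column from the Previous and Current columns.
# Assumption (not stated in the thread): Δ = (current - previous) / previous.

# (benchmark, previous, current, reported Δ in percent), copied from the tables.
RESULTS = [
    ("CTMark/sqlite3/sqlite3", 6.8155, 6.7102, -1.55),
    ("CTMark/mafft/pairlocalalign", 8.0407, 7.9559, -1.05),
    ("CTMark/ClamAV/clamscan", 11.3893, 11.2734, -1.02),
    ("CTMark/lencod/lencod", 12.8763, 12.7461, -1.01),
    ("CTMark/SPASS/SPASS", 12.5048, 12.3791, -1.01),
    ("External/SPEC/CINT2006/403.gcc/403.gcc", 54.0801, 53.1930, -1.64),
    ("External/SPEC/CINT2006/400.perlbench/400.perlbench", 19.1481, 18.9091, -1.25),
    ("External/SPEC/CINT2006/445.gobmk/445.gobmk", 15.2819, 15.1274, -1.01),
]


def relative_change_percent(previous: float, current: float) -> float:
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100.0


def matches_report(previous: float, current: float, reported: float) -> bool:
    """True if the recomputed Δ rounds to the reported two-decimal value."""
    recomputed = relative_change_percent(previous, current)
    return f"{recomputed:.2f}" == f"{reported:.2f}"


for name, previous, current, reported in RESULTS:
    recomputed = relative_change_percent(previous, current)
    status = "ok" if matches_report(previous, current, reported) else "MISMATCH"
    print(f"{name}: {recomputed:+.2f}% (reported {reported:+.2f}%) {status}")
```

Under that assumption, every row reproduces the reported Δ to two decimals.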


Do such changes make sense? The patch doesn't change O3, but it does change
Os and potentially can change performance there (though I didn't see any
changes in my tests).
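
For readers who have not seen the flag, here is a toy model of what moving a pattern under the check means. It is only a sketch in Python rather than LLVM's C++: the `Pattern` class, the two string-rewriting patterns, and the choice of which one counts as expensive are all made up for illustration and are not taken from the attached patch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Pattern:
    name: str
    rewrite: Callable[[str], Optional[str]]  # returns a replacement or None
    expensive: bool = False  # hypothetical: set for patterns that rarely fire


def fold_add_zero(inst: str) -> Optional[str]:
    # "add x, 0" -> "x"
    if inst.startswith("add ") and inst.endswith(", 0"):
        return inst[4:-3]
    return None


def mul_by_two_to_shl(inst: str) -> Optional[str]:
    # "mul x, 2" -> "shl x, 1"
    if inst.startswith("mul ") and inst.endswith(", 2"):
        return "shl " + inst[4:-3] + ", 1"
    return None


PATTERNS = [
    Pattern("add-zero", fold_add_zero),
    Pattern("mul-by-two", mul_by_two_to_shl, expensive=True),  # arbitrary choice
]


def combine(insts: List[str], expensive_combines: bool) -> Tuple[List[str], int]:
    """Apply the first matching pattern to each instruction and count checks."""
    checks = 0
    result = []
    for inst in insts:
        replacement = None
        for pattern in PATTERNS:
            if pattern.expensive and not expensive_combines:
                continue  # models the "if (ExpensiveCombines)" gate
            checks += 1
            replacement = pattern.rewrite(inst)
            if replacement is not None:
                break
        result.append(inst if replacement is None else replacement)
    return result, checks


program = ["add x, 0", "mul y, 2", "sub z, w"]
print(combine(program, expensive_combines=True))
print(combine(program, expensive_combines=False))
```

With the flag off, the gated pattern is never checked, so the pass does less work per instruction but also leaves `mul y, 2` untouched. That is the trade-off the question above is about.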

The patch is attached for reference; if we decide to go for it, I'll
upload it to Phabricator:

<0001-InstCombine-Move-some-infrequent-patterns-under-if-E.patch>

Thanks,
Michael

[1]: http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-InstCombine-Move-some-infrequent-patterns-under-if-E.patch
Type: application/octet-stream
Size: 33347 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170317/0eb1e9d6/attachment-0001.obj>

Vedant Kumar via llvm-dev

2017-Mar-17 21:02 UTC


> On Mar 17, 2017, at 11:50 AM, Mikhail Zolotukhin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Guided by this profile, I moved some patterns under the "if
> (ExpensiveCombines)" check, which, as expected, turned out to be neutral for
> runtime performance but improved compile time. The testing results are below
> (measured for Os).

It'd be nice to double-check that any runtime performance loss at -O2 is
negligible. But this sounds like a great idea!

vedant

Mikhail Zolotukhin via llvm-dev

2017-Mar-17 21:22 UTC


> On Mar 17, 2017, at 2:02 PM, Vedant Kumar <vsk at apple.com> wrote:
> It'd be nice to double-check that any runtime performance loss at -O2
> is negligible. But this sounds like a great idea!

I forgot to mention that I ran SPEC2006/INT with "-Os" on ARM64 and
didn't see any changes in runtime performance. I can run O2 testing as well
over the weekend.

Michael

Mehdi Amini via llvm-dev

2017-Mar-17 21:30 UTC


> On Mar 17, 2017, at 11:50 AM, Mikhail Zolotukhin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
> One of the most time-consuming passes in the LLVM middle-end is InstCombine
> (see e.g. [1]). It is a very powerful pass capable of doing all the crazy stuff,
> and new patterns are constantly being introduced there. The problem is that we
> often use it just as a clean-up pass: it's scheduled 6 times in the current
> pass pipeline, and each time it's invoked it checks all known patterns. That
> sounds OK for O3, where we try to squeeze out as much performance as possible, but
> it is excessive for other opt-levels. InstCombine has an ExpensiveCombines
> parameter to address that - but I think it's underused at the moment.

Yes, “ExpensiveCombines” was added recently (4.0? 3.9?), but I believe it has
always been intended to be extended the way you’re doing it. So I support
this effort :)

CC: David for the general direction on InstCombine though.


— 
Mehdi



Matthias Braun via llvm-dev

2017-Mar-17 21:36 UTC


In general it is great that we investigate these things! We have been liberally
adding pass invocations and patterns for years without checking the compile-time
consequences.

However, intuitively it feels wrong to disable some patterns completely (there
will always be that one program that gets so much better when you have a certain
pattern).
- Do you have an idea what would happen if we only disabled them in 5 of the 6
invocations?
- Or alternatively, what happens if we just don't put as many InstCombine
instances into the pass pipeline at -Os?
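
The alternatives raised above can be compared with a back-of-the-envelope count of pattern checks per instruction. This is only a sketch: the pattern counts are made-up placeholders, the choice of which invocation keeps the expensive patterns is arbitrary, and "fewer invocations" is modelled as 4 purely for illustration. Only the figure of 6 invocations comes from the thread.

```python
NUM_INVOCATIONS = 6      # from the thread: InstCombine is scheduled 6 times
CHEAP_PATTERNS = 100     # placeholder, not a measured number
EXPENSIVE_PATTERNS = 20  # placeholder, not a measured number


def pattern_checks(schedule):
    """Worst-case pattern checks per instruction across the whole pipeline.

    `schedule` holds one boolean per InstCombine invocation; True means the
    expensive patterns are enabled for that invocation.
    """
    return sum(
        CHEAP_PATTERNS + (EXPENSIVE_PATTERNS if expensive else 0)
        for expensive in schedule
    )


SCHEDULES = {
    "all patterns in all 6 invocations": [True] * NUM_INVOCATIONS,
    "expensive patterns gated everywhere": [False] * NUM_INVOCATIONS,
    "expensive patterns in 1 of 6": [True] + [False] * (NUM_INVOCATIONS - 1),
    "only 4 invocations, all patterns": [True] * 4,
}

for label, schedule in SCHEDULES.items():
    print(f"{label}: {pattern_checks(schedule)} checks")
```

A count like this says nothing about code quality, which is exactly what the two questions are asking to be measured.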

- Matthias

Hal Finkel via llvm-dev

2017-Mar-18 00:49 UTC


On 03/17/2017 04:30 PM, Mehdi Amini via llvm-dev wrote:
> Yes, the “ExpensiveCombines” has been added recently (4.0? 3.9?) but I 
> believe has always been intended to be extended the way you’re doing 
> it. So I support this effort :)
+1

Also, did your profiling reveal why the other combines are expensive? 
Among other things, I'm curious if the expensive ones tend to spend a 
lot of time in ValueTracking (getting known bits and similar)?

  -Hal
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Davide Italiano via llvm-dev

2017-Mar-21 18:12 UTC


On Fri, Mar 17, 2017 at 11:50 AM, Mikhail Zolotukhin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Guided by this profile, I moved some patterns under the "if
> (ExpensiveCombines)" check, which, as expected, turned out to be neutral for
> runtime performance but improved compile time. The testing results are below
> (measured for Os).

As somebody who brought up this problem at least once on the mailing
lists, I'm in agreement with David Majnemer here.
I think we should consider a caching strategy before going this route.
FWIW, I'm not a big fan of `ExpensiveCombines` at all. I can see the
reason why it was introduced, but in my experience the "expensive"
bits of InstCombine come from the implementation of the bitwise domain,
i.e. known bits & friends, so evaluating caching is at least something
I would try first.
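
To illustrate why caching could matter for known-bits queries, here is a small self-contained sketch. It is not LLVM's ValueTracking: the `Node` and `KnownBitsAnalysis` classes, the 8-bit width, and the example DAG are all assumptions made up for this illustration. It only shows that repeated queries over shared operands can blow up without memoization.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

MASK = 0xFF  # 8-bit values keep the example small


@dataclass(eq=False)
class Node:
    op: str                      # "const", "arg", "and", or "or"
    value: int = 0               # only used by "const"
    lhs: Optional["Node"] = None
    rhs: Optional["Node"] = None


class KnownBitsAnalysis:
    """Computes (known_zero, known_one) masks, optionally memoized per node."""

    def __init__(self, use_cache: bool) -> None:
        self.use_cache = use_cache
        self.cache: Dict[int, Tuple[int, int]] = {}
        self.visits = 0  # number of nodes actually evaluated

    def compute(self, node: Node) -> Tuple[int, int]:
        key = id(node)
        if self.use_cache and key in self.cache:
            return self.cache[key]
        self.visits += 1
        if node.op == "const":
            result = (~node.value & MASK, node.value & MASK)
        elif node.op == "arg":
            result = (0, 0)  # nothing is known about an argument
        else:
            lhs_zero, lhs_one = self.compute(node.lhs)
            rhs_zero, rhs_one = self.compute(node.rhs)
            if node.op == "and":
                result = (lhs_zero | rhs_zero, lhs_one & rhs_one)
            elif node.op == "or":
                result = (lhs_zero & rhs_zero, lhs_one | rhs_one)
            else:
                raise ValueError(f"unsupported op: {node.op}")
        if self.use_cache:
            self.cache[key] = result
        return result


def build_chain(depth: int) -> Node:
    """(x & 0x0F) wrapped in `depth` ORs that each reuse their operand twice."""
    node = Node("and", lhs=Node("arg"), rhs=Node("const", value=0x0F))
    for _ in range(depth):
        node = Node("or", lhs=node, rhs=node)
    return node


root = build_chain(10)
uncached = KnownBitsAnalysis(use_cache=False)
cached = KnownBitsAnalysis(use_cache=True)
assert uncached.compute(root) == cached.compute(root) == (0xF0, 0x00)
print(f"visits without cache: {uncached.visits}, with cache: {cached.visits}")
```

A real cache would also have to be invalidated whenever a combine rewrites an instruction, which this sketch ignores entirely; that bookkeeping is presumably where the engineering cost of the idea lies.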

Something else that can be tried (even if it doesn't improve compile
time, it is still a nice cleanup) is moving combines that don't create
new instructions from InstCombine to InstSimplify. Many passes use
InstSimplify, so that might reduce the amount of code that's
processed by InstCombine and/or improve
code quality. Just speculation, but a decent experiment if somebody
has time to take a look.
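
The distinction drawn above, between folds that only return something that already exists and folds that build a new instruction, can be sketched as follows. This is a toy model over tuples, not LLVM code; the function names and the expression encoding are assumptions for illustration.

```python
from typing import Optional, Tuple, Union

Operand = Union[str, int]           # a value name or an integer constant
Expr = Tuple[str, Operand, Operand]  # (opcode, lhs, rhs)


def simplify(expr: Expr) -> Optional[Operand]:
    """Return an existing operand or a constant equivalent to expr, else None.

    Never builds a new instruction, so any pass could call it cheaply.
    """
    op, lhs, rhs = expr
    if op == "add" and rhs == 0:
        return lhs   # x + 0 -> x
    if op == "mul" and rhs == 1:
        return lhs   # x * 1 -> x
    if op == "mul" and rhs == 0:
        return 0     # x * 0 -> 0
    if op == "and" and lhs == rhs:
        return lhs   # x & x -> x
    return None


def combine(expr: Expr) -> Union[Expr, Operand]:
    """Try the simplifications first, then folds that create new instructions."""
    simplified = simplify(expr)
    if simplified is not None:
        return simplified
    op, lhs, rhs = expr
    if op == "mul" and rhs == 2:
        return ("shl", lhs, 1)  # x * 2 -> x << 1 builds a new instruction
    return expr


print(simplify(("add", "x", 0)))  # an existing operand
print(simplify(("mul", "x", 2)))  # None: needs a new instruction
print(combine(("mul", "x", 2)))   # the new shift
```

Keeping the first group in a separate function is what would let other passes reuse those folds without running the full combiner.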

-- 
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare

Daniel Berlin via llvm-dev

2017-Mar-21 18:45 UTC


So, just a thought:
"The purpose of many of InstCombine's xforms is to canonicalize the IR to
make life easier for downstream passes and analyses."
That sounds sane.

So, are the expensive things canonicalization?
If that is the case, why are we doing such expensive canonicalization?
That seems strange to me.

If they are not canonicalization, should they really not be separated out
(into some pass that possibly shares infrastructure)?

No compiler is going to get everything anyway, and instcombine needs to
decide what "good enough" really means.

I would rather see us understand what we want out of instcombine,
precisely, before we try to decide how to make it faster at doing whatever
that thing is :)


--Dan



