thr3ads.net - llvm dev - [llvm-dev] Saving Compile Time in InstCombine [Mar 2017]

Hal Finkel via llvm-dev

2017-Mar-18 00:49 UTC

[llvm-dev] Saving Compile Time in InstCombine

On 03/17/2017 04:30 PM, Mehdi Amini via llvm-dev wrote:
>
>> On Mar 17, 2017, at 11:50 AM, Mikhail Zolotukhin via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi,
>>
>> One of the most time-consuming passes in LLVM middle-end is 
>> InstCombine (see e.g. [1]). It is a very powerful pass capable of 
>> doing all the crazy stuff, and new patterns are being constantly 
>> introduced there. The problem is that we often use it just as a 
>> clean-up pass: it's scheduled 6 times in the current pass pipeline,
>> and each time it's invoked it checks all known patterns. It sounds ok
>> for O3, where we try to squeeze as much performance as possible, but 
>> it is too excessive for other opt-levels. InstCombine has an 
>> ExpensiveCombines parameter to address that - but I think it's 
>> underused at the moment.
>
> Yes, the “ExpensiveCombines” has been added recently (4.0? 3.9?) but I 
> believe has always been intended to be extended the way you’re doing 
> it. So I support this effort :)

+1

Also, did your profiling reveal why the other combines are expensive? 
Among other things, I'm curious if the expensive ones tend to spend a 
lot of time in ValueTracking (getting known bits and similar)?

  -Hal
>
> CC: David for the general direction on InstCombine though.
>
>
> —
> Mehdi
>
>
>
>>
>> Trying to find out, which patterns are important, and which are rare, 
>> I profiled clang using CTMark and got the following coverage report:
>> <InstCombine_covreport.html>
>> (beware, the file is ~6MB).
>>
>> Guided by this profile I moved some patterns under the "if 
>> (ExpensiveCombines)" check, which expectedly happened to be neutral
>> for runtime performance, but improved compile-time. The testing 
>> results are below (measured for Os).
>>
>> Performance Improvements - Compile Time
>>
>> Benchmark                      Δ        Previous   Current    σ
>> CTMark/sqlite3/sqlite3         -1.55%    6.8155     6.7102    0.0081
>> CTMark/mafft/pairlocalalign    -1.05%    8.0407     7.9559    0.0193
>> CTMark/ClamAV/clamscan         -1.02%   11.3893    11.2734    0.0081
>> CTMark/lencod/lencod           -1.01%   12.8763    12.7461    0.0244
>> CTMark/SPASS/SPASS             -1.01%   12.5048    12.3791    0.0340
>>
>>
>> Performance Improvements - Compile Time
>>
>> Benchmark                                            Δ        Previous   Current    σ
>> External/SPEC/CINT2006/403.gcc/403.gcc               -1.64%   54.0801    53.1930    -
>> External/SPEC/CINT2006/400.perlbench/400.perlbench   -1.25%   19.1481    18.9091    -
>> External/SPEC/CINT2006/445.gobmk/445.gobmk           -1.01%   15.2819    15.1274    -
>>
>>
>>
>> Do such changes make sense? The patch doesn't change O3, but it does
>> change Os and potentially can change performance there (though I 
>> didn't see any changes in my tests).
>>
>> The patch is attached for the reference, if we decide to go for it, 
>> I'll upload it to phab:
>>
>> <0001-InstCombine-Move-some-infrequent-patterns-under-if-E.patch>
>>
>>
>> Thanks,
>> Michael
>>
>> [1]: http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

David Majnemer via llvm-dev

2017-Mar-18 01:12 UTC

[llvm-dev] Saving Compile Time in InstCombine

Honestly, I'm not a huge fan of this change as-is. The set of transforms
that were added behind ExpensiveCombines seems awfully strange, and many would
not lead the reader to believe that they are expensive at all
(the SimplifyDemandedInstructionBits and foldICmpUsingKnownBits calls being
the obvious expensive routines).
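
For readers unfamiliar with the mechanism being discussed, the gating can be pictured roughly as below. This is only a toy model in Python, not InstCombine's code: the `Pattern` class, its `expensive` field, `run_combiner`, and both example rules are hypothetical names invented for illustration, and the choice of which rule is gated is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Toy expressions: an int constant, a variable name, or a tuple (op, lhs, rhs).

@dataclass
class Pattern:
    name: str
    rewrite: Callable[[object], Optional[object]]  # returns None when no match
    expensive: bool = False                        # hypothetical per-pattern flag

def fold_add_zero(e):
    # (add x, 0) -> x
    if isinstance(e, tuple) and e[0] == "add" and e[2] == 0:
        return e[1]
    return None

def fold_mul_pow2(e):
    # (mul x, 2^k) -> (shl x, k); stands in for a pattern moved behind the flag
    if isinstance(e, tuple) and e[0] == "mul" and isinstance(e[2], int):
        c = e[2]
        if c > 1 and c & (c - 1) == 0:
            return ("shl", e[1], c.bit_length() - 1)
    return None

PATTERNS = [
    Pattern("add-zero", fold_add_zero),
    Pattern("mul-pow2", fold_mul_pow2, expensive=True),
]

def run_combiner(e, expensive_combines):
    """Apply the first matching pattern, skipping gated ones when the flag is off."""
    for p in PATTERNS:
        if p.expensive and not expensive_combines:
            continue
        out = p.rewrite(e)
        if out is not None:
            return out
    return e
```

With the flag off, `("mul", "x", 8)` is left alone, while with the flag on it becomes `("shl", "x", 3)`. Later passes would therefore see two different forms of the same computation depending on the optimization level, which is the canonicalization concern raised in this message.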

The purpose of many of InstCombine's xforms is to canonicalize the IR to
make life easier for downstream passes and analyses.

InstCombine internally relies on knowing what transforms it may or may not
perform. This is important: canonicalizations may duel endlessly if we get
this wrong; the order of the combines is also important for exactly the
same reason (SelectionDAG deals with this problem in a different way with
its pattern complexity field).
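
A toy illustration of how two canonicalizations can "duel" is sketched below. It is not taken from InstCombine; both rules and the `run_to_fixpoint` driver are hypothetical, and the step cap exists only so that the example terminates.

```python
def canon_sub_to_add(e):
    # (sub x, C) -> (add x, -C)
    if e[0] == "sub" and isinstance(e[2], int):
        return ("add", e[1], -e[2])
    return None

def canon_negadd_to_sub(e):
    # (add x, C) with C < 0 -> (sub x, -C); this undoes the rule above
    if e[0] == "add" and isinstance(e[2], int) and e[2] < 0:
        return ("sub", e[1], -e[2])
    return None

def run_to_fixpoint(e, rules, max_steps=100):
    """Rewrite until no rule fires. Returns (expr, steps_taken, converged)."""
    for step in range(max_steps):
        for rule in rules:
            out = rule(e)
            if out is not None:
                e = out
                break
        else:
            # No rule matched: a fixpoint was reached.
            return e, step, True
    return e, max_steps, False
```

Running `run_to_fixpoint(("sub", "x", 5), [canon_sub_to_add])` converges after one rewrite, whereas passing both rules never converges: each rule's output is the other rule's input. Each rule looks reasonable in isolation, so the conflict only shows up when the set of enabled rules is considered as a whole.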

Another concern with moving seemingly arbitrary combines under
ExpensiveCombines is that it will make it that much harder to understand
what is and is not canonical at a given point during the execution of the
optimizer.

I'd be much more interested in a patch which caches the results of
frequently called ValueTracking functionality like computeKnownBits,
ComputeSignBit, etc., whose results often don't change but are not
intelligently reused. I imagine that the performance win might be quite
comparable. Such a patch would have the benefit of keeping the set of
available transforms constant throughout the pipeline while bringing
execution time down; I wouldn't be at all surprised if caching the
ValueTracking functions resulted in a bigger time savings.
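
The caching idea can be sketched on a toy expression DAG as follows. This is an assumption-laden illustration in Python, not LLVM's ValueTracking: the `Node` and `KnownBits` classes, the 8-bit width, and the small set of opcodes are all hypothetical.

```python
from dataclasses import dataclass

WIDTH = 8
MASK = (1 << WIDTH) - 1

@dataclass(frozen=True)
class Node:
    op: str              # "const", "arg", "and", "or", "shl"
    operands: tuple = ()
    value: int = 0       # constant value, or shift amount for "shl"

class KnownBits:
    """Memoizing known-bits analysis; query() returns (known_zero, known_one)."""

    def __init__(self):
        self.cache = {}
        self.computed = 0  # nodes actually analysed, i.e. cache misses

    def query(self, n):
        if n in self.cache:
            return self.cache[n]
        self.computed += 1
        if n.op == "const":
            r = (~n.value & MASK, n.value & MASK)
        elif n.op == "and":
            (z0, o0), (z1, o1) = (self.query(x) for x in n.operands)
            r = (z0 | z1, o0 & o1)
        elif n.op == "or":
            (z0, o0), (z1, o1) = (self.query(x) for x in n.operands)
            r = (z0 & z1, o0 | o1)
        elif n.op == "shl":
            z, o = self.query(n.operands[0])
            low = (1 << n.value) - 1          # bits shifted in are zero
            r = (((z << n.value) | low) & MASK, (o << n.value) & MASK)
        else:
            r = (0, 0)                        # "arg": nothing is known
        self.cache[n] = r
        return r
```

Querying the same node twice analyses it only once, and a query on a new node that shares operands with earlier ones only analyses the new node. The sketch says nothing about how much time such a cache would save in practice; that would need measurement.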

Hal Finkel via llvm-dev

2017-Mar-18 14:43 UTC

[llvm-dev] Saving Compile Time in InstCombine

On 03/17/2017 08:12 PM, David Majnemer wrote:
> Honestly, I'm not a huge fan of this change as-is. The set of 
> transforms that were added behind ExpensiveCombines seems awfully 
> strange, and many would not lead the reader to believe that they are 
> expensive at all (the SimplifyDemandedInstructionBits and 
> foldICmpUsingKnownBits calls being the obvious expensive routines).
>
> The purpose of many of InstCombine's xforms is to canonicalize the IR 
> to make life easier for downstream passes and analyses.
>
> InstCombine internally relies on knowing what transforms it may or may 
> not perform. This is important: canonicalizations may duel endlessly 
> if we get this wrong; the order of the combines is also important for 
> exactly the same reason (SelectionDAG deals with this problem in a 
> different way with its pattern complexity field).
>
> Another concern with moving seemingly arbitrary combines under 
> ExpensiveCombines is that it will make it that much harder to 
> understand what is and is not canonical at a given point during the 
> execution of the optimizer.

I agree with this up to a point. If we have these kinds of 
canonicalization dependencies that depend on ValueTracking's depth, this 
seems very fragile. Even if we introduce caching, thus making the depth 
often infinite, if it will still be finite in the face of updates then 
we still need to be careful (plus, if we're worried about being able to 
understand the canonical form, then depending on known-bits analysis can 
make defining this form subtle).

> I'd be much more interested in a patch which caches the results of 
> frequently called ValueTracking functionality like computeKnownBits, 
> ComputeSignBit, etc., whose results often don't change but are not 
> intelligently reused. I imagine that the performance win might be 
> quite comparable. Such a patch would have the benefit of keeping the 
> set of available transforms constant throughout the pipeline while 
> bringing execution time down; I wouldn't be at all surprised if 
> caching the ValueTracking functions resulted in a bigger time savings.

I'd started working on this a few months ago; I didn't finish the patch 
(mostly because I discovered that there's also a need to invalidate the 
cache whenever we perform a transformation that drops nsw/nuw flags, and 
I never got to that part), but I'm happy to provide my 
work-in-progress to anyone interested. cc'ing Davide, who had also 
expressed an interest in this.
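
The invalidation problem mentioned here can be illustrated with a small sketch. It is not the work-in-progress patch, which is not shown in this thread; the `Inst` and `SignCache` classes and the `drop_nsw` helper are hypothetical, and the only fact tracked is "known non-negative", derived from an `add` carrying `nsw` with non-negative operands.

```python
class Inst:
    def __init__(self, op, operands=(), nsw=False, non_negative=False):
        self.op = op
        self.operands = list(operands)
        self.nsw = nsw                    # "no signed wrap" flag on an add
        self.non_negative = non_negative  # only meaningful for "arg"
        self.users = []
        for o in self.operands:
            o.users.append(self)

class SignCache:
    def __init__(self):
        self._cache = {}

    def is_non_negative(self, inst):
        key = id(inst)
        if key not in self._cache:
            if inst.op == "arg":
                r = inst.non_negative
            elif inst.op == "add":
                # Without nsw the sum may wrap, so nothing can be concluded.
                r = inst.nsw and all(self.is_non_negative(o) for o in inst.operands)
            else:
                r = False
            self._cache[key] = r
        return self._cache[key]

    def invalidate(self, inst):
        """Drop cached facts for inst and everything that transitively uses it."""
        worklist, seen = [inst], set()
        while worklist:
            cur = worklist.pop()
            if id(cur) in seen:
                continue
            seen.add(id(cur))
            self._cache.pop(id(cur), None)
            worklist.extend(cur.users)

def drop_nsw(inst, cache):
    # A transform that clears the flag must also invalidate dependent facts.
    inst.nsw = False
    cache.invalidate(inst)
```

If a transform clears `nsw` on an instruction without calling `invalidate`, the cache keeps answering "non-negative" for that instruction and for all of its users, even though the answer no longer holds. Walking the users transitively is the simplest correct policy in this toy; a real implementation would presumably want something cheaper.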

  -Hal

Mehdi Amini via llvm-dev

2017-Mar-20 15:37 UTC

[llvm-dev] Saving Compile Time in InstCombine

> On Mar 17, 2017, at 6:12 PM, David Majnemer <david.majnemer at gmail.com> wrote:
> 
> Honestly, I'm not a huge fan of this change as-is. The set of transforms
> that were added behind ExpensiveCombines seems awfully strange, and many
> would not lead the reader to believe that they are expensive at all (the
> SimplifyDemandedInstructionBits and foldICmpUsingKnownBits calls being the
> obvious expensive routines).
> 
> The purpose of many of InstCombine's xforms is to canonicalize the IR to
> make life easier for downstream passes and analyses.

OK, but is it true of *all* the combines today?

> InstCombine internally relies on knowing what transforms it may or may not
> perform. This is important: canonicalizations may duel endlessly if we get
> this wrong; the order of the combines is also important for exactly the same
> reason (SelectionDAG deals with this problem in a different way with its
> pattern complexity field).
> 
> Another concern with moving seemingly arbitrary combines under
> ExpensiveCombines is that it will make it that much harder to understand
> what is and is not canonical at a given point during the execution of the
> optimizer.

If a canonicalization is too costly to achieve, maybe it is not a reasonable
one?

It is also not clear to me that canonicalizations that use complex
analyses (ValueTracking / computeKnownBits) really make it easy to
“understand what is canonical” anyway. This is my impression in general as
the scope of what is needed to achieve the transformation gets larger: the
more context is needed, the less it looks like a “canonicalization” to me.

WDYT?

— 
Mehdi

Gerolf Hoflehner via llvm-dev

2017-Mar-21 04:51 UTC

[llvm-dev] Saving Compile Time in InstCombine

> On Mar 17, 2017, at 6:12 PM, David Majnemer via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Honestly, I'm not a huge fan of this change as-is. The set of transforms
> that were added behind ExpensiveCombines seems awfully strange, and many
> would not lead the reader to believe that they are expensive at all (the
> SimplifyDemandedInstructionBits and foldICmpUsingKnownBits calls being the
> obvious expensive routines).
> 
> The purpose of many of InstCombine's xforms is to canonicalize the IR to
> make life easier for downstream passes and analyses.

As we get further along with compile-time improvements, one question we need to
ask ourselves more frequently is about the effectiveness of
optimizations/passes. For example, in this case, how can we make an educated
assessment that running the combiner N times is a good cost/benefit investment
of compute resources? The questions below are meant to figure out what
technologies, instrumentation, etc. could help towards a more data-driven
decision process when it comes to the effectiveness of optimizations.
InstCombine might just be an inspirational use case to see what is possible in
that direction.

The combiner is invoked in full multiple times. But is it really necessary to
run all of it for that purpose? After InstCombine is run once, is there a mapping
from transformation -> combines? I suspect most transformations could invoke
a subset of combines to re-canonicalize. Or, if there was a (cheap) verifier for
canonical IR, it could invoke a specific canonicalization routine. Instrumenting
InstCombine and checking which patterns actually kick in (for different
invocations) might give insight into how the combiner could be structured
so that only a subset of patterns needs to be checked.
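
The kind of instrumentation suggested here could look roughly like the following toy. It is a sketch under heavy assumptions, not InstCombine: the three rules, the `PATTERNS` table, and the `combine` function are hypothetical, and expressions are plain tuples.

```python
from collections import Counter

def fold_add_zero(e):
    # (add x, 0) -> x
    if isinstance(e, tuple) and e[0] == "add" and e[2] == 0:
        return e[1]
    return None

def fold_mul_one(e):
    # (mul x, 1) -> x
    if isinstance(e, tuple) and e[0] == "mul" and e[2] == 1:
        return e[1]
    return None

def fold_double_neg(e):
    # (neg (neg x)) -> x
    if (isinstance(e, tuple) and e[0] == "neg"
            and isinstance(e[1], tuple) and e[1][0] == "neg"):
        return e[1][1]
    return None

PATTERNS = {
    "add-zero": fold_add_zero,
    "mul-one": fold_mul_one,
    "neg-neg": fold_double_neg,
}

def combine(e, stats):
    """Rewrite bottom-up and count in stats how often each pattern fires."""
    if isinstance(e, tuple):
        e = (e[0],) + tuple(combine(x, stats) for x in e[1:])
    changed = True
    while changed:
        changed = False
        for name, rule in PATTERNS.items():
            out = rule(e)
            if out is not None:
                stats[name] += 1
                e = out
                changed = True
                break
    return e
```

Keeping one `Counter` per invocation makes it possible to compare, for example, the first run against a later clean-up run. In this toy a second run over already-combined input records no hits at all, which is the sort of data that could justify checking only a subset of patterns in later invocations.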

> InstCombine internally relies on knowing what transforms it may or may not
> perform. This is important: canonicalizations may duel endlessly if we get
> this wrong; the order of the combines is also important for exactly the same
> reason (SelectionDAG deals with this problem in a different way with its
> pattern complexity field).

Can you elaborate on this “duel endlessly” with specific examples? This is out
of curiosity. There must be verifiers that check that this cannot happen, or an
implementation strategy that guarantees that. GlobalISel will run into the
same or a similar question when it gets far enough to replace SelectionDAG.

> Another concern with moving seemingly arbitrary combines under
> ExpensiveCombines is that it will make it that much harder to understand
> what is and is not canonical at a given point during the execution of the
> optimizer.
> 
> I'd be much more interested in a patch which caches the results of
> frequently called ValueTracking functionality like computeKnownBits,
> ComputeSignBit, etc., whose results often don't change but are not
> intelligently reused. I imagine that the performance win might be quite
> comparable.

Can you back this up with measurements? Caching schemes are tricky. Is there a
way to evaluate when the result of computeKnownBits etc. is actually effective,
meaning the result is used and gives faster instructions? E.g. it might well be
that only the first instance of InstCombine benefits from computing the bits.

> Such a patch would have the benefit of keeping the set of available
> transforms constant throughout the pipeline while bringing execution time
> down; I wouldn't be at all surprised if caching the ValueTracking functions
> resulted in a bigger time savings.