Mikhail Zolotukhin via llvm-dev
2017-Mar-17  18:50 UTC
[llvm-dev] Saving Compile Time in InstCombine
Hi,

One of the most time-consuming passes in the LLVM middle-end is InstCombine (see e.g. [1]). It is a very powerful pass capable of doing all kinds of crazy stuff, and new patterns are constantly being added to it. The problem is that we often use it just as a clean-up pass: it's scheduled 6 times in the current pass pipeline, and each time it's invoked it checks all known patterns. That sounds OK for O3, where we try to squeeze out as much performance as possible, but it is excessive for the other opt-levels. InstCombine has an ExpensiveCombines parameter to address this, but I think it's underused at the moment.

To find out which patterns are important and which are rare, I profiled clang using CTMark and got the following coverage report:
<InstCombine_covreport.html>
(beware, the file is ~6MB).

Guided by this profile, I moved some patterns under the "if (ExpensiveCombines)" check. As expected, this turned out to be neutral for runtime performance, but it improved compile time. The testing results are below (measured for Os).
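To make the idea concrete, here is a toy model of what moving a pattern under the `if (ExpensiveCombines)` check means. This is not LLVM's InstCombine code: the `Inst` type, the opcodes, and both folds are hypothetical stand-ins, and the "rare" pattern was picked only for illustration, not taken from the patch. The frequently-hit fold is always checked; the gated one is only tried when the caller opts in.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature IR (not LLVM's Instruction class): a node is a
// leaf value, a constant, or a binary operation over two operands.
enum class Op { Leaf, Const, Add, And };

struct Inst {
  Op op;
  int64_t value = 0;          // payload for Op::Const
  const Inst *lhs = nullptr;  // operands for binary ops
  const Inst *rhs = nullptr;
};

// True if I is the constant C.
static bool isConst(const Inst *I, int64_t C) {
  return I != nullptr && I->op == Op::Const && I->value == C;
}

// Returns the existing value that I folds to, or nullptr if nothing matched.
const Inst *combine(const Inst *I, bool ExpensiveCombines) {
  assert(I != nullptr);

  // Frequently-hit pattern, always checked: x + 0 -> x.
  if (I->op == Op::Add && isConst(I->rhs, 0))
    return I->lhs;

  // Stand-in for a rarely-matching pattern, only checked when the caller
  // opts in: (x & C) & C -> x & C.
  if (ExpensiveCombines && I->op == Op::And && I->rhs != nullptr &&
      I->rhs->op == Op::Const && I->lhs != nullptr &&
      I->lhs->op == Op::And && isConst(I->lhs->rhs, I->rhs->value))
    return I->lhs;

  return nullptr;
}
```

In this model a gated pattern costs a single branch when the flag is off, which is the whole point of moving rarely-matching patterns behind it.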
Performance Improvements - Compile Time
Benchmark                    Δ        Previous  Current   σ
CTMark/sqlite3/sqlite3       -1.55%   6.8155    6.7102    0.0081
CTMark/mafft/pairlocalalign  -1.05%   8.0407    7.9559    0.0193
CTMark/ClamAV/clamscan       -1.02%   11.3893   11.2734   0.0081
CTMark/lencod/lencod         -1.01%   12.8763   12.7461   0.0244
CTMark/SPASS/SPASS           -1.01%   12.5048   12.3791   0.0340

Performance Improvements - Compile Time
Benchmark                                           Δ        Previous  Current   σ
External/SPEC/CINT2006/403.gcc/403.gcc              -1.64%   54.0801   53.1930   -
External/SPEC/CINT2006/400.perlbench/400.perlbench  -1.25%   19.1481   18.9091   -
External/SPEC/CINT2006/445.gobmk/445.gobmk          -1.01%   15.2819   15.1274   -

Do such changes make sense? The patch doesn't change O3, but it does change Os and could potentially affect performance there (though I didn't see any changes in my tests).

The patch is attached for reference; if we decide to go for it, I'll upload it to Phabricator:
<0001-InstCombine-Move-some-infrequent-patterns-under-if-E.patch>

Thanks,
Michael

[1]: http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html
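The Δ column can be reproduced from the Previous and Current columns as (Current - Previous) / Previous. The short sketch below recomputes it for the CTMark rows (the SPEC rows have no σ) and also expresses each absolute change in multiples of the σ column. Reading σ as the run-to-run standard deviation reported by LNT is an assumption here; the row values are copied from the table.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

struct Row {
  std::string name;
  double previous;  // compile time before the patch
  double current;   // compile time after the patch
  double sigma;     // value of the table's σ column
};

// Relative change in percent, as in the Δ column.
double deltaPercent(const Row &r) {
  assert(r.previous != 0.0);
  return (r.current - r.previous) / r.previous * 100.0;
}

// Absolute change in multiples of σ (assumes σ is the run-to-run
// standard deviation of the measurement).
double changeInSigmas(const Row &r) {
  return std::fabs(r.current - r.previous) / r.sigma;
}

// Rounds to two decimals, matching the precision of the Δ column.
double round2(double x) { return std::round(x * 100.0) / 100.0; }

const std::vector<Row> kCTMark = {
    {"sqlite3", 6.8155, 6.7102, 0.0081},
    {"pairlocalalign", 8.0407, 7.9559, 0.0193},
    {"clamscan", 11.3893, 11.2734, 0.0081},
    {"lencod", 12.8763, 12.7461, 0.0244},
    {"SPASS", 12.5048, 12.3791, 0.0340},
};
```

For sqlite3 this gives a Δ of -1.55% and a change of about 13σ; the smallest ratio among the CTMark rows is SPASS at roughly 3.7σ.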
Vedant Kumar via llvm-dev
2017-Mar-17  21:02 UTC
> On Mar 17, 2017, at 11:50 AM, Mikhail Zolotukhin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Guided by this profile I moved some patterns under the "if (ExpensiveCombines)" check, which expectedly happened to be neutral for runtime performance, but improved compile-time. The testing results are below (measured for Os).

It'd be nice to double-check that any runtime performance loss at -O2 is negligible. But this sounds like a great idea!

vedant
Mikhail Zolotukhin via llvm-dev
2017-Mar-17  21:22 UTC
> On Mar 17, 2017, at 2:02 PM, Vedant Kumar <vsk at apple.com> wrote:
>
> It'd be nice to double-check that any runtime performance loss at -O2 is negligible. But this sounds like a great idea!

I forgot to mention that I ran SPEC2006/INT with "-Os" on ARM64 and didn't see any changes in runtime performance. I can run O2 testing over the weekend as well.

Michael
Mehdi Amini via llvm-dev
2017-Mar-17  21:30 UTC
> On Mar 17, 2017, at 11:50 AM, Mikhail Zolotukhin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> InstCombine has an ExpensiveCombines parameter to address that - but I think it's underused at the moment.

Yes, “ExpensiveCombines” was added recently (4.0? 3.9?), but I believe it has always been intended to be extended the way you're doing it. So I support this effort :)

CC: David for the general direction on InstCombine though.

-- Mehdi
Matthias Braun via llvm-dev
2017-Mar-17  21:36 UTC
In general it is great that we are investigating these things! We have been liberally adding pass invocations and patterns for years without checking the compile-time consequences.

However, intuitively it feels wrong to disable some patterns completely (there will always be that one program that gets so much better when you have a certain pattern).
- Do you have an idea what would happen if we only disabled them in 5 of the 6 invocations?
- Or, alternatively, what happens if we simply don't put as many InstCombine instances into the pass pipeline at -Os?

- Matthias

> On Mar 17, 2017, at 2:30 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Yes, the “ExpensiveCombines” has been added recently (4.0? 3.9?) but I believe has always been intended to be extended the way you’re doing it. So I support this effort :)
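Matthias's first question amounts to choosing the flag per InstCombine invocation instead of per optimization level. The toy model below sketches one such policy: every run stays in the pipeline, but below -O3 only the last run checks the expensive patterns. The types, the choice of the last run, and the policy itself are hypothetical, not how LLVM's pass pipeline is actually built; the default of six runs is simply the count from Michael's message.

```cpp
#include <cassert>
#include <vector>

enum class OptLevel { Os, O2, O3 };

// One scheduled InstCombine run in a toy pipeline model.
struct InstCombineRun {
  int position;            // 0-based index among the InstCombine runs
  bool expensiveCombines;  // whether the gated (rare) patterns are checked
};

// Hypothetical policy: at -O3 every run checks everything; below -O3 all
// runs are kept, but only the last one checks the expensive patterns.
std::vector<InstCombineRun> buildRuns(OptLevel level, int numRuns = 6) {
  assert(numRuns > 0);
  std::vector<InstCombineRun> runs;
  for (int i = 0; i < numRuns; ++i) {
    bool isLast = (i == numRuns - 1);
    runs.push_back({i, level == OptLevel::O3 || isLast});
  }
  return runs;
}

// Number of runs that will check the expensive patterns.
int countExpensive(const std::vector<InstCombineRun> &runs) {
  int n = 0;
  for (const InstCombineRun &r : runs)
    if (r.expensiveCombines)
      ++n;
  return n;
}
```

Passing a smaller `numRuns` models the second question, i.e. scheduling fewer InstCombine instances at -Os.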
Hal Finkel via llvm-dev
2017-Mar-18  00:49 UTC
On 03/17/2017 04:30 PM, Mehdi Amini via llvm-dev wrote:
> Yes, the “ExpensiveCombines” has been added recently (4.0? 3.9?) but I believe has always been intended to be extended the way you’re doing it. So I support this effort :)

+1

Also, did your profiling reveal why the other combines are expensive? Among other things, I'm curious whether the expensive ones tend to spend a lot of time in ValueTracking (computing known bits and similar).

-Hal

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
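For context on Hal's question: a known-bits query walks an instruction's operands recursively to find bits that are provably 0 or 1, and the same operands can be revisited many times through different users. The toy analysis below (32-bit values; only arguments, constants, `and`, `or`, and `shl` by a constant) is a simplified stand-in for ValueTracking's computeKnownBits, not its actual implementation. It can optionally memoize results per node, to show what a cache would buy.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

enum class Kind { Arg, Const, And, Or, ShlConst };

struct Node {
  Kind kind;
  uint32_t imm = 0;         // constant value, or shift amount for ShlConst
  const Node *a = nullptr;  // operands
  const Node *b = nullptr;
};

struct KnownBits {
  uint32_t zero = 0;  // bits proven to be 0
  uint32_t one = 0;   // bits proven to be 1
};

class KnownBitsAnalysis {
public:
  explicit KnownBitsAnalysis(bool useCache) : useCache_(useCache) {}

  KnownBits query(const Node *n) {
    if (useCache_) {
      auto it = cache_.find(n);
      if (it != cache_.end())
        return it->second;  // cache hit: no recursion
    }
    ++visits_;  // counts actual computations
    KnownBits r = compute(n);
    if (useCache_)
      cache_[n] = r;
    return r;
  }

  unsigned visits() const { return visits_; }

private:
  KnownBits compute(const Node *n) {
    switch (n->kind) {
    case Kind::Arg:
      return {};  // nothing is known about an argument
    case Kind::Const:
      return {~n->imm, n->imm};
    case Kind::And: {
      KnownBits l = query(n->a), r = query(n->b);
      return {l.zero | r.zero, l.one & r.one};
    }
    case Kind::Or: {
      KnownBits l = query(n->a), r = query(n->b);
      return {l.zero & r.zero, l.one | r.one};
    }
    case Kind::ShlConst: {
      assert(n->imm < 32 && "shift amount out of range");
      KnownBits l = query(n->a);
      uint32_t lowMask = (1u << n->imm) - 1u;  // shifted-in bits are zero
      return {(l.zero << n->imm) | lowMask, l.one << n->imm};
    }
    }
    return {};
  }

  bool useCache_;
  unsigned visits_ = 0;
  std::unordered_map<const Node *, KnownBits> cache_;
};
```

On a chain of three `or v, v` nodes over a single `and`, the uncached query performs 31 computations and the cached one performs 6: the uncached count roughly doubles with each extra level of sharing, while the cached count grows by one. A real cache would also need invalidation whenever the IR changes, which this sketch omits.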
Davide Italiano via llvm-dev
2017-Mar-21  18:12 UTC
On Fri, Mar 17, 2017 at 11:50 AM, Mikhail Zolotukhin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Guided by this profile I moved some patterns under the "if (ExpensiveCombines)" check, which expectedly happened to be neutral for runtime performance, but improved compile-time. The testing results are below (measured for Os).

As somebody who has brought up this problem at least once on the mailing lists, I agree with David Majnemer here. I think we should consider a caching strategy before going this route.

FWIW, I'm not a big fan of `ExpensiveCombines` at all. I can see why it was introduced, but in my experience the "expensive" bits of InstCombine come from the implementation of the bitwise domain, i.e. known bits & friends, so evaluating caching is at least something I would try first.

Something else that could be tried (even if it doesn't improve compile time, it's still a nice cleanup) is moving the combines that don't create new instructions from InstCombine to InstSimplify. Many passes use InstSimplify, so that might reduce the amount of code InstCombine has to process and/or improve code quality. Just speculation, but a decent experiment if somebody has time to take a look.

--
Davide

"There are no solved problems; there are only problems that are more or less solved" -- Henri Poincare
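Davide's second suggestion relies on the split between InstSimplify, whose folds only return a value that already exists (or a constant), and InstCombine, which may build new instructions. The toy sketch below models that split with hypothetical types and three illustrative folds; these are not LLVM's implementations.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

enum class Op { Arg, Const, Add, Shl, And };

struct Value {
  Op op;
  int64_t imm = 0;  // payload for Op::Const
  Value *lhs = nullptr;
  Value *rhs = nullptr;
};

// Owns every Value so we can count how many new ones a fold creates.
struct Function {
  std::deque<Value> values;  // push_back keeps existing element addresses valid
  Value *create(Value v) {
    values.push_back(v);
    return &values.back();
  }
};

// InstSimplify-style: may only return an existing value (or nullptr).
// It has no access to the Function, so it cannot create anything.
Value *simplify(Value *v) {
  if (v->op == Op::And && v->lhs != nullptr && v->lhs == v->rhs)
    return v->lhs;  // x & x -> x
  if (v->op == Op::Add && v->rhs != nullptr && v->rhs->op == Op::Const &&
      v->rhs->imm == 0)
    return v->lhs;  // x + 0 -> x
  return nullptr;
}

// InstCombine-style: tries simplify() first, then may build new IR.
Value *combine(Function &f, Value *v) {
  if (Value *s = simplify(v))
    return s;
  if (v->op == Op::Add && v->lhs != nullptr && v->lhs == v->rhs) {
    // x + x -> x << 1: needs a new shift (and, in this toy, a new constant).
    Value *one = f.create({Op::Const, 1});
    return f.create({Op::Shl, 0, v->lhs, one});
  }
  return nullptr;
}
```

In this model `x & x -> x` and `x + 0 -> x` can live in `simplify()`, so any caller can use them without allocating, whereas `x + x -> x << 1` has to stay on the combine side because it creates new values.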
Daniel Berlin via llvm-dev
2017-Mar-21  18:45 UTC
So, just a thought:

"The purpose of many of InstCombine's xforms is to canonicalize the IR to make life easier for downstream passes and analyses."

That sounds sane. So, are the expensive things canonicalization? If so, why are we doing such expensive canonicalization? That seems strange to me. If they are not canonicalization, shouldn't they really be separated out (into some pass that possibly shares infrastructure)?

No compiler is going to get everything anyway, and InstCombine needs to decide what "good enough" really means. I would rather see us understand precisely what we want out of InstCombine before we try to decide how to make it faster at doing whatever that thing is :)

--Dan

On Tue, Mar 21, 2017 at 11:12 AM, Davide Italiano via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I think we should consider a caching strategy before going this route.
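One concrete example of the kind of canonicalization Dan mentions: InstCombine moves a constant operand of a commutative operation to the right-hand side, so downstream pattern matchers only need to check one operand order. The toy model below illustrates that effect; the types and the matcher are hypothetical, and only `add` and `mul` are treated as commutative.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

enum class Op { Arg, Const, Add, Mul, Sub };

struct Inst {
  Op op;
  int64_t imm = 0;  // payload for Op::Const
  Inst *lhs = nullptr;
  Inst *rhs = nullptr;
};

bool isCommutative(Op op) { return op == Op::Add || op == Op::Mul; }

// Canonicalization rule: for commutative ops, put a constant operand on
// the right. Returns true if the instruction was changed.
bool canonicalize(Inst &I) {
  if (isCommutative(I.op) && I.lhs != nullptr && I.rhs != nullptr &&
      I.lhs->op == Op::Const && I.rhs->op != Op::Const) {
    std::swap(I.lhs, I.rhs);
    return true;
  }
  return false;
}

// A downstream matcher that relies on the canonical form: it only checks
// the right-hand operand when looking for "x + C".
bool matchAddConst(const Inst &I, int64_t &C) {
  if (I.op == Op::Add && I.rhs != nullptr && I.rhs->op == Op::Const) {
    C = I.rhs->imm;
    return true;
  }
  return false;
}
```

Note that this rule is a constant-time check per instruction and does not query any analysis.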