Mehdi Amini via llvm-dev
2017-Mar-20 15:37 UTC
[llvm-dev] Saving Compile Time in InstCombine
> On Mar 17, 2017, at 6:12 PM, David Majnemer <david.majnemer at gmail.com> wrote:
>
> Honestly, I'm not a huge fan of this change as-is. The set of transforms that were added behind ExpensiveChecks seems awfully strange, and many would not lead the reader to believe that they are expensive at all (the SimplifyDemandedInstructionBits and foldICmpUsingKnownBits calls being the obviously expensive routines).
>
> The purpose of many of InstCombine's xforms is to canonicalize the IR to make life easier for downstream passes and analyses.

OK, but is it true of *all* the combines today?

> InstCombine internally relies on knowing what transforms it may or may not perform. This is important: canonicalizations may duel endlessly if we get this wrong; the order of the combines is also important for exactly the same reason (SelectionDAG deals with this problem in a different way with its pattern complexity field).
>
> Another concern with moving seemingly arbitrary combines under ExpensiveCombines is that it will make it that much harder to understand what is and is not canonical at a given point during the execution of the optimizer.

If a canonicalization is too costly to achieve, maybe it is not a reasonable one?

It is also not clear to me that canonicalizations that use complex analyses (ValueTracking / computeKnownBits) really make it easy to "understand what is canonical" anyway. This is my impression in general as the scope of what is needed to achieve the transformation gets larger: the more context needed, the less it looks like a "canonicalization" to me.

WDYT?

—
Mehdi

> I'd be much more interested in a patch which caches the result of frequently called ValueTracking functionality like ComputeKnownBits, ComputeSignBit, etc., which often doesn't change but is not intelligently reused. I imagine that the performance win might be quite comparable.
> Such a patch would have the benefit of keeping the set of available transforms constant throughout the pipeline while bringing execution time down; I wouldn't be at all surprised if caching the ValueTracking functions resulted in a bigger time savings.
>
> On Fri, Mar 17, 2017 at 5:49 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> > On 03/17/2017 04:30 PM, Mehdi Amini via llvm-dev wrote:
> >
>> On Mar 17, 2017, at 11:50 AM, Mikhail Zolotukhin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi,
>>>
>>> One of the most time-consuming passes in the LLVM middle end is InstCombine (see e.g. [1]). It is a very powerful pass, capable of doing all sorts of crazy stuff, and new patterns are constantly being introduced there. The problem is that we often use it just as a clean-up pass: it's scheduled 6 times in the current pass pipeline, and each time it's invoked it checks all known patterns. That sounds OK for O3, where we try to squeeze out as much performance as possible, but it is too excessive for other opt levels. InstCombine has an ExpensiveCombines parameter to address that - but I think it's underused at the moment.
>>
>> Yes, the "ExpensiveCombines" parameter was added recently (4.0? 3.9?), but I believe it has always been intended to be extended the way you're doing it. So I support this effort :)
>
> > +1
> >
> > Also, did your profiling reveal why the other combines are expensive? Among other things, I'm curious whether the expensive ones tend to spend a lot of time in ValueTracking (getting known bits and similar)?
> >
> > -Hal
>
>> CC: David for the general direction on InstCombine, though.
>>
>> —
>> Mehdi
>>
>>> Trying to find out which patterns are important and which are rare, I profiled clang using CTMark and got the following coverage report:
>>> <InstCombine_covreport.html>
>>> (beware, the file is ~6MB).
>>>
>>> Guided by this profile I moved some patterns under the "if (ExpensiveCombines)" check, which, as expected, turned out to be neutral for runtime performance but improved compile time. The testing results are below (measured for Os).
>>>
>>> Performance Improvements - Compile Time (CTMark)
>>>
>>>   Benchmark                    Δ        Previous  Current  σ
>>>   CTMark/sqlite3/sqlite3       -1.55%   6.8155    6.7102   0.0081
>>>   CTMark/mafft/pairlocalalign  -1.05%   8.0407    7.9559   0.0193
>>>   CTMark/ClamAV/clamscan       -1.02%   11.3893   11.2734  0.0081
>>>   CTMark/lencod/lencod         -1.01%   12.8763   12.7461  0.0244
>>>   CTMark/SPASS/SPASS           -1.01%   12.5048   12.3791  0.0340
>>>
>>> Performance Improvements - Compile Time (SPEC)
>>>
>>>   Benchmark                                           Δ        Previous  Current  σ
>>>   External/SPEC/CINT2006/403.gcc/403.gcc              -1.64%   54.0801   53.1930  -
>>>   External/SPEC/CINT2006/400.perlbench/400.perlbench  -1.25%   19.1481   18.9091  -
>>>   External/SPEC/CINT2006/445.gobmk/445.gobmk          -1.01%   15.2819   15.1274  -
>>>
>>> Do such changes make sense? The patch doesn't change O3, but it does change Os and can potentially change performance there (though I didn't see any changes in my tests).
>>>
>>> The patch is attached for reference; if we decide to go for it, I'll upload it to phab:
>>>
>>> <0001-InstCombine-Move-some-infrequent-patterns-under-if-E.patch>
>>>
>>> Thanks,
>>> Michael
>>>
>>> [1]: http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
Amara Emerson via llvm-dev
2017-Mar-20 16:36 UTC
[llvm-dev] Saving Compile Time in InstCombine
Agree, it would be nice if InstCombine were more modular, so that we could choose whether we wanted a canonicalisation run or an optimisation run. Among the optimisations, it would also be good to be able to select which categories of combines to run. Splitting out canonicalisation would also make it clearer to everyone what the standard forms for certain constructs are.

Amara

On 20 March 2017 at 15:37, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
Mikhail Zolotukhin via llvm-dev
2017-Mar-21 01:46 UTC
[llvm-dev] Saving Compile Time in InstCombine
Thanks for the interest, everyone! The intention of this patch was exactly to ignite the discussion; I expected that it would be changed/reworked before going in (if ever).

The profile I used was not a time profile; it was a frequency-based report, as Sanjay noticed. The idea was to find patterns that are (almost) unused, and those were the patterns I later moved under 'ExpensiveCombines'. I admit it was somewhat arbitrary on my side: I kept the ones that looked pretty cheap and picked the others.

Let me try to keep only the ValueTracking-related patterns and remeasure the numbers for now, and then we can decide whether we want to do anything about the others. I still think it would be nice to avoid checking for them - maybe we can keep one 'full' InstCombine run for canonicalization, and in subsequent runs only check frequent patterns.

Thanks,
Michael

> On Mar 20, 2017, at 9:36 AM, Amara Emerson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Agree, it would be nice if instcombine was more modular, such that we could choose whether or not we wanted a canonicalisation run or wanted optimizations. And of the optimisations, it would also be good to be able to select which categories of combines to run. Splitting out canonicalisation would also make it clearer to all what the standard forms are for certain constructs.
>
> Amara
>
> [...]