Xinliang David Li via llvm-dev
2015-Dec-11 00:29 UTC
[llvm-dev] [LLVMdev] Path forward on profile guided inlining?
On Thu, Dec 10, 2015 at 4:00 PM, Philip Reames <listmail at philipreames.com> wrote:> Given I didn't get any response to my original query, I chose not to invest > time in this at the time. I am unlikely to get time for this in the near > future. > > On 12/07/2015 03:13 PM, Easwaran Raman wrote: > > (Resending after removing llvmdev at cs.uiuc.edu and using > llvm-dev at lists.llvm.org) > > On Mon, Dec 7, 2015 at 3:08 PM, Easwaran Raman <eraman at google.com> wrote: >> >> Hi Philip, >> >> Is there any update on this? I've been sending patches to get rid of the >> callee hotness based inline hints from the frontend and move the logic to >> the inliner. The next step is to use the callsite hotness instead. I also >> want to focus on the infrastructure to enable this and what I've been >> experimenting with is similar to your two alternative approaches: >> >>> >>> Alternate Approaches: >>> 1) We could just recompute BFI explicitly in the inliner right before >>> passing the result to ICA for the purposes of prototyping. If this was off >>> by default, this might be a reasonable scheme for investigation. This could >>> likely never be enabled for real uses. >>> 2) We could pre-compute BFI once per function within a given SCC and then >>> try to keep it up to date during inlining. If we cached the call >>> frequencies for the initial call sites, we could adjust the visit order to >>> minimize the number of times we need to recompute a given functions block >>> frequencies. (e.g. we can look at all the original call sites within a >>> function before looking at newly inlined ones) >>> >> >> My proposal is very similar (perhaps identical) to your option 2 above. I >> don't understand the part where you talk about adjusting the visit order to >> minimize BFI computation. >> >> BFI computation: BFI for a function is computed on demand and cached. > > The problem here is that the old pass manager does not have a place/way to > cache the function level analysis. You could do this within the inliner > itself, but the function passes run by the SCC pass manager will invalidate > everything. That's the problem I was trying to solve. >>Easwaran's proposal does not rely on the analysis pass framework of the legacy pass manager -- that is the key idea. BFI info is designed in a way such that it can be computed on-demand. BFI for the caller will be incrementally updated during inlining, but after inlining is done for a node, the BFI info will be explicitly invalidated for that node due to the function passes run later by SCC pass manager. This is expected.>> >> Update: When 'bar' gets inlined into 'foo', the BFI for 'foo' is updated. >> Let OldBB in 'bar' gets cloned as NewBB in 'foo'. NewBB's block frequency >> can be incrementally computed from OldBB's block frequency, entry block >> frequency of 'bar' and the frequency of the block containing the 'foo' -> >> 'bar' callsite. Even when the new CGSCC level BFI analysis is in place, this >> incremental update is useful to minimize computation. >> >> Invalidation: Once inlining is completed in an SCC (at the end of >> runOnSCC), the entries for functions in that SCC are invalidated since other >> passes run by the CGSCC pass manager (including those run by the function >> pass manager run under CGSCC pass manager) might affect the computed BFI for >> the functions in the SCC. > > This sounds like you're essentially planning to extend the old pass manager > with function level analysis support in the SCC pass manager. I advise > against this given the fact that Chandler is actively working on the new one > and getting patches reviewed for the old one is "challenging" at best. If > you take this approach, make sure Chandler - who is pretty much the only > active reviewer in this area - is on-board with your approach before you > start. >>There is no plan to extend old pass manager for this capability. The proposed solution is intended to work with the new pass manager too, so there is no conflict here. The BFI book-keeping by inliner can be removed with pass manager change is in place. David>> >> When the new PM infrastructure and a CGSCC based BFI analysis is in place, >> the transition should be easy assuming it will provide getBFI(Function *) >> and invalidateBFI(Function *) interfaces. BFI for a function is computed at >> most twice in this approach. Thoughts? >> >> >> Thanks, >> Easwaran >> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > >
Philip Reames via llvm-dev
2015-Dec-11 00:51 UTC
[llvm-dev] [LLVMdev] Path forward on profile guided inlining?
On 12/10/2015 04:29 PM, Xinliang David Li wrote:> On Thu, Dec 10, 2015 at 4:00 PM, Philip Reames > <listmail at philipreames.com> wrote: >> Given I didn't get any response to my original query, I chose not to invest >> time in this at the time. I am unlikely to get time for this in the near >> future. >> >> On 12/07/2015 03:13 PM, Easwaran Raman wrote: >> >> (Resending after removing llvmdev at cs.uiuc.edu and using >> llvm-dev at lists.llvm.org) >> >> On Mon, Dec 7, 2015 at 3:08 PM, Easwaran Raman <eraman at google.com> wrote: >>> Hi Philip, >>> >>> Is there any update on this? I've been sending patches to get rid of the >>> callee hotness based inline hints from the frontend and move the logic to >>> the inliner. The next step is to use the callsite hotness instead. I also >>> want to focus on the infrastructure to enable this and what I've been >>> experimenting with is similar to your two alternative approaches: >>> >>>> Alternate Approaches: >>>> 1) We could just recompute BFI explicitly in the inliner right before >>>> passing the result to ICA for the purposes of prototyping. If this was off >>>> by default, this might be a reasonable scheme for investigation. This could >>>> likely never be enabled for real uses. >>>> 2) We could pre-compute BFI once per function within a given SCC and then >>>> try to keep it up to date during inlining. If we cached the call >>>> frequencies for the initial call sites, we could adjust the visit order to >>>> minimize the number of times we need to recompute a given functions block >>>> frequencies. (e.g. we can look at all the original call sites within a >>>> function before looking at newly inlined ones) >>>> >>> My proposal is very similar (perhaps identical) to your option 2 above. I >>> don't understand the part where you talk about adjusting the visit order to >>> minimize BFI computation. >>> >>> BFI computation: BFI for a function is computed on demand and cached. >> The problem here is that the old pass manager does not have a place/way to >> cache the function level analysis. You could do this within the inliner >> itself, but the function passes run by the SCC pass manager will invalidate >> everything. That's the problem I was trying to solve. > Easwaran's proposal does not rely on the analysis pass framework of > the legacy pass manager -- that is the key idea. BFI info is designed > in a way such that it can be computed on-demand. > > BFI for the caller will be incrementally updated during inlining, but > after inlining is done for a node, the BFI info will be explicitly > invalidated for that node due to the function passes run later by SCC > pass manager. This is expected.Just to make sure, you're aware this will cost at least one re-computation per iteration in CGPassManager::runOnModule right? If that's expected and acceptable, then yes, the approach you seem to be describing should be workable. Particularly as an intermediate step to parallelize work with the new pass manager, this sounds very much worthwhile.> >>> Update: When 'bar' gets inlined into 'foo', the BFI for 'foo' is updated. >>> Let OldBB in 'bar' gets cloned as NewBB in 'foo'. NewBB's block frequency >>> can be incrementally computed from OldBB's block frequency, entry block >>> frequency of 'bar' and the frequency of the block containing the 'foo' -> >>> 'bar' callsite. Even when the new CGSCC level BFI analysis is in place, this >>> incremental update is useful to minimize computation. >>> >>> Invalidation: Once inlining is completed in an SCC (at the end of >>> runOnSCC), the entries for functions in that SCC are invalidated since other >>> passes run by the CGSCC pass manager (including those run by the function >>> pass manager run under CGSCC pass manager) might affect the computed BFI for >>> the functions in the SCC. >> This sounds like you're essentially planning to extend the old pass manager >> with function level analysis support in the SCC pass manager. I advise >> against this given the fact that Chandler is actively working on the new one >> and getting patches reviewed for the old one is "challenging" at best. If >> you take this approach, make sure Chandler - who is pretty much the only >> active reviewer in this area - is on-board with your approach before you >> start. > There is no plan to extend old pass manager for this capability. The > proposed solution is intended to work with the new pass manager too, > so there is no conflict here. The BFI book-keeping by inliner can be > removed with pass manager change is in place.Thanks for the clarification.> > David > > >>> When the new PM infrastructure and a CGSCC based BFI analysis is in place, >>> the transition should be easy assuming it will provide getBFI(Function *) >>> and invalidateBFI(Function *) interfaces. BFI for a function is computed at >>> most twice in this approach. Thoughts? >>> >>> >>> Thanks, >>> Easwaran >>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>
Xinliang David Li via llvm-dev
2015-Dec-11 01:03 UTC
[llvm-dev] [LLVMdev] Path forward on profile guided inlining?
On Thu, Dec 10, 2015 at 4:51 PM, Philip Reames <listmail at philipreames.com> wrote:> > > On 12/10/2015 04:29 PM, Xinliang David Li wrote: >> >> On Thu, Dec 10, 2015 at 4:00 PM, Philip Reames >> <listmail at philipreames.com> wrote: >>> >>> Given I didn't get any response to my original query, I chose not to >>> invest >>> time in this at the time. I am unlikely to get time for this in the near >>> future. >>> >>> On 12/07/2015 03:13 PM, Easwaran Raman wrote: >>> >>> (Resending after removing llvmdev at cs.uiuc.edu and using >>> llvm-dev at lists.llvm.org) >>> >>> On Mon, Dec 7, 2015 at 3:08 PM, Easwaran Raman <eraman at google.com> wrote: >>>> >>>> Hi Philip, >>>> >>>> Is there any update on this? I've been sending patches to get rid of >>>> the >>>> callee hotness based inline hints from the frontend and move the logic >>>> to >>>> the inliner. The next step is to use the callsite hotness instead. I >>>> also >>>> want to focus on the infrastructure to enable this and what I've been >>>> experimenting with is similar to your two alternative approaches: >>>> >>>>> Alternate Approaches: >>>>> 1) We could just recompute BFI explicitly in the inliner right before >>>>> passing the result to ICA for the purposes of prototyping. If this was >>>>> off >>>>> by default, this might be a reasonable scheme for investigation. This >>>>> could >>>>> likely never be enabled for real uses. >>>>> 2) We could pre-compute BFI once per function within a given SCC and >>>>> then >>>>> try to keep it up to date during inlining. If we cached the call >>>>> frequencies for the initial call sites, we could adjust the visit order >>>>> to >>>>> minimize the number of times we need to recompute a given functions >>>>> block >>>>> frequencies. (e.g. we can look at all the original call sites within a >>>>> function before looking at newly inlined ones) >>>>> >>>> My proposal is very similar (perhaps identical) to your option 2 above. >>>> I >>>> don't understand the part where you talk about adjusting the visit order >>>> to >>>> minimize BFI computation. >>>> >>>> BFI computation: BFI for a function is computed on demand and cached. >>> >>> The problem here is that the old pass manager does not have a place/way >>> to >>> cache the function level analysis. You could do this within the inliner >>> itself, but the function passes run by the SCC pass manager will >>> invalidate >>> everything. That's the problem I was trying to solve. >> >> Easwaran's proposal does not rely on the analysis pass framework of >> the legacy pass manager -- that is the key idea. BFI info is designed >> in a way such that it can be computed on-demand. >> >> BFI for the caller will be incrementally updated during inlining, but >> after inlining is done for a node, the BFI info will be explicitly >> invalidated for that node due to the function passes run later by SCC >> pass manager. This is expected. > > Just to make sure, you're aware this will cost at least one re-computation > per iteration in CGPassManager::runOnModule right? If that's expected and > acceptable, then yes, the approach you seem to be describing should be > workable. >For each leaf node, the number of BFI computation is <= 1. For other nodes, it is <=2 (one as caller, one as callee) -- in other words, it is linear to the number of functions.> Particularly as an intermediate step to parallelize work with the new pass > manager, this sounds very much worthwhile. >>Yes, that is exactly the intention -- it can buy pass manager rework more time while unlock lots of pending performance tuning work we plan to do. thanks, David>> >>>> Update: When 'bar' gets inlined into 'foo', the BFI for 'foo' is >>>> updated. >>>> Let OldBB in 'bar' gets cloned as NewBB in 'foo'. NewBB's block >>>> frequency >>>> can be incrementally computed from OldBB's block frequency, entry block >>>> frequency of 'bar' and the frequency of the block containing the 'foo' >>>> -> >>>> 'bar' callsite. Even when the new CGSCC level BFI analysis is in place, >>>> this >>>> incremental update is useful to minimize computation. >>>> >>>> Invalidation: Once inlining is completed in an SCC (at the end of >>>> runOnSCC), the entries for functions in that SCC are invalidated since >>>> other >>>> passes run by the CGSCC pass manager (including those run by the >>>> function >>>> pass manager run under CGSCC pass manager) might affect the computed BFI >>>> for >>>> the functions in the SCC. >>> >>> This sounds like you're essentially planning to extend the old pass >>> manager >>> with function level analysis support in the SCC pass manager. I advise >>> against this given the fact that Chandler is actively working on the new >>> one >>> and getting patches reviewed for the old one is "challenging" at best. >>> If >>> you take this approach, make sure Chandler - who is pretty much the only >>> active reviewer in this area - is on-board with your approach before you >>> start. >> >> There is no plan to extend old pass manager for this capability. The >> proposed solution is intended to work with the new pass manager too, >> so there is no conflict here. The BFI book-keeping by inliner can be >> removed with pass manager change is in place. > > Thanks for the clarification. > >> >> David >> >> >>>> When the new PM infrastructure and a CGSCC based BFI analysis is in >>>> place, >>>> the transition should be easy assuming it will provide getBFI(Function >>>> *) >>>> and invalidateBFI(Function *) interfaces. BFI for a function is computed >>>> at >>>> most twice in this approach. Thoughts? >>>> >>>> >>>> Thanks, >>>> Easwaran >>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>> >