Easwaran Raman via llvm-dev
2015-Dec-07 23:13 UTC
[llvm-dev] [LLVMdev] Path forward on profile guided inlining?
(Resending after removing llvmdev at cs.uiuc.edu and using llvm-dev at lists.llvm.org)

On Mon, Dec 7, 2015 at 3:08 PM, Easwaran Raman <eraman at google.com> wrote:
> Hi Philip,
>
> Is there any update on this? I've been sending patches to get rid of the
> callee-hotness-based inline hints from the frontend and move that logic to
> the inliner. The next step is to use the callsite hotness instead. I also
> want to focus on the infrastructure to enable this, and what I've been
> experimenting with is similar to your two alternative approaches:
>
>> Alternate Approaches:
>> 1) We could just recompute BFI explicitly in the inliner right before
>> passing the result to ICA for the purposes of prototyping. If this was
>> off by default, this might be a reasonable scheme for investigation. This
>> could likely never be enabled for real uses.
>> 2) We could pre-compute BFI once per function within a given SCC and then
>> try to keep it up to date during inlining. If we cached the call
>> frequencies for the initial call sites, we could adjust the visit order
>> to minimize the number of times we need to recompute a given function's
>> block frequencies. (e.g. we can look at all the original call sites
>> within a function before looking at newly inlined ones)
>
> My proposal is very similar (perhaps identical) to your option 2 above. I
> don't understand the part where you talk about adjusting the visit order
> to minimize BFI computation.
>
> BFI computation: BFI for a function is computed on demand and cached.
>
> Update: When 'bar' gets inlined into 'foo', the BFI for 'foo' is updated.
> Let OldBB in 'bar' be cloned as NewBB in 'foo'. NewBB's block frequency
> can be incrementally computed from OldBB's block frequency, the entry
> block frequency of 'bar', and the frequency of the block containing the
> 'foo' -> 'bar' callsite. Even when the new CGSCC-level BFI analysis is in
> place, this incremental update is useful to minimize computation.
>
> Invalidation: Once inlining is completed in an SCC (at the end of
> runOnSCC), the entries for functions in that SCC are invalidated, since
> other passes run by the CGSCC pass manager (including those run by the
> function pass manager under the CGSCC pass manager) might affect the
> computed BFI for the functions in the SCC.
>
> When the new PM infrastructure and a CGSCC-based BFI analysis are in
> place, the transition should be easy, assuming it will provide
> getBFI(Function *) and invalidateBFI(Function *) interfaces. BFI for a
> function is computed at most twice in this approach. Thoughts?
>
> Thanks,
> Easwaran
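The incremental update described above reduces to scaling the cloned block's frequency by the ratio of the callsite block's frequency to the callee's entry frequency. A minimal sketch of that arithmetic, in plain C++ for illustration; the helper name scaleClonedBlockFreq and the raw integer types are assumptions made here, while in-tree code would more likely work in terms of BlockFrequency / ScaledNumber:

#include <cstdint>

// Illustrative only: compute NewBB's frequency when OldBB from 'bar' is
// cloned into 'foo' as NewBB.
//   NewBBFreq = OldBBFreq * CallSiteFreq / CalleeEntryFreq
// i.e. OldBB's per-invocation frequency scaled by how often the callsite
// block in 'foo' executes.
uint64_t scaleClonedBlockFreq(uint64_t OldBBFreq,       // freq of OldBB in 'bar'
                              uint64_t CalleeEntryFreq, // entry block freq of 'bar'
                              uint64_t CallSiteFreq) {  // freq of the block in 'foo'
                                                        // containing the call to 'bar'
  if (CalleeEntryFreq == 0)
    return 0; // degenerate profile: nothing sensible to scale by
  // Divide first to reduce overflow risk; a real implementation would use a
  // fixed-point ScaledNumber rather than integer arithmetic.
  return (OldBBFreq / CalleeEntryFreq) * CallSiteFreq +
         (OldBBFreq % CalleeEntryFreq) * CallSiteFreq / CalleeEntryFreq;
}

For instance, if 'bar' has entry frequency 2 and OldBB has frequency 12 (OldBB runs 6 times per invocation of 'bar'), and the callsite block in 'foo' has frequency 5, NewBB gets frequency 30.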
Xinliang David Li via llvm-dev
2015-Dec-08 17:20 UTC
[llvm-dev] [LLVMdev] Path forward on profile guided inlining?
This makes great sense to me. I am looking forward to the patch -- this is
critical for PGO performance tuning.

thanks,

David
Philip Reames via llvm-dev
2015-Dec-11 00:00 UTC
[llvm-dev] [LLVMdev] Path forward on profile guided inlining?
Given that I didn't get any response to my original query, I chose not to
invest time in this at the time, and I am unlikely to get time for it in the
near future.

On 12/07/2015 03:13 PM, Easwaran Raman wrote:
> BFI computation: BFI for a function is computed on demand and cached.

The problem here is that the old pass manager does not have a place/way to
cache the function-level analysis. You could do this within the inliner
itself, but the function passes run by the SCC pass manager will invalidate
everything. That's the problem I was trying to solve.

> Invalidation: Once inlining is completed in an SCC (at the end of
> runOnSCC), the entries for functions in that SCC are invalidated, since
> other passes run by the CGSCC pass manager (including those run by the
> function pass manager under the CGSCC pass manager) might affect the
> computed BFI for the functions in the SCC.

This sounds like you're essentially planning to extend the old pass manager
with function-level analysis support in the SCC pass manager. I advise
against this, given that Chandler is actively working on the new one and
getting patches reviewed for the old one is "challenging" at best. If you
take this approach, make sure Chandler - who is pretty much the only active
reviewer in this area - is on board with your approach before you start.
Xinliang David Li via llvm-dev
2015-Dec-11 00:29 UTC
[llvm-dev] [LLVMdev] Path forward on profile guided inlining?
On Thu, Dec 10, 2015 at 4:00 PM, Philip Reames <listmail at philipreames.com> wrote:
> The problem here is that the old pass manager does not have a place/way to
> cache the function-level analysis. You could do this within the inliner
> itself, but the function passes run by the SCC pass manager will invalidate
> everything. That's the problem I was trying to solve.

Easwaran's proposal does not rely on the analysis pass framework of the
legacy pass manager -- that is the key idea. BFI is designed so that it can
be computed on demand. BFI for the caller will be incrementally updated
during inlining, but after inlining is done for a node, the BFI info will be
explicitly invalidated for that node because of the function passes run
later by the SCC pass manager. This is expected.

> This sounds like you're essentially planning to extend the old pass manager
> with function-level analysis support in the SCC pass manager. I advise
> against this, given that Chandler is actively working on the new one and
> getting patches reviewed for the old one is "challenging" at best.

There is no plan to extend the old pass manager for this capability. The
proposed solution is intended to work with the new pass manager too, so
there is no conflict here. The BFI book-keeping in the inliner can be
removed once the pass manager change is in place.

David
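To make that lifecycle concrete, here is a skeletal sketch of where the on-demand computation and the end-of-SCC invalidation could sit in a legacy runOnSCC. The BFA member (a cache along the lines of the getBFI/invalidateBFI interfaces discussed in this thread) and the elided inlining loop are assumptions for illustration, not the actual inliner code:

bool Inliner::runOnSCC(CallGraphSCC &SCC) {
  bool Changed = false;
  for (CallGraphNode *Node : SCC) {
    Function *Caller = Node->getFunction();
    if (!Caller || Caller->isDeclaration())
      continue;
    // Computed on demand and cached the first time this caller is visited.
    BlockFrequencyInfo *CallerBFI = BFA.getBFI(Caller);
    // ... inline call sites in Caller, incrementally updating CallerBFI as
    // callee blocks are cloned in (setting Changed when something is
    // inlined) ...
  }
  // Inlining in this SCC is done; drop the cached entries, since the
  // function passes the CGSCC pass manager runs next may change these
  // functions and make the cached frequencies stale.
  for (CallGraphNode *Node : SCC)
    if (Function *F = Node->getFunction())
      BFA.invalidateBFI(F);
  return Changed;
}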
Easwaran Raman via llvm-dev
2015-Dec-11 01:03 UTC
[llvm-dev] [LLVMdev] Path forward on profile guided inlining?
On Thu, Dec 10, 2015 at 4:00 PM, Philip Reames <listmail at philipreames.com> wrote:
> This sounds like you're essentially planning to extend the old pass manager
> with function-level analysis support in the SCC pass manager.

Clearly I haven't done a good job explaining my proposal. I don't propose
extending the pass manager to provide function-level analysis. I propose to
do something like this:

/// \brief Get BlockFrequencyInfo for a function.
BlockFrequencyInfo *
BlockFrequencyAnalysis::getBlockFrequencyInfo(Function *F) {
  // BFM is a map from Function * to BlockFrequencyInfo * and caches
  // BlockFrequencyInfo *.
  auto Iter = BFM.find(F);
  if (Iter != BFM.end())
    return Iter->second;

  // We need to create a BlockFrequencyInfo object for F and store it.
  //
  // DominatorTree, LoopInfo and BranchProbabilityInfo can be thrown
  // away after BFI is computed.

  // First create a dominator tree.
  DominatorTree DT;
  DT.recalculate(*F);

  // Then get a LoopInfo object.
  LoopInfo LI;
  LI.analyze(DT);

  // Then compute BranchProbabilityInfo.
  BranchProbabilityInfo BPI;
  BPI.calculate(*F, LI);

  // Finally compute BlockFrequencyInfo.
  BlockFrequencyInfo *BFI = new BlockFrequencyInfo;
  BFI->calculate(*F, BPI, LI);
  BFM[F] = BFI;
  return BFI;
}

This recomputes all the analyses needed to build BFI and doesn't depend on
the analysis framework. It likely means more computation (there is no
benefit when a dependent analysis happens to be preserved), and the result
is used only by the inliner rather than shared with other passes as a true
analysis would be. Invalidation needs to happen only once per function,
after it ceases to be a caller during inlining. Within the inliner itself,
the BFI data is incrementally updated.

Thanks,
Easwaran
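The snippet above shows only the lookup half of the cache; the invalidation mentioned in the last paragraph is the natural companion. A hedged sketch, assuming the same BFM map of raw BlockFrequencyInfo pointers as above (the method name is illustrative, not taken from an actual patch):

/// \brief Drop the cached BFI for a function. Per the proposal, this is
/// called once per function after it ceases to be a caller (at the end of
/// runOnSCC), because the function passes run afterwards may change its
/// CFG and make the cached frequencies stale.
void BlockFrequencyAnalysis::invalidateBlockFrequencyInfo(Function *F) {
  auto Iter = BFM.find(F);
  if (Iter == BFM.end())
    return;
  delete Iter->second; // BFM holds raw BlockFrequencyInfo pointers (see above)
  BFM.erase(Iter);
}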