On Fri, Jul 31, 2015 at 1:15 PM, Diego Novillo <dnovillo at google.com>
wrote:
> Dehao and I have been discussing changes we need to make to SamplePGO to
> make it more effective.
>
> Currently, SamplePGO is a scalar pass that limits itself to adding branch
> weight annotations. It runs pretty early in the pipeline, so this is fine
> for other scalar passes that want to use profile data (block layout and
> regalloc).
>
> However, it does nothing to help module passes. Notably, the inliner. What
> Dehao has found in his experience with GCC is that in order to help the
> inliner, SamplePGO needs to become a module pass.
>
> Mainly, it needs to be able to affect inlining decisions. If a branch
> into a call site has many samples, we want to tell the inliner about it so
> it increases the inlining score for that call site.
>
> Additionally, SamplePGO may need to actually perform some inlining before
> the inliner runs. This is needed to better match the samples obtained from
> optimized binaries. For example, suppose the binary had 3 functions A(),
> B() and C() all calling function foo(). When the code is executed assume
> that A() has many samples (i.e., it's hot) while B() and C() have no
> samples.
>
> Also assume that foo() was originally inlined in A(), B() and C(). When
> SamplePGO is analyzing function A(), it will find samples for the inlined
> copy of foo().
>
> At that point, SamplePGO may want to perform the inline of foo() into
> A()'s call site so that it can better match the samples it gets from the
> profile. At the same time, since B() and C() had few or no samples, it
> wants to mark the respective call sites cold so the inliner doesn't
> bother with them.
>
> Chandler, is this something we can realistically do? I believe the first
> step would be to make SamplePGO a module pass, make sure it runs before the
> inliner and then we can see how we can implement the above behaviour, or
> some variant of it that provides the same benefit (e.g., cloning).
>
Context-sensitive profile matching is one of the most important features
for SamplePGO performance. If we don't make use of the information from
inline instance profiles, SamplePGO will not have a chance to match
instrumentation-based PGO performance, period. On the other hand, if the
information is used, SamplePGO has more advantages.
So the question is not whether this needs to be done, but whether a
ModulePass is the right way to do it (it looks like it is).
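
To make the A()/B()/C() example above concrete, here is a minimal sketch
of how per-call-site samples from inline instances could drive a hot/cold
hint. Everything in it is an assumption for illustration: InlineProfile,
CallSiteHint, classifyCallSite and the threshold value are hypothetical
names and numbers, not the SamplePGO implementation or any LLVM API.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical profile: sample counts keyed by (caller, inlined callee).
// A real profile would key on the full inline stack and source location.
using CallSiteKey = std::pair<std::string, std::string>;
using InlineProfile = std::map<CallSiteKey, std::uint64_t>;

enum class CallSiteHint { Hot, Cold };

// Assumed cutoff, chosen only for this illustration.
constexpr std::uint64_t kHotSampleThreshold = 100;

// Returns Hot when the inlined copy of `callee` inside `caller` collected
// at least kHotSampleThreshold samples. A call site with no entry in the
// profile is treated as having zero samples, and therefore as Cold.
CallSiteHint classifyCallSite(const InlineProfile &profile,
                              const std::string &caller,
                              const std::string &callee) {
  auto it = profile.find({caller, callee});
  const std::uint64_t samples = (it == profile.end()) ? 0 : it->second;
  return samples >= kHotSampleThreshold ? CallSiteHint::Hot
                                        : CallSiteHint::Cold;
}
```

With a profile that records many samples for foo() inlined into A() and
none for B() or C(), only A()'s call site comes back Hot. The point of
keying on the caller is that a single per-function count for foo() could
not tell these three call sites apart.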
>
> Something similar will be needed for devirtualization and indirect calls.
> Sampling exposes actual devirtualization and indirect call opportunities.
>
yes.
thanks,
David
>
>
> Thanks. Diego.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>