We (Google) have started to look more closely at the profiling infrastructure in LLVM. Internally, we have a large dependency on PGO to get peak performance in generated code.

Some of the things we depend on are still not present in LLVM (e.g., profile use in the inliner), but we will still need to incorporate changes to support our work on these optimizations. Some of the changes may be addressed as individual bug fixes on the existing profiling infrastructure. Other changes may be better implemented as either new extensions or as replacements of existing code.

I think we will try to minimize infrastructure replacement, at least in the short/medium term. After all, it doesn't make much sense to replace infrastructure that is broken for code that doesn't exist yet.

David Li and I are preparing a document describing the major issues that we'd like to address. The document is a bit on the lengthy side, so it may be easier to start with an email discussion. This is a summary of the main changes we are looking at:

1. Need to faithfully represent the execution count taken from dynamic profiles. Currently, MD_prof does not really represent an execution count. This makes things like comparing hotness across functions hard or impossible. We need a concept of global hotness. (See the sketch after this message for the distinction.)

2. When the CFG or callgraph changes, there needs to be an API for incrementally updating/scaling counts: for instance, when a function is inlined or partially inlined, or when the CFG is modified. These counts need to be updated incrementally (or perhaps re-computed, as a first step in that direction).

3. The inliner (and other optimizations) needs to use profile information and update it accordingly. This is predicated on Chandler's work on the pass manager, of course.

4. Need to represent global profile summary data. For example, for global hotness determination, it is useful to compute additional global summary info, such as a histogram of counts that can be used to determine hotness and working set size estimates for a large percentage of the profiled execution.

There are other changes that we will need to incorporate. David, Teresa, Chandler, please add anything large that I missed.

My main question at the moment is what would be the best way of addressing them. Some seem to require new concepts to be implemented (e.g., execution counts). Others could be addressed as simple bugs to be fixed in the current framework.

Would it make sense to present everything in a unified document and discuss that? I've got some reservations about that approach, because we will end up discussing everything at once and it may not lead to concrete progress. Another approach would be to present each issue individually, either as patches, RFCs, or bugs.

I will be taking on the implementation of several of these issues. Some of them involve the SamplePGO harness that I added last year. I would also like to know what other bugs or problems people have in mind that I could roll into this work.

Thanks. Diego.
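For concreteness, a minimal sketch of what MD_prof encodes today. The IR form and the MDBuilder call below are the real interfaces; the annotateBranch helper and the 20/80 scenario are only for illustration:

    // Illustrative only. Today, MD_prof attaches *relative* branch
    // weights to a terminator:
    //
    //   br i1 %cmp, label %if.then, label %if.else, !prof !0
    //   !0 = !{!"branch_weights", i32 20, i32 80}
    //
    // which is typically emitted with MDBuilder:

    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/MDBuilder.h"

    using namespace llvm;

    void annotateBranch(BranchInst *BI) {
      MDBuilder MDB(BI->getContext());
      // 20/80 only says the false edge is four times more likely than
      // the true edge. It says nothing about how often the branch ran.
      BI->setMetadata(LLVMContext::MD_prof,
                      MDB.createBranchWeights(20, 80));
    }

    // The same 20/80 weights could come from a branch executed 100
    // times in one function and 100 million times in another. Without
    // an absolute count attached somewhere (e.g., a per-function entry
    // count), there is no way to compare hotness across functions.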
Xinliang David Li
2015-Feb-24 23:47 UTC
[LLVMdev] RFC - Improvements to PGO profile support
I don't mind whatever way we take to upstream -- all of these are generally good problems to solve. I expect discussion more on the approaches to tackle the problems than on the problems themselves.

David

On Tue, Feb 24, 2015 at 3:31 PM, Diego Novillo <dnovillo at google.com> wrote:
> [snip -- original proposal quoted in full above]
Diego Novillo <dnovillo at google.com> writes:
> David Li and I are preparing a document where we describe the major
> issues that we'd like to address. The document is a bit on the lengthy
> side, so it may be easier to start with an email discussion. [snip
> summary of the four main changes]

Great! Looking forward to hearing more.

> My main question at the moment is what would be the best way of
> addressing them. Some seem to require new concepts to be implemented
> (e.g., execution counts). Others could be addressed as simple bugs to
> be fixed in the current framework.
>
> Would it make sense to present everything in a unified document and
> discuss that? I've got some reservations about that approach because we
> will end up discussing everything at once and it may not lead to
> concrete progress. Another approach would be to present each issue
> individually either as patches or RFCs or bugs.

While a unified document is likely to lead to an unfocused discussion, talking about the problems piecemeal will make it harder to think about the big picture. I suspect that an overview of the issues and some brief notes on how you're thinking of approaching each will be useful. We can break out separate discussions from there on any points that are contentious or otherwise need to be discussed in further detail.
On 02/24/2015 03:31 PM, Diego Novillo wrote:
> David Li and I are preparing a document where we describe the major
> issues that we'd like to address. The document is a bit on the lengthy
> side, so it may be easier to start with an email discussion.

I would personally be interested in seeing a copy of that document, but it might be more appropriate for a blog post than a discussion on llvm-dev. I worry that we'd end up with a very unfocused discussion. It might be better to frame this as your plan of attack and reserve discussion on llvm-dev for things that are being proposed semi near term. Just my 2 cents.

> 1. Need to faithfully represent the execution count taken from dynamic
> profiles. Currently, MD_prof does not really represent an execution
> count. This makes things like comparing hotness across functions hard
> or impossible. We need a concept of global hotness.

What does MD_prof actually represent when used from Clang? I know I've been using it for execution counters in my frontend. Am I approaching that wrong?

As a side comment: I'm a bit leery of a consistent notion of hotness based on counters across functions. These counters are almost always approximate in practice, and counting problems run rampant. I'd almost rather see a consistent count inferred from data that's assumed to be questionable than make the frontend try to generate consistent profiling metadata. I think either approach could be made to work; we just need to think about it carefully.

> 2. When the CFG or callgraph change, there need to exist an API for
> incrementally updating/scaling counts. For instance, when a function is
> inlined or partially inlined, when the CFG is modified, etc.

Agreed. Do you have a sense of how much of an issue this is in practice? I haven't seen it kick in much, but it's also not something I've been looking for.

> 3. The inliner (and other optimizations) needs to use profile
> information and update it accordingly. This is predicated on Chandler's
> work on the pass manager, of course.

It's worth noting that the inliner work can be done independently of the pass manager work. We can always explicitly recompute relevant analyses in the inliner if needed. This will cost compile time, so we might need to make this an off-by-default option. (Maybe -O3 only?) Being able to work on the inliner independently of the pass management structure is valuable enough that we should probably consider doing this.

PGO inlining is an area I'm very interested in. I'd really encourage you to work incrementally in tree. I'm likely to start putting non-trivial amounts of time into this topic in the next few weeks; I just need to clear a few things off my plate first.

Other than the inliner, can you list the passes you think are profitable to teach about profiling data? My list so far is: PRE (particularly of loads!), the vectorizer (i.e., duplicate work down both a hot and a cold path when it can be vectorized on the hot path), LoopUnswitch, IRCE, and LoopUnroll (avoiding code size explosion in cold code). I'm much more interested in sources of improved performance than I am in simple code size reduction. (Reducing code size can improve performance, of course.) (A sketch of gating one such transform on profile data follows this message.)

> 4. Need to represent global profile summary data. For example, for
> global hotness determination, it is useful to compute additional global
> summary info, such as a histogram of counts that can be used to
> determine hotness and working set size estimates for a large percentage
> of the profiled execution.

Er, not clear what you're trying to say here?

> My main question at the moment is what would be the best way of
> addressing them. [...] Would it make sense to present everything in a
> unified document and discuss that?

See above.
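To make the LoopUnroll example concrete, here is a rough sketch of gating an unrolling decision on block frequency. This is not actual LoopUnroll code; isColdLoop and its threshold are hypothetical:

    #include "llvm/Analysis/BlockFrequencyInfo.h"
    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/IR/Function.h"
    #include <cstdint>

    using namespace llvm;

    // Hypothetical helper: a loop is "cold" if its header does not run
    // more often than its function's entry. The threshold is arbitrary
    // and purely illustrative.
    static bool isColdLoop(const Loop *L, const BlockFrequencyInfo &BFI) {
      const Function *F = L->getHeader()->getParent();
      uint64_t EntryFreq =
          BFI.getBlockFreq(&F->getEntryBlock()).getFrequency();
      uint64_t HeaderFreq =
          BFI.getBlockFreq(L->getHeader()).getFrequency();
      return HeaderFreq <= EntryFreq;
    }

    // An unroller could then bail out early on cold loops:
    //
    //   if (isColdLoop(L, BFI))
    //     return false;  // skip unrolling; keep cold code small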
On Wed, Feb 25, 2015 at 10:52 AM, Philip Reames <listmail at philipreames.com> wrote:
> Other than the inliner, can you list the passes you think are
> profitable to teach about profiling data? My list so far is: PRE
> (particularly of loads!), the vectorizer, LoopUnswitch, IRCE, &
> LoopUnroll (avoiding code size explosion in cold code).

Also, code layout (bb layout, function layout, function splitting).

>> Need to represent global profile summary data. For example, for global
>> hotness determination, it is useful to compute additional global
>> summary info, such as a histogram of counts that can be used to
>> determine hotness and working set size estimates for a large
>> percentage of the profiled execution.
>
> Er, not clear what you're trying to say here?

The idea is to get a sense of a good global profile count threshold to use given an application's profile, i.e., when determining whether a profile count is hot in the given profile. For example: what is the minimum profile count contributing to the hottest 99% of the application's profile? (A sketch of that computation follows this message.)

Teresa

--
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
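A self-contained sketch of that threshold computation (the function name and container choice are illustrative; a real implementation would run over the counts recorded in the profile data):

    #include <algorithm>
    #include <cstdint>
    #include <functional>
    #include <vector>

    // Given every block (or edge) count in a profile, return the
    // smallest count still contributing to the hottest `Percentile`
    // (e.g., 0.99) of total execution. Counts at or above the returned
    // value would be considered "hot" for this profile.
    uint64_t hotCountThreshold(std::vector<uint64_t> Counts,
                               double Percentile) {
      std::sort(Counts.begin(), Counts.end(), std::greater<uint64_t>());
      uint64_t Total = 0;
      for (uint64_t C : Counts)
        Total += C;
      uint64_t Target = static_cast<uint64_t>(Total * Percentile);
      uint64_t Accumulated = 0;
      for (uint64_t C : Counts) {
        Accumulated += C;
        if (Accumulated >= Target)
          return C;  // min count in the hottest Percentile
      }
      return 0;  // empty profile
    }

The number of entries visited before reaching the target also gives a rough working set size estimate for that percentile.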
Xinliang David Li
2015-Feb-25 20:40 UTC
[LLVMdev] RFC - Improvements to PGO profile support
On Wed, Feb 25, 2015 at 10:52 AM, Philip Reames <listmail at philipreames.com> wrote:
> As a side comment: I'm a bit leery of the notion of a consistent notion
> of hotness based on counters across functions. These counters are
> almost always approximate in practice and counting problems run
> rampant.

Having representative training runs is a prerequisite for using FDO/PGO.

> I'd almost rather see a consistent count inferred from data that's
> assumed to be questionable than make the frontend try to generate
> consistent profiling metadata.

The frontend does not generate profile data -- it is just a messenger that should pass the data faithfully to the middle end. That messenger (the profile reader) can live in the middle end too.

> I'm much more interested in sources of improved performance than I am
> simply code size reduction. (Reducing code size can improve performance
> of course.)

PGO is also very effective at code size reduction. In reality, a large percentage of functions are globally cold.

David
> On Feb 24, 2015, at 3:31 PM, Diego Novillo <dnovillo at google.com> wrote:
>
> 1. Need to faithfully represent the execution count taken from dynamic
> profiles. Currently, MD_prof does not really represent an execution
> count. This makes things like comparing hotness across functions hard
> or impossible. We need a concept of global hotness.

The plan that we have discussed in the past (I don't remember when) was to record simple function entry execution counts. Those could be combined with the BlockFrequencyInfo to compare "hotness" across functions. (A sketch of that combination follows this message.)

> 2. When the CFG or callgraph change, there need to exist an API for
> incrementally updating/scaling counts. For instance, when a function is
> inlined or partially inlined, when the CFG is modified, etc. These
> counts need to be updated incrementally (or perhaps re-computed as a
> first step into that direction).

One of the main reasons that we chose to use branch weights to represent profile information within functions is that it makes this problem easier. Of course, we still need to update the branch weights when transforming the CFG, but I believe most of that work has already been done. Are you suggesting that we should work on incremental BlockFrequencyInfo updates? We have discussed that in the past, but so far it has worked reasonably well to just re-run that analysis. (I wouldn't be surprised if we're missing some places where the analysis needs to be invalidated so that it gets re-run.)
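A sketch of combining a function entry count with BlockFrequencyInfo as described above. getBlockFreq is the real analysis API; estimateBlockCount and the source of the EntryCount parameter are assumptions for illustration:

    #include "llvm/Analysis/BlockFrequencyInfo.h"
    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/Function.h"
    #include <cstdint>

    using namespace llvm;

    // Estimate how many times BB executed globally, given the profiled
    // number of times its function was entered (EntryCount -- assumed
    // to come from the recorded entry execution counts).
    uint64_t estimateBlockCount(const BasicBlock *BB, uint64_t EntryCount,
                                const BlockFrequencyInfo &BFI) {
      const Function *F = BB->getParent();
      uint64_t EntryFreq =
          BFI.getBlockFreq(&F->getEntryBlock()).getFrequency();
      if (EntryFreq == 0)
        return 0;
      uint64_t BlockFreq = BFI.getBlockFreq(BB).getFrequency();
      // Relative frequency within the function, scaled by the absolute
      // number of entries; comparable across functions.
      return static_cast<uint64_t>((double)BlockFreq / EntryFreq *
                                   EntryCount);
    }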
Xinliang David Li
2015-Feb-26 21:40 UTC
[LLVMdev] RFC - Improvements to PGO profile support
On Thu, Feb 26, 2015 at 12:50 PM, Bob Wilson <bob.wilson at apple.com> wrote:
> The plan that we have discussed in the past (I don't remember when) was
> to record simple function entry execution counts. Those could be
> combined with the BlockFrequencyInfo to compare "hotness" across
> functions.

Yes -- there are two aspects to the problem: 1) the raw profile data representation in the IR, and 2) the profile count data represented for the CFG. What you said addresses 2), which is one of the possibilities.

There is a third issue that is also going to be covered in more detail: the block frequency propagation algorithm is limited (leading to information loss). When profile counts are available, block frequency data can be computed directly via simple normalization and scaling. This requires the raw edge count data to be represented truthfully in 1).

> One of the main reasons that we chose to use branch weights to
> represent profile information within functions is that it makes this
> problem easier. Of course, we still need to update the branch weights
> when transforming the CFG, but I believe most of that work has already
> been done. Are you suggesting that we should work on incremental
> BlockFrequencyInfo updates?

Diego is going to share the proposal in more detail. A brief summary to answer your question:

1) Making the raw profile data (in MD_prof) truthful does not change its original meaning -- the branch count is still a branch weight -- but it happens to also be an execution count, which carries more information.

2) At the CFG level (BranchProbabilityInfo), using real branch probabilities does not change the way branch weights are used either -- the branch probability is still a branch weight, just normalized to 0 <= prob <= 1. The benefits and reasons behind that will be provided.

The infrastructure is ready to update the raw MD_prof data during intra-procedural transformations when branch instructions are cloned, but not the CFG-level branch probability and block-frequency/block-count updates. This is especially important for inter-procedural transformations like the inliner (i.e., updating the profile data associated with an inline instance in the caller's context). Recomputing via frequency propagation is not only imprecise (as mentioned above), but also very compile-time consuming. (A sketch of this kind of count scaling follows this message.)

thanks,

David
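A minimal sketch of that inline-time count scaling, assuming blocks carry absolute counts; all names here are hypothetical rather than proposed API:

    #include <algorithm>
    #include <cstdint>
    #include <map>
    #include <string>

    // Per-block absolute execution counts for one function body.
    using BlockCounts = std::map<std::string, uint64_t>;

    // When a call site that executed CallSiteCount times is inlined,
    // the cloned callee blocks receive a proportional share of the
    // callee's counts, and the out-of-line callee keeps the remainder.
    void scaleCountsForInlining(BlockCounts &CalleeCounts,
                                BlockCounts &ClonedCounts,
                                uint64_t CalleeEntryCount,
                                uint64_t CallSiteCount) {
      if (CalleeEntryCount == 0)
        return;  // no profile data; nothing to scale
      for (auto &KV : CalleeCounts) {
        uint64_t Scaled = static_cast<uint64_t>(
            (double)KV.second * CallSiteCount / CalleeEntryCount);
        ClonedCounts[KV.first] = Scaled;           // inline instance
        KV.second -= std::min(KV.second, Scaled);  // remaining count
      }
    }

This is the incremental update the proposal asks for: no re-run of frequency propagation, just a rescale by CallSiteCount / CalleeEntryCount.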
Folks,

I've created a few bugzilla issues with details of some of the things I'll be looking into. I'm not yet done wordsmithing the overall design document. I'll try to finish it by early next week at the latest. In the meantime, these are the specific bugzilla issues I've opened:

- PR22716 - Need a mechanism to represent global profile counts in CFG and MachineCFG <http://llvm.org/bugs/show_bug.cgi?id=22716>
- PR22718 - Information loss and performance issues in branch probability representation <http://llvm.org/bugs/show_bug.cgi?id=22718>
- PR22719 - Improvements in raw branch profile data representation <http://llvm.org/bugs/show_bug.cgi?id=22719>

I'm hoping the descriptions in the bugs make sense on their own. If not, please use the bugs to beat me with a clue stick and I'll clarify.

Thanks. Diego.

On Tue, Feb 24, 2015 at 6:31 PM, Diego Novillo <dnovillo at google.com> wrote:
> [snip -- original proposal quoted in full above]
On Thu, Feb 26, 2015 at 6:54 PM, Diego Novillo <dnovillo at google.com> wrote:
> I've created a few bugzilla issues with details of some of the things
> I'll be looking into. I'm not yet done wordsmithing the overall design
> document. I'll try to finish it by early next week at the latest.

The document is available at
<https://docs.google.com/document/d/15VNiD-TmHqqao_8P-ArIsWj1KdtU-ElLFaYPmZdrDMI/edit?usp=sharing>

There are several topics covered. Ideally, I would prefer that we discuss each topic separately. The main ones I will start working on are the ones described in the bugzilla links we have in the doc.

This is just a starting point for us. I am not at all concerned with implementing exactly what is proposed in the document. In fact, if we can get the same value using the existing support, all the better. OTOH, any other ideas that folks may have that work better than this are more than welcome. I don't have really strong opinions on the matter. I am fine with whatever works.

Thanks. Diego.