thr3ads.net - llvm dev - [LLVMdev] RFC - Profile Guided Optimization in LLVM [Jun 2013]

If this information is useful, please help other people find it:
Share via:

Diego Novillo

2013-Jun-12 21:23 UTC

[LLVMdev] RFC - Profile Guided Optimization in LLVM

I have started looking at the state of PGO (Profile Guided Optimization) 
in LLVM.**I want to discuss my high-level plan and make sure I'm not 
missing anything interesting out.  I appreciate any feedback on this, 
pointers to existing work, patches and anything related to PGO in LLVM.

I will be keeping changes to this plan in this web document

*https://docs.google.com/document/d/1b2XFuOkR2K-Oao4u5fR3a9Ok83IB_W4EJWVmNak4GRE/pub

At a high-level, I would like the PGO harness to contain the following 
modules:


Profile generators

These modules represent sources of profile.  Mostly, they work by 
instrumenting the user program to make it produce profile information. 
  However, other sources of profile information (e.g., samples, hardware 
counters, static predictors) would be supported.


Profile Analysis Oracles

Profile information is loaded into the compiler and translated into 
analysis data which the optimizers can use.  These oracles become the 
one and only source of profile information used by transformations. 
  Direct access to the raw profile data generated externally is not allowed.


Translation from profile information into analysis can be done by adding 
IR metadata or altering compiler internal data structures directly.  I 
prefer IR metadata because it simplifies debugging, unit testing and bug 
reproduction.


Analyses should be narrow in the specific type of information they 
provide (e.g., branch probability) and there should not be two different 
analyses that provide overlapping information.  We could later provide 
broader analyses types by aggregating the existing ones.


Transformations

Transformations should naturally take advantage of profile information 
by consulting the analyses.  The better information they get from the 
analysis oracles, the better their decisions.


My plan is to start by making sure that the infrastructure exists and 
provides the basic analyses.

I have two primary goals in this first phase:


 1.

    Augment the PGO infrastructure where required.

 2.

    Fix existing transformations that are not taking advantage of
    profile data.



In evaluating and triaging the existing infrastructure, I will use test 
cases taken from GCC's own testsuite, a collection of Google's internal 
applications and any other code base folks consider useful.


In using GCC's testsuite, my goal is not to mimic how GCC does its work, 
but make sure that the two compilers implement functionally equivalent 
transformations.  That is, make sure that LLVM is not leaving 
optimization opportunities behind.


This may require implementing missing profile functionality. From a 
brief inspection of the code, most of the major ones seem to be there 
(edge, path, block).  But I don't know what state they are in.


Some of the properties I would like to maintain or add to the current 
framework:


  *

    Profile data is never accessed directly by analyses and
    transformations.  Rather, it is translated into IR metadata.

  *

    Graceful degradation in the presence of stale profiles.  Old profile
    data should only result in degraded optimization opportunities.  It
    should neither confuse the compiler nor cause erroneous code generation.


After the basic profile-based transformations are working, I would like 
to add new sources of profile.  Mainly, I am thinking of implementing 
Auto FDO <http://gcc.gnu.org/wiki/AutoFDO>. FDO stands for Feedback 
Directed Optimization (both PGO and FDO tend to be used interchangeably 
in the GCC community).  In this scheme, the compiler does not instrument 
the code.  Rather, it uses an external sample collection tool (e.g., 
perf <https://perf.wiki.kernel.org/index.php/Main_Page>) to collect 
samples from the program's execution.  These samples are then converted 
to the format that the instrumented program would've emitted.


In terms of optimizations, our (Google) experience is that inlining is 
the key beneficiary of profile information. Particularly, in big C++ 
applications. I expect to focus most of my attention on the inliner.



*Thanks.  Diego.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130612/0ec31fdd/attachment.html>

Diego Novillo

2013-Jun-12 21:32 UTC

head link

[LLVMdev] RFC - Profile Guided Optimization in LLVM

On 2013-06-12 17:23 , Diego Novillo wrote:>
> I have started looking at the state of PGO (Profile Guided 
> Optimization) in LLVM.**I want to discuss my high-level plan and make 
> sure I'm not missing anything interesting out.  I appreciate any 
> feedback on this, pointers to existing work, patches and anything 
> related to PGO in LLVM.
Good grief.  A whole lot of fail in my cut-n-paste job. 
Apologies.>
> I will be keeping changes to this plan in this web document
>
>
*https://docs.google.com/document/d/1b2XFuOkR2K-Oao4u5fR3a9Ok83IB_W4EJWVmNak4GRE/pub
> *
You can read from the above or this:

At a high-level, I would like the PGO harness to contain the following 
modules:

Profile generators

These modules represent sources of profile.  Mostly, they work by 
instrumenting the user program to make it produce profile information.  
However, other sources of profile information (e.g., samples, hardware 
counters, static predictors) would be supported.

Profile Analysis Oracles

Profile information is loaded into the compiler and translated into 
analysis data which the optimizers can use.  These oracles become the 
one and only source of profile information used by transformations.  
Direct access to the raw profile data generated externally is not allowed.

Translation from profile information into analysis can be done by adding 
IR metadata or altering compiler internal data structures directly.  I 
prefer IR metadata because it simplifies debugging, unit testing and bug 
reproduction.

Analyses should be narrow in the specific type of information they 
provide (e.g., branch probability) and there should not be two different 
analyses that provide overlapping information.  We could later provide 
broader analyses types by aggregating the existing ones.

Transformations

Transformations should naturally take advantage of profile information 
by consulting the analyses.  The better information they get from the 
analysis oracles, the better their decisions.

My plan is to start by making sure that the infrastructure exists and 
provides the basic analyses.
I have two primary goals in this first phase:

1- Augment the PGO infrastructure where required.
2- Fix existing transformations that are not taking advantage of profile 
data.

In evaluating and triaging the existing infrastructure, I will use test 
cases taken from GCC's own testsuite, a collection of Google's internal 
applications and any other code base folks consider useful.

In using GCC's testsuite, my goal is not to mimic how GCC does its work, 
but make sure that the two compilers implement functionally equivalent 
transformations.  That is, make sure that LLVM is not leaving 
optimization opportunities behind.

This may require implementing missing profile functionality. From a 
brief inspection of the code, most of the major ones seem to be there 
(edge, path, block).  But I don't know what state they are in.

Some of the properties I would like to maintain or add to the current 
framework:

* Profile data is never accessed directly by analyses and 
transformations.  Rather, it is translated into IR metadata.

* Graceful degradation in the presence of stale profiles.  Old profile 
data should only result in degraded optimization opportunities.  It 
should neither confuse the compiler nor cause erroneous code generation.

After the basic profile-based transformations are working, I would like 
to add new sources of profile.  Mainly, I am thinking of implementing 
Auto FDO. FDO stands for Feedback Directed Optimization (both PGO and 
FDO tend to be used interchangeably in the GCC community).  In this 
scheme, the compiler does not instrument the code.  Rather, it uses an 
external sample collection tool (e.g., perf) to collect samples from the 
program's execution.  These samples are then converted to the format 
that the instrumented program would've emitted.

In terms of optimizations, our (Google) experience is that inlining is 
the key beneficiary of profile information. Particularly, in big C++ 
applications. I expect to focus most of my attention on the inliner.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130612/d6b185af/attachment.html>

Jakob Stoklund Olesen

2013-Jun-12 21:55 UTC

head link

[LLVMdev] RFC - Profile Guided Optimization in LLVM

On Jun 12, 2013, at 2:23 PM, Diego Novillo <dnovillo at google.com> wrote:
> In terms of optimizations, our (Google) experience is that
> inlining is the key beneficiary of profile information.
> Particularly, in big C++ applications. I expect to focus most
> of my attention on the inliner.
That sounds plausible to me. It seems like we might need a way of representing
call graph profiling in addition to the existing branch probabilities?

FWIW, the greedy register allocator’s live range splitting algorithm is designed
to consume profile information so it can push spill code into cold blocks. The
primary interface is SpillPlacement::getBlockFrequency() which currently returns
an estimate based on loop depth only.

/jakob

Chandler Carruth

2013-Jun-12 22:05 UTC

head link

[LLVMdev] RFC - Profile Guided Optimization in LLVM

On Wed, Jun 12, 2013 at 2:55 PM, Jakob Stoklund Olesen <stoklund at
2pi.dk>wrote:
> That sounds plausible to me. It seems like we might need a way of
> representing call graph profiling in addition to the existing branch
> probabilities?
>
Agreed. An important consideration here is WPO vs. LTO vs. TU-at-a-time
call graphs.

> FWIW, the greedy register allocator’s live range splitting algorithm is
> designed to consume profile information so it can push spill code into cold
> blocks. The primary interface is SpillPlacement::getBlockFrequency() which
> currently returns an estimate based on loop depth only.
>
It doesn't use MachineBlockFrequency? If it does, it will get a lot more
than loop depth: __builtin_expect, cold function attribute, and static
branch heuristics. If it doesn't it should, and then it will immediately
benefit from this.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130612/a851059d/attachment.html>

Xinliang David Li

2013-Jun-12 23:26 UTC

head link

[LLVMdev] RFC - Profile Guided Optimization in LLVM

>
> After the basic profile-based transformations are working, I would like to
> add new sources of profile.  Mainly, I am thinking of implementing Auto
> FDO.
>
For those who are not familiar with what autoFDO is -- Auto FDO is
originally called Sample Based FDO. Its main author is Dehao Chen @google,
and Robert Hundt is the one of the main pushers of technology in Google.
The latest incarnation of this technology uses LBR events available on
Nehalem and above.
http://www.computer.org/csdl/trans/tc/2013/02/ttc2013020376-abs.html

Cheers,

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130612/bef348f4/attachment.html>

Chad Rosier

2013-Jun-14 16:57 UTC

head link

[LLVMdev] RFC - Profile Guided Optimization in LLVM

Bob and I were discussing this over lunch yesterday and he has some very good
ideas, so I've CC'ed him to make sure he joins the thread.

 Chad

On Jun 12, 2013, at 2:32 PM, Diego Novillo <dnovillo at google.com> wrote:
> On 2013-06-12 17:23 , Diego Novillo wrote:
>> 
>> 
>>         I have started looking at the state of PGO (Profile Guided
>>         Optimization) in LLVM.  I want to discuss my high-level plan
and make sure I'm not missing anything interesting out.  I appreciate any
feedback on this, pointers to existing work, patches and anything related to PGO
in LLVM.
> 
> Good grief.  A whole lot of fail in my cut-n-paste job.  Apologies.  
>> 
>> I will be keeping changes to this plan in this web document
>> 
>>
https://docs.google.com/document/d/1b2XFuOkR2K-Oao4u5fR3a9Ok83IB_W4EJWVmNak4GRE/pub
> 
> You can read from the above or this:
> 
> At a high-level, I would like the PGO harness to contain the following
modules:
> 
> Profile generators
> 
> These modules represent sources of profile.  Mostly, they work by
instrumenting the user program to make it produce profile information.  However,
other sources of profile information (e.g., samples, hardware counters, static
predictors) would be supported.
> 
> 
> Profile Analysis Oracles
> 
> Profile information is loaded into the compiler and translated into
analysis data which the optimizers can use.  These oracles become the one and
only source of profile information used by transformations.  Direct access to
the raw profile data generated externally is not allowed.
> 
> Translation from profile information into analysis can be done by adding IR
metadata or altering compiler internal data structures directly.  I prefer IR
metadata because it simplifies debugging, unit testing and bug reproduction.
> 
> Analyses should be narrow in the specific type of information they provide
(e.g., branch probability) and there should not be two different analyses that
provide overlapping information.  We could later provide broader analyses types
by aggregating the existing ones.
> 
> 
> 
> Transformations
> 
> Transformations should naturally take advantage of profile information by
consulting the analyses.  The better information they get from the analysis
oracles, the better their decisions.
> 
> 
> My plan is to start by making sure that the infrastructure exists and
provides the basic analyses.
> I have two primary goals in this first phase:
> 
> 1- Augment the PGO infrastructure where required.
> 2- Fix existing transformations that are not taking advantage of profile
data.
> 
> 
> In evaluating and triaging the existing infrastructure, I will use test
cases taken from GCC’s own testsuite, a collection of Google’s internal
applications and any other code base folks consider useful.
> 
> In using GCC’s testsuite, my goal is not to mimic how GCC does its work,
but make sure that the two compilers implement functionally equivalent
transformations.  That is, make sure that LLVM is not leaving optimization
opportunities behind.
> 
> This may require implementing missing profile functionality. From a brief
inspection of the code, most of the major ones seem to be there (edge, path,
block).  But I don’t know what state they are in.
> 
> Some of the properties I would like to maintain or add to the current
framework:
> 
> * Profile data is never accessed directly by analyses and transformations. 
Rather, it is translated into IR metadata.
> 
> * Graceful degradation in the presence of stale profiles.  Old profile data
should only result in degraded optimization opportunities.  It should neither
confuse the compiler nor cause erroneous code generation.
> 
> After the basic profile-based transformations are working, I would like to
add new sources of profile.  Mainly, I am thinking of implementing Auto FDO. FDO
stands for Feedback Directed Optimization (both PGO and FDO tend to be used
interchangeably in the GCC community).  In this scheme, the compiler does not
instrument the code.  Rather, it uses an external sample collection tool (e.g.,
perf) to collect samples from the program’s execution.  These samples are then
converted to the format that the instrumented program would’ve emitted.
> 
> In terms of optimizations, our (Google) experience is that inlining is the
key beneficiary of profile information. Particularly, in big C++ applications. I
expect to focus most of my attention on the inliner.
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130614/e01929f9/attachment.html>

Krzysztof Parzyszek

2013-Jun-14 17:56 UTC

head link

[LLVMdev] RFC - Profile Guided Optimization in LLVM

On 6/12/2013 4:55 PM, Jakob Stoklund Olesen wrote:>
> It seems like we might need a way of representing call graph profiling in
addition to the existing branch probabilities?
>
Metadata on function definitions/declarations perhaps?

I agree---the call graph profiling is of critical importance.  I guess 
we should first see what types of data we already know may be useful and 
see how we would represent them within the compiler.

Here's my list (in addition to the usual data gathered in such cases):

* Function call frequency:
   - How many times a function has been called?

* Function argument value profiling:
   - Is the function called with a small set of values for the parameters?
   - Is there any value that is particularly frequent for any of the 
arguments?
   - If the function takes pointers, are the pointers usually aligned?
   - If the function takes pointers, are they aliased?
   - etc.

* Branch profiling:
   - Is the branch executed (i.e. taken/not-taken) in the same way as 
the last time, or does it often change direction?

* Path profiling:
   - Correlation between branches, function calls, etc.


-Krzysztof


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

Maybe Matching Threads

Search for more maybe matching threads

llvm dev - Jun 2013 - [LLVMdev] RFC - Profile Guided Optimization in LLVM

[LLVMdev] RFC - Profile Guided Optimization in LLVM

[LLVMdev] RFC - Profile Guided Optimization in LLVM

[LLVMdev] RFC - Profile Guided Optimization in LLVM

[LLVMdev] RFC - Profile Guided Optimization in LLVM

[LLVMdev] RFC - Profile Guided Optimization in LLVM

[LLVMdev] RFC - Profile Guided Optimization in LLVM

[LLVMdev] RFC - Profile Guided Optimization in LLVM

Maybe Matching Threads