thr3ads.net - llvm dev - [llvm-dev] RFC: Hotness thresholds in profile header [Dec 2015]

If this information is useful, please help other people find it:
Share via:

Easwaran Raman via llvm-dev

2015-Dec-16 22:42 UTC

[llvm-dev] RFC: Hotness thresholds in profile header

Hi,

The problem we're trying to address:

PGO transforms that are based on block counts need some way to answer
isHotBlock() query. A simple comparison of the form block_count > N, for
some constant N is not useful since even for the same program different
profile inputs can cause different answers for isHotBlock() on the same
block. This can be addressed by comparing against a reference value that
depends on the program and its input. For instance, the indexed profile
header contains MaxFunctionCount and isHotBlock() could check if the block
count exceeds a certain fraction of MaxFunctionCount.


The drawback of this approach is that it is difficult to come up with a
single fraction that works both for programs with a few hot-spots and
programs with a long tail of block execution counts. Consider the following
two profiles both with a MaxFunctionCount of 1000. In the first case, there
are many blocks with count 100 and those (along with the function with
count 1000) account for 99% of execution time. In the second case, there
are many blocks with count 1000 that together account for 99% of execution
time, but there are also a non-trivial number of blocks with count 100. In
the first case, we would like the hotness threshold to be 100 (or 10% of
MaxFunctionCount), but in the second case, we want it to be 1000 (100% of
MaxFunctionCount).


Our proposal:

We want to define hot blocks as those that account for a large
percentage(P) of execution time of the program. We then make a simplistic
assumption that all blocks take the same amount of time to execute. With
this assumption, the execution time of the program can be approximated by
the sum of execution counts of the blocks in the program. We can then
compute a maximum threshold T(P) such that execution counts of blocks whose
count is >= T(P) account for P percentage of total block count.

To illustrate, let {B1...B5} be the set of blocks whose counts are {1, 0,
2, 2, 5}. For this example:

T(50) = 5 since {B5} accounts for 50% of total block count.
T(90) = 2 since {B3, B4, B5} has a cumulative count of 9 which is 90% of
total block count.
T(95) = 1 since all blocks with non-zero block counts are needed to account
for >=95% of total block count.

We propose to record a vector of triples {P, T(P), B(P)}, where B(P) is the
number of blocks with count >= T(P), for a set of pre-defined number of
values of P into the header of the indexed profile. A possible set of
values of P are 90, 95, 99, 99.9, 99.99, but this should be configurable
through profdata tool.  The B(P) above could be useful to optimization
passes in making size/performance tradeoffs. For example, if S is the
average block size and S*B(95) fits well within cache, but S*B(99) well
exceeds the cache size, an optimization like loop unroller could choose to
use T(95) as the threshold to avoid heavy cache misses.

The {P, T(P), B(P)} triples have to be attached to the module. A common API
would be added to retrieve the set of available Ps and {T(P), B(P)} pair
for a given P for all flavors (front-end, IR level and sample) of profiles.

Thoughts?


Thanks,

Easwaran (with inputs from Teresa Johnson and David Li)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151216/f5ebb922/attachment.html>

Andy Ayers via llvm-dev

2015-Dec-17 02:00 UTC

head link

[llvm-dev] RFC: Hotness thresholds in profile header

I’ve done similar rankings with profile data and found it worked out pretty
well.

In my case it was ranking function hotness but the idea was similar: sort by
weight, compute various percentile cutoff points, and use those to classify. I
put in some compensation for truly degenerate cases. Consider one super-hot
function that is 100x the weight of all remaining functions – I’d just call that
function hot and subtract it out of the distribution and so spread a bit of the
hotness around elsewhere.

Also, you should think about how to handle ties, so your sorting & ranking
is “stable” somehow.

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Easwaran
Raman via llvm-dev
Sent: Wednesday, December 16, 2015 2:43 PM
To: llvm-dev at lists.llvm.org
Cc: David Li <davidxl at google.com>
Subject: [llvm-dev] RFC: Hotness thresholds in profile header


Hi,

The problem we're trying to address:

PGO transforms that are based on block counts need some way to answer
isHotBlock() query. A simple comparison of the form block_count > N, for some
constant N is not useful since even for the same program different profile
inputs can cause different answers for isHotBlock() on the same block. This can
be addressed by comparing against a reference value that depends on the program
and its input. For instance, the indexed profile header contains
MaxFunctionCount and isHotBlock() could check if the block count exceeds a
certain fraction of MaxFunctionCount.



The drawback of this approach is that it is difficult to come up with a single
fraction that works both for programs with a few hot-spots and programs with a
long tail of block execution counts. Consider the following two profiles both
with a MaxFunctionCount of 1000. In the first case, there are many blocks with
count 100 and those (along with the function with count 1000) account for 99% of
execution time. In the second case, there are many blocks with count 1000 that
together account for 99% of execution time, but there are also a non-trivial
number of blocks with count 100. In the first case, we would like the hotness
threshold to be 100 (or 10% of MaxFunctionCount), but in the second case, we
want it to be 1000 (100% of MaxFunctionCount).

Our proposal:

We want to define hot blocks as those that account for a large percentage(P) of
execution time of the program. We then make a simplistic assumption that all
blocks take the same amount of time to execute. With this assumption, the
execution time of the program can be approximated by the sum of execution counts
of the blocks in the program. We can then compute a maximum threshold T(P) such
that execution counts of blocks whose count is >= T(P) account for P
percentage of total block count.

To illustrate, let {B1...B5} be the set of blocks whose counts are {1, 0, 2, 2,
5}. For this example:

T(50) = 5 since {B5} accounts for 50% of total block count.
T(90) = 2 since {B3, B4, B5} has a cumulative count of 9 which is 90% of total
block count.
T(95) = 1 since all blocks with non-zero block counts are needed to account for
>=95% of total block count.

We propose to record a vector of triples {P, T(P), B(P)}, where B(P) is the
number of blocks with count >= T(P), for a set of pre-defined number of
values of P into the header of the indexed profile. A possible set of values of
P are 90, 95, 99, 99.9, 99.99, but this should be configurable through profdata
tool.  The B(P) above could be useful to optimization passes in making
size/performance tradeoffs. For example, if S is the average block size and
S*B(95) fits well within cache, but S*B(99) well exceeds the cache size, an
optimization like loop unroller could choose to use T(95) as the threshold to
avoid heavy cache misses.

The {P, T(P), B(P)} triples have to be attached to the module. A common API
would be added to retrieve the set of available Ps and {T(P), B(P)} pair for a
given P for all flavors (front-end, IR level and sample) of profiles.

Thoughts?



Thanks,

Easwaran (with inputs from Teresa Johnson and David Li)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151217/c9a0eb3b/attachment.html>

Vedant Kumar via llvm-dev

2015-Dec-17 04:16 UTC

head link

[llvm-dev] RFC: Hotness thresholds in profile header

Hi Easwaran,

I like this idea. It seems nicer than picking a constant 'hot' threshold
block
count or fraction of MaxFunctionCount.

Some concerns:

    (1) Passes may need different T(P) to do useful work. You mentioned
    including a list of {P, T(P), B(P)} triples in the profdata file to deal
    with this, but such a list may not be exhaustive/sufficient.

    Have you thought about adding a `compute-T(P)` method to InstrProfReader?

    Would it be better to encode a cumulative block-count histogram in the
    header? It would take O(blocks) time to generate it when writing out the
    indexed file. After normalization it could be used to answer `compute-T(P)`
    queries very quickly. It could also makes it more convenient to generate
    profdata files, since you wouldn't have to specify the P you care about.
    
    (2) Some profdata files will not include T(P) information.
    
    Will passes degrade gracefully in the absence of this information?
    
    IMO we'd need an API like `getBlockHotness() -> {Hot, NotHot,
Unknown}`.

    (3) + 1 to Andy Ayers' concern about smoothing the data.

    What do you think of removing the top and bottom 1% of blocks from
    consideration?

thanks
vedant
> On Dec 16, 2015, at 6:00 PM, Andy Ayers via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> I’ve done similar rankings with profile data and found it worked out pretty
well.
>  
> In my case it was ranking function hotness but the idea was similar: sort
by weight, compute various percentile cutoff points, and use those to classify.
I put in some compensation for truly degenerate cases. Consider one super-hot
function that is 100x the weight of all remaining functions – I’d just call that
function hot and subtract it out of the distribution and so spread a bit of the
hotness around elsewhere.
>  
> Also, you should think about how to handle ties, so your sorting &
ranking is “stable” somehow.
>  
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
Easwaran Raman via llvm-dev
> Sent: Wednesday, December 16, 2015 2:43 PM
> To: llvm-dev at lists.llvm.org
> Cc: David Li <davidxl at google.com>
> Subject: [llvm-dev] RFC: Hotness thresholds in profile header
>  
> Hi,
> 
> The problem we're trying to address:
> 
> PGO transforms that are based on block counts need some way to answer
isHotBlock() query. A simple comparison of the form block_count > N, for some
constant N is not useful since even for the same program different profile
inputs can cause different answers for isHotBlock() on the same block. This can
be addressed by comparing against a reference value that depends on the program
and its input. For instance, the indexed profile header contains
MaxFunctionCount and isHotBlock() could check if the block count exceeds a
certain fraction of MaxFunctionCount.
>  
> The drawback of this approach is that it is difficult to come up with a
single fraction that works both for programs with a few hot-spots and programs
with a long tail of block execution counts. Consider the following two profiles
both with a MaxFunctionCount of 1000. In the first case, there are many blocks
with count 100 and those (along with the function with count 1000) account for
99% of execution time. In the second case, there are many blocks with count 1000
that together account for 99% of execution time, but there are also a
non-trivial number of blocks with count 100. In the first case, we would like
the hotness threshold to be 100 (or 10% of MaxFunctionCount), but in the second
case, we want it to be 1000 (100% of MaxFunctionCount).
> 
> Our proposal:
> 
> We want to define hot blocks as those that account for a large
percentage(P) of execution time of the program. We then make a simplistic
assumption that all blocks take the same amount of time to execute. With this
assumption, the execution time of the program can be approximated by the sum of
execution counts of the blocks in the program. We can then compute a maximum
threshold T(P) such that execution counts of blocks whose count is >= T(P)
account for P percentage of total block count.
> 
> To illustrate, let {B1...B5} be the set of blocks whose counts are {1, 0,
2, 2, 5}. For this example:
> 
> T(50) = 5 since {B5} accounts for 50% of total block count.
> T(90) = 2 since {B3, B4, B5} has a cumulative count of 9 which is 90% of
total block count.
> T(95) = 1 since all blocks with non-zero block counts are needed to account
for >=95% of total block count.
> 
> We propose to record a vector of triples {P, T(P), B(P)}, where B(P) is the
number of blocks with count >= T(P), for a set of pre-defined number of
values of P into the header of the indexed profile. A possible set of values of
P are 90, 95, 99, 99.9, 99.99, but this should be configurable through profdata
tool.  The B(P) above could be useful to optimization passes in making
size/performance tradeoffs. For example, if S is the average block size and
S*B(95) fits well within cache, but S*B(99) well exceeds the cache size, an
optimization like loop unroller could choose to use T(95) as the threshold to
avoid heavy cache misses.
> 
> The {P, T(P), B(P)} triples have to be attached to the module. A common API
would be added to retrieve the set of available Ps and {T(P), B(P)} pair for a
given P for all flavors (front-end, IR level and sample) of profiles.
> 
> Thoughts?
>  
> Thanks,
> Easwaran (with inputs from Teresa Johnson and David Li)
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Xinliang David Li via llvm-dev

2015-Dec-17 05:47 UTC

head link

[llvm-dev] RFC: Hotness thresholds in profile header

On Wed, Dec 16, 2015 at 6:00 PM, Andy Ayers <andya at microsoft.com>
wrote:> I’ve done similar rankings with profile data and found it worked out pretty
> well.
>
>
>
> In my case it was ranking function hotness but the idea was similar: sort
by
> weight, compute various percentile cutoff points, and use those to
classify.
> I put in some compensation for truly degenerate cases. Consider one
> super-hot function that is 100x the weight of all remaining functions – I’d
> just call that function hot and subtract it out of the distribution and so
> spread a bit of the hotness around elsewhere.
>
This is probably more important if we only consider function entry
count (to measure hotness). With Easwaran's proposal, the cut-off
computation is based on sum of BB counts (sorted in descending order)
-- the BB count sum approximate the execution time spent in those BBs.
  For instance, if 99.9% of execution time is spent in top 10 BBs, the
rest BBs are pretty much  irrelevant in terms of performance.

thanks,
David

>
>
> Also, you should think about how to handle ties, so your sorting &
ranking
> is “stable” somehow.
>
>
>
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> Easwaran Raman via llvm-dev
> Sent: Wednesday, December 16, 2015 2:43 PM
> To: llvm-dev at lists.llvm.org
> Cc: David Li <davidxl at google.com>
> Subject: [llvm-dev] RFC: Hotness thresholds in profile header
>
>
>
> Hi,
>
> The problem we're trying to address:
>
> PGO transforms that are based on block counts need some way to answer
> isHotBlock() query. A simple comparison of the form block_count > N, for
> some constant N is not useful since even for the same program different
> profile inputs can cause different answers for isHotBlock() on the same
> block. This can be addressed by comparing against a reference value that
> depends on the program and its input. For instance, the indexed profile
> header contains MaxFunctionCount and isHotBlock() could check if the block
> count exceeds a certain fraction of MaxFunctionCount.
>
>
>
> The drawback of this approach is that it is difficult to come up with a
> single fraction that works both for programs with a few hot-spots and
> programs with a long tail of block execution counts. Consider the following
> two profiles both with a MaxFunctionCount of 1000. In the first case, there
> are many blocks with count 100 and those (along with the function with
count
> 1000) account for 99% of execution time. In the second case, there are many
> blocks with count 1000 that together account for 99% of execution time, but
> there are also a non-trivial number of blocks with count 100. In the first
> case, we would like the hotness threshold to be 100 (or 10% of
> MaxFunctionCount), but in the second case, we want it to be 1000 (100% of
> MaxFunctionCount).
>
>
> Our proposal:
>
> We want to define hot blocks as those that account for a large
percentage(P)
> of execution time of the program. We then make a simplistic assumption that
> all blocks take the same amount of time to execute. With this assumption,
> the execution time of the program can be approximated by the sum of
> execution counts of the blocks in the program. We can then compute a
maximum
> threshold T(P) such that execution counts of blocks whose count is >=
T(P)
> account for P percentage of total block count.
>
> To illustrate, let {B1...B5} be the set of blocks whose counts are {1, 0,
2,
> 2, 5}. For this example:
>
> T(50) = 5 since {B5} accounts for 50% of total block count.
> T(90) = 2 since {B3, B4, B5} has a cumulative count of 9 which is 90% of
> total block count.
> T(95) = 1 since all blocks with non-zero block counts are needed to account
> for >=95% of total block count.
>
> We propose to record a vector of triples {P, T(P), B(P)}, where B(P) is the
> number of blocks with count >= T(P), for a set of pre-defined number of
> values of P into the header of the indexed profile. A possible set of
values
> of P are 90, 95, 99, 99.9, 99.99, but this should be configurable through
> profdata tool.  The B(P) above could be useful to optimization passes in
> making size/performance tradeoffs. For example, if S is the average block
> size and S*B(95) fits well within cache, but S*B(99) well exceeds the cache
> size, an optimization like loop unroller could choose to use T(95) as the
> threshold to avoid heavy cache misses.
>
> The {P, T(P), B(P)} triples have to be attached to the module. A common API
> would be added to retrieve the set of available Ps and {T(P), B(P)} pair
for
> a given P for all flavors (front-end, IR level and sample) of profiles.
>
> Thoughts?
>
>
>
> Thanks,
>
> Easwaran (with inputs from Teresa Johnson and David Li)

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - Dec 2015 - RFC: Hotness thresholds in profile header

[llvm-dev] RFC: Hotness thresholds in profile header

[llvm-dev] RFC: Hotness thresholds in profile header

[llvm-dev] RFC: Hotness thresholds in profile header

[llvm-dev] RFC: Hotness thresholds in profile header

Possibly Parallel Threads