Jonas Wagner via llvm-dev
2016-Dec-02 09:40 UTC
[llvm-dev] Computing block profile weights
Hello,

I'm working on an application that would benefit from knowing the weight of a basic block, as in "fraction of the program's execution time spent in this block".

Currently, I'm computing this using the block's frequency from BlockFrequencyInfo, relative to the function's entry block frequency, and scaled by the function's entry count. This is also the computation done by getBlockProfileCount <https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/BlockFrequencyInfoImpl.cpp#L540> in lib/Analysis/BlockFrequencyInfoImpl.cpp.

The problem is that this method can be extremely imprecise, because many functions have an entry count of zero. The entry count is computed from the number of profile samples in the entry block. Depending on the function's CFG, this count can be arbitrarily low even though the function is frequently called or hot.

Here's an idea to address this. I'd like to collect a bit of feedback from the community before trying it out.

1) Instead of relying on a function's *entry count*, store the *total number of samples* in a function. This number is readily available from the profile loader.
2) Compute a block's weight as `function_samples * block_weight / sum_of_block_weights_in_function` (a rough code sketch appears at the end of this message).

Why do I like this?

- The total number of samples in a function gives a better impression of the function's importance than the entry count does.
- This scheme "preserves mass" in that all samples of a function are taken into account. The samples in a BB are compared to samples in the entire function, rather than a few (arbitrarily) selected samples from the entry block.
- The computation avoids imprecision from multiplying by small numbers.

Disadvantages?

- BlockFrequencyInfo needs to keep track of the total frequency in a function.
- BlockFrequencyInfo would probably scale the frequencies w.r.t. that total, rather than the maximum frequency. This loses a few bits of precision.

Note that the entry count would not be lost in this scheme; one could easily compute it as `function_samples * entry_weight / sum_of_block_weights_in_function`.

I believe using an entire function as the unit of reference is a good compromise between precision and modularity. Precision is high because there is a sufficient number of samples available in a function. Modularity is preserved because the computation does not need to take other functions into account (in fact, BlockFrequencyInfo already processes one function at a time).

What do people think about this?
- Jonas
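Here is the promised sketch of the computation from step 2, as I imagine it could look alongside getBlockProfileCount. This is only an illustration, not a patch: `FunctionSamples` stands in for the per-function sample total that the profile loader would have to store (no such accessor exists today), and a real implementation would presumably cache the frequency sum inside BlockFrequencyInfo rather than recomputing it per query.

    #include "llvm/Analysis/BlockFrequencyInfo.h"
    #include "llvm/IR/Function.h"
    #include "llvm/Support/ScaledNumber.h"

    using namespace llvm;

    static uint64_t getBlockWeight(const BasicBlock &BB,
                                   const BlockFrequencyInfo &BFI,
                                   uint64_t FunctionSamples) {
      // Sum the frequencies of all blocks in the enclosing function.
      // (A real implementation would track this total incrementally.)
      uint64_t SumFreq = 0;
      for (const BasicBlock &B : *BB.getParent())
        SumFreq += BFI.getBlockFreq(&B).getFrequency();
      if (SumFreq == 0)
        return 0;

      // weight = function_samples * block_frequency / sum_of_block_frequencies.
      // ScaledNumber avoids overflow when multiplying two 64-bit counts,
      // mirroring what getBlockProfileCount already does today.
      ScaledNumber<uint64_t> W(FunctionSamples, 0);
      W *= ScaledNumber<uint64_t>(BFI.getBlockFreq(&BB).getFrequency(), 0);
      W /= ScaledNumber<uint64_t>(SumFreq, 0);
      return W.toInt<uint64_t>();
    }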
Jonas Wagner via llvm-dev
2016-Dec-02 09:43 UTC
[llvm-dev] Computing block profile weights
An addendum to fix a mistake in terminology:

> 2) Compute a block's weight as `function_samples * block_weight / sum_of_block_weights_in_function`

This should be `function_samples * block_frequency / sum_of_block_frequencies_in_function`.

Cheers,
Jonas
Jonas Wagner via llvm-dev
2016-Dec-06 14:19 UTC
[llvm-dev] Computing block profile weights
Hello,

I've implemented my idea; a patch is attached. My code computes block weights as follows:

    w = f[bb] / sum(f[b] for b in func) * sum(s[b] for b in func)

Where `f` is the frequency of a basic block (as computed by BlockFrequencyInfo), `func` is the function that contains `bb`, and `s` is the number of profiling samples in a block.

Previously, the computation was done as follows:

    w = f[bb] / f[entry] * s[entry]

Where `entry` is the entry block of the function containing `bb`. (A small worked example contrasting the two formulas appears at the end of this message.)

At first glance, block weights look more stable. There are fewer cases where weights are zero, because there is a sufficient number of samples in all functions of interest. There are also fewer cases where block weights are unrealistically high, because the weight of a block is now limited to the total number of samples in the function.

Questions to the community:

- What do you think about this method of computing block weights?
- There are some cases where I'm unsure how this method behaves, e.g., with inlining. Thoughts about this are welcome.
- (Since this is the first time I'm upstreaming a change to LLVM:) What would it take to get this into LLVM?

Best,
Jonas

On Fri, Dec 2, 2016 at 10:43 AM Jonas Wagner <jonas.wagner at epfl.ch> wrote:

> An addendum to fix a mistake in terminology:
>
> 2) Compute a block's weight as `function_samples * block_weight / sum_of_block_weights_in_function`
>
> This should be `function_samples * block_frequency / sum_of_block_frequencies_in_function`
>
> Cheers,
> Jonas

-------------- next part --------------
Name: 0001-Use-profile-samples-to-compute-block-weight.patch
Type: text/x-patch
Size: 37436 bytes
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161206/bc05776a/attachment.bin>
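To make the difference concrete, here is a tiny self-contained example contrasting the two formulas on made-up numbers (none of these values come from a real profile). It shows how a zero sample count in the entry block zeroes out every weight under the old scheme, while the new scheme still assigns the hot loop a weight close to the function's total:

    #include <cstdint>
    #include <cstdio>

    int main() {
      // Hypothetical profile for a function with three blocks:
      // entry -> hot loop body -> exit. All numbers are made up.
      uint64_t s[3] = {0, 1000, 2}; // samples: entry happened to get none
      uint64_t f[3] = {1, 500, 1};  // frequencies from BlockFrequencyInfo

      uint64_t sumSamples = s[0] + s[1] + s[2]; // 1002
      uint64_t sumFreq    = f[0] + f[1] + f[2]; // 502

      // Old: w = f[bb] / f[entry] * s[entry]. Since s[entry] == 0,
      // the loop body's weight is 0 no matter how hot it really is.
      uint64_t oldW = f[1] / f[0] * s[0]; // 500 * 0 = 0

      // New: w = f[bb] / sumFreq * sumSamples, i.e. the loop body
      // receives nearly all of the function's 1002 samples.
      uint64_t newW = sumSamples * f[1] / sumFreq; // 501000 / 502 = 998

      printf("old weight: %llu, new weight: %llu\n",
             (unsigned long long)oldW, (unsigned long long)newW);
      return 0;
    }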
Xinliang David Li via llvm-dev
2016-Dec-06 19:41 UTC
[llvm-dev] Computing block profile weights
Jonas,

I assume you are talking about Sample-Based PGO. Yes, the problem you mentioned exists -- and your proposed solution seems reasonable. +dehao for comments.

David

On Fri, Dec 2, 2016 at 1:40 AM, Jonas Wagner via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> [snip]
Jonas Wagner via llvm-dev
2016-Dec-07 09:18 UTC
[llvm-dev] Computing block profile weights
Hi,

> Jonas, I assume you are talking about Sample-Based PGO. Yes, the problem
> you mentioned exists -- and your proposed solution seems reasonable.
> +dehao for comments.

Yes, the problem is present in sample-based PGO, and so far this is the only case I've tested.

The patch contains a little bit of code to compute the total number of samples for instrumentation-based PGO as well, so it should not break that case. But I don't expect improvements for instrumentation-based PGO, because it has accurate function entry counts.

Best,
Jonas