thr3ads.net - llvm dev - [LLVMdev] RFC - Improvements to PGO profile support [Mar 2015]

If this information is useful, please help other people find it:
Share via:

Bob Wilson

2015-Mar-24 17:54 UTC

[LLVMdev] RFC - Improvements to PGO profile support

> On Mar 24, 2015, at 10:53 AM, Diego Novillo <dnovillo at google.com>
wrote:
> 
> On Tue, Mar 24, 2015 at 1:48 PM, Bob Wilson <bob.wilson at apple.com>
wrote:
>> 
>>> On Mar 24, 2015, at 10:27 AM, Xinliang David Li <davidxl at
google.com> wrote:
>>> 
>>>  2.3) remove the 'laplace rule of succession' which can be
very harmful
>> 
>> I have not seen any consensus on this. I’m not at all convinced this
would be a good idea. From what I’ve seen, using LaPlace’s rule avoids really
bad behavior in cases where the counts are very small and has almost no impact
when the counts are large.
> 
> Duncan and I chatted briefly about this on IRC. I'm going to
> experiment with only applying laplace smoothing if any scaled weights
> are 0. AFAIU, usage of this rule is really just trying to avoid having
> downstream users deal with 0 weights. Duncan, please whack me with a
> clue stick if I totally misrepresented our conclusions.
Zero is the extreme case, but it’s also important for other small counts. I’d
like to see some specific examples of why you think the current behavior is
harmful.

Xinliang David Li

2015-Mar-24 19:08 UTC

head link

[LLVMdev] RFC - Improvements to PGO profile support

On Tue, Mar 24, 2015 at 10:54 AM, Bob Wilson <bob.wilson at apple.com>
wrote:>
>> On Mar 24, 2015, at 10:53 AM, Diego Novillo <dnovillo at
google.com> wrote:
>>
>> On Tue, Mar 24, 2015 at 1:48 PM, Bob Wilson <bob.wilson at
apple.com> wrote:
>>>
>>>> On Mar 24, 2015, at 10:27 AM, Xinliang David Li <davidxl at
google.com> wrote:
>>>>
>>>>  2.3) remove the 'laplace rule of succession' which can
be very harmful
>>>
>>> I have not seen any consensus on this. I’m not at all convinced
this would be a good idea. From what I’ve seen, using LaPlace’s rule avoids
really bad behavior in cases where the counts are very small and has almost no
impact when the counts are large.
>>
>> Duncan and I chatted briefly about this on IRC. I'm going to
>> experiment with only applying laplace smoothing if any scaled weights
>> are 0. AFAIU, usage of this rule is really just trying to avoid having
>> downstream users deal with 0 weights. Duncan, please whack me with a
>> clue stick if I totally misrepresented our conclusions.
>
> Zero is the extreme case, but it’s also important for other small counts.
I’d like to see some specific examples of why you think the current behavior is
harmful.
Using the rule has at least the following bad side effects:
1) wrong estimation of average loop trip count -- possibly leading to
wrong unroll decisions
2) result in bad inlining decisions. For instance:
    for (...)
        bar();  // (1)

 where (1) is the only callsite to bar().   Using the rule, BB count
enclosing the call to bar() can be as low as half of the entry count
of bar().  Inliner will get confused and think there are more hot
callsites to 'bar' and  make suboptimal decisions ..

Also if bar has calls to other functions, those callsites will look
hotter than the call to 'bar' ...

The attached are two cases as well as the frequency graph computed
today (with the laplace rule) and the correct frequency expected.

David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: loops2.c
Type: text/x-csrc
Size: 608 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150324/3a4afb39/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: loops2.dot
Type: application/msword
Size: 2914 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150324/3a4afb39/attachment.dot>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: loops2.prof
Type: application/octet-stream
Size: 728 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150324/3a4afb39/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: loops2_correct.dot
Type: application/msword
Size: 2831 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150324/3a4afb39/attachment-0001.dot>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: loops3.c
Type: text/x-csrc
Size: 566 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150324/3a4afb39/attachment-0001.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: loops3.dot
Type: application/msword
Size: 1936 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150324/3a4afb39/attachment-0002.dot>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: loops3.prof
Type: application/octet-stream
Size: 816 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150324/3a4afb39/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: loops3_correct.dot
Type: application/msword
Size: 1896 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150324/3a4afb39/attachment-0003.dot>

Bob Wilson

2015-Mar-24 21:47 UTC

head link

[LLVMdev] RFC - Improvements to PGO profile support

> On Mar 24, 2015, at 12:08 PM, Xinliang David Li <davidxl at
google.com> wrote:
> 
> On Tue, Mar 24, 2015 at 10:54 AM, Bob Wilson <bob.wilson at
apple.com> wrote:
>> 
>>> On Mar 24, 2015, at 10:53 AM, Diego Novillo <dnovillo at
google.com> wrote:
>>> 
>>> On Tue, Mar 24, 2015 at 1:48 PM, Bob Wilson <bob.wilson at
apple.com> wrote:
>>>> 
>>>>> On Mar 24, 2015, at 10:27 AM, Xinliang David Li <davidxl
at google.com> wrote:
>>>>> 
>>>>> 2.3) remove the 'laplace rule of succession' which
can be very harmful
>>>> 
>>>> I have not seen any consensus on this. I’m not at all convinced
this would be a good idea. From what I’ve seen, using LaPlace’s rule avoids
really bad behavior in cases where the counts are very small and has almost no
impact when the counts are large.
>>> 
>>> Duncan and I chatted briefly about this on IRC. I'm going to
>>> experiment with only applying laplace smoothing if any scaled
weights
>>> are 0. AFAIU, usage of this rule is really just trying to avoid
having
>>> downstream users deal with 0 weights. Duncan, please whack me with
a
>>> clue stick if I totally misrepresented our conclusions.
>> 
>> Zero is the extreme case, but it’s also important for other small
counts. I’d like to see some specific examples of why you think the current
behavior is harmful.
> 
> Using the rule has at least the following bad side effects:
> 1) wrong estimation of average loop trip count -- possibly leading to
> wrong unroll decisions
I remain skeptical. If the loop only ran once in your profile run, how much
confidence do you have in the trip count? The LaPlace rule will tend to make the
compiler more conservative in exactly the cases where there is insufficient
information to be confident about what to do.
> 2) result in bad inlining decisions. For instance:
>    for (...)
>        bar();  // (1)
> 
> where (1) is the only callsite to bar().   Using the rule, BB count
> enclosing the call to bar() can be as low as half of the entry count
> of bar().  Inliner will get confused and think there are more hot
> callsites to 'bar' and  make suboptimal decisions ..
> 
> Also if bar has calls to other functions, those callsites will look
> hotter than the call to 'bar' …
Your own proposal for recording entry counts is to record “relative hotness”,
not absolute profile counts. On the caller’s side, we’ve got a branch weight
influenced by LaPlace’s rule that is then used to compute BlockFrequency and
you’re concerned about a mismatch between that the “relative hotness” recorded
for the callee??
> 
> The attached are two cases as well as the frequency graph computed
> today (with the laplace rule) and the correct frequency expected.
I’d be a lot more interested to see a real-world example.

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Mar 2015 - [LLVMdev] RFC - Improvements to PGO profile support

[LLVMdev] RFC - Improvements to PGO profile support

[LLVMdev] RFC - Improvements to PGO profile support

[LLVMdev] RFC - Improvements to PGO profile support

Possibly Parallel Threads