thr3ads.net - llvm dev - [LLVMdev] RFC - Improvements to PGO profile support [Mar 2015]


Xinliang David Li

2015-Mar-24 19:53 UTC

[LLVMdev] RFC - Improvements to PGO profile support

Capping also leads to other kinds of problems, e.g., the sum of the incoming
edge counts (in the call graph) no longer matches the callee's entry count.

David

On Tue, Mar 24, 2015 at 12:50 PM, Xinliang David Li <xinliangli at gmail.com>
wrote:
>
>
> On Tue, Mar 24, 2015 at 12:45 PM, Chandler Carruth <chandlerc at google.com>
> wrote:
>
>>
>> On Tue, Mar 24, 2015 at 11:46 AM, Xinliang David Li <xinliangli at gmail.com>
>> wrote:
>>
>>> On Tue, Mar 24, 2015 at 11:29 AM, Chandler Carruth <chandlerc at google.com>
>>> wrote:
>>>
>>>> Sorry I haven't responded earlier, but one point here still doesn't
>>>> make sense to me:
>>>>
>>>> On Tue, Mar 24, 2015 at 10:27 AM, Xinliang David Li <davidxl at google.com>
>>>> wrote:
>>>>
>>>>> Diego and I have discussed this according to the feedback received. We
>>>>> have revised plan for this (see Diego's last reply). Here is a more
>>>>> detailed re-cap:
>>>>>
>>>>> 1) keep MD_prof definition as it is today; also keep using the
>>>>> frequency propagation as it is (assuming programs with irreducible
>>>>> loops are not common and not important. If it turns out to be
>>>>> otherwise, we will revisit this).
>>>>> 2) fix all problems that lead to wrong 'frequency/count' computed from
>>>>> the frequency propagation algorithm
>>>>>    2.1) relax 32bit limit
>>>>>
>>>>
>>>> I still don't understand why this is important or useful.... Maybe I'm
>>>> just missing something.
>>>>
>>>> Given the current meaning of MD_prof, it seems like the result of
>>>> limiting this to 32-bits is that the maximum relative ratio of
>>>> probabilities between two successors of a basic block with N successors is
>>>> (2 billion / N):1 -- what is the circumstance that makes this resolution
>>>> insufficient?
>>>>
>>>> It also doesn't seem *bad* per-se, I just don't see what it improves,
>>>> and it does cost memory...
>>>>
>>>
>>> right -- there is some ambiguity here -- it is needed if we were to
>>> change MD_prof's definition to represent branch count. However, with the
>>> new plan, the removal of the limit only applies to the function entry count
>>> representation planned.
>>>
>>
>> Ah, ok, that makes more sense.
>>
>> I'm still curious, is the ratio of 2 billion : 1 insufficient between the
>> hottest basic block in the inner most loop and the entry block? My
>> intuition is that this ratio encapsulates all the information we could
>> meaningfully make decisions based upon, and I don't have any examples where
>> it falls over, but perhaps you have some examples?
>>
>
> The ratio is not the problem. The problem is that we can no longer
> effectively differentiate hot functions. 2 billion vs 4 billion will look
> the same with the small capping.
>
> David
>
>
>
>>
>> (Note, the 4096 scaling limit thing is completely separate, that makes
>> perfect sense to me.)
>>
>
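
The point quoted above, that 2 billion and 4 billion look the same once a small cap is applied, can be sketched in a few lines. This is only an illustration: `saturate` and the constant names are hypothetical and not part of LLVM, and the cap of 2 billion is taken loosely from the figure used in the discussion rather than from any actual field width.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): clamp a profile count to a cap.
constexpr uint64_t saturate(uint64_t Count, uint64_t Cap) {
  return Count < Cap ? Count : Cap;
}

// Assumed cap, chosen to match the "2 billion" figure in the thread.
constexpr uint64_t AssumedCap = 2000000000ULL;

// Two functions whose real entry counts differ by a factor of two.
constexpr uint64_t WarmEntryCount = 2000000000ULL;
constexpr uint64_t HotEntryCount = 4000000000ULL;

// After capping, the stored values are identical, so a pass that ranks
// functions by entry count can no longer tell them apart.
static_assert(saturate(WarmEntryCount, AssumedCap) ==
                  saturate(HotEntryCount, AssumedCap),
              "capped counts collide");

// Counts below the cap are stored unchanged.
static_assert(saturate(1000, AssumedCap) == 1000, "small counts survive");
```

The ratio between blocks inside one function is unaffected by this; what is lost is the ordering between functions whose entry counts both exceed the cap.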

Chandler Carruth

2015-Mar-24 19:57 UTC


[LLVMdev] RFC - Improvements to PGO profile support

On Tue, Mar 24, 2015 at 12:53 PM, Xinliang David Li <xinliangli at gmail.com>
wrote:
> Capping also leads to other kinds of problems -- e.g., sum of incoming
> edge count (callgraph) does not match the callee entry count etc.

Can you explain these problems in more detail? I think that's essential for
understanding why you think the design should work this way.

Xinliang David Li

2015-Mar-24 20:08 UTC


[LLVMdev] RFC - Improvements to PGO profile support

Example, assuming the cap is 'C':

void bar()
{
    // ENTRY count is 4*C, after capping it becomes 'C'
    ...
}

void test()
{
  // BB1:   count(BB1) = C
  bar();

  // BB2:   count(BB2) = C
  bar();

}

void test2()
{
  // BB3:   count(BB3) = C
  bar();

  // BB4:   count(BB4) = C
  bar();
}

What would the inliner see here? When it looks at the first call site (in
BB1), its count C equals the capped entry count of 'bar', so it might
mistakenly conclude that this is the only dominating call site of 'bar'.
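
The arithmetic behind this example can be written out as a small sketch. The names below (`capCount`, `callSiteSharePercent`) and the concrete value chosen for C are assumptions made purely for illustration; none of this is LLVM code or the inliner's actual heuristic.

```cpp
#include <cstdint>

// Hypothetical helper: clamp a count to the cap.
constexpr uint64_t capCount(uint64_t Count, uint64_t Cap) {
  return Count < Cap ? Count : Cap;
}

// Hypothetical helper: percentage of the callee's entry count that a
// single call site appears to account for.
constexpr uint64_t callSiteSharePercent(uint64_t CallSiteCount,
                                        uint64_t CalleeEntryCount) {
  return CallSiteCount * 100 / CalleeEntryCount;
}

constexpr uint64_t C = 1000;               // illustrative value for the cap
constexpr uint64_t IncomingSum = 4 * C;    // BB1 + BB2 + BB3 + BB4
constexpr uint64_t TrueEntry = 4 * C;      // real entry count of bar()
constexpr uint64_t CappedEntry = capCount(TrueEntry, C);

// Without the cap, the call graph is consistent.
static_assert(IncomingSum == TrueEntry, "edges sum to entry count");
// With the cap, the incoming edges no longer sum to the entry count.
static_assert(IncomingSum != CappedEntry, "capping breaks the invariant");
// The call site in BB1 really contributes 25% of bar()'s executions...
static_assert(callSiteSharePercent(C, TrueEntry) == 25, "true share");
// ...but against the capped entry count it appears to contribute 100%.
static_assert(callSiteSharePercent(C, CappedEntry) == 100, "apparent share");
```

Any consumer that compares a call site count against the callee's entry count would be misled in the same way, not only the inliner.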

David


On Tue, Mar 24, 2015 at 12:57 PM, Chandler Carruth <chandlerc at google.com>
wrote:
>
> On Tue, Mar 24, 2015 at 12:53 PM, Xinliang David Li <xinliangli at gmail.com>
> wrote:
>
>> Capping also leads to other kinds of problems -- e.g., sum of incoming
>> edge count (callgraph) does not match the callee entry count etc.
>
>
> Can you explain these problems in more detail? I think that's essential
> for understanding why you think the design should work this way.
