Chandler Carruth
2015-Mar-24 19:45 UTC
[LLVMdev] RFC - Improvements to PGO profile support
On Tue, Mar 24, 2015 at 11:46 AM, Xinliang David Li <xinliangli at gmail.com> wrote:

> On Tue, Mar 24, 2015 at 11:29 AM, Chandler Carruth <chandlerc at google.com> wrote:
>
>> Sorry I haven't responded earlier, but one point here still doesn't make
>> sense to me:
>>
>> On Tue, Mar 24, 2015 at 10:27 AM, Xinliang David Li <davidxl at google.com> wrote:
>>
>>> Diego and I have discussed this according to the feedback received. We
>>> have a revised plan for this (see Diego's last reply). Here is a more
>>> detailed recap:
>>>
>>> 1) Keep the MD_prof definition as it is today; also keep using frequency
>>> propagation as it is (assuming programs with irreducible loops are not
>>> common and not important; if it turns out otherwise, we will revisit this).
>>> 2) Fix all problems that lead to wrong 'frequency/count' values computed
>>> by the frequency propagation algorithm:
>>> 2.1) relax the 32-bit limit
>>
>> I still don't understand why this is important or useful. Maybe I'm just
>> missing something.
>>
>> Given the current meaning of MD_prof, it seems like the result of limiting
>> this to 32 bits is that the maximum relative ratio of probabilities between
>> two successors of a basic block with N successors is (2 billion / N):1 --
>> what is the circumstance that makes this resolution insufficient?
>>
>> It also doesn't seem *bad* per se; I just don't see what it improves, and
>> it does cost memory...
>
> Right -- there is some ambiguity here -- it is needed if we were to change
> MD_prof's definition to represent branch counts. However, with the new plan,
> the removal of the limit only applies to the planned function entry count
> representation.

Ah, OK, that makes more sense.

I'm still curious: is a ratio of 2 billion : 1 insufficient between the
hottest basic block in the innermost loop and the entry block? My intuition
is that this ratio encapsulates all the information we could meaningfully
make decisions on, and I don't have any examples where it falls over, but
perhaps you have some?

(Note, the 4096 scaling limit is completely separate; that makes perfect
sense to me.)
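For context on what "the 32-bit limit" refers to here, a minimal sketch, assuming LLVM's C++ API of the time: MD_prof branch weights are attached via MDBuilder, whose createBranchWeights helper takes uint32_t operands, which is what bounds the expressible bias between successors at roughly (2 billion / N):1. The surrounding function is illustrative, not code from the thread.

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/MDBuilder.h"

using namespace llvm;

// Attach MD_prof branch weights to a conditional branch. The weights are
// uint32_t, so the most extreme expressible bias between two successors
// is roughly 2 billion to 1 -- the resolution Chandler asks about.
void annotateBranch(BranchInst *BI) {
  MDBuilder MDB(BI->getContext());
  MDNode *Weights = MDB.createBranchWeights(/*TrueWeight=*/2000000000u,
                                            /*FalseWeight=*/1u);
  BI->setMetadata(LLVMContext::MD_prof, Weights);
}
```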
Xinliang David Li
2015-Mar-24 19:50 UTC
[LLVMdev] RFC - Improvements to PGO profile support
On Tue, Mar 24, 2015 at 12:45 PM, Chandler Carruth <chandlerc at google.com> wrote:

> [...]
>
> Ah, OK, that makes more sense.
>
> I'm still curious: is a ratio of 2 billion : 1 insufficient between the
> hottest basic block in the innermost loop and the entry block? My intuition
> is that this ratio encapsulates all the information we could meaningfully
> make decisions on, and I don't have any examples where it falls over, but
> perhaps you have some?

The ratio is not the problem. The problem is that we can no longer
effectively differentiate hot functions: 2 billion vs. 4 billion will look
the same with the small capping.

David

> (Note, the 4096 scaling limit is completely separate; that makes perfect
> sense to me.)
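The exact cap is not spelled out in the thread; assuming weights saturate at INT32_MAX (the "2 billion" figure Chandler uses), here is a minimal sketch of how two very differently hot functions collapse to the same stored value:

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

// Saturate a 64-bit execution count to a 32-bit-representable weight.
// The INT32_MAX cap is an assumption; the thread only says "capping".
uint32_t saturate(uint64_t Count) {
  return static_cast<uint32_t>(std::min<uint64_t>(Count, INT32_MAX));
}

int main() {
  std::cout << saturate(2200000000ULL) << "\n"; // ~2.2B -> 2147483647
  std::cout << saturate(4000000000ULL) << "\n"; // ~4.0B -> 2147483647
  // Both clamp to the cap, so their relative hotness is lost.
}
```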
Chandler Carruth
2015-Mar-24 19:53 UTC
[LLVMdev] RFC - Improvements to PGO profile support
On Tue, Mar 24, 2015 at 12:50 PM, Xinliang David Li <xinliangli at gmail.com> wrote:

> [...]
>
> The ratio is not the problem. The problem is that we can no longer
> effectively differentiate hot functions: 2 billion vs. 4 billion will look
> the same with the small capping.

The current design for the entry frequency is that it should be interpreted
relative to the global hotness of the function. Today, we only have the
attributes "hot" and "cold", which are terrible models of this, but if you
imagine function-level metadata carrying a detailed profile weight, then you
could interpret basic block frequencies across two functions, but only after
normalizing with these function-level weights.

Does that make sense?

(Note, I'm not actually saying that I think this is necessarily the right
design, just that I believe it is the intent of the current design, and if
it is flawed, I think that flaw hasn't yet been effectively shown.)
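One way to read that normalization, as a hedged sketch (the struct and names below are illustrative, not LLVM's API): treat each function's block frequencies as relative to its entry block, and scale by a profile-derived entry count before comparing blocks across functions.

```cpp
#include <cstdint>

// Illustrative only: a function-level profile weight plus the entry
// block's (function-relative) frequency, per Chandler's description.
struct FunctionProfile {
  uint64_t EntryCount; // how often the function was entered, per profile
  uint64_t EntryFreq;  // frequency assigned to the entry block
};

// Scale a function-relative block frequency into a globally comparable
// weight. Only after this normalization can blocks from two different
// functions be meaningfully compared.
uint64_t globalBlockWeight(const FunctionProfile &F, uint64_t BlockFreq) {
  if (F.EntryFreq == 0)
    return 0;
  return static_cast<uint64_t>(
      F.EntryCount * (static_cast<double>(BlockFreq) / F.EntryFreq));
}
```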
Xinliang David Li
2015-Mar-24 19:53 UTC
[LLVMdev] RFC - Improvements to PGO profile support
Capping also leads to other kinds of problems -- e.g., the sum of incoming
edge counts (in the call graph) no longer matches the callee's entry count.

David

On Tue, Mar 24, 2015 at 12:50 PM, Xinliang David Li <xinliangli at gmail.com> wrote:

> [...]
>
> The ratio is not the problem. The problem is that we can no longer
> effectively differentiate hot functions: 2 billion vs. 4 billion will look
> the same with the small capping.
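A hedged illustration of the invariant being broken (the numbers are made up, and real profiles also have indirect and external calls, which this ignores): the callee's entry count should equal the sum of its incoming call-edge counts, but once each count clamps independently at a 32-bit cap, the books stop balancing.

```cpp
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

// Cap a 64-bit count at INT32_MAX, as 32-bit storage effectively does.
uint64_t cap32(uint64_t C) {
  return C > INT32_MAX ? static_cast<uint64_t>(INT32_MAX) : C;
}

int main() {
  // Two call sites invoke the callee 3.0B and 1.5B times.
  std::vector<uint64_t> Edges = {3000000000ULL, 1500000000ULL};
  uint64_t EntryCount =
      std::accumulate(Edges.begin(), Edges.end(), uint64_t{0}); // 4.5B

  // Uncapped, the invariant holds: sum(edges) == entry count.
  // Capped, each large count clamps on its own and the match is lost:
  uint64_t CappedSum = cap32(Edges[0]) + cap32(Edges[1]); // ~3.65B
  uint64_t CappedEntry = cap32(EntryCount);               // ~2.15B
  std::cout << (CappedSum == CappedEntry) << "\n";        // prints 0
}
```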