Xinliang David Li
2015-Mar-24 19:53 UTC
[LLVMdev] RFC - Improvements to PGO profile support
Capping also leads to other kinds of problems -- e.g., sum of incoming
edge count (callgraph) does not match the callee entry count etc.

David

On Tue, Mar 24, 2015 at 12:50 PM, Xinliang David Li <xinliangli at gmail.com> wrote:
> On Tue, Mar 24, 2015 at 12:45 PM, Chandler Carruth <chandlerc at google.com> wrote:
>> On Tue, Mar 24, 2015 at 11:46 AM, Xinliang David Li <xinliangli at gmail.com> wrote:
>>> On Tue, Mar 24, 2015 at 11:29 AM, Chandler Carruth <chandlerc at google.com> wrote:
>>>> Sorry I haven't responded earlier, but one point here still doesn't
>>>> make sense to me:
>>>>
>>>> On Tue, Mar 24, 2015 at 10:27 AM, Xinliang David Li <davidxl at google.com> wrote:
>>>>> Diego and I have discussed this according to the feedback received. We
>>>>> have revised plan for this (see Diego's last reply). Here is a more
>>>>> detailed re-cap:
>>>>>
>>>>> 1) keep MD_prof definition as it is today; also keep using the
>>>>>    frequency propagation as it is (assuming programs with irreducible
>>>>>    loops are not common and not important. If it turns out to be
>>>>>    otherwise, we will revisit this).
>>>>> 2) fix all problems that lead to wrong 'frequency/count' computed from
>>>>>    the frequency propagation algorithm
>>>>>    2.1) relax 32bit limit
>>>>
>>>> I still don't understand why this is important or useful.... Maybe I'm
>>>> just missing something.
>>>>
>>>> Given the current meaning of MD_prof, it seems like the result of
>>>> limiting this to 32-bits is that the maximum relative ratio of
>>>> probabilities between two successors of a basic block with N successors is
>>>> (2 billion / N):1 -- what is the circumstance that makes this resolution
>>>> insufficient?
>>>>
>>>> It also doesn't seem *bad* per-se, I just don't see what it improves,
>>>> and it does cost memory...
>>>
>>> right -- there is some ambiguity here -- it is needed if we were to
>>> change MD_prof's definition to represent branch count. However, with the
>>> new plan, the removal of the limit only applies to the function entry count
>>> representation planned.
>>
>> Ah, ok, that makes more sense.
>>
>> I'm still curious, is the ratio of 2 billion : 1 insufficient between the
>> hottest basic block in the inner most loop and the entry block? My
>> intuition is that this ratio encapsulates all the information we could
>> meaningfully make decisions based upon, and I don't have any examples where
>> it falls over, but perhaps you have some examples?
>
> The ratio is not the problem. The problem is that we can no longer
> effectively differentiate hot functions. 2 billion vs 4 billion will look
> the same with the small capping.
>
> David
>
>> (Note, the 4096 scaling limit thing is completely separate, that makes
>> perfect sense to me.)
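Editorial sketch of the "2 billion vs 4 billion" point in the message above. It assumes a saturating cap of 2,000,000,000, chosen only for illustration; `kCap`, `capCount` and `looksHotter` are hypothetical names, not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical cap, picked for illustration only (not an LLVM constant).
const std::uint64_t kCap = 2000000000ULL;

// Saturate a raw 64-bit entry count at `cap`.
inline std::uint64_t capCount(std::uint64_t raw, std::uint64_t cap = kCap) {
  assert(cap > 0 && "a zero cap would erase every count");
  return std::min(raw, cap);
}

// True if function A still looks hotter than function B once both
// entry counts have been capped.
inline bool looksHotter(std::uint64_t rawA, std::uint64_t rawB,
                        std::uint64_t cap = kCap) {
  return capCount(rawA, cap) > capCount(rawB, cap);
}
```

Under this assumed cap, a function entered 4 billion times and one entered 2 billion times both end up with a stored count of 2 billion, so `looksHotter(4000000000ULL, 2000000000ULL)` is false even though the first function is twice as hot. Counts below the cap are unaffected.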
Chandler Carruth
2015-Mar-24 19:57 UTC
[LLVMdev] RFC - Improvements to PGO profile support
On Tue, Mar 24, 2015 at 12:53 PM, Xinliang David Li <xinliangli at gmail.com> wrote:
> Capping also leads to other kinds of problems -- e.g., sum of incoming
> edge count (callgraph) does not match the callee entry count etc.

Can you explain these problems in more detail? I think that's essential
for understanding why you think the design should work in this way.
Xinliang David Li
2015-Mar-24 20:08 UTC
[LLVMdev] RFC - Improvements to PGO profile support
Example. Assuming the cap is 'C':

    void bar() {
      // ENTRY count is 4*C, after capping it becomes 'C'
      ...
    }

    void test() {
      // BB1: count(BB1) = C
      bar();
      // BB2: count(BB2) = C
      bar();
    }

    void test2() {
      // BB3: count(BB3) = C
      bar();
      // BB4: count(BB4) = C
      bar();
    }

What would the inliner see here? When it sees callsite1, it might
mistakenly conclude that it is the only dominating callsite to 'bar'.

David

On Tue, Mar 24, 2015 at 12:57 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Tue, Mar 24, 2015 at 12:53 PM, Xinliang David Li <xinliangli at gmail.com> wrote:
>> Capping also leads to other kinds of problems -- e.g., sum of incoming
>> edge count (callgraph) does not match the callee entry count etc.
>
> Can you explain these problems in more detail? I think that's essential
> for understanding why you think the design should work in this way.
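Editorial sketch of the arithmetic in the example above: four callsites each run C times, the callee's true entry count is 4*C, and capping at C makes the entry count disagree with the sum of its incoming call edges. The helper names (`saturate`, `sumIncoming`, `apparentShare`) are hypothetical and do not correspond to any LLVM interface; the "share" heuristic is an assumption about how a client such as an inliner might read the counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Saturate a raw count at `cap` (hypothetical helper).
inline std::uint64_t saturate(std::uint64_t raw, std::uint64_t cap) {
  assert(cap > 0 && "a zero cap would erase every count");
  return std::min(raw, cap);
}

// Sum of the counts on every call edge into a single callee.
inline std::uint64_t sumIncoming(const std::vector<std::uint64_t>& callsiteCounts) {
  return std::accumulate(callsiteCounts.begin(), callsiteCounts.end(),
                         std::uint64_t{0});
}

// Fraction of the callee's entry count that one callsite appears to
// account for. A value of 1.0 makes the callsite look like the only caller.
inline double apparentShare(std::uint64_t callsiteCount,
                            std::uint64_t calleeEntryCount) {
  assert(calleeEntryCount > 0);
  return static_cast<double>(callsiteCount) /
         static_cast<double>(calleeEntryCount);
}
```

Taking C = 1000 for concreteness, the four callsites sum to 4000 while `saturate(4000, 1000)` stores an entry count of 1000. `apparentShare(1000, 1000)` is then 1.0 for every callsite, whereas against the uncapped entry count it would be 0.25.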