Chandler Carruth
2011-Oct-23 08:11 UTC
[LLVMdev] Question regarding basic-block placement optimization
Ok, I think I have a working pass that is based much more on what we've talked about here. The patch is attached. I'd love to commit it soon-ish and then start tweaking it based on feedback from you, others, and looking at how it actually works in the wild. It's essentially a re-write, so it may be hard to read... Let me know if some other form would be easier. Some responses to your comments below, perhaps giving insight into the choices I made or what the pass is currently doing and what I haven't addressed yet. On Fri, Oct 21, 2011 at 6:17 PM, Andrew Trick <atrick at apple.com> wrote:> Once you decide to break these constraints, you have effectively > designated the offending paths as "cold". It's up to you how to > decide which paths are cold. You'll need some threshold based on > confidence in the profile, and we can measure the effectiveness of > your heuristics and add more over time. >FYI, I ended up with a heuristic that requires a hot edge to be 4x more likely than any other competing in the CFG. This is very loosely based on the definition of a 'hot' edge in BranchProbabilityInfo (80% vs. 20%). I don't care much about how you layout the cold chains as long as they> are not interleaved with the rest of the code, thus breaking the > topological ordering and forcing extra branches. In practice, this > means they get their own little Siberia after the function return, > effectively splitting the function into two sections. However, the > notion of "coldness" is really relative to the current loop head. >I've not really tried to ensure this happens correctly. Mostly, this is just forming hot-paths through the code when they are *really* hot. I'll work on better sectioning off the cold paths that are cut off in a followup patch. 2) Construct chains of blocks within the loop. Here the "entry" chain> is the loop header, or function entry. You can probably represent > each already laid out inner loop as a single chain. The graph of > chains within a loop is *acyclic* except for irreducible control > flow, which doesn't have any structure to preserve anyway. There's > no need to compute SCCs. >I'm actually not satisfied with how I've done this. I think there is likely a cleaner / better solution than the one I have in the patch, but it largely works, and is easy to change later. The patch just walks forward over the function, and tries to chain blocks together with their successors when the CFG permits, choosing successors based on whatever (even weak) signal the probability info gives us. 4) Splice the chains together, preserving topological order. Assigning> each block to a DFS reverse-post-order number provides a nice > ordering that is easy to compare or sort. >Currently I'm doing this once at the very end. I wonder if it would be better to do iteratively as we walk up through the loops? Let me give you a little example to think about:> > A: > br B(80%), C(20%) > B: > br D > C: > br D(90%), E(10%) > D: > br F > E: > br F > F: > ret > > This is a fine topological sort but bad layout based on expected > branch probability (even with low confidence in the profile). > > The only good layout that preserves topological ordering is: > A, B, C, E, D, F >I made a test case out of this, and I think it works. It's a bit confusing at it just replaces branches to F with ret. That removes the edge from D to F, and makes the following ordering also viable: A, B, C, D, E. That's the ordering it chooses, and it also duplicates D up into B instead of branching from B to D. :: shrug ::. It looks pretty good to me though. =]> Finally, we need some way to validate the design and verify the > implementation. Weighted statistics will be very useful here, similar > to those that Jakob used in the register allocator. For example: > > For each nonadjacent edge: > TakenBranchFrequency += frequency(Edge) > > This metric will prefer the most "extreme" layout because it assumes > perfect accuracy of block frequencies. It would be nice to look at the > same metric computed from skewed branch probabilities to see how > sensitive it is to small shifts in branch behavior. > > Taken branch frequency is the most obvious, but doesn't completely > capture locality. You could also measure distance jumped fairly > easily, also weighted by edge frequency. > > It might be useful to gather these stats as late as possible to take > into consideration the final impact of layout, including the effect on > any later passes. >Cool, I'm working on this next. I've also uncovered some *completely* broken aspects of branch probability. Going back and adding tests and fixes for it. =] -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/8781c895/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: mbp-redux.patch Type: application/octet-stream Size: 33590 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/8781c895/attachment.obj>
Chandler Carruth
2011-Oct-23 08:14 UTC
[LLVMdev] Question regarding basic-block placement optimization
On Sun, Oct 23, 2011 at 1:11 AM, Chandler Carruth <chandlerc at google.com>wrote:> Ok, I think I have a working pass that is based much more on what we've > talked about here. The patch is attached. I'd love to commit it soon-ish and > then start tweaking it based on feedback from you, others, and looking at > how it actually works in the wild.After chatting briefly w/ Owen in IRC, he encouraged me to just commit this as it's behind an off-by-default flag, and not touching anything else. I'm going to go ahead with that so I can more easily start working on statistics and clean up associated other stuff. It'll also help me debug some weird probabilities. Feel free to carry on with discussion here or on the commit thread. =] Thanks again for all the really great feedback and thoughts!! -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/1fa4eb90/attachment.html>
Andrew Trick
2011-Oct-23 18:31 UTC
[LLVMdev] Question regarding basic-block placement optimization
That's fine. I've been following along and what you're doing sounds right. I'll try to provide feedback on your latest commit once I have time to fully understand how it works. Andy Sent from my iPhone On Oct 23, 2011, at 1:14 AM, Chandler Carruth <chandlerc at google.com> wrote:> On Sun, Oct 23, 2011 at 1:11 AM, Chandler Carruth <chandlerc at google.com> wrote: > Ok, I think I have a working pass that is based much more on what we've talked about here. The patch is attached. I'd love to commit it soon-ish and then start tweaking it based on feedback from you, others, and looking at how it actually works in the wild. > > After chatting briefly w/ Owen in IRC, he encouraged me to just commit this as it's behind an off-by-default flag, and not touching anything else. I'm going to go ahead with that so I can more easily start working on statistics and clean up associated other stuff. It'll also help me debug some weird probabilities. > > Feel free to carry on with discussion here or on the commit thread. =] Thanks again for all the really great feedback and thoughts!!-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/cb16ae4b/attachment.html>
Andrew Trick
2011-Oct-23 18:45 UTC
[LLVMdev] Question regarding basic-block placement optimization
On Oct 23, 2011, at 1:11 AM, Chandler Carruth <chandlerc at google.com> wrote:> > Let me give you a little example to think about: > > A: > br B(80%), C(20%) > B: > br D > C: > br D(90%), E(10%) > D: > br F > E: > br F > F: > ret > > This is a fine topological sort but bad layout based on expected > branch probability (even with low confidence in the profile). > > The only good layout that preserves topological ordering is: > A, B, C, E, D, F > > I made a test case out of this, and I think it works. It's a bit confusing at it just replaces branches to F with ret. That removes the edge from D to F, and makes the following ordering also viable: A, B, C, D, E. That's the ordering it chooses, and it also duplicates D up into B instead of branching from B to D. :: shrug ::. It looks pretty good to me though. =] >Great. I didn't expect it to work as test case without tweaking. The case when no blocks get folded or tail duped is interesting. You'll have to disable those codegen optimizations or just add blocks and instructions until they're defeated. Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/6afe4f1b/attachment.html>
Andrew Trick
2011-Oct-23 18:51 UTC
[LLVMdev] Question regarding basic-block placement optimization
On Oct 23, 2011, at 1:11 AM, Chandler Carruth <chandlerc at google.com> wrote:> I don't care much about how you layout the cold chains as long as they > are not interleaved with the rest of the code, thus breaking the > topological ordering and forcing extra branches. In practice, this > means they get their own little Siberia after the function return, > effectively splitting the function into two sections. However, the > notion of "coldness" is really relative to the current loop head. > > I've not really tried to ensure this happens correctly. Mostly, this is just forming hot-paths through the code when they are *really* hot. I'll work on better sectioning off the cold paths that are cut off in a followup patch.Once you move a block " out of line" it will naturally end up floating to the end of the function when you do a good job splicing the other warm chains. So I don't think it needs to be explicit in the design. The explicit decision is simply when to move a chain out of line ( no longer topplogical). Depending on the algorithm you may need to do some work to ensure loops are followed by their "best" loop exit. Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/a6144364/attachment.html>
Apparently Analagous Threads
- [LLVMdev] Question regarding basic-block placement optimization
- [LLVMdev] Question regarding basic-block placement optimization
- [LLVMdev] Question regarding basic-block placement optimization
- [LLVMdev] Question regarding basic-block placement optimization
- [LLVMdev] Question regarding basic-block placement optimization