thr3ads.net - llvm dev - [LLVMdev] Question regarding basic-block placement optimization [Oct 2011]

If this information is useful, please help other people find it:
Share via:

Chandler Carruth

2011-Oct-23 08:11 UTC

[LLVMdev] Question regarding basic-block placement optimization

Ok, I think I have a working pass that is based much more on what we've
talked about here. The patch is attached. I'd love to commit it soon-ish and
then start tweaking it based on feedback from you, others, and looking at
how it actually works in the wild.

It's essentially a re-write, so it may be hard to read... Let me know if
some other form would be easier.

Some responses to your comments below, perhaps giving insight into the
choices I made or what the pass is currently doing and what I haven't
addressed yet.

On Fri, Oct 21, 2011 at 6:17 PM, Andrew Trick <atrick at apple.com> wrote:
> Once you decide to break these constraints, you have effectively
> designated the offending paths as "cold". It's up to you how
to
> decide which paths are cold. You'll need some threshold based on
> confidence in the profile, and we can measure the effectiveness of
> your heuristics and add more over time.
>
FYI, I ended up with a heuristic that requires a hot edge to be 4x more
likely than any other competing in the CFG. This is very loosely based on
the definition of a 'hot' edge in BranchProbabilityInfo (80% vs. 20%).

I don't care much about how you layout the cold chains as long as
they> are not interleaved with the rest of the code, thus breaking the
> topological ordering and forcing extra branches. In practice, this
> means they get their own little Siberia after the function return,
> effectively splitting the function into two sections. However, the
> notion of "coldness" is really relative to the current loop head.
>
I've not really tried to ensure this happens correctly. Mostly, this is just
forming hot-paths through the code when they are *really* hot. I'll work on
better sectioning off the cold paths that are cut off in a followup patch.

2) Construct chains of blocks within the loop. Here the "entry"
chain>    is the loop header, or function entry. You can probably represent
>    each already laid out inner loop as a single chain. The graph of
>    chains within a loop is *acyclic* except for irreducible control
>    flow, which doesn't have any structure to preserve anyway.
There's
>    no need to compute SCCs.
>
I'm actually not satisfied with how I've done this. I think there is
likely
a cleaner / better solution than the one I have in the patch, but it largely
works, and is easy to change later. The patch just walks forward over the
function, and tries to chain blocks together with their successors when the
CFG permits, choosing successors based on whatever (even weak) signal the
probability info gives us.

4) Splice the chains together, preserving topological order.
Assigning>    each block to a DFS reverse-post-order number provides a nice
>    ordering that is easy to compare or sort.
>
Currently I'm doing this once at the very end. I wonder if it would be
better to do iteratively as we walk up through the loops?

Let me give you a little example to think about:>
> A:
>   br B(80%), C(20%)
> B:
>   br D
> C:
>   br D(90%), E(10%)
> D:
>   br F
> E:
>   br F
> F:
>   ret
>
> This is a fine topological sort but bad layout based on expected
> branch probability (even with low confidence in the profile).
>
> The only good layout that preserves topological ordering is:
> A, B, C, E, D, F
>
I made a test case out of this, and I think it works. It's a bit confusing
at it just replaces branches to F with ret. That removes the edge from D to
F, and makes the following ordering also viable: A, B, C, D, E. That's the
ordering it chooses, and it also duplicates D up into B instead of branching
from B to D. :: shrug ::. It looks pretty good to me though. =]

> Finally, we need some way to validate the design and verify the
> implementation. Weighted statistics will be very useful here, similar
> to those that Jakob used in the register allocator. For example:
>
> For each nonadjacent edge:
>   TakenBranchFrequency += frequency(Edge)
>
> This metric will prefer the most "extreme" layout because it
assumes
> perfect accuracy of block frequencies. It would be nice to look at the
> same metric computed from skewed branch probabilities to see how
> sensitive it is to small shifts in branch behavior.
>
> Taken branch frequency is the most obvious, but doesn't completely
> capture locality. You could also measure distance jumped fairly
> easily, also weighted by edge frequency.
>
> It might be useful to gather these stats as late as possible to take
> into consideration the final impact of layout, including the effect on
> any later passes.
>
Cool, I'm working on this next. I've also uncovered some *completely*
broken
aspects of branch probability. Going back and adding tests and fixes for it.
=]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/8781c895/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mbp-redux.patch
Type: application/octet-stream
Size: 33590 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/8781c895/attachment.obj>

Chandler Carruth

2011-Oct-23 08:14 UTC

head link

[LLVMdev] Question regarding basic-block placement optimization

On Sun, Oct 23, 2011 at 1:11 AM, Chandler Carruth <chandlerc at
google.com>wrote:
> Ok, I think I have a working pass that is based much more on what we've
> talked about here. The patch is attached. I'd love to commit it
soon-ish and
> then start tweaking it based on feedback from you, others, and looking at
> how it actually works in the wild.

After chatting briefly w/ Owen in IRC, he encouraged me to just commit this
as it's behind an off-by-default flag, and not touching anything else.
I'm
going to go ahead with that so I can more easily start working on statistics
and clean up associated other stuff. It'll also help me debug some weird
probabilities.

Feel free to carry on with discussion here or on the commit thread. =]
Thanks again for all the really great feedback and thoughts!!
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/1fa4eb90/attachment.html>

Andrew Trick

2011-Oct-23 18:31 UTC

head link

[LLVMdev] Question regarding basic-block placement optimization

That's fine. I've been following along and what you're doing sounds
right. I'll try to provide feedback on your latest commit once I have time
to fully understand how it works.

Andy 

Sent from my iPhone

On Oct 23, 2011, at 1:14 AM, Chandler Carruth <chandlerc at google.com>
wrote:
> On Sun, Oct 23, 2011 at 1:11 AM, Chandler Carruth <chandlerc at
google.com> wrote:
> Ok, I think I have a working pass that is based much more on what we've
talked about here. The patch is attached. I'd love to commit it soon-ish and
then start tweaking it based on feedback from you, others, and looking at how it
actually works in the wild.
> 
> After chatting briefly w/ Owen in IRC, he encouraged me to just commit this
as it's behind an off-by-default flag, and not touching anything else.
I'm going to go ahead with that so I can more easily start working on
statistics and clean up associated other stuff. It'll also help me debug
some weird probabilities.
> 
> Feel free to carry on with discussion here or on the commit thread. =]
Thanks again for all the really great feedback and thoughts!!-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/cb16ae4b/attachment.html>

Andrew Trick

2011-Oct-23 18:45 UTC

head link

[LLVMdev] Question regarding basic-block placement optimization

On Oct 23, 2011, at 1:11 AM, Chandler Carruth <chandlerc at google.com>
wrote:
> 
> Let me give you a little example to think about:
> 
> A:
>   br B(80%), C(20%)
> B:
>   br D
> C:
>   br D(90%), E(10%)
> D:
>   br F
> E:
>   br F
> F:
>   ret
> 
> This is a fine topological sort but bad layout based on expected
> branch probability (even with low confidence in the profile).
> 
> The only good layout that preserves topological ordering is:
> A, B, C, E, D, F
> 
> I made a test case out of this, and I think it works. It's a bit
confusing at it just replaces branches to F with ret. That removes the edge from
D to F, and makes the following ordering also viable: A, B, C, D, E. That's
the ordering it chooses, and it also duplicates D up into B instead of branching
from B to D. :: shrug ::. It looks pretty good to me though. =]
> 
Great. I didn't expect it to work as test case without tweaking. The case
when no blocks get folded or tail duped is interesting. You'll have to
disable those codegen optimizations or just add blocks and instructions until
they're defeated.

Andy 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/6afe4f1b/attachment.html>

Andrew Trick

2011-Oct-23 18:51 UTC

head link

[LLVMdev] Question regarding basic-block placement optimization

On Oct 23, 2011, at 1:11 AM, Chandler Carruth <chandlerc at google.com>
wrote:
> I don't care much about how you layout the cold chains as long as they
> are not interleaved with the rest of the code, thus breaking the
> topological ordering and forcing extra branches. In practice, this
> means they get their own little Siberia after the function return,
> effectively splitting the function into two sections. However, the
> notion of "coldness" is really relative to the current loop head.
> 
> I've not really tried to ensure this happens correctly. Mostly, this is
just forming hot-paths through the code when they are *really* hot. I'll
work on better sectioning off the cold paths that are cut off in a followup
patch.
Once you move a block " out of line" it will naturally end up floating
to the end of the function when you do a good job splicing the other warm
chains. So I don't think it needs to be explicit in the design. The explicit
decision is simply when to move a chain out of line ( no longer topplogical).

Depending on the algorithm you may need to do some work to ensure loops are
followed by their "best" loop exit.

Andy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/a6144364/attachment.html>

Maybe Matching Threads

Search for more reasonably related threads

llvm dev - Oct 2011 - [LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

Maybe Matching Threads