thr3ads.net - llvm dev - [LLVMdev] Question regarding basic-block placement optimization [Oct 2011]

If this information is useful, please help other people find it:
Share via:

Chandler Carruth

2011-Oct-18 22:07 UTC

[LLVMdev] Question regarding basic-block placement optimization

On Tue, Oct 18, 2011 at 2:59 PM, Cameron Zwarich <zwarich at apple.com>
wrote:
> On Oct 18, 2011, at 2:53 AM, Chandler Carruth wrote:
>
> > Hello,
> >
> > I'm working on basic-block placement optimizations based on branch
> probability information. I've run into a stumbling block though. One of
the
> existing passes to do this, essentially a dead pass
'block-placement'
> operates on the IR, reordering the blocks of the function, and relying on
> the code generator to preserve that order unless it has a reason to
> rearrange it. This pass is entirely untested AFAICT, and it doesn't
work any
> more...
> >
> > That said, I've replicated its functionality with a much updated
> transformation algorithm (no longer O(N^2), and finds better orderings!)
and
> predicated it on the new frequency and probability information. It
"works"
> great, but it doesn't actually optimize anything. It re-arranges all
the
> blocks of the IR function, laying them out perfectly. And then the code
> generation layer throws that away and seems to establish its own ordering
> altogether.
> >
> > What's the best approach here? Is ignoring the order of blocks in
the IR
> function desired behavior? (It does seem reasonable, it just seems to have
> not always been that way, so I wondered.)
> >
> > Should I sink my pass down to a codegen pass?
>
> I think this should really live as a CodeGen pass. Is there any good reason
> to make it an IR pass?

So, as it happens, I was *completely* wrong here. CodeGen correctly
preserves the ordering of blocks from IR, *unless* it can do folding, etc.
My original test cases ended up being folded into other structures, hiding
the fact that the ordering wasn't what was changing.

I've run much more real-world style test cases through the pass, and it is
working beautifully. I'll be attaching a patch shortly, I'm currently
cleaning it up based on some over-the-shoulder feedback from Nick Lewycky.

As for why it should be an IR pass, mostly because once the selection dag
runs through the code, we can never recover all of the freedom we have at
the IR level. To start with, splicing MBBs around requires known about the
terminators (which we only some of the time do), and it requires re-writing
them a touch to account for the different fall-through pattern. To make
matters worse, at this point we don't have the nicely analyzable
'switch'
terminator (I think), and so the existing MBB placement code just bails on
non-branch-exit blocks.

Operating at the IR level makes writing the pass a breeze, and we can
quickly and efficiently simply lay the blocks out in the ideal ordering.
Then the SelectionDAG and other layers can preserve this modulo folding and
other optimization opportunities.

At the most, we should likely add probability awareness to some of these
CodeGen passes to turn off (or on) their transforms around unlikely (or
likely) edges.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111018/fb7f9bf7/attachment.html>

Cameron Zwarich

2011-Oct-18 23:18 UTC

head link

[LLVMdev] Question regarding basic-block placement optimization

On Oct 18, 2011, at 3:07 PM, Chandler Carruth wrote:
> As for why it should be an IR pass, mostly because once the selection dag
runs through the code, we can never recover all of the freedom we have at the IR
level. To start with, splicing MBBs around requires known about the terminators
(which we only some of the time do), and it requires re-writing them a touch to
account for the different fall-through pattern. To make matters worse, at this
point we don't have the nicely analyzable 'switch' terminator (I
think), and so the existing MBB placement code just bails on non-branch-exit
blocks.
We should be able to analyze more terminators in the backend. Most other
compilers have this ability. If we don't fix this problem now, we will only
increase the distance between the functionality of the backend IR and the
mid-level IR.
> Operating at the IR level makes writing the pass a breeze, and we can
quickly and efficiently simply lay the blocks out in the ideal ordering. Then
the SelectionDAG and other layers can preserve this modulo folding and other
optimization opportunities.
The biggest problem with doing block layout at the IR level is that you
don't have the final CFG. Lots of passes can modify the CFG, and they will
have to rely on the sorts of local heuristics that already exist (and are known
to perform poorly in some cases) to redo block layout after these changes. You
also can't represent jump fall-throughs in the mid-level IR (nor should you
be able to).

Cameron

Jakob Stoklund Olesen

2011-Oct-18 23:31 UTC

head link

[LLVMdev] Question regarding basic-block placement optimization

On Oct 18, 2011, at 3:07 PM, Chandler Carruth wrote:
> On Tue, Oct 18, 2011 at 2:59 PM, Cameron Zwarich <zwarich at
apple.com> wrote:
> I think this should really live as a CodeGen pass. Is there any good reason
to make it an IR pass?
> 
> So, as it happens, I was *completely* wrong here. CodeGen correctly
preserves the ordering of blocks from IR, *unless* it can do folding, etc.
That's right. However, the CFG changes quite a bit during CodeGen.

Switches can be lowered into branch trees, multiple passes can split critical
edges, and then there is taildup and tailmerge.

An IR code layout algorithm simply doesn't know the final CFG.
> As for why it should be an IR pass, mostly because once the selection dag
runs through the code, we can never recover all of the freedom we have at the IR
level. To start with, splicing MBBs around requires known about the terminators
(which we only some of the time do), and it requires re-writing them a touch to
account for the different fall-through pattern. To make matters worse, at this
point we don't have the nicely analyzable 'switch' terminator (I
think), and so the existing MBB placement code just bails on non-branch-exit
blocks.
Those are all the wrong reasons for not doing the right thing.

Some basic blocks are glued together and must be placed next to each other. That
situation can be recognized by "MBB->canFallThrough() &&
TII->AnalyzeBranch(MBB..)".

Treat glued-together blocks as super-blocks, and everything should be as breezy
as IR.

I realize the MBB interface is not the prettiest. Suggestions for cleaning it up
are very welcome.

/jakob

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111018/f9832789/attachment.html>

Chandler Carruth

2011-Oct-19 00:22 UTC

head link

[LLVMdev] Question regarding basic-block placement optimization

On Tue, Oct 18, 2011 at 4:31 PM, Jakob Stoklund Olesen <stoklund at
2pi.dk>wrote:
>
> On Oct 18, 2011, at 3:07 PM, Chandler Carruth wrote:
>
> On Tue, Oct 18, 2011 at 2:59 PM, Cameron Zwarich <zwarich at
apple.com>wrote:
>
>> I think this should really live as a CodeGen pass. Is there any good
>> reason to make it an IR pass?
>>
>
> So, as it happens, I was *completely* wrong here. CodeGen correctly
> preserves the ordering of blocks from IR, *unless* it can do folding, etc.
>
>
> That's right. However, the CFG changes quite a bit during CodeGen.
>
> Switches can be lowered into branch trees, multiple passes can split
> critical edges, and then there is taildup and tailmerge.
>
> An IR code layout algorithm simply doesn't know the final CFG.
>
To be clear, I don't disagree with any of this. =] However, my rough
experiments thus far (hoping to have real benchmark data soon) seem to
indicate that just giving a baseline ordering of block to the CodeGen layer

>
> As for why it should be an IR pass, mostly because once the selection dag
> runs through the code, we can never recover all of the freedom we have at
> the IR level. To start with, splicing MBBs around requires known about the
> terminators (which we only some of the time do), and it requires re-writing
> them a touch to account for the different fall-through pattern. To make
> matters worse, at this point we don't have the nicely analyzable
'switch'
> terminator (I think), and so the existing MBB placement code just bails on
> non-branch-exit blocks.
>
>
> Those are all the wrong reasons for not doing the right thing.
>
Sorry, I'm not trying to do the wrong thing because of this... Currently, it
feels like a trade-off in terms of cost/benefit. It's not yet clear to me
that the benefit of doing this analysis in the CodeGen layer outweighs the
cost and I was trying to clarify what the costs I perceive are.

Some basic blocks are glued together and must be placed next to each
other.> That situation can be recognized by "MBB->canFallThrough()
&&
> TII->AnalyzeBranch(MBB..)".
>
> Treat glued-together blocks as super-blocks, and everything should be as
> breezy as IR.
>
But that's just the thing -- a primary goal of this pass would be to
*change* the fall-through pattern. Currently, that can be done very easily,
although to a limited extent, by changing the IR which enters the selection
dag.

Maybe what we need is to have this pass at both layers? Then the codegen
layer can work on the glued-together blocks to check for (and correct) any
inappropriate CFG changes made in the intervening passes?

Also, it's still not clear to me how to analyze switches in CodeGen, but
that's likely my lack of having read the appropriate interfaces thoroughly.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111018/2163fe58/attachment.html>

Reasonably Related Threads

Search for more maybe matching threads

llvm dev - Oct 2011 - [LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

Reasonably Related Threads