thr3ads.net - llvm dev - [LLVMdev] Question regarding basic-block placement optimization [Oct 2011]

Chandler Carruth

2011-Oct-19 00:22 UTC

[LLVMdev] Question regarding basic-block placement optimization

On Tue, Oct 18, 2011 at 4:31 PM, Jakob Stoklund Olesen <stoklund at
2pi.dk> wrote:
>
> On Oct 18, 2011, at 3:07 PM, Chandler Carruth wrote:
>
> On Tue, Oct 18, 2011 at 2:59 PM, Cameron Zwarich <zwarich at
> apple.com> wrote:
>
>> I think this should really live as a CodeGen pass. Is there any good
>> reason to make it an IR pass?
>>
>
> So, as it happens, I was *completely* wrong here. CodeGen correctly
> preserves the ordering of blocks from IR, *unless* it can do folding, etc.
>
>
> That's right. However, the CFG changes quite a bit during CodeGen.
>
> Switches can be lowered into branch trees, multiple passes can split
> critical edges, and then there is taildup and tailmerge.
>
> An IR code layout algorithm simply doesn't know the final CFG.
>
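Critical-edge splitting, mentioned above, is one way the CFG grows after the IR level. The following is a minimal sketch over a hypothetical CFG stored as integer-indexed successor lists; it does not use LLVM's MachineFunction API. An edge is critical when its source has several successors and its destination has several predecessors. Splitting such an edge adds a new block that an IR-level layout pass never saw.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical CFG model: succs[b] lists the successors of block b.
// Assumes no duplicate edges between the same pair of blocks.
std::vector<std::pair<int, int>> findCriticalEdges(
    const std::vector<std::vector<int>>& succs) {
  std::vector<int> predCount(succs.size(), 0);
  for (const auto& ss : succs)
    for (int s : ss) ++predCount[s];

  std::vector<std::pair<int, int>> critical;
  for (int b = 0; b < static_cast<int>(succs.size()); ++b) {
    if (succs[b].size() < 2) continue;  // source must have several successors
    for (int s : succs[b])
      if (predCount[s] > 1) critical.emplace_back(b, s);  // and dest several preds
  }
  return critical;
}

// Split every critical edge by appending a new block that only jumps to the
// original destination. Returns the number of blocks added to the CFG.
int splitCriticalEdges(std::vector<std::vector<int>>& succs) {
  auto edges = findCriticalEdges(succs);
  for (auto [from, to] : edges) {
    int split = static_cast<int>(succs.size());
    succs.push_back({to});  // new block: from -> split -> to
    for (int& s : succs[from])
      if (s == to) s = split;
  }
  return static_cast<int>(edges.size());
}
```

For a diamond-like CFG `{{1, 2}, {2}, {}}`, only the edge 0->2 is critical. After splitting, block 0 branches to a new block 3 instead, and that block then jumps to 2.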
To be clear, I don't disagree with any of this. =] However, my rough
experiments thus far (hoping to have real benchmark data soon) seem to
indicate that just giving a baseline ordering of blocks to the CodeGen layer

>
> As for why it should be an IR pass, mostly because once the selection dag
> runs through the code, we can never recover all of the freedom we have at
> the IR level. To start with, splicing MBBs around requires knowing about the
> terminators (which we only sometimes do), and it requires rewriting
> them a touch to account for the different fall-through pattern. To make
> matters worse, at this point we don't have the nicely analyzable 'switch'
> terminator (I think), and so the existing MBB placement code just bails on
> non-branch-exit blocks.
>
>
> Those are all the wrong reasons for not doing the right thing.
>
Sorry, I'm not trying to do the wrong thing because of this... Currently, it
feels like a trade-off in terms of cost/benefit. It's not yet clear to me
that the benefit of doing this analysis in the CodeGen layer outweighs the
cost and I was trying to clarify what the costs I perceive are.

> Some basic blocks are glued together and must be placed next to each
> other. That situation can be recognized by "MBB->canFallThrough() &&
> TII->AnalyzeBranch(MBB..)".
>
> Treat glued-together blocks as super-blocks, and everything should be as
> breezy as IR.
>
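Here is a minimal sketch of the super-block idea. It assumes a hypothetical `BlockInfo` summary for each block, in layout order. `canFallThrough` stands in for `MBB->canFallThrough()`, and `analyzable` stands in for `TII->AnalyzeBranch(...)` returning false, which means success. A block is glued to its layout successor when it can fall through but its branch could not be analyzed. Runs of glued blocks are grouped so they can be moved as a unit.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-block summary (not an LLVM type), in current layout order.
struct BlockInfo {
  bool canFallThrough;  // stand-in for MBB->canFallThrough()
  bool analyzable;      // stand-in for "AnalyzeBranch() returned false"
};

// Glued: may fall through, but the terminators cannot be rewritten.
bool isGlued(const BlockInfo& b) { return b.canFallThrough && !b.analyzable; }

// Partition the layout into super-blocks: maximal runs in which every block
// except the last is glued to the next one. A layout pass can then reorder
// whole super-blocks without breaking a required fall-through.
std::vector<std::vector<int>> formSuperBlocks(const std::vector<BlockInfo>& blocks) {
  std::vector<std::vector<int>> groups;
  std::vector<int> current;
  for (int i = 0; i < static_cast<int>(blocks.size()); ++i) {
    current.push_back(i);
    bool last = i + 1 == static_cast<int>(blocks.size());
    if (!isGlued(blocks[i]) || last) {  // this block ends the current group
      groups.push_back(current);
      current.clear();
    }
  }
  return groups;
}
```

With blocks 0, 3, and 4 glued, a six-block layout yields the super-blocks `{0, 1}`, `{2}`, and `{3, 4, 5}`.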
But that's just the thing -- a primary goal of this pass would be to
*change* the fall-through pattern. Currently, that can be done very easily,
although to a limited extent, by changing the IR which enters the selection
dag.

Maybe what we need is to have this pass at both layers? Then the codegen
layer can work on the glued-together blocks to check for (and correct) any
inappropriate CFG changes made in the intervening passes?

Also, it's still not clear to me how to analyze switches in CodeGen, but
that's likely my lack of having read the appropriate interfaces thoroughly.

Jakob Stoklund Olesen

2011-Oct-19 00:47 UTC

On Oct 18, 2011, at 5:22 PM, Chandler Carruth wrote:
> Treat glued-together blocks as super-blocks, and everything should be as
> breezy as IR.
> 
> But that's just the thing -- a primary goal of this pass would be to
> *change* the fall-through pattern.
That's not a problem. I wasn't talking about normal fall-through blocks.
You simply call MBB->updateTerminator() for those.
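For an analyzable block, the terminator rewrite behind a call like `MBB->updateTerminator()` roughly follows the logic below. This is a simplified, hypothetical model that only handles one-way and two-way branches. The `Successors` and `Branches` structs and the `rewriteTerminator` function are illustrative and are not LLVM code.

```cpp
#include <cassert>
#include <optional>

// Hypothetical abstract successors of a block whose branch was analyzed.
struct Successors {
  bool conditional;  // two-way branch if true, one-way otherwise
  int taken;         // destination when the condition holds (or sole successor)
  int notTaken;      // destination when the condition fails (ignored if one-way)
};

// Branch instructions to emit at the end of the block.
struct Branches {
  std::optional<int> condBranch;    // conditional jump target, if any
  bool condReversed = false;        // true if the condition had to be inverted
  std::optional<int> uncondBranch;  // trailing unconditional jump, if any
};

// Given the block placed immediately after this one, prefer a fall-through,
// reverse the condition if that enables one, otherwise add an explicit jump.
Branches rewriteTerminator(const Successors& s, int layoutSucc) {
  Branches b;
  if (!s.conditional) {
    if (s.taken != layoutSucc) b.uncondBranch = s.taken;  // else fall through
    return b;
  }
  if (s.notTaken == layoutSucc) {
    b.condBranch = s.taken;  // jcc taken; fall through to notTaken
  } else if (s.taken == layoutSucc) {
    b.condBranch = s.notTaken;  // j!cc notTaken; fall through to taken
    b.condReversed = true;
  } else {
    b.condBranch = s.taken;  // neither successor is next: jcc + jmp
    b.uncondBranch = s.notTaken;
  }
  return b;
}
```

Reversing the condition when the taken successor is placed next is what lets a layout pass pick either successor as the fall-through.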
> Also, it's still not clear to me how to analyze switches in CodeGen,
> but that's likely my lack of having read the appropriate interfaces
> thoroughly.
Switches aren't real, so they don't exist in CodeGen.

Parts of switches can be lowered to jump tables and indirect branches.

An indirect branch will cause AnalyzeBranch() to fail, but canFallThrough() will
still return false, and it is safe to move the successors around. This also
works for computed goto.

/jakob

Jakob Stoklund Olesen

2011-Oct-19 01:58 UTC

On Oct 18, 2011, at 5:22 PM, Chandler Carruth wrote:
>> As for why it should be an IR pass, mostly because once the selection
>> dag runs through the code, we can never recover all of the freedom we have
>> at the IR level. To start with, splicing MBBs around requires knowing about
>> the terminators (which we only sometimes do), and it requires rewriting
>> them a touch to account for the different fall-through pattern. To make
>> matters worse, at this point we don't have the nicely analyzable 'switch'
>> terminator (I think), and so the existing MBB placement code just bails on
>> non-branch-exit blocks.
> 
> Those are all the wrong reasons for not doing the right thing.
> 
> Sorry, I'm not trying to do the wrong thing because of this...
> Currently, it feels like a trade-off in terms of cost/benefit. It's not yet
> clear to me that the benefit of doing this analysis in the CodeGen layer
> outweighs the cost and I was trying to clarify what the costs I perceive are.

I think it's mostly about understanding how MBBs work.

Ignoring calls and returns, most machines have three kinds of branches:

1. Unconditional
2. Conditional
3. Indirect.

The AnalyzeBranch() function understands the first two kinds, so if that
function returns false (it returns true when it fails, so false means it
succeeded) you can move the successors around, and you know that placing a
successor immediately after the block and calling updateTerminator() will
give you a fall-through.

If AnalyzeBranch() fails, you can still check if the last instruction in the
block is an unpredicated barrier. If so, it is still safe to move the successors
around, but that block will never be a fall-through. The canFallThrough()
function implements this check.

If the last instruction in the block is predicated or not a barrier, you must
keep it together with its layout successor. This should only happen in rare
cases where it is necessary. For example, I am planning to lower invoke
instructions into call instructions that are terminators. This is necessary to
accurately model control flow to landing pads. Such a call instruction must fall
through to its layout successor.
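The three cases above can be summarized as a small classification function. The `BlockSummary` fields are hypothetical stand-ins for the result of AnalyzeBranch() and for properties of the block's last instruction. The second case corresponds to the check that canFallThrough() performs.

```cpp
#include <cassert>

// How a layout pass may treat a block, following the three cases above.
enum class Placement {
  Rewritable,         // AnalyzeBranch() succeeded: move successors, then updateTerminator()
  NeverFallsThrough,  // unanalyzable, but ends in an unpredicated barrier
  GluedToNext         // must stay immediately before its layout successor
};

// Hypothetical summary of one block; not an LLVM type.
struct BlockSummary {
  bool analyzeBranchSucceeded;  // AnalyzeBranch() returned false
  bool lastIsBarrier;           // e.g. an indirect branch
  bool lastIsPredicated;
};

Placement classify(const BlockSummary& b) {
  if (b.analyzeBranchSucceeded) return Placement::Rewritable;
  if (b.lastIsBarrier && !b.lastIsPredicated) return Placement::NeverFallsThrough;
  // Predicated or non-barrier terminator, e.g. a call that is a terminator
  // and must fall through to its layout successor.
  return Placement::GluedToNext;
}
```

Only `GluedToNext` blocks constrain the layout. The other two kinds can be placed anywhere, with `Rewritable` blocks getting their branches fixed up afterwards.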

Some experimental targets don't implement AnalyzeBranch, so everything looks
like an indirect branch. Those targets get the code placement they deserve.

I am not claiming the API is awesome, but the information you need is there, and
you have the same freedom as for IR.

We explicitly designed the branch weights so switch lowering could annotate all
the new branches with exact weights. It would be a shame to ignore that
information.
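The following hypothetical sketch illustrates switch lowering that annotates each new branch with exact weights. It lowers sorted switch cases into a binary tree of `x < pivot` tests and chooses each split so that the total weights on either side are as close as possible. This is not LLVM's actual switch lowering, and `Case`, `Node`, and `lowerSwitch` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical switch case: a case value and its profile weight.
struct Case {
  std::int64_t value;
  std::uint64_t weight;
};

// One node of the lowered branch tree. Internal nodes test "x < pivot"
// (true goes left) and record the total weight on each side, i.e. the
// weights that would be attached to that conditional branch. The default
// destination and range checks are omitted.
struct Node {
  std::int64_t pivot = 0;
  std::uint64_t leftWeight = 0;
  std::uint64_t rightWeight = 0;
  std::unique_ptr<Node> left, right;
  const Case* leaf = nullptr;  // set only on leaves (a single case)
};

// Lower cases[lo, hi), which must be non-empty and sorted by distinct value.
std::unique_ptr<Node> lowerSwitch(const std::vector<Case>& cases,
                                  std::size_t lo, std::size_t hi) {
  auto node = std::make_unique<Node>();
  if (hi - lo == 1) {
    node->leaf = &cases[lo];
    return node;
  }
  std::uint64_t total = 0;
  for (std::size_t i = lo; i < hi; ++i) total += cases[i].weight;

  // Pick the split point where the two sides' weights are most balanced.
  std::size_t best = lo + 1;
  std::uint64_t bestDiff = UINT64_MAX, bestLeft = 0, left = 0;
  for (std::size_t split = lo + 1; split < hi; ++split) {
    left += cases[split - 1].weight;
    std::uint64_t right = total - left;
    std::uint64_t diff = left > right ? left - right : right - left;
    if (diff < bestDiff) {
      bestDiff = diff;
      best = split;
      bestLeft = left;
    }
  }
  node->pivot = cases[best].value;
  node->leftWeight = bestLeft;
  node->rightWeight = total - bestLeft;
  node->left = lowerSwitch(cases, lo, best);
  node->right = lowerSwitch(cases, best, hi);
  return node;
}
```

Each internal node's `leftWeight`/`rightWeight` pair is the information a later layout pass would want attached to the corresponding conditional branch.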

So the benefits are:

- Profile-driven fall-through layout of lowered switches. That should be a
pretty big deal.
- Proper placement of split critical edges.
- The ability to implement stuff like: "Don't put too many branches in
a fetch group, or you'll freak out the branch predictor".
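The fetch-group rule in the last bullet could start as a simple check over the final layout. In this hypothetical sketch, `Inst`, the fetch-group size, and the branch limit are all placeholders, not values for any real processor.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical instruction summary after layout.
struct Inst {
  unsigned size;  // encoded size in bytes
  bool isBranch;
};

// Return the start offsets of aligned fetch groups of `fetchBytes` bytes that
// contain more than `maxBranches` branches. A branch is attributed to the
// group containing its first byte.
std::vector<std::size_t> overloadedFetchGroups(const std::vector<Inst>& code,
                                               std::size_t fetchBytes,
                                               unsigned maxBranches) {
  std::map<std::size_t, unsigned> branchesPerGroup;  // group start -> count
  std::size_t offset = 0;
  for (const Inst& inst : code) {
    if (inst.isBranch) ++branchesPerGroup[offset / fetchBytes * fetchBytes];
    offset += inst.size;
  }
  std::vector<std::size_t> overloaded;
  for (const auto& [start, count] : branchesPerGroup)
    if (count > maxBranches) overloaded.push_back(start);
  return overloaded;
}
```

A layout pass could run such a check on candidate orderings and prefer the ones that report no overloaded groups.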

/jakob

Chandler Carruth

2011-Oct-19 10:24 UTC

On Tue, Oct 18, 2011 at 6:58 PM, Jakob Stoklund Olesen <stoklund at
2pi.dk> wrote:
>
> On Oct 18, 2011, at 5:22 PM, Chandler Carruth wrote:
>
>> As for why it should be an IR pass, mostly because once the selection dag
>> runs through the code, we can never recover all of the freedom we have at
>> the IR level. To start with, splicing MBBs around requires knowing about the
>> terminators (which we only sometimes do), and it requires rewriting
>> them a touch to account for the different fall-through pattern. To make
>> matters worse, at this point we don't have the nicely analyzable 'switch'
>> terminator (I think), and so the existing MBB placement code just bails on
>> non-branch-exit blocks.
>>
>>
>> Those are all the wrong reasons for not doing the right thing.
>>
>
> Sorry, I'm not trying to do the wrong thing because of this... Currently,
> it feels like a trade-off in terms of cost/benefit. It's not yet clear to me
> that the benefit of doing this analysis in the CodeGen layer outweighs the
> cost and I was trying to clarify what the costs I perceive are.
>
>
> I think it's mostly about understanding how MBBs work.
>
Indeed, that seems to be the case. =D Thanks for explaining things below; it
helped me a lot.

> Ignoring calls and returns, most machines have three kinds of branches:
>
> 1. Unconditional
> 2. Conditional
> 3. Indirect.
>
> The AnalyzeBranch() function understands the first two kinds, so if that
> function returns false (it returns true when it fails, so false means it
> succeeded) you can move the successors around, and you know that placing a
> successor immediately after the block and calling updateTerminator() will
> give you a fall-through.
>
> If AnalyzeBranch() fails, you can still check if the last instruction in
> the block is an unpredicated barrier. If so, it is still safe to move the
> successors around, but that block will never be a fall-through. The
> canFallThrough() function implements this check.
>
> If the last instruction in the block is predicated or not a barrier, you
> must keep it together with its layout successor. This should only happen in
> rare cases where it is necessary. For example, I am planning to lower invoke
> instructions into call instructions that are terminators. This is necessary
> to accurately model control flow to landing pads. Such a call instruction
> must fall through to its layout successor.
>
> Some experimental targets don't implement AnalyzeBranch, so everything
> looks like an indirect branch. Those targets get the code placement they
> deserve.
>
> I am not claiming the API is awesome, but the information you need is
> there, and you have the same freedom as for IR.
>
> We explicitly designed the branch weights so switch lowering could annotate
> all the new branches with exact weights. It would be a shame to ignore that
> information.
>
> So the benefits are:
>
> - Profile-driven fall-through layout of lowered switches. That should be a
> pretty big deal.
> - Proper placement of split critical edges.
> - The ability to implement stuff like: "Don't put too many branches in a
> fetch group, or you'll freak out the branch predictor".
>
These all seem like really good reasons, and your explanation helps me a
lot. I'll take a stab at re-implementing this on MBBs. In the meantime,
I've attached my IR-level patch, as most of the interesting logic will
remain the same. Please be gentle; it's my first proper LLVM pass, and my
knowledge of optimization pass research & papers is limited.
The part I'm least pleased with is the computation of weight for each chain
in an SCC of chains, but so far I've not come up with anything better. =]
I've tested this on several ad-hoc test cases, but nothing really thorough.
Going to try moving it to the codegen layer first.
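Chain-based layout of this kind is often built bottom-up in the style of Pettis and Hansen. The method visits edges from hottest to coldest and joins two chains when an edge connects the tail of one chain to the head of the other. The sketch below shows only that classic greedy step on a hypothetical `Edge` list. It is not the attached patch and does not attempt the chain-SCC weighting.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical weighted CFG edge (e.g. from branch weights or a profile).
struct Edge {
  int from, to;
  unsigned weight;
};

// Greedy bottom-up chain formation: whenever `from` ends one chain and `to`
// starts a different chain, concatenate them so the edge becomes a
// fall-through. Returns the resulting chains of block ids.
std::vector<std::vector<int>> buildChains(int numBlocks, std::vector<Edge> edges) {
  std::vector<std::vector<int>> chains(numBlocks);
  std::vector<int> chainOf(numBlocks);
  for (int b = 0; b < numBlocks; ++b) {
    chains[b] = {b};  // every block starts as its own chain
    chainOf[b] = b;
  }
  std::stable_sort(edges.begin(), edges.end(),
                   [](const Edge& x, const Edge& y) { return x.weight > y.weight; });
  for (const Edge& e : edges) {
    int a = chainOf[e.from], b = chainOf[e.to];
    if (a == b || chains[a].back() != e.from || chains[b].front() != e.to)
      continue;  // edge cannot become a fall-through between two chain ends
    for (int blk : chains[b]) {
      chains[a].push_back(blk);
      chainOf[blk] = a;
    }
    chains[b].clear();
  }
  std::vector<std::vector<int>> result;
  for (const auto& c : chains)
    if (!c.empty()) result.push_back(c);
  return result;
}
```

With edges 0->2 (90), 2->3 (50), 1->3 (40), and 0->1 (10), the hot path 0->2->3 becomes one fall-through chain. Block 1 is left in a chain of its own, to be placed elsewhere.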


One question that remains, mostly to ensure I've understood you correctly:
For switches, it is represented as an N-ary conditional terminator, and the
targets of the switch can be freely intermingled with other MBBs?

I'll attach an updated patch to work on MBBs when I have one...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: block_placement_ir.patch
Type: text/x-patch
Size: 24170 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111019/bbb9144a/attachment.bin>
