thr3ads.net - llvm dev - [LLVMdev] Question regarding basic-block placement optimization [Oct 2011]

If this information is useful, please help other people find it:
Share via:

Chandler Carruth

2011-Oct-18 09:53 UTC

[LLVMdev] Question regarding basic-block placement optimization

Hello,

I'm working on basic-block placement optimizations based on branch
probability information. I've run into a stumbling block though. One of the
existing passes to do this, essentially a dead pass 'block-placement'
operates on the IR, reordering the blocks of the function, and relying on
the code generator to preserve that order unless it has a reason to
rearrange it. This pass is entirely untested AFAICT, and it doesn't work any
more...

That said, I've replicated its functionality with a much updated
transformation algorithm (no longer O(N^2), and finds better orderings!) and
predicated it on the new frequency and probability information. It
"works"
great, but it doesn't actually optimize anything. It re-arranges all the
blocks of the IR function, laying them out perfectly. And then the code
generation layer throws that away and seems to establish its own ordering
altogether.

What's the best approach here? Is ignoring the order of blocks in the IR
function desired behavior? (It does seem reasonable, it just seems to have
not always been that way, so I wondered.)

Should I sink my pass down to a codegen pass?

If this really should live as a CodeGen pass, then what should become of the
existing code-placement pass in the CodeGen layer? Currently that pass does
two things:
1) It aligns loop bodies. This seems like a completely separate concern,
although likely a good one to do after any other placement optimizations.
2) It rearranges very specific loop structures using very specific
heuristics. Some of these seem redundant in the presence of probability
based information, but others seem orthogonal. Honestly, I don't understand
any of this very well, so it's likely I'm just misreading it.

I'm a bit wary of implementing this in the CodeGen layer as I worry that the
layout of the MBBs and their various branches will have already made
assumptions regarding the structure and shape of the CFG of the function...
But it would also perhaps give more stability to the resulting orderings,
improving the benefits of the pass.

Thoughts?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111018/6535f897/attachment.html>

Sergei Larin

2011-Oct-18 15:37 UTC

head link

[LLVMdev] Question regarding basic-block placement optimization

Chandler,

Can you please elaborate on the following statement? 
>>” I've replicated its functionality with a much updated
transformation algorithm (no longer O(N^2), and finds better orderings!)”

Is this a new algorithm or something you can refer to.  Thanks. 

Sergei




From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Chandler Carruth
Sent: Tuesday, October 18, 2011 4:54 AM
To: LLVM Developers Mailing List
Subject: [LLVMdev] Question regarding basic-block placement optimization

Hello,

I'm working on basic-block placement optimizations based on branch
probability information. I've run into a stumbling block though. One of the
existing passes to do this, essentially a dead pass 'block-placement'
operates on the IR, reordering the blocks of the function, and relying on the
code generator to preserve that order unless it has a reason to rearrange it.
This pass is entirely untested AFAICT, and it doesn't work any more...

That said, I've replicated its functionality with a much updated
transformation algorithm (no longer O(N^2), and finds better orderings!) and
predicated it on the new frequency and probability information. It
"works" great, but it doesn't actually optimize anything. It
re-arranges all the blocks of the IR function, laying them out perfectly. And
then the code generation layer throws that away and seems to establish its own
ordering altogether.

What's the best approach here? Is ignoring the order of blocks in the IR
function desired behavior? (It does seem reasonable, it just seems to have not
always been that way, so I wondered.)

Should I sink my pass down to a codegen pass?

If this really should live as a CodeGen pass, then what should become of the
existing code-placement pass in the CodeGen layer? Currently that pass does two
things:
1) It aligns loop bodies. This seems like a completely separate concern,
although likely a good one to do after any other placement optimizations.
2) It rearranges very specific loop structures using very specific heuristics.
Some of these seem redundant in the presence of probability based information,
but others seem orthogonal. Honestly, I don't understand any of this very
well, so it's likely I'm just misreading it.

I'm a bit wary of implementing this in the CodeGen layer as I worry that the
layout of the MBBs and their various branches will have already made assumptions
regarding the structure and shape of the CFG of the function... But it would
also perhaps give more stability to the resulting orderings, improving the
benefits of the pass.

Thoughts?

Cameron Zwarich

2011-Oct-18 21:59 UTC

head link

[LLVMdev] Question regarding basic-block placement optimization

On Oct 18, 2011, at 2:53 AM, Chandler Carruth wrote:
> Hello,
> 
> I'm working on basic-block placement optimizations based on branch
probability information. I've run into a stumbling block though. One of the
existing passes to do this, essentially a dead pass 'block-placement'
operates on the IR, reordering the blocks of the function, and relying on the
code generator to preserve that order unless it has a reason to rearrange it.
This pass is entirely untested AFAICT, and it doesn't work any more...
> 
> That said, I've replicated its functionality with a much updated
transformation algorithm (no longer O(N^2), and finds better orderings!) and
predicated it on the new frequency and probability information. It
"works" great, but it doesn't actually optimize anything. It
re-arranges all the blocks of the IR function, laying them out perfectly. And
then the code generation layer throws that away and seems to establish its own
ordering altogether.
> 
> What's the best approach here? Is ignoring the order of blocks in the
IR function desired behavior? (It does seem reasonable, it just seems to have
not always been that way, so I wondered.)
> 
> Should I sink my pass down to a codegen pass?
I think this should really live as a CodeGen pass. Is there any good reason to
make it an IR pass?

Cameron

Chandler Carruth

2011-Oct-18 22:07 UTC

head link

[LLVMdev] Question regarding basic-block placement optimization

On Tue, Oct 18, 2011 at 2:59 PM, Cameron Zwarich <zwarich at apple.com>
wrote:
> On Oct 18, 2011, at 2:53 AM, Chandler Carruth wrote:
>
> > Hello,
> >
> > I'm working on basic-block placement optimizations based on branch
> probability information. I've run into a stumbling block though. One of
the
> existing passes to do this, essentially a dead pass
'block-placement'
> operates on the IR, reordering the blocks of the function, and relying on
> the code generator to preserve that order unless it has a reason to
> rearrange it. This pass is entirely untested AFAICT, and it doesn't
work any
> more...
> >
> > That said, I've replicated its functionality with a much updated
> transformation algorithm (no longer O(N^2), and finds better orderings!)
and
> predicated it on the new frequency and probability information. It
"works"
> great, but it doesn't actually optimize anything. It re-arranges all
the
> blocks of the IR function, laying them out perfectly. And then the code
> generation layer throws that away and seems to establish its own ordering
> altogether.
> >
> > What's the best approach here? Is ignoring the order of blocks in
the IR
> function desired behavior? (It does seem reasonable, it just seems to have
> not always been that way, so I wondered.)
> >
> > Should I sink my pass down to a codegen pass?
>
> I think this should really live as a CodeGen pass. Is there any good reason
> to make it an IR pass?

So, as it happens, I was *completely* wrong here. CodeGen correctly
preserves the ordering of blocks from IR, *unless* it can do folding, etc.
My original test cases ended up being folded into other structures, hiding
the fact that the ordering wasn't what was changing.

I've run much more real-world style test cases through the pass, and it is
working beautifully. I'll be attaching a patch shortly, I'm currently
cleaning it up based on some over-the-shoulder feedback from Nick Lewycky.

As for why it should be an IR pass, mostly because once the selection dag
runs through the code, we can never recover all of the freedom we have at
the IR level. To start with, splicing MBBs around requires known about the
terminators (which we only some of the time do), and it requires re-writing
them a touch to account for the different fall-through pattern. To make
matters worse, at this point we don't have the nicely analyzable
'switch'
terminator (I think), and so the existing MBB placement code just bails on
non-branch-exit blocks.

Operating at the IR level makes writing the pass a breeze, and we can
quickly and efficiently simply lay the blocks out in the ideal ordering.
Then the SelectionDAG and other layers can preserve this modulo folding and
other optimization opportunities.

At the most, we should likely add probability awareness to some of these
CodeGen passes to turn off (or on) their transforms around unlikely (or
likely) edges.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111018/fb7f9bf7/attachment.html>

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - Oct 2011 - [LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

Possibly Parallel Threads