thr3ads.net - llvm dev - [LLVMdev] Question regarding basic-block placement optimization [Oct 2011]

If this information is useful, please help other people find it:
Share via:

Jakob Stoklund Olesen

2011-Oct-19 14:56 UTC

[LLVMdev] Question regarding basic-block placement optimization

On Oct 19, 2011, at 5:50 AM, Chandler Carruth wrote:
> Ok, wow that wasn't hard at all.
Awesome ;-)
> This is still *very* much a rough draft, but it's probably better to
review that the previous patch. One big caveat, I know I have an iteration bug
in here somewhere that is inf-looping. Just ran out of steam debugging it, will
pick it back up again later today to shake it out.
Maybe splicing a block into it current position will create a loop?

Some random notes:

- Please add a description of the algorithm.
- Please add a comment to the BlockChain class.
- Use a separate anonymous namespace per class, and don't indent for the
namespace.

+BlockChain *BlockPlacement2::CreateChain(MachineBasicBlock *BB) {
+  Chains.push_back(BlockChain(BlockToChain, BB));
+  BlockToChain[BB] = &Chains.back();
+  assert(ActiveChains.insert(&Chains.back()));
+  return &Chains.back();
+}

Whoa, you are storing pointers into a growing vector. You should at least
assert(Chains.size() != Chains.capacity()) before pushing.

+      RequiredChain->BBEnd = ++MachineFunction::iterator(From);

Does C++ allow this these days? I would prefer
llvm::next(MachineFunction::iterator(From))

Note that MachineFunction::iterator decays into an MBB pointer, so you can say
FI->canFallThrough() and AnalyzeBranch(*FI...)

+      WeightedEdge WE = { BaseFrequency * MBPI->getEdgeProbability(From,
To),

Did we add API for this computation? We intended to add a function that computes
this and saturates on overflow etc.
> Still, for the test cases that don't tickle the iteration bug, it
generates beautiful, well laid out code. =]
I am sure others have strong opinions about the algorithm ;-)

I would like to see a more explicit handling of loop layout. There is a tradeoff
between entering the loop with a branch, and having the exit block last. It is
not clear to me that this algorithm handles that properly.

Be careful about placing too much trust in the behavior of branch probabilities.
They go out of whack when they saturate.

Second, you should handle blocks that can never fall through (indirect branches
from jump tables). There is no need to try to place their successors.

/jakob

Evan Cheng

2011-Oct-20 01:15 UTC

head link

[LLVMdev] Question regarding basic-block placement optimization

On Oct 19, 2011, at 7:56 AM, Jakob Stoklund Olesen wrote:
> 
> On Oct 19, 2011, at 5:50 AM, Chandler Carruth wrote:
> 
>> Ok, wow that wasn't hard at all.
> 
> Awesome ;-)
> 
>> This is still *very* much a rough draft, but it's probably better
to review that the previous patch. One big caveat, I know I have an iteration
bug in here somewhere that is inf-looping. Just ran out of steam debugging it,
will pick it back up again later today to shake it out.
> 
> Maybe splicing a block into it current position will create a loop?
> 
> Some random notes:
> 
> - Please add a description of the algorithm.
> - Please add a comment to the BlockChain class.
> - Use a separate anonymous namespace per class, and don't indent for
the namespace.
> 
> +BlockChain *BlockPlacement2::CreateChain(MachineBasicBlock *BB) {
> +  Chains.push_back(BlockChain(BlockToChain, BB));
> +  BlockToChain[BB] = &Chains.back();
> +  assert(ActiveChains.insert(&Chains.back()));
> +  return &Chains.back();
> +}
> 
> Whoa, you are storing pointers into a growing vector. You should at least
assert(Chains.size() != Chains.capacity()) before pushing.
> 
> +      RequiredChain->BBEnd = ++MachineFunction::iterator(From);
> 
> Does C++ allow this these days? I would prefer
llvm::next(MachineFunction::iterator(From))
> 
> Note that MachineFunction::iterator decays into an MBB pointer, so you can
say FI->canFallThrough() and AnalyzeBranch(*FI...)
> 
> +      WeightedEdge WE = { BaseFrequency *
MBPI->getEdgeProbability(From, To),
> 
> Did we add API for this computation? We intended to add a function that
computes this and saturates on overflow etc.
> 
>> Still, for the test cases that don't tickle the iteration bug, it
generates beautiful, well laid out code. =]
> 
> I am sure others have strong opinions about the algorithm ;-)
> 
> I would like to see a more explicit handling of loop layout. There is a
tradeoff between entering the loop with a branch, and having the exit block
last. It is not clear to me that this algorithm handles that properly.
I agree. The current CodePlacementOpt pass handles blocks layout within a loop.
The new BB placement optimization should be integrated with it.

Evan
> 
> Be careful about placing too much trust in the behavior of branch
probabilities. They go out of whack when they saturate.
> 
> Second, you should handle blocks that can never fall through (indirect
branches from jump tables). There is no need to try to place their successors.
> 
> /jakob
> 
> 
> 
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Andrew Trick

2011-Oct-20 01:48 UTC

head link

[LLVMdev] Question regarding basic-block placement optimization

On Oct 19, 2011, at 7:56 AM, Jakob Stoklund Olesen
wrote:>> This is still *very* much a rough draft, but it's probably better
to review that the previous patch. One big caveat, I know I have an iteration
bug in here somewhere that is inf-looping. Just ran out of steam debugging it,
will pick it back up again later today to shake it out.
> 
> Some random notes:
> 
> - Please add a description of the algorithm.
> - Please add a comment to the BlockChain class.
> - Use a separate anonymous namespace per class, and don't indent for
the namespace.
> 
...> Note that MachineFunction::iterator decays into an MBB pointer, so you can
say FI->canFallThrough() and AnalyzeBranch(*FI...)
> 
> +      WeightedEdge WE = { BaseFrequency *
MBPI->getEdgeProbability(From, To),
> 
> Did we add API for this computation? We intended to add a function that
computes this and saturates on overflow etc.
Overflow is handled transparently in the overloaded
BlockFrequency::operator*(BranchProbability). But you could use the existing API
anyway by adding a helper to MachineBlockFrequencyInfo calling
BlockFrequencyImpl::getEdgeFreq(Src,Dst).
>> Still, for the test cases that don't tickle the iteration bug, it
generates beautiful, well laid out code. =]
> 
> I am sure others have strong opinions about the algorithm ;-)
I may be one of those others ;-). Although handling SCCs (loops) probably saves
Chandler's design from going too far off the rails. I don't remember
that being in the published algorithm.
> I would like to see a more explicit handling of loop layout. There is a
tradeoff between entering the loop with a branch, and having the exit block
last. It is not clear to me that this algorithm handles that properly.
How do you ensure that BlockChains start at loop headers?

How do you ensure that loop exits are laid out following the loop body,
especially in the presence of critical edges?

Why not layout blocks according to MachineLoopInfo? The SCC abstraction seems
overly generic and unnecessary.

When you merge chains, why don't you delete the edge between the chains?

Why do you run profile guided block layout after the existing CodePlacementOpt?
Shouldn't it be the other way around so that CodePlacementOpt can cleanup
loops, which BlockPlacement2 doesn't handle well? I think the answer is that
BlockPlacement2 actually depends on loops being laid out sensibly before
running, but that needs to be explained.
> 
> Be careful about placing too much trust in the behavior of branch
probabilities. They go out of whack when they saturate.
> 
> Second, you should handle blocks that can never fall through (indirect
branches from jump tables). There is no need to try to place their successors.
Please put this under some flag for now. Enabling it by default opens up a whole
set of issues that are premature to discuss. Until the overall code layout
strategy is more coherent, people will only want to enable profile guided layout
when they have complete profile info, or really need to take full advantage of
builtin expect. If you're not one of those people, you'll want to leave
this disabled to avoid unnecessary misery when debugging generated disassembly,
and reduce the number of places that codegen makes semi-random decisions that
perturb the code.

Otherwise, the code looks nice! Good comments.

-Andy

Chandler Carruth

2011-Oct-20 16:56 UTC

head link

[LLVMdev] Question regarding basic-block placement optimization

Thanks for all of the comments Andy and Jakob. A new patch is attached that
is *much* less of a rough draft. Sorry for any confusion due to the early
state of the patch.

Also, many many thanks to Jakob for his explanation of how the branches
between MBBs should be handled, and Andy, Jim, and Eric who helped answer my
questions on IRC when I ran into stumbling blocks. Also, Nick, who shoulder
surfed and helped with a lot of the tricky debugging. The previous patch had
some... very annoying to track down bugs. =]

Still, I never intended this to be on-by-default at first. This is a
starting point, that I hope can be improved into something that is
on-by-default eventually, but I'll be the first to say it's almost
certainly
not ready for that just yet.

I'm more hopeful that we can delete the existing block placement pass, and
direct anyone interested in profile file guided stuff to write a simple pass
to load profiles into metadata. I suspect this pass is already superior to
that one.

Responses to some comments inline:

On Wed, Oct 19, 2011 at 6:48 PM, Andrew Trick <atrick at apple.com> wrote:
> On Oct 19, 2011, at 7:56 AM, Jakob Stoklund Olesen wrote:
> >> This is still *very* much a rough draft, but it's probably
better to
> review that the previous patch. One big caveat, I know I have an iteration
> bug in here somewhere that is inf-looping. Just ran out of steam debugging
> it, will pick it back up again later today to shake it out.
> >
> > Some random notes:
> >
> > - Please add a description of the algorithm.
>
There is more described now in various comments, but I'm going to work on an
overview in the file comments. Not done yet, but wanted to get it back out
for review in a more polished state.

> > - Please add a comment to the BlockChain class.
>
Done.

> > - Use a separate anonymous namespace per class, and don't indent
for the
> namespace.
>
Done.

> >
> ...
> > Note that MachineFunction::iterator decays into an MBB pointer, so you
> can say FI->canFallThrough() and AnalyzeBranch(*FI...)
>
I'm aware, but there were a few weird places where it didn't happen, and
'From' seemed like a more readable name anyways... Lemme know if
you'd
really rather I use the iterator everywhere.

> >
> > +      WeightedEdge WE = { BaseFrequency *
MBPI->getEdgeProbability(From,
> To),
> >
> > Did we add API for this computation? We intended to add a function
that
> computes this and saturates on overflow etc.
>
> Overflow is handled transparently in the overloaded
> BlockFrequency::operator*(BranchProbability). But you could use the
existing
> API anyway by adding a helper to MachineBlockFrequencyInfo calling
> BlockFrequencyImpl::getEdgeFreq(Src,Dst).
>
Yea, this should be handled by the BlockFrequency object correctly here, but
if you'd rather see the helper I can add that. Was just keeping this patch
more focused on the pass.

> >> Still, for the test cases that don't tickle the iteration bug,
it
> generates beautiful, well laid out code. =]
> >
> > I am sure others have strong opinions about the algorithm ;-)
>
> I may be one of those others ;-). Although handling SCCs (loops) probably
> saves Chandler's design from going too far off the rails. I don't
remember
> that being in the published algorithm.
>
It definitely isn't. The published algorithm is very hand-wave-y about
exactly how you select between chains when there isn't an obvious ordering.

> > I would like to see a more explicit handling of loop layout. There is
a
> tradeoff between entering the loop with a branch, and having the exit block
> last. It is not clear to me that this algorithm handles that properly.
>
Keep in mind that the SCCs are across block *chains*. Most of the loops I
saw formed a single chain with back edges, and so would be laid out exactly
as they came into this pass, modulo fallthrough branches that for some
reason we have probability information that says aren't likely to be taken.
That said, I haven't done a thorough investigation of this... See below...

>
> How do you ensure that BlockChains start at loop headers?
>
> How do you ensure that loop exits are laid out following the loop body,
> especially in the presence of critical edges?
>
These are really interesting questions. I'm not sure how to answer them. Is
this something we could continue to iterate on? Maybe you could point me at
how to dig into these issues?

> Why not layout blocks according to MachineLoopInfo? The SCC abstraction
> seems overly generic and unnecessary.
>
I'm not sure -- the SCCs here are often at a much coarser granualarity than
those in the MBB graph. If anything, it might be good to use MachineLoopInfo
to force loops into a single pre-laid-out chain; but I haven't looked at
that pass yet. I'm interested in adding this kind of special and careful
logic to the pass though, and making it much more loop-structure-aware.

> When you merge chains, why don't you delete the edge between the
chains?
>
No good reason. The edges were a wart on the entire design. They're gone
now, and we just use the BB -> successor list -> block-to-chain mapping
sequence to deduce edges when needed.

Why do you run profile guided block layout after the
existing> CodePlacementOpt? Shouldn't it be the other way around so that
> CodePlacementOpt can cleanup loops, which BlockPlacement2 doesn't
handle
> well?

Yep, I just slapped it in there for testing. I've put it immediately before
the CodePlacementOpt pass, but I know very little about the other passes.
Let me know if there is a better home for it.

> I think the answer is that BlockPlacement2 actually depends on loops being
> laid out sensibly before running, but that needs to be explained.
>
Essentially, yes. How should we document it? Does that change if we teach it
to explicitly use some of the loop analysis passes?
> Be careful about placing too much trust in the behavior of branch
> probabilities. They go out of whack when they saturate.
> >
> > Second, you should handle blocks that can never fall through (indirect
> branches from jump tables). There is no need to try to place their
> successors.
>
> Please put this under some flag for now. Enabling it by default opens up a
> whole set of issues that are premature to discuss. Until the overall code
> layout strategy is more coherent, people will only want to enable profile
> guided layout when they have complete profile info, or really need to take
> full advantage of builtin expect. If you're not one of those people,
you'll
> want to leave this disabled to avoid unnecessary misery when debugging
> generated disassembly, and reduce the number of places that codegen makes
> semi-random decisions that perturb the code.
>
Couldn't agree more. The goal is to get something started.

> Otherwise, the code looks nice! Good comments.

Thanks! Please let me know both what needs to be done to get this first
version checked in, and what the important next steps are.

On my end, I'm working on getting some benchmark numbers next.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111020/8501a41a/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: block_placement_codegen.patch
Type: application/octet-stream
Size: 29933 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111020/8501a41a/attachment.obj>

Apparently Analagous Threads

Search for more apparently analagous threads

llvm dev - Oct 2011 - [LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

[LLVMdev] Question regarding basic-block placement optimization

Apparently Analagous Threads