Chandler Carruth
2011-Oct-18 22:07 UTC
[LLVMdev] Question regarding basic-block placement optimization
On Tue, Oct 18, 2011 at 2:59 PM, Cameron Zwarich <zwarich at apple.com> wrote:> On Oct 18, 2011, at 2:53 AM, Chandler Carruth wrote: > > > Hello, > > > > I'm working on basic-block placement optimizations based on branch > probability information. I've run into a stumbling block though. One of the > existing passes to do this, essentially a dead pass 'block-placement' > operates on the IR, reordering the blocks of the function, and relying on > the code generator to preserve that order unless it has a reason to > rearrange it. This pass is entirely untested AFAICT, and it doesn't work any > more... > > > > That said, I've replicated its functionality with a much updated > transformation algorithm (no longer O(N^2), and finds better orderings!) and > predicated it on the new frequency and probability information. It "works" > great, but it doesn't actually optimize anything. It re-arranges all the > blocks of the IR function, laying them out perfectly. And then the code > generation layer throws that away and seems to establish its own ordering > altogether. > > > > What's the best approach here? Is ignoring the order of blocks in the IR > function desired behavior? (It does seem reasonable, it just seems to have > not always been that way, so I wondered.) > > > > Should I sink my pass down to a codegen pass? > > I think this should really live as a CodeGen pass. Is there any good reason > to make it an IR pass?So, as it happens, I was *completely* wrong here. CodeGen correctly preserves the ordering of blocks from IR, *unless* it can do folding, etc. My original test cases ended up being folded into other structures, hiding the fact that the ordering wasn't what was changing. I've run much more real-world style test cases through the pass, and it is working beautifully. I'll be attaching a patch shortly, I'm currently cleaning it up based on some over-the-shoulder feedback from Nick Lewycky. As for why it should be an IR pass, mostly because once the selection dag runs through the code, we can never recover all of the freedom we have at the IR level. To start with, splicing MBBs around requires known about the terminators (which we only some of the time do), and it requires re-writing them a touch to account for the different fall-through pattern. To make matters worse, at this point we don't have the nicely analyzable 'switch' terminator (I think), and so the existing MBB placement code just bails on non-branch-exit blocks. Operating at the IR level makes writing the pass a breeze, and we can quickly and efficiently simply lay the blocks out in the ideal ordering. Then the SelectionDAG and other layers can preserve this modulo folding and other optimization opportunities. At the most, we should likely add probability awareness to some of these CodeGen passes to turn off (or on) their transforms around unlikely (or likely) edges. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111018/fb7f9bf7/attachment.html>
Cameron Zwarich
2011-Oct-18 23:18 UTC
[LLVMdev] Question regarding basic-block placement optimization
On Oct 18, 2011, at 3:07 PM, Chandler Carruth wrote:> As for why it should be an IR pass, mostly because once the selection dag runs through the code, we can never recover all of the freedom we have at the IR level. To start with, splicing MBBs around requires known about the terminators (which we only some of the time do), and it requires re-writing them a touch to account for the different fall-through pattern. To make matters worse, at this point we don't have the nicely analyzable 'switch' terminator (I think), and so the existing MBB placement code just bails on non-branch-exit blocks.We should be able to analyze more terminators in the backend. Most other compilers have this ability. If we don't fix this problem now, we will only increase the distance between the functionality of the backend IR and the mid-level IR.> Operating at the IR level makes writing the pass a breeze, and we can quickly and efficiently simply lay the blocks out in the ideal ordering. Then the SelectionDAG and other layers can preserve this modulo folding and other optimization opportunities.The biggest problem with doing block layout at the IR level is that you don't have the final CFG. Lots of passes can modify the CFG, and they will have to rely on the sorts of local heuristics that already exist (and are known to perform poorly in some cases) to redo block layout after these changes. You also can't represent jump fall-throughs in the mid-level IR (nor should you be able to). Cameron
Jakob Stoklund Olesen
2011-Oct-18 23:31 UTC
[LLVMdev] Question regarding basic-block placement optimization
On Oct 18, 2011, at 3:07 PM, Chandler Carruth wrote:> On Tue, Oct 18, 2011 at 2:59 PM, Cameron Zwarich <zwarich at apple.com> wrote: > I think this should really live as a CodeGen pass. Is there any good reason to make it an IR pass? > > So, as it happens, I was *completely* wrong here. CodeGen correctly preserves the ordering of blocks from IR, *unless* it can do folding, etc.That's right. However, the CFG changes quite a bit during CodeGen. Switches can be lowered into branch trees, multiple passes can split critical edges, and then there is taildup and tailmerge. An IR code layout algorithm simply doesn't know the final CFG.> As for why it should be an IR pass, mostly because once the selection dag runs through the code, we can never recover all of the freedom we have at the IR level. To start with, splicing MBBs around requires known about the terminators (which we only some of the time do), and it requires re-writing them a touch to account for the different fall-through pattern. To make matters worse, at this point we don't have the nicely analyzable 'switch' terminator (I think), and so the existing MBB placement code just bails on non-branch-exit blocks.Those are all the wrong reasons for not doing the right thing. Some basic blocks are glued together and must be placed next to each other. That situation can be recognized by "MBB->canFallThrough() && TII->AnalyzeBranch(MBB..)". Treat glued-together blocks as super-blocks, and everything should be as breezy as IR. I realize the MBB interface is not the prettiest. Suggestions for cleaning it up are very welcome. /jakob -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111018/f9832789/attachment.html>
Chandler Carruth
2011-Oct-19 00:22 UTC
[LLVMdev] Question regarding basic-block placement optimization
On Tue, Oct 18, 2011 at 4:31 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk>wrote:> > On Oct 18, 2011, at 3:07 PM, Chandler Carruth wrote: > > On Tue, Oct 18, 2011 at 2:59 PM, Cameron Zwarich <zwarich at apple.com>wrote: > >> I think this should really live as a CodeGen pass. Is there any good >> reason to make it an IR pass? >> > > So, as it happens, I was *completely* wrong here. CodeGen correctly > preserves the ordering of blocks from IR, *unless* it can do folding, etc. > > > That's right. However, the CFG changes quite a bit during CodeGen. > > Switches can be lowered into branch trees, multiple passes can split > critical edges, and then there is taildup and tailmerge. > > An IR code layout algorithm simply doesn't know the final CFG. >To be clear, I don't disagree with any of this. =] However, my rough experiments thus far (hoping to have real benchmark data soon) seem to indicate that just giving a baseline ordering of block to the CodeGen layer> > As for why it should be an IR pass, mostly because once the selection dag > runs through the code, we can never recover all of the freedom we have at > the IR level. To start with, splicing MBBs around requires known about the > terminators (which we only some of the time do), and it requires re-writing > them a touch to account for the different fall-through pattern. To make > matters worse, at this point we don't have the nicely analyzable 'switch' > terminator (I think), and so the existing MBB placement code just bails on > non-branch-exit blocks. > > > Those are all the wrong reasons for not doing the right thing. >Sorry, I'm not trying to do the wrong thing because of this... Currently, it feels like a trade-off in terms of cost/benefit. It's not yet clear to me that the benefit of doing this analysis in the CodeGen layer outweighs the cost and I was trying to clarify what the costs I perceive are. Some basic blocks are glued together and must be placed next to each other.> That situation can be recognized by "MBB->canFallThrough() && > TII->AnalyzeBranch(MBB..)". > > Treat glued-together blocks as super-blocks, and everything should be as > breezy as IR. >But that's just the thing -- a primary goal of this pass would be to *change* the fall-through pattern. Currently, that can be done very easily, although to a limited extent, by changing the IR which enters the selection dag. Maybe what we need is to have this pass at both layers? Then the codegen layer can work on the glued-together blocks to check for (and correct) any inappropriate CFG changes made in the intervening passes? Also, it's still not clear to me how to analyze switches in CodeGen, but that's likely my lack of having read the appropriate interfaces thoroughly. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111018/2163fe58/attachment.html>
Reasonably Related Threads
- [LLVMdev] Question regarding basic-block placement optimization
- [LLVMdev] Question regarding basic-block placement optimization
- [LLVMdev] Question regarding basic-block placement optimization
- [LLVMdev] Question regarding basic-block placement optimization
- [LLVMdev] [LLVMDev] Phi elimination: Who does what