Philip Reames via llvm-dev
2016-Feb-24 01:19 UTC
[llvm-dev] Oddity w/MachineBlockPlacement and Loops
I'm getting some odd behavior out of MBP and was hoping someone knowledge of the code might be able to give some guidance. Fair warning, I'm trying to describe a problem in code I don't really understand, so if something doesn't make sense, assume I misunderstood something. The problematic case I'm seeing is that cold blocks are being placed between the preheader and header of a hot loop. This has the result of adding a bunch of cold code spread through out the code rather than grouped all together at the end of the function. From what I can tell tracing through the code, the critical decision that goes wrong is when we're visiting the preheader after forming a (correct) chain for the loop itself. When selecting a successor to merge with, we appear to not be considering the loop even though the loop hasn't been rotated and the header would be ideal for fallthrough. In particular, we're printing the "(prob) (non-cold CFG conflict)" debug output message for the successor of the preheader which is the header. If I'm reading this code correctly, it's identifying the fact there's a global more important predecessor for the header (i.e. the latch block), but it doesn't seem to be account for the fact that the latch block has already been combined into the header's chain. Unless I'm misreading something, we *should* be able to merge the loop chain with the preheader chain in it's entirety right? At least one on example I've looked at, adding an early exit from BadCFGConflict loop when Pred and Succ are part of the same chain does appear to give the expected result, but I don't understand the code well enough to reason about whether that is generally a correct thing to do or not. Philip
Xinliang David Li via llvm-dev
2016-Feb-24 05:48 UTC
[llvm-dev] Oddity w/MachineBlockPlacement and Loops
Hi Phillip, http://reviews.llvm.org/D10825 tries to fix similar issues. Looks like there are missing cases. Can you create a small reproducer? Cong has also improved loop rotation based on better cost model -- it is not yet enabled (-mllvm -precise-rotation-cost=true to turn on). If possible, can you also give it a try? thanks, David On Tue, Feb 23, 2016 at 5:19 PM, Philip Reames via llvm-dev < llvm-dev at lists.llvm.org> wrote:> I'm getting some odd behavior out of MBP and was hoping someone knowledge > of the code might be able to give some guidance. Fair warning, I'm trying > to describe a problem in code I don't really understand, so if something > doesn't make sense, assume I misunderstood something. > > The problematic case I'm seeing is that cold blocks are being placed > between the preheader and header of a hot loop. This has the result of > adding a bunch of cold code spread through out the code rather than grouped > all together at the end of the function. > > From what I can tell tracing through the code, the critical decision that > goes wrong is when we're visiting the preheader after forming a (correct) > chain for the loop itself. When selecting a successor to merge with, we > appear to not be considering the loop even though the loop hasn't been > rotated and the header would be ideal for fallthrough. > > In particular, we're printing the "(prob) (non-cold CFG conflict)" debug > output message for the successor of the preheader which is the header. If > I'm reading this code correctly, it's identifying the fact there's a global > more important predecessor for the header (i.e. the latch block), but it > doesn't seem to be account for the fact that the latch block has already > been combined into the header's chain. Unless I'm misreading something, we > *should* be able to merge the loop chain with the preheader chain in it's > entirety right? > > At least one on example I've looked at, adding an early exit from > BadCFGConflict loop when Pred and Succ are part of the same chain does > appear to give the expected result, but I don't understand the code well > enough to reason about whether that is generally a correct thing to do or > not. > > Philip > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160223/4ad06988/attachment.html>
John Criswell via llvm-dev
2016-Feb-24 15:47 UTC
[llvm-dev] Oddity w/MachineBlockPlacement and Loops
Dear Philip, On a related note, Rahman Lavaee has been working on optimizing code layout with LLVM. His approach uses a trace of the program, so it may not be applicable to what you're doing, but I thought I should let you know since it works on a similar problem. I believe his code is at https://github.com/rlavaee/code-layout-optimizer. Rahman, does this code do the function-level code layout, the basic block layout optimization, or both? Regards, John Criswell On 2/23/16 8:19 PM, Philip Reames via llvm-dev wrote:> I'm getting some odd behavior out of MBP and was hoping someone > knowledge of the code might be able to give some guidance. Fair > warning, I'm trying to describe a problem in code I don't really > understand, so if something doesn't make sense, assume I misunderstood > something. > > The problematic case I'm seeing is that cold blocks are being placed > between the preheader and header of a hot loop. This has the result > of adding a bunch of cold code spread through out the code rather than > grouped all together at the end of the function. > > From what I can tell tracing through the code, the critical decision > that goes wrong is when we're visiting the preheader after forming a > (correct) chain for the loop itself. When selecting a successor to > merge with, we appear to not be considering the loop even though the > loop hasn't been rotated and the header would be ideal for fallthrough. > > In particular, we're printing the "(prob) (non-cold CFG conflict)" > debug output message for the successor of the preheader which is the > header. If I'm reading this code correctly, it's identifying the fact > there's a global more important predecessor for the header (i.e. the > latch block), but it doesn't seem to be account for the fact that the > latch block has already been combined into the header's chain. Unless > I'm misreading something, we *should* be able to merge the loop chain > with the preheader chain in it's entirety right? > > At least one on example I've looked at, adding an early exit from > BadCFGConflict loop when Pred and Succ are part of the same chain does > appear to give the expected result, but I don't understand the code > well enough to reason about whether that is generally a correct thing > to do or not. > > Philip > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- John Criswell Assistant Professor Department of Computer Science, University of Rochester http://www.cs.rochester.edu/u/criswell
Philip Reames via llvm-dev
2016-Feb-24 17:21 UTC
[llvm-dev] Oddity w/MachineBlockPlacement and Loops
On 02/23/2016 09:48 PM, Xinliang David Li wrote:> Hi Phillip, > http://reviews.llvm.org/D10825 tries to fix similar issues. Looks like > there are missing cases. Can you create a small reproducer?This doesn't quite look the same. That review is about continuing the loop chain itself into the most profitable successor. My case is about continuing the preheader's chain into the (previously formed) loop chain.> > Cong has also improved loop rotation based on better cost model -- it > is not yet enabled (-mllvm -precise-rotation-cost=true to turn on). If > possible, can you also give it a try?Loop rotation is not the problem here. The layout of the loop (header->latch->header cycle) is exactly what it should be.> > thanks, > > David > > On Tue, Feb 23, 2016 at 5:19 PM, Philip Reames via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > I'm getting some odd behavior out of MBP and was hoping someone > knowledge of the code might be able to give some guidance. Fair > warning, I'm trying to describe a problem in code I don't really > understand, so if something doesn't make sense, assume I > misunderstood something. > > The problematic case I'm seeing is that cold blocks are being > placed between the preheader and header of a hot loop. This has > the result of adding a bunch of cold code spread through out the > code rather than grouped all together at the end of the function. > > From what I can tell tracing through the code, the critical > decision that goes wrong is when we're visiting the preheader > after forming a (correct) chain for the loop itself. When > selecting a successor to merge with, we appear to not be > considering the loop even though the loop hasn't been rotated and > the header would be ideal for fallthrough. > > In particular, we're printing the "(prob) (non-cold CFG conflict)" > debug output message for the successor of the preheader which is > the header. If I'm reading this code correctly, it's identifying > the fact there's a global more important predecessor for the > header (i.e. the latch block), but it doesn't seem to be account > for the fact that the latch block has already been combined into > the header's chain. Unless I'm misreading something, we *should* > be able to merge the loop chain with the preheader chain in it's > entirety right? > > At least one on example I've looked at, adding an early exit from > BadCFGConflict loop when Pred and Succ are part of the same chain > does appear to give the expected result, but I don't understand > the code well enough to reason about whether that is generally a > correct thing to do or not. > > Philip > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160224/7c47c8b5/attachment.html>
Philip Reames via llvm-dev
2016-Mar-02 23:10 UTC
[llvm-dev] Oddity w/MachineBlockPlacement and Loops
I've posted a patch which addresses this: http://reviews.llvm.org/D17830 On 02/23/2016 05:19 PM, Philip Reames wrote:> I'm getting some odd behavior out of MBP and was hoping someone > knowledge of the code might be able to give some guidance. Fair > warning, I'm trying to describe a problem in code I don't really > understand, so if something doesn't make sense, assume I misunderstood > something. > > The problematic case I'm seeing is that cold blocks are being placed > between the preheader and header of a hot loop. This has the result > of adding a bunch of cold code spread through out the code rather than > grouped all together at the end of the function. > > From what I can tell tracing through the code, the critical decision > that goes wrong is when we're visiting the preheader after forming a > (correct) chain for the loop itself. When selecting a successor to > merge with, we appear to not be considering the loop even though the > loop hasn't been rotated and the header would be ideal for fallthrough. > > In particular, we're printing the "(prob) (non-cold CFG conflict)" > debug output message for the successor of the preheader which is the > header. If I'm reading this code correctly, it's identifying the fact > there's a global more important predecessor for the header (i.e. the > latch block), but it doesn't seem to be account for the fact that the > latch block has already been combined into the header's chain. Unless > I'm misreading something, we *should* be able to merge the loop chain > with the preheader chain in it's entirety right? > > At least one on example I've looked at, adding an early exit from > BadCFGConflict loop when Pred and Succ are part of the same chain does > appear to give the expected result, but I don't understand the code > well enough to reason about whether that is generally a correct thing > to do or not. > > Philip >
Possibly Parallel Threads
- [LLVMdev] Separating loop nests based on profile information?
- [LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one
- [LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one
- [LLVMdev] Separating loop nests based on profile information?
- LoopSimplify pass prevents loop unrolling