On 09/25/2013 04:55 AM, Star Tan wrote:> Here is an update about moving Polly later.Hi star tan, thanks for your report.> > 1. Why does Polly generate incorrect code when we move Polly immediately after the loop rotating pass? > > It is mainly caused by a wrong polly merge block. When Polly detects a valid loop for Polyhedral transformations, it usually introduces a new basic block "polly.merge_new_and_old" after the original loop exit block. This new basic block contains a branch instruction that decides whether to exit or re-enter the loop like this: > > polly.loop_exit28: ; preds = %polly.stmt.for.body.i, %polly.loop_if25 > br label %polly.merge_new_and_old > polly.merge_new_and_old: ; preds = %polly.loop_exit28, %for.body.i > ... > %cmp = icmp slt i32 %2, 0 > ... > br i1 %cmp, label %for.body, label %for.end12 > for.end12: ; preds = %loop.exit > ... > ret i32 0 > > Unfortunately, the new basic block "polly.merge_new_and_old" is marked as unreachable after a serial of immediate Loop optimizations in the following pass order: > Polly - Create LLVM-IR from SCoPs > Loop Pass Manager > Canonicalize natural loops > Loop-Closed SSA Form Pass > Rotate Loops > Loop Invariant Code Motion > Loop-Closed SSA Form Pass > Unswitch loops > Combine redundant instructions > > After those loop optimition passes, the "Combine redundant instructions" would treat the basic block "merge_new_and_old" unreachable and transfer the block "polly.loop_exit28" into an infinity loop: > polly.loop_exit28: ; preds = %polly.stmt.for.body.i, %polly.loop_if25 > br label polly.loop_exit28 > > Actually, I have no idea why this happens, but experiments show that this problem can be addressed by adding a pass "MPM.add(createCFGSimplificationPass())" after Polly. It seems it is necessary to run this pass after Polly code generation.Any pass order should yield correct code. If this is not the case there is either a bug in Polly or we expose a bug in LLVM. We should track this down. Our path ordering decisions should not be driven by us trying to avoid triggering this bug. As you already created a bug report, we should work on fixing it. The next step would be to reduce the set of passes involved. You can do this by calling -debug-pass=Arguments> 2. Where should we move Polly to? > > There are many choices to move Polly later. Tobias suggests to move it immediately after the loop rotate pass (the first loop optimization pass), but unfortunately Polly would generate incorrect code in that case (http://llvm.org/bugs/show_bug.cgi?id=17323). > > By investigating those Polly canonicalization passes (polly/lib/RegisterPasses.cpp): > PM.add(llvm::createPromoteMemoryToRegisterPass()); > PM.add(llvm::createInstructionCombiningPass()); > PM.add(llvm::createCFGSimplificationPass()); > PM.add(llvm::createTailCallEliminationPass()); > PM.add(llvm::createCFGSimplificationPass()); > PM.add(llvm::createReassociatePass()); > PM.add(llvm::createLoopRotatePass()); > PM.add(llvm::createInstructionCombiningPass()); > > We can see that some of them are also called in common cases (lib/Transforms/IPO/PassManagerBuilder.cpp:181): > > MPM.add(createEarlyCSEPass()); > MPM.add(createJumpThreadingPass()); > MPM.add(createCorrelatedValuePropagationPass()); > MPM.add(createCFGSimplificationPass()); // also called in Polly > MPM.add(createInstructionCombiningPass()); // also called in Polly > MPM.add(createTailCallEliminationPass()); // also called in Polly > MPM.add(createCFGSimplificationPass()); // also called in Polly > MPM.add(createReassociatePass()); // also called in Polly > MPM.add(createLoopRotatePass()); // also called in Polly > MPM.add(createLICMPass()); > MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3)); > > The initial idea is to move Polly immediately after LoopRotatePass, which will maximum the reuse of canonicalization passes and we need only keep one canonicalization pass "createPromoteMemoryToRegisterPass" in Polly. Unfortunately, it would lead to errors as shown in previous analysis (see bug http://llvm.org/bugs/show_bug.cgi?id=17323). We need to run "createCFGSimplificationPass" to ensure the correct code.In fact, I propose to move it _before_ the LoopRotate Pass. I think that is also what you tried, no?> The second possible solution is to move Polly immediately before "createCFGSimplificationPass", and then remove all Polly canonicalization passes except the "createTailCallEliminationPass" and "createCFGSimplificationPass"! Unfortunately, it would still trigger an assert error when compiling some benchmarks in LLVM test-suite (see http://llvm.org/bugs/show_bug.cgi?id=17351).I think this is too early, as most of the canonicalization is not yet done. We probably don't need to investigate this bug immediately, but it would be nice if we could make it reproducible without your changes to polly. For this please run the command with -debug-pass=Arguments and replace the -O3 with the actual set of commands that is needed to trigger the bug. If you can reduce the set of commands using bugpoint, that would be even better.> Similarly I also investigated other points to move Polly, such as before "MPM.add(createTailCallEliminationPass())" or before "MPM.add(createCorrelatedValuePropagationPass())", but all these cases would trigger some unexpected compile error or execution error. I think we need more efforts to investigate why Polly cannot be placed in these points.This is unfortunate. I wonder if they all have the same root cause. In any case, I would focus on moving Polly _before_ the loop rotate pass and would start with fixing the blocking bug. Tobias
At 2013-09-25 18:03:18,"Tobias Grosser" <tobias at grosser.es> wrote:>On 09/25/2013 04:55 AM, Star Tan wrote:>> Here is an update about moving Polly later. > >Hi star tan, > >thanks for your report. > >> >> 1. Why does Polly generate incorrect code when we move Polly immediately after the loop rotating pass? >> >> It is mainly caused by a wrong polly merge block. When Polly detects a valid loop for Polyhedral transformations, it usually introduces a new basic block "polly.merge_new_and_old" after the original loop exit block. This new basic block contains a branch instruction that decides whether to exit or re-enter the loop like this: >> >> polly.loop_exit28: ; preds = %polly.stmt.for.body.i, %polly.loop_if25 >> br label %polly.merge_new_and_old >> polly.merge_new_and_old: ; preds = %polly.loop_exit28, %for.body.i >> ... >> %cmp = icmp slt i32 %2, 0 >> ... >> br i1 %cmp, label %for.body, label %for.end12 >> for.end12: ; preds = %loop.exit >> ... >> ret i32 0 >> >> Unfortunately, the new basic block "polly.merge_new_and_old" is marked as unreachable after a serial of immediate Loop optimizations in the following pass order: >> Polly - Create LLVM-IR from SCoPs >> Loop Pass Manager >> Canonicalize natural loops >> Loop-Closed SSA Form Pass >> Rotate Loops >> Loop Invariant Code Motion >> Loop-Closed SSA Form Pass >> Unswitch loops >> Combine redundant instructions >> >> After those loop optimition passes, the "Combine redundant instructions" would treat the basic block "merge_new_and_old" unreachable and transfer the block "polly.loop_exit28" into an infinity loop: >> polly.loop_exit28: ; preds = %polly.stmt.for.body.i, %polly.loop_if25 >> br label polly.loop_exit28 >> >> Actually, I have no idea why this happens, but experiments show that this problem can be addressed by adding a pass "MPM.add(createCFGSimplificationPass())" after Polly. It seems it is necessary to run this pass after Polly code generation. > >Any pass order should yield correct code. If this is not the case there >is either a bug in Polly or we expose a bug in LLVM. We should track >this down. > >Our path ordering decisions should not be driven by us trying to avoid >triggering this bug. > >As you already created a bug report, we should work on fixing it. >The next step would be to reduce the set of passes involved. You can do >this by calling -debug-pass=ArgumentsThanks for your suggestion. I intended to get some quick idea of how much compile/execution performance impact by moving Polly later. Of course, I would also work on fixing these bug.>> 2. Where should we move Polly to? >> >> There are many choices to move Polly later. Tobias suggests to move it immediately after the loop rotate pass (the first loop optimization pass), but unfortunately Polly would generate incorrect code in that case (http://llvm.org/bugs/show_bug.cgi?id=17323). >> >> By investigating those Polly canonicalization passes (polly/lib/RegisterPasses.cpp): >> PM.add(llvm::createPromoteMemoryToRegisterPass()); >> PM.add(llvm::createInstructionCombiningPass()); >> PM.add(llvm::createCFGSimplificationPass()); >> PM.add(llvm::createTailCallEliminationPass()); >> PM.add(llvm::createCFGSimplificationPass()); >> PM.add(llvm::createReassociatePass()); >> PM.add(llvm::createLoopRotatePass()); >> PM.add(llvm::createInstructionCombiningPass()); >> >> We can see that some of them are also called in common cases (lib/Transforms/IPO/PassManagerBuilder.cpp:181): >> >> MPM.add(createEarlyCSEPass()); >> MPM.add(createJumpThreadingPass()); >> MPM.add(createCorrelatedValuePropagationPass()); >> MPM.add(createCFGSimplificationPass()); // also called in Polly >> MPM.add(createInstructionCombiningPass()); // also called in Polly >> MPM.add(createTailCallEliminationPass()); // also called in Polly >> MPM.add(createCFGSimplificationPass()); // also called in Polly >> MPM.add(createReassociatePass()); // also called in Polly >> MPM.add(createLoopRotatePass()); // also called in Polly >> MPM.add(createLICMPass()); >> MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3)); >> >> The initial idea is to move Polly immediately after LoopRotatePass, which will maximum the reuse of canonicalization passes and we need only keep one canonicalization pass "createPromoteMemoryToRegisterPass" in Polly. Unfortunately, it would lead to errors as shown in previous analysis (see bug http://llvm.org/bugs/show_bug.cgi?id=17323). We need to run "createCFGSimplificationPass" to ensure the correct code. > >In fact, I propose to move it _before_ the LoopRotate Pass. I think that >is also what you tried, no?Yes, I actually meant _before_ the Loop Rotate pass. However, I wonder why you care about the LoopRotate pass so much? Is this pass very important to loop optimization? You know, it is also one of Polly's canonicalization passes, so in that case the LoopRotatePass would be called twice, one as Polly's canonicalization pass and the other as Loop canonicalization pass after Polly. It is just a discussion; I am still working on moving Polly before LoopRotatePass.>> The second possible solution is to move Polly immediately before "createCFGSimplificationPass", and then remove all Polly canonicalization passes except the "createTailCallEliminationPass" and "createCFGSimplificationPass"! Unfortunately, it would still trigger an assert error when compiling some benchmarks in LLVM test-suite (see http://llvm.org/bugs/show_bug.cgi?id=17351). > >I think this is too early, as most of the canonicalization is not yet >done. We probably don't need to investigate this bug immediately, but >it would be nice if we could make it reproducible without your changes >to polly. For this please run the command with -debug-pass=Arguments >and replace the -O3 with the actual set of commands that is needed to >trigger the bug. If you can reduce the set of commands using bugpoint, >that would be even better.I see. I will try to reproduce the bug in this way.>> Similarly I also investigated other points to move Polly, such as before "MPM.add(createTailCallEliminationPass())" or before "MPM.add(createCorrelatedValuePropagationPass())", but all these cases would trigger some unexpected compile error or execution error. I think we need more efforts to investigate why Polly cannot be placed in these points. > >This is unfortunate. I wonder if they all have the same root cause. > >In any case, I would focus on moving Polly _before_ the loop rotate pass >and would start with fixing the blocking bug. > >TobiasThanks for your suggestion. Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130925/643d299e/attachment.html>
At 2013-09-25 18:03:18,"Tobias Grosser" <tobias at grosser.es> wrote:>>I think this is too early, as most of the canonicalization is not yet >done. We probably don't need to investigate this bug immediately, but >it would be nice if we could make it reproducible without your changes >to polly. For this please run the command with -debug-pass=Arguments >and replace the -O3 with the actual set of commands that is needed to >trigger the bug. If you can reduce the set of commands using bugpoint, >that would be even better. >I am trying to figure out why "moving Polly before loop rotate pass would lead to incorrect code generation. With the help of "-debug-pass=Arguments", I have reproduced the bug without modify LLVM and Polly. For the attached testcase, it will produce incorrect code if we run a very simple command: $ opt -load LLVMPolly.so -basicaa -mem2reg -polly-codegen -loop-simplify -print-module foo.preopt.ll The output LLVM IR code would contain a exit basic block that is incorrectly marked as "unreachable". I have updated the bug17323 (http://llvm.org/bugs/show_bug.cgi?id=17323) for the further debugging. Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130930/3278805a/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: foo.preopt.ll Type: application/octet-stream Size: 5728 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130930/3278805a/attachment.obj>
On 09/30/2013 04:11 AM, Star Tan wrote:> At 2013-09-25 18:03:18,"Tobias Grosser" <tobias at grosser.es> wrote:> >> I think this is too early, as most of the canonicalization is not yet >> done. We probably don't need to investigate this bug immediately, but >> it would be nice if we could make it reproducible without your changes >> to polly. For this please run the command with -debug-pass=Arguments >> and replace the -O3 with the actual set of commands that is needed to >> trigger the bug. If you can reduce the set of commands using bugpoint, >> that would be even better. >> > > I am trying to figure out why "moving Polly before loop rotate pass would > lead to incorrect code generation. > > With the help of "-debug-pass=Arguments", I have reproduced the > bug without modify LLVM and Polly. For the attached testcase, it will produce > incorrect code if we run a very simple command: > $ opt -load LLVMPolly.so -basicaa -mem2reg -polly-codegen -loop-simplify -print-module foo.preopt.ll > > The output LLVM IR code would contain a exit basic block that is incorrectly > marked as "unreachable". > > I have updated the bug17323 (http://llvm.org/bugs/show_bug.cgi?id=17323) > for the further debugging.The problem is that at some point we failt to properly update the LoopInfo. I updated the bug report with more details. Tobias
Possibly Parallel Threads
- [LLVMdev] [Polly] Move Polly's execution later
- [LLVMdev] [Polly] Move Polly's execution later
- [LLVMdev] How to make Polly ignore some non-affine memory accesses
- [LLVMdev] How to make Polly ignore some non-affine memory accesses
- [LLVMdev] How to make Polly ignore some non-affine memory accesses