thr3ads.net - llvm dev - [LLVMdev] [Polly] Move Polly's execution later [Sep 2013]

If this information is useful, please help other people find it:
Share via:

Star Tan

2013-Sep-22 05:02 UTC

[LLVMdev] [Polly] Move Polly's execution later

Hi Tobias,
At 2013-09-19 22:59:25,"Tobias Grosser" <tobias at grosser.es>
wrote:
>On 09/19/2013 04:46 PM, Star Tan wrote:
>> Hi Tobias,
>>
>>
>> I am trying to move Polly later.
>>
>>
>> LLVM provides some predefined ExtensionPointTy:
>>      EP_EarlyAsPossible,
>>      EP_ModuleOptimizerEarly,
>>      EP_LoopOptimizerEnd,
>>      EP_ScalarOptimizerLate,
>>      ...
>>
>>
>> Currently Polly uses "EP_EarlyAsPossible" to run as early as
possible.  As what you suggested:
>>> Instead of removing canonicalization passes, I believe we may want
to
>>> move Polly to a later place in the pass manager. Possibly at the
>>> beginning of the loop optimizer right before
PM.add(createLoopRotatePass());
>> I want to move it to the point immediate after someone Loop
optimization pass, e.g. MPM.add(createLoopRotatePass()).  However no predefined
ExtensionPointTy is available for this purpose. Instead, the
"EP_ModuleOptimizerEarly" would move Polly before all loop
optimization passes.
>>
>>
>> In my option, there are two solutions: one is to use
"EP_ModuleOptimizerEarly" (only modify the
tool/polly/lib/RegisterPasses.cpp) to move Polly before all loop optimization
passes; the other is to add a new ExtensionPointTy, e.g.
"EP_LoopRotateEnd" and move Polly exactly immediate after the
"LoopRotate" pass (need to modify tool/polly/lib/RegisterPasses.cpp,
include/llvm/Transforms/IPO/PassManagerBuilder.h and
lib/Transforms/IPO/PassManagerBuilder.cpp). We can use the second way to
investigate other points to start Polly.
>>
>> Is my understanding correct? Do you have any further suggestion?
>
>Yes, fully correct. I would play with solution two. Why would you add 
>Polly after the loop rotate pass?Forgot it. I just wanted to give an immature example to show how to move Polly
to other points.>I would possibly add it at the beginning of the loop optimizer (right 
>before the loop rotation pass) possibly experimenting with a new 
>extension point called EP_LoopOptimizerBegin.
>Thanks for your advice. I have tried to move Polly immediately before the loop
rotation pass. The temporary patch file can be reached on
http://llvm.org/bugs/attachment.cgi?id=11262.
First of all, I evaluated Polly's canonicalization passes based on this
patch file. Results are posted on http://188.40.87.11:8000, which contains two
new runs:
PollyNoGen_loop (run 56): move polly later but keep all polly canonicalization
passes;
PollyNoGen_loop_nocan(run57): move polly later and delete all polly
canonicalization passes;
Comparison is shown on 
http://188.40.87.11:8000/db_default/v4/nts/57?compare_to=56&baseline=56. It
seems that even we move polly later, those canonicalization passes still have
significant impact on the final execution-time performance.
A very bad news is that Polly would produce wrong code if we move Polly right
immediately before the loop rotate pass. Many benchmarks in LLVM testsuite would
fail in execution. I constructed a relatively simple testcase for this bug as
shown on http://llvm.org/bugs/show_bug.cgi?id=17323. The key problem is Polly
would produce wrong basic blocks like this:
polly.loop_header27:                              ; preds = %vector.body,
%polly.loop_header27
  br label %polly.loop_header27
For your information, the source code is posted on
http://llvm.org/bugs/attachment.cgi?id=11263 and the LLVM IR code is posted on
http://llvm.org/bugs/attachment.cgi?id=11264
I have not yet found out why Polly produces such wrong code.  However, maybe I
should also investigate other points where we can move Polly. Do you have any
suggestion?
Best,
Star Tan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130922/eda59b06/attachment.html>

Star Tan

2013-Sep-25 02:55 UTC

head link

[LLVMdev] [Polly] Move Polly's execution later

Here is an update about moving Polly later.

1. Why does Polly generate incorrect code when we move Polly immediately after
the loop rotating pass?

It is mainly caused by a wrong polly merge block. When Polly detects a valid
loop for Polyhedral transformations, it usually introduces a new basic block
"polly.merge_new_and_old" after the original loop exit block. This new
basic block contains a branch instruction that decides whether to exit or
re-enter the loop like this:

polly.loop_exit28:                                ; preds =
%polly.stmt.for.body.i, %polly.loop_if25
  br label %polly.merge_new_and_old
polly.merge_new_and_old:                          ; preds = %polly.loop_exit28,
%for.body.i
  ...
  %cmp = icmp slt i32 %2, 0
  ...
  br i1 %cmp, label %for.body, label %for.end12
for.end12:                                        ; preds = %loop.exit
  ...
  ret i32 0

Unfortunately, the new basic block "polly.merge_new_and_old" is marked
as unreachable after a serial of immediate Loop optimizations in the following
pass order:
          Polly - Create LLVM-IR from SCoPs
        Loop Pass Manager
          Canonicalize natural loops
          Loop-Closed SSA Form Pass               
          Rotate Loops
          Loop Invariant Code Motion
          Loop-Closed SSA Form Pass
          Unswitch loops
        Combine redundant instructions

After those loop optimition passes, the "Combine redundant
instructions" would treat the basic block "merge_new_and_old"
unreachable and transfer the block "polly.loop_exit28" into an
infinity loop:
polly.loop_exit28:                                ; preds =
%polly.stmt.for.body.i, %polly.loop_if25
  br label polly.loop_exit28

Actually, I have no idea why this happens, but experiments show that this
problem can be addressed by adding a pass
"MPM.add(createCFGSimplificationPass())" after Polly. It seems it is
necessary to run this pass after Polly code generation.

2. Where should we  move Polly to?

There are many choices to move Polly later. Tobias suggests to move it
immediately after the loop rotate pass (the first loop optimization pass), but
unfortunately Polly would generate incorrect code in that case
(http://llvm.org/bugs/show_bug.cgi?id=17323).

By investigating those Polly canonicalization passes
(polly/lib/RegisterPasses.cpp):
    PM.add(llvm::createPromoteMemoryToRegisterPass());
    PM.add(llvm::createInstructionCombiningPass());
    PM.add(llvm::createCFGSimplificationPass());
    PM.add(llvm::createTailCallEliminationPass());
    PM.add(llvm::createCFGSimplificationPass());
    PM.add(llvm::createReassociatePass());
    PM.add(llvm::createLoopRotatePass());
    PM.add(llvm::createInstructionCombiningPass());

We can see that some of them are also called in common cases
(lib/Transforms/IPO/PassManagerBuilder.cpp:181):

  MPM.add(createEarlyCSEPass());
  MPM.add(createJumpThreadingPass());
  MPM.add(createCorrelatedValuePropagationPass());
  MPM.add(createCFGSimplificationPass());     // also called in Polly
  MPM.add(createInstructionCombiningPass());  // also called in Polly
  MPM.add(createTailCallEliminationPass());   // also called in Polly
  MPM.add(createCFGSimplificationPass());     // also called in Polly
  MPM.add(createReassociatePass());           // also called in Polly
  MPM.add(createLoopRotatePass());            // also called in Polly
  MPM.add(createLICMPass());
  MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3));

The initial idea is to move Polly immediately after LoopRotatePass, which will
maximum the reuse of canonicalization passes and we need only keep one
canonicalization pass "createPromoteMemoryToRegisterPass" in Polly.
Unfortunately, it would lead to errors as shown in previous analysis (see bug
http://llvm.org/bugs/show_bug.cgi?id=17323). We need to run
"createCFGSimplificationPass" to ensure the correct code.

The second possible solution is to move Polly immediately before
"createCFGSimplificationPass", and then remove all Polly
canonicalization passes except the "createTailCallEliminationPass" and
"createCFGSimplificationPass"! Unfortunately, it would still trigger
an assert error when compiling some benchmarks in LLVM test-suite (see
http://llvm.org/bugs/show_bug.cgi?id=17351).

Similarly I also investigated other points to move Polly, such as before
"MPM.add(createTailCallEliminationPass())" or before
"MPM.add(createCorrelatedValuePropagationPass())", but all these cases
would trigger some unexpected compile error or execution error. I think we need
more efforts to investigate why Polly cannot be placed in these points.

At last, I also investigated the basic solution that move Polly immediately
before "CallGraph SCC passes" which is a very early stage in module
level optimizations. However, there are still some pre-executed basic passes
(DeadArgElimination/InstructionCombining/CFGSimplification),  which may reduce
Polly canonicalization passes . Preliminary testing shows no extra error happens
in LLVM test-suite and detailed evaluation is running. Hope we can get detailed
results after several hours's running.

Best,
Star Tan

At 2013-09-22 13:02:38,"Star Tan" <tanmx_star at yeah.net>
wrote: Hi Tobias,At 2013-09-19 22:59:25,"Tobias Grosser" <tobias at
grosser.es> wrote:>On 09/19/2013 04:46 PM, Star Tan
wrote:>> Hi Tobias,
>>
>>
>> I am trying to move Polly later.
>>
>>
>> LLVM provides some predefined ExtensionPointTy:
>>      EP_EarlyAsPossible,
>>      EP_ModuleOptimizerEarly,
>>      EP_LoopOptimizerEnd,
>>      EP_ScalarOptimizerLate,
>>      ...
>>
>>
>> Currently Polly uses "EP_EarlyAsPossible" to run as early as
possible.  As what you suggested:
>>> Instead of removing canonicalization passes, I believe we may want
to
>>> move Polly to a later place in the pass manager. Possibly at the
>>> beginning of the loop optimizer right before
PM.add(createLoopRotatePass());
>> I want to move it to the point immediate after someone Loop
optimization pass, e.g. MPM.add(createLoopRotatePass()).  However no predefined
ExtensionPointTy is available for this purpose. Instead, the
"EP_ModuleOptimizerEarly" would move Polly before all loop
optimization passes.
>>
>>
>> In my option, there are two solutions: one is to use
"EP_ModuleOptimizerEarly" (only modify the
tool/polly/lib/RegisterPasses.cpp) to move Polly before all loop optimization
passes; the other is to add a new ExtensionPointTy, e.g.
"EP_LoopRotateEnd" and move Polly exactly immediate after the
"LoopRotate" pass (need to modify tool/polly/lib/RegisterPasses.cpp,
include/llvm/Transforms/IPO/PassManagerBuilder.h and
lib/Transforms/IPO/PassManagerBuilder.cpp). We can use the second way to
investigate other points to start Polly.
>>
>> Is my understanding correct? Do you have any further suggestion?
>
>Yes, fully correct. I would play with solution two. Why would you add 
>Polly after the loop rotate pass?
>
>I would possibly add it at the beginning of the loop optimizer (right 
>before the loop rotation pass) possibly experimenting with a new 
>extension point called EP_LoopOptimizerBegin.
>
>Cheers,
>Tobias
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130925/ed6c476d/attachment.html>

Tobias Grosser

2013-Sep-25 10:03 UTC

head link

[LLVMdev] [Polly] Move Polly's execution later

On 09/25/2013 04:55 AM, Star Tan wrote:> Here is an update about moving Polly later.
Hi star tan,

thanks for your report.

>
> 1. Why does Polly generate incorrect code when we move Polly immediately
after the loop rotating pass?
>
> It is mainly caused by a wrong polly merge block. When Polly detects a
valid loop for Polyhedral transformations, it usually introduces a new basic
block "polly.merge_new_and_old" after the original loop exit block.
This new basic block contains a branch instruction that decides whether to exit
or re-enter the loop like this:
>
> polly.loop_exit28:                                ; preds =
%polly.stmt.for.body.i, %polly.loop_if25
>    br label %polly.merge_new_and_old
> polly.merge_new_and_old:                          ; preds =
%polly.loop_exit28, %for.body.i
>    ...
>    %cmp = icmp slt i32 %2, 0
>    ...
>    br i1 %cmp, label %for.body, label %for.end12
> for.end12:                                        ; preds = %loop.exit
>    ...
>    ret i32 0
>
> Unfortunately, the new basic block "polly.merge_new_and_old" is
marked as unreachable after a serial of immediate Loop optimizations in the
following pass order:
>            Polly - Create LLVM-IR from SCoPs
>          Loop Pass Manager
>            Canonicalize natural loops
>            Loop-Closed SSA Form Pass
>            Rotate Loops
>            Loop Invariant Code Motion
>            Loop-Closed SSA Form Pass
>            Unswitch loops
>          Combine redundant instructions
>
> After those loop optimition passes, the "Combine redundant
instructions" would treat the basic block "merge_new_and_old"
unreachable and transfer the block "polly.loop_exit28" into an
infinity loop:
> polly.loop_exit28:                                ; preds =
%polly.stmt.for.body.i, %polly.loop_if25
>    br label polly.loop_exit28
>
> Actually, I have no idea why this happens, but experiments show that this
problem can be addressed by adding a pass
"MPM.add(createCFGSimplificationPass())" after Polly. It seems it is
necessary to run this pass after Polly code generation.
Any pass order should yield correct code. If this is not the case there 
is either a bug in Polly or we expose a bug in LLVM. We should track 
this down.

Our path ordering decisions should not be driven by us trying to avoid 
triggering this bug.

As you already created a bug report, we should work on fixing it.
The next step would be to reduce the set of passes involved. You can do 
this by calling -debug-pass=Arguments

> 2. Where should we  move Polly to?
>
> There are many choices to move Polly later. Tobias suggests to move it
immediately after the loop rotate pass (the first loop optimization pass), but
unfortunately Polly would generate incorrect code in that case
(http://llvm.org/bugs/show_bug.cgi?id=17323).
>
> By investigating those Polly canonicalization passes
(polly/lib/RegisterPasses.cpp):
>      PM.add(llvm::createPromoteMemoryToRegisterPass());
>      PM.add(llvm::createInstructionCombiningPass());
>      PM.add(llvm::createCFGSimplificationPass());
>      PM.add(llvm::createTailCallEliminationPass());
>      PM.add(llvm::createCFGSimplificationPass());
>      PM.add(llvm::createReassociatePass());
>      PM.add(llvm::createLoopRotatePass());
>      PM.add(llvm::createInstructionCombiningPass());
>
> We can see that some of them are also called in common cases
(lib/Transforms/IPO/PassManagerBuilder.cpp:181):
>
>    MPM.add(createEarlyCSEPass());
>    MPM.add(createJumpThreadingPass());
>    MPM.add(createCorrelatedValuePropagationPass());
>    MPM.add(createCFGSimplificationPass());     // also called in Polly
>    MPM.add(createInstructionCombiningPass());  // also called in Polly
>    MPM.add(createTailCallEliminationPass());   // also called in Polly
>    MPM.add(createCFGSimplificationPass());     // also called in Polly
>    MPM.add(createReassociatePass());           // also called in Polly
>    MPM.add(createLoopRotatePass());            // also called in Polly
>    MPM.add(createLICMPass());
>    MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3));
>
> The initial idea is to move Polly immediately after LoopRotatePass, which
will maximum the reuse of canonicalization passes and we need only keep one
canonicalization pass "createPromoteMemoryToRegisterPass" in Polly.
Unfortunately, it would lead to errors as shown in previous analysis (see bug
http://llvm.org/bugs/show_bug.cgi?id=17323). We need to run
"createCFGSimplificationPass" to ensure the correct code.
In fact, I propose to move it _before_ the LoopRotate Pass. I think that 
is also what you tried, no?
> The second possible solution is to move Polly immediately before
"createCFGSimplificationPass", and then remove all Polly
canonicalization passes except the "createTailCallEliminationPass" and
"createCFGSimplificationPass"! Unfortunately, it would still trigger
an assert error when compiling some benchmarks in LLVM test-suite (see
http://llvm.org/bugs/show_bug.cgi?id=17351).
I think this is too early, as most of the canonicalization is not yet 
done. We probably don't need to investigate this bug immediately, but
it would be nice if we could make it reproducible without your changes 
to polly. For this please run the command with -debug-pass=Arguments
and replace the -O3 with the actual set of commands that is needed to 
trigger the bug. If you can reduce the set of commands using bugpoint, 
that would be even better.
> Similarly I also investigated other points to move Polly, such as before
"MPM.add(createTailCallEliminationPass())" or before
"MPM.add(createCorrelatedValuePropagationPass())", but all these cases
would trigger some unexpected compile error or execution error. I think we need
more efforts to investigate why Polly cannot be placed in these points.
This is unfortunate. I wonder if they all have the same root cause.

In any case, I would focus on moving Polly _before_ the loop rotate pass 
and would start with fixing the blocking bug.

Tobias

Sebastian Pop

2013-Sep-26 20:22 UTC

head link

[LLVMdev] [Polly] Move Polly's execution later

Star Tan wrote:> I have not yet found out why Polly produces such wrong code.
IMO this is because Polly does not recognize a given number of patterns that
occur after the scalar opts have transformed the code.  There are a given number
of assumptions that we took for granted and these do not hold down the pipeline.
> However, maybe I should also investigate other points where we can move
> Polly. Do you have any suggestion?
Yep, try to move Polly just after function inlining, that's better than the
current place, and will extend the loop coverage in many cases.

Sebastian
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Star Tan

2013-Sep-27 04:00 UTC

head link

[LLVMdev] [Polly] Move Polly's execution later

At 2013-09-27 04:22:13,"Sebastian Pop" <spop at
codeaurora.org> wrote:>Star Tan wrote:>> I have not yet found out why Polly produces such wrong code.
>
>IMO this is because Polly does not recognize a given number of patterns that
>occur after the scalar opts have transformed the code.  There are a given number
>of assumptions that we took for granted and these do not hold down the pipeline.
>
>> However, maybe I should also investigate other points where we can move
>> Polly. Do you have any suggestion?
>
>Yep, try to move Polly just after function inlining, that's better than the
>current place, and will extend the loop coverage in many cases.
>Do you mean we should move it  after "MPM.add(Inliner);"? 
Inliner pass is an early pass, so moving Polly after Inliner may not reduce the
Polly's canonicalization passes. On the contrary, it may make the Polly more
complex. However,  as what you said, moving Polly after the inlinerg pass
will extend the loop coverage and thus improve the performance of Polly! 
I'll try it.

Thanks for your suggestion,
Star Tan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130927/d4e63640/attachment.html>

Maybe Matching Threads

Search for more maybe matching threads

llvm dev - Sep 2013 - [LLVMdev] [Polly] Move Polly's execution later

[LLVMdev] [Polly] Move Polly's execution later

[LLVMdev] [Polly] Move Polly's execution later

[LLVMdev] [Polly] Move Polly's execution later

[LLVMdev] [Polly] Move Polly's execution later

[LLVMdev] [Polly] Move Polly's execution later

Maybe Matching Threads