Chen Li
2015-Jul-16 23:10 UTC
[LLVMdev] why LoopUnswitch pass does not constant fold conditional branch and merge blocks
Hi, I have a general question on LoopUnswtich pass. Consider the following IR snippet: define i32 @test(i1 %cond) { br label %loop_begin loop_begin: br i1 %cond, label %loop_body, label %loop_exit loop_body: br label %do_something do_something: call void @some_func() noreturn nounwind br label %loop_begin loop_exit: ret i32 0 } declare void @some_func() noreturn After running it through "opt -loop-unswitch -S", it unswitched loop on %cond and produced result like this: define i32 @test(i1 %cond) { br i1 %cond, label %..split_crit_edge, label %.loop_exit.split_crit_edge .loop_exit.split_crit_edge: ; preds = %0 br label %loop_exit.split ..split_crit_edge: ; preds = %0 br label %.split .split: ; preds = %..split_crit_edge br label %loop_begin loop_begin: ; preds = %do_something, %.split br i1 true, label %loop_body, label %loop_exit loop_body: ; preds = %loop_begin br label %do_something do_something: ; preds = %loop_body call void @some_func() #1 br label %loop_begin loop_exit: ; preds = %loop_begin br label %loop_exit.split loop_exit.split: ; preds = %.loop_exit.split_crit_edge, %loop_exit ret i32 0 } We see it did not constant fold "br i1 true, label %loop_body, label %loop_exit" and merge %loop_body into %loop_begin. My understanding is that later llvm passes (likely -simplifycfg) will cleanup the code properly and doing this in LoopUnswtich pass is duplicated. However, consider the following case: define i32 @test(i32* %var, i1 %cond1, i1 %cond2) { br label %loop_begin loop_begin: br i1 %cond1, label %continue, label %loop_exit continue: %var_val = load i32, i32* %var br i1 %cond2, label %do_something, label %loop_exit do_something: call void @some_func() noreturn nounwind br label %loop_begin loop_exit: ret i32 0 } Assume you don't have enough budget to do non-trivial loop unswtich (for test purpose, set -loop-unswitch-threshold=0 with opt), the result would be: define i32 @test(i32* %var, i1 %cond1, i1 %cond2) { br i1 %cond1, label %..split_crit_edge, label %.loop_exit.split_crit_edge .loop_exit.split_crit_edge: ; preds = %0 br label %loop_exit.split ..split_crit_edge: ; preds = %0 br label %.split .split: ; preds = %..split_crit_edge br label %loop_begin loop_begin: ; preds = %do_something, %.split br i1 true, label %continue, label %loop_exit continue: ; preds = %loop_begin %var_val = load i32, i32* %var br i1 %cond2, label %do_something, label %loop_exit do_something: ; preds = %continue call void @some_func() #1 br label %loop_begin loop_exit: ; preds = %continue, %loop_begin br label %loop_exit.split loop_exit.split: ; preds = %.loop_exit.split_crit_edge, %loop_exit ret i32 0 } The remaining loop (%loop_begin) in the result actually contains a trivial unswitch condition (on %cond2), which should always unswitch because there is no code growth. However, because the branch in loop header is not constant folded and %continue is not merged into %loop_begin, LoopUnswtich pass can not recongnize the trivial unswitch condition (in the current implementation, trivial unswitch condition must be loop header’s terminator). If this is the last LoopUnswtich pass in the optimization pipeline, we will miss the trivial unswitch opportunity. So would it be reasonable to add branch constant folding into LoopUnswtich pass for this kind of cases? thanks, chen
Philip Reames
2015-Jul-17 21:45 UTC
[LLVMdev] why LoopUnswitch pass does not constant fold conditional branch and merge blocks
Adding folding of constant branches to the unswitcher seems likes it would be a perfectly reasonable enhancement. For context, it's worth restating that the unswitcher is used in trivial-only mode in the standard O2 pipeline. We only enable non-trivial unswitch if -O3 is specified. The complication I'd watch out for is that folding such branches might eliminate the loop entirely. The unswitcher clearly already has to handle the case of "1 loop -> 1 loop + 1 non-loop", but I'm not sure it would correctly handle "1 loop -> 2 non-loops". In particular, I'm not sure what the interaction with the LoopPassManager is here. Taking a look at LoopDeletionPass might be a good source of information. Philip On 07/16/2015 04:10 PM, Chen Li wrote:> Hi, > > I have a general question on LoopUnswtich pass. > > Consider the following IR snippet: > > define i32 @test(i1 %cond) { > br label %loop_begin > > loop_begin: > br i1 %cond, label %loop_body, label %loop_exit > > loop_body: > br label %do_something > > do_something: > call void @some_func() noreturn nounwind > br label %loop_begin > > loop_exit: > ret i32 0 > } > declare void @some_func() noreturn > > After running it through "opt -loop-unswitch -S", it unswitched loop on %cond and produced result like this: > > define i32 @test(i1 %cond) { > br i1 %cond, label %..split_crit_edge, label %.loop_exit.split_crit_edge > > .loop_exit.split_crit_edge: ; preds = %0 > br label %loop_exit.split > > ..split_crit_edge: ; preds = %0 > br label %.split > > .split: ; preds = %..split_crit_edge > br label %loop_begin > > loop_begin: ; preds = %do_something, %.split > br i1 true, label %loop_body, label %loop_exit > > loop_body: ; preds = %loop_begin > br label %do_something > > do_something: ; preds = %loop_body > call void @some_func() #1 > br label %loop_begin > > loop_exit: ; preds = %loop_begin > br label %loop_exit.split > > loop_exit.split: ; preds = %.loop_exit.split_crit_edge, %loop_exit > ret i32 0 > } > > We see it did not constant fold "br i1 true, label %loop_body, label %loop_exit" and merge %loop_body into %loop_begin. > > My understanding is that later llvm passes (likely -simplifycfg) will cleanup the code properly and doing this in LoopUnswtich pass is duplicated. However, consider the following case: > > define i32 @test(i32* %var, i1 %cond1, i1 %cond2) { > br label %loop_begin > > loop_begin: > br i1 %cond1, label %continue, label %loop_exit > > continue: > %var_val = load i32, i32* %var > br i1 %cond2, label %do_something, label %loop_exit > > do_something: > call void @some_func() noreturn nounwind > br label %loop_begin > > loop_exit: > ret i32 0 > } > > Assume you don't have enough budget to do non-trivial loop unswtich (for test purpose, set -loop-unswitch-threshold=0 with opt), the result would be: > > define i32 @test(i32* %var, i1 %cond1, i1 %cond2) { > br i1 %cond1, label %..split_crit_edge, label %.loop_exit.split_crit_edge > > .loop_exit.split_crit_edge: ; preds = %0 > br label %loop_exit.split > > ..split_crit_edge: ; preds = %0 > br label %.split > > .split: ; preds = %..split_crit_edge > br label %loop_begin > > loop_begin: ; preds = %do_something, %.split > br i1 true, label %continue, label %loop_exit > > continue: ; preds = %loop_begin > %var_val = load i32, i32* %var > br i1 %cond2, label %do_something, label %loop_exit > > do_something: ; preds = %continue > call void @some_func() #1 > br label %loop_begin > > loop_exit: ; preds = %continue, %loop_begin > br label %loop_exit.split > > loop_exit.split: ; preds = %.loop_exit.split_crit_edge, %loop_exit > ret i32 0 > } > > The remaining loop (%loop_begin) in the result actually contains a trivial unswitch condition (on %cond2), which should always unswitch because there is no code growth. However, because the branch in loop header is not constant folded and %continue is not merged into %loop_begin, LoopUnswtich pass can not recongnize the trivial unswitch condition (in the current implementation, trivial unswitch condition must be loop header’s terminator). If this is the last LoopUnswtich pass in the optimization pipeline, we will miss the trivial unswitch opportunity. > > So would it be reasonable to add branch constant folding into LoopUnswtich pass for this kind of cases? > > thanks, > chen > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Philip Reames
2015-Jul-18 04:27 UTC
[LLVMdev] why LoopUnswitch pass does not constant fold conditional branch and merge blocks
Looking into this further, this may be as simple as extending the SimplifyCode function already used in LoopUnswitch. Philip> On Jul 17, 2015, at 2:45 PM, Philip Reames <listmail at philipreames.com> wrote: > > Adding folding of constant branches to the unswitcher seems likes it would be a perfectly reasonable enhancement. For context, it's worth restating that the unswitcher is used in trivial-only mode in the standard O2 pipeline. We only enable non-trivial unswitch if -O3 is specified. > > The complication I'd watch out for is that folding such branches might eliminate the loop entirely. The unswitcher clearly already has to handle the case of "1 loop -> 1 loop + 1 non-loop", but I'm not sure it would correctly handle "1 loop -> 2 non-loops". In particular, I'm not sure what the interaction with the LoopPassManager is here. Taking a look at LoopDeletionPass might be a good source of information. > > Philip > > > >> On 07/16/2015 04:10 PM, Chen Li wrote: >> Hi, >> >> I have a general question on LoopUnswtich pass. >> >> Consider the following IR snippet: >> >> define i32 @test(i1 %cond) { >> br label %loop_begin >> >> loop_begin: >> br i1 %cond, label %loop_body, label %loop_exit >> >> loop_body: >> br label %do_something >> >> do_something: >> call void @some_func() noreturn nounwind >> br label %loop_begin >> >> loop_exit: >> ret i32 0 >> } >> declare void @some_func() noreturn >> >> After running it through "opt -loop-unswitch -S", it unswitched loop on %cond and produced result like this: >> >> define i32 @test(i1 %cond) { >> br i1 %cond, label %..split_crit_edge, label %.loop_exit.split_crit_edge >> >> .loop_exit.split_crit_edge: ; preds = %0 >> br label %loop_exit.split >> >> ..split_crit_edge: ; preds = %0 >> br label %.split >> >> .split: ; preds = %..split_crit_edge >> br label %loop_begin >> >> loop_begin: ; preds = %do_something, %.split >> br i1 true, label %loop_body, label %loop_exit >> >> loop_body: ; preds = %loop_begin >> br label %do_something >> >> do_something: ; preds = %loop_body >> call void @some_func() #1 >> br label %loop_begin >> >> loop_exit: ; preds = %loop_begin >> br label %loop_exit.split >> >> loop_exit.split: ; preds = %.loop_exit.split_crit_edge, %loop_exit >> ret i32 0 >> } >> >> We see it did not constant fold "br i1 true, label %loop_body, label %loop_exit" and merge %loop_body into %loop_begin. >> >> My understanding is that later llvm passes (likely -simplifycfg) will cleanup the code properly and doing this in LoopUnswtich pass is duplicated. However, consider the following case: >> >> define i32 @test(i32* %var, i1 %cond1, i1 %cond2) { >> br label %loop_begin >> >> loop_begin: >> br i1 %cond1, label %continue, label %loop_exit >> >> continue: >> %var_val = load i32, i32* %var >> br i1 %cond2, label %do_something, label %loop_exit >> >> do_something: >> call void @some_func() noreturn nounwind >> br label %loop_begin >> >> loop_exit: >> ret i32 0 >> } >> >> Assume you don't have enough budget to do non-trivial loop unswtich (for test purpose, set -loop-unswitch-threshold=0 with opt), the result would be: >> >> define i32 @test(i32* %var, i1 %cond1, i1 %cond2) { >> br i1 %cond1, label %..split_crit_edge, label %.loop_exit.split_crit_edge >> >> .loop_exit.split_crit_edge: ; preds = %0 >> br label %loop_exit.split >> >> ..split_crit_edge: ; preds = %0 >> br label %.split >> >> .split: ; preds = %..split_crit_edge >> br label %loop_begin >> >> loop_begin: ; preds = %do_something, %.split >> br i1 true, label %continue, label %loop_exit >> >> continue: ; preds = %loop_begin >> %var_val = load i32, i32* %var >> br i1 %cond2, label %do_something, label %loop_exit >> >> do_something: ; preds = %continue >> call void @some_func() #1 >> br label %loop_begin >> >> loop_exit: ; preds = %continue, %loop_begin >> br label %loop_exit.split >> >> loop_exit.split: ; preds = %.loop_exit.split_crit_edge, %loop_exit >> ret i32 0 >> } >> >> The remaining loop (%loop_begin) in the result actually contains a trivial unswitch condition (on %cond2), which should always unswitch because there is no code growth. However, because the branch in loop header is not constant folded and %continue is not merged into %loop_begin, LoopUnswtich pass can not recongnize the trivial unswitch condition (in the current implementation, trivial unswitch condition must be loop header’s terminator). If this is the last LoopUnswtich pass in the optimization pipeline, we will miss the trivial unswitch opportunity. >> >> So would it be reasonable to add branch constant folding into LoopUnswtich pass for this kind of cases? >> >> thanks, >> chen >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Chen Li
2015-Jul-19 05:17 UTC
[LLVMdev] why LoopUnswitch pass does not constant fold conditional branch and merge blocks
> On Jul 17, 2015, at 2:45 PM, Philip Reames <listmail at philipreames.com> wrote: > > Adding folding of constant branches to the unswitcher seems likes it would be a perfectly reasonable enhancement. For context, it's worth restating that the unswitcher is used in trivial-only mode in the standard O2 pipeline. We only enable non-trivial unswitch if -O3 is specified. > > The complication I'd watch out for is that folding such branches might eliminate the loop entirely. The unswitcher clearly already has to handle the case of "1 loop -> 1 loop + 1 non-loop", but I'm not sure it would correctly handle "1 loop -> 2 non-loops". In particular, I'm not sure what the interaction with the LoopPassManager is here. Taking a look at LoopDeletionPass might be a good source of information.Thinking about it further, I don’t think it’s possible to eliminate the loop entirely. Loop unswtich applies on conditional branch and splits the original loop to 2 loops based on *true* condition and *false* condition of the branch (by *true* I mean the branch goes into the loop, and *false* to exit the loop). For trivial cases, folding the *false* condition will eliminate the loop, but folding the *true* condition will create an infinite loop with an unconditional branch to the loop header. So I don’t think this should be a problem for LoopPassManager. I will take a deep look at both LoopDeletionPass and SimplifyCode function in LoopUnswitch. thanks, chen> > Philip > > > > On 07/16/2015 04:10 PM, Chen Li wrote: >> Hi, >> >> I have a general question on LoopUnswtich pass. >> >> Consider the following IR snippet: >> >> define i32 @test(i1 %cond) { >> br label %loop_begin >> >> loop_begin: >> br i1 %cond, label %loop_body, label %loop_exit >> >> loop_body: >> br label %do_something >> >> do_something: >> call void @some_func() noreturn nounwind >> br label %loop_begin >> >> loop_exit: >> ret i32 0 >> } >> declare void @some_func() noreturn >> >> After running it through "opt -loop-unswitch -S", it unswitched loop on %cond and produced result like this: >> >> define i32 @test(i1 %cond) { >> br i1 %cond, label %..split_crit_edge, label %.loop_exit.split_crit_edge >> >> .loop_exit.split_crit_edge: ; preds = %0 >> br label %loop_exit.split >> >> ..split_crit_edge: ; preds = %0 >> br label %.split >> >> .split: ; preds = %..split_crit_edge >> br label %loop_begin >> >> loop_begin: ; preds = %do_something, %.split >> br i1 true, label %loop_body, label %loop_exit >> >> loop_body: ; preds = %loop_begin >> br label %do_something >> >> do_something: ; preds = %loop_body >> call void @some_func() #1 >> br label %loop_begin >> >> loop_exit: ; preds = %loop_begin >> br label %loop_exit.split >> >> loop_exit.split: ; preds = %.loop_exit.split_crit_edge, %loop_exit >> ret i32 0 >> } >> >> We see it did not constant fold "br i1 true, label %loop_body, label %loop_exit" and merge %loop_body into %loop_begin. >> >> My understanding is that later llvm passes (likely -simplifycfg) will cleanup the code properly and doing this in LoopUnswtich pass is duplicated. However, consider the following case: >> >> define i32 @test(i32* %var, i1 %cond1, i1 %cond2) { >> br label %loop_begin >> >> loop_begin: >> br i1 %cond1, label %continue, label %loop_exit >> >> continue: >> %var_val = load i32, i32* %var >> br i1 %cond2, label %do_something, label %loop_exit >> >> do_something: >> call void @some_func() noreturn nounwind >> br label %loop_begin >> >> loop_exit: >> ret i32 0 >> } >> >> Assume you don't have enough budget to do non-trivial loop unswtich (for test purpose, set -loop-unswitch-threshold=0 with opt), the result would be: >> >> define i32 @test(i32* %var, i1 %cond1, i1 %cond2) { >> br i1 %cond1, label %..split_crit_edge, label %.loop_exit.split_crit_edge >> >> .loop_exit.split_crit_edge: ; preds = %0 >> br label %loop_exit.split >> >> ..split_crit_edge: ; preds = %0 >> br label %.split >> >> .split: ; preds = %..split_crit_edge >> br label %loop_begin >> >> loop_begin: ; preds = %do_something, %.split >> br i1 true, label %continue, label %loop_exit >> >> continue: ; preds = %loop_begin >> %var_val = load i32, i32* %var >> br i1 %cond2, label %do_something, label %loop_exit >> >> do_something: ; preds = %continue >> call void @some_func() #1 >> br label %loop_begin >> >> loop_exit: ; preds = %continue, %loop_begin >> br label %loop_exit.split >> >> loop_exit.split: ; preds = %.loop_exit.split_crit_edge, %loop_exit >> ret i32 0 >> } >> >> The remaining loop (%loop_begin) in the result actually contains a trivial unswitch condition (on %cond2), which should always unswitch because there is no code growth. However, because the branch in loop header is not constant folded and %continue is not merged into %loop_begin, LoopUnswtich pass can not recongnize the trivial unswitch condition (in the current implementation, trivial unswitch condition must be loop header’s terminator). If this is the last LoopUnswtich pass in the optimization pipeline, we will miss the trivial unswitch opportunity. >> >> So would it be reasonable to add branch constant folding into LoopUnswtich pass for this kind of cases? >> >> thanks, >> chen >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Possibly Parallel Threads
- [LLVMdev] Interesting post increment situation in DAG combiner
- [LLVMdev] Interesting post increment situation in DAG combiner
- [LLVMdev] Interesting post increment situation in DAG combiner
- [LLVMdev] Support for per-loop pragma
- [LLVMdev] How to unroll reduction loop with caching accumulator on register?