Jingu Kang via llvm-dev
2021-Jul-20 16:45 UTC
[llvm-dev] Question about Unrolling Loop with Multiple Exits
Hi Philip, I have reduced the test case roughly as below. declare i32 @foo(i8 *) define void @test(i8* %s, i64 %a) { entry: %s.addr.a = getelementptr i8, i8* %s, i64 %a br label %while.body.us while.body.us: %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ] %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1 %incdec.val = load i8, i8* %s.addr, align 1 %cmp1 = icmp eq i8 %incdec.val, 10 br i1 %cmp1, label %if.end8.us, label %while.cond.backedge if.end8.us: %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr) %cmp2 = icmp ult i32 %call9, 0 br i1 %cmp2, label %while.cond.backedge, label %return.loopexit while.cond.backedge: %cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a br i1 %cmp3, label %return.loopexit, label %while.body.us return.loopexit: ret void } Roughly, I unrolled the loop manually 2 times as below and ignored the remaining loop simply. define void @test(i8* %s, i64 %a) { entry: %s.addr.a = getelementptr i8, i8* %s, i64 %a br label %while.body.us while.body.us: %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ] %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1 %incdec.val = load i8, i8* %s.addr, align 1 %cmp1 = icmp eq i8 %incdec.val, 10 br i1 %cmp1, label %if.end8.us, label %while.body.us.1 while.body.us.1: %incdec.ptr.1 = getelementptr inbounds i8, i8* %incdec.ptr, i64 1 %incdec.val.1 = load i8, i8* %incdec.ptr, align 1 %cmp1.1 = icmp eq i8 %incdec.val.1, 10 br i1 %cmp1.1, label %if.end8.us, label %while.cond.backedge if.end8.us: %incdec.ptr.phi = phi i8* [ %incdec.ptr, %while.body.us ], [ %incdec.ptr.1, %while.body.us.1 ] %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr.phi) %cmp2 = icmp ult i32 %call9, 0 br i1 %cmp2, label %while.cond.backedge, label %return.loopexit while.cond.backedge: %cmp3 = icmp eq i8* %incdec.ptr.1, %s.addr.a br i1 %cmp3, label %return.loopexit, label %while.body.us return.loopexit: ret void } If possible, can we make loop unroll pass handle this kind of cases please? If I missed something, please let me know. Thanks JinGu Kang From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip Reames via llvm-dev Sent: 16 July 2021 20:12 To: Jingu Kang <Jingu.Kang at arm.com> Cc: llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits Jingu, I'm not fluent in AArch64 assembly, sorry.� If you want more meaningful commentary, you'll need to reduce your c++ example further, and give a better explanation of what you expect the unroller to do for you.� Philip On 7/16/21 12:09 PM, Jingu Kang wrote: Sorry for poor example� � The AArch64 assembly output of the example from gcc is as below. The loop is unrolled 7 times. I have written some comments to explain how the assembly code is mapped to C source code. As you can see on `.L3` label, the �if (*s++ != '\n')� block is unrolled 7 times. � ������� stp���� x29, x30, [sp, -48]! ������� mov���� x29, sp ������� stp���� x19, x20, [sp, 16] ������� mov���� x19, x0� --> x19 is char *s ������� mov���� x20, x1� --> x20 is char *end ������� str���� x21, [sp, 32] ������� orr���� x21, x3, x2� --> x21 is check1 | check2 .L70: ������� sub���� x0, x20, x19� --> x0 = end - s; ������� add���� x1, x0, 1 ������� ands��� x2, x1, 7� --> unroll count is 7 ������� beq���� .L3���� --> .L3 is inside while loop. if (*s++ != '\n') ������� cmp���� x19, x20 �--> while(s <= end) ������� bhi���� .L2���� --> .L2 is label ret1. � ������� ldrb��� w3, [x19], 1� --> start of remainder ������� cmp���� w3, 10 ������� beq���� .L71 ������� cmp���� x2, 1 ������� beq���� .L3 ������� cmp���� x2, 2 ������� beq���� .L49 ������� cmp���� x2, 3 ������� beq���� .L50 ������� cmp���� x2, 4 ������� beq���� .L51 ������� cmp���� x2, 5 ������� beq���� .L52 ������� cmp���� x2, 6 ������� beq���� .L53 ������� ldrb��� w4, [x19], 1 ������� cmp���� w4, 10 ������� bne���� .L53 .L71: ������� cbz���� x21, .L4� --> if(check1 || check2) ������� bl����� foo() ������� mov���� x19, x0 ������� cbz���� x0, .L2�� --> if (!s) goto ret1; � .L4:��������������������� --> if(boo(s)) goto ret0; ������� mov���� x0, x19 ����� ��bl����� boo(char*) ������� cbz���� w0, .L70 ������� mov���� w0, 0 ������� ldp���� x19, x20, [sp, 16] ������� ldr���� x21, [sp, 32] ������� ldp���� x29, x30, [sp], 48 ������� ret .L53:��������� --> if (*s++ != '\n') for remainder ������� ldrb��� w5, [x19], 1 ������� cmp���� w5, 10 ������� beq���� .L71 .L52:��������� --> if (*s++ != '\n') for remainder ������� ldrb��� w6, [x19], 1 ������� cmp���� w6, 10 ������� beq���� .L71 .L51:��������� --> if (*s++ != '\n') for remainder ������� ldrb��� w7, [x19], 1 ������� cmp���� w7, 10 ������� beq���� .L71 .L50:��������� --> if (*s++ != '\n') for remainder ������� ldrb��� w8, [x19], 1 ������� cmp���� w8, 10 ������� beq���� .L71 .L49:��������� --> if (*s++ != '\n') for remainder ������� ldrb��� w9, [x19], 1 ������� cmp���� w9, 10 ������� beq���� .L71 .L3:������������������������� --> if (*s++ != '\n'), 7 times unrolled ������� cmp���� x19, x20 ������� bhi���� .L2 ������� ldrb��� w10, [x19] ������� add�� ��x19, x19, 1 ������� mov���� x11, x19 ������� cmp���� w10, 10 ������� beq���� .L71 ������� ldrb��� w12, [x19], 1 ������� cmp���� w12, 10 ������� beq���� .L71 ������� ldrb��� w13, [x11, 1] ������� add���� x19, x11, 2 ������� cmp���� w13, 10 ������� beq��� �.L71 ������� ldrb��� w14, [x11, 2] ������� add���� x19, x11, 3 ������� cmp���� w14, 10 ������� beq���� .L71 ������� ldrb��� w15, [x11, 3] ������� add���� x19, x11, 4 ������� cmp���� w15, 10 ������� beq���� .L71 ������� ldrb��� w16, [x11, 4] ������� add���� x19, x11, 5 ������� cmp���� w16, 10 ������� beq���� .L71 ������� ldrb��� w17, [x11, 5] ������� add���� x19, x11, 6 ������� cmp���� w17, 10 ������� beq���� .L71 ������� ldrb��� w18, [x11, 6] ������� add���� x19, x11, 7 ������� cmp���� w18, 10 ������� beq���� .L71 ������� b������ .L3 .L2:��������������������� --> label ret1 ������� mov���� w0, 1 ������� ldp���� x19, x20, [sp, 16] ������� ldr���� x21, [sp, 32] ������� ldp���� x29, x30, [sp], 48 ������� ret � I am sorry for showing you assembly output directly� It looks like the rtl level�s unroll pass of gcc unrolls above loop and I need to check it next week. Once I have clear idea about above unrolling from gcc, let me reduce the example more and let you know. � Thanks JinGu Kang � From: llvm-dev <llvm-dev-bounces at lists.llvm.org><mailto:llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip Reames via llvm-dev Sent: 16 July 2021 18:27 To: Jingu Kang <Jingu.Kang at arm.com><mailto:Jingu.Kang at arm.com> Cc: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits � � On 7/16/21 9:29 AM, Jingu Kang wrote: Hi Philip, Thanks for your kind reply.> A) Are you measuring on tip of tree?� There were changes for multiple exit unrolling which landed very recently.Yep, I am investigating benchmarks with llvm tip and I can see the llvm fails to unroll some loops with multiple exits.> B) One of your exits does not dominate your latch.� Those are generally hard > C) This example does not seem to require gotos.� I strongly suggest reducing your test cases if you want more informed commentary.�� I am looking at perlbench recently and it has `goto` statements inside loop. The example is a reduced case. Right, but the gotos aren't relevant for your reduced test.� You can reduce further. When I look at the gcc�s output of the example, it looks like gcc unrolls only the below `if` statement block� � ��� if (*s++ != '\n') ����� continue; Your phrasing here does not parse for me.� Can you restate this with different wording and maybe a fully worked example? � Thanks JinGu Kang � From: llvm-dev <llvm-dev-bounces at lists.llvm.org><mailto:llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip Reames via llvm-dev Sent: 16 July 2021 15:52 To: Jingu Kang <Jingu.Kang at arm.com><mailto:Jingu.Kang at arm.com>; llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits � A) Are you measuring on tip of tree?� There were changes for multiple exit unrolling which landed very recently. B) One of your exits does not dominate your latch.� Those are generally hard.� C) This example does not seem to require gotos.� I strongly suggest reducing your test cases if you want more informed commentary.� Philip On 7/16/21 7:42 AM, Jingu Kang via llvm-dev wrote: Hi All, � While I am investigating benchmarks, I have found loops which llvm fails to unroll because the loops have multiple exits. For example, � char *foo(void); int boo(char *s); � int test(char *s, char *end, char *check1, char *check2) { � while (s <= end) { ��� if (*s++ != '\n') ����� continue; ��� if (check1 || check2) { ����� s = foo(); ����� if (!s) ������� goto ret1; ��� }�� ����if (boo(s)) ����� goto ret0; � } � goto ret1; � ret0: � return 0; ret1: � return 1; } � Above code causes below messages from LoopUnroll pass. � Bailout for multi-exit handling when latch exit has >1 predecessor. Multiple exit/exiting blocks in loop and multi-exit unrolling not enabled! � I can see the option `unroll-runtime-multi-exit` and comments about it. I wonder there are already reviews for the work on phabriactor or some people are working on it. If someone knows information about it, please share it. � Thanks JinGu Kang � � � � � � _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210720/fc741b3b/attachment-0001.html>
Philip Reames via llvm-dev
2021-Jul-20 17:17 UTC
[llvm-dev] Question about Unrolling Loop with Multiple Exits
./opt -loop-unroll -enable-new-pm=0 < llvm-dev.ll -S -unroll-runtime -debug Args: ./opt -loop-unroll -enable-new-pm=0 -S -unroll-runtime -debug Loop Unroll: F[test] Loop %while.body.us Loop Size = 10 runtime unrolling with count: 8 Exiting block %if.end8.us: TripCount=0, TripMultiple=1, BreakoutTrip=1 Exiting block %while.cond.backedge: TripCount=0, TripMultiple=1, BreakoutTrip=1 Trying runtime unrolling on Loop: Loop at depth 1 containing: %while.body.us<header>,%if.end8.us<exiting>,%while.cond.backedge<latch><exiting> Using prolog remainder. Bailout for multi-exit handling when latch exit has >1 predecessor. Multiple exit/exiting blocks in loop and multi-exit unrolling not enabled! Won't unroll; remainder loop could not be generated when assuming runtime trip count ; ModuleID = '<stdin>' source_filename = "<stdin>" declare i32 @foo(i8*) define void @test(i8* %s, i64 %a) { entry: %s.addr.a = getelementptr i8, i8* %s, i64 %a br label %while.body.us while.body.us: ; preds = %while.cond.backedge, %entry %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ] %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1 %incdec.val = load i8, i8* %s.addr, align 1 %cmp1 = icmp eq i8 %incdec.val, 10 br i1 %cmp1, label %if.end8.us, label %while.cond.backedge if.end8.us: ; preds = %while.body.us %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr) %cmp2 = icmp ult i32 %call9, 0 br i1 %cmp2, label %while.cond.backedge, label %return.loopexit while.cond.backedge: ; preds = %if.end8.us, %while.body.us %cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a br i1 %cmp3, label %return.loopexit, label %while.body.us return.loopexit: ; preds = %while.cond.backedge, %if.end8.us ret void } The debug output appears to give a pretty good clue as to where you could start if desired. Philip On 7/20/21 9:45 AM, Jingu Kang wrote:> > Hi Philip, > > I have reduced the test case roughly as below. > > declare i32 @foo(i8 *) > > define void @test(i8* %s, i64 %a) { > > entry: > > %s.addr.a = getelementptr i8, i8* %s, i64 %a > > br label %while.body.us > > while.body.us: > > %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ] > > %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1 > > %incdec.val = load i8, i8* %s.addr, align 1 > > %cmp1 = icmp eq i8 %incdec.val, 10 > > br i1 %cmp1, label %if.end8.us, label %while.cond.backedge > > if.end8.us: > > %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr) > > %cmp2 = icmp ult i32 %call9, 0 > > br i1 %cmp2, label %while.cond.backedge, label %return.loopexit > > while.cond.backedge: > > %cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a > > br i1 %cmp3, label %return.loopexit, label %while.body.us > > return.loopexit: > > ret void > > } > > Roughly, I unrolled the loop manually 2 times as below and ignored the > remaining loop simply. > > define void @test(i8* %s, i64 %a) { > > entry: > > %s.addr.a = getelementptr i8, i8* %s, i64 %a > > br label %while.body.us > > while.body.us: > > %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ] > > %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1 > > %incdec.val = load i8, i8* %s.addr, align 1 > > %cmp1 = icmp eq i8 %incdec.val, 10 > > br i1 %cmp1, label %if.end8.us, label %while.body.us.1 > > while.body.us.1: > > %incdec.ptr.1 = getelementptr inbounds i8, i8* %incdec.ptr, i64 1 > > %incdec.val.1 = load i8, i8* %incdec.ptr, align 1 > > %cmp1.1 = icmp eq i8 %incdec.val.1, 10 > > br i1 %cmp1.1, label %if.end8.us, label %while.cond.backedge > > if.end8.us: > > %incdec.ptr.phi = phi i8* [ %incdec.ptr, %while.body.us ], [ > %incdec.ptr.1, %while.body.us.1 ] > > %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr.phi) > > %cmp2 = icmp ult i32 %call9, 0 > > br i1 %cmp2, label %while.cond.backedge, label %return.loopexit > > while.cond.backedge: > > %cmp3 = icmp eq i8* %incdec.ptr.1, %s.addr.a > > br i1 %cmp3, label %return.loopexit, label %while.body.us > > return.loopexit: > > ret void > > } > > If possible, can we make loop unroll pass handle this kind of cases > please? > > If I missed something, please let me know. > > Thanks > > JinGu Kang > > *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of > *Philip Reames via llvm-dev > *Sent:* 16 July 2021 20:12 > *To:* Jingu Kang <Jingu.Kang at arm.com> > *Cc:* llvm-dev at lists.llvm.org > *Subject:* Re: [llvm-dev] Question about Unrolling Loop with Multiple > Exits > > Jingu, > > I'm not fluent in AArch64 assembly, sorry.� If you want more > meaningful commentary, you'll need to reduce your c++ example further, > and give a better explanation of what you expect the unroller to do > for you.� > > Philip > > On 7/16/21 12:09 PM, Jingu Kang wrote: > > Sorry for poor example� > > � > > The AArch64 assembly output of the example from gcc is as below. > The loop is unrolled 7 times. I have written some comments to > explain how the assembly code is mapped to C source code. As you > can see on `.L3` label, the �if (*s++ != '\n')� block is > unrolled 7 times. > > � > > ������� stp���� x29, x30, [sp, -48]! > > ������� mov���� x29, sp > > ������� stp���� x19, x20, [sp, 16] > > ������� mov���� x19, x0� --> x19 is char *s > > ������� mov���� x20, x1� --> x20 is char *end > > ������� str���� x21, [sp, 32] > > ������� orr���� x21, x3, x2� --> x21 is > check1 | check2 > > .L70: > > ������� sub���� x0, x20, x19� --> x0 = end > - s; > > ������� add���� x1, x0, 1 > > ������� ands��� x2, x1, 7� --> unroll count is 7 > > ������� beq���� .L3���� --> .L3 is > inside while loop. if (*s++ != '\n') > > ������� cmp���� x19, x20 �--> while(s <= end) > > ������� bhi���� .L2���� --> .L2 is > label ret1. > > � > > ������� ldrb��� w3, [x19], 1� --> start of > remainder > > ������� cmp���� w3, 10 > > ������� beq���� .L71 > > ������� cmp���� x2, 1 > > ������� beq���� .L3 > > ������� cmp���� x2, 2 > > ������� beq���� .L49 > > ������� cmp���� x2, 3 > > ������� beq���� .L50 > > ������� cmp���� x2, 4 > > ������� beq���� .L51 > > ������� cmp���� x2, 5 > > ������� beq���� .L52 > > ������� cmp���� x2, 6 > > ������� beq���� .L53 > > ������� ldrb��� w4, [x19], 1 > > ������� cmp���� w4, 10 > > ������� bne���� .L53 > > .L71: > > ������� cbz���� x21, .L4� --> if(check1 || > check2) > > ������� bl����� foo() > > ������� mov���� x19, x0 > > ������� cbz���� x0, .L2�� --> if (!s) > goto ret1; > > � > > .L4:��������������������� > --> if(boo(s)) goto ret0; > > ������� mov���� x0, x19 > > ����� ��bl����� boo(char*) > > ������� cbz���� w0, .L70 > > ������� mov���� w0, 0 > > ������� ldp���� x19, x20, [sp, 16] > > ������� ldr���� x21, [sp, 32] > > ������� ldp���� x29, x30, [sp], 48 > > ������� ret > > .L53:��������� --> if (*s++ != '\n') for remainder > > ������� ldrb��� w5, [x19], 1 > > ������� cmp���� w5, 10 > > ������� beq���� .L71 > > .L52:��������� --> if (*s++ != '\n') for remainder > > ������� ldrb��� w6, [x19], 1 > > ������� cmp���� w6, 10 > > ������� beq���� .L71 > > .L51:��������� --> if (*s++ != '\n') for remainder > > ������� ldrb��� w7, [x19], 1 > > ������� cmp���� w7, 10 > > ������� beq���� .L71 > > .L50:��������� --> if (*s++ != '\n') for remainder > > ������� ldrb��� w8, [x19], 1 > > ������� cmp���� w8, 10 > > ������� beq���� .L71 > > .L49:��������� --> if (*s++ != '\n') for remainder > > ������� ldrb��� w9, [x19], 1 > > ������� cmp���� w9, 10 > > ������� beq���� .L71 > > .L3:������������������������� > --> if (*s++ != '\n'), 7 times unrolled > > ������� cmp���� x19, x20 > > ������� bhi���� .L2 > > ������� ldrb��� w10, [x19] > > ������� add�� ��x19, x19, 1 > > ������� mov���� x11, x19 > > ������� cmp���� w10, 10 > > ������� beq���� .L71 > > ������� ldrb��� w12, [x19], 1 > > ������� cmp���� w12, 10 > > ������� beq���� .L71 > > ������� ldrb��� w13, [x11, 1] > > ������� add���� x19, x11, 2 > > ������� cmp���� w13, 10 > > ������� beq��� �.L71 > > ������� ldrb��� w14, [x11, 2] > > ������� add���� x19, x11, 3 > > ������� cmp���� w14, 10 > > ������� beq���� .L71 > > ������� ldrb��� w15, [x11, 3] > > ������� add���� x19, x11, 4 > > ������� cmp���� w15, 10 > > ������� beq���� .L71 > > ������� ldrb��� w16, [x11, 4] > > ������� add���� x19, x11, 5 > > ������� cmp���� w16, 10 > > ������� beq���� .L71 > > ������� ldrb��� w17, [x11, 5] > > ������� add���� x19, x11, 6 > > ������� cmp���� w17, 10 > > ������� beq���� .L71 > > ������� ldrb��� w18, [x11, 6] > > ������� add���� x19, x11, 7 > > ������� cmp���� w18, 10 > > ������� beq���� .L71 > > ������� b������ .L3 > > .L2:��������������������� > --> label ret1 > > ������� mov���� w0, 1 > > ������� ldp���� x19, x20, [sp, 16] > > ������� ldr���� x21, [sp, 32] > > ������� ldp���� x29, x30, [sp], 48 > > ������� ret > > � > > I am sorry for showing you assembly output directly� It looks > like the rtl level�s unroll pass of gcc unrolls above loop and I > need to check it next week. Once I have clear idea about above > unrolling from gcc, let me reduce the example more and let you know. > > � > > Thanks > > JinGu Kang > > � > > *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> > <mailto:llvm-dev-bounces at lists.llvm.org> *On Behalf Of *Philip > Reames via llvm-dev > *Sent:* 16 July 2021 18:27 > *To:* Jingu Kang <Jingu.Kang at arm.com> <mailto:Jingu.Kang at arm.com> > *Cc:* llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > *Subject:* Re: [llvm-dev] Question about Unrolling Loop with > Multiple Exits > > � > > � > > On 7/16/21 9:29 AM, Jingu Kang wrote: > > Hi Philip, > > Thanks for your kind reply. > > > A) Are you measuring on tip of tree?� There were changes > for multiple exit unrolling which landed very recently. > > Yep, I am investigating benchmarks with llvm tip and I can see > the llvm fails to unroll some loops with multiple exits. > > > B) One of your exits does not dominate your latch.� Those > are generally hard > > > C) This example does not seem to require gotos.� I > strongly suggest reducing your test cases if you want more > informed commentary.� > > � > > I am looking at perlbench recently and it has `goto` > statements inside loop. The example is a reduced case. > > Right, but the gotos aren't relevant for your reduced test.� You > can reduce further. > > > When I look at the gcc�s output of the example, it looks > like gcc unrolls only the below `if` statement block� > > � > > ��� if (*s++ != '\n') > > ����� continue; > > Your phrasing here does not parse for me.� Can you restate this > with different wording and maybe a fully worked example? > > > � > > Thanks > > JinGu Kang > > � > > *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> > <mailto:llvm-dev-bounces at lists.llvm.org> *On Behalf Of *Philip > Reames via llvm-dev > *Sent:* 16 July 2021 15:52 > *To:* Jingu Kang <Jingu.Kang at arm.com> > <mailto:Jingu.Kang at arm.com>; llvm-dev at lists.llvm.org > <mailto:llvm-dev at lists.llvm.org> > *Subject:* Re: [llvm-dev] Question about Unrolling Loop with > Multiple Exits > > � > > A) Are you measuring on tip of tree?� There were changes for > multiple exit unrolling which landed very recently. > > B) One of your exits does not dominate your latch.� Those > are generally hard.� > > C) This example does not seem to require gotos.� I strongly > suggest reducing your test cases if you want more informed > commentary.� > > Philip > > On 7/16/21 7:42 AM, Jingu Kang via llvm-dev wrote: > > Hi All, > > � > > While I am investigating benchmarks, I have found loops > which llvm fails to unroll because the loops have multiple > exits. > > For example, > > � > > char *foo(void); > > int boo(char *s); > > � > > int test(char *s, char *end, char *check1, char *check2) { > > � while (s <= end) { > > ��� if (*s++ != '\n') > > ����� continue; > > ��� if (check1 || check2) { > > ����� s = foo(); > > ����� if (!s) > > ������� goto ret1; > > ��� }�� > > ����if (boo(s)) > > ����� goto ret0; > > � } > > � goto ret1; > > � > > ret0: > > � return 0; > > ret1: > > � return 1; > > } > > � > > Above code causes below messages from LoopUnroll pass. > > � > > Bailout for multi-exit handling when latch exit has >1 > predecessor. > > Multiple exit/exiting blocks in loop and multi-exit > unrolling not enabled! > > � > > I can see the option `unroll-runtime-multi-exit` and > comments about it. I wonder there are already reviews for > the work on phabriactor or some people are working on it. > > If someone knows information about it, please share it. > > � > > Thanks > > JinGu Kang > > � > > � > > � > > � > > � > > � > > > > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210720/dc3a3a47/attachment.html>
Philip Reames via llvm-dev
2021-Aug-03 18:30 UTC
[llvm-dev] Question about Unrolling Loop with Multiple Exits
JFYI, https://reviews.llvm.org/D107381 works in this direction, and the reduced test case does unroll with that patch and the following command line. $ opt -loop-unroll -enable-new-pm=0 < llvm-dev.ll -S -unroll-runtime -debug -unroll-runtime-multi-exit -unroll-runtime-epilog On 7/20/21 10:17 AM, Philip Reames wrote:> ./opt -loop-unroll -enable-new-pm=0 < llvm-dev.ll -S -unroll-runtime > -debug > Args: ./opt -loop-unroll -enable-new-pm=0 -S -unroll-runtime -debug > Loop Unroll: F[test] Loop %while.body.us > Loop Size = 10 > runtime unrolling with count: 8 > Exiting block %if.end8.us: TripCount=0, TripMultiple=1, BreakoutTrip=1 > Exiting block %while.cond.backedge: TripCount=0, TripMultiple=1, > BreakoutTrip=1 > Trying runtime unrolling on Loop: > Loop at depth 1 containing: > %while.body.us<header>,%if.end8.us<exiting>,%while.cond.backedge<latch><exiting> > Using prolog remainder. > Bailout for multi-exit handling when latch exit has >1 predecessor. > Multiple exit/exiting blocks in loop and multi-exit unrolling not enabled! > Won't unroll; remainder loop could not be generated when assuming > runtime trip count > ; ModuleID = '<stdin>' > source_filename = "<stdin>" > > declare i32 @foo(i8*) > > define void @test(i8* %s, i64 %a) { > entry: > %s.addr.a = getelementptr i8, i8* %s, i64 %a > br label %while.body.us > > while.body.us: ; preds = > %while.cond.backedge, %entry > %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ] > %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1 > %incdec.val = load i8, i8* %s.addr, align 1 > %cmp1 = icmp eq i8 %incdec.val, 10 > br i1 %cmp1, label %if.end8.us, label %while.cond.backedge > > if.end8.us: ; preds = %while.body.us > %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr) > %cmp2 = icmp ult i32 %call9, 0 > br i1 %cmp2, label %while.cond.backedge, label %return.loopexit > > while.cond.backedge: ; preds = > %if.end8.us, %while.body.us > %cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a > br i1 %cmp3, label %return.loopexit, label %while.body.us > > return.loopexit: ; preds = > %while.cond.backedge, %if.end8.us > ret void > } > > The debug output appears to give a pretty good clue as to where you > could start if desired. > > Philip > > On 7/20/21 9:45 AM, Jingu Kang wrote: >> >> Hi Philip, >> >> I have reduced the test case roughly as below. >> >> declare i32 @foo(i8 *) >> >> define void @test(i8* %s, i64 %a) { >> >> entry: >> >> %s.addr.a = getelementptr i8, i8* %s, i64 %a >> >> br label %while.body.us >> >> while.body.us: >> >> %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ] >> >> %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1 >> >> %incdec.val = load i8, i8* %s.addr, align 1 >> >> %cmp1 = icmp eq i8 %incdec.val, 10 >> >> br i1 %cmp1, label %if.end8.us, label %while.cond.backedge >> >> if.end8.us: >> >> %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr) >> >> %cmp2 = icmp ult i32 %call9, 0 >> >> br i1 %cmp2, label %while.cond.backedge, label %return.loopexit >> >> while.cond.backedge: >> >> %cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a >> >> br i1 %cmp3, label %return.loopexit, label %while.body.us >> >> return.loopexit: >> >> ret void >> >> } >> >> Roughly, I unrolled the loop manually 2 times as below and ignored >> the remaining loop simply. >> >> define void @test(i8* %s, i64 %a) { >> >> entry: >> >> %s.addr.a = getelementptr i8, i8* %s, i64 %a >> >> br label %while.body.us >> >> while.body.us: >> >> %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ] >> >> %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1 >> >> %incdec.val = load i8, i8* %s.addr, align 1 >> >> %cmp1 = icmp eq i8 %incdec.val, 10 >> >> br i1 %cmp1, label %if.end8.us, label %while.body.us.1 >> >> while.body.us.1: >> >> %incdec.ptr.1 = getelementptr inbounds i8, i8* %incdec.ptr, i64 1 >> >> %incdec.val.1 = load i8, i8* %incdec.ptr, align 1 >> >> %cmp1.1 = icmp eq i8 %incdec.val.1, 10 >> >> br i1 %cmp1.1, label %if.end8.us, label %while.cond.backedge >> >> if.end8.us: >> >> %incdec.ptr.phi = phi i8* [ %incdec.ptr, %while.body.us ], [ >> %incdec.ptr.1, %while.body.us.1 ] >> >> %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr.phi) >> >> %cmp2 = icmp ult i32 %call9, 0 >> >> br i1 %cmp2, label %while.cond.backedge, label %return.loopexit >> >> while.cond.backedge: >> >> %cmp3 = icmp eq i8* %incdec.ptr.1, %s.addr.a >> >> br i1 %cmp3, label %return.loopexit, label %while.body.us >> >> return.loopexit: >> >> ret void >> >> } >> >> If possible, can we make loop unroll pass handle this kind of cases >> please? >> >> If I missed something, please let me know. >> >> Thanks >> >> JinGu Kang >> >> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of >> *Philip Reames via llvm-dev >> *Sent:* 16 July 2021 20:12 >> *To:* Jingu Kang <Jingu.Kang at arm.com> >> *Cc:* llvm-dev at lists.llvm.org >> *Subject:* Re: [llvm-dev] Question about Unrolling Loop with Multiple >> Exits >> >> Jingu, >> >> I'm not fluent in AArch64 assembly, sorry.� If you want more >> meaningful commentary, you'll need to reduce your c++ example >> further, and give a better explanation of what you expect the >> unroller to do for you.� >> >> Philip >> >> On 7/16/21 12:09 PM, Jingu Kang wrote: >> >> Sorry for poor example� >> >> � >> >> The AArch64 assembly output of the example from gcc is as below. >> The loop is unrolled 7 times. I have written some comments to >> explain how the assembly code is mapped to C source code. As you >> can see on `.L3` label, the �if (*s++ != '\n')� block is >> unrolled 7 times. >> >> � >> >> ������� stp���� x29, x30, [sp, -48]! >> >> ������� mov���� x29, sp >> >> ������� stp���� x19, x20, [sp, 16] >> >> ������� mov���� x19, x0� --> x19 is char *s >> >> ������� mov���� x20, x1� --> x20 is char *end >> >> ������� str���� x21, [sp, 32] >> >> ������� orr���� x21, x3, x2� --> x21 is >> check1 | check2 >> >> .L70: >> >> ������� sub���� x0, x20, x19� --> x0 >> end - s; >> >> ������� add���� x1, x0, 1 >> >> ������� ands��� x2, x1, 7� --> unroll count >> is 7 >> >> ������� beq���� .L3���� --> .L3 is >> inside while loop. if (*s++ != '\n') >> >> ������� cmp���� x19, x20 �--> while(s <= end) >> >> ������� bhi���� .L2���� --> .L2 is >> label ret1. >> >> � >> >> ������� ldrb��� w3, [x19], 1� --> start of >> remainder >> >> ������� cmp���� w3, 10 >> >> ������� beq���� .L71 >> >> ������� cmp���� x2, 1 >> >> ������� beq���� .L3 >> >> ������� cmp���� x2, 2 >> >> ������� beq���� .L49 >> >> ������� cmp���� x2, 3 >> >> ������� beq���� .L50 >> >> ������� cmp���� x2, 4 >> >> ������� beq���� .L51 >> >> ������� cmp���� x2, 5 >> >> ������� beq���� .L52 >> >> ������� cmp���� x2, 6 >> >> ������� beq���� .L53 >> >> ������� ldrb��� w4, [x19], 1 >> >> ������� cmp���� w4, 10 >> >> ������� bne���� .L53 >> >> .L71: >> >> ������� cbz���� x21, .L4� --> if(check1 >> || check2) >> >> ������� bl����� foo() >> >> ������� mov���� x19, x0 >> >> ������� cbz���� x0, .L2�� --> if (!s) >> goto ret1; >> >> � >> >> .L4:��������������������� >> --> if(boo(s)) goto ret0; >> >> ������� mov���� x0, x19 >> >> ����� ��bl����� boo(char*) >> >> ������� cbz���� w0, .L70 >> >> ������� mov���� w0, 0 >> >> ������� ldp���� x19, x20, [sp, 16] >> >> ������� ldr���� x21, [sp, 32] >> >> ������� ldp���� x29, x30, [sp], 48 >> >> ������� ret >> >> .L53:��������� --> if (*s++ != '\n') for remainder >> >> ������� ldrb��� w5, [x19], 1 >> >> ������� cmp���� w5, 10 >> >> ������� beq���� .L71 >> >> .L52:��������� --> if (*s++ != '\n') for remainder >> >> ������� ldrb��� w6, [x19], 1 >> >> ������� cmp���� w6, 10 >> >> ������� beq���� .L71 >> >> .L51:��������� --> if (*s++ != '\n') for remainder >> >> ������� ldrb��� w7, [x19], 1 >> >> ������� cmp���� w7, 10 >> >> ������� beq���� .L71 >> >> .L50:��������� --> if (*s++ != '\n') for remainder >> >> ������� ldrb��� w8, [x19], 1 >> >> ������� cmp���� w8, 10 >> >> ������� beq���� .L71 >> >> .L49:��������� --> if (*s++ != '\n') for remainder >> >> ������� ldrb��� w9, [x19], 1 >> >> ������� cmp���� w9, 10 >> >> ������� beq���� .L71 >> >> .L3:������������������������� >> --> if (*s++ != '\n'), 7 times unrolled >> >> ������� cmp���� x19, x20 >> >> ������� bhi���� .L2 >> >> ������� ldrb��� w10, [x19] >> >> ������� add�� ��x19, x19, 1 >> >> ������� mov���� x11, x19 >> >> ������� cmp���� w10, 10 >> >> ������� beq���� .L71 >> >> ������� ldrb��� w12, [x19], 1 >> >> ������� cmp���� w12, 10 >> >> ������� beq���� .L71 >> >> ������� ldrb��� w13, [x11, 1] >> >> ������� add���� x19, x11, 2 >> >> ������� cmp���� w13, 10 >> >> ������� beq��� �.L71 >> >> ������� ldrb��� w14, [x11, 2] >> >> ������� add���� x19, x11, 3 >> >> ������� cmp���� w14, 10 >> >> ������� beq���� .L71 >> >> ������� ldrb��� w15, [x11, 3] >> >> ������� add���� x19, x11, 4 >> >> ������� cmp���� w15, 10 >> >> ������� beq���� .L71 >> >> ������� ldrb��� w16, [x11, 4] >> >> ������� add���� x19, x11, 5 >> >> ������� cmp���� w16, 10 >> >> ������� beq���� .L71 >> >> ������� ldrb��� w17, [x11, 5] >> >> ������� add���� x19, x11, 6 >> >> ������� cmp���� w17, 10 >> >> ������� beq���� .L71 >> >> ������� ldrb��� w18, [x11, 6] >> >> ������� add���� x19, x11, 7 >> >> ������� cmp���� w18, 10 >> >> ������� beq���� .L71 >> >> ������� b������ .L3 >> >> .L2:��������������������� >> --> label ret1 >> >> ������� mov���� w0, 1 >> >> ������� ldp���� x19, x20, [sp, 16] >> >> ������� ldr���� x21, [sp, 32] >> >> ������� ldp���� x29, x30, [sp], 48 >> >> ������� ret >> >> � >> >> I am sorry for showing you assembly output directly� It looks >> like the rtl level�s unroll pass of gcc unrolls above loop and >> I need to check it next week. Once I have clear idea about above >> unrolling from gcc, let me reduce the example more and let you know. >> >> � >> >> Thanks >> >> JinGu Kang >> >> � >> >> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> >> <mailto:llvm-dev-bounces at lists.llvm.org> *On Behalf Of *Philip >> Reames via llvm-dev >> *Sent:* 16 July 2021 18:27 >> *To:* Jingu Kang <Jingu.Kang at arm.com> <mailto:Jingu.Kang at arm.com> >> *Cc:* llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >> *Subject:* Re: [llvm-dev] Question about Unrolling Loop with >> Multiple Exits >> >> � >> >> � >> >> On 7/16/21 9:29 AM, Jingu Kang wrote: >> >> Hi Philip, >> >> Thanks for your kind reply. >> >> > A) Are you measuring on tip of tree?� There were changes >> for multiple exit unrolling which landed very recently. >> >> Yep, I am investigating benchmarks with llvm tip and I can >> see the llvm fails to unroll some loops with multiple exits. >> >> > B) One of your exits does not dominate your latch.� Those >> are generally hard >> >> > C) This example does not seem to require gotos.� I >> strongly suggest reducing your test cases if you want more >> informed commentary.� >> >> � >> >> I am looking at perlbench recently and it has `goto` >> statements inside loop. The example is a reduced case. >> >> Right, but the gotos aren't relevant for your reduced test.� >> You can reduce further. >> >> >> When I look at the gcc�s output of the example, it looks >> like gcc unrolls only the below `if` statement block� >> >> � >> >> ��� if (*s++ != '\n') >> >> ����� continue; >> >> Your phrasing here does not parse for me.� Can you restate this >> with different wording and maybe a fully worked example? >> >> >> � >> >> Thanks >> >> JinGu Kang >> >> � >> >> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> >> <mailto:llvm-dev-bounces at lists.llvm.org> *On Behalf Of >> *Philip Reames via llvm-dev >> *Sent:* 16 July 2021 15:52 >> *To:* Jingu Kang <Jingu.Kang at arm.com> >> <mailto:Jingu.Kang at arm.com>; llvm-dev at lists.llvm.org >> <mailto:llvm-dev at lists.llvm.org> >> *Subject:* Re: [llvm-dev] Question about Unrolling Loop with >> Multiple Exits >> >> � >> >> A) Are you measuring on tip of tree?� There were changes >> for multiple exit unrolling which landed very recently. >> >> B) One of your exits does not dominate your latch.� Those >> are generally hard.� >> >> C) This example does not seem to require gotos.� I strongly >> suggest reducing your test cases if you want more informed >> commentary.� >> >> Philip >> >> On 7/16/21 7:42 AM, Jingu Kang via llvm-dev wrote: >> >> Hi All, >> >> � >> >> While I am investigating benchmarks, I have found loops >> which llvm fails to unroll because the loops have >> multiple exits. >> >> For example, >> >> � >> >> char *foo(void); >> >> int boo(char *s); >> >> � >> >> int test(char *s, char *end, char *check1, char *check2) { >> >> � while (s <= end) { >> >> ��� if (*s++ != '\n') >> >> ����� continue; >> >> ��� if (check1 || check2) { >> >> ����� s = foo(); >> >> ����� if (!s) >> >> ������� goto ret1; >> >> ��� }�� >> >> ����if (boo(s)) >> >> ����� goto ret0; >> >> � } >> >> � goto ret1; >> >> � >> >> ret0: >> >> � return 0; >> >> ret1: >> >> � return 1; >> >> } >> >> � >> >> Above code causes below messages from LoopUnroll pass. >> >> � >> >> Bailout for multi-exit handling when latch exit has >1 >> predecessor. >> >> Multiple exit/exiting blocks in loop and multi-exit >> unrolling not enabled! >> >> � >> >> I can see the option `unroll-runtime-multi-exit` and >> comments about it. I wonder there are already reviews for >> the work on phabriactor or some people are working on it. >> >> If someone knows information about it, please share it. >> >> � >> >> Thanks >> >> JinGu Kang >> >> � >> >> � >> >> � >> >> � >> >> � >> >> � >> >> >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing list >> >> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210803/b0f20a57/attachment.html>