Philip Reames via llvm-dev
2021-Jul-16 19:11 UTC
[llvm-dev] Question about Unrolling Loop with Multiple Exits
Jingu, I'm not fluent in AArch64 assembly, sorry. If you want more meaningful commentary, you'll need to reduce your c++ example further, and give a better explanation of what you expect the unroller to do for you. Philip On 7/16/21 12:09 PM, Jingu Kang wrote:> > Sorry for poor example… > > The AArch64 assembly output of the example from gcc is as below. The > loop is unrolled 7 times. I have written some comments to explain how > the assembly code is mapped to C source code. As you can see on `.L3` > label, the ‘if (*s++ != '\n')’ block is unrolled 7 times. > > stp x29, x30, [sp, -48]! > > mov x29, sp > > stp x19, x20, [sp, 16] > > mov x19, x0 --> x19 is char *s > > mov x20, x1 --> x20 is char *end > > str x21, [sp, 32] > > orr x21, x3, x2 --> x21 is check1 | check2 > > .L70: > > sub x0, x20, x19 --> x0 = end - s; > > add x1, x0, 1 > > ands x2, x1, 7 --> unroll count is 7 > > beq .L3 --> .L3 is inside while loop. if (*s++ != '\n') > > cmp x19, x20 --> while(s <= end) > > bhi .L2 --> .L2 is label ret1. > > ldrb w3, [x19], 1 --> start of remainder > > cmp w3, 10 > > beq .L71 > > cmp x2, 1 > > beq .L3 > > cmp x2, 2 > > beq .L49 > > cmp x2, 3 > > beq .L50 > > cmp x2, 4 > > beq .L51 > > cmp x2, 5 > > beq .L52 > > cmp x2, 6 > > beq .L53 > > ldrb w4, [x19], 1 > > cmp w4, 10 > > bne .L53 > > .L71: > > cbz x21, .L4 --> if(check1 || check2) > > bl foo() > > mov x19, x0 > > cbz x0, .L2 --> if (!s) goto ret1; > > .L4: --> if(boo(s)) goto ret0; > > mov x0, x19 > > bl boo(char*) > > cbz w0, .L70 > > mov w0, 0 > > ldp x19, x20, [sp, 16] > > ldr x21, [sp, 32] > > ldp x29, x30, [sp], 48 > > ret > > .L53: --> if (*s++ != '\n') for remainder > > ldrb w5, [x19], 1 > > cmp w5, 10 > > beq .L71 > > .L52: --> if (*s++ != '\n') for remainder > > ldrb w6, [x19], 1 > > cmp w6, 10 > > beq .L71 > > .L51: --> if (*s++ != '\n') for remainder > > ldrb w7, [x19], 1 > > cmp w7, 10 > > beq .L71 > > .L50: --> if (*s++ != '\n') for remainder > > ldrb w8, [x19], 1 > > cmp w8, 10 > > beq .L71 > > .L49: --> if (*s++ != '\n') for remainder > > ldrb w9, [x19], 1 > > cmp w9, 10 > > beq .L71 > > .L3: --> if (*s++ != '\n'), 7 times unrolled > > cmp x19, x20 > > bhi .L2 > > ldrb w10, [x19] > > add x19, x19, 1 > > mov x11, x19 > > cmp w10, 10 > > beq .L71 > > ldrb w12, [x19], 1 > > cmp w12, 10 > > beq .L71 > > ldrb w13, [x11, 1] > > add x19, x11, 2 > > cmp w13, 10 > > beq .L71 > > ldrb w14, [x11, 2] > > add x19, x11, 3 > > cmp w14, 10 > > beq .L71 > > ldrb w15, [x11, 3] > > add x19, x11, 4 > > cmp w15, 10 > > beq .L71 > > ldrb w16, [x11, 4] > > add x19, x11, 5 > > cmp w16, 10 > > beq .L71 > > ldrb w17, [x11, 5] > > add x19, x11, 6 > > cmp w17, 10 > > beq .L71 > > ldrb w18, [x11, 6] > > add x19, x11, 7 > > cmp w18, 10 > > beq .L71 > > b .L3 > > .L2: --> label ret1 > > mov w0, 1 > > ldp x19, x20, [sp, 16] > > ldr x21, [sp, 32] > > ldp x29, x30, [sp], 48 > > ret > > I am sorry for showing you assembly output directly… It looks like the > rtl level’s unroll pass of gcc unrolls above loop and I need to check > it next week. Once I have clear idea about above unrolling from gcc, > let me reduce the example more and let you know. > > Thanks > > JinGu Kang > > *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of > *Philip Reames via llvm-dev > *Sent:* 16 July 2021 18:27 > *To:* Jingu Kang <Jingu.Kang at arm.com> > *Cc:* llvm-dev at lists.llvm.org > *Subject:* Re: [llvm-dev] Question about Unrolling Loop with Multiple > Exits > > On 7/16/21 9:29 AM, Jingu Kang wrote: > > Hi Philip, > > Thanks for your kind reply. > > > A) Are you measuring on tip of tree?� There were changes for > multiple exit unrolling which landed very recently. > > Yep, I am investigating benchmarks with llvm tip and I can see the > llvm fails to unroll some loops with multiple exits. > > > B) One of your exits does not dominate your latch.� Those are > generally hard > > > C) This example does not seem to require gotos.� I strongly > suggest reducing your test cases if you want more informed > commentary.� > > � > > I am looking at perlbench recently and it has `goto` statements > inside loop. The example is a reduced case. > > Right, but the gotos aren't relevant for your reduced test.� You can > reduce further. > > When I look at the gcc�s output of the example, it looks like > gcc unrolls only the below `if` statement block� > > � > > ��� if (*s++ != '\n') > > ����� continue; > > Your phrasing here does not parse for me.� Can you restate this with > different wording and maybe a fully worked example? > > � > > Thanks > > JinGu Kang > > � > > *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> > <mailto:llvm-dev-bounces at lists.llvm.org> *On Behalf Of *Philip > Reames via llvm-dev > *Sent:* 16 July 2021 15:52 > *To:* Jingu Kang <Jingu.Kang at arm.com> <mailto:Jingu.Kang at arm.com>; > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > *Subject:* Re: [llvm-dev] Question about Unrolling Loop with > Multiple Exits > > � > > A) Are you measuring on tip of tree?� There were changes for > multiple exit unrolling which landed very recently. > > B) One of your exits does not dominate your latch.� Those are > generally hard.� > > C) This example does not seem to require gotos.� I strongly > suggest reducing your test cases if you want more informed > commentary.� > > Philip > > On 7/16/21 7:42 AM, Jingu Kang via llvm-dev wrote: > > Hi All, > > � > > While I am investigating benchmarks, I have found loops which > llvm fails to unroll because the loops have multiple exits. > > For example, > > � > > char *foo(void); > > int boo(char *s); > > � > > int test(char *s, char *end, char *check1, char *check2) { > > � while (s <= end) { > > ��� if (*s++ != '\n') > > ����� continue; > > ��� if (check1 || check2) { > > ����� s = foo(); > > ����� if (!s) > > ������� goto ret1; > > ��� }�� > > ����if (boo(s)) > > ����� goto ret0; > > � } > > � goto ret1; > > � > > ret0: > > � return 0; > > ret1: > > � return 1; > > } > > � > > Above code causes below messages from LoopUnroll pass. > > � > > Bailout for multi-exit handling when latch exit has >1 > predecessor. > > Multiple exit/exiting blocks in loop and multi-exit unrolling > not enabled! > > � > > I can see the option `unroll-runtime-multi-exit` and comments > about it. I wonder there are already reviews for the work on > phabriactor or some people are working on it. > > If someone knows information about it, please share it. > > � > > Thanks > > JinGu Kang > > � > > � > > � > > � > > � > > � > > > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210716/003b4b1c/attachment.html>
Jingu Kang via llvm-dev
2021-Jul-20 16:45 UTC
[llvm-dev] Question about Unrolling Loop with Multiple Exits
Hi Philip, I have reduced the test case roughly as below. declare i32 @foo(i8 *) define void @test(i8* %s, i64 %a) { entry: %s.addr.a = getelementptr i8, i8* %s, i64 %a br label %while.body.us while.body.us: %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ] %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1 %incdec.val = load i8, i8* %s.addr, align 1 %cmp1 = icmp eq i8 %incdec.val, 10 br i1 %cmp1, label %if.end8.us, label %while.cond.backedge if.end8.us: %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr) %cmp2 = icmp ult i32 %call9, 0 br i1 %cmp2, label %while.cond.backedge, label %return.loopexit while.cond.backedge: %cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a br i1 %cmp3, label %return.loopexit, label %while.body.us return.loopexit: ret void } Roughly, I unrolled the loop manually 2 times as below and ignored the remaining loop simply. define void @test(i8* %s, i64 %a) { entry: %s.addr.a = getelementptr i8, i8* %s, i64 %a br label %while.body.us while.body.us: %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ] %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1 %incdec.val = load i8, i8* %s.addr, align 1 %cmp1 = icmp eq i8 %incdec.val, 10 br i1 %cmp1, label %if.end8.us, label %while.body.us.1 while.body.us.1: %incdec.ptr.1 = getelementptr inbounds i8, i8* %incdec.ptr, i64 1 %incdec.val.1 = load i8, i8* %incdec.ptr, align 1 %cmp1.1 = icmp eq i8 %incdec.val.1, 10 br i1 %cmp1.1, label %if.end8.us, label %while.cond.backedge if.end8.us: %incdec.ptr.phi = phi i8* [ %incdec.ptr, %while.body.us ], [ %incdec.ptr.1, %while.body.us.1 ] %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr.phi) %cmp2 = icmp ult i32 %call9, 0 br i1 %cmp2, label %while.cond.backedge, label %return.loopexit while.cond.backedge: %cmp3 = icmp eq i8* %incdec.ptr.1, %s.addr.a br i1 %cmp3, label %return.loopexit, label %while.body.us return.loopexit: ret void } If possible, can we make loop unroll pass handle this kind of cases please? If I missed something, please let me know. Thanks JinGu Kang From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip Reames via llvm-dev Sent: 16 July 2021 20:12 To: Jingu Kang <Jingu.Kang at arm.com> Cc: llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits Jingu, I'm not fluent in AArch64 assembly, sorry.� If you want more meaningful commentary, you'll need to reduce your c++ example further, and give a better explanation of what you expect the unroller to do for you.� Philip On 7/16/21 12:09 PM, Jingu Kang wrote: Sorry for poor example� � The AArch64 assembly output of the example from gcc is as below. The loop is unrolled 7 times. I have written some comments to explain how the assembly code is mapped to C source code. As you can see on `.L3` label, the �if (*s++ != '\n')� block is unrolled 7 times. � ������� stp���� x29, x30, [sp, -48]! ������� mov���� x29, sp ������� stp���� x19, x20, [sp, 16] ������� mov���� x19, x0� --> x19 is char *s ������� mov���� x20, x1� --> x20 is char *end ������� str���� x21, [sp, 32] ������� orr���� x21, x3, x2� --> x21 is check1 | check2 .L70: ������� sub���� x0, x20, x19� --> x0 = end - s; ������� add���� x1, x0, 1 ������� ands��� x2, x1, 7� --> unroll count is 7 ������� beq���� .L3���� --> .L3 is inside while loop. if (*s++ != '\n') ������� cmp���� x19, x20 �--> while(s <= end) ������� bhi���� .L2���� --> .L2 is label ret1. � ������� ldrb��� w3, [x19], 1� --> start of remainder ������� cmp���� w3, 10 ������� beq���� .L71 ������� cmp���� x2, 1 ������� beq���� .L3 ������� cmp���� x2, 2 ������� beq���� .L49 ������� cmp���� x2, 3 ������� beq���� .L50 ������� cmp���� x2, 4 ������� beq���� .L51 ������� cmp���� x2, 5 ������� beq���� .L52 ������� cmp���� x2, 6 ������� beq���� .L53 ������� ldrb��� w4, [x19], 1 ������� cmp���� w4, 10 ������� bne���� .L53 .L71: ������� cbz���� x21, .L4� --> if(check1 || check2) ������� bl����� foo() ������� mov���� x19, x0 ������� cbz���� x0, .L2�� --> if (!s) goto ret1; � .L4:��������������������� --> if(boo(s)) goto ret0; ������� mov���� x0, x19 ����� ��bl����� boo(char*) ������� cbz���� w0, .L70 ������� mov���� w0, 0 ������� ldp���� x19, x20, [sp, 16] ������� ldr���� x21, [sp, 32] ������� ldp���� x29, x30, [sp], 48 ������� ret .L53:��������� --> if (*s++ != '\n') for remainder ������� ldrb��� w5, [x19], 1 ������� cmp���� w5, 10 ������� beq���� .L71 .L52:��������� --> if (*s++ != '\n') for remainder ������� ldrb��� w6, [x19], 1 ������� cmp���� w6, 10 ������� beq���� .L71 .L51:��������� --> if (*s++ != '\n') for remainder ������� ldrb��� w7, [x19], 1 ������� cmp���� w7, 10 ������� beq���� .L71 .L50:��������� --> if (*s++ != '\n') for remainder ������� ldrb��� w8, [x19], 1 ������� cmp���� w8, 10 ������� beq���� .L71 .L49:��������� --> if (*s++ != '\n') for remainder ������� ldrb��� w9, [x19], 1 ������� cmp���� w9, 10 ������� beq���� .L71 .L3:������������������������� --> if (*s++ != '\n'), 7 times unrolled ������� cmp���� x19, x20 ������� bhi���� .L2 ������� ldrb��� w10, [x19] ������� add�� ��x19, x19, 1 ������� mov���� x11, x19 ������� cmp���� w10, 10 ������� beq���� .L71 ������� ldrb��� w12, [x19], 1 ������� cmp���� w12, 10 ������� beq���� .L71 ������� ldrb��� w13, [x11, 1] ������� add���� x19, x11, 2 ������� cmp���� w13, 10 ������� beq��� �.L71 ������� ldrb��� w14, [x11, 2] ������� add���� x19, x11, 3 ������� cmp���� w14, 10 ������� beq���� .L71 ������� ldrb��� w15, [x11, 3] ������� add���� x19, x11, 4 ������� cmp���� w15, 10 ������� beq���� .L71 ������� ldrb��� w16, [x11, 4] ������� add���� x19, x11, 5 ������� cmp���� w16, 10 ������� beq���� .L71 ������� ldrb��� w17, [x11, 5] ������� add���� x19, x11, 6 ������� cmp���� w17, 10 ������� beq���� .L71 ������� ldrb��� w18, [x11, 6] ������� add���� x19, x11, 7 ������� cmp���� w18, 10 ������� beq���� .L71 ������� b������ .L3 .L2:��������������������� --> label ret1 ������� mov���� w0, 1 ������� ldp���� x19, x20, [sp, 16] ������� ldr���� x21, [sp, 32] ������� ldp���� x29, x30, [sp], 48 ������� ret � I am sorry for showing you assembly output directly� It looks like the rtl level�s unroll pass of gcc unrolls above loop and I need to check it next week. Once I have clear idea about above unrolling from gcc, let me reduce the example more and let you know. � Thanks JinGu Kang � From: llvm-dev <llvm-dev-bounces at lists.llvm.org><mailto:llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip Reames via llvm-dev Sent: 16 July 2021 18:27 To: Jingu Kang <Jingu.Kang at arm.com><mailto:Jingu.Kang at arm.com> Cc: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits � � On 7/16/21 9:29 AM, Jingu Kang wrote: Hi Philip, Thanks for your kind reply.> A) Are you measuring on tip of tree?� There were changes for multiple exit unrolling which landed very recently.Yep, I am investigating benchmarks with llvm tip and I can see the llvm fails to unroll some loops with multiple exits.> B) One of your exits does not dominate your latch.� Those are generally hard > C) This example does not seem to require gotos.� I strongly suggest reducing your test cases if you want more informed commentary.�� I am looking at perlbench recently and it has `goto` statements inside loop. The example is a reduced case. Right, but the gotos aren't relevant for your reduced test.� You can reduce further. When I look at the gcc�s output of the example, it looks like gcc unrolls only the below `if` statement block� � ��� if (*s++ != '\n') ����� continue; Your phrasing here does not parse for me.� Can you restate this with different wording and maybe a fully worked example? � Thanks JinGu Kang � From: llvm-dev <llvm-dev-bounces at lists.llvm.org><mailto:llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip Reames via llvm-dev Sent: 16 July 2021 15:52 To: Jingu Kang <Jingu.Kang at arm.com><mailto:Jingu.Kang at arm.com>; llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits � A) Are you measuring on tip of tree?� There were changes for multiple exit unrolling which landed very recently. B) One of your exits does not dominate your latch.� Those are generally hard.� C) This example does not seem to require gotos.� I strongly suggest reducing your test cases if you want more informed commentary.� Philip On 7/16/21 7:42 AM, Jingu Kang via llvm-dev wrote: Hi All, � While I am investigating benchmarks, I have found loops which llvm fails to unroll because the loops have multiple exits. For example, � char *foo(void); int boo(char *s); � int test(char *s, char *end, char *check1, char *check2) { � while (s <= end) { ��� if (*s++ != '\n') ����� continue; ��� if (check1 || check2) { ����� s = foo(); ����� if (!s) ������� goto ret1; ��� }�� ����if (boo(s)) ����� goto ret0; � } � goto ret1; � ret0: � return 0; ret1: � return 1; } � Above code causes below messages from LoopUnroll pass. � Bailout for multi-exit handling when latch exit has >1 predecessor. Multiple exit/exiting blocks in loop and multi-exit unrolling not enabled! � I can see the option `unroll-runtime-multi-exit` and comments about it. I wonder there are already reviews for the work on phabriactor or some people are working on it. If someone knows information about it, please share it. � Thanks JinGu Kang � � � � � � _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210720/fc741b3b/attachment-0001.html>