Jingu Kang via llvm-dev
2021-Jul-20 16:45 UTC
[llvm-dev] Question about Unrolling Loop with Multiple Exits
Hi Philip,
I have reduced the test case roughly as below.
declare i32 @foo(i8 *)
define void @test(i8* %s, i64 %a) {
entry:
%s.addr.a = getelementptr i8, i8* %s, i64 %a
br label %while.body.us
while.body.us:
%s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ]
%incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1
%incdec.val = load i8, i8* %s.addr, align 1
%cmp1 = icmp eq i8 %incdec.val, 10
br i1 %cmp1, label %if.end8.us, label %while.cond.backedge
if.end8.us:
%call9 = tail call i32 @foo(i8* nonnull %incdec.ptr)
%cmp2 = icmp ult i32 %call9, 0
br i1 %cmp2, label %while.cond.backedge, label %return.loopexit
while.cond.backedge:
%cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a
br i1 %cmp3, label %return.loopexit, label %while.body.us
return.loopexit:
ret void
}
Roughly, I unrolled the loop manually 2 times as below and ignored the remaining
loop simply.
define void @test(i8* %s, i64 %a) {
entry:
%s.addr.a = getelementptr i8, i8* %s, i64 %a
br label %while.body.us
while.body.us:
%s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ]
%incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1
%incdec.val = load i8, i8* %s.addr, align 1
%cmp1 = icmp eq i8 %incdec.val, 10
br i1 %cmp1, label %if.end8.us, label %while.body.us.1
while.body.us.1:
%incdec.ptr.1 = getelementptr inbounds i8, i8* %incdec.ptr, i64 1
%incdec.val.1 = load i8, i8* %incdec.ptr, align 1
%cmp1.1 = icmp eq i8 %incdec.val.1, 10
br i1 %cmp1.1, label %if.end8.us, label %while.cond.backedge
if.end8.us:
%incdec.ptr.phi = phi i8* [ %incdec.ptr, %while.body.us ], [ %incdec.ptr.1,
%while.body.us.1 ]
%call9 = tail call i32 @foo(i8* nonnull %incdec.ptr.phi)
%cmp2 = icmp ult i32 %call9, 0
br i1 %cmp2, label %while.cond.backedge, label %return.loopexit
while.cond.backedge:
%cmp3 = icmp eq i8* %incdec.ptr.1, %s.addr.a
br i1 %cmp3, label %return.loopexit, label %while.body.us
return.loopexit:
ret void
}
If possible, can we make loop unroll pass handle this kind of cases please?
If I missed something, please let me know.
Thanks
JinGu Kang
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Philip
Reames via llvm-dev
Sent: 16 July 2021 20:12
To: Jingu Kang <Jingu.Kang at arm.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits
Jingu,
I'm not fluent in AArch64 assembly, sorry.� If you want more meaningful
commentary, you'll need to reduce your c++ example further, and give a
better explanation of what you expect the unroller to do for you.�
Philip
On 7/16/21 12:09 PM, Jingu Kang wrote:
Sorry for poor example�
�
The AArch64 assembly output of the example from gcc is as below. The loop is
unrolled 7 times. I have written some comments to explain how the assembly code
is mapped to C source code. As you can see on `.L3` label, the �if (*s++ !=
'\n')� block is unrolled 7 times.
�
������� stp���� x29, x30, [sp, -48]!
������� mov���� x29, sp
������� stp���� x19, x20, [sp, 16]
������� mov���� x19, x0� --> x19 is char *s
������� mov���� x20, x1� --> x20 is char *end
������� str���� x21, [sp, 32]
������� orr���� x21, x3, x2� --> x21 is check1 |
check2
.L70:
������� sub���� x0, x20, x19� --> x0 = end - s;
������� add���� x1, x0, 1
������� ands��� x2, x1, 7� --> unroll count is 7
������� beq���� .L3���� --> .L3 is inside while
loop. if (*s++ != '\n')
������� cmp���� x19, x20 �--> while(s <= end)
������� bhi���� .L2���� --> .L2 is label ret1.
�
������� ldrb��� w3, [x19], 1� --> start of remainder
������� cmp���� w3, 10
������� beq���� .L71
������� cmp���� x2, 1
������� beq���� .L3
������� cmp���� x2, 2
������� beq���� .L49
������� cmp���� x2, 3
������� beq���� .L50
������� cmp���� x2, 4
������� beq���� .L51
������� cmp���� x2, 5
������� beq���� .L52
������� cmp���� x2, 6
������� beq���� .L53
������� ldrb��� w4, [x19], 1
������� cmp���� w4, 10
������� bne���� .L53
.L71:
������� cbz���� x21, .L4� --> if(check1 || check2)
������� bl����� foo()
������� mov���� x19, x0
������� cbz���� x0, .L2�� --> if (!s) goto ret1;
�
.L4:��������������������� -->
if(boo(s)) goto ret0;
������� mov���� x0, x19
����� ��bl����� boo(char*)
������� cbz���� w0, .L70
������� mov���� w0, 0
������� ldp���� x19, x20, [sp, 16]
������� ldr���� x21, [sp, 32]
������� ldp���� x29, x30, [sp], 48
������� ret
.L53:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w5, [x19], 1
������� cmp���� w5, 10
������� beq���� .L71
.L52:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w6, [x19], 1
������� cmp���� w6, 10
������� beq���� .L71
.L51:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w7, [x19], 1
������� cmp���� w7, 10
������� beq���� .L71
.L50:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w8, [x19], 1
������� cmp���� w8, 10
������� beq���� .L71
.L49:��������� --> if (*s++ != '\n') for remainder
������� ldrb��� w9, [x19], 1
������� cmp���� w9, 10
������� beq���� .L71
.L3:�������������������������
--> if (*s++ != '\n'), 7 times unrolled
������� cmp���� x19, x20
������� bhi���� .L2
������� ldrb��� w10, [x19]
������� add�� ��x19, x19, 1
������� mov���� x11, x19
������� cmp���� w10, 10
������� beq���� .L71
������� ldrb��� w12, [x19], 1
������� cmp���� w12, 10
������� beq���� .L71
������� ldrb��� w13, [x11, 1]
������� add���� x19, x11, 2
������� cmp���� w13, 10
������� beq��� �.L71
������� ldrb��� w14, [x11, 2]
������� add���� x19, x11, 3
������� cmp���� w14, 10
������� beq���� .L71
������� ldrb��� w15, [x11, 3]
������� add���� x19, x11, 4
������� cmp���� w15, 10
������� beq���� .L71
������� ldrb��� w16, [x11, 4]
������� add���� x19, x11, 5
������� cmp���� w16, 10
������� beq���� .L71
������� ldrb��� w17, [x11, 5]
������� add���� x19, x11, 6
������� cmp���� w17, 10
������� beq���� .L71
������� ldrb��� w18, [x11, 6]
������� add���� x19, x11, 7
������� cmp���� w18, 10
������� beq���� .L71
������� b������ .L3
.L2:��������������������� --> label
ret1
������� mov���� w0, 1
������� ldp���� x19, x20, [sp, 16]
������� ldr���� x21, [sp, 32]
������� ldp���� x29, x30, [sp], 48
������� ret
�
I am sorry for showing you assembly output directly� It looks like the rtl
level�s unroll pass of gcc unrolls above loop and I need to check it next
week. Once I have clear idea about above unrolling from gcc, let me reduce the
example more and let you know.
�
Thanks
JinGu Kang
�
From: llvm-dev <llvm-dev-bounces at
lists.llvm.org><mailto:llvm-dev-bounces at lists.llvm.org> On Behalf Of
Philip Reames via llvm-dev
Sent: 16 July 2021 18:27
To: Jingu Kang <Jingu.Kang at arm.com><mailto:Jingu.Kang at arm.com>
Cc: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits
�
�
On 7/16/21 9:29 AM, Jingu Kang wrote:
Hi Philip,
Thanks for your kind reply.
> A) Are you measuring on tip of tree?� There were changes for multiple
exit unrolling which landed very recently.
Yep, I am investigating benchmarks with llvm tip and I can see the llvm fails to
unroll some loops with multiple exits.> B) One of your exits does not dominate your latch.� Those are generally
hard
> C) This example does not seem to require gotos.� I strongly suggest
reducing your test cases if you want more informed commentary.�
�
I am looking at perlbench recently and it has `goto` statements inside loop. The
example is a reduced case.
Right, but the gotos aren't relevant for your reduced test.� You can
reduce further.
When I look at the gcc�s output of the example, it looks like gcc unrolls only
the below `if` statement block�
�
��� if (*s++ != '\n')
����� continue;
Your phrasing here does not parse for me.� Can you restate this with different
wording and maybe a fully worked example?
�
Thanks
JinGu Kang
�
From: llvm-dev <llvm-dev-bounces at
lists.llvm.org><mailto:llvm-dev-bounces at lists.llvm.org> On Behalf Of
Philip Reames via llvm-dev
Sent: 16 July 2021 15:52
To: Jingu Kang <Jingu.Kang at arm.com><mailto:Jingu.Kang at
arm.com>; llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Question about Unrolling Loop with Multiple Exits
�
A) Are you measuring on tip of tree?� There were changes for multiple exit
unrolling which landed very recently.
B) One of your exits does not dominate your latch.� Those are generally
hard.�
C) This example does not seem to require gotos.� I strongly suggest reducing
your test cases if you want more informed commentary.�
Philip
On 7/16/21 7:42 AM, Jingu Kang via llvm-dev wrote:
Hi All,
�
While I am investigating benchmarks, I have found loops which llvm fails to
unroll because the loops have multiple exits.
For example,
�
char *foo(void);
int boo(char *s);
�
int test(char *s, char *end, char *check1, char *check2) {
� while (s <= end) {
��� if (*s++ != '\n')
����� continue;
��� if (check1 || check2) {
����� s = foo();
����� if (!s)
������� goto ret1;
��� }��
����if (boo(s))
����� goto ret0;
� }
� goto ret1;
�
ret0:
� return 0;
ret1:
� return 1;
}
�
Above code causes below messages from LoopUnroll pass.
�
Bailout for multi-exit handling when latch exit has >1 predecessor.
Multiple exit/exiting blocks in loop and multi-exit unrolling not enabled!
�
I can see the option `unroll-runtime-multi-exit` and comments about it. I wonder
there are already reviews for the work on phabriactor or some people are working
on it.
If someone knows information about it, please share it.
�
Thanks
JinGu Kang
�
�
�
�
�
�
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210720/fc741b3b/attachment-0001.html>
Philip Reames via llvm-dev
2021-Jul-20 17:17 UTC
[llvm-dev] Question about Unrolling Loop with Multiple Exits
./opt -loop-unroll -enable-new-pm=0 < llvm-dev.ll -S -unroll-runtime -debug
Args: ./opt -loop-unroll -enable-new-pm=0 -S -unroll-runtime -debug
Loop Unroll: F[test] Loop %while.body.us
Loop Size = 10
runtime unrolling with count: 8
Exiting block %if.end8.us: TripCount=0, TripMultiple=1, BreakoutTrip=1
Exiting block %while.cond.backedge: TripCount=0, TripMultiple=1,
BreakoutTrip=1
Trying runtime unrolling on Loop:
Loop at depth 1 containing:
%while.body.us<header>,%if.end8.us<exiting>,%while.cond.backedge<latch><exiting>
Using prolog remainder.
Bailout for multi-exit handling when latch exit has >1 predecessor.
Multiple exit/exiting blocks in loop and multi-exit unrolling not enabled!
Won't unroll; remainder loop could not be generated when assuming
runtime trip count
; ModuleID = '<stdin>'
source_filename = "<stdin>"
declare i32 @foo(i8*)
define void @test(i8* %s, i64 %a) {
entry:
%s.addr.a = getelementptr i8, i8* %s, i64 %a
br label %while.body.us
while.body.us: ; preds =
%while.cond.backedge, %entry
%s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ]
%incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1
%incdec.val = load i8, i8* %s.addr, align 1
%cmp1 = icmp eq i8 %incdec.val, 10
br i1 %cmp1, label %if.end8.us, label %while.cond.backedge
if.end8.us: ; preds = %while.body.us
%call9 = tail call i32 @foo(i8* nonnull %incdec.ptr)
%cmp2 = icmp ult i32 %call9, 0
br i1 %cmp2, label %while.cond.backedge, label %return.loopexit
while.cond.backedge: ; preds = %if.end8.us,
%while.body.us
%cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a
br i1 %cmp3, label %return.loopexit, label %while.body.us
return.loopexit: ; preds =
%while.cond.backedge, %if.end8.us
ret void
}
The debug output appears to give a pretty good clue as to where you
could start if desired.
Philip
On 7/20/21 9:45 AM, Jingu Kang wrote:>
> Hi Philip,
>
> I have reduced the test case roughly as below.
>
> declare i32 @foo(i8 *)
>
> define void @test(i8* %s, i64 %a) {
>
> entry:
>
> %s.addr.a = getelementptr i8, i8* %s, i64 %a
>
> br label %while.body.us
>
> while.body.us:
>
> %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ]
>
> %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1
>
> %incdec.val = load i8, i8* %s.addr, align 1
>
> %cmp1 = icmp eq i8 %incdec.val, 10
>
> br i1 %cmp1, label %if.end8.us, label %while.cond.backedge
>
> if.end8.us:
>
> %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr)
>
> %cmp2 = icmp ult i32 %call9, 0
>
> br i1 %cmp2, label %while.cond.backedge, label %return.loopexit
>
> while.cond.backedge:
>
> %cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a
>
> br i1 %cmp3, label %return.loopexit, label %while.body.us
>
> return.loopexit:
>
> ret void
>
> }
>
> Roughly, I unrolled the loop manually 2 times as below and ignored the
> remaining loop simply.
>
> define void @test(i8* %s, i64 %a) {
>
> entry:
>
> %s.addr.a = getelementptr i8, i8* %s, i64 %a
>
> br label %while.body.us
>
> while.body.us:
>
> %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ]
>
> %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1
>
> %incdec.val = load i8, i8* %s.addr, align 1
>
> %cmp1 = icmp eq i8 %incdec.val, 10
>
> br i1 %cmp1, label %if.end8.us, label %while.body.us.1
>
> while.body.us.1:
>
> %incdec.ptr.1 = getelementptr inbounds i8, i8* %incdec.ptr, i64 1
>
> %incdec.val.1 = load i8, i8* %incdec.ptr, align 1
>
> %cmp1.1 = icmp eq i8 %incdec.val.1, 10
>
> br i1 %cmp1.1, label %if.end8.us, label %while.cond.backedge
>
> if.end8.us:
>
> %incdec.ptr.phi = phi i8* [ %incdec.ptr, %while.body.us ], [
> %incdec.ptr.1, %while.body.us.1 ]
>
> %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr.phi)
>
> %cmp2 = icmp ult i32 %call9, 0
>
> br i1 %cmp2, label %while.cond.backedge, label %return.loopexit
>
> while.cond.backedge:
>
> %cmp3 = icmp eq i8* %incdec.ptr.1, %s.addr.a
>
> br i1 %cmp3, label %return.loopexit, label %while.body.us
>
> return.loopexit:
>
> ret void
>
> }
>
> If possible, can we make loop unroll pass handle this kind of cases
> please?
>
> If I missed something, please let me know.
>
> Thanks
>
> JinGu Kang
>
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of
> *Philip Reames via llvm-dev
> *Sent:* 16 July 2021 20:12
> *To:* Jingu Kang <Jingu.Kang at arm.com>
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] Question about Unrolling Loop with Multiple
> Exits
>
> Jingu,
>
> I'm not fluent in AArch64 assembly, sorry.� If you want more
> meaningful commentary, you'll need to reduce your c++ example further,
> and give a better explanation of what you expect the unroller to do
> for you.�
>
> Philip
>
> On 7/16/21 12:09 PM, Jingu Kang wrote:
>
> Sorry for poor example�
>
> �
>
> The AArch64 assembly output of the example from gcc is as below.
> The loop is unrolled 7 times. I have written some comments to
> explain how the assembly code is mapped to C source code. As you
> can see on `.L3` label, the �if (*s++ != '\n')� block is
> unrolled 7 times.
>
> �
>
> ������� stp���� x29, x30, [sp, -48]!
>
> ������� mov���� x29, sp
>
> ������� stp���� x19, x20, [sp, 16]
>
> ������� mov���� x19, x0� --> x19 is char *s
>
> ������� mov���� x20, x1� --> x20 is char
*end
>
> ������� str���� x21, [sp, 32]
>
> ������� orr���� x21, x3, x2� --> x21 is
> check1 | check2
>
> .L70:
>
> ������� sub���� x0, x20, x19� --> x0 = end
> - s;
>
> ������� add���� x1, x0, 1
>
> ������� ands��� x2, x1, 7� --> unroll count is
7
>
> ������� beq���� .L3���� --> .L3 is
> inside while loop. if (*s++ != '\n')
>
> ������� cmp���� x19, x20 �--> while(s <=
end)
>
> ������� bhi���� .L2���� --> .L2 is
> label ret1.
>
> �
>
> ������� ldrb��� w3, [x19], 1� --> start of
> remainder
>
> ������� cmp���� w3, 10
>
> ������� beq���� .L71
>
> ������� cmp���� x2, 1
>
> ������� beq���� .L3
>
> ������� cmp���� x2, 2
>
> ������� beq���� .L49
>
> ������� cmp���� x2, 3
>
> ������� beq���� .L50
>
> ������� cmp���� x2, 4
>
> ������� beq���� .L51
>
> ������� cmp���� x2, 5
>
> ������� beq���� .L52
>
> ������� cmp���� x2, 6
>
> ������� beq���� .L53
>
> ������� ldrb��� w4, [x19], 1
>
> ������� cmp���� w4, 10
>
> ������� bne���� .L53
>
> .L71:
>
> ������� cbz���� x21, .L4� --> if(check1 ||
> check2)
>
> ������� bl����� foo()
>
> ������� mov���� x19, x0
>
> ������� cbz���� x0, .L2�� --> if (!s)
> goto ret1;
>
> �
>
> .L4:���������������������
> --> if(boo(s)) goto ret0;
>
> ������� mov���� x0, x19
>
> ����� ��bl����� boo(char*)
>
> ������� cbz���� w0, .L70
>
> ������� mov���� w0, 0
>
> ������� ldp���� x19, x20, [sp, 16]
>
> ������� ldr���� x21, [sp, 32]
>
> ������� ldp���� x29, x30, [sp], 48
>
> ������� ret
>
> .L53:��������� --> if (*s++ != '\n') for
remainder
>
> ������� ldrb��� w5, [x19], 1
>
> ������� cmp���� w5, 10
>
> ������� beq���� .L71
>
> .L52:��������� --> if (*s++ != '\n') for
remainder
>
> ������� ldrb��� w6, [x19], 1
>
> ������� cmp���� w6, 10
>
> ������� beq���� .L71
>
> .L51:��������� --> if (*s++ != '\n') for
remainder
>
> ������� ldrb��� w7, [x19], 1
>
> ������� cmp���� w7, 10
>
> ������� beq���� .L71
>
> .L50:��������� --> if (*s++ != '\n') for
remainder
>
> ������� ldrb��� w8, [x19], 1
>
> ������� cmp���� w8, 10
>
> ������� beq���� .L71
>
> .L49:��������� --> if (*s++ != '\n') for
remainder
>
> ������� ldrb��� w9, [x19], 1
>
> ������� cmp���� w9, 10
>
> ������� beq���� .L71
>
>
.L3:�������������������������
> --> if (*s++ != '\n'), 7 times unrolled
>
> ������� cmp���� x19, x20
>
> ������� bhi���� .L2
>
> ������� ldrb��� w10, [x19]
>
> ������� add�� ��x19, x19, 1
>
> ������� mov���� x11, x19
>
> ������� cmp���� w10, 10
>
> ������� beq���� .L71
>
> ������� ldrb��� w12, [x19], 1
>
> ������� cmp���� w12, 10
>
> ������� beq���� .L71
>
> ������� ldrb��� w13, [x11, 1]
>
> ������� add���� x19, x11, 2
>
> ������� cmp���� w13, 10
>
> ������� beq��� �.L71
>
> ������� ldrb��� w14, [x11, 2]
>
> ������� add���� x19, x11, 3
>
> ������� cmp���� w14, 10
>
> ������� beq���� .L71
>
> ������� ldrb��� w15, [x11, 3]
>
> ������� add���� x19, x11, 4
>
> ������� cmp���� w15, 10
>
> ������� beq���� .L71
>
> ������� ldrb��� w16, [x11, 4]
>
> ������� add���� x19, x11, 5
>
> ������� cmp���� w16, 10
>
> ������� beq���� .L71
>
> ������� ldrb��� w17, [x11, 5]
>
> ������� add���� x19, x11, 6
>
> ������� cmp���� w17, 10
>
> ������� beq���� .L71
>
> ������� ldrb��� w18, [x11, 6]
>
> ������� add���� x19, x11, 7
>
> ������� cmp���� w18, 10
>
> ������� beq���� .L71
>
> ������� b������ .L3
>
> .L2:���������������������
> --> label ret1
>
> ������� mov���� w0, 1
>
> ������� ldp���� x19, x20, [sp, 16]
>
> ������� ldr���� x21, [sp, 32]
>
> ������� ldp���� x29, x30, [sp], 48
>
> ������� ret
>
> �
>
> I am sorry for showing you assembly output directly� It looks
> like the rtl level�s unroll pass of gcc unrolls above loop and I
> need to check it next week. Once I have clear idea about above
> unrolling from gcc, let me reduce the example more and let you know.
>
> �
>
> Thanks
>
> JinGu Kang
>
> �
>
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org>
> <mailto:llvm-dev-bounces at lists.llvm.org> *On Behalf Of *Philip
> Reames via llvm-dev
> *Sent:* 16 July 2021 18:27
> *To:* Jingu Kang <Jingu.Kang at arm.com> <mailto:Jingu.Kang at
arm.com>
> *Cc:* llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
> *Subject:* Re: [llvm-dev] Question about Unrolling Loop with
> Multiple Exits
>
> �
>
> �
>
> On 7/16/21 9:29 AM, Jingu Kang wrote:
>
> Hi Philip,
>
> Thanks for your kind reply.
>
> > A) Are you measuring on tip of tree?� There were changes
> for multiple exit unrolling which landed very recently.
>
> Yep, I am investigating benchmarks with llvm tip and I can see
> the llvm fails to unroll some loops with multiple exits.
>
> > B) One of your exits does not dominate your latch.� Those
> are generally hard
>
> > C) This example does not seem to require gotos.� I
> strongly suggest reducing your test cases if you want more
> informed commentary.�
>
> �
>
> I am looking at perlbench recently and it has `goto`
> statements inside loop. The example is a reduced case.
>
> Right, but the gotos aren't relevant for your reduced test.� You
> can reduce further.
>
>
> When I look at the gcc�s output of the example, it looks
> like gcc unrolls only the below `if` statement block�
>
> �
>
> ��� if (*s++ != '\n')
>
> ����� continue;
>
> Your phrasing here does not parse for me.� Can you restate this
> with different wording and maybe a fully worked example?
>
>
> �
>
> Thanks
>
> JinGu Kang
>
> �
>
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org>
> <mailto:llvm-dev-bounces at lists.llvm.org> *On Behalf Of
*Philip
> Reames via llvm-dev
> *Sent:* 16 July 2021 15:52
> *To:* Jingu Kang <Jingu.Kang at arm.com>
> <mailto:Jingu.Kang at arm.com>; llvm-dev at lists.llvm.org
> <mailto:llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] Question about Unrolling Loop with
> Multiple Exits
>
> �
>
> A) Are you measuring on tip of tree?� There were changes for
> multiple exit unrolling which landed very recently.
>
> B) One of your exits does not dominate your latch.� Those
> are generally hard.�
>
> C) This example does not seem to require gotos.� I strongly
> suggest reducing your test cases if you want more informed
> commentary.�
>
> Philip
>
> On 7/16/21 7:42 AM, Jingu Kang via llvm-dev wrote:
>
> Hi All,
>
> �
>
> While I am investigating benchmarks, I have found loops
> which llvm fails to unroll because the loops have multiple
> exits.
>
> For example,
>
> �
>
> char *foo(void);
>
> int boo(char *s);
>
> �
>
> int test(char *s, char *end, char *check1, char *check2) {
>
> � while (s <= end) {
>
> ��� if (*s++ != '\n')
>
> ����� continue;
>
> ��� if (check1 || check2) {
>
> ����� s = foo();
>
> ����� if (!s)
>
> ������� goto ret1;
>
> ��� }��
>
> ����if (boo(s))
>
> ����� goto ret0;
>
> � }
>
> � goto ret1;
>
> �
>
> ret0:
>
> � return 0;
>
> ret1:
>
> � return 1;
>
> }
>
> �
>
> Above code causes below messages from LoopUnroll pass.
>
> �
>
> Bailout for multi-exit handling when latch exit has >1
> predecessor.
>
> Multiple exit/exiting blocks in loop and multi-exit
> unrolling not enabled!
>
> �
>
> I can see the option `unroll-runtime-multi-exit` and
> comments about it. I wonder there are already reviews for
> the work on phabriactor or some people are working on it.
>
> If someone knows information about it, please share it.
>
> �
>
> Thanks
>
> JinGu Kang
>
> �
>
> �
>
> �
>
> �
>
> �
>
> �
>
>
>
>
>
> _______________________________________________
>
> LLVM Developers mailing list
>
> llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
>
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210720/dc3a3a47/attachment.html>
Philip Reames via llvm-dev
2021-Aug-03 18:30 UTC
[llvm-dev] Question about Unrolling Loop with Multiple Exits
JFYI, https://reviews.llvm.org/D107381 works in this direction, and the reduced test case does unroll with that patch and the following command line. $ opt -loop-unroll -enable-new-pm=0 < llvm-dev.ll -S -unroll-runtime -debug -unroll-runtime-multi-exit -unroll-runtime-epilog On 7/20/21 10:17 AM, Philip Reames wrote:> ./opt -loop-unroll -enable-new-pm=0 < llvm-dev.ll -S -unroll-runtime > -debug > Args: ./opt -loop-unroll -enable-new-pm=0 -S -unroll-runtime -debug > Loop Unroll: F[test] Loop %while.body.us > Loop Size = 10 > runtime unrolling with count: 8 > Exiting block %if.end8.us: TripCount=0, TripMultiple=1, BreakoutTrip=1 > Exiting block %while.cond.backedge: TripCount=0, TripMultiple=1, > BreakoutTrip=1 > Trying runtime unrolling on Loop: > Loop at depth 1 containing: > %while.body.us<header>,%if.end8.us<exiting>,%while.cond.backedge<latch><exiting> > Using prolog remainder. > Bailout for multi-exit handling when latch exit has >1 predecessor. > Multiple exit/exiting blocks in loop and multi-exit unrolling not enabled! > Won't unroll; remainder loop could not be generated when assuming > runtime trip count > ; ModuleID = '<stdin>' > source_filename = "<stdin>" > > declare i32 @foo(i8*) > > define void @test(i8* %s, i64 %a) { > entry: > %s.addr.a = getelementptr i8, i8* %s, i64 %a > br label %while.body.us > > while.body.us: ; preds = > %while.cond.backedge, %entry > %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ] > %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1 > %incdec.val = load i8, i8* %s.addr, align 1 > %cmp1 = icmp eq i8 %incdec.val, 10 > br i1 %cmp1, label %if.end8.us, label %while.cond.backedge > > if.end8.us: ; preds = %while.body.us > %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr) > %cmp2 = icmp ult i32 %call9, 0 > br i1 %cmp2, label %while.cond.backedge, label %return.loopexit > > while.cond.backedge: ; preds = > %if.end8.us, %while.body.us > %cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a > br i1 %cmp3, label %return.loopexit, label %while.body.us > > return.loopexit: ; preds = > %while.cond.backedge, %if.end8.us > ret void > } > > The debug output appears to give a pretty good clue as to where you > could start if desired. > > Philip > > On 7/20/21 9:45 AM, Jingu Kang wrote: >> >> Hi Philip, >> >> I have reduced the test case roughly as below. >> >> declare i32 @foo(i8 *) >> >> define void @test(i8* %s, i64 %a) { >> >> entry: >> >> %s.addr.a = getelementptr i8, i8* %s, i64 %a >> >> br label %while.body.us >> >> while.body.us: >> >> %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ] >> >> %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1 >> >> %incdec.val = load i8, i8* %s.addr, align 1 >> >> %cmp1 = icmp eq i8 %incdec.val, 10 >> >> br i1 %cmp1, label %if.end8.us, label %while.cond.backedge >> >> if.end8.us: >> >> %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr) >> >> %cmp2 = icmp ult i32 %call9, 0 >> >> br i1 %cmp2, label %while.cond.backedge, label %return.loopexit >> >> while.cond.backedge: >> >> %cmp3 = icmp eq i8* %incdec.ptr, %s.addr.a >> >> br i1 %cmp3, label %return.loopexit, label %while.body.us >> >> return.loopexit: >> >> ret void >> >> } >> >> Roughly, I unrolled the loop manually 2 times as below and ignored >> the remaining loop simply. >> >> define void @test(i8* %s, i64 %a) { >> >> entry: >> >> %s.addr.a = getelementptr i8, i8* %s, i64 %a >> >> br label %while.body.us >> >> while.body.us: >> >> %s.addr = phi i8* [ %incdec.ptr, %while.cond.backedge ], [ %s, %entry ] >> >> %incdec.ptr = getelementptr inbounds i8, i8* %s.addr, i64 1 >> >> %incdec.val = load i8, i8* %s.addr, align 1 >> >> %cmp1 = icmp eq i8 %incdec.val, 10 >> >> br i1 %cmp1, label %if.end8.us, label %while.body.us.1 >> >> while.body.us.1: >> >> %incdec.ptr.1 = getelementptr inbounds i8, i8* %incdec.ptr, i64 1 >> >> %incdec.val.1 = load i8, i8* %incdec.ptr, align 1 >> >> %cmp1.1 = icmp eq i8 %incdec.val.1, 10 >> >> br i1 %cmp1.1, label %if.end8.us, label %while.cond.backedge >> >> if.end8.us: >> >> %incdec.ptr.phi = phi i8* [ %incdec.ptr, %while.body.us ], [ >> %incdec.ptr.1, %while.body.us.1 ] >> >> %call9 = tail call i32 @foo(i8* nonnull %incdec.ptr.phi) >> >> %cmp2 = icmp ult i32 %call9, 0 >> >> br i1 %cmp2, label %while.cond.backedge, label %return.loopexit >> >> while.cond.backedge: >> >> %cmp3 = icmp eq i8* %incdec.ptr.1, %s.addr.a >> >> br i1 %cmp3, label %return.loopexit, label %while.body.us >> >> return.loopexit: >> >> ret void >> >> } >> >> If possible, can we make loop unroll pass handle this kind of cases >> please? >> >> If I missed something, please let me know. >> >> Thanks >> >> JinGu Kang >> >> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of >> *Philip Reames via llvm-dev >> *Sent:* 16 July 2021 20:12 >> *To:* Jingu Kang <Jingu.Kang at arm.com> >> *Cc:* llvm-dev at lists.llvm.org >> *Subject:* Re: [llvm-dev] Question about Unrolling Loop with Multiple >> Exits >> >> Jingu, >> >> I'm not fluent in AArch64 assembly, sorry.� If you want more >> meaningful commentary, you'll need to reduce your c++ example >> further, and give a better explanation of what you expect the >> unroller to do for you.� >> >> Philip >> >> On 7/16/21 12:09 PM, Jingu Kang wrote: >> >> Sorry for poor example� >> >> � >> >> The AArch64 assembly output of the example from gcc is as below. >> The loop is unrolled 7 times. I have written some comments to >> explain how the assembly code is mapped to C source code. As you >> can see on `.L3` label, the �if (*s++ != '\n')� block is >> unrolled 7 times. >> >> � >> >> ������� stp���� x29, x30, [sp, -48]! >> >> ������� mov���� x29, sp >> >> ������� stp���� x19, x20, [sp, 16] >> >> ������� mov���� x19, x0� --> x19 is char *s >> >> ������� mov���� x20, x1� --> x20 is char *end >> >> ������� str���� x21, [sp, 32] >> >> ������� orr���� x21, x3, x2� --> x21 is >> check1 | check2 >> >> .L70: >> >> ������� sub���� x0, x20, x19� --> x0 >> end - s; >> >> ������� add���� x1, x0, 1 >> >> ������� ands��� x2, x1, 7� --> unroll count >> is 7 >> >> ������� beq���� .L3���� --> .L3 is >> inside while loop. if (*s++ != '\n') >> >> ������� cmp���� x19, x20 �--> while(s <= end) >> >> ������� bhi���� .L2���� --> .L2 is >> label ret1. >> >> � >> >> ������� ldrb��� w3, [x19], 1� --> start of >> remainder >> >> ������� cmp���� w3, 10 >> >> ������� beq���� .L71 >> >> ������� cmp���� x2, 1 >> >> ������� beq���� .L3 >> >> ������� cmp���� x2, 2 >> >> ������� beq���� .L49 >> >> ������� cmp���� x2, 3 >> >> ������� beq���� .L50 >> >> ������� cmp���� x2, 4 >> >> ������� beq���� .L51 >> >> ������� cmp���� x2, 5 >> >> ������� beq���� .L52 >> >> ������� cmp���� x2, 6 >> >> ������� beq���� .L53 >> >> ������� ldrb��� w4, [x19], 1 >> >> ������� cmp���� w4, 10 >> >> ������� bne���� .L53 >> >> .L71: >> >> ������� cbz���� x21, .L4� --> if(check1 >> || check2) >> >> ������� bl����� foo() >> >> ������� mov���� x19, x0 >> >> ������� cbz���� x0, .L2�� --> if (!s) >> goto ret1; >> >> � >> >> .L4:��������������������� >> --> if(boo(s)) goto ret0; >> >> ������� mov���� x0, x19 >> >> ����� ��bl����� boo(char*) >> >> ������� cbz���� w0, .L70 >> >> ������� mov���� w0, 0 >> >> ������� ldp���� x19, x20, [sp, 16] >> >> ������� ldr���� x21, [sp, 32] >> >> ������� ldp���� x29, x30, [sp], 48 >> >> ������� ret >> >> .L53:��������� --> if (*s++ != '\n') for remainder >> >> ������� ldrb��� w5, [x19], 1 >> >> ������� cmp���� w5, 10 >> >> ������� beq���� .L71 >> >> .L52:��������� --> if (*s++ != '\n') for remainder >> >> ������� ldrb��� w6, [x19], 1 >> >> ������� cmp���� w6, 10 >> >> ������� beq���� .L71 >> >> .L51:��������� --> if (*s++ != '\n') for remainder >> >> ������� ldrb��� w7, [x19], 1 >> >> ������� cmp���� w7, 10 >> >> ������� beq���� .L71 >> >> .L50:��������� --> if (*s++ != '\n') for remainder >> >> ������� ldrb��� w8, [x19], 1 >> >> ������� cmp���� w8, 10 >> >> ������� beq���� .L71 >> >> .L49:��������� --> if (*s++ != '\n') for remainder >> >> ������� ldrb��� w9, [x19], 1 >> >> ������� cmp���� w9, 10 >> >> ������� beq���� .L71 >> >> .L3:������������������������� >> --> if (*s++ != '\n'), 7 times unrolled >> >> ������� cmp���� x19, x20 >> >> ������� bhi���� .L2 >> >> ������� ldrb��� w10, [x19] >> >> ������� add�� ��x19, x19, 1 >> >> ������� mov���� x11, x19 >> >> ������� cmp���� w10, 10 >> >> ������� beq���� .L71 >> >> ������� ldrb��� w12, [x19], 1 >> >> ������� cmp���� w12, 10 >> >> ������� beq���� .L71 >> >> ������� ldrb��� w13, [x11, 1] >> >> ������� add���� x19, x11, 2 >> >> ������� cmp���� w13, 10 >> >> ������� beq��� �.L71 >> >> ������� ldrb��� w14, [x11, 2] >> >> ������� add���� x19, x11, 3 >> >> ������� cmp���� w14, 10 >> >> ������� beq���� .L71 >> >> ������� ldrb��� w15, [x11, 3] >> >> ������� add���� x19, x11, 4 >> >> ������� cmp���� w15, 10 >> >> ������� beq���� .L71 >> >> ������� ldrb��� w16, [x11, 4] >> >> ������� add���� x19, x11, 5 >> >> ������� cmp���� w16, 10 >> >> ������� beq���� .L71 >> >> ������� ldrb��� w17, [x11, 5] >> >> ������� add���� x19, x11, 6 >> >> ������� cmp���� w17, 10 >> >> ������� beq���� .L71 >> >> ������� ldrb��� w18, [x11, 6] >> >> ������� add���� x19, x11, 7 >> >> ������� cmp���� w18, 10 >> >> ������� beq���� .L71 >> >> ������� b������ .L3 >> >> .L2:��������������������� >> --> label ret1 >> >> ������� mov���� w0, 1 >> >> ������� ldp���� x19, x20, [sp, 16] >> >> ������� ldr���� x21, [sp, 32] >> >> ������� ldp���� x29, x30, [sp], 48 >> >> ������� ret >> >> � >> >> I am sorry for showing you assembly output directly� It looks >> like the rtl level�s unroll pass of gcc unrolls above loop and >> I need to check it next week. Once I have clear idea about above >> unrolling from gcc, let me reduce the example more and let you know. >> >> � >> >> Thanks >> >> JinGu Kang >> >> � >> >> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> >> <mailto:llvm-dev-bounces at lists.llvm.org> *On Behalf Of *Philip >> Reames via llvm-dev >> *Sent:* 16 July 2021 18:27 >> *To:* Jingu Kang <Jingu.Kang at arm.com> <mailto:Jingu.Kang at arm.com> >> *Cc:* llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >> *Subject:* Re: [llvm-dev] Question about Unrolling Loop with >> Multiple Exits >> >> � >> >> � >> >> On 7/16/21 9:29 AM, Jingu Kang wrote: >> >> Hi Philip, >> >> Thanks for your kind reply. >> >> > A) Are you measuring on tip of tree?� There were changes >> for multiple exit unrolling which landed very recently. >> >> Yep, I am investigating benchmarks with llvm tip and I can >> see the llvm fails to unroll some loops with multiple exits. >> >> > B) One of your exits does not dominate your latch.� Those >> are generally hard >> >> > C) This example does not seem to require gotos.� I >> strongly suggest reducing your test cases if you want more >> informed commentary.� >> >> � >> >> I am looking at perlbench recently and it has `goto` >> statements inside loop. The example is a reduced case. >> >> Right, but the gotos aren't relevant for your reduced test.� >> You can reduce further. >> >> >> When I look at the gcc�s output of the example, it looks >> like gcc unrolls only the below `if` statement block� >> >> � >> >> ��� if (*s++ != '\n') >> >> ����� continue; >> >> Your phrasing here does not parse for me.� Can you restate this >> with different wording and maybe a fully worked example? >> >> >> � >> >> Thanks >> >> JinGu Kang >> >> � >> >> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> >> <mailto:llvm-dev-bounces at lists.llvm.org> *On Behalf Of >> *Philip Reames via llvm-dev >> *Sent:* 16 July 2021 15:52 >> *To:* Jingu Kang <Jingu.Kang at arm.com> >> <mailto:Jingu.Kang at arm.com>; llvm-dev at lists.llvm.org >> <mailto:llvm-dev at lists.llvm.org> >> *Subject:* Re: [llvm-dev] Question about Unrolling Loop with >> Multiple Exits >> >> � >> >> A) Are you measuring on tip of tree?� There were changes >> for multiple exit unrolling which landed very recently. >> >> B) One of your exits does not dominate your latch.� Those >> are generally hard.� >> >> C) This example does not seem to require gotos.� I strongly >> suggest reducing your test cases if you want more informed >> commentary.� >> >> Philip >> >> On 7/16/21 7:42 AM, Jingu Kang via llvm-dev wrote: >> >> Hi All, >> >> � >> >> While I am investigating benchmarks, I have found loops >> which llvm fails to unroll because the loops have >> multiple exits. >> >> For example, >> >> � >> >> char *foo(void); >> >> int boo(char *s); >> >> � >> >> int test(char *s, char *end, char *check1, char *check2) { >> >> � while (s <= end) { >> >> ��� if (*s++ != '\n') >> >> ����� continue; >> >> ��� if (check1 || check2) { >> >> ����� s = foo(); >> >> ����� if (!s) >> >> ������� goto ret1; >> >> ��� }�� >> >> ����if (boo(s)) >> >> ����� goto ret0; >> >> � } >> >> � goto ret1; >> >> � >> >> ret0: >> >> � return 0; >> >> ret1: >> >> � return 1; >> >> } >> >> � >> >> Above code causes below messages from LoopUnroll pass. >> >> � >> >> Bailout for multi-exit handling when latch exit has >1 >> predecessor. >> >> Multiple exit/exiting blocks in loop and multi-exit >> unrolling not enabled! >> >> � >> >> I can see the option `unroll-runtime-multi-exit` and >> comments about it. I wonder there are already reviews for >> the work on phabriactor or some people are working on it. >> >> If someone knows information about it, please share it. >> >> � >> >> Thanks >> >> JinGu Kang >> >> � >> >> � >> >> � >> >> � >> >> � >> >> � >> >> >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing list >> >> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210803/b0f20a57/attachment.html>