thr3ads.net - llvm dev - [llvm-dev] loop unrolling introduces conditional branch [Aug 2015]

If this information is useful, please help other people find it:
Share via:

Xiangyang Guo via llvm-dev

2015-Aug-20 14:38 UTC

[llvm-dev] loop unrolling introduces conditional branch

Hi,

I want to use loop unrolling pass, however, I find that loop unrolling will
introduces conditional branch at end of every "unrolled" part. For
example,
consider the following code

*void foo( int n, int array_x[])*
*{*
*    for (int i=0; i < n; i++)*
*     array_x[i] = i; *
*}*

Then I use this command "opt-3.5 try.bc -mem2reg -loops -loop-simplify
-loop-rotate -lcssa -indvars -loop-unroll -unroll-count=3 -simplifycfg -S",
it gives me this IR:

*define void @_Z3fooiPi(i32 %n, i32* %array_x) #0 {*
*  %1 = icmp slt i32 0, %n*
*  br i1 %1, label %.lr.ph <http://lr.ph/>, label %._crit_edge*

*.lr.ph <http://lr.ph/>:                                           ; preds
= %0, %7*
*  %indvars.iv = phi i64 [ %indvars.iv.next.2, %7 ], [ 0, %0 ]*
*  %2 = getelementptr inbounds i32* %array_x, i64 %indvars.iv*
*  %3 = trunc i64 %indvars.iv to i32*
*  store i32 %3, i32* %2*
*  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1*
*  %lftr.wideiv = trunc i64 %indvars.iv.next to i32*
*  %exitcond = icmp ne i32 %lftr.wideiv, %n*
*  br i1 %exitcond, label %4, label %._crit_edge*

*._crit_edge:                                      ; preds = %.lr.ph
<http://lr.ph/>, %4, %7, %0*
*  ret void*

*; <label>:4                                       ; preds = %.lr.ph
<http://lr.ph/>*
*  %5 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next*
*  %6 = trunc i64 %indvars.iv.next to i32*
*  store i32 %6, i32* %5*
*  %indvars.iv.next.1 = add nuw nsw i64 %indvars.iv.next, 1*
*  %lftr.wideiv.1 = trunc i64 %indvars.iv.next.1 to i32*
*  %exitcond.1 = icmp ne i32 %lftr.wideiv.1, %n*
*  br i1 %exitcond.1, label %7, label %._crit_edge*

*; <label>:7                                       ; preds = %4*
*  %8 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next.1*
*  %9 = trunc i64 %indvars.iv.next.1 to i32*
*  store i32 %9, i32* %8*
*  %indvars.iv.next.2 = add nuw nsw i64 %indvars.iv.next.1, 1*
*  %lftr.wideiv.2 = trunc i64 %indvars.iv.next.2 to i32*
*  %exitcond.2 = icmp ne i32 %lftr.wideiv.2, %n*
*  br i1 %exitcond.2, label %.lr.ph <http://lr.ph/>, label %._crit_edge*
*}*

As you can see, at the end of BB <label>4 and BB<label>7 there are
"add",
"icmp" and "br" instrcutions to check the boundary. I
understand this is
for the correctness. However, I would expect the loop unrolling can change
my code to something like this:

*void foo( int n, int array_x[])*
*{*
*    int j = n%3;*
*    int m = n - j;*
*    for (int i=0; i < m; i+=3){*
*     array_x[i] = i;*
*     array_x[i+1] = i+1;*
*     array_x[i+2] = i+2; *
*    }*
*    for(i=m; i<n; i++)*
*     array_x[i] = i;        *
*}*

In this case, the BB<label>4 and BB<label>7 will do not have the
"add",
"icmp" and "br" instructions because these BBs can be merged
together.

How can I achieve this? Thanks.

Regards,

Xiangyang
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150820/3c898a21/attachment.html>

Philip Reames via llvm-dev

2015-Aug-20 17:09 UTC

head link

[llvm-dev] loop unrolling introduces conditional branch

On 08/20/2015 07:38 AM, Xiangyang Guo via llvm-dev
wrote:> Hi,
>
> I want to use loop unrolling pass, however, I find that loop unrolling 
> will introduces conditional branch at end of every "unrolled"
part.
> For example, consider the following code
>
> /void foo( int n, int array_x[])/
> /{/
> /    for (int i=0; i < n; i++)/
> / array_x[i] = i; /
> /}/
>
> Then I use this command "opt-3.5 try.bc -mem2reg -loops -loop-simplify
> -loop-rotate -lcssa -indvars -loop-unroll -unroll-count=3 -simplifycfg 
> -S", it gives me this IR:
>
> /define void @_Z3fooiPi(i32 %n, i32* %array_x) #0 {/
> /  %1 = icmp slt i32 0, %n/
> /  br i1 %1, label %.lr.ph <http://lr.ph/>, label %._crit_edge/
> /
> /
> /.lr.ph <http://lr.ph/>:             ; preds = %0, %7/
> /  %indvars.iv = phi i64 [ %indvars.iv.next.2, %7 ], [ 0, %0 ]/
> /  %2 = getelementptr inbounds i32* %array_x, i64 %indvars.iv/
> /  %3 = trunc i64 %indvars.iv to i32/
> /  store i32 %3, i32* %2/
> /  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1/
> /  %lftr.wideiv = trunc i64 %indvars.iv.next to i32/
> /  %exitcond = icmp ne i32 %lftr.wideiv, %n/
> /  br i1 %exitcond, label %4, label %._crit_edge/
> /
> /
> /._crit_edge:                                      ; preds = %.lr.ph 
> <http://lr.ph/>, %4, %7, %0/
> /  ret void/
> /
> /
> /; <label>:4   ; preds = %.lr.ph <http://lr.ph/>/
> /  %5 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next/
> /  %6 = trunc i64 %indvars.iv.next to i32/
> /  store i32 %6, i32* %5/
> /  %indvars.iv.next.1 = add nuw nsw i64 %indvars.iv.next, 1/
> /  %lftr.wideiv.1 = trunc i64 %indvars.iv.next.1 to i32/
> /  %exitcond.1 = icmp ne i32 %lftr.wideiv.1, %n/
> /  br i1 %exitcond.1, label %7, label %._crit_edge/
> /
> /
> /; <label>:7   ; preds = %4/
> /  %8 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next.1/
> /  %9 = trunc i64 %indvars.iv.next.1 to i32/
> /  store i32 %9, i32* %8/
> /  %indvars.iv.next.2 = add nuw nsw i64 %indvars.iv.next.1, 1/
> /  %lftr.wideiv.2 = trunc i64 %indvars.iv.next.2 to i32/
> /  %exitcond.2 = icmp ne i32 %lftr.wideiv.2, %n/
> /  br i1 %exitcond.2, label %.lr.ph <http://lr.ph/>, label
%._crit_edge/
> /}/
>
> As you can see, at the end of BB <label>4 and BB<label>7 there
are
> "add", "icmp" and "br" instrcutions to check
the boundary. I
> understand this is for the correctness. However, I would expect the 
> loop unrolling can change my code to something like this:
>
> /void foo( int n, int array_x[])/
> /{/
> /    int j = n%3;/
> /    int m = n - j;/
> /    for (int i=0; i < m; i+=3){/
> / array_x[i] = i;/
> / array_x[i+1] = i+1;/
> / array_x[i+2] = i+2; /
> /    }/
> /    for(i=m; i<n; i++)/
> / array_x[i] = i; /
> /}/
>
> In this case, the BB<label>4 and BB<label>7 will do not have
the
> "add", "icmp" and "br" instructions because
these BBs can be merged
> together.
>
> How can I achieve this? Thanks.One - rather heavy weight - way to do this would be to add the -irce 
pass after the loop unroll step.  InductiveRangeCheckElimination will 
introduce a post loop so as to eliminate the range checks in the inner 
loop.  This might not be the ideal transformation for this code, but it 
might get you closer to what you want.

A couple of caveats:
- 3.5 isn't recent enough to have a stable IRCE.  Download ToT.
- IRCE requires profiling information on the branches.  I'd start by 
manually annotating your IR to see if it works, then exploring a profile 
build if it does.

For the record, teaching the unroller to do this transformation (or a 
creating a new pass) would seem interesting.  You might check with 
Chandler and/or Michael (see recent review threads) for what their plans 
in this area are.>
> Regards,
>
> Xiangyang
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150820/4d86758a/attachment.html>

James Molloy via llvm-dev

2015-Aug-20 18:23 UTC

head link

[llvm-dev] loop unrolling introduces conditional branch

Hi Xiangyang,

The algorithm for loop unrolling was changed post-3.5 to do more what you'd
expect. If you use 3.6 or 3.7 you'll likely get better results.

Cheers,

James

On Thu, 20 Aug 2015 at 18:09 Philip Reames via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On 08/20/2015 07:38 AM, Xiangyang Guo via llvm-dev wrote:
>
> Hi,
>
> I want to use loop unrolling pass, however, I find that loop unrolling
> will introduces conditional branch at end of every "unrolled"
part. For
> example, consider the following code
>
> *void foo( int n, int array_x[])*
> *{*
> *    for (int i=0; i < n; i++)*
> *     array_x[i] = i; *
> *}*
>
> Then I use this command "opt-3.5 try.bc -mem2reg -loops -loop-simplify
> -loop-rotate -lcssa -indvars -loop-unroll -unroll-count=3 -simplifycfg
-S",
> it gives me this IR:
>
> *define void @_Z3fooiPi(i32 %n, i32* %array_x) #0 {*
> *  %1 = icmp slt i32 0, %n*
> *  br i1 %1, label %.lr.ph <http://lr.ph/>, label %._crit_edge*
>
> *.lr.ph <http://lr.ph/>:                                           ;
preds
> = %0, %7*
> *  %indvars.iv = phi i64 [ %indvars.iv.next.2, %7 ], [ 0, %0 ]*
> *  %2 = getelementptr inbounds i32* %array_x, i64 %indvars.iv*
> *  %3 = trunc i64 %indvars.iv to i32*
> *  store i32 %3, i32* %2*
> *  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1*
> *  %lftr.wideiv = trunc i64 %indvars.iv.next to i32*
> *  %exitcond = icmp ne i32 %lftr.wideiv, %n*
> *  br i1 %exitcond, label %4, label %._crit_edge*
>
> *._crit_edge:                                      ; preds = %.lr.ph
> <http://lr.ph/>, %4, %7, %0*
> *  ret void*
>
> *; <label>:4                                       ; preds = %.lr.ph
> <http://lr.ph/>*
> *  %5 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next*
> *  %6 = trunc i64 %indvars.iv.next to i32*
> *  store i32 %6, i32* %5*
> *  %indvars.iv.next.1 = add nuw nsw i64 %indvars.iv.next, 1*
> *  %lftr.wideiv.1 = trunc i64 %indvars.iv.next.1 to i32*
> *  %exitcond.1 = icmp ne i32 %lftr.wideiv.1, %n*
> *  br i1 %exitcond.1, label %7, label %._crit_edge*
>
> *; <label>:7                                       ; preds = %4*
> *  %8 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next.1*
> *  %9 = trunc i64 %indvars.iv.next.1 to i32*
> *  store i32 %9, i32* %8*
> *  %indvars.iv.next.2 = add nuw nsw i64 %indvars.iv.next.1, 1*
> *  %lftr.wideiv.2 = trunc i64 %indvars.iv.next.2 to i32*
> *  %exitcond.2 = icmp ne i32 %lftr.wideiv.2, %n*
> *  br i1 %exitcond.2, label %.lr.ph <http://lr.ph/>, label
%._crit_edge*
> *}*
>
> As you can see, at the end of BB <label>4 and BB<label>7 there
are "add",
> "icmp" and "br" instrcutions to check the boundary. I
understand this is
> for the correctness. However, I would expect the loop unrolling can change
> my code to something like this:
>
> *void foo( int n, int array_x[])*
> *{*
> *    int j = n%3;*
> *    int m = n - j;*
> *    for (int i=0; i < m; i+=3){*
> *     array_x[i] = i;*
> *     array_x[i+1] = i+1;*
> *     array_x[i+2] = i+2; *
> *    }*
> *    for(i=m; i<n; i++)*
> *     array_x[i] = i;        *
> *}*
>
> In this case, the BB<label>4 and BB<label>7 will do not have
the "add",
> "icmp" and "br" instructions because these BBs can be
merged together.
>
> How can I achieve this? Thanks.
>
> One - rather heavy weight - way to do this would be to add the -irce pass
> after the loop unroll step.  InductiveRangeCheckElimination will introduce
> a post loop so as to eliminate the range checks in the inner loop.  This
> might not be the ideal transformation for this code, but it might get you
> closer to what you want.
>
> A couple of caveats:
> - 3.5 isn't recent enough to have a stable IRCE.  Download ToT.
> - IRCE requires profiling information on the branches.  I'd start by
> manually annotating your IR to see if it works, then exploring a profile
> build if it does.
>
> For the record, teaching the unroller to do this transformation (or a
> creating a new pass) would seem interesting.  You might check with Chandler
> and/or Michael (see recent review threads) for what their plans in this
> area are.
>
>
> Regards,
>
> Xiangyang
>
>
> _______________________________________________
> LLVM Developers mailing listllvm-dev at
lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150820/4f8c3cb1/attachment.html>

Maybe Matching Threads

Search for more maybe matching threads

llvm dev - Aug 2015 - loop unrolling introduces conditional branch

[llvm-dev] loop unrolling introduces conditional branch

[llvm-dev] loop unrolling introduces conditional branch

[llvm-dev] loop unrolling introduces conditional branch

Maybe Matching Threads