James Molloy via llvm-dev
2015-Aug-20 18:23 UTC
[llvm-dev] loop unrolling introduces conditional branch
Hi Xiangyang, The algorithm for loop unrolling was changed post-3.5 to do more what you'd expect. If you use 3.6 or 3.7 you'll likely get better results. Cheers, James On Thu, 20 Aug 2015 at 18:09 Philip Reames via llvm-dev < llvm-dev at lists.llvm.org> wrote:> On 08/20/2015 07:38 AM, Xiangyang Guo via llvm-dev wrote: > > Hi, > > I want to use loop unrolling pass, however, I find that loop unrolling > will introduces conditional branch at end of every "unrolled" part. For > example, consider the following code > > *void foo( int n, int array_x[])* > *{* > * for (int i=0; i < n; i++)* > * array_x[i] = i; * > *}* > > Then I use this command "opt-3.5 try.bc -mem2reg -loops -loop-simplify > -loop-rotate -lcssa -indvars -loop-unroll -unroll-count=3 -simplifycfg -S", > it gives me this IR: > > *define void @_Z3fooiPi(i32 %n, i32* %array_x) #0 {* > * %1 = icmp slt i32 0, %n* > * br i1 %1, label %.lr.ph <http://lr.ph/>, label %._crit_edge* > > *.lr.ph <http://lr.ph/>: ; preds > = %0, %7* > * %indvars.iv = phi i64 [ %indvars.iv.next.2, %7 ], [ 0, %0 ]* > * %2 = getelementptr inbounds i32* %array_x, i64 %indvars.iv* > * %3 = trunc i64 %indvars.iv to i32* > * store i32 %3, i32* %2* > * %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1* > * %lftr.wideiv = trunc i64 %indvars.iv.next to i32* > * %exitcond = icmp ne i32 %lftr.wideiv, %n* > * br i1 %exitcond, label %4, label %._crit_edge* > > *._crit_edge: ; preds = %.lr.ph > <http://lr.ph/>, %4, %7, %0* > * ret void* > > *; <label>:4 ; preds = %.lr.ph > <http://lr.ph/>* > * %5 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next* > * %6 = trunc i64 %indvars.iv.next to i32* > * store i32 %6, i32* %5* > * %indvars.iv.next.1 = add nuw nsw i64 %indvars.iv.next, 1* > * %lftr.wideiv.1 = trunc i64 %indvars.iv.next.1 to i32* > * %exitcond.1 = icmp ne i32 %lftr.wideiv.1, %n* > * br i1 %exitcond.1, label %7, label %._crit_edge* > > *; <label>:7 ; preds = %4* > * %8 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next.1* > * %9 = trunc i64 %indvars.iv.next.1 to i32* > * store i32 %9, i32* %8* > * %indvars.iv.next.2 = add nuw nsw i64 %indvars.iv.next.1, 1* > * %lftr.wideiv.2 = trunc i64 %indvars.iv.next.2 to i32* > * %exitcond.2 = icmp ne i32 %lftr.wideiv.2, %n* > * br i1 %exitcond.2, label %.lr.ph <http://lr.ph/>, label %._crit_edge* > *}* > > As you can see, at the end of BB <label>4 and BB<label>7 there are "add", > "icmp" and "br" instrcutions to check the boundary. I understand this is > for the correctness. However, I would expect the loop unrolling can change > my code to something like this: > > *void foo( int n, int array_x[])* > *{* > * int j = n%3;* > * int m = n - j;* > * for (int i=0; i < m; i+=3){* > * array_x[i] = i;* > * array_x[i+1] = i+1;* > * array_x[i+2] = i+2; * > * }* > * for(i=m; i<n; i++)* > * array_x[i] = i; * > *}* > > In this case, the BB<label>4 and BB<label>7 will do not have the "add", > "icmp" and "br" instructions because these BBs can be merged together. > > How can I achieve this? Thanks. > > One - rather heavy weight - way to do this would be to add the -irce pass > after the loop unroll step. InductiveRangeCheckElimination will introduce > a post loop so as to eliminate the range checks in the inner loop. This > might not be the ideal transformation for this code, but it might get you > closer to what you want. > > A couple of caveats: > - 3.5 isn't recent enough to have a stable IRCE. Download ToT. > - IRCE requires profiling information on the branches. I'd start by > manually annotating your IR to see if it works, then exploring a profile > build if it does. > > For the record, teaching the unroller to do this transformation (or a > creating a new pass) would seem interesting. You might check with Chandler > and/or Michael (see recent review threads) for what their plans in this > area are. > > > Regards, > > Xiangyang > > > _______________________________________________ > LLVM Developers mailing listllvm-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150820/4f8c3cb1/attachment.html>
Xiangyang Guo via llvm-dev
2015-Aug-21 14:51 UTC
[llvm-dev] loop unrolling introduces conditional branch
Hi, James and Philip, Thanks for your help. Based on your advice, I downloaded llvm-3.7. However, with this new version of LLVM, I have the following errors when I compile my previous code: g++ -o parser main.o `llvm-config --libs all` `llvm-config --ldflags --system-libs` -lpthread -ldl -rdynamic -ltinfo main.o:(.data.rel.ro._ZTIN4llvm17GetElementPtrInstE[_ZTIN4llvm17GetElementPtrInstE]+0x10): undefined reference to `typeinfo for llvm::Instruction' main.o:(.data.rel.ro._ZTIN4llvm8ICmpInstE[_ZTIN4llvm8ICmpInstE]+0x10): undefined reference to `typeinfo for llvm::CmpInst' BTW, in my code, I use LLVM API (IRBuilder and so on) to generate one Module and then use PassManager to add several passes. And my Makefile is pretty simple, it looks like this: *********************************************************************************************** all: parser OBJS = main.o \ LLVMCONFIG = llvm-config CPPFLAGS = `$(LLVMCONFIG) --cxxflags` -std=c++11 LDFLAGS = `$(LLVMCONFIG) --ldflags --system-libs` -lpthread -ldl -rdynamic -ltinfo LIBS = `$(LLVMCONFIG) --libs all` clean: $(RM) -rf parser $(OBJS) %.o: %.cpp g++ -g -c $(CPPFLAGS) -o $@ $< parser: $(OBJS) g++ -o $@ $(OBJS) $(LIBS) $(LDFLAGS) ********************************************************************************************** Do you have any idea? Thanks a lot. Regards, Xiangyang On Thu, Aug 20, 2015 at 2:23 PM, James Molloy <james at jamesmolloy.co.uk> wrote:> Hi Xiangyang, > > The algorithm for loop unrolling was changed post-3.5 to do more what > you'd expect. If you use 3.6 or 3.7 you'll likely get better results. > > Cheers, > > James > > On Thu, 20 Aug 2015 at 18:09 Philip Reames via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> On 08/20/2015 07:38 AM, Xiangyang Guo via llvm-dev wrote: >> >> Hi, >> >> I want to use loop unrolling pass, however, I find that loop unrolling >> will introduces conditional branch at end of every "unrolled" part. For >> example, consider the following code >> >> *void foo( int n, int array_x[])* >> *{* >> * for (int i=0; i < n; i++)* >> * array_x[i] = i; * >> *}* >> >> Then I use this command "opt-3.5 try.bc -mem2reg -loops -loop-simplify >> -loop-rotate -lcssa -indvars -loop-unroll -unroll-count=3 -simplifycfg -S", >> it gives me this IR: >> >> *define void @_Z3fooiPi(i32 %n, i32* %array_x) #0 {* >> * %1 = icmp slt i32 0, %n* >> * br i1 %1, label %.lr.ph <http://lr.ph/>, label %._crit_edge* >> >> *.lr.ph <http://lr.ph/>: ; >> preds = %0, %7* >> * %indvars.iv = phi i64 [ %indvars.iv.next.2, %7 ], [ 0, %0 ]* >> * %2 = getelementptr inbounds i32* %array_x, i64 %indvars.iv* >> * %3 = trunc i64 %indvars.iv to i32* >> * store i32 %3, i32* %2* >> * %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1* >> * %lftr.wideiv = trunc i64 %indvars.iv.next to i32* >> * %exitcond = icmp ne i32 %lftr.wideiv, %n* >> * br i1 %exitcond, label %4, label %._crit_edge* >> >> *._crit_edge: ; preds = %.lr.ph >> <http://lr.ph/>, %4, %7, %0* >> * ret void* >> >> *; <label>:4 ; preds = %.lr.ph >> <http://lr.ph/>* >> * %5 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next* >> * %6 = trunc i64 %indvars.iv.next to i32* >> * store i32 %6, i32* %5* >> * %indvars.iv.next.1 = add nuw nsw i64 %indvars.iv.next, 1* >> * %lftr.wideiv.1 = trunc i64 %indvars.iv.next.1 to i32* >> * %exitcond.1 = icmp ne i32 %lftr.wideiv.1, %n* >> * br i1 %exitcond.1, label %7, label %._crit_edge* >> >> *; <label>:7 ; preds = %4* >> * %8 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next.1* >> * %9 = trunc i64 %indvars.iv.next.1 to i32* >> * store i32 %9, i32* %8* >> * %indvars.iv.next.2 = add nuw nsw i64 %indvars.iv.next.1, 1* >> * %lftr.wideiv.2 = trunc i64 %indvars.iv.next.2 to i32* >> * %exitcond.2 = icmp ne i32 %lftr.wideiv.2, %n* >> * br i1 %exitcond.2, label %.lr.ph <http://lr.ph/>, label %._crit_edge* >> *}* >> >> As you can see, at the end of BB <label>4 and BB<label>7 there are "add", >> "icmp" and "br" instrcutions to check the boundary. I understand this is >> for the correctness. However, I would expect the loop unrolling can change >> my code to something like this: >> >> *void foo( int n, int array_x[])* >> *{* >> * int j = n%3;* >> * int m = n - j;* >> * for (int i=0; i < m; i+=3){* >> * array_x[i] = i;* >> * array_x[i+1] = i+1;* >> * array_x[i+2] = i+2; * >> * }* >> * for(i=m; i<n; i++)* >> * array_x[i] = i; * >> *}* >> >> In this case, the BB<label>4 and BB<label>7 will do not have the "add", >> "icmp" and "br" instructions because these BBs can be merged together. >> >> How can I achieve this? Thanks. >> >> One - rather heavy weight - way to do this would be to add the -irce pass >> after the loop unroll step. InductiveRangeCheckElimination will introduce >> a post loop so as to eliminate the range checks in the inner loop. This >> might not be the ideal transformation for this code, but it might get you >> closer to what you want. >> >> A couple of caveats: >> - 3.5 isn't recent enough to have a stable IRCE. Download ToT. >> - IRCE requires profiling information on the branches. I'd start by >> manually annotating your IR to see if it works, then exploring a profile >> build if it does. >> >> For the record, teaching the unroller to do this transformation (or a >> creating a new pass) would seem interesting. You might check with Chandler >> and/or Michael (see recent review threads) for what their plans in this >> area are. >> >> >> Regards, >> >> Xiangyang >> >> >> _______________________________________________ >> LLVM Developers mailing listllvm-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150821/9b888178/attachment.html>
Jeremy Lakeman via llvm-dev
2015-Aug-21 15:05 UTC
[llvm-dev] loop unrolling introduces conditional branch
There's been some recent noise on the mailing list about requiring -fno-rtti; http://lists.llvm.org/pipermail/llvm-dev/2015-August/089010.html Could that be it? On Sat, Aug 22, 2015 at 12:21 AM, Xiangyang Guo via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hi, James and Philip, Thanks for your help. > > Based on your advice, I downloaded llvm-3.7. However, with this new > version of LLVM, I have the following errors when I compile my previous > code: > > g++ -o parser main.o `llvm-config --libs all` `llvm-config --ldflags > --system-libs` -lpthread -ldl -rdynamic -ltinfo > main.o:(.data.rel.ro._ZTIN4llvm17GetElementPtrInstE[_ZTIN4llvm17GetElementPtrInstE]+0x10): > undefined reference to `typeinfo for llvm::Instruction' > main.o:(.data.rel.ro._ZTIN4llvm8ICmpInstE[_ZTIN4llvm8ICmpInstE]+0x10): > undefined reference to `typeinfo for llvm::CmpInst' > > BTW, in my code, I use LLVM API (IRBuilder and so on) to generate one > Module and then use PassManager to add several passes. And my Makefile is > pretty simple, it looks like this: > > *********************************************************************************************** > all: parser > > OBJS = main.o \ > > LLVMCONFIG = llvm-config > CPPFLAGS = `$(LLVMCONFIG) --cxxflags` -std=c++11 > LDFLAGS = `$(LLVMCONFIG) --ldflags --system-libs` -lpthread -ldl -rdynamic > -ltinfo > LIBS = `$(LLVMCONFIG) --libs all` > > clean: > $(RM) -rf parser $(OBJS) > > %.o: %.cpp > g++ -g -c $(CPPFLAGS) -o $@ $< > > > parser: $(OBJS) > g++ -o $@ $(OBJS) $(LIBS) $(LDFLAGS) > > ********************************************************************************************** > Do you have any idea? Thanks a lot. > > Regards, > > Xiangyang > > On Thu, Aug 20, 2015 at 2:23 PM, James Molloy <james at jamesmolloy.co.uk> > wrote: > >> Hi Xiangyang, >> >> The algorithm for loop unrolling was changed post-3.5 to do more what >> you'd expect. If you use 3.6 or 3.7 you'll likely get better results. >> >> Cheers, >> >> James >> >> On Thu, 20 Aug 2015 at 18:09 Philip Reames via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >>> On 08/20/2015 07:38 AM, Xiangyang Guo via llvm-dev wrote: >>> >>> Hi, >>> >>> I want to use loop unrolling pass, however, I find that loop unrolling >>> will introduces conditional branch at end of every "unrolled" part. For >>> example, consider the following code >>> >>> *void foo( int n, int array_x[])* >>> *{* >>> * for (int i=0; i < n; i++)* >>> * array_x[i] = i; * >>> *}* >>> >>> Then I use this command "opt-3.5 try.bc -mem2reg -loops -loop-simplify >>> -loop-rotate -lcssa -indvars -loop-unroll -unroll-count=3 -simplifycfg -S", >>> it gives me this IR: >>> >>> *define void @_Z3fooiPi(i32 %n, i32* %array_x) #0 {* >>> * %1 = icmp slt i32 0, %n* >>> * br i1 %1, label %.lr.ph <http://lr.ph/>, label %._crit_edge* >>> >>> *.lr.ph <http://lr.ph/>: ; >>> preds = %0, %7* >>> * %indvars.iv = phi i64 [ %indvars.iv.next.2, %7 ], [ 0, %0 ]* >>> * %2 = getelementptr inbounds i32* %array_x, i64 %indvars.iv* >>> * %3 = trunc i64 %indvars.iv to i32* >>> * store i32 %3, i32* %2* >>> * %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1* >>> * %lftr.wideiv = trunc i64 %indvars.iv.next to i32* >>> * %exitcond = icmp ne i32 %lftr.wideiv, %n* >>> * br i1 %exitcond, label %4, label %._crit_edge* >>> >>> *._crit_edge: ; preds = %.lr.ph >>> <http://lr.ph/>, %4, %7, %0* >>> * ret void* >>> >>> *; <label>:4 ; preds = %.lr.ph >>> <http://lr.ph/>* >>> * %5 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next* >>> * %6 = trunc i64 %indvars.iv.next to i32* >>> * store i32 %6, i32* %5* >>> * %indvars.iv.next.1 = add nuw nsw i64 %indvars.iv.next, 1* >>> * %lftr.wideiv.1 = trunc i64 %indvars.iv.next.1 to i32* >>> * %exitcond.1 = icmp ne i32 %lftr.wideiv.1, %n* >>> * br i1 %exitcond.1, label %7, label %._crit_edge* >>> >>> *; <label>:7 ; preds = %4* >>> * %8 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next.1* >>> * %9 = trunc i64 %indvars.iv.next.1 to i32* >>> * store i32 %9, i32* %8* >>> * %indvars.iv.next.2 = add nuw nsw i64 %indvars.iv.next.1, 1* >>> * %lftr.wideiv.2 = trunc i64 %indvars.iv.next.2 to i32* >>> * %exitcond.2 = icmp ne i32 %lftr.wideiv.2, %n* >>> * br i1 %exitcond.2, label %.lr.ph <http://lr.ph/>, label %._crit_edge* >>> *}* >>> >>> As you can see, at the end of BB <label>4 and BB<label>7 there are >>> "add", "icmp" and "br" instrcutions to check the boundary. I understand >>> this is for the correctness. However, I would expect the loop unrolling can >>> change my code to something like this: >>> >>> *void foo( int n, int array_x[])* >>> *{* >>> * int j = n%3;* >>> * int m = n - j;* >>> * for (int i=0; i < m; i+=3){* >>> * array_x[i] = i;* >>> * array_x[i+1] = i+1;* >>> * array_x[i+2] = i+2; * >>> * }* >>> * for(i=m; i<n; i++)* >>> * array_x[i] = i; * >>> *}* >>> >>> In this case, the BB<label>4 and BB<label>7 will do not have the "add", >>> "icmp" and "br" instructions because these BBs can be merged together. >>> >>> How can I achieve this? Thanks. >>> >>> One - rather heavy weight - way to do this would be to add the -irce >>> pass after the loop unroll step. InductiveRangeCheckElimination will >>> introduce a post loop so as to eliminate the range checks in the inner >>> loop. This might not be the ideal transformation for this code, but it >>> might get you closer to what you want. >>> >>> A couple of caveats: >>> - 3.5 isn't recent enough to have a stable IRCE. Download ToT. >>> - IRCE requires profiling information on the branches. I'd start by >>> manually annotating your IR to see if it works, then exploring a profile >>> build if it does. >>> >>> For the record, teaching the unroller to do this transformation (or a >>> creating a new pass) would seem interesting. You might check with Chandler >>> and/or Michael (see recent review threads) for what their plans in this >>> area are. >>> >>> >>> Regards, >>> >>> Xiangyang >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing listllvm-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >> > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150822/b1264e58/attachment.html>