thr3ads.net - llvm dev - [llvm-dev] loop unrolling introduces conditional branch [Aug 2015]

Xiangyang Guo via llvm-dev

2015-Aug-22 03:27 UTC

[llvm-dev] loop unrolling introduces conditional branch

Hi,

I just tried llvm-3.8 (LLVM SVN repository). With this version, -fno-rtti
lets me compile my code, and -irce helps loop unrolling do a better job.
However, I still have one question. If I use Clang to compile a piece of
C++ code to .bc and then run 'opt -loop-rotate -loop-unroll -irce', I get
what I want: there is no conditional branch at the end of each unrolled
part. However, if I use the LLVM API, such as IRBuilder (CreateAdd,
CreateGEP, CreateLoad and so on), to generate the .bc (I dumped the two .bc
files and they look almost the same except for the variable names) and then
run 'opt -loop-rotate -loop-unroll -irce', I cannot get what I want: there
is still loop boundary checking (add, compare, conditional branch) at the
end of each unrolled part.

I'm really confused about this. Does Clang do something special? Or do I
need to do something else to eliminate the unnecessary loop boundary
checking at the end of each unrolled part?

Thanks for your help.

Xiangyang


On Fri, Aug 21, 2015 at 11:29 AM, Xiangyang Guo <xguo6 at ncsu.edu> wrote:
> Hi, Jeremy,
>
> Thanks for your reply. I tried -fno-rtti yesterday, with no luck.
>
> Regards,
>
> Xiangyang
>
> On Fri, Aug 21, 2015 at 11:05 AM, Jeremy Lakeman <Jeremy.Lakeman at gmail.com>
> wrote:
>
>> There's been some recent noise on the mailing list about requiring
>> -fno-rtti;
>> http://lists.llvm.org/pipermail/llvm-dev/2015-August/089010.html
>>
>> Could that be it?
>>
>> On Sat, Aug 22, 2015 at 12:21 AM, Xiangyang Guo via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi, James and Philip, Thanks for your help.
>>>
>>> Based on your advice, I downloaded llvm-3.7. However, with this new
>>> version of LLVM, I get the following errors when I compile my previous
>>> code:
>>>
>>> g++ -o parser main.o `llvm-config --libs all` `llvm-config --ldflags --system-libs` -lpthread -ldl -rdynamic -ltinfo
>>>
>>> main.o:(.data.rel.ro._ZTIN4llvm17GetElementPtrInstE[_ZTIN4llvm17GetElementPtrInstE]+0x10):
>>> undefined reference to `typeinfo for llvm::Instruction'
>>> main.o:(.data.rel.ro._ZTIN4llvm8ICmpInstE[_ZTIN4llvm8ICmpInstE]+0x10):
>>> undefined reference to `typeinfo for llvm::CmpInst'
>>>
>>> BTW, in my code, I use the LLVM API (IRBuilder and so on) to generate one
>>> Module and then use PassManager to add several passes. My Makefile is
>>> pretty simple; it looks like this:
>>>
>>> ***********************************************************************************************
>>> all: parser
>>>
>>> OBJS = main.o    \
>>>
>>> LLVMCONFIG = llvm-config
>>> CPPFLAGS = `$(LLVMCONFIG) --cxxflags` -std=c++11
>>> LDFLAGS = `$(LLVMCONFIG) --ldflags --system-libs` -lpthread -ldl
>>> -rdynamic -ltinfo
>>> LIBS = `$(LLVMCONFIG) --libs all`
>>>
>>> clean:
>>> 	$(RM) -rf parser $(OBJS)
>>>
>>> %.o: %.cpp
>>> 	g++ -g -c $(CPPFLAGS) -o $@ $<
>>>
>>>
>>> parser: $(OBJS)
>>> 	g++ -o $@ $(OBJS) $(LIBS) $(LDFLAGS)
>>>
>>>
>>> **********************************************************************************************
>>> Do you have any idea? Thanks a lot.
>>>
>>> Regards,
>>>
>>> Xiangyang
>>>
>>> On Thu, Aug 20, 2015 at 2:23 PM, James Molloy <james at jamesmolloy.co.uk>
>>> wrote:
>>>
>>>> Hi Xiangyang,
>>>>
>>>> The algorithm for loop unrolling was changed post-3.5 to do more of what
>>>> you'd expect. If you use 3.6 or 3.7, you'll likely get better results.
>>>>
>>>> Cheers,
>>>>
>>>> James
>>>>
>>>> On Thu, 20 Aug 2015 at 18:09 Philip Reames via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> On 08/20/2015 07:38 AM, Xiangyang Guo via llvm-dev wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I want to use the loop unrolling pass; however, I find that loop
>>>>> unrolling introduces a conditional branch at the end of every "unrolled"
>>>>> part. For example, consider the following code:
>>>>>
>>>>> void foo(int n, int array_x[])
>>>>> {
>>>>>     for (int i = 0; i < n; i++)
>>>>>         array_x[i] = i;
>>>>> }
>>>>>
>>>>> Then I use this command "opt-3.5 try.bc -mem2reg -loops -loop-simplify
>>>>> -loop-rotate -lcssa -indvars -loop-unroll -unroll-count=3 -simplifycfg -S",
>>>>> and it gives me this IR:
>>>>>
>>>>> define void @_Z3fooiPi(i32 %n, i32* %array_x) #0 {
>>>>>   %1 = icmp slt i32 0, %n
>>>>>   br i1 %1, label %.lr.ph, label %._crit_edge
>>>>>
>>>>> .lr.ph:                                           ; preds = %0, %7
>>>>>   %indvars.iv = phi i64 [ %indvars.iv.next.2, %7 ], [ 0, %0 ]
>>>>>   %2 = getelementptr inbounds i32* %array_x, i64 %indvars.iv
>>>>>   %3 = trunc i64 %indvars.iv to i32
>>>>>   store i32 %3, i32* %2
>>>>>   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
>>>>>   %lftr.wideiv = trunc i64 %indvars.iv.next to i32
>>>>>   %exitcond = icmp ne i32 %lftr.wideiv, %n
>>>>>   br i1 %exitcond, label %4, label %._crit_edge
>>>>>
>>>>> ._crit_edge:                                      ; preds = %.lr.ph, %4, %7, %0
>>>>>   ret void
>>>>>
>>>>> ; <label>:4                                       ; preds = %.lr.ph
>>>>>   %5 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next
>>>>>   %6 = trunc i64 %indvars.iv.next to i32
>>>>>   store i32 %6, i32* %5
>>>>>   %indvars.iv.next.1 = add nuw nsw i64 %indvars.iv.next, 1
>>>>>   %lftr.wideiv.1 = trunc i64 %indvars.iv.next.1 to i32
>>>>>   %exitcond.1 = icmp ne i32 %lftr.wideiv.1, %n
>>>>>   br i1 %exitcond.1, label %7, label %._crit_edge
>>>>>
>>>>> ; <label>:7                                       ; preds = %4
>>>>>   %8 = getelementptr inbounds i32* %array_x, i64 %indvars.iv.next.1
>>>>>   %9 = trunc i64 %indvars.iv.next.1 to i32
>>>>>   store i32 %9, i32* %8
>>>>>   %indvars.iv.next.2 = add nuw nsw i64 %indvars.iv.next.1, 1
>>>>>   %lftr.wideiv.2 = trunc i64 %indvars.iv.next.2 to i32
>>>>>   %exitcond.2 = icmp ne i32 %lftr.wideiv.2, %n
>>>>>   br i1 %exitcond.2, label %.lr.ph, label %._crit_edge
>>>>> }
>>>>>
>>>>> As you can see, at the end of BB <label>:4 and BB <label>:7 there are
>>>>> "add", "icmp" and "br" instructions to check the boundary. I understand
>>>>> this is for correctness. However, I would expect loop unrolling to
>>>>> change my code to something like this:
>>>>>
>>>>> void foo(int n, int array_x[])
>>>>> {
>>>>>     int j = n % 3;
>>>>>     int m = n - j;
>>>>>     for (int i = 0; i < m; i += 3) {
>>>>>         array_x[i] = i;
>>>>>         array_x[i+1] = i+1;
>>>>>         array_x[i+2] = i+2;
>>>>>     }
>>>>>     for (int i = m; i < n; i++)
>>>>>         array_x[i] = i;
>>>>> }
>>>>>
>>>>> In this case, BB <label>:4 and BB <label>:7 will not have the "add",
>>>>> "icmp" and "br" instructions, because these BBs can be merged together.
>>>>>
>>>>> How can I achieve this? Thanks.
>>>>>
>>>>> One (rather heavyweight) way to do this would be to add the -irce pass
>>>>> after the loop unroll step. InductiveRangeCheckElimination will
>>>>> introduce a post loop so as to eliminate the range checks in the inner
>>>>> loop. This might not be the ideal transformation for this code, but it
>>>>> might get you closer to what you want.
>>>>>
>>>>> A couple of caveats:
>>>>> - 3.5 isn't recent enough to have a stable IRCE. Download ToT.
>>>>> - IRCE requires profiling information on the branches. I'd start by
>>>>>   manually annotating your IR to see if it works, then exploring a
>>>>>   profile build if it does.
>>>>>
>>>>> For the record, teaching the unroller to do this transformation (or
>>>>> creating a new pass) would seem interesting. You might check with
>>>>> Chandler and/or Michael (see recent review threads) for what their plans
>>>>> in this area are.
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Xiangyang
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>
>>
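Philip's suggestion (quoted above) relies on IRCE splitting a loop so that a main loop runs with no range checks and a post loop keeps them for the leftover iterations. The C++ below is a hand-written sketch of that splitting idea only, not IRCE's actual implementation; the function names and the failure counter (standing in for a real bounds-check failure path) are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Original shape: every iteration re-checks the index against len.
// Out-of-range iterations are counted instead of trapping
// (hypothetical stand-in for a bounds-check failure path).
int fill_checked(std::vector<int>& out, int n) {
    const int len = static_cast<int>(out.size());
    int failures = 0;
    for (int i = 0; i < n; ++i) {
        if (i >= 0 && i < len) out[i] = i;
        else ++failures;
    }
    return failures;
}

// Split shape: a main loop over the iterations whose check is known to pass,
// followed by a post loop that keeps the check for whatever is left.
int fill_split(std::vector<int>& out, int n) {
    const int len = static_cast<int>(out.size());
    int failures = 0;
    const int safe_end = std::max(0, std::min(n, len));  // [0, safe_end) always in range
    int i = 0;
    for (; i < safe_end; ++i)        // main loop: range check removed
        out[i] = i;
    for (; i < n; ++i) {             // post loop: range check retained
        if (i >= 0 && i < len) out[i] = i;
        else ++failures;
    }
    return failures;
}
```

The split is valid only because every index in `[0, min(n, len))` is known to pass the check, so the main loop can drop it without changing behavior.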
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150821/e247b4f1/attachment.html>

Mehdi Amini via llvm-dev

2015-Aug-22 07:27 UTC

[llvm-dev] loop unrolling introduces conditional branch

Can you post the two IR online?

— 
Mehdi
> On Aug 21, 2015, at 8:27 PM, Xiangyang Guo via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Xiangyang Guo via llvm-dev

2015-Aug-22 14:27 UTC

[llvm-dev] loop unrolling introduces conditional branch

Hi, Mehdi,

For example, I have this very simple source code:
void foo( int n, int array_x[])
{
    for (int i=0; i < n; i++)
   array_x[i] = i;
}

After I run "clang -emit-llvm -o bc_from_clang.bc -c try.cc", I get
bc_from_clang.bc. With my code (using the LLVM IRBuilder API), I get
bc_from_api.bc. Please find these two files attached. I also paste the IR
here.
******************************** Clang Generate IR Start ***********************************************************
; ModuleID = 'bc_from_clang.bc'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: nounwind uwtable
define void @_Z3fooiPi(i32 %n, i32* %array_x) #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32*, align 8
  %i = alloca i32, align 4
  store i32 %n, i32* %1, align 4
  store i32* %array_x, i32** %2, align 8
  store i32 0, i32* %i, align 4
  br label %3

; <label>:3                                       ; preds = %13, %0
  %4 = load i32, i32* %i, align 4
  %5 = load i32, i32* %1, align 4
  %6 = icmp slt i32 %4, %5
  br i1 %6, label %7, label %16

; <label>:7                                       ; preds = %3
  %8 = load i32, i32* %i, align 4
  %9 = load i32, i32* %i, align 4
  %10 = sext i32 %9 to i64
  %11 = load i32*, i32** %2, align 8
  %12 = getelementptr inbounds i32, i32* %11, i64 %10
  store i32 %8, i32* %12, align 4
  br label %13

; <label>:13                                      ; preds = %7
  %14 = load i32, i32* %i, align 4
  %15 = add nsw i32 %14, 1
  store i32 %15, i32* %i, align 4
  br label %3

; <label>:16                                      ; preds = %3
  ret void
}

attributes #0 = { nounwind uwtable
"disable-tail-calls"="false"
"less-precise-fpmad"="false"
"no-frame-pointer-elim"="true"
"no-frame-pointer-elim-non-leaf"
"no-infs-fp-math"="false"
"no-nans-fp-math"="false"
"stack-protector-buffer-size"="8"
"target-cpu"="x86-64"
"target-features"="+sse,+sse2"
"unsafe-fp-math"="false"
"use-soft-float"="false" }

!llvm.ident = !{!0}

!0 = !{!"clang version 3.8.0 (trunk 245730) (llvm/trunk 245727)"}

******************************** Clang Generate IR End ***********************************************************

******************************** API Generate IR Start ***********************************************************
; ModuleID = 'bc_from_api.bc'
target triple = "x86_64-unkown-linux-gnu"

; Function Attrs: nounwind
define void @_Z3fooiPi(i32 %n, i32* %array_x) #0 {
entry:
  %n.addr = alloca i32, align 4
  %array_x.addr = alloca i32*, align 8
  %i = alloca i32, align 4
  store i32 %n, i32* %n.addr, align 4
  store i32* %array_x, i32** %array_x.addr, align 8
  store i32 0, i32* %i, align 4
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %entry
  %0 = load i32, i32* %i, align 4
  %1 = load i32, i32* %n.addr, align 4
  %cmp = icmp slt i32 %0, %1
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %2 = load i32, i32* %i, align 4
  %3 = load i32, i32* %i, align 4
  %idxprom = sext i32 %3 to i64
  %4 = load i32*, i32** %array_x.addr, align 8
  %arrayidx = getelementptr inbounds i32, i32* %4, i64 %idxprom
  store i32 %2, i32* %arrayidx, align 4
  br label %for.inc

for.inc:                                          ; preds = %for.body
  %5 = load i32, i32* %i, align 4
  %inc = add i32 %5, 1
  store i32 %inc, i32* %i, align 4
  br label %for.cond

for.end:                                          ; preds = %for.cond
  ret void
}

attributes #0 = { nounwind }

******************************** API Generate IR End ***********************************************************

Then I run "opt file.bc -mem2reg -loops -loop-simplify -loop-rotate -lcssa
-indvars -loop-unroll -unroll-count=4 -irce -simplifycfg -S" on both .bc
files.
The first .bc file gives me this:

***************************** Clang Generate IR with LoopUnrolling Start **********************************************
; ModuleID = 'bc_from_clang.bc'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: nounwind uwtable
define void @_Z3fooiPi(i32 %n, i32* %array_x) #0 {
  %1 = icmp slt i32 0, %n
  br i1 %1, label %.lr.ph, label %._crit_edge

.lr.ph:                                           ; preds = %0
  %2 = add i32 %n, -1
  %xtraiter = and i32 %n, 3
  %lcmp.mod = icmp ne i32 %xtraiter, 0
  br i1 %lcmp.mod, label %3, label %.lr.ph.split

; <label>:3                                       ; preds = %3, %.lr.ph
  %indvars.iv.prol = phi i64 [ 0, %.lr.ph ], [ %indvars.iv.next.prol, %3 ]
  %prol.iter = phi i32 [ %xtraiter, %.lr.ph ], [ %prol.iter.sub, %3 ]
  %4 = getelementptr inbounds i32, i32* %array_x, i64 %indvars.iv.prol
  %5 = trunc i64 %indvars.iv.prol to i32
  store i32 %5, i32* %4, align 4
  %indvars.iv.next.prol = add nuw nsw i64 %indvars.iv.prol, 1
  %lftr.wideiv.prol = trunc i64 %indvars.iv.next.prol to i32
  %exitcond.prol = icmp ne i32 %lftr.wideiv.prol, %n
  %prol.iter.sub = sub i32 %prol.iter, 1
  %prol.iter.cmp = icmp ne i32 %prol.iter.sub, 0
  br i1 %prol.iter.cmp, label %3, label %.lr.ph.split, !llvm.loop !1

.lr.ph.split:                                     ; preds = %3, %.lr.ph
  %indvars.iv.unr = phi i64 [ 0, %.lr.ph ], [ %indvars.iv.next.prol, %3 ]
  %6 = icmp ult i32 %2, 3
  br i1 %6, label %._crit_edge, label %.lr.ph.split.split

.lr.ph.split.split:                               ; preds = %.lr.ph.split, %.lr.ph.split.split
  %indvars.iv = phi i64 [ %indvars.iv.next.3, %.lr.ph.split.split ], [ %indvars.iv.unr, %.lr.ph.split ]
  %7 = getelementptr inbounds i32, i32* %array_x, i64 %indvars.iv
  %8 = trunc i64 %indvars.iv to i32
  store i32 %8, i32* %7, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
  %9 = getelementptr inbounds i32, i32* %array_x, i64 %indvars.iv.next
  %10 = trunc i64 %indvars.iv.next to i32
  store i32 %10, i32* %9, align 4
  %indvars.iv.next.1 = add nuw nsw i64 %indvars.iv.next, 1
  %lftr.wideiv.1 = trunc i64 %indvars.iv.next.1 to i32
  %11 = getelementptr inbounds i32, i32* %array_x, i64 %indvars.iv.next.1
  %12 = trunc i64 %indvars.iv.next.1 to i32
  store i32 %12, i32* %11, align 4
  %indvars.iv.next.2 = add nuw nsw i64 %indvars.iv.next.1, 1
  %lftr.wideiv.2 = trunc i64 %indvars.iv.next.2 to i32
  %13 = getelementptr inbounds i32, i32* %array_x, i64 %indvars.iv.next.2
  %14 = trunc i64 %indvars.iv.next.2 to i32
  store i32 %14, i32* %13, align 4
  %indvars.iv.next.3 = add nuw nsw i64 %indvars.iv.next.2, 1
  %lftr.wideiv.3 = trunc i64 %indvars.iv.next.3 to i32
  %exitcond.3 = icmp ne i32 %lftr.wideiv.3, %n
  br i1 %exitcond.3, label %.lr.ph.split.split, label %._crit_edge

._crit_edge:                                      ; preds = %.lr.ph.split, %.lr.ph.split.split, %0
  ret void
}

attributes #0 = { nounwind uwtable
"disable-tail-calls"="false"
"less-precise-fpmad"="false"
"no-frame-pointer-elim"="true"
"no-frame-pointer-elim-non-leaf"
"no-infs-fp-math"="false"
"no-nans-fp-math"="false"
"stack-protector-buffer-size"="8"
"target-cpu"="x86-64"
"target-features"="+sse,+sse2"
"unsafe-fp-math"="false"
"use-soft-float"="false" }

!llvm.ident = !{!0}

!0 = !{!"clang version 3.8.0 (trunk 245730) (llvm/trunk 245727)"}
!1 = distinct !{!1, !2}
!2 = !{!"llvm.loop.unroll.disable"}

****************************** Clang Generate IR with LoopUnrolling End ***********************************************
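For comparison, the Clang-path output above is a runtime-unrolled loop: a prologue loop first runs `n & 3` iterations (that is, `n % 4` for positive `n`), and the main body then stores four elements per exit test. The C++ below is a hand translation of that IR, written only as a sketch to make the control flow readable; the function name `foo_runtime_unrolled` is hypothetical, and the comments map each step to the IR value it is assumed to correspond to.

```cpp
#include <cstdint>

// Hand translation of the Clang-path IR (sketch, not compiler output).
// Hypothetical name; same effect as foo(): array_x[i] = i for 0 <= i < n.
void foo_runtime_unrolled(int n, int array_x[]) {
    if (!(0 < n)) return;                        // %1 = icmp slt i32 0, %n
    const int xtraiter = n & 3;                  // %xtraiter = and i32 %n, 3 (n % 4 for n > 0)
    std::int64_t i = 0;                          // 64-bit induction variable, like %indvars.iv

    // Prologue loop (label %3): runs xtraiter iterations.
    for (int left = xtraiter; left != 0; --left, ++i)
        array_x[i] = static_cast<int>(i);

    // %6 = icmp ult i32 %2, 3 with %2 = n - 1: if n < 4, the prologue did everything.
    if (static_cast<unsigned>(n - 1) < 3u) return;

    // Main loop (%.lr.ph.split.split): four stores, then a single exit test.
    do {
        array_x[i]     = static_cast<int>(i);
        array_x[i + 1] = static_cast<int>(i + 1);
        array_x[i + 2] = static_cast<int>(i + 2);
        array_x[i + 3] = static_cast<int>(i + 3);
        i += 4;
    } while (static_cast<int>(i) != n);          // %exitcond.3 compares the truncated IV with n
}
```

Because the remainder is peeled off before the main loop, the count left for the main loop is a multiple of 4, so one `!=` test per four stores is enough.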

The second .bc file gives me this:
****************************** API Generate IR with LoopUnrolling Start *************************************************
; ModuleID = 'bc_from_api.bc'
target triple = "x86_64-unkown-linux-gnu"

; Function Attrs: nounwind
define void @_Z3fooiPi(i32 %n, i32* %array_x) #0 {
entry:
  %cmp.1 = icmp slt i32 0, %n
  br i1 %cmp.1, label %for.body, label %for.end

for.body:                                         ; preds = %entry, %for.body.3
  %i.02 = phi i32 [ %inc.3, %for.body.3 ], [ 0, %entry ]
  %idxprom = sext i32 %i.02 to i64
  %arrayidx = getelementptr inbounds i32, i32* %array_x, i64 %idxprom
  store i32 %i.02, i32* %arrayidx, align 4
  %inc = add nuw nsw i32 %i.02, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body.1, label %for.end

for.end:                                          ; preds = %for.body, %for.body.1, %for.body.2, %for.body.3, %entry
  ret void

for.body.1:                                       ; preds = %for.body
  %idxprom.1 = sext i32 %inc to i64
  %arrayidx.1 = getelementptr inbounds i32, i32* %array_x, i64 %idxprom.1
  store i32 %inc, i32* %arrayidx.1, align 4
  %inc.1 = add nuw nsw i32 %inc, 1
  %cmp.1.3 = icmp slt i32 %inc.1, %n
  br i1 %cmp.1.3, label %for.body.2, label %for.end

for.body.2:                                       ; preds = %for.body.1
  %idxprom.2 = sext i32 %inc.1 to i64
  %arrayidx.2 = getelementptr inbounds i32, i32* %array_x, i64 %idxprom.2
  store i32 %inc.1, i32* %arrayidx.2, align 4
  %inc.2 = add nuw nsw i32 %inc.1, 1
  %cmp.2 = icmp slt i32 %inc.2, %n
  br i1 %cmp.2, label %for.body.3, label %for.end

for.body.3:                                       ; preds = %for.body.2
  %idxprom.3 = sext i32 %inc.2 to i64
  %arrayidx.3 = getelementptr inbounds i32, i32* %array_x, i64 %idxprom.3
  store i32 %inc.2, i32* %arrayidx.3, align 4
  %inc.3 = add nuw nsw i32 %inc.2, 1
  %cmp.3 = icmp slt i32 %inc.3, %n
  br i1 %cmp.3, label %for.body, label %for.end
}

attributes #0 = { nounwind }
****************************** API Generate IR with LoopUnrolling End **************************************************
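The API-path output, by contrast, copies the loop body four times but keeps an `i < n` test and an exit edge to `%for.end` after every copy, which is the per-copy boundary checking described earlier in the thread. A hand translation of that IR, again only a sketch with a hypothetical function name `foo_unrolled_with_exits`:

```cpp
// Hand translation of the API-path IR (sketch, not compiler output).
// Hypothetical name; same effect as foo(): array_x[i] = i for 0 <= i < n.
void foo_unrolled_with_exits(int n, int array_x[]) {
    if (!(0 < n)) return;               // %cmp.1 in %entry
    int i = 0;                          // %i.02 stays a 32-bit induction variable
    for (;;) {
        array_x[i] = i; ++i;            // %for.body
        if (!(i < n)) return;           // %cmp -> %for.end
        array_x[i] = i; ++i;            // %for.body.1
        if (!(i < n)) return;           // %cmp.1.3 -> %for.end
        array_x[i] = i; ++i;            // %for.body.2
        if (!(i < n)) return;           // %cmp.2 -> %for.end
        array_x[i] = i; ++i;            // %for.body.3
        if (!(i < n)) return;           // %cmp.3 -> %for.end, otherwise back to %for.body
    }
}
```

Visible differences between the two inputs, such as the missing `target datalayout` line and the plain `add i32` (no `nsw`) in the API-generated IR, may be worth comparing, but this thread does not establish which of them, if any, prevents runtime unrolling.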

Sorry for posting so much code here. Can you give me any suggestions? Thanks
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bc_from_clang.bc
Type: application/octet-stream
Size: 1260 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150822/fd12e18f/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bc_from_api.bc
Type: application/octet-stream
Size: 788 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150822/fd12e18f/attachment-0001.obj>
