thr3ads.net - llvm dev - [LLVMdev] Modifying LoopUnrollingPass [May 2015]

yaduveer singh

2015-May-02 16:00 UTC

[LLVMdev] Modifying LoopUnrollingPass

Hi Zhoulai,

I am trying to modify the "LoopUnrollPass" in LLVM, which produces multiple
copies of the loop body, equal in number to the loop unroll factor. Currently,
on a multicore architecture with, say, 3 cores, the execution goes like this:

For 3 cores and a loop of 9 iterations:
core          iterations
1             0,3,6
2             1,4,7
3             2,5,8

But I want to modify it so that it executes in the following way:

core          iterations
1             0,1,2
2             3,4,5
3             6,7,8
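
The two schedules in the tables above can be sketched as plain C index
arithmetic. This is only an illustration under stated assumptions: the helper
names cyclic_owner and block_owner are hypothetical, they are not part of
LLVM, and they number cores from 0 rather than from 1.

```c
/* Sketch only: maps a loop iteration to the core that would run it.
 * Cores are numbered from 0 here, while the tables above count from 1. */

/* Cyclic (round-robin) mapping: iterations 0,3,6 go to core 0,
 * iterations 1,4,7 to core 1, and so on. */
int cyclic_owner(int iter, int cores)
{
    return iter % cores;
}

/* Block mapping: iterations 0,1,2 go to core 0, iterations 3,4,5 to
 * core 1, and so on. Each core gets one contiguous chunk of
 * ceil(n / cores) iterations. */
int block_owner(int iter, int n, int cores)
{
    int chunk = (n + cores - 1) / cores;  /* ceiling division */
    return iter / chunk;
}
```

For 9 iterations on 3 cores, cyclic_owner(4, 3) gives core index 1 (the
second core), while block_owner(4, 9, 3) also gives index 1 but iteration 3
moves from the first core to the second.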

I have not been able to find where to modify the code for this. I tried
creating a sample pass from the original LoopUnrollPass code, and when I ran
"make" I received the following error:

loopunrollp.cpp:210:1: error: ‘void llvm::initializeLoopUnrollpPass(llvm::PassRegistry&)’ should have been declared inside ‘llvm’
/bin/rm: cannot remove `/home/yaduveer/RP/LLVM/llvm/lib/Transforms/loopunrollp/Debug+Asserts/loopunrollp.d.tmp': No such file or directory

Please help

Thanks,
Yaduveer

Michael Zolotukhin

2015-May-02 18:41 UTC

Hi Yaduveer,

As far as I remember, the unroller in the LoopVectorizer pass does what you
want to achieve (look for the message "LV: Trying to at least unroll the
loops." to locate this in the code).

Michael

yaduveer singh

2015-May-02 19:02 UTC

Hi Michael,

Thank you very much!
I will try this.

Regards,
Yaduveer


Michael Zolotukhin

2015-May-04 17:36 UTC

Hi Yaduveer,

The vectorizer probably fails because it expects the loop to be in a certain
form, and to convert a loop to this form one needs to run some other passes
first. For example, when you run “opt -O3”, the following passes are invoked:
-targetlibinfo -tti -no-aa -tbaa -scoped-noalias -assumption-cache-tracker
-basicaa -ipsccp -globalopt -deadargelim -domtree -instcombine -simplifycfg
-basiccg -prune-eh -inline-cost -inline -functionattrs -argpromotion -sroa
-domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation
-simplifycfg -domtree -instcombine -tailcallelim -simplifycfg -reassociate
-domtree -loops -loop-simplify -lcssa -loop-rotate -licm -loop-unswitch
-instcombine -scalar-evolution -loop-simplify -lcssa -indvars -loop-idiom
-loop-deletion -loop-unroll -memdep -mldst-motion -domtree -memdep -gvn -memdep
-memcpyopt -sccp -domtree -bdce -instcombine -lazy-value-info -jump-threading
-correlated-propagation -domtree -memdep -dse -loops -loop-simplify -lcssa -licm
-adce -simplifycfg -domtree -instcombine -barrier -float2int -domtree -loops
-loop-simplify -lcssa -loop-rotate -branch-prob -block-freq -scalar-evolution
-loop-accesses -loop-vectorize -instcombine -scalar-evolution -slp-vectorizer
-simplifycfg -domtree -instcombine -loops -loop-simplify -lcssa
-scalar-evolution -loop-unroll -instsimplify -loop-simplify -lcssa -licm
-scalar-evolution -alignment-from-assumptions -strip-dead-prototypes -globaldce
-constmerge -verify

To get this list, you can use the following command:
llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments

Once you have the list of passes that run before the vectorizer, you need to
get ‘unoptimized’ IR and run those passes on it. That should give you the IR
just before the vectorizer.

To get the unoptimized IR, you could use
clang -O3 -mllvm -disable-llvm-optzns -emit-llvm your_source.c -S -o unoptimized_ir.ll
(Please note that we use “-O3 -mllvm -disable-llvm-optzns”, not just “-O0”;
that allows us to run analysis passes, but not transformations.)

Now you run ‘opt’ with the passes preceding the vectorizer to get the IR
before vectorization:
opt -targetlibinfo -tti -no-aa -tbaa …… -scalar-evolution -loop-accesses unoptimized_ir.ll -S -o ir_before_loop_vectorize.ll
(You might want to remove the verifier passes from the list.)

And after this you are ready to run the vectorizer:
opt -loop-vectorize ir_before_loop_vectorize.ll -S -o ir_after_loop_vectorize.ll

Hopefully, that’ll resolve the issues you are facing.

Thanks,
Michael

> On May 3, 2015, at 9:18 AM, yaduveer singh <yaduveer99 at gmail.com> wrote:
> 
> Hi Michael,
> 
> I tried running my sample C program using "LoopVectorizePass", but I was
> not able to get the output I was expecting. Every time, I got the message
> "LV: Not vectorizing: Cannot prove legality."
> 
> The scenario is as follows.
> 
> C code:
> 
> #include <stdio.h>
> int main()
> {
> 	int i;
> 	int a=2;
> 	int sum=0;
> 	int arr[400];
> 
> 	for(i=0;i<400;i=i+1)
> 	{
> 		arr[i]=a+i;
> 		sum+=arr[i];       
> 	}
> 	printf("Everything is Done for 1d\n");
> 	return 0;
> }
> 
> I ran the following commands:
> 
> yaduveer@yaduveer-Inspiron-3542:~/RP$ clang -S -emit-llvm loop1d.c
> yaduveer@yaduveer-Inspiron-3542:~/RP$ clang -c -emit-llvm loop1d.c
> yaduveer@yaduveer-Inspiron-3542:~/RP$ opt -loop-vectorize -force-vector-width=4 -mem2reg -loop-rotate -indvars -debug -stats loop1d.ll | llvm-dis -o loop1dv1.ll
> 
> We got the following message on the terminal (attached are the full
> terminal output and the two .ll files):
> 
> LV: Checking a loop in "main" from loop1d.ll
> LV: Loop hints: force=? width=4 unroll=0
> LV: Not vectorizing: Cannot prove legality.
> 
> So can you please let me know whether I am following the correct steps?
> If not, please guide me.
> 
> 
> Thanks in advance.
> 
> 
> Regards,
> Yaduveer
> 

suyog sarda

2015-May-04 17:55 UTC

Optimization passes that run before the LoopVectorizer should be able to
combine the two statements (this should already happen at -O1; please check)

arr[i] = a + i
sum += arr[i]

into

sum += a + i

I am not sure why you are using the array there.
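
As a rough illustration of that rewrite, the two forms can be written side by
side in C. This is a hand-written sketch, not compiler output; the function
names sum_via_array and sum_direct are hypothetical, and the array size of
400 is taken from the program posted in this thread.

```c
/* Original form from the thread: store into the array, then read back. */
int sum_via_array(int a, int n)
{
    int arr[400];   /* same size as in the posted program; needs n <= 400 */
    int sum = 0;
    for (int i = 0; i < n; i = i + 1) {
        arr[i] = a + i;
        sum += arr[i];
    }
    return sum;
}

/* Combined form: the value is added directly, so no array is needed. */
int sum_direct(int a, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i = i + 1)
        sum += a + i;
    return sum;
}
```

Both functions return the same value for the same arguments, which is why
the store to arr[i] can be dropped when nothing else reads the array.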

- Suyog

Renato Golin

2015-May-05 18:31 UTC

Hi Yaduveer,

I may be missing something, but it seems you're trying to get
different cores running parts of the loop, which you'll get for free
if you use OpenMP.

The loop unroller is meant to increase load/store speed by loading a
lot of values, then operating on all of them, then writing them back
all together. Even when the loop is not vectorized (SIMD, not threads),
unrolling still gives some performance gains. Vectorization is also only
about the SIMD engines in a single core (doing 2/4/8 operations at the
same time); it has nothing to do with using multiple cores.
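
The load, operate, write-back pattern can be sketched by unrolling a loop by
hand. This is only an illustration of the idea and not what the LoopUnroll
pass actually emits; the function name add_const_unrolled4 and the unroll
factor of 4 are assumptions made for the example.

```c
/* Adds k to every element of src and stores the result in dst.
 * The main loop is unrolled by 4; a scalar loop handles the leftovers. */
void add_const_unrolled4(int *dst, const int *src, int n, int k)
{
    int i = 0;
    for (; i + 3 < n; i += 4) {
        /* load four values */
        int a = src[i], b = src[i + 1], c = src[i + 2], d = src[i + 3];
        /* operate on all of them */
        a += k; b += k; c += k; d += k;
        /* write them back together */
        dst[i] = a; dst[i + 1] = b; dst[i + 2] = c; dst[i + 3] = d;
    }
    for (; i < n; i++)   /* remainder when n is not a multiple of 4 */
        dst[i] = src[i] + k;
}
```

Everything still runs on one core; the gain comes from fewer loop branches
and from giving the hardware independent operations to overlap.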

Before you jump head first into the source, you need to ask yourself
the right question: What do you want to do?

1) Use all cores, dividing the loop among multiple cores, one block at
a time. Use OpenMP for this.
2) Use your SIMD engine on each core. Use the loop vectorizer for this.
3) Or is it just about load/store speed ups? The loop unroller will
help you here.

You can also use all three at the same time, having all cores running
their SIMD engines with a massively unrolled loop by using all of the
above.
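
A minimal sketch of option 1, assuming a compiler with OpenMP support (for
example, building with -fopenmp). The function name sum_block_schedule is
hypothetical. With schedule(static) and no chunk size, each thread receives
one contiguous block of iterations, which matches the 0,1,2 / 3,4,5 / 6,7,8
layout asked for at the start of this thread.

```c
/* Sums n integers, splitting the iterations across threads in
 * contiguous blocks. If OpenMP is not enabled at build time, the pragma
 * is ignored and the loop simply runs serially with the same result. */
long sum_block_schedule(const int *v, int n)
{
    long sum = 0;
    /* schedule(static) with no chunk size gives each thread one
     * contiguous block; schedule(static, 1) would instead deal the
     * iterations out round-robin (0,3,6 / 1,4,7 / 2,5,8). */
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```

The compiler remains free to unroll or vectorize the per-thread loop, so
this combines with options 2 and 3 without changes to the source.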

cheers,
--renato

