Hi Zhoulai,

I am trying to modify "LoopUnrollPass" in LLVM, which produces multiple copies of the loop equal to the loop unroll factor. Currently, on a multicore architecture with, say, 3 cores and a loop of 9 iterations, execution goes like this:

core    instruction
1       0,3,6
2       1,4,7
3       2,5,8

But I want to modify it so that it executes in the following way:

core    instruction
1       0,1,2
2       3,4,5
3       6,7,8

I cannot find where to make this change. I tried creating a sample pass from the original LoopUnrollPass code and ran "make", and I received the following error:

loopunrollp.cpp:210:1: error: ‘void llvm::initializeLoopUnrollpPass(llvm::PassRegistry&)’ should have been declared inside ‘llvm’
/bin/rm: cannot remove `/home/yaduveer/RP/LLVM/llvm/lib/Transforms/loopunrollp/Debug+Asserts/loopunrollp.d.tmp': No such file or directory

Please help.

Thanks,
Yaduveer
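The two iteration-to-core tables in the message above correspond to two index mappings, often called cyclic (round-robin) and block distribution. The following is a minimal C sketch of just those mappings; the function names are hypothetical, cores are numbered from 1 to match the tables, and nothing here reflects how LoopUnrollPass itself is implemented.

```c
#include <stdio.h>

/* Cyclic (round-robin) mapping: iteration i goes to core (i % cores) + 1.
   With 3 cores and 9 iterations, core 1 gets 0,3,6. */
int cyclic_owner(int i, int cores) {
    return i % cores + 1;
}

/* Block mapping: each core gets one run of consecutive iterations.
   The chunk size is ceil(n / cores), so core 1 gets 0,1,2 for n = 9. */
int block_owner(int i, int n, int cores) {
    int chunk = (n + cores - 1) / cores;
    return i / chunk + 1;
}

/* Print both mappings side by side for n iterations. */
void print_mappings(int n, int cores) {
    printf("iteration  cyclic  block\n");
    for (int i = 0; i < n; i++) {
        printf("%9d  %6d  %5d\n", i, cyclic_owner(i, cores),
               block_owner(i, n, cores));
    }
}
```

Calling print_mappings(9, 3) lists, for each iteration, the core it would land on under each scheme.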
Hi Yaduveer,

As far as I remember, the unroller in the LoopVectorizer pass does what you want to achieve (look for the message "LV: Trying to at least unroll the loops." to locate this in the code).

Michael

> On May 2, 2015, at 9:00 AM, yaduveer singh <yaduveer99 at gmail.com> wrote:
>
> Hi Zhoulai,
> [...]
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Hi Michael,

Thank you very much! I will try this.

Regards,
Yaduveer

On Sun, May 3, 2015 at 12:11 AM, Michael Zolotukhin <mzolotukhin at apple.com> wrote:
> Hi Yaduveer,
> [...]
Hi Yaduveer,

The vectorizer probably fails because it expects a loop in a certain form, and to convert a loop to this form one needs to run some other passes first. For example, when you run "opt -O3", the following passes are invoked:

-targetlibinfo -tti -no-aa -tbaa -scoped-noalias
-assumption-cache-tracker -basicaa -ipsccp -globalopt -deadargelim -domtree
-instcombine -simplifycfg -basiccg -prune-eh -inline-cost -inline
-functionattrs -argpromotion -sroa -domtree -early-cse -lazy-value-info
-jump-threading -correlated-propagation -simplifycfg -domtree -instcombine
-tailcallelim -simplifycfg -reassociate -domtree -loops -loop-simplify
-lcssa -loop-rotate -licm -loop-unswitch -instcombine -scalar-evolution
-loop-simplify -lcssa -indvars -loop-idiom -loop-deletion -loop-unroll
-memdep -mldst-motion -domtree -memdep -gvn -memdep -memcpyopt -sccp
-domtree -bdce -instcombine -lazy-value-info -jump-threading
-correlated-propagation -domtree -memdep -dse -loops -loop-simplify -lcssa
-licm -adce -simplifycfg -domtree -instcombine -barrier -float2int -domtree
-loops -loop-simplify -lcssa -loop-rotate -branch-prob -block-freq
-scalar-evolution -loop-accesses -loop-vectorize -instcombine
-scalar-evolution -slp-vectorizer -simplifycfg -domtree -instcombine -loops
-loop-simplify -lcssa -scalar-evolution -loop-unroll -instsimplify
-loop-simplify -lcssa -licm -scalar-evolution -alignment-from-assumptions
-strip-dead-prototypes -globaldce -constmerge -verify

To get this list, you can use the following command:

llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments

Now that you have the list of passes that run before the vectorizer, you need to get 'unoptimized' IR and run those passes on it. That should give you the IR just before the vectorizer.

To get the unoptimized IR, you could use

clang -O3 -mllvm -disable-llvm-optzns -emit-llvm your_source.c -S -o unoptimized_ir.ll

(Please note that we use "-O3 -mllvm -disable-llvm-optzns", not just "-O0" - that allows us to run analysis passes, but not transformations)

Now you run 'opt' with the passes preceding the vectorizer to get the IR before vectorization:

opt -targetlibinfo -tti -no-aa -tbaa …… -scalar-evolution -loop-accesses unoptimized_ir.ll -S -o ir_before_loop_vectorize.ll

(you might want to remove verifier passes from the list)

And after this you are ready to run the vectorizer:

opt -loop-vectorize ir_before_loop_vectorize.ll -S -o ir_after_loop_vectorize.ll

Hopefully, that'll resolve the issues you are facing.

Thanks,
Michael

> On May 3, 2015, at 9:18 AM, yaduveer singh <yaduveer99 at gmail.com> wrote:
>
> Hi Michael,
>
> I tried running my sample C program using "LoopVectorizePass" but I was not able to get the output I was expecting. Every time I got the message
> "LV: Not vectorizing: Cannot prove legality."
>
> Following is the scenario.
>
> C code:
>
> #include <stdio.h>
> int main()
> {
>     int i;
>     int a=2;
>     int sum=0;
>     int arr[400];
>
>     for(i=0;i<400;i=i+1)
>     {
>         arr[i]=a+i;
>         sum+=arr[i];
>     }
>     printf("Everything is Done for 1d\n");
>     return 0;
> }
>
> Following are my commands:
>
> yaduveer at yaduveer-Inspiron-3542:~/RP$ clang -S -emit-llvm loop1d.c
> yaduveer at yaduveer-Inspiron-3542:~/RP$ clang -c -emit-llvm loop1d.c
> yaduveer at yaduveer-Inspiron-3542:~/RP$ opt -loop-vectorize -force-vector-width=4 -mem2reg -loop-rotate -indvars -debug -stats loop1d.ll | llvm-dis -o loop1dv1.ll
>
> We found the following message on the terminal (attached are the details found on the terminal and the 2 .ll files):
>
> LV: Checking a loop in "main" from loop1d.ll
> LV: Loop hints: force=? width=4 unroll=0
> LV: Not vectorizing: Cannot prove legality.
>
> So can you please let me know if I am following the correct steps? If not, please guide me.
>
> Thanks in advance.
>
> Regards,
> Yaduveer
>
> <messageOnCommandLine.txt><loop1d.c><loop1d.ll><loop1dv1.ll>
The optimization passes that run before the LoopVectorizer should be able to combine the two statements (this should already happen at O1; please check)

arr[i] = a + i
sum += arr[i]

into

sum += a + i

I am not sure why you are using the array there.

- Suyog

On 4 May 2015 23:11, "Michael Zolotukhin" <mzolotukhin at apple.com> wrote:
> Hi Yaduveer,
> [...]
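The rewrite Suyog describes can be checked by hand. Below is a small C sketch, assuming the same loop bounds as the sample program (400 iterations); the function names are hypothetical. It only shows that the two source forms compute the same sum, and makes no claim about which LLVM pass performs the combination.

```c
#include <stdio.h>

/* Original form: store a + i into arr[i], then read it back for the sum. */
int sum_with_array(int a) {
    int arr[400];
    int sum = 0;
    for (int i = 0; i < 400; i = i + 1) {
        arr[i] = a + i;
        sum += arr[i];
    }
    return sum;
}

/* Combined form: the value just stored is forwarded to the addition,
   so the array is no longer needed to compute the sum. */
int sum_without_array(int a) {
    int sum = 0;
    for (int i = 0; i < 400; i = i + 1) {
        sum += a + i;
    }
    return sum;
}
```

For a = 2 both functions return 400*2 + (399*400)/2 = 80600.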
Hi Yaduveer,

I may be missing something, but it seems you're trying to get different cores running parts of the loop, which you'll get for free if you use OpenMP.

The loop unroller is meant to increase load/store speed by loading a lot of values, then operating on all of them, then writing them back all together. Even if not vectorized (SIMD, not threads), it still has some performance gains.

Vectorization is also only about the SIMD engine in a single core (doing 2/4/8 operations at the same time); it has nothing to do with using multiple cores.

Before you jump head first into the source, you need to ask yourself the right question: what do you want to do?

1) Use all cores, dividing the loop across multiple cores, one block at a time. Use OpenMP for this.
2) Use your SIMD engine on each core. Use the loop vectorizer for this.
3) Or is it just about load/store speed-ups? The loop unroller will help you here.

You can also use all three at the same time, having all cores running their SIMD engines with a massively unrolled loop.

cheers,
--renato

On 2 May 2015 at 17:00, yaduveer singh <yaduveer99 at gmail.com> wrote:
> Hi Zhoulai,
> [...]
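Option 1 in Renato's list (dividing the loop across cores, one block at a time) can be sketched in C with OpenMP. This is an illustrative sketch, not something posted in the thread: the function name and the owner array are hypothetical, and the team size of 3 is taken from the example in the original question. With schedule(static) and no chunk size, OpenMP divides the iterations into roughly equal contiguous chunks, at most one per thread, and implementations typically hand them out in thread order, which gives the 0,1,2 / 3,4,5 / 6,7,8 layout. Using schedule(static, 1) instead would give the round-robin layout.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sums a + i for i in [0, n) and records in owner[i] which thread ran
   iteration i. Built without OpenMP support, the pragma is ignored and
   every iteration is recorded as thread 0. */
int sum_block_partitioned(int a, int n, int *owner) {
    int sum = 0;
    /* schedule(static) with no chunk size: one contiguous block per thread. */
    #pragma omp parallel for schedule(static) num_threads(3) reduction(+:sum)
    for (int i = 0; i < n; i++) {
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();
#else
        owner[i] = 0;
#endif
        sum += a + i;
    }
    return sum;
}
```

Build with OpenMP enabled (for example, -fopenmp with clang or gcc) to see more than one thread; the reduction clause keeps the sum correct regardless of how many threads actually run.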