Hi Daniel,

I increased the size of your test to 128, but -stats still shows no loop
optimized...

Xiaochu

On Aug 12, 2016 11:11 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote:

> It's not possible to know that A and B don't alias in this example. It's
> almost certainly not profitable to add a runtime check given the size of
> the loop.
>
> Try:
>
> #define SIZE 8
>
> void bar(int *restrict A, int *restrict B, int K) {
> #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>   for (int i = 0; i < SIZE; ++i)
>     A[i] += B[i] + K;
> }
>
> (I don't remember if llvm also does runtime alias checks, but if it does,
> you'd probably need to increase size to get it to vectorize)
>
> On Fri, Aug 12, 2016 at 11:08 AM, Xiaochu Liu via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi Andrey,
>>
>> Thanks. I found that even when the loop vectorizer and SLP vectorizer are
>> enabled, my simple test still does not get optimized. I also tried a clang
>> pragma in my test to force vectorization. What do you think is the problem?
>>
>> Test:
>>
>> #define SIZE 8
>>
>> void bar(int *A, int *B, int K) {
>> #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>>   for (int i = 0; i < SIZE; ++i)
>>     A[i] += B[i] + K;
>> }
>>
>> Thanks,
>> Xiaochu
>>
>> On Aug 12, 2016 4:06 AM, "Andrey Bokhanko" <andreybokhanko at gmail.com>
>> wrote:
>>
>>> Hi Xiaochu,
>>>
>>> Clang uses -O0 by default, which doesn't run any optimizations. Try
>>> supplying -O1 or higher.
>>>
>>> Yours,
>>> Andrey
>>>
>>> On Fri, Aug 12, 2016 at 1:04 AM, Xiaochu Liu via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I use clang-cl /Qvec test.c to compile the code. But the
>>>> LoopVectorizer pass is never invoked.
>>>>
>>>> I was wondering if this is sufficient to enable the auto vectorizer?
>>>>
>>>> Thanks,
>>>> Xiaochu
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
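To illustrate the aliasing point in the quoted reply: without restrict, A and B may overlap, and a loop that loads two elements before storing either one can then give a different answer than the element-by-element loop. The following is only a sketch under stated assumptions. The names ranges_overlap, bar_scalar, bar_pairs and bar_checked are hypothetical, and the check shows the general idea of a runtime alias test, not the code LLVM actually generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if [a, a+n) and [b, b+n) share an element.
   Addresses are compared as integers so the test also works for pointers
   into unrelated objects. */
int ranges_overlap(const int *a, const int *b, size_t n) {
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(int));
    return pa < pb + bytes && pb < pa + bytes;
}

/* Reference semantics: one element at a time, in order. */
void bar_scalar(int *A, const int *B, int K, size_t n) {
    for (size_t i = 0; i < n; ++i)
        A[i] += B[i] + K;
}

/* Two elements per step: both loads happen before either store, which
   matches bar_scalar only when A and B do not overlap. */
void bar_pairs(int *restrict A, const int *restrict B, int K, size_t n) {
    assert(!ranges_overlap(A, B, n)); /* the promise that restrict makes */
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        int b0 = B[i], b1 = B[i + 1];
        A[i] += b0 + K;
        A[i + 1] += b1 + K;
    }
    for (; i < n; ++i) /* leftover element when n is odd */
        A[i] += B[i] + K;
}

/* Runtime alias check: take the paired path only when it is safe. */
void bar_checked(int *A, const int *B, int K, size_t n) {
    if (ranges_overlap(A, B, n))
        bar_scalar(A, B, K, n);
    else
        bar_pairs(A, B, K, n);
}
```

Calling bar_checked(buf + 1, buf, 0, 4) on a buffer of ones takes the scalar path, because each iteration reads the value the previous one just wrote. For a loop of only 8 iterations, the cost of such a check is the overhead the reply calls unprofitable.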
cat > test.c

#define SIZE 128

void bar(int *restrict A, int *restrict B, int K) {
#pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
  for (int i = 0; i < SIZE; ++i)
    A[i] += B[i] + K;
}

[dannyb at dannyb-macbookpro3 11:37:20] ~ :) $ clang -O3 test.c -c -save-temps
[dannyb at dannyb-macbookpro3 11:38:28] ~ :) $ pcregrep -i "^\s*p" test.s|less
        pushq     %rbp
        pshufd    $68, %xmm0, %xmm0   ## xmm0 = xmm0[0,1,0,1]
        pslldq    $8, %xmm1           ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
        pshufd    $68, %xmm3, %xmm3   ## xmm3 = xmm3[0,1,0,1]
        paddq     %xmm1, %xmm3
        pshufd    $78, %xmm3, %xmm4   ## xmm4 = xmm3[2,3,0,1]
        punpckldq %xmm5, %xmm4        ## xmm4 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
        pshufd    $212, %xmm4, %xmm4  ## xmm4 = xmm4[0,1,1,3]

Note:
It also vectorizes at SIZE=8.

Not sure what the exact translation of options from clang-cl to clang is.
Maybe try adding /O3?

On Fri, Aug 12, 2016 at 11:23 AM, Xiaochu Liu <xiaochu1122 at gmail.com> wrote:

> Hi Daniel,
>
> I increased the size of your test to be 128 but -stats still shows no loop
> optimized...
>
> Xiaochu
I'm not compiling it to x86. Should the loop optimizer be independent of
the target? If so, should the vectorized code show up at the IR level?

On Aug 12, 2016 11:39 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote:

> cat > test.c
>
> #define SIZE 128
>
> void bar(int *restrict A, int *restrict B, int K) {
> #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>   for (int i = 0; i < SIZE; ++i)
>     A[i] += B[i] + K;
> }
>
> [dannyb at dannyb-macbookpro3 11:37:20] ~ :) $ clang -O3 test.c -c -save-temps
> [dannyb at dannyb-macbookpro3 11:38:28] ~ :) $ pcregrep -i "^\s*p" test.s|less
>         pushq     %rbp
>         pshufd    $68, %xmm0, %xmm0   ## xmm0 = xmm0[0,1,0,1]
>         pslldq    $8, %xmm1           ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
>         pshufd    $68, %xmm3, %xmm3   ## xmm3 = xmm3[0,1,0,1]
>         paddq     %xmm1, %xmm3
>         pshufd    $78, %xmm3, %xmm4   ## xmm4 = xmm3[2,3,0,1]
>         punpckldq %xmm5, %xmm4        ## xmm4 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
>         pshufd    $212, %xmm4, %xmm4  ## xmm4 = xmm4[0,1,1,3]
>
> Note:
> It also vectorizes at SIZE=8.
>
> Not sure what the exact translation of options from clang-cl to clang is.
> Maybe try adding /O3?
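One way to look into the question above without reading any backend's assembly is to ask clang for optimization remarks and for the IR itself. The following is only a sketch, assuming the test from this thread is saved as test.c and that the plain clang driver is used rather than clang-cl. The commands in the comment are standard clang options, but what they print depends on the clang version and the target, so no output is shown. The loop vectorizer does run on IR, but it consults a target-specific cost model, so its decisions can still differ between targets.

```c
/*
 * test.c: the loop from the thread, unchanged in behaviour.
 *
 * Ways to inspect the vectorizer's decision:
 *
 *   clang -O3 -c test.c -Rpass=loop-vectorize
 *   clang -O3 -c test.c -Rpass-missed=loop-vectorize
 *   clang -O3 -c test.c -Rpass-analysis=loop-vectorize
 *   clang -O3 -S -emit-llvm test.c -o test.ll
 *
 * In test.ll, vector types such as <2 x i32> in the loop body would
 * indicate that the loop was vectorized at the IR level.
 */
#define SIZE 128

void bar(int *restrict A, int *restrict B, int K) {
#pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
    for (int i = 0; i < SIZE; ++i)
        A[i] += B[i] + K;
}
```

The function computes the same values whether or not it is vectorized, so a small driver that fills A and B and compares against the expected sums can confirm that the build is correct while the remarks are used to confirm what the optimizer did.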