Hi Daniel,

I increased the size of your test to be 128 but -stats still shows no loop
optimized...

Xiaochu

On Aug 12, 2016 11:11 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote:

> It's not possible to know that A and B don't alias in this example. It's
> almost certainly not profitable to add a runtime check given the size of
> the loop.
>
> try
>
> #define SIZE 8
>
> void bar(int *restrict A, int* restrict B,int K) {
> #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>   for (int i = 0; i < SIZE; ++i)
>     A[i] += B[i] + K;
> }
>
> (i don't remember if llvm also does runtime alias checks, but if it does,
> you'd probably need to increase size to get it to vectorize)
>
> On Fri, Aug 12, 2016 at 11:08 AM, Xiaochu Liu via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi Andrey,
>>
>> Thanks. I found even when loop vectorizer and SLP vectorizer are enabled,
>> my simple test still not get optimized. I also tried clang pragma in my
>> test to force vectorization. What do you think is the problem?
>>
>> Test:
>>
>> #define SIZE 8
>>
>> void bar(int *A, int* B,int K) {
>> #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>>   for (int i = 0; i < SIZE; ++i)
>>     A[i] += B[i] + K;
>> }
>>
>> Thanks,
>> Xiaochu
>>
>> On Aug 12, 2016 4:06 AM, "Andrey Bokhanko" <andreybokhanko at gmail.com>
>> wrote:
>>
>>> Hi Xiaochu,
>>>
>>> Clang uses -O0 by default, that doesn't run any optimizations. Try
>>> supplying -O1 or higher.
>>>
>>> Yours,
>>> Andrey
>>>
>>> On Fri, Aug 12, 2016 at 1:04 AM, Xiaochu Liu via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Hi there ,
>>>>
>>>> I use clang-cl /Qvec test.c to compile the code. But the pass
>>>> LoopVectorizer is never invoked.
>>>>
>>>> I was wondering if this is sufficient to enable auto vectorizer?
>>>>
>>>> Thanks,
>>>> Xiaochu
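
The aliasing point in Daniel's quoted reply can be made concrete with a small sketch. It is not from the thread: bar_may_alias, bar_restrict and overlap_demo are hypothetical names, and the overlapping call is only there to show why, without restrict, the compiler has to assume that iteration i may read what iteration i-1 wrote. As far as I know the LLVM loop vectorizer can emit runtime overlap checks for such loops, but whether it considers them worthwhile is a cost-model decision.

```c
#include <assert.h>

#define SIZE 128

/* No restrict: A and B may overlap, so the compiler cannot reorder the
   loads and stores across iterations without first proving (or checking
   at run time) that the two ranges are disjoint. */
void bar_may_alias(int *A, int *B, int K) {
  for (int i = 0; i < SIZE; ++i)
    A[i] += B[i] + K;
}

/* restrict: the caller promises that A and B do not overlap, so no
   alias check is needed before vectorizing. */
void bar_restrict(int *restrict A, int *restrict B, int K) {
  for (int i = 0; i < SIZE; ++i)
    A[i] += B[i] + K;
}

/* Hypothetical helper: calls the may-alias version with B one element
   behind A, so every iteration depends on the previous one
   (buf[i+1] += buf[i]). Starting from all ones this is a running sum,
   and the last element ends up as SIZE + 1. */
int overlap_demo(void) {
  int buf[SIZE + 1];
  for (int i = 0; i <= SIZE; ++i)
    buf[i] = 1;
  bar_may_alias(buf + 1, buf, 0);
  return buf[SIZE];
}

/* Small self-check using the helper above. */
void check_overlap_demo(void) {
  assert(overlap_demo() == SIZE + 1);
}
```

A naive 2-wide version that loaded buf[0] and buf[1] together before storing would produce buf[2] == 2 instead of 3 for this call, which is why the non-restrict loop needs either a proof or a runtime check before it can be vectorized.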
cat > test.c
#define SIZE 128
void bar(int *restrict A, int* restrict B,int K) {
#pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
  for (int i = 0; i < SIZE; ++i)
    A[i] += B[i] + K;
}
[dannyb at dannyb-macbookpro3 11:37:20] ~ :) $ clang -O3 test.c -c -save-temps
[dannyb at dannyb-macbookpro3 11:38:28] ~ :) $ pcregrep -i "^\s*p" test.s|less
pushq %rbp
pshufd $68, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,0,1]
pslldq $8, %xmm1 ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
pshufd $68, %xmm3, %xmm3 ## xmm3 = xmm3[0,1,0,1]
paddq %xmm1, %xmm3
pshufd $78, %xmm3, %xmm4 ## xmm4 = xmm3[2,3,0,1]
punpckldq %xmm5, %xmm4 ## xmm4 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
pshufd $212, %xmm4, %xmm4 ## xmm4 = xmm4[0,1,1,3]
Note:
It also vectorizes at SIZE=8.
Not sure what the exact translation of options from clang-cl to clang is.
Maybe try adding /O3?
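
Grepping the assembly for packed instructions works, but a more direct check may be clang's optimization remarks, e.g. `clang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize -c test.c`, which report per loop whether it was vectorized and, if not, why. The sketch below is an assumption-laden variant of the test file, not something from the thread: bar_scalar and bar_matches_scalar are hypothetical helpers that keep a non-vectorized copy of the loop around so the two can be compared.

```c
#include <assert.h>

#define SIZE 128

/* Loop from the thread, with the vectorizer hints. */
void bar(int *restrict A, int *restrict B, int K) {
#pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
  for (int i = 0; i < SIZE; ++i)
    A[i] += B[i] + K;
}

/* Scalar reference: the same loop with vectorization explicitly disabled. */
void bar_scalar(int *restrict A, int *restrict B, int K) {
#pragma clang loop vectorize(disable)
  for (int i = 0; i < SIZE; ++i)
    A[i] += B[i] + K;
}

/* Hypothetical helper: returns 1 if both versions produce the same
   output for the same input, 0 otherwise. */
int bar_matches_scalar(int K) {
  int a1[SIZE], a2[SIZE], b[SIZE];
  for (int i = 0; i < SIZE; ++i) {
    a1[i] = a2[i] = i;
    b[i] = 2 * i;
  }
  bar(a1, b, K);
  bar_scalar(a2, b, K);
  for (int i = 0; i < SIZE; ++i)
    if (a1[i] != a2[i])
      return 0;
  return 1;
}

/* Small self-check using the helper above. */
void check_bar(void) {
  assert(bar_matches_scalar(5) == 1);
}
```

With the remarks enabled, one would expect a remark for the loop in bar and none for bar_scalar; the exact wording of the remarks depends on the clang version.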
On Fri, Aug 12, 2016 at 11:23 AM, Xiaochu Liu <xiaochu1122 at gmail.com>
wrote:
> Hi Daniel,
>
> I increased the size of your test to be 128 but -stats still shows no loop
> optimized...
>
> Xiaochu
I'm not compiling it for x86. Should the loop optimizer be something
independent of the target? If so, should the vectorized code show up at the
IR level?

On Aug 12, 2016 11:39 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote:

> cat > test.c
> #define SIZE 128
> void bar(int *restrict A, int* restrict B,int K) {
> #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>   for (int i = 0; i < SIZE; ++i)
>     A[i] += B[i] + K;
> }
> [dannyb at dannyb-macbookpro3 11:37:20] ~ :) $ clang -O3 test.c -c -save-temps
> [dannyb at dannyb-macbookpro3 11:38:28] ~ :) $ pcregrep -i "^\s*p" test.s|less
> pushq %rbp
> pshufd $68, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,0,1]
> pslldq $8, %xmm1 ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
> pshufd $68, %xmm3, %xmm3 ## xmm3 = xmm3[0,1,0,1]
> paddq %xmm1, %xmm3
> pshufd $78, %xmm3, %xmm4 ## xmm4 = xmm3[2,3,0,1]
> punpckldq %xmm5, %xmm4 ## xmm4 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
> pshufd $212, %xmm4, %xmm4 ## xmm4 = xmm4[0,1,1,3]
>
> Note:
> It also vectorizes at SIZE=8.
>
> Not sure what the exact translation of options from clang-cl to clang is.
> Maybe try adding /O3?
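
On the IR-level question: as far as I know the loop vectorizer is an IR pass, so a vectorized loop should show up as operations on vector types such as `<2 x i32>` in the output of `clang -O3 -S -emit-llvm test.c -o test.ll`. The pass does consult a per-target cost model, though, so whether it fires can still depend on the target being compiled for. The sketch below is a hand-written illustration of what a width-2 version of the loop computes, using the GCC/Clang vector_size extension; bar_v2 and v2i are hypothetical names, and this is not claimed to be the code the vectorizer itself generates.

```c
#include <assert.h>
#include <string.h>

#define SIZE 128

/* Two ints packed in one vector value (GCC/Clang extension). */
typedef int v2i __attribute__((vector_size(2 * sizeof(int))));

/* Hand-vectorized width-2 variant of the loop from the thread.
   Assumes SIZE is a multiple of 2, so no scalar remainder loop is needed. */
void bar_v2(int *restrict A, int *restrict B, int K) {
  const v2i k = {K, K};               /* K broadcast to both lanes */
  for (int i = 0; i < SIZE; i += 2) {
    v2i a, b;
    memcpy(&a, &A[i], sizeof a);      /* unaligned-safe vector loads */
    memcpy(&b, &B[i], sizeof b);
    a += b + k;                       /* one vector add per two elements */
    memcpy(&A[i], &a, sizeof a);      /* vector store */
  }
}

/* Small self-check: A[i] = i, B[i] = 2*i, K = 3 gives A[i] = 3*i + 3. */
void check_bar_v2(void) {
  int a[SIZE], b[SIZE];
  for (int i = 0; i < SIZE; ++i) {
    a[i] = i;
    b[i] = 2 * i;
  }
  bar_v2(a, b, 3);
  for (int i = 0; i < SIZE; ++i)
    assert(a[i] == 3 * i + 3);
}
```

Compiling this file with -emit-llvm is one way to see what target-independent vector operations look like in IR before comparing against the output for the original loop.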