thr3ads.net - llvm dev - [llvm-dev] Invoke loop vectorizer [Aug 2016]


Xiaochu Liu via llvm-dev

2016-Aug-12 19:20 UTC

[llvm-dev] Invoke loop vectorizer

I'm not compiling it to x86. Shouldn't the loop optimizer be independent of
the target? If so, shouldn't the vectorized code show up at the IR level?

On Aug 12, 2016 11:39 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote:
> cat > test.c
>
> #define SIZE 128
>
> void bar(int *restrict A, int *restrict B, int K) {
>   #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>   for (int i = 0; i < SIZE; ++i)
>     A[i] += B[i] + K;
> }
>
> [dannyb at dannyb-macbookpro3 11:37:20] ~ :) $ clang -O3 test.c -c -save-temps
> [dannyb at dannyb-macbookpro3 11:38:28] ~ :) $ pcregrep -i "^\s*p" test.s|less
>         pushq   %rbp
>         pshufd  $68, %xmm0, %xmm0       ## xmm0 = xmm0[0,1,0,1]
>         pslldq  $8, %xmm1               ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
>         pshufd  $68, %xmm3, %xmm3       ## xmm3 = xmm3[0,1,0,1]
>         paddq   %xmm1, %xmm3
>         pshufd  $78, %xmm3, %xmm4       ## xmm4 = xmm3[2,3,0,1]
>         punpckldq       %xmm5, %xmm4    ## xmm4 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
>         pshufd  $212, %xmm4, %xmm4      ## xmm4 = xmm4[0,1,1,3]
>
> Note:
> It also vectorizes at SIZE=8.
>
> Not sure what the exact translation of options from clang-cl to clang is.
> Maybe try adding /O3?
>
> On Fri, Aug 12, 2016 at 11:23 AM, Xiaochu Liu <xiaochu1122 at gmail.com> wrote:
>
>> Hi Daniel,
>>
>> I increased the size in your test to 128, but -stats still shows no
>> loop optimized...
>>
>> Xiaochu
>>
>> On Aug 12, 2016 11:11 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote:
>>
>>> It's not possible to know that A and B don't alias in this example.
>>> It's almost certainly not profitable to add a runtime check given the size
>>> of the loop.
>>>
>>> try
>>>
>>> #define SIZE 8
>>>
>>> void bar(int *restrict A, int *restrict B, int K) {
>>>   #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>>>   for (int i = 0; i < SIZE; ++i)
>>>     A[i] += B[i] + K;
>>> }
>>>
>>> (i don't remember if llvm also does runtime alias checks, but if it
>>> does, you'd probably need to increase size to get it to vectorize)
>>>
>>> On Fri, Aug 12, 2016 at 11:08 AM, Xiaochu Liu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Hi Andrey,
>>>>
>>>> Thanks. I found that even when the loop vectorizer and SLP vectorizer are
>>>> enabled, my simple test still does not get optimized. I also tried a clang
>>>> pragma in my test to force vectorization. What do you think is the problem?
>>>>
>>>> Test:
>>>>
>>>> #define SIZE 8
>>>>
>>>> void bar(int *A, int *B, int K) {
>>>>   #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>>>>   for (int i = 0; i < SIZE; ++i)
>>>>     A[i] += B[i] + K;
>>>> }
>>>>
>>>> Thanks,
>>>> Xiaochu
>>>>
>>>> On Aug 12, 2016 4:06 AM, "Andrey Bokhanko" <andreybokhanko at gmail.com> wrote:
>>>>
>>>>> Hi Xiaochu,
>>>>>
>>>>> Clang uses -O0 by default, which doesn't run any optimizations. Try
>>>>> supplying -O1 or higher.
>>>>>
>>>>> Yours,
>>>>> Andrey
>>>>>
>>>>> On Fri, Aug 12, 2016 at 1:04 AM, Xiaochu Liu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> I use clang-cl /Qvec test.c to compile the code, but the LoopVectorizer
>>>>>> pass is never invoked.
>>>>>>
>>>>>> I was wondering if this is sufficient to enable the auto-vectorizer.
>>>>>>
>>>>>> Thanks,
>>>>>> Xiaochu
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>>
>>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160812/43f3ff64/attachment.html>
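Daniel's point in the quoted reply is that without `restrict` the compiler cannot prove `A` and `B` don't overlap, so vectorizing would require a runtime overlap check that may not pay for itself on an 8-iteration loop. The following is a hand-written C sketch of that loop-versioning idea, not the code LLVM actually emits. The names `bar_versioned` and `ranges_overlap` are hypothetical, and the "vector" path is only emulated with a two-lane body that mirrors `vectorize_width(2)`.

```c
#include <stddef.h>
#include <stdint.h>

#define SIZE 8

/* Hypothetical helper: nonzero if the n-element ranges starting at a and b
   overlap. Addresses are compared as uintptr_t because relational
   comparison of pointers into different objects is undefined in C. */
static int ranges_overlap(const int *a, const int *b, size_t n) {
    uintptr_t a_lo = (uintptr_t)a, a_hi = (uintptr_t)(a + n);
    uintptr_t b_lo = (uintptr_t)b, b_hi = (uintptr_t)(b + n);
    return a_lo < b_hi && b_lo < a_hi;
}

/* Loop versioning by hand: a runtime overlap check selects either a
   two-lane "vector" body (emulating vectorize_width(2)) or the original
   scalar loop. Same semantics as bar() from the thread, without restrict. */
void bar_versioned(int *A, const int *B, int K) {
    if (!ranges_overlap(A, B, SIZE)) {
        int i = 0;
        for (; i + 1 < SIZE; i += 2) {
            /* Both lanes are loaded before either is stored, as a vector
               load/add/store sequence would do. The overlap check above
               guarantees this reordering cannot change the result. */
            int b0 = B[i], b1 = B[i + 1];
            int a0 = A[i], a1 = A[i + 1];
            A[i] = a0 + b0 + K;
            A[i + 1] = a1 + b1 + K;
        }
        for (; i < SIZE; ++i) /* scalar epilogue; empty when SIZE is even */
            A[i] += B[i] + K;
    } else {
        /* Possible overlap: keep strict element-by-element order. */
        for (int i = 0; i < SIZE; ++i)
            A[i] += B[i] + K;
    }
}
```

With `restrict`, as in Daniel's version, the programmer promises there is no overlap, so neither the check nor the scalar fallback is needed, which is presumably why the restrict-qualified loop vectorized cleanly in his -O3 run.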

Michael Kuperstein via llvm-dev

2016-Aug-12 19:46 UTC


[llvm-dev] Invoke loop vectorizer

The loop vectorizer is not independent of the target, since it queries the
target for cost estimates to make the vectorization profitability decision.

Your code has a pragma explicitly requesting vectorization, so
profitability should not come into play, but there may be other
target-related issues. One example I can think of is that we will never
vectorize if the target has no vector registers.
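As a rough illustration of this point, here is a toy model in C of a target-query gate that runs before any loop is examined, followed by a per-loop choice of vectorization factor. Everything here is a hypothetical stand-in: the struct fields only loosely correspond to TargetTransformInfo queries such as `getNumberOfRegisters(true)` and a maximum interleave factor. Our understanding, stated as an assumption rather than a description of LLVM's code, is that the LoopVectorize pass of this period bails out early when the target reports no vector registers and interleaving would not help.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the target queries the vectorizer relies on. */
typedef struct {
    unsigned num_vector_registers;  /* loosely: getNumberOfRegisters(true) */
    unsigned max_interleave_factor; /* loosely: a max interleave factor query */
} TargetInfo;

/* Per-function gate, evaluated before any loop is looked at. In this
   model a loop pragma cannot bypass it. */
bool vectorizer_should_run(const TargetInfo *t) {
    return t->num_vector_registers != 0 || t->max_interleave_factor >= 2;
}

/* Per-loop choice of vectorization factor (VF). forced_width models
   vectorize_width(N), with 0 meaning "no pragma"; cost_model_vf is what a
   target-driven cost model would pick. */
unsigned choose_vf(const TargetInfo *t, unsigned forced_width,
                   unsigned cost_model_vf) {
    if (t->num_vector_registers == 0)
        return 1;            /* nothing to hold vectors: stay scalar */
    if (forced_width != 0)
        return forced_width; /* pragma: profitability is not consulted */
    return cost_model_vf;
}
```

Under this model, a target that reports zero vector registers would see the pass run but skip every loop, which resembles the symptom Xiaochu describes later in the thread.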

Daniel Berlin via llvm-dev

2016-Aug-12 20:21 UTC


[llvm-dev] Invoke loop vectorizer

Errr, so you are using clang-cl but not on x86 or x86-64?
That's probably "not well tested"

Daniel Berlin via llvm-dev

2016-Aug-12 20:21 UTC


[llvm-dev] Invoke loop vectorizer

Right, and if you are not running it on the target, it's also not going to
detect the target features correctly, I believe?


Xiaochu Liu via llvm-dev

2016-Aug-12 20:26 UTC


[llvm-dev] Invoke loop vectorizer

Thanks, guys!

I found that my target is missing the getNumberOfRegisters function. The loop
vectorizer is invoked, but no loop was examined...

My back end is still under construction... Sorry about that.

Thanks,
Xiaochu
