hameeza ahmed via llvm-dev
2017-Jul-01 20:54 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
Does it happen due to loop carried dependence? if yes what is the solution to vectorize such codes?

please reply. i m waiting.

On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote:
> I even tried polly but still my llvm IR does not contain vector
> instructions. i used the following command;
>
> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm
> -polly-vectorizer=stripmine -o stencil_poly.ll
>
> Please specify what is wrong with my code?
>
> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> Hello,
>>
>> I am trying to vectorize following stencil code;
>>
>> #include <stdio.h>
>> #define N 100351
>>
>> // This function computes 2D-5 point Jacobi stencil
>> void stencil(int a[restrict][N])
>> {
>>     int i, j, k;
>>     for (k = 0; k < 100; k++)
>>     { for (i = 1; i <= N-2; i++)
>>       { for (j = 1; j <= N-2; j++)
>>         { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1]
>>                             + a[i][j+1]);
>>         }
>>       }
>>     }}
>>
>> I have used the following commands
>>
>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -disable-llvm-optzns
>> -o stencil.ll
>>
>> opt -S -O3 stencil.ll -o stencil_o3.ll
>>
>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s
>>
>> But the code is not vectorized. It still uses the scalar instructions;
>>
>> Please correct me.
>>
>> Thank You
hameeza ahmed via llvm-dev
2017-Jul-01 22:11 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
Further, I modified the code to the following:
#include <stdio.h>
#define N 100351

// This function computes 2D-5 point Jacobi stencil
void stencil(int a[restrict][N], int b[restrict][N])
{
    int i, j, k;
    for (k = 0; k < N; k++) {
        for (i = 1; i <= N-2; i++)
            for (j = 1; j <= N-2; j++)
                b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
                                  a[i][j+1]);
        for (i = 1; i <= N-2; i++)
            for (j = 1; j <= N-2; j++)
                a[i][j] = b[i][j];
    }
}
But there is still no vectorization in the IR. Also, when I set the vector width
explicitly to 64, it gives the following remarks:
remark: <unknown>:0:0: loop not vectorized: call instruction cannot be vectorized
remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop
I need serious help on this. Please guide me.
hameeza ahmed via llvm-dev
2017-Jul-01 22:33 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
I am able to vectorize it with the following code:
#include <stdio.h>
#define N 100351

// This function computes 2D-5 point Jacobi stencil
void stencil(int a[][N], int b[][N])
{
    int i, j, k;
    for (k = 0; k < N; k++) {
        for (i = 1; i <= N-2; i++)
            for (j = 1; j <= N-2; j++)
                b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
                                  a[i][j+1]);
        for (i = 1; i <= N-2; i++)
            for (j = 1; j <= N-2; j++)
                a[i][j] = b[i][j];
    }
}
I removed restrict here.
Serge Preis via llvm-dev
2017-Jul-03 05:19 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
Hello,
This is not an equivalent rewrite: your original code definitely shouldn't
vectorize because it has a backward cross-iteration (loop-carried) dependency:
the value computed on iteration j+1 depends on the value from iteration j that
you have just written. Vectorization needs to do load-operation-store on
multiple consecutive values at once, and that is impossible in this case.
In your new code all values of a[] on the right-hand side of the main loop come
from the previous k iteration (because on the current k iteration you're
writing to 'b'). So there is no way to vectorize the loop in its original form,
but the new form is definitely vectorizable.
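To make the difference concrete, here is a minimal sketch on a tiny grid. The size M, the function names and the input pattern are assumptions chosen for illustration; they do not appear in the thread. It runs one in-place sweep and one two-array sweep from the same input and reports whether the results differ.

```c
#include <assert.h>
#include <string.h>

#define M 6 /* hypothetical small grid size, for illustration only */

/* In-place sweep, as in the original code: a[i][j-1] and a[i-1][j]
   have already been overwritten when a[i][j] is computed. */
void sweep_in_place(int a[M][M])
{
    assert(M >= 3); /* need at least one interior point */
    for (int i = 1; i <= M - 2; i++)
        for (int j = 1; j <= M - 2; j++)
            a[i][j] = (int)(0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
                                    a[i][j-1] + a[i][j+1]));
}

/* Two-array sweep, as in the rewritten code: every read sees old values. */
void sweep_two_arrays(int a[M][M], int b[M][M])
{
    assert(M >= 3);
    for (int i = 1; i <= M - 2; i++)
        for (int j = 1; j <= M - 2; j++)
            b[i][j] = (int)(0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
                                    a[i][j-1] + a[i][j+1]));
    for (int i = 1; i <= M - 2; i++)
        for (int j = 1; j <= M - 2; j++)
            a[i][j] = b[i][j];
}

/* Returns 1 if the two variants give different grids for the same input. */
int variants_differ(void)
{
    int x[M][M], y[M][M], tmp[M][M];
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++)
            x[i][j] = y[i][j] = 4 * (i * M + j);
    memset(tmp, 0, sizeof tmp);
    sweep_in_place(x);
    sweep_two_arrays(y, tmp);
    return memcmp(x, y, sizeof x) != 0;
}
```

With this input the first interior element agrees in both variants, but the next one in the same row already differs, because the in-place sweep reads the freshly written left neighbour.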
I second the recommendation to file a bug about the 'restrict' behavior. You
may also save some memory by making 'b' a 1D array (once again, though, this
is not an equivalent rewrite):
// This function computes 2D-5 point Jacobi stencil
void stencil(int a[][N], int b[N])
{
    int i, j, k;
    for (k = 0; k < N; k++) {
        for (i = 1; i <= N-2; i++) {
            for (j = 1; j <= N-2; j++)
                b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
                               a[i][j+1]);
            for (j = 1; j <= N-2; j++)
                a[i][j] = b[j];
        }
    }
}
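The single-row buffer is not equivalent because a[i-1][j] has already been updated when row i is computed. If equivalence with the two-array form matters, one possible variation (my sketch, not something proposed in the thread) keeps two row buffers holding the old values of rows i-1 and i. The grid size M and all function names below are hypothetical, and whether a given compiler vectorizes the inner loop is untested.

```c
#include <assert.h>
#include <string.h>

#define M 6 /* hypothetical small grid size, for illustration only */

/* Reference: full second array, as in the rewritten code from the thread. */
void sweep_two_arrays(int a[M][M], int b[M][M])
{
    for (int i = 1; i <= M - 2; i++)
        for (int j = 1; j <= M - 2; j++)
            b[i][j] = (int)(0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
                                    a[i][j-1] + a[i][j+1]));
    for (int i = 1; i <= M - 2; i++)
        for (int j = 1; j <= M - 2; j++)
            a[i][j] = b[i][j];
}

/* Same result with two row buffers instead of a full second array.
   prev holds the old values of row i-1, cur the old values of row i. */
void sweep_row_buffers(int a[M][M])
{
    int prev[M], cur[M];
    assert(M >= 3);                    /* need at least one interior point */
    memcpy(prev, a[0], sizeof prev);   /* row 0 is boundary, never written */
    for (int i = 1; i <= M - 2; i++) {
        memcpy(cur, a[i], sizeof cur); /* save old row i before overwriting */
        for (int j = 1; j <= M - 2; j++) /* reads only old values */
            a[i][j] = (int)(0.25 * (cur[j] + prev[j] + a[i+1][j] +
                                    cur[j-1] + cur[j+1]));
        memcpy(prev, cur, sizeof prev); /* old row i becomes "row above" */
    }
}

/* Returns 1 if both variants agree after a few sweeps. */
int row_buffers_match_reference(void)
{
    int x[M][M], y[M][M], tmp[M][M];
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++)
            x[i][j] = y[i][j] = ((i * 7 + j * j) % 11) * 4;
    memset(tmp, 0, sizeof tmp);
    for (int k = 0; k < 3; k++) {
        sweep_two_arrays(x, tmp);
        sweep_row_buffers(y);
    }
    return memcmp(x, y, sizeof x) == 0;
}
```

The extra memory is two rows rather than one, in exchange for keeping the Jacobi semantics.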
There is a way to vectorize your code even more efficiently. It looks more
low-level, but it is probably closest to what you would expect from
vectorization.
void stencil(int a[][N])
{
    int i, j, k, v;
    for (k = 0; k < 100; k++) {
        for (i = 1; i <= N-2; i++) {
            int b[16]; // This should become a register on KNL
            // Full blocks only: j+15 must not run past N-2
            for (j = 1; j + 15 <= N-2; j += 16) {
                for (v = 0; v < 16; v++) {
                    b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] +
                                   a[i][j-1+v] + a[i][j+1+v]);
                }
                for (v = 0; v < 16; v++) { // This will be a single store operation
                    a[i][j+v] = b[v];
                }
            }
            // You should explicitly take care of the tail of the j-loop
#if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED // This is not an actual name, just a designator
            for (; j <= N-2; j++) {
                a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
                                  a[i][j+1]);
            }
#else
            // N-1-j elements remain (indices j .. N-2)
            for (v = 0; v < N-1-j; v++) { // This would become a masked non-loop
                b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] +
                               a[i][j-1+v] + a[i][j+1+v]);
            }
            for (v = 0; v < N-1-j; v++) { // This will be a single masked store operation
                a[i][j+v] = b[v];
            }
#endif
        }
    }
}
Unfortunately the compiler cannot do this for you: this is not an equivalent
transformation of the original code. I am also not aware of any way to express
this desired behavior less explicitly (e.g. the OpenMP SIMD pragma won't work in
this case).
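As a worked example of the tail handling: with N = 100351 the interior range j = 1..N-2 holds 100349 elements, which is 6271 full blocks of 16 plus a tail of 13. The helper names below are hypothetical; the sketch only does the index arithmetic and assumes blocks start at j = 1.

```c
#include <assert.h>

/* Index of the first interior j that is not covered by a full block of
   vlen elements, when blocks start at j = 1 and the last interior index
   is n-2. */
int first_tail_index(int n, int vlen)
{
    assert(n >= 3 && vlen > 0);
    int interior = n - 2;                /* number of interior elements */
    return 1 + (interior / vlen) * vlen; /* skip all full blocks */
}

/* Number of interior elements left over after the full blocks. */
int tail_length(int n, int vlen)
{
    assert(n >= 3 && vlen > 0);
    return (n - 2) % vlen;
}
```

For N = 100351 and 16 lanes, the tail starts at j = 100337 and covers 13 elements, so a masked operation would enable 13 of the 16 lanes.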
Minor note: You're using 'int' for data, then multiplying by 0.25
(dividing by 4) and then writing it back to 'int'. This will cost you two
conversions to/from double, while you could just write (...) / 4, which should
be optimized to a simple sequence with shifts (not to a single shift due to
signedness, but still better than conversions with changes of element size
4->8->4 and data type INT->FP->INT).
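A small sketch of the three forms, with hypothetical function names. The shift version is only an illustration of the kind of sequence a compiler might emit; it assumes a 32-bit int and an arithmetic right shift on negative values, which is implementation-defined in C.

```c
#include <assert.h>

/* As written in the thread: int -> double -> int round trip. */
int quarter_via_double(int s)
{
    return (int)(0.25 * s);
}

/* Suggested form: integer division, truncating toward zero. */
int quarter_via_div(int s)
{
    return s / 4;
}

/* Hand-written shift sequence for signed division by 4.
   Assumes 32-bit int and arithmetic >> on negative values. */
int quarter_via_shifts(int s)
{
    assert(sizeof(int) == 4);
    int bias = (s >> 31) & 3; /* 3 for negative s, 0 otherwise */
    return (s + bias) >> 2;   /* bias makes the shift round toward zero */
}
```

All three truncate toward zero, so they agree for any sum that fits in an int; the bias step is why a signed division by 4 is not a single shift.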
And by the way, why do you divide by 4 and not by 5, as the number of points suggests?
Serge Preis