hameeza ahmed via llvm-dev
2017-Jul-01 20:54 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
Does it happen due to loop carried dependence? if yes what is the solution to vectorize such codes? please reply. i m waiting. On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote:> I even tried polly but still my llvm IR does not contain vector > instructions. i used the following command; > > clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm > -polly-vectorizer=stripmine -o stencil_poly.ll > > Please specify what is wrong with my code? > > > On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <hahmed2305 at gmail.com> > wrote: > >> Hello, >> >> I am trying to vectorize following stencil code; >> >> #include <stdio.h> >> #define N 100351 >> >> // This function computes 2D-5 point Jacobi stencil >> void stencil(int a[restrict][N]) >> { >> int i, j, k; >> for (k = 0; k < 100; k++) >> { for (i = 1; i <= N-2; i++) >> { for (j = 1; j <= N-2; j++) >> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] >> + a[i][j+1]); >> } >> } >> }} >> >> I have used the following commands >> >> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -disable-llvm-optzns >> -o stencil.ll >> >> opt -S -O3 stencil.ll -o stencil_o3.ll >> >> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s >> >> But the code is not vectorized. It still uses the scalar instructions; >> >> Please correct me. >> >> Thank You >> >> >> >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170702/6381642e/attachment-0001.html>
hameeza ahmed via llvm-dev
2017-Jul-01 22:11 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
further i modified the code to the following; #include <stdio.h> #define N 100351 // This function computes 2D-5 point Jacobi stencil void stencil(int a[restrict][N], int b[restrict][N]) { int i, j, k; for (k = 0; k < N; k++) { for (i = 1; i <= N-2; i++) for (j = 1; j <= N-2; j++) b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]); for (i = 1; i <= N-2; i++) for (j = 1; j <= N-2; j++) a[i][j] = b[i][j]; } } but still no vectorization in IR. Also, when I set vector width explicitly to 64, it gives the following error: remark: <unknown>:0:0: loop not vectorized: call instruction cannot be vectorized remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop I need serious help on this. Please guide me. On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:> Does it happen due to loop carried dependence? if yes what is the solution > to vectorize such codes? > > > please reply. i m waiting. > > On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote: > >> I even tried polly but still my llvm IR does not contain vector >> instructions. i used the following command; >> >> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm >> -polly-vectorizer=stripmine -o stencil_poly.ll >> >> Please specify what is wrong with my code? >> >> >> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <hahmed2305 at gmail.com> >> wrote: >> >>> Hello, >>> >>> I am trying to vectorize following stencil code; >>> >>> #include <stdio.h> >>> #define N 100351 >>> >>> // This function computes 2D-5 point Jacobi stencil >>> void stencil(int a[restrict][N]) >>> { >>> int i, j, k; >>> for (k = 0; k < 100; k++) >>> { for (i = 1; i <= N-2; i++) >>> { for (j = 1; j <= N-2; j++) >>> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] >>> + a[i][j+1]); >>> } >>> } >>> }} >>> >>> I have used the following commands >>> >>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm >>> -disable-llvm-optzns -o stencil.ll >>> >>> opt -S -O3 stencil.ll -o stencil_o3.ll >>> >>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s >>> >>> But the code is not vectorized. It still uses the scalar instructions; >>> >>> Please correct me. >>> >>> Thank You >>> >>> >>> >>> >>> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170702/d0c25c02/attachment.html>
hameeza ahmed via llvm-dev
2017-Jul-01 22:33 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
I am able to vectorize it with the following code; #include <stdio.h> #define N 100351 // This function computes 2D-5 point Jacobi stencil void stencil(int a[][N], int b[][N]) { int i, j, k; for (k = 0; k < N; k++) { for (i = 1; i <= N-2; i++) for (j = 1; j <= N-2; j++) b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]); for (i = 1; i <= N-2; i++) for (j = 1; j <= N-2; j++) a[i][j] = b[i][j]; } } I removed restrict over here. On Sun, Jul 2, 2017 at 3:11 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:> further i modified the code to the following; > > #include <stdio.h> > #define N 100351 > > // This function computes 2D-5 point Jacobi stencil > void stencil(int a[restrict][N], int b[restrict][N]) > { > int i, j, k; > for (k = 0; k < N; k++) { > for (i = 1; i <= N-2; i++) > for (j = 1; j <= N-2; j++) > b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + > a[i][j+1]); > > for (i = 1; i <= N-2; i++) > for (j = 1; j <= N-2; j++) > a[i][j] = b[i][j]; > > } > } > > but still no vectorization in IR. Also, when I set vector width > explicitly to 64, it gives the following error: > > remark: <unknown>:0:0: loop not vectorized: call instruction cannot be > vectorized > remark: <unknown>:0:0: loop not vectorized: value that could not be > identified as reduction is used outside the loop > > I need serious help on this. Please guide me. > > On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> > wrote: > >> Does it happen due to loop carried dependence? if yes what is the >> solution to vectorize such codes? >> >> >> please reply. i m waiting. >> >> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote: >> >>> I even tried polly but still my llvm IR does not contain vector >>> instructions. i used the following command; >>> >>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm >>> -polly-vectorizer=stripmine -o stencil_poly.ll >>> >>> Please specify what is wrong with my code? >>> >>> >>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <hahmed2305 at gmail.com> >>> wrote: >>> >>>> Hello, >>>> >>>> I am trying to vectorize following stencil code; >>>> >>>> #include <stdio.h> >>>> #define N 100351 >>>> >>>> // This function computes 2D-5 point Jacobi stencil >>>> void stencil(int a[restrict][N]) >>>> { >>>> int i, j, k; >>>> for (k = 0; k < 100; k++) >>>> { for (i = 1; i <= N-2; i++) >>>> { for (j = 1; j <= N-2; j++) >>>> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >>>> a[i][j-1] + a[i][j+1]); >>>> } >>>> } >>>> }} >>>> >>>> I have used the following commands >>>> >>>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm >>>> -disable-llvm-optzns -o stencil.ll >>>> >>>> opt -S -O3 stencil.ll -o stencil_o3.ll >>>> >>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s >>>> >>>> But the code is not vectorized. It still uses the scalar instructions; >>>> >>>> Please correct me. >>>> >>>> Thank You >>>> >>>> >>>> >>>> >>>> >>> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170702/e04385fa/attachment.html>
Serge Preis via llvm-dev
2017-Jul-03 05:19 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
Hello, This is not equivalent rewrite: your original code definitely shouldn't vectorize because it has backward cross-iteration (loop-carried) dependency: your value on iteration j+1 depend on value from iteration j you've just written. In case of vectorization you need to do load-operation-store on multiple consecutive values at once and it is impossible in this case. In your new code all values of a[] in right-hand side of the main loop are from previous k iteration (because on current k iteration you're writing to 'b'). So there is no way to vectorize loop in its original form, but new form is definitely vectorizable. I am second to recommend you filing a bug over 'restrict' behavior. And you may in fact save some memory by making 'b' 1D array (this is not equivalent rewrite once again though) // This function computes 2D-5 point Jacobi stencil void stencil(int a[][N], int b[N]) { int i, j, k; for (k = 0; k < N; k++) { for (i = 1; i <= N-2; i++) { for (j = 1; j <= N-2; j++) b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]); for (j = 1; j <= N-2; j++) a[i][j] = b[j]; } } } There is a way to vectorize your code even more efficient. This will look like more low-level, but probably closest to what you would expect from vectorization. void stencil(int a[][N]) { int i, j, k; for (k = 0; k < 100; k++) { for (i = 1; i <= N-2; i++) { for (j = 1; j <= N-2; j+=16) { int b[16]; // This should become a register on KNL for (v = 0; v < 16; v++) { b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + a[i][j-1+v] + a[i][j+1+v]); } for (v = 0; v < 16; v++) { // This will be a single store operation a[i][j+v] = b[v]; } } // You should explicitly take care about the tail of j-loop #if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED // This is not an actual name, just a designator for (;j <= N-2; j++) { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]); } #else for (v = 0; v < 16-(N-2-j); v++) { // This would become masked non-loop b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + a[i][j-1+v] + a[i][j+1+v]); } for (v = 0; v < 16-(N-2-j); v++) { // This will be a single masked store operation a[i][j+v] = b[v]; } #endif } } Unfortunalely compiler cannot do this for you: this is not equivalent transformation of original code. I am also not aware of any way to express this desired behavior less explicitly (e.g. OpenMP SIMD pragma won't work in this case). Minor note: You're using 'int' for data, than multiply by 0.25 (divide by 4) and than write it back to 'int'. This will cost you 2 conversion to/from double while you may just place (...) / 4 which should be optimized to simple sequecnce with shifts (not to single shift due to signedness, but still better than conversions with changes of element size 4->8->4 and data size INT->FP->INT). And by the way why do you divide by 4, not by 5 as number of points suggest? Serge Preis 02.07.2017, 05:11, "hameeza ahmed via llvm-dev" <llvm-dev at lists.llvm.org>:> further i modified the code to the following; > > #include <stdio.h> > #define N 100351 > > // This function computes 2D-5 point Jacobi stencil > void stencil(int a[restrict][N], int b[restrict][N]) > { > int i, j, k; > for (k = 0; k < N; k++) { > for (i = 1; i <= N-2; i++) > for (j = 1; j <= N-2; j++) > b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]); > > for (i = 1; i <= N-2; i++) > for (j = 1; j <= N-2; j++) > a[i][j] = b[i][j]; > > } > } > > but still no vectorization in IR. Also, when I set vector width explicitly to 64, it gives the following error: > > remark: <unknown>:0:0: loop not vectorized: call instruction cannot be vectorized > remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop > > I need serious help on this. Please guide me. > > On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote: >> Does it happen due to loop carried dependence? if yes what is the solution to vectorize such codes? >> >> please reply. i m waiting. >> >> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote: >>> I even tried polly but still my llvm IR does not contain vector instructions. i used the following command; >>> >>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm -polly-vectorizer=stripmine -o stencil_poly.ll >>> >>> Please specify what is wrong with my code? >>> >>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote: >>>> Hello, >>>> >>>> I am trying to vectorize following stencil code; >>>> >>>> #include <stdio.h> >>>> #define N 100351 >>>> >>>> // This function computes 2D-5 point Jacobi stencil >>>> void stencil(int a[restrict][N]) >>>> { >>>> int i, j, k; >>>> for (k = 0; k < 100; k++) >>>> { for (i = 1; i <= N-2; i++) >>>> { for (j = 1; j <= N-2; j++) >>>> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]); >>>> } >>>> } >>>> }} >>>> >>>> I have used the following commands >>>> >>>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -disable-llvm-optzns -o stencil.ll >>>> >>>> opt -S -O3 stencil.ll -o stencil_o3.ll >>>> >>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s >>>> >>>> But the code is not vectorized. It still uses the scalar instructions; >>>> >>>> Please correct me. >>>> >>>> Thank You > , > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev