Michael Kruse via llvm-dev
2017-Oct-24 13:30 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
Your problem is due to GVN partial reduction elimination (PRE) which introduces a PHI node the current loop vectorizer cannot handle: opt -O3 stencil.ll -pass-remarks=loop-vectorize -pass-remarks-missed=loop-vectorize -pass-remarks-analysis=loop-vectorize remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop remark: <unknown>:0:0: loop not vectorized The message is not entirely accurate. The PHI is not used outside of the loop, but is in the loop header, therefore cannot be if-converted and is not a reduction. Polly also had difficulties with this construction as well. As presented at the dev meeting, VPlan will be able to insert a shuffle instruction when encountering this situation. Michael 2017-10-23 8:58 GMT+02:00 hameeza ahmed via llvm-dev <llvm-dev at lists.llvm.org>:> stencil.ll is attached here. > > On Mon, Oct 23, 2017 at 11:37 AM, Serge Preis <spreis at yandex-team.ru> wrote: >> >> >> >> Hello, >> >> To me this is an issue in llvm loop vectorizer (if N is large enough to >> prevent complete unrolling of j-loop). >> >> Woud you mind to share stencil.ll than I would say more definitely what >> the issue is. >> >> Regards, >> Serge. >> >> >> 22.10.2017, 03:21, "hameeza ahmed" <hahmed2305 at gmail.com>: >> >> Hello, >> >> The following stencil function that you wrote few months ago (for >> computing 2D-5 point Jacobi stencil) is vectorizable. >> >> void stencil(int a[][N], int b[N]) >> { >> int i, j, k; >> for (k = 0; k < N; k++) { >> for (i = 1; i <= N-2; i++) { >> for (j = 1; j <= N-2; j++) >> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> a[i][j-1] + a[i][j+1]); >> for (j = 1; j <= N-2; j++) >> a[i][j] = b[j]; >> } >> } >> } >> >> but when i write the above code in main i am getting error >> >> opt -S -O3 stencil.ll -o stencil_o3.ll >> remark: <unknown>:0:0: loop not vectorized: value that could not be >> identified as reduction is used outside the loop >> >> Why is that so? >> >> my code is follows: >> int a[N][N], b[N]; >> int main(void) >> { >> int i, j, k; >> for (k = 0; k < N; k++) { >> for (i = 1; i <= N-2; i++) { >> for (j = 1; j <= N-2; j++) >> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> a[i][j-1] + a[i][j+1]); >> for (j = 1; j <= N-2; j++) >> a[i][j] = b[j]; >> } >> } >> } >> >> >> Please help. >> >> Thank You. >> >> >> On Mon, Jul 3, 2017 at 10:19 AM, Serge Preis <spreis at yandex-team.ru> >> wrote: >> >> Hello, >> >> This is not equivalent rewrite: your original code definitely shouldn't >> vectorize because it has backward cross-iteration (loop-carried) dependency: >> your value on iteration j+1 depend on value from iteration j you've just >> written. In case of vectorization you need to do load-operation-store on >> multiple consecutive values at once and it is impossible in this case. >> >> In your new code all values of a[] in right-hand side of the main loop are >> from previous k iteration (because on current k iteration you're writing to >> 'b'). So there is no way to vectorize loop in its original form, but new >> form is definitely vectorizable. >> >> I am second to recommend you filing a bug over 'restrict' behavior. And >> you may in fact save some memory by making 'b' 1D array (this is not >> equivalent rewrite once again though) >> >> // This function computes 2D-5 point Jacobi stencil >> void stencil(int a[][N], int b[N]) >> { >> int i, j, k; >> for (k = 0; k < N; k++) { >> for (i = 1; i <= N-2; i++) { >> for (j = 1; j <= N-2; j++) >> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> a[i][j-1] + a[i][j+1]); >> for (j = 1; j <= N-2; j++) >> a[i][j] = b[j]; >> } >> } >> } >> >> There is a way to vectorize your code even more efficient. This will look >> like more low-level, but probably closest to what you would expect from >> vectorization. >> >> void stencil(int a[][N]) >> { >> int i, j, k; >> for (k = 0; k < 100; k++) { >> for (i = 1; i <= N-2; i++) { >> for (j = 1; j <= N-2; j+=16) { >> int b[16]; // This should become a register on KNL >> for (v = 0; v < 16; v++) { >> b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + >> a[i][j-1+v] + a[i][j+1+v]); >> } >> for (v = 0; v < 16; v++) { // This will be a single store >> operation >> a[i][j+v] = b[v]; >> } >> } >> // You should explicitly take care about the tail of j-loop >> #if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED // This is not an actual >> name, just a designator >> for (;j <= N-2; j++) { >> a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> a[i][j-1] + a[i][j+1]); >> } >> #else >> for (v = 0; v < 16-(N-2-j); v++) { // This would become masked >> non-loop >> b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + >> a[i][j-1+v] + a[i][j+1+v]); >> } >> for (v = 0; v < 16-(N-2-j); v++) { // This will be a single >> masked store operation >> a[i][j+v] = b[v]; >> } >> #endif >> } >> } >> >> Unfortunalely compiler cannot do this for you: this is not equivalent >> transformation of original code. I am also not aware of any way to express >> this desired behavior less explicitly (e.g. OpenMP SIMD pragma won't work in >> this case). >> >> Minor note: You're using 'int' for data, than multiply by 0.25 (divide by >> 4) and than write it back to 'int'. This will cost you 2 conversion to/from >> double while you may just place (...) / 4 which should be optimized to >> simple sequecnce with shifts (not to single shift due to signedness, but >> still better than conversions with changes of element size 4->8->4 and data >> size INT->FP->INT). >> >> And by the way why do you divide by 4, not by 5 as number of points >> suggest? >> >> Serge Preis >> >> >> 02.07.2017, 05:11, "hameeza ahmed via llvm-dev" <llvm-dev at lists.llvm.org>: >> > further i modified the code to the following; >> > >> > #include <stdio.h> >> > #define N 100351 >> > >> > // This function computes 2D-5 point Jacobi stencil >> > void stencil(int a[restrict][N], int b[restrict][N]) >> > { >> > int i, j, k; >> > for (k = 0; k < N; k++) { >> > for (i = 1; i <= N-2; i++) >> > for (j = 1; j <= N-2; j++) >> > b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + >> > a[i][j+1]); >> > >> > for (i = 1; i <= N-2; i++) >> > for (j = 1; j <= N-2; j++) >> > a[i][j] = b[i][j]; >> > >> > } >> > } >> > >> > but still no vectorization in IR. Also, when I set vector width >> > explicitly to 64, it gives the following error: >> > >> > remark: <unknown>:0:0: loop not vectorized: call instruction cannot be >> > vectorized >> > remark: <unknown>:0:0: loop not vectorized: value that could not be >> > identified as reduction is used outside the loop >> > >> > I need serious help on this. Please guide me. >> > >> > On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> >> > wrote: >> >> Does it happen due to loop carried dependence? if yes what is the >> >> solution to vectorize such codes? >> >> >> >> please reply. i m waiting. >> >> >> >> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote: >> >>> I even tried polly but still my llvm IR does not contain vector >> >>> instructions. i used the following command; >> >>> >> >>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm >> >>> -polly-vectorizer=stripmine -o stencil_poly.ll >> >>> >> >>> Please specify what is wrong with my code? >> >>> >> >>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <hahmed2305 at gmail.com> >> >>> wrote: >> >>>> Hello, >> >>>> >> >>>> I am trying to vectorize following stencil code; >> >>>> >> >>>> #include <stdio.h> >> >>>> #define N 100351 >> >>>> >> >>>> // This function computes 2D-5 point Jacobi stencil >> >>>> void stencil(int a[restrict][N]) >> >>>> { >> >>>> int i, j, k; >> >>>> for (k = 0; k < 100; k++) >> >>>> { for (i = 1; i <= N-2; i++) >> >>>> { for (j = 1; j <= N-2; j++) >> >>>> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> >>>> a[i][j-1] + a[i][j+1]); >> >>>> } >> >>>> } >> >>>> }} >> >>>> >> >>>> I have used the following commands >> >>>> >> >>>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm >> >>>> -disable-llvm-optzns -o stencil.ll >> >>>> >> >>>> opt -S -O3 stencil.ll -o stencil_o3.ll >> >>>> >> >>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s >> >>>> >> >>>> But the code is not vectorized. It still uses the scalar >> >>>> instructions; >> >>>> >> >>>> Please correct me. >> >>>> >> >>>> Thank You >> > , >> > >> > _______________________________________________ >> > LLVM Developers mailing list >> > llvm-dev at lists.llvm.org >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >
hameeza ahmed via llvm-dev
2017-Oct-24 18:37 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
Thank you. But how to resolve it, my work depends on it. On Tue, Oct 24, 2017 at 6:30 PM, Michael Kruse <llvmdev at meinersbur.de> wrote:> Your problem is due to GVN partial reduction elimination (PRE) which > introduces a PHI node the current loop vectorizer cannot handle: > > opt -O3 stencil.ll -pass-remarks=loop-vectorize > -pass-remarks-missed=loop-vectorize > -pass-remarks-analysis=loop-vectorize > remark: <unknown>:0:0: loop not vectorized: value that could not be > identified as reduction is used outside the loop > remark: <unknown>:0:0: loop not vectorized > > The message is not entirely accurate. The PHI is not used outside of > the loop, but is in the loop header, therefore cannot be if-converted > and is not a reduction. Polly also had difficulties with this > construction as well. > > As presented at the dev meeting, VPlan will be able to insert a > shuffle instruction when encountering this situation. > > > Michael > > > > > > 2017-10-23 8:58 GMT+02:00 hameeza ahmed via llvm-dev < > llvm-dev at lists.llvm.org>: > > stencil.ll is attached here. > > > > On Mon, Oct 23, 2017 at 11:37 AM, Serge Preis <spreis at yandex-team.ru> > wrote: > >> > >> > >> > >> Hello, > >> > >> To me this is an issue in llvm loop vectorizer (if N is large enough to > >> prevent complete unrolling of j-loop). > >> > >> Woud you mind to share stencil.ll than I would say more definitely what > >> the issue is. > >> > >> Regards, > >> Serge. > >> > >> > >> 22.10.2017, 03:21, "hameeza ahmed" <hahmed2305 at gmail.com>: > >> > >> Hello, > >> > >> The following stencil function that you wrote few months ago (for > >> computing 2D-5 point Jacobi stencil) is vectorizable. > >> > >> void stencil(int a[][N], int b[N]) > >> { > >> int i, j, k; > >> for (k = 0; k < N; k++) { > >> for (i = 1; i <= N-2; i++) { > >> for (j = 1; j <= N-2; j++) > >> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > >> a[i][j-1] + a[i][j+1]); > >> for (j = 1; j <= N-2; j++) > >> a[i][j] = b[j]; > >> } > >> } > >> } > >> > >> but when i write the above code in main i am getting error > >> > >> opt -S -O3 stencil.ll -o stencil_o3.ll > >> remark: <unknown>:0:0: loop not vectorized: value that could not be > >> identified as reduction is used outside the loop > >> > >> Why is that so? > >> > >> my code is follows: > >> int a[N][N], b[N]; > >> int main(void) > >> { > >> int i, j, k; > >> for (k = 0; k < N; k++) { > >> for (i = 1; i <= N-2; i++) { > >> for (j = 1; j <= N-2; j++) > >> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > >> a[i][j-1] + a[i][j+1]); > >> for (j = 1; j <= N-2; j++) > >> a[i][j] = b[j]; > >> } > >> } > >> } > >> > >> > >> Please help. > >> > >> Thank You. > >> > >> > >> On Mon, Jul 3, 2017 at 10:19 AM, Serge Preis <spreis at yandex-team.ru> > >> wrote: > >> > >> Hello, > >> > >> This is not equivalent rewrite: your original code definitely shouldn't > >> vectorize because it has backward cross-iteration (loop-carried) > dependency: > >> your value on iteration j+1 depend on value from iteration j you've just > >> written. In case of vectorization you need to do load-operation-store on > >> multiple consecutive values at once and it is impossible in this case. > >> > >> In your new code all values of a[] in right-hand side of the main loop > are > >> from previous k iteration (because on current k iteration you're > writing to > >> 'b'). So there is no way to vectorize loop in its original form, but new > >> form is definitely vectorizable. > >> > >> I am second to recommend you filing a bug over 'restrict' behavior. And > >> you may in fact save some memory by making 'b' 1D array (this is not > >> equivalent rewrite once again though) > >> > >> // This function computes 2D-5 point Jacobi stencil > >> void stencil(int a[][N], int b[N]) > >> { > >> int i, j, k; > >> for (k = 0; k < N; k++) { > >> for (i = 1; i <= N-2; i++) { > >> for (j = 1; j <= N-2; j++) > >> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > >> a[i][j-1] + a[i][j+1]); > >> for (j = 1; j <= N-2; j++) > >> a[i][j] = b[j]; > >> } > >> } > >> } > >> > >> There is a way to vectorize your code even more efficient. This will > look > >> like more low-level, but probably closest to what you would expect from > >> vectorization. > >> > >> void stencil(int a[][N]) > >> { > >> int i, j, k; > >> for (k = 0; k < 100; k++) { > >> for (i = 1; i <= N-2; i++) { > >> for (j = 1; j <= N-2; j+=16) { > >> int b[16]; // This should become a register on KNL > >> for (v = 0; v < 16; v++) { > >> b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] > + > >> a[i][j-1+v] + a[i][j+1+v]); > >> } > >> for (v = 0; v < 16; v++) { // This will be a single > store > >> operation > >> a[i][j+v] = b[v]; > >> } > >> } > >> // You should explicitly take care about the tail of j-loop > >> #if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED // This is not an > actual > >> name, just a designator > >> for (;j <= N-2; j++) { > >> a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > >> a[i][j-1] + a[i][j+1]); > >> } > >> #else > >> for (v = 0; v < 16-(N-2-j); v++) { // This would become > masked > >> non-loop > >> b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + > >> a[i][j-1+v] + a[i][j+1+v]); > >> } > >> for (v = 0; v < 16-(N-2-j); v++) { // This will be a single > >> masked store operation > >> a[i][j+v] = b[v]; > >> } > >> #endif > >> } > >> } > >> > >> Unfortunalely compiler cannot do this for you: this is not equivalent > >> transformation of original code. I am also not aware of any way to > express > >> this desired behavior less explicitly (e.g. OpenMP SIMD pragma won't > work in > >> this case). > >> > >> Minor note: You're using 'int' for data, than multiply by 0.25 (divide > by > >> 4) and than write it back to 'int'. This will cost you 2 conversion > to/from > >> double while you may just place (...) / 4 which should be optimized to > >> simple sequecnce with shifts (not to single shift due to signedness, but > >> still better than conversions with changes of element size 4->8->4 and > data > >> size INT->FP->INT). > >> > >> And by the way why do you divide by 4, not by 5 as number of points > >> suggest? > >> > >> Serge Preis > >> > >> > >> 02.07.2017, 05:11, "hameeza ahmed via llvm-dev" < > llvm-dev at lists.llvm.org>: > >> > further i modified the code to the following; > >> > > >> > #include <stdio.h> > >> > #define N 100351 > >> > > >> > // This function computes 2D-5 point Jacobi stencil > >> > void stencil(int a[restrict][N], int b[restrict][N]) > >> > { > >> > int i, j, k; > >> > for (k = 0; k < N; k++) { > >> > for (i = 1; i <= N-2; i++) > >> > for (j = 1; j <= N-2; j++) > >> > b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] > + > >> > a[i][j+1]); > >> > > >> > for (i = 1; i <= N-2; i++) > >> > for (j = 1; j <= N-2; j++) > >> > a[i][j] = b[i][j]; > >> > > >> > } > >> > } > >> > > >> > but still no vectorization in IR. Also, when I set vector width > >> > explicitly to 64, it gives the following error: > >> > > >> > remark: <unknown>:0:0: loop not vectorized: call instruction cannot be > >> > vectorized > >> > remark: <unknown>:0:0: loop not vectorized: value that could not be > >> > identified as reduction is used outside the loop > >> > > >> > I need serious help on this. Please guide me. > >> > > >> > On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> > >> > wrote: > >> >> Does it happen due to loop carried dependence? if yes what is the > >> >> solution to vectorize such codes? > >> >> > >> >> please reply. i m waiting. > >> >> > >> >> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> > wrote: > >> >>> I even tried polly but still my llvm IR does not contain vector > >> >>> instructions. i used the following command; > >> >>> > >> >>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm > >> >>> -polly-vectorizer=stripmine -o stencil_poly.ll > >> >>> > >> >>> Please specify what is wrong with my code? > >> >>> > >> >>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <hahmed2305 at gmail.com > > > >> >>> wrote: > >> >>>> Hello, > >> >>>> > >> >>>> I am trying to vectorize following stencil code; > >> >>>> > >> >>>> #include <stdio.h> > >> >>>> #define N 100351 > >> >>>> > >> >>>> // This function computes 2D-5 point Jacobi stencil > >> >>>> void stencil(int a[restrict][N]) > >> >>>> { > >> >>>> int i, j, k; > >> >>>> for (k = 0; k < 100; k++) > >> >>>> { for (i = 1; i <= N-2; i++) > >> >>>> { for (j = 1; j <= N-2; j++) > >> >>>> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > >> >>>> a[i][j-1] + a[i][j+1]); > >> >>>> } > >> >>>> } > >> >>>> }} > >> >>>> > >> >>>> I have used the following commands > >> >>>> > >> >>>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm > >> >>>> -disable-llvm-optzns -o stencil.ll > >> >>>> > >> >>>> opt -S -O3 stencil.ll -o stencil_o3.ll > >> >>>> > >> >>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s > >> >>>> > >> >>>> But the code is not vectorized. It still uses the scalar > >> >>>> instructions; > >> >>>> > >> >>>> Please correct me. > >> >>>> > >> >>>> Thank You > >> > , > >> > > >> > _______________________________________________ > >> > LLVM Developers mailing list > >> > llvm-dev at lists.llvm.org > >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > > > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171024/08b8183f/attachment.html>
Tim Northover via llvm-dev
2017-Oct-24 20:17 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
> But how to resolve it, my work depends on it.You're doing a PhD, which pretty much by definition that involves independent thought. How do *you* think it should be resolved? Tim.
Michael Kruse via llvm-dev
2017-Oct-25 12:34 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
$ opt -O3 stencil.ll -pass-remarks=loop-vectorize -enable-load-pre=0 remark: <unknown>:0:0: vectorized loop (vectorization width: 16, interleaved count: 1) Michael 2017-10-24 20:37 GMT+02:00 hameeza ahmed <hahmed2305 at gmail.com>:> Thank you. > > But how to resolve it, my work depends on it. > > On Tue, Oct 24, 2017 at 6:30 PM, Michael Kruse <llvmdev at meinersbur.de> > wrote: >> >> Your problem is due to GVN partial reduction elimination (PRE) which >> introduces a PHI node the current loop vectorizer cannot handle: >> >> opt -O3 stencil.ll -pass-remarks=loop-vectorize >> -pass-remarks-missed=loop-vectorize >> -pass-remarks-analysis=loop-vectorize >> remark: <unknown>:0:0: loop not vectorized: value that could not be >> identified as reduction is used outside the loop >> remark: <unknown>:0:0: loop not vectorized >> >> The message is not entirely accurate. The PHI is not used outside of >> the loop, but is in the loop header, therefore cannot be if-converted >> and is not a reduction. Polly also had difficulties with this >> construction as well. >> >> As presented at the dev meeting, VPlan will be able to insert a >> shuffle instruction when encountering this situation. >> >> >> Michael >> >> >> >> >> >> 2017-10-23 8:58 GMT+02:00 hameeza ahmed via llvm-dev >> <llvm-dev at lists.llvm.org>: >> > stencil.ll is attached here. >> > >> > On Mon, Oct 23, 2017 at 11:37 AM, Serge Preis <spreis at yandex-team.ru> >> > wrote: >> >> >> >> >> >> >> >> Hello, >> >> >> >> To me this is an issue in llvm loop vectorizer (if N is large enough to >> >> prevent complete unrolling of j-loop). >> >> >> >> Woud you mind to share stencil.ll than I would say more definitely what >> >> the issue is. >> >> >> >> Regards, >> >> Serge. >> >> >> >> >> >> 22.10.2017, 03:21, "hameeza ahmed" <hahmed2305 at gmail.com>: >> >> >> >> Hello, >> >> >> >> The following stencil function that you wrote few months ago (for >> >> computing 2D-5 point Jacobi stencil) is vectorizable. >> >> >> >> void stencil(int a[][N], int b[N]) >> >> { >> >> int i, j, k; >> >> for (k = 0; k < N; k++) { >> >> for (i = 1; i <= N-2; i++) { >> >> for (j = 1; j <= N-2; j++) >> >> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> >> a[i][j-1] + a[i][j+1]); >> >> for (j = 1; j <= N-2; j++) >> >> a[i][j] = b[j]; >> >> } >> >> } >> >> } >> >> >> >> but when i write the above code in main i am getting error >> >> >> >> opt -S -O3 stencil.ll -o stencil_o3.ll >> >> remark: <unknown>:0:0: loop not vectorized: value that could not be >> >> identified as reduction is used outside the loop >> >> >> >> Why is that so? >> >> >> >> my code is follows: >> >> int a[N][N], b[N]; >> >> int main(void) >> >> { >> >> int i, j, k; >> >> for (k = 0; k < N; k++) { >> >> for (i = 1; i <= N-2; i++) { >> >> for (j = 1; j <= N-2; j++) >> >> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> >> a[i][j-1] + a[i][j+1]); >> >> for (j = 1; j <= N-2; j++) >> >> a[i][j] = b[j]; >> >> } >> >> } >> >> } >> >> >> >> >> >> Please help. >> >> >> >> Thank You. >> >> >> >> >> >> On Mon, Jul 3, 2017 at 10:19 AM, Serge Preis <spreis at yandex-team.ru> >> >> wrote: >> >> >> >> Hello, >> >> >> >> This is not equivalent rewrite: your original code definitely shouldn't >> >> vectorize because it has backward cross-iteration (loop-carried) >> >> dependency: >> >> your value on iteration j+1 depend on value from iteration j you've >> >> just >> >> written. In case of vectorization you need to do load-operation-store >> >> on >> >> multiple consecutive values at once and it is impossible in this case. >> >> >> >> In your new code all values of a[] in right-hand side of the main loop >> >> are >> >> from previous k iteration (because on current k iteration you're >> >> writing to >> >> 'b'). So there is no way to vectorize loop in its original form, but >> >> new >> >> form is definitely vectorizable. >> >> >> >> I am second to recommend you filing a bug over 'restrict' behavior. And >> >> you may in fact save some memory by making 'b' 1D array (this is not >> >> equivalent rewrite once again though) >> >> >> >> // This function computes 2D-5 point Jacobi stencil >> >> void stencil(int a[][N], int b[N]) >> >> { >> >> int i, j, k; >> >> for (k = 0; k < N; k++) { >> >> for (i = 1; i <= N-2; i++) { >> >> for (j = 1; j <= N-2; j++) >> >> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> >> a[i][j-1] + a[i][j+1]); >> >> for (j = 1; j <= N-2; j++) >> >> a[i][j] = b[j]; >> >> } >> >> } >> >> } >> >> >> >> There is a way to vectorize your code even more efficient. This will >> >> look >> >> like more low-level, but probably closest to what you would expect from >> >> vectorization. >> >> >> >> void stencil(int a[][N]) >> >> { >> >> int i, j, k; >> >> for (k = 0; k < 100; k++) { >> >> for (i = 1; i <= N-2; i++) { >> >> for (j = 1; j <= N-2; j+=16) { >> >> int b[16]; // This should become a register on KNL >> >> for (v = 0; v < 16; v++) { >> >> b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] >> >> + >> >> a[i][j-1+v] + a[i][j+1+v]); >> >> } >> >> for (v = 0; v < 16; v++) { // This will be a single >> >> store >> >> operation >> >> a[i][j+v] = b[v]; >> >> } >> >> } >> >> // You should explicitly take care about the tail of j-loop >> >> #if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED // This is not an >> >> actual >> >> name, just a designator >> >> for (;j <= N-2; j++) { >> >> a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> >> a[i][j-1] + a[i][j+1]); >> >> } >> >> #else >> >> for (v = 0; v < 16-(N-2-j); v++) { // This would become >> >> masked >> >> non-loop >> >> b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + >> >> a[i][j-1+v] + a[i][j+1+v]); >> >> } >> >> for (v = 0; v < 16-(N-2-j); v++) { // This will be a single >> >> masked store operation >> >> a[i][j+v] = b[v]; >> >> } >> >> #endif >> >> } >> >> } >> >> >> >> Unfortunalely compiler cannot do this for you: this is not equivalent >> >> transformation of original code. I am also not aware of any way to >> >> express >> >> this desired behavior less explicitly (e.g. OpenMP SIMD pragma won't >> >> work in >> >> this case). >> >> >> >> Minor note: You're using 'int' for data, than multiply by 0.25 (divide >> >> by >> >> 4) and than write it back to 'int'. This will cost you 2 conversion >> >> to/from >> >> double while you may just place (...) / 4 which should be optimized to >> >> simple sequecnce with shifts (not to single shift due to signedness, >> >> but >> >> still better than conversions with changes of element size 4->8->4 and >> >> data >> >> size INT->FP->INT). >> >> >> >> And by the way why do you divide by 4, not by 5 as number of points >> >> suggest? >> >> >> >> Serge Preis >> >> >> >> >> >> 02.07.2017, 05:11, "hameeza ahmed via llvm-dev" >> >> <llvm-dev at lists.llvm.org>: >> >> > further i modified the code to the following; >> >> > >> >> > #include <stdio.h> >> >> > #define N 100351 >> >> > >> >> > // This function computes 2D-5 point Jacobi stencil >> >> > void stencil(int a[restrict][N], int b[restrict][N]) >> >> > { >> >> > int i, j, k; >> >> > for (k = 0; k < N; k++) { >> >> > for (i = 1; i <= N-2; i++) >> >> > for (j = 1; j <= N-2; j++) >> >> > b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] >> >> > + >> >> > a[i][j+1]); >> >> > >> >> > for (i = 1; i <= N-2; i++) >> >> > for (j = 1; j <= N-2; j++) >> >> > a[i][j] = b[i][j]; >> >> > >> >> > } >> >> > } >> >> > >> >> > but still no vectorization in IR. Also, when I set vector width >> >> > explicitly to 64, it gives the following error: >> >> > >> >> > remark: <unknown>:0:0: loop not vectorized: call instruction cannot >> >> > be >> >> > vectorized >> >> > remark: <unknown>:0:0: loop not vectorized: value that could not be >> >> > identified as reduction is used outside the loop >> >> > >> >> > I need serious help on this. Please guide me. >> >> > >> >> > On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> >> >> > wrote: >> >> >> Does it happen due to loop carried dependence? if yes what is the >> >> >> solution to vectorize such codes? >> >> >> >> >> >> please reply. i m waiting. >> >> >> >> >> >> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> >> >> >> wrote: >> >> >>> I even tried polly but still my llvm IR does not contain vector >> >> >>> instructions. i used the following command; >> >> >>> >> >> >>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm >> >> >>> -polly-vectorizer=stripmine -o stencil_poly.ll >> >> >>> >> >> >>> Please specify what is wrong with my code? >> >> >>> >> >> >>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed >> >> >>> <hahmed2305 at gmail.com> >> >> >>> wrote: >> >> >>>> Hello, >> >> >>>> >> >> >>>> I am trying to vectorize following stencil code; >> >> >>>> >> >> >>>> #include <stdio.h> >> >> >>>> #define N 100351 >> >> >>>> >> >> >>>> // This function computes 2D-5 point Jacobi stencil >> >> >>>> void stencil(int a[restrict][N]) >> >> >>>> { >> >> >>>> int i, j, k; >> >> >>>> for (k = 0; k < 100; k++) >> >> >>>> { for (i = 1; i <= N-2; i++) >> >> >>>> { for (j = 1; j <= N-2; j++) >> >> >>>> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> >> >>>> a[i][j-1] + a[i][j+1]); >> >> >>>> } >> >> >>>> } >> >> >>>> }} >> >> >>>> >> >> >>>> I have used the following commands >> >> >>>> >> >> >>>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm >> >> >>>> -disable-llvm-optzns -o stencil.ll >> >> >>>> >> >> >>>> opt -S -O3 stencil.ll -o stencil_o3.ll >> >> >>>> >> >> >>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s >> >> >>>> >> >> >>>> But the code is not vectorized. It still uses the scalar >> >> >>>> instructions; >> >> >>>> >> >> >>>> Please correct me. >> >> >>>> >> >> >>>> Thank You >> >> > , >> >> > >> >> > _______________________________________________ >> >> > LLVM Developers mailing list >> >> > llvm-dev at lists.llvm.org >> >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> > >> > >> > >> > _______________________________________________ >> > LLVM Developers mailing list >> > llvm-dev at lists.llvm.org >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> > > >