Serge Preis via llvm-dev
2017-Oct-23 06:37 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
<div> </div><div> </div><div>Hello,</div><div> </div><div>To me this is an issue in llvm loop vectorizer (if N is large enough to prevent complete unrolling of j-loop).</div><div> </div><div>Woud you mind to share stencil.ll than I would say more definitely what the issue is.</div><div> </div><div>Regards,</div><div>Serge.</div><div> </div><div> </div><div>22.10.2017, 03:21, "hameeza ahmed" <hahmed2305@gmail.com>:</div><blockquote type="cite"><div>Hello,<div> </div><div>The following stencil function that you wrote few months ago (for computing 2D-5 point Jacobi stencil) is vectorizable.</div><div> </div><div><div>void stencil(int a[][N], int b[N])</div><div>{</div><div> int i, j, k;</div><div> for (k = 0; k < N; k++) {</div><div> for (i = 1; i <= N-2; i++) {</div><div> for (j = 1; j <= N-2; j++)</div><div> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);</div><div> for (j = 1; j <= N-2; j++)</div><div> a[i][j] = b[j];</div><div> }</div><div> }</div><div>}</div></div><div> </div><div>but when i write the above code in main i am getting error </div><div> </div><div><div>opt -S -O3 stencil.ll -o stencil_o3.ll</div><div>remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop</div></div><div> </div><div>Why is that so?</div><div> </div><div>my code is follows:</div><div>int a[N][N], b[N];</div><div><div>int main(void)</div><div>{</div><div> int i, j, k;</div><div> for (k = 0; k < N; k++) {</div><div> for (i = 1; i <= N-2; i++) {</div><div> for (j = 1; j <= N-2; j++)</div><div> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);</div><div> for (j = 1; j <= N-2; j++)</div><div> a[i][j] = b[j];</div><div> }</div><div> }</div><div>}</div></div><div> </div><div> </div><div>Please help.</div><div> </div><div>Thank You.</div><div> </div></div><div> <div>On Mon, Jul 3, 2017 at 10:19 AM, Serge Preis <span><<a target="_blank" href="mailto:spreis@yandex-team.ru">spreis@yandex-team.ru</a>></span> wrote:<blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex;">Hello,<br /><br />This is not equivalent rewrite: your original code definitely shouldn't vectorize because it has backward cross-iteration (loop-carried) dependency: your value on iteration j+1 depend on value from iteration j you've just written. In case of vectorization you need to do load-operation-store on multiple consecutive values at once and it is impossible in this case.<br /><br />In your new code all values of a[] in right-hand side of the main loop are from previous k iteration (because on current k iteration you're writing to 'b'). So there is no way to vectorize loop in its original form, but new form is definitely vectorizable.<br /><br />I am second to recommend you filing a bug over 'restrict' behavior. And you may in fact save some memory by making 'b' 1D array (this is not equivalent rewrite once again though)<br /><br /><span>// This function computes 2D-5 point Jacobi stencil</span><br />void stencil(int a[][N], int b[N])<br /><span>{<br /> int i, j, k;<br /> for (k = 0; k < N; k++) {<br /> for (i = 1; i <= N-2; i++) {<br /> for (j = 1; j <= N-2; j++)</span><br /> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);<br /><span> for (j = 1; j <= N-2; j++)</span><br /> a[i][j] = b[j];<br /> }<br /> }<br />}<br /><br />There is a way to vectorize your code even more efficient. This will look like more low-level, but probably closest to what you would expect from vectorization.<br /><br />void stencil(int a[][N])<br /><span>{<br /> int i, j, k;<br /> for (k = 0; k < 100; k++) { <br /> for (i = 1; i <= N-2; i++) {</span><br /> for (j = 1; j <= N-2; j+=16) {<br /> int b[16]; // This should become a register on KNL<br /> for (v = 0; v < 16; v++) {<br /> b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + a[i][j-1+v] + a[i][j+1+v]);<br /> }<br /> for (v = 0; v < 16; v++) { // This will be a single store operation<br /> a[i][j+v] = b[v];<br /> }<br /> }<br /> // You should explicitly take care about the tail of j-loop<br />#if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED // This is not an actual name, just a designator<br /> for (;j <= N-2; j++) {<br /><span> a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);<br /> }</span><br />#else<br /> for (v = 0; v < 16-(N-2-j); v++) { // This would become masked non-loop<br /> b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + a[i][j-1+v] + a[i][j+1+v]);<br /> }<br /> for (v = 0; v < 16-(N-2-j); v++) { // This will be a single masked store operation<br /> a[i][j+v] = b[v];<br /> }<br />#endif<br /> }<br />}<br /><br />Unfortunalely compiler cannot do this for you: this is not equivalent transformation of original code. I am also not aware of any way to express this desired behavior less explicitly (e.g. OpenMP SIMD pragma won't work in this case).<br /><br />Minor note: You're using 'int' for data, than multiply by 0.25 (divide by 4) and than write it back to 'int'. This will cost you 2 conversion to/from double while you may just place (...) / 4 which should be optimized to simple sequecnce with shifts (not to single shift due to signedness, but still better than conversions with changes of element size 4->8->4 and data size INT->FP->INT).<br /><br />And by the way why do you divide by 4, not by 5 as number of points suggest?<br /><br />Serge Preis<br /><br /><br />02.07.2017, 05:11, "hameeza ahmed via llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>>:<div><div>> further i modified the code to the following;<br />><br />> #include <stdio.h><br />> #define N 100351<br />><br />> // This function computes 2D-5 point Jacobi stencil<br />> void stencil(int a[restrict][N], int b[restrict][N])<br />> {<br />> int i, j, k;<br />> for (k = 0; k < N; k++) {<br />> for (i = 1; i <= N-2; i++)<br />> for (j = 1; j <= N-2; j++)<br />> b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);<br />><br />> for (i = 1; i <= N-2; i++)<br />> for (j = 1; j <= N-2; j++)<br />> a[i][j] = b[i][j];<br />><br />> }<br />> }<br />><br />> but still no vectorization in IR. Also, when I set vector width explicitly to 64, it gives the following error:<br />><br />> remark: <unknown>:0:0: loop not vectorized: call instruction cannot be vectorized<br />> remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop<br />><br />> I need serious help on this. Please guide me.<br />><br />> On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <<a href="mailto:hahmed2305@gmail.com">hahmed2305@gmail.com</a>> wrote:<br />>> Does it happen due to loop carried dependence? if yes what is the solution to vectorize such codes?<br />>><br />>> please reply. i m waiting.<br />>><br />>> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <<a href="mailto:hahmed2305@gmail.com">hahmed2305@gmail.com</a>> wrote:<br />>>> I even tried polly but still my llvm IR does not contain vector instructions. i used the following command;<br />>>><br />>>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm -polly-vectorizer=stripmine -o stencil_poly.ll<br />>>><br />>>> Please specify what is wrong with my code?<br />>>><br />>>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <<a href="mailto:hahmed2305@gmail.com">hahmed2305@gmail.com</a>> wrote:<br />>>>> Hello,<br />>>>><br />>>>> I am trying to vectorize following stencil code;<br />>>>><br />>>>> #include <stdio.h><br />>>>> #define N 100351<br />>>>><br />>>>> // This function computes 2D-5 point Jacobi stencil<br />>>>> void stencil(int a[restrict][N])<br />>>>> {<br />>>>> int i, j, k;<br />>>>> for (k = 0; k < 100; k++)<br />>>>> { for (i = 1; i <= N-2; i++)<br />>>>> { for (j = 1; j <= N-2; j++)<br />>>>> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);<br />>>>> }<br />>>>> }<br />>>>> }}<br />>>>><br />>>>> I have used the following commands<br />>>>><br />>>>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -disable-llvm-optzns -o stencil.ll<br />>>>><br />>>>> opt -S -O3 stencil.ll -o stencil_o3.ll<br />>>>><br />>>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s<br />>>>><br />>>>> But the code is not vectorized. It still uses the scalar instructions;<br />>>>><br />>>>> Please correct me.<br />>>>><br />>>>> Thank You</div></div>> ,<br />><br />> _______________________________________________<br />> LLVM Developers mailing list<br />> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br />> <a target="_blank" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></blockquote></div></div></blockquote>
hameeza ahmed via llvm-dev
2017-Oct-23 06:53 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
#include <stdio.h> #define N 66 // This function computes 2D-5 point Jacobi stencil int a[N][N] b[N]; int main(void) { int i, j, k; for (k = 0; k < N; k++) { for (i = 1; i <= N-2; i++) { for (j = 1; j <= N-2; j++) b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]); for (j = 1; j <= N-2; j++) a[i][j] = b[j]; } } } On Mon, Oct 23, 2017 at 11:37 AM, Serge Preis <spreis at yandex-team.ru> wrote:> > > Hello, > > To me this is an issue in llvm loop vectorizer (if N is large enough to > prevent complete unrolling of j-loop). > > Woud you mind to share stencil.ll than I would say more definitely what > the issue is. > > Regards, > Serge. > > > 22.10.2017, 03:21, "hameeza ahmed" <hahmed2305 at gmail.com>: > > Hello, > > The following stencil function that you wrote few months ago (for > computing 2D-5 point Jacobi stencil) is vectorizable. > > void stencil(int a[][N], int b[N]) > { > int i, j, k; > for (k = 0; k < N; k++) { > for (i = 1; i <= N-2; i++) { > for (j = 1; j <= N-2; j++) > b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > a[i][j-1] + a[i][j+1]); > for (j = 1; j <= N-2; j++) > a[i][j] = b[j]; > } > } > } > > but when i write the above code in main i am getting error > > opt -S -O3 stencil.ll -o stencil_o3.ll > remark: <unknown>:0:0: loop not vectorized: value that could not be > identified as reduction is used outside the loop > > Why is that so? > > my code is follows: > int a[N][N], b[N]; > int main(void) > { > int i, j, k; > for (k = 0; k < N; k++) { > for (i = 1; i <= N-2; i++) { > for (j = 1; j <= N-2; j++) > b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > a[i][j-1] + a[i][j+1]); > for (j = 1; j <= N-2; j++) > a[i][j] = b[j]; > } > } > } > > > Please help. > > Thank You. > > > On Mon, Jul 3, 2017 at 10:19 AM, Serge Preis <spreis at yandex-team.ru> > wrote: > > Hello, > > This is not equivalent rewrite: your original code definitely shouldn't > vectorize because it has backward cross-iteration (loop-carried) > dependency: your value on iteration j+1 depend on value from iteration j > you've just written. In case of vectorization you need to do > load-operation-store on multiple consecutive values at once and it is > impossible in this case. > > In your new code all values of a[] in right-hand side of the main loop are > from previous k iteration (because on current k iteration you're writing to > 'b'). So there is no way to vectorize loop in its original form, but new > form is definitely vectorizable. > > I am second to recommend you filing a bug over 'restrict' behavior. And > you may in fact save some memory by making 'b' 1D array (this is not > equivalent rewrite once again though) > > // This function computes 2D-5 point Jacobi stencil > void stencil(int a[][N], int b[N]) > { > int i, j, k; > for (k = 0; k < N; k++) { > for (i = 1; i <= N-2; i++) { > for (j = 1; j <= N-2; j++) > b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > a[i][j-1] + a[i][j+1]); > for (j = 1; j <= N-2; j++) > a[i][j] = b[j]; > } > } > } > > There is a way to vectorize your code even more efficient. This will look > like more low-level, but probably closest to what you would expect from > vectorization. > > void stencil(int a[][N]) > { > int i, j, k; > for (k = 0; k < 100; k++) { > for (i = 1; i <= N-2; i++) { > for (j = 1; j <= N-2; j+=16) { > int b[16]; // This should become a register on KNL > for (v = 0; v < 16; v++) { > b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + > a[i][j-1+v] + a[i][j+1+v]); > } > for (v = 0; v < 16; v++) { // This will be a single store > operation > a[i][j+v] = b[v]; > } > } > // You should explicitly take care about the tail of j-loop > #if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED // This is not an > actual name, just a designator > for (;j <= N-2; j++) { > a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > a[i][j-1] + a[i][j+1]); > } > #else > for (v = 0; v < 16-(N-2-j); v++) { // This would become masked > non-loop > b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + > a[i][j-1+v] + a[i][j+1+v]); > } > for (v = 0; v < 16-(N-2-j); v++) { // This will be a single > masked store operation > a[i][j+v] = b[v]; > } > #endif > } > } > > Unfortunalely compiler cannot do this for you: this is not equivalent > transformation of original code. I am also not aware of any way to express > this desired behavior less explicitly (e.g. OpenMP SIMD pragma won't work > in this case). > > Minor note: You're using 'int' for data, than multiply by 0.25 (divide by > 4) and than write it back to 'int'. This will cost you 2 conversion to/from > double while you may just place (...) / 4 which should be optimized to > simple sequecnce with shifts (not to single shift due to signedness, but > still better than conversions with changes of element size 4->8->4 and data > size INT->FP->INT). > > And by the way why do you divide by 4, not by 5 as number of points > suggest? > > Serge Preis > > > 02.07.2017, 05:11, "hameeza ahmed via llvm-dev" <llvm-dev at lists.llvm.org>: > > further i modified the code to the following; > > > > #include <stdio.h> > > #define N 100351 > > > > // This function computes 2D-5 point Jacobi stencil > > void stencil(int a[restrict][N], int b[restrict][N]) > > { > > int i, j, k; > > for (k = 0; k < N; k++) { > > for (i = 1; i <= N-2; i++) > > for (j = 1; j <= N-2; j++) > > b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + > a[i][j+1]); > > > > for (i = 1; i <= N-2; i++) > > for (j = 1; j <= N-2; j++) > > a[i][j] = b[i][j]; > > > > } > > } > > > > but still no vectorization in IR. Also, when I set vector width > explicitly to 64, it gives the following error: > > > > remark: <unknown>:0:0: loop not vectorized: call instruction cannot be > vectorized > > remark: <unknown>:0:0: loop not vectorized: value that could not be > identified as reduction is used outside the loop > > > > I need serious help on this. Please guide me. > > > > On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> > wrote: > >> Does it happen due to loop carried dependence? if yes what is the > solution to vectorize such codes? > >> > >> please reply. i m waiting. > >> > >> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote: > >>> I even tried polly but still my llvm IR does not contain vector > instructions. i used the following command; > >>> > >>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm > -polly-vectorizer=stripmine -o stencil_poly.ll > >>> > >>> Please specify what is wrong with my code? > >>> > >>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <hahmed2305 at gmail.com> > wrote: > >>>> Hello, > >>>> > >>>> I am trying to vectorize following stencil code; > >>>> > >>>> #include <stdio.h> > >>>> #define N 100351 > >>>> > >>>> // This function computes 2D-5 point Jacobi stencil > >>>> void stencil(int a[restrict][N]) > >>>> { > >>>> int i, j, k; > >>>> for (k = 0; k < 100; k++) > >>>> { for (i = 1; i <= N-2; i++) > >>>> { for (j = 1; j <= N-2; j++) > >>>> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > a[i][j-1] + a[i][j+1]); > >>>> } > >>>> } > >>>> }} > >>>> > >>>> I have used the following commands > >>>> > >>>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm > -disable-llvm-optzns -o stencil.ll > >>>> > >>>> opt -S -O3 stencil.ll -o stencil_o3.ll > >>>> > >>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s > >>>> > >>>> But the code is not vectorized. It still uses the scalar instructions; > >>>> > >>>> Please correct me. > >>>> > >>>> Thank You > > , > > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171023/ccfc5452/attachment.html>
hameeza ahmed via llvm-dev
2017-Oct-23 06:58 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
stencil.ll is attached here. On Mon, Oct 23, 2017 at 11:37 AM, Serge Preis <spreis at yandex-team.ru> wrote:> > > Hello, > > To me this is an issue in llvm loop vectorizer (if N is large enough to > prevent complete unrolling of j-loop). > > Woud you mind to share stencil.ll than I would say more definitely what > the issue is. > > Regards, > Serge. > > > 22.10.2017, 03:21, "hameeza ahmed" <hahmed2305 at gmail.com>: > > Hello, > > The following stencil function that you wrote few months ago (for > computing 2D-5 point Jacobi stencil) is vectorizable. > > void stencil(int a[][N], int b[N]) > { > int i, j, k; > for (k = 0; k < N; k++) { > for (i = 1; i <= N-2; i++) { > for (j = 1; j <= N-2; j++) > b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > a[i][j-1] + a[i][j+1]); > for (j = 1; j <= N-2; j++) > a[i][j] = b[j]; > } > } > } > > but when i write the above code in main i am getting error > > opt -S -O3 stencil.ll -o stencil_o3.ll > remark: <unknown>:0:0: loop not vectorized: value that could not be > identified as reduction is used outside the loop > > Why is that so? > > my code is follows: > int a[N][N], b[N]; > int main(void) > { > int i, j, k; > for (k = 0; k < N; k++) { > for (i = 1; i <= N-2; i++) { > for (j = 1; j <= N-2; j++) > b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > a[i][j-1] + a[i][j+1]); > for (j = 1; j <= N-2; j++) > a[i][j] = b[j]; > } > } > } > > > Please help. > > Thank You. > > > On Mon, Jul 3, 2017 at 10:19 AM, Serge Preis <spreis at yandex-team.ru> > wrote: > > Hello, > > This is not equivalent rewrite: your original code definitely shouldn't > vectorize because it has backward cross-iteration (loop-carried) > dependency: your value on iteration j+1 depend on value from iteration j > you've just written. In case of vectorization you need to do > load-operation-store on multiple consecutive values at once and it is > impossible in this case. > > In your new code all values of a[] in right-hand side of the main loop are > from previous k iteration (because on current k iteration you're writing to > 'b'). So there is no way to vectorize loop in its original form, but new > form is definitely vectorizable. > > I am second to recommend you filing a bug over 'restrict' behavior. And > you may in fact save some memory by making 'b' 1D array (this is not > equivalent rewrite once again though) > > // This function computes 2D-5 point Jacobi stencil > void stencil(int a[][N], int b[N]) > { > int i, j, k; > for (k = 0; k < N; k++) { > for (i = 1; i <= N-2; i++) { > for (j = 1; j <= N-2; j++) > b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > a[i][j-1] + a[i][j+1]); > for (j = 1; j <= N-2; j++) > a[i][j] = b[j]; > } > } > } > > There is a way to vectorize your code even more efficient. This will look > like more low-level, but probably closest to what you would expect from > vectorization. > > void stencil(int a[][N]) > { > int i, j, k; > for (k = 0; k < 100; k++) { > for (i = 1; i <= N-2; i++) { > for (j = 1; j <= N-2; j+=16) { > int b[16]; // This should become a register on KNL > for (v = 0; v < 16; v++) { > b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + > a[i][j-1+v] + a[i][j+1+v]); > } > for (v = 0; v < 16; v++) { // This will be a single store > operation > a[i][j+v] = b[v]; > } > } > // You should explicitly take care about the tail of j-loop > #if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED // This is not an > actual name, just a designator > for (;j <= N-2; j++) { > a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > a[i][j-1] + a[i][j+1]); > } > #else > for (v = 0; v < 16-(N-2-j); v++) { // This would become masked > non-loop > b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + > a[i][j-1+v] + a[i][j+1+v]); > } > for (v = 0; v < 16-(N-2-j); v++) { // This will be a single > masked store operation > a[i][j+v] = b[v]; > } > #endif > } > } > > Unfortunalely compiler cannot do this for you: this is not equivalent > transformation of original code. I am also not aware of any way to express > this desired behavior less explicitly (e.g. OpenMP SIMD pragma won't work > in this case). > > Minor note: You're using 'int' for data, than multiply by 0.25 (divide by > 4) and than write it back to 'int'. This will cost you 2 conversion to/from > double while you may just place (...) / 4 which should be optimized to > simple sequecnce with shifts (not to single shift due to signedness, but > still better than conversions with changes of element size 4->8->4 and data > size INT->FP->INT). > > And by the way why do you divide by 4, not by 5 as number of points > suggest? > > Serge Preis > > > 02.07.2017, 05:11, "hameeza ahmed via llvm-dev" <llvm-dev at lists.llvm.org>: > > further i modified the code to the following; > > > > #include <stdio.h> > > #define N 100351 > > > > // This function computes 2D-5 point Jacobi stencil > > void stencil(int a[restrict][N], int b[restrict][N]) > > { > > int i, j, k; > > for (k = 0; k < N; k++) { > > for (i = 1; i <= N-2; i++) > > for (j = 1; j <= N-2; j++) > > b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + > a[i][j+1]); > > > > for (i = 1; i <= N-2; i++) > > for (j = 1; j <= N-2; j++) > > a[i][j] = b[i][j]; > > > > } > > } > > > > but still no vectorization in IR. Also, when I set vector width > explicitly to 64, it gives the following error: > > > > remark: <unknown>:0:0: loop not vectorized: call instruction cannot be > vectorized > > remark: <unknown>:0:0: loop not vectorized: value that could not be > identified as reduction is used outside the loop > > > > I need serious help on this. Please guide me. > > > > On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> > wrote: > >> Does it happen due to loop carried dependence? if yes what is the > solution to vectorize such codes? > >> > >> please reply. i m waiting. > >> > >> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote: > >>> I even tried polly but still my llvm IR does not contain vector > instructions. i used the following command; > >>> > >>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm > -polly-vectorizer=stripmine -o stencil_poly.ll > >>> > >>> Please specify what is wrong with my code? > >>> > >>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <hahmed2305 at gmail.com> > wrote: > >>>> Hello, > >>>> > >>>> I am trying to vectorize following stencil code; > >>>> > >>>> #include <stdio.h> > >>>> #define N 100351 > >>>> > >>>> // This function computes 2D-5 point Jacobi stencil > >>>> void stencil(int a[restrict][N]) > >>>> { > >>>> int i, j, k; > >>>> for (k = 0; k < 100; k++) > >>>> { for (i = 1; i <= N-2; i++) > >>>> { for (j = 1; j <= N-2; j++) > >>>> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + > a[i][j-1] + a[i][j+1]); > >>>> } > >>>> } > >>>> }} > >>>> > >>>> I have used the following commands > >>>> > >>>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm > -disable-llvm-optzns -o stencil.ll > >>>> > >>>> opt -S -O3 stencil.ll -o stencil_o3.ll > >>>> > >>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s > >>>> > >>>> But the code is not vectorized. It still uses the scalar instructions; > >>>> > >>>> Please correct me. > >>>> > >>>> Thank You > > , > > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171023/b2b7ee26/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: stencil.ll Type: application/octet-stream Size: 7283 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171023/b2b7ee26/attachment-0001.obj>
Michael Kruse via llvm-dev
2017-Oct-24 13:30 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
Your problem is due to GVN partial reduction elimination (PRE) which introduces a PHI node the current loop vectorizer cannot handle: opt -O3 stencil.ll -pass-remarks=loop-vectorize -pass-remarks-missed=loop-vectorize -pass-remarks-analysis=loop-vectorize remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop remark: <unknown>:0:0: loop not vectorized The message is not entirely accurate. The PHI is not used outside of the loop, but is in the loop header, therefore cannot be if-converted and is not a reduction. Polly also had difficulties with this construction as well. As presented at the dev meeting, VPlan will be able to insert a shuffle instruction when encountering this situation. Michael 2017-10-23 8:58 GMT+02:00 hameeza ahmed via llvm-dev <llvm-dev at lists.llvm.org>:> stencil.ll is attached here. > > On Mon, Oct 23, 2017 at 11:37 AM, Serge Preis <spreis at yandex-team.ru> wrote: >> >> >> >> Hello, >> >> To me this is an issue in llvm loop vectorizer (if N is large enough to >> prevent complete unrolling of j-loop). >> >> Woud you mind to share stencil.ll than I would say more definitely what >> the issue is. >> >> Regards, >> Serge. >> >> >> 22.10.2017, 03:21, "hameeza ahmed" <hahmed2305 at gmail.com>: >> >> Hello, >> >> The following stencil function that you wrote few months ago (for >> computing 2D-5 point Jacobi stencil) is vectorizable. >> >> void stencil(int a[][N], int b[N]) >> { >> int i, j, k; >> for (k = 0; k < N; k++) { >> for (i = 1; i <= N-2; i++) { >> for (j = 1; j <= N-2; j++) >> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> a[i][j-1] + a[i][j+1]); >> for (j = 1; j <= N-2; j++) >> a[i][j] = b[j]; >> } >> } >> } >> >> but when i write the above code in main i am getting error >> >> opt -S -O3 stencil.ll -o stencil_o3.ll >> remark: <unknown>:0:0: loop not vectorized: value that could not be >> identified as reduction is used outside the loop >> >> Why is that so? >> >> my code is follows: >> int a[N][N], b[N]; >> int main(void) >> { >> int i, j, k; >> for (k = 0; k < N; k++) { >> for (i = 1; i <= N-2; i++) { >> for (j = 1; j <= N-2; j++) >> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> a[i][j-1] + a[i][j+1]); >> for (j = 1; j <= N-2; j++) >> a[i][j] = b[j]; >> } >> } >> } >> >> >> Please help. >> >> Thank You. >> >> >> On Mon, Jul 3, 2017 at 10:19 AM, Serge Preis <spreis at yandex-team.ru> >> wrote: >> >> Hello, >> >> This is not equivalent rewrite: your original code definitely shouldn't >> vectorize because it has backward cross-iteration (loop-carried) dependency: >> your value on iteration j+1 depend on value from iteration j you've just >> written. In case of vectorization you need to do load-operation-store on >> multiple consecutive values at once and it is impossible in this case. >> >> In your new code all values of a[] in right-hand side of the main loop are >> from previous k iteration (because on current k iteration you're writing to >> 'b'). So there is no way to vectorize loop in its original form, but new >> form is definitely vectorizable. >> >> I am second to recommend you filing a bug over 'restrict' behavior. And >> you may in fact save some memory by making 'b' 1D array (this is not >> equivalent rewrite once again though) >> >> // This function computes 2D-5 point Jacobi stencil >> void stencil(int a[][N], int b[N]) >> { >> int i, j, k; >> for (k = 0; k < N; k++) { >> for (i = 1; i <= N-2; i++) { >> for (j = 1; j <= N-2; j++) >> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> a[i][j-1] + a[i][j+1]); >> for (j = 1; j <= N-2; j++) >> a[i][j] = b[j]; >> } >> } >> } >> >> There is a way to vectorize your code even more efficient. This will look >> like more low-level, but probably closest to what you would expect from >> vectorization. >> >> void stencil(int a[][N]) >> { >> int i, j, k; >> for (k = 0; k < 100; k++) { >> for (i = 1; i <= N-2; i++) { >> for (j = 1; j <= N-2; j+=16) { >> int b[16]; // This should become a register on KNL >> for (v = 0; v < 16; v++) { >> b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + >> a[i][j-1+v] + a[i][j+1+v]); >> } >> for (v = 0; v < 16; v++) { // This will be a single store >> operation >> a[i][j+v] = b[v]; >> } >> } >> // You should explicitly take care about the tail of j-loop >> #if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED // This is not an actual >> name, just a designator >> for (;j <= N-2; j++) { >> a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> a[i][j-1] + a[i][j+1]); >> } >> #else >> for (v = 0; v < 16-(N-2-j); v++) { // This would become masked >> non-loop >> b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + >> a[i][j-1+v] + a[i][j+1+v]); >> } >> for (v = 0; v < 16-(N-2-j); v++) { // This will be a single >> masked store operation >> a[i][j+v] = b[v]; >> } >> #endif >> } >> } >> >> Unfortunalely compiler cannot do this for you: this is not equivalent >> transformation of original code. I am also not aware of any way to express >> this desired behavior less explicitly (e.g. OpenMP SIMD pragma won't work in >> this case). >> >> Minor note: You're using 'int' for data, than multiply by 0.25 (divide by >> 4) and than write it back to 'int'. This will cost you 2 conversion to/from >> double while you may just place (...) / 4 which should be optimized to >> simple sequecnce with shifts (not to single shift due to signedness, but >> still better than conversions with changes of element size 4->8->4 and data >> size INT->FP->INT). >> >> And by the way why do you divide by 4, not by 5 as number of points >> suggest? >> >> Serge Preis >> >> >> 02.07.2017, 05:11, "hameeza ahmed via llvm-dev" <llvm-dev at lists.llvm.org>: >> > further i modified the code to the following; >> > >> > #include <stdio.h> >> > #define N 100351 >> > >> > // This function computes 2D-5 point Jacobi stencil >> > void stencil(int a[restrict][N], int b[restrict][N]) >> > { >> > int i, j, k; >> > for (k = 0; k < N; k++) { >> > for (i = 1; i <= N-2; i++) >> > for (j = 1; j <= N-2; j++) >> > b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + >> > a[i][j+1]); >> > >> > for (i = 1; i <= N-2; i++) >> > for (j = 1; j <= N-2; j++) >> > a[i][j] = b[i][j]; >> > >> > } >> > } >> > >> > but still no vectorization in IR. Also, when I set vector width >> > explicitly to 64, it gives the following error: >> > >> > remark: <unknown>:0:0: loop not vectorized: call instruction cannot be >> > vectorized >> > remark: <unknown>:0:0: loop not vectorized: value that could not be >> > identified as reduction is used outside the loop >> > >> > I need serious help on this. Please guide me. >> > >> > On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com> >> > wrote: >> >> Does it happen due to loop carried dependence? if yes what is the >> >> solution to vectorize such codes? >> >> >> >> please reply. i m waiting. >> >> >> >> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote: >> >>> I even tried polly but still my llvm IR does not contain vector >> >>> instructions. i used the following command; >> >>> >> >>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm >> >>> -polly-vectorizer=stripmine -o stencil_poly.ll >> >>> >> >>> Please specify what is wrong with my code? >> >>> >> >>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <hahmed2305 at gmail.com> >> >>> wrote: >> >>>> Hello, >> >>>> >> >>>> I am trying to vectorize following stencil code; >> >>>> >> >>>> #include <stdio.h> >> >>>> #define N 100351 >> >>>> >> >>>> // This function computes 2D-5 point Jacobi stencil >> >>>> void stencil(int a[restrict][N]) >> >>>> { >> >>>> int i, j, k; >> >>>> for (k = 0; k < 100; k++) >> >>>> { for (i = 1; i <= N-2; i++) >> >>>> { for (j = 1; j <= N-2; j++) >> >>>> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + >> >>>> a[i][j-1] + a[i][j+1]); >> >>>> } >> >>>> } >> >>>> }} >> >>>> >> >>>> I have used the following commands >> >>>> >> >>>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm >> >>>> -disable-llvm-optzns -o stencil.ll >> >>>> >> >>>> opt -S -O3 stencil.ll -o stencil_o3.ll >> >>>> >> >>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s >> >>>> >> >>>> But the code is not vectorized. It still uses the scalar >> >>>> instructions; >> >>>> >> >>>> Please correct me. >> >>>> >> >>>> Thank You >> > , >> > >> > _______________________________________________ >> > LLVM Developers mailing list >> > llvm-dev at lists.llvm.org >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >