hameeza ahmed via llvm-dev
2017-Jun-27 19:54 UTC
[llvm-dev] LLVM Matrix Multiplication Loop Vectorizer
Hello,

I am trying to vectorize a simple matrix multiplication in LLVM. Here is my code:

    #include <stdio.h>
    #define N 1000

    // This function multiplies A[][] and B[][], and stores
    // the result in C[][]
    void multiply(int A[][N], int B[][N], int C[][N])
    {
        int i, j, k;
        for (i = 0; i < N; i++)
        {
            for (j = 0; j < N; j++)
            {
                C[i][j] = 0;
                for (k = 0; k < N; k++)
                    C[i][j] += A[i][k]*B[k][j];
            }
        }
    }

Here are the commands:

    clang -S -emit-llvm mat.c -march=knl -O3 -mllvm -disable-llvm-optzns -o mat.ll
    opt -S -O3 mat.ll -o mat_o3.ll
    llc -x86-asm-syntax=intel mat_o3.ll -o mat_intel.s

With this command I got the remark below:

    opt -S -O3 -force-vector-width=16 mat.ll -o mat_o3.ll

    remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop

The matrix multiplication is not vectorized, and I see only scalar instructions in the .ll and .s files.

Why is that so? What is my mistake? Kindly correct me.

Looking forward to your reply.

Thank you
Serge Preis via llvm-dev
2017-Jun-28 04:36 UTC
[llvm-dev] LLVM Matrix Multiplication Loop Vectorizer
Hello,

The message basically means that LLVM failed to recognize C[i][j] as a valid reduction. For C[i][j] to be a valid reduction, it has to be privatized into a scalar value for the innermost loop. In your case, alias analysis fails to prove that C[i][j] never aliases A or B, and this seems correct, because nothing prevents a caller from passing overlapping arrays. So you need something like this to make the loop vectorizable.

Be explicit:

    #include <stdio.h>
    #define N 1000

    // This function multiplies A[][] and B[][], and stores
    // the result in C[][]
    void multiply(int A[][N], int B[][N], int C[][N])
    {
        int i, j, k;
        for (i = 0; i < N; i++)
        {
            for (j = 0; j < N; j++)
            {
                int res = 0;
                for (k = 0; k < N; k++)
                    res += A[i][k]*B[k][j];
                C[i][j] = res;
            }
        }
    }

or just add restrict to the arguments:

    // This function multiplies A[][] and B[][], and stores
    // the result in C[][]
    void multiply(int A[restrict][N], int B[restrict][N], int C[restrict][N])
    {
        int i, j, k;
        for (i = 0; i < N; i++)
        {
            for (j = 0; j < N; j++)
            {
                C[i][j] = 0;
                for (k = 0; k < N; k++)
                    C[i][j] += A[i][k]*B[k][j];
            }
        }
    }

On the practical side, though, the following loop reordering should give much better performance when vectorized, because your version has a gather operation (strided load) from B plus a costly reduction for every iteration of the j-loop.

    #include <stdio.h>
    #define N 1000

    // This function multiplies A[][] and B[][], and stores
    // the result in C[][]
    void multiply(int A[][N], int B[][N], int C[][N])
    {
        int i, j, k;
        for (i = 0; i < N; i++)
        {
            for (j = 0; j < N; j++)
                C[i][j] = 0;
            for (k = 0; k < N; k++)
            {
                for (j = 0; j < N; j++)
                {
                    C[i][j] += A[i][k]*B[k][j];
                }
            }
        }
    }
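To make the aliasing point concrete, here is a minimal sketch (not code from the thread) showing that the in-memory form and the privatized form really can give different answers when the output array is also an input. The function names, the helper `init`, and the small `N` are assumptions chosen for illustration. With B set to the identity matrix, calling the in-memory form as `multiply(X, I, X)` zeroes each element before reading it back, while the privatized form reads the element first and leaves X unchanged.

```c
#include <string.h>

#define N 4 /* small size for the demo; the thread uses 1000 */

/* Form from the original post: C[i][j] is re-read and re-written
   in memory on every k step. */
void multiply_in_memory(int A[][N], int B[][N], int C[][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}

/* Privatized form: the running sum lives in a local variable and is
   stored to C once, after the k loop. */
void multiply_privatized(int A[][N], int B[][N], int C[][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int res = 0;
            for (int k = 0; k < N; k++)
                res += A[i][k] * B[k][j];
            C[i][j] = res;
        }
}

/* Hypothetical helper: fill X with nonzero values and I with the identity. */
static void init(int X[][N], int I[][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            X[i][j] = i * N + j + 1;
            I[i][j] = (i == j);
        }
}

/* Returns 1 if the two forms disagree when the output array is also
   passed as the A input. */
int forms_differ_when_aliased(void)
{
    int X1[N][N], X2[N][N], I[N][N];
    init(X1, I);
    init(X2, I);
    multiply_in_memory(X1, I, X1);  /* zeroes X1[i][j] before reading it */
    multiply_privatized(X2, I, X2); /* reads X2[i][j] before storing it */
    return memcmp(X1, X2, sizeof X1) != 0;
}

/* Returns 1 if the two forms agree when all three arrays are distinct. */
int forms_agree_when_distinct(void)
{
    int A[N][N], I[N][N], C1[N][N], C2[N][N];
    init(A, I);
    multiply_in_memory(A, I, C1);
    multiply_privatized(A, I, C2);
    return memcmp(C1, C2, sizeof C1) == 0;
}
```

Because the two forms are only equivalent when C does not overlap the inputs, a compiler cannot silently turn the first into the second. Writing the scalar accumulator by hand, or adding restrict, is what supplies that guarantee.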
Tobias Grosser via llvm-dev
2017-Jun-28 06:11 UTC
[llvm-dev] LLVM Matrix Multiplication Loop Vectorizer
On Tue, Jun 27, 2017, at 09:54 PM, hameeza ahmed via llvm-dev wrote:
> [...]
> it is unable to vectorize the matrix multiplication and in .ll and .s
> files i see the scalar instructions.
>
> Why is that so? What is my mistake?? Kindly correct me.

You might also try Polly. We detect and optimize this code into very high-performance code.

Best,
Tobias
Hal Finkel via llvm-dev
2017-Jun-28 12:20 UTC
[llvm-dev] LLVM Matrix Multiplication Loop Vectorizer
On 06/27/2017 11:36 PM, Serge Preis wrote:
> [...]
> or just add restrict to arguments:
>
> void multiply(int A[restrict][N], int B[restrict][N], int C[restrict][N])
> [...]

I'd advised Hameeza to file a bug report for this. We should be able to vectorize this without the restrict by emitting runtime checks.

 -Hal

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
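The idea of versioning a loop behind runtime checks can be imitated by hand at the source level. The sketch below is only an analogy under stated assumptions, not a description of how LLVM implements its checks: the names `ranges_overlap`, `multiply_noalias`, `multiply_may_alias`, and `multiply_checked` are hypothetical, `N` is kept small, and the overlap test simply compares the whole arrays as byte ranges. If C overlaps neither A nor B, the restrict-qualified version is called; otherwise the conservative version runs.

```c
#include <stddef.h>
#include <stdint.h>

#define N 8 /* small size for the sketch; the thread uses 1000 */

/* Nonzero if the byte ranges [p, p+len) and [q, q+len) overlap. */
static int ranges_overlap(const void *p, const void *q, size_t len)
{
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)q;
    return a < b + len && b < a + len;
}

/* Fast version: restrict tells the compiler that the object written
   through C is not accessed through A or B. */
static void multiply_noalias(int A[restrict][N], int B[restrict][N],
                             int C[restrict][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}

/* Conservative version: same loop nest, no aliasing promise. */
static void multiply_may_alias(int A[][N], int B[][N], int C[][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}

/* Dispatches on a runtime overlap check.
   Returns 1 if the restrict-qualified path was taken, 0 otherwise. */
int multiply_checked(int A[][N], int B[][N], int C[][N])
{
    size_t bytes = sizeof(int[N][N]);
    if (!ranges_overlap(C, A, bytes) && !ranges_overlap(C, B, bytes)) {
        multiply_noalias(A, B, C);
        return 1;
    }
    multiply_may_alias(A, B, C);
    return 0;
}
```

The check runs once per call and costs a few integer comparisons, which is negligible next to the N*N*N multiply-adds it guards. Only C is tested against the inputs, since A and B are read-only and may safely refer to the same array.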