hameeza ahmed via llvm-dev
2017-Jun-27 19:54 UTC
[llvm-dev] LLVM Matrix Multiplication Loop Vectorizer
Hello,

I am trying to vectorize a simple matrix multiplication in LLVM.
Here is my code:

#include <stdio.h>
#define N 1000

// This function multiplies A[][] and B[][], and stores
// the result in C[][]
void multiply(int A[][N], int B[][N], int C[][N])
{
    int i, j, k;
    for (i = 0; i < N; i++)
    {
        for (j = 0; j < N; j++)
        {
            C[i][j] = 0;
            for (k = 0; k < N; k++)
                C[i][j] += A[i][k]*B[k][j];
        }
    }
}

Here are the commands:

clang -S -emit-llvm mat.c -march=knl -O3 -mllvm -disable-llvm-optzns -o mat.ll
opt -S -O3 mat.ll -o mat_o3.ll
llc -x86-asm-syntax=intel mat_o3.ll -o mat_intel.s

With the following command I get the remark below:

opt -S -O3 -force-vector-width=16 mat.ll -o mat_o3.ll

remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop

The matrix multiplication is not vectorized, and the .ll and .s files contain only scalar instructions.

Why is that so? What is my mistake? Kindly correct me.

Looking forward to your reply.
Thank you
Serge Preis via llvm-dev
2017-Jun-28 04:36 UTC
[llvm-dev] LLVM Matrix Multiplication Loop Vectorizer
Hello,

The message basically means that LLVM failed to recognize C[i][j] as a valid reduction. For C[i][j] to be a valid reduction, it has to be privatized into a scalar value for the innermost loop. In your case alias analysis fails to prove that C[i][j] never aliases A or B, and this seems correct. So you need something like this to make the loop vectorizable.

Be explicit:

#include <stdio.h>
#define N 1000

// This function multiplies A[][] and B[][], and stores
// the result in C[][]
void multiply(int A[][N], int B[][N], int C[][N])
{
    int i, j, k;
    for (i = 0; i < N; i++)
    {
        for (j = 0; j < N; j++)
        {
            int res = 0;
            for (k = 0; k < N; k++)
                res += A[i][k]*B[k][j];
            C[i][j] = res;
        }
    }
}

Or just add restrict to the arguments:

// This function multiplies A[][] and B[][], and stores
// the result in C[][]
void multiply(int A[restrict][N], int B[restrict][N], int C[restrict][N])
{
    int i, j, k;
    for (i = 0; i < N; i++)
    {
        for (j = 0; j < N; j++)
        {
            C[i][j] = 0;
            for (k = 0; k < N; k++)
                C[i][j] += A[i][k]*B[k][j];
        }
    }
}

On the practical side, though, the following loop reordering should provide much better performance when vectorized, because in your case you have a gather operation (strided load) from B plus a costly reduce operation in the j-loop.

#include <stdio.h>
#define N 1000

// This function multiplies A[][] and B[][], and stores
// the result in C[][]
void multiply(int A[][N], int B[][N], int C[][N])
{
    int i, j, k;
    for (i = 0; i < N; i++)
    {
        for (j = 0; j < N; j++)
            C[i][j] = 0;
        for (k = 0; k < N; k++) {
            for (j = 0; j < N; j++)
            {
                C[i][j] += A[i][k]*B[k][j];
            }
        }
    }
}
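
As a quick sanity check on the rewrites above, the original loop, the scalar-accumulator form, and the interchanged form can be run on the same small input and compared. This is only a minimal sketch: N is reduced to 8 so the check is fast, and the names multiply_ref, multiply_scalar_acc, multiply_interchanged, and variants_agree are made up for illustration and do not appear in the thread.

```c
#include <assert.h>
#include <string.h>

#define N 8 /* assumption: reduced from the thread's 1000 to keep the check fast */

/* Original form: accumulates directly into C[i][j]. */
void multiply_ref(int A[][N], int B[][N], int C[][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
    }
}

/* Scalar accumulator: the reduction lives in a local, stored once per (i, j). */
void multiply_scalar_acc(int A[][N], int B[][N], int C[][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int res = 0;
            for (int k = 0; k < N; k++)
                res += A[i][k] * B[k][j];
            C[i][j] = res;
        }
    }
}

/* Interchanged k and j loops: the innermost loop walks B and C with unit stride. */
void multiply_interchanged(int A[][N], int B[][N], int C[][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            C[i][j] = 0;
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                C[i][j] += A[i][k] * B[k][j];
    }
}

/* Returns 1 if all three variants agree on the same non-overlapping inputs. */
int variants_agree(void)
{
    static int A[N][N], B[N][N], C0[N][N], C1[N][N], C2[N][N];

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            A[i][j] = i + 2 * j;
            B[i][j] = i - j;
        }
    }

    multiply_ref(A, B, C0);
    multiply_scalar_acc(A, B, C1);
    multiply_interchanged(A, B, C2);

    /* Hand-computed spot check: C[0][0] = sum over k of (2k) * k = 2 * 140 = 280. */
    assert(C0[0][0] == 280);

    return memcmp(C0, C1, sizeof C0) == 0 && memcmp(C0, C2, sizeof C0) == 0;
}
```

The variants only agree when C does not overlap A or B, which is the same condition the vectorizer is unable to prove in the original code.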
Tobias Grosser via llvm-dev
2017-Jun-28 06:11 UTC
[llvm-dev] LLVM Matrix Multiplication Loop Vectorizer
On Tue, Jun 27, 2017, at 09:54 PM, hameeza ahmed via llvm-dev wrote:
> [...]
> it is unable to vectorize the matrix multiplication and in .ll and .s
> files i see the scalar instructions.
>
> Why is that so? What is my mistake?? Kindly correct me.

You might also try Polly. We detect and optimize this code into very high-performance code.

Best,
Tobias
Hal Finkel via llvm-dev
2017-Jun-28 12:20 UTC
[llvm-dev] LLVM Matrix Multiplication Loop Vectorizer
On 06/27/2017 11:36 PM, Serge Preis wrote:
> [...]
> or just add restrict to arguments:
> [...]

I'd advised Hameeza to file a bug report for this. We should be able to vectorize this without the restrict by emitting runtime checks.

 -Hal

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
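
To make the idea of runtime checks concrete, here is a hand-written illustration of the general technique: test at run time whether C overlaps A or B, take a restrict-qualified path when it does not, and fall back to the plain loop otherwise. This is a sketch of the concept only, not what LLVM emits. The helper names ranges_overlap, multiply_noalias, multiply_may_alias, and multiply_checked are hypothetical, and N is reduced to 8 for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 8 /* assumption: small size for illustration; the thread uses 1000 */

/* Returns 1 if the byte ranges [p, p+len) and [q, q+len) overlap. */
static int ranges_overlap(const void *p, const void *q, size_t len)
{
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)q;
    return a < b + len && b < a + len;
}

/* Fast path: restrict promises the compiler that C does not overlap A or B. */
static void multiply_noalias(int A[restrict][N], int B[restrict][N],
                             int C[restrict][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
    }
}

/* Fallback: the same loop without the promise, valid for any arguments. */
static void multiply_may_alias(int A[][N], int B[][N], int C[][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
    }
}

/* Dispatches on a runtime overlap test; returns 1 if the no-alias path ran. */
int multiply_checked(int A[][N], int B[][N], int C[][N])
{
    size_t bytes = sizeof(int) * N * N;

    assert(A && B && C);
    if (ranges_overlap(C, A, bytes) || ranges_overlap(C, B, bytes)) {
        multiply_may_alias(A, B, C);
        return 0;
    }
    multiply_noalias(A, B, C);
    return 1;
}
```

The return value exists only so a caller can observe which path was taken. A compiler-generated check would be placed in front of the vectorized loop rather than at the function boundary.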