2016 Jul 11
extra loads in nested for-loop
...ould reuse %1
This load from a[i][j] is repeated on every iteration, which seems quite
inefficient.
I changed the C code to load a[i][j] explicitly outside of the innermost
loop, and that (as would be expected) eliminates the extra load:
void f1( InArray c, InArray a, InArray b ) {
  int a_i_j;
  #pragma clang loop unroll_count(UNROLL_DIM)
  for(int i=0;i<DIM;i++){
    #pragma clang loop unroll_count(UNROLL_DIM)
    for(int j=0;j<DIM;j++) {
      a_i_j = a[i][j];
      #pragma clang loop unroll_count(UNROLL_DIM)
      for(int k=0;k<DIM;k++) {
        c[i][k] = c[i]...
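
For comparison, here is a minimal self-contained sketch of the same hoisting idea. The listing above is cut off, so several things here are assumptions rather than the original code: DIM is set to 4, InArray is taken to be a DIM x DIM array of int, the innermost statement is assumed to be a multiply-accumulate, and f0 and results_match are hypothetical helpers added for illustration. The unroll pragmas are left out to keep the sketch portable.

```c
#include <string.h>

#define DIM 4                     /* assumed size; not given in the post */
typedef int InArray[DIM][DIM];    /* assumed definition of InArray */

/* Baseline (hypothetical f0): a[i][j] is read inside the innermost loop. */
static void f0(InArray c, InArray a, InArray b) {
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + a[i][j] * b[j][k];  /* assumed body */
}

/* Hoisted: a[i][j] is read once per (i, j) pair into a local variable. */
static void f1(InArray c, InArray a, InArray b) {
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            int a_i_j = a[i][j];
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + a_i_j * b[j][k];    /* assumed body */
        }
}

/* Returns 1 if both versions produce the same c for distinct arrays. */
static int results_match(void) {
    InArray a, b, c0, c1;
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
            c0[i][j] = 0;
            c1[i][j] = 0;
        }
    f0(c0, a, b);
    f1(c1, a, b);
    return memcmp(c0, c1, sizeof c0) == 0;
}
```

The two versions agree only when c does not overlap a. One possible reason the compiler keeps the load inside the loop is that it cannot prove the store to c[i][k] leaves a[i][j] unchanged; copying the value into a local makes that assumption explicit in the source.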