search for: 8ints

Displaying 4 results from an estimated 4 matches for "8ints".

2013 Sep 27
2
[LLVMdev] Trip count and Loop Vectorizer
...ng to get a small loop to *not vectorize* for cases where it doesn't make sense. For instance, this loop: void foo(int a[4][8], int n) { int b[4][8]; for(int i = 0; i < 4; i++) { for(int j = 0; j < n; j++) { a[i][j] = b[i][j]; } } } * Has maximum of 8ints copy. LLVM tries to use Memcpy for the inner loop. It is not helpful to perform memcpy for such small moves, especially when the outer loop is unrolled since the trip count is constant (4). The 4 calls to memcpy is not efficient. * Therefore, I disabled the memcpy optimization for such cases, and f...
2013 Sep 27
0
[LLVMdev] Trip count and Loop Vectorizer
...doesn’t make sense. For instance, this loop: > void foo(int a[4][8], int n) > { > int b[4][8]; > for(int i = 0; i < 4; i++) { > for(int j = 0; j < n; j++) { > a[i][j] = b[i][j]; > } > } > } > * Has maximum of 8ints copy. LLVM tries to use Memcpy for the inner loop. It is not helpful to perform memcpy for such small moves, especially when the outer loop is unrolled since the trip count is constant (4). The 4 calls to memcpy is not efficient. > * Therefore, I disabled the memcpy optimization for such cases,...
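For reference, the transformation the snippets describe can be sketched in C as follows. This is only an illustration of the idea, not LLVM's actual output: the function names `copy_loop` and `copy_memcpy` and the constants `ROWS` and `COLS` are hypothetical, `b` is passed in as a parameter rather than left as an uninitialized local, and `n` is assumed to satisfy 0 <= n <= 8 (the "maximum of 8ints" case).

```c
#include <string.h>

enum { ROWS = 4, COLS = 8 };  /* hypothetical names for the 4 and 8 in the post */

/* Element-by-element copy, shaped like the quoted loop.
   Assumes 0 <= n <= COLS; larger n would index out of bounds. */
void copy_loop(int a[ROWS][COLS], int b[ROWS][COLS], int n) {
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < n; j++) {
            a[i][j] = b[i][j];
        }
    }
}

/* Same result with each inner loop written as a single memcpy call.
   With the outer loop's constant trip count of 4, this is 4 calls,
   each moving at most COLS * sizeof(int) bytes. */
void copy_memcpy(int a[ROWS][COLS], int b[ROWS][COLS], int n) {
    for (int i = 0; i < ROWS; i++) {
        memcpy(a[i], b[i], (size_t)n * sizeof(int));
    }
}
```

Both functions leave `a` in the same state for any valid `n`; the concern raised in the thread is only about the cost of issuing four library calls for such small copies.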