Displaying 2 results from an estimated 2 matches for "peri_col".

2018 Jun 21
2
NVPTX - Reordering load instructions
...ant difference due to PTX instruction ordering. The relevant source code consists of two nested loops that get fully unrolled, doing some basic arithmetic with values loaded from shared memory:

> #define BLOCK_SIZE 16
>
> __shared__ float dia[BLOCK_SIZE][BLOCK_SIZE];
> __shared__ float peri_col[BLOCK_SIZE][BLOCK_SIZE];
>
> int idx = threadIdx.x - BLOCK_SIZE;
> for (int i = 0; i < BLOCK_SIZE; i++) {
>   for (int j = 0; j < i; j++)
>     peri_col[idx][i] -= peri_col[idx][j] * dia[j][i];
>   peri_col[idx][i] /= dia[i][i];
> }

NVCC emits PTX instructions where all l...
2018 Jun 21
2
NVPTX - Reordering load instructions
...> source code consists of two nested loops that get fully unrolled, doing
> > some basic arithmetic with values loaded from shared memory:
> >
> >> #define BLOCK_SIZE 16
> >>
> >> __shared__ float dia[BLOCK_SIZE][BLOCK_SIZE];
> >> __shared__ float peri_col[BLOCK_SIZE][BLOCK_SIZE];
> >>
> >> int idx = threadIdx.x - BLOCK_SIZE;
> >> for (int i = 0; i < BLOCK_SIZE; i++) {
> >>   for (int j = 0; j < i; j++)
> >>     peri_col[idx][i] -= peri_col[idx][j] * dia[j][i];
> >>   peri_col[idx][i] /= dia[...