search for: kernel_dia

Displaying 2 results from an estimated 2 matches for "kernel_dia".

2018 Jun 21
2
NVPTX - Reordering load instructions
...t i = 0; i < BLOCK_SIZE; i++) {
> for (int j = 0; j < i; j++)
> peri_col[idx][i] -= peri_col[idx][j] * dia[j][i];
> peri_col[idx][i] /= dia[i][i];
> }
NVCC emits PTX instructions where all loads from shared memory are packed together:
> ...
> ld.shared.f32 %f546, [kernel_dia+440];
> ld.shared.f32 %f545, [%r4+-996];
> ld.shared.f32 %f544, [kernel_dia+56];
> ld.shared.f32 %f543, [kernel_dia+88];
> ld.shared.f32 %f542, [kernel_dia+500];
> ld.shared.f32 %f541, [kernel_dia+84];
> ld.shared.f32 %f540, [%r4+-972];
> ld.shared.f32 %f539, [%r4...
2018 Jun 21
2
NVPTX - Reordering load instructions
...> >> peri_col[idx][i] -= peri_col[idx][j] * dia[j][i];
> >> peri_col[idx][i] /= dia[i][i];
> >> }
> > NVCC emits PTX instructions where all loads from shared memory are
> > packed together:
> >
> >> ...
> >> ld.shared.f32 %f546, [kernel_dia+440];
> >> ld.shared.f32 %f545, [%r4+-996];
> >> ld.shared.f32 %f544, [kernel_dia+56];
> >> ld.shared.f32 %f543, [kernel_dia+88];
> >> ld.shared.f32 %f542, [kernel_dia+500];
> >> ld.shared.f32 %f541, [kernel_dia+84];
> >> ld.shared.f32...