Displaying 2 results from an estimated 2 matches for "kernel_dia".
2018 Jun 21
2
NVPTX - Reordering load instructions
...t i = 0; i < BLOCK_SIZE; i++) {
> for (int j = 0; j < i; j++)
> peri_col[idx][i] -= peri_col[idx][j] * dia[j][i];
> peri_col[idx][i] /= dia[i][i];
> }
NVCC emits PTX instructions where all loads from shared memory are
packed together:
> ...
> ld.shared.f32 %f546, [kernel_dia+440];
> ld.shared.f32 %f545, [%r4+-996];
> ld.shared.f32 %f544, [kernel_dia+56];
> ld.shared.f32 %f543, [kernel_dia+88];
> ld.shared.f32 %f542, [kernel_dia+500];
> ld.shared.f32 %f541, [kernel_dia+84];
> ld.shared.f32 %f540, [%r4+-972];
> ld.shared.f32 %f539, [%r4...
2018 Jun 21
2
NVPTX - Reordering load instructions
...> >> peri_col[idx][i] -= peri_col[idx][j] * dia[j][i];
> >> peri_col[idx][i] /= dia[i][i];
> >> }
> > NVCC emits PTX instructions where all loads from shared memory are
> > packed together:
> >
> >> ...
> >> ld.shared.f32 %f546, [kernel_dia+440];
> >> ld.shared.f32 %f545, [%r4+-996];
> >> ld.shared.f32 %f544, [kernel_dia+56];
> >> ld.shared.f32 %f543, [kernel_dia+88];
> >> ld.shared.f32 %f542, [kernel_dia+500];
> >> ld.shared.f32 %f541, [kernel_dia+84];
> >> ld.shared.f32...
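
For readers following the quoted loop: it reads as an in-place triangular solve, where each thread's row of peri_col is updated against the upper triangle of dia (diagonal included). Below is a minimal host-side sketch in plain C, not the poster's kernel. The function name solve_row, the BLOCK_SIZE value of 4, and the zero-pivot assert are assumptions made for illustration; the thread excerpt does not state the tile size or show the surrounding kernel.

```c
#include <assert.h>

#define BLOCK_SIZE 4  /* assumed tile size; the excerpt does not give the real value */

/* Hypothetical helper: same loop body as the quoted snippet, with the
 * per-thread row peri_col[idx] passed in as `row`.
 * Solves x * U = b in place, where U is the upper triangle of `dia`
 * (diagonal included) and `row` holds b on entry and x on return. */
static void solve_row(float row[BLOCK_SIZE],
                      const float dia[BLOCK_SIZE][BLOCK_SIZE])
{
    for (int i = 0; i < BLOCK_SIZE; i++) {
        /* Subtract the contributions of the already-solved entries. */
        for (int j = 0; j < i; j++)
            row[i] -= row[j] * dia[j][i];

        /* Guard added for this sketch only: a zero pivot would divide by zero. */
        assert(dia[i][i] != 0.0f);
        row[i] /= dia[i][i];
    }
}
```

Every iteration i reads dia[0..i][i] and row[0..i-1], so when BLOCK_SIZE is a compile-time constant and the loops are fully unrolled, the body turns into a long straight-line sequence of loads, multiplies and subtracts. That is consistent with the many ld.shared.f32 instructions at constant offsets from kernel_dia in the quoted PTX.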