search for: f536

Displaying 2 results from an estimated 2 matches for "f536".

2018 Jun 21
2
NVPTX - Reordering load instructions
...rnel_dia+88]; > ld.shared.f32 %f542, [kernel_dia+500]; > ld.shared.f32 %f541, [kernel_dia+84]; > ld.shared.f32 %f540, [%r4+-972]; > ld.shared.f32 %f539, [%r4+-1008]; > ld.shared.f32 %f538, [kernel_dia+496]; > ld.shared.f32 %f537, [kernel_dia+136]; > ld.shared.f32 %f536, [%r4+-976]; > ld.shared.f32 %f535, [kernel_dia+428]; > ... # hundreds of these Even though this heavily bloats register usage (and NVCC seems to do this unconditionally, even with launch configurations where this could hurt performance), it allows the CUDA PTX JIT to emit 128-bit loads:...
2018 Jun 21
2
NVPTX - Reordering load instructions
...500]; > >> ld.shared.f32 %f541, [kernel_dia+84]; > >> ld.shared.f32 %f540, [%r4+-972]; > >> ld.shared.f32 %f539, [%r4+-1008]; > >> ld.shared.f32 %f538, [kernel_dia+496]; > >> ld.shared.f32 %f537, [kernel_dia+136]; > >> ld.shared.f32 %f536, [%r4+-976]; > >> ld.shared.f32 %f535, [kernel_dia+428]; > >> ... # hundreds of these > > Even though this heavily bloats register usage (and NVCC seems to do > > this unconditionally, even with launch configurations whe...
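
The snippets above say that hoisting the scalar ld.shared.f32 instructions together lets the CUDA PTX JIT emit 128-bit loads. As a rough illustration of why grouping helps, here is a toy Python sketch that takes (base, byte offset) pairs and reports where four consecutive 4-byte loads from the same base could in principle be fused. This is not how the JIT or LLVM works internally. The function name, the alignment assumption (each base is itself 16-byte aligned), and the sample offsets are all hypothetical.

```python
from collections import defaultdict


def find_vector_candidates(loads, width=4, elem_bytes=4):
    """Report runs of `width` consecutive scalar loads that share a base.

    loads: iterable of (base, byte_offset) pairs in program order.
    Returns a list of (base, start_offset), one per fusable run.
    Assumption: every base address is aligned to width * elem_bytes,
    so checking the offset alone is enough to check alignment.
    """
    # Collect the set of offsets seen for each base, ignoring program order.
    by_base = defaultdict(set)
    for base, offset in loads:
        by_base[base].add(offset)

    vector_bytes = width * elem_bytes
    candidates = []
    for base, offsets in by_base.items():
        for start in sorted(offsets):
            # A wide load must start on a vector-sized boundary.
            if start % vector_bytes != 0:
                continue
            # All `width` neighbouring elements must also be loaded.
            if all(start + i * elem_bytes in offsets for i in range(width)):
                candidates.append((base, start))
    return candidates


# Hypothetical loads, interleaved the way the listing above interleaves them.
sample = [
    ("kernel_dia", 88),
    ("kernel_dia", 500),
    ("kernel_dia", 84),
    ("r4", -976),
    ("kernel_dia", 80),
    ("kernel_dia", 92),
]
print(find_vector_candidates(sample))
```

In this made-up sample, the four loads at kernel_dia+80 through kernel_dia+92 form one candidate. The lone loads at kernel_dia+500 and r4-976 do not, because their neighbours are missing or the start is misaligned.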