2018 Jun 21
2
NVPTX - Reordering load instructions
...rnel_dia+88];
> ld.shared.f32 %f542, [kernel_dia+500];
> ld.shared.f32 %f541, [kernel_dia+84];
> ld.shared.f32 %f540, [%r4+-972];
> ld.shared.f32 %f539, [%r4+-1008];
> ld.shared.f32 %f538, [kernel_dia+496];
> ld.shared.f32 %f537, [kernel_dia+136];
> ld.shared.f32 %f536, [%r4+-976];
> ld.shared.f32 %f535, [kernel_dia+428];
> ... # hundreds of these
Even though this heavily bloats register usage (and NVCC seems to do
this unconditionally, even with launch configurations where this could
hurt performance), it allows the CUDA PTX JIT to emit 128-bit loads:...
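
One way to read the observation above: once the scalar loads are hoisted into a single block, loads from neighbouring addresses become visible together and can be combined into wider ones. The following is only a rough sketch of that idea, not how NVCC, LLVM, or the PTX JIT actually implement it. The function names `contiguous_runs` and `merge_candidates` are hypothetical, the input is limited to the fully visible lines of the listing, and alignment requirements for 64-bit and 128-bit loads are deliberately ignored.

```python
from collections import defaultdict

# Scalar 4-byte shared-memory loads copied from the visible part of the
# listing above, as (base, byte offset). The truncated first line is omitted.
loads = [
    ("kernel_dia", 500),
    ("kernel_dia", 84),
    ("%r4", -972),
    ("%r4", -1008),
    ("kernel_dia", 496),
    ("kernel_dia", 136),
    ("%r4", -976),
    ("kernel_dia", 428),
]


def contiguous_runs(loads, width=4):
    """Group loads by base and split the sorted offsets into runs in which
    each offset follows the previous one by exactly `width` bytes."""
    by_base = defaultdict(list)
    for base, offset in loads:
        by_base[base].append(offset)

    runs = []
    for base, offsets in by_base.items():
        offsets = sorted(set(offsets))
        run = [offsets[0]]
        for offset in offsets[1:]:
            if offset == run[-1] + width:
                run.append(offset)          # extends the current run
            else:
                runs.append((base, run))    # gap: close the run, start anew
                run = [offset]
        runs.append((base, run))
    return runs


def merge_candidates(loads, width=4):
    """Runs of two or more adjacent loads, i.e. candidates for a wider load.
    Assumption: alignment of the base address is not checked here."""
    return [(base, run) for base, run in contiguous_runs(loads, width)
            if len(run) >= 2]
```

On the excerpt alone, `merge_candidates(loads)` finds two adjacent pairs, `kernel_dia+496`/`+500` and `%r4-976`/`-972`. With the hundreds of loads mentioned in the message, longer runs of four consecutive floats (128 bits) would presumably appear.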
2018 Jun 21
2
NVPTX - Reordering load instructions
...500];
> >> ld.shared.f32 %f541, [kernel_dia+84];
> >> ld.shared.f32 %f540, [%r4+-972];
> >> ld.shared.f32 %f539, [%r4+-1008];
> >> ld.shared.f32 %f538, [kernel_dia+496];
> >> ld.shared.f32 %f537, [kernel_dia+136];
> >> ld.shared.f32 %f536, [%r4+-976];
> >> ld.shared.f32 %f535, [kernel_dia+428];
> >> ... # hundreds of these
> > Even though this heavily bloats register usage (and NVCC seems to do
> > this unconditionally, even with launch configurations whe