search for: v_foo_f32

Displaying 7 results from an estimated 7 matches for "v_foo_f32".

2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...rns/questions. First of all, to implement >>>>>> the prefix scan, we'll need to do a code sequence that looks like >>>>>> this, modified from >>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>>>> v_foo_f32 with the appropriate operation): >>>>>> >>>>>> ; v0 is the input register >>>>>> v_mov_b32 v1, v0 >>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 &...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...;> I can think of a few concerns/questions. First of all, to implement >>>> the prefix scan, we'll need to do a code sequence that looks like >>>> this, modified from >>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>> v_foo_f32 with the appropriate operation): >>>> >>>> ; v0 is the input register >>>> v_mov_b32 v1, v0 >>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>> v_foo_f32 v1, v0, v1 row...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...>>>>>> to implement the prefix scan, we'll need to do a code sequence >>>>>> that looks like this, modified from >>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ >>>>>> (replace >>>>>> v_foo_f32 with the appropriate operation): >>>>>> >>>>>> ; v0 is the input register >>>>>> v_mov_b32 v1, v0 >>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 &...
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...ic low-level shuffle intrinsics implemented that you need to do this, but I can think of a few concerns/questions. First of all, to implement the prefix scan, we'll need to do a code sequence that looks like this, modified from http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace v_foo_f32 with the appropriate operation): ; v0 is the input register v_mov_b32 v1, v0 v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 v_nop // Add two independent instructions to avoid a data hazard v_nop v_foo_...
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...hat you need to do this, but >> I can think of a few concerns/questions. First of all, to implement >> the prefix scan, we'll need to do a code sequence that looks like >> this, modified from >> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >> v_foo_f32 with the appropriate operation): >> >> ; v0 is the input register >> v_mov_b32 v1, v0 >> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 >> v_nop // Add tw...
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...implement >>>>>>>> the prefix scan, we'll need to do a code sequence that looks like >>>>>>>> this, modified from >>>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>>>>>> v_foo_f32 with the appropriate operation): >>>>>>>> >>>>>>>> ; v0 is the input register >>>>>>>> v_mov_b32 v1, v0 >>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>>>>>> v_foo_f32...
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...x scan, we'll need to do a >>>>>>>>> code sequence that looks like this, modified from >>>>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ >>>>>>>>> (replace >>>>>>>>> v_foo_f32 with the appropriate operation): >>>>>>>>> >>>>>>>>> ; v0 is the input register >>>>>>>>> v_mov_b32 v1, v0 >>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>>>>&g...