thr3ads.net - search: "row

Displaying 7 results from an estimated 7 matches for "row_shr".

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

Implementing cross-thread reduction in the AMDGPU backend

...ions. First of all, to implement the prefix scan, we'll need to do a code sequence that looks like this, modified from http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace v_foo_f32 with the appropriate operation): ; v0 is the input register v_mov_b32 v1, v0 v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 v_nop // Add two independent instructions to avoid a data hazard v_nop v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4 v_nop // Add two independent instructions to avoid...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 13

Implementing cross-thread reduction in the AMDGPU backend

...; this, modified from >>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>> v_foo_f32 with the appropriate operation): >>>> >>>> ; v0 is the input register >>>> v_mov_b32 v1, v0 >>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 >>>> v_nop // Add two independent instructions to avoid a data hazard >>>> v_nop >>>> v_foo_f32 v1, v1, v1 row_shr...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

Implementing cross-thread reduction in the AMDGPU backend

...ed to do a code sequence that looks like >> this, modified from >> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >> v_foo_f32 with the appropriate operation): >> >> ; v0 is the input register >> v_mov_b32 v1, v0 >> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 >> v_nop // Add two independent instructions to avoid a data hazard >> v_nop >> v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4 >&g...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 14

Implementing cross-thread reduction in the AMDGPU backend

...ttp://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>>>> v_foo_f32 with the appropriate operation): >>>>>> >>>>>> ; v0 is the input register >>>>>> v_mov_b32 v1, v0 >>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>>>> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 >>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>> v_nop >>>&...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 14

Implementing cross-thread reduction in the AMDGPU backend

...assembly-cross-lane-operations/ >>>>>> (replace >>>>>> v_foo_f32 with the appropriate operation): >>>>>> >>>>>> ; v0 is the input register >>>>>> v_mov_b32 v1, v0 >>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>>>> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 v_nop // Add two >>>>>> independent instructions to avoid a data hazard v_nop >>>>>> v_foo_f32 v1...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 15

Implementing cross-thread reduction in the AMDGPU backend

...-lane-operations/ (replace >>>>>>>> v_foo_f32 with the appropriate operation): >>>>>>>> >>>>>>>> ; v0 is the input register >>>>>>>> v_mov_b32 v1, v0 >>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>>>>>> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 >>>>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 15

Implementing cross-thread reduction in the AMDGPU backend

...eplace >>>>>>>>> v_foo_f32 with the appropriate operation): >>>>>>>>> >>>>>>>>> ; v0 is the input register >>>>>>>>> v_mov_b32 v1, v0 >>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 v_nop // Add >>>>>>>>> two independent instructions to avoid a data hazard v_nop &...

search for: row_shr