thr3ads.net - search: "mov_dpp"

Displaying 7 results from an estimated 7 matches for "mov_dpp".

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 14

...v_nop
v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7
The problem is that the way these instructions use the DPP word isn't currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp intrinsic, but it isn't enough. For example, take the first instruction:
v_foo_f32 v1, v0, v1 row_shr:1
What it's doing is shifting v0 right...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 13

...d a data hazard
v_nop
v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7
The problem is that the way these instructions use the DPP word isn't currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp intrinsic, but it isn't enough. For example, take the first instruction:
v_foo_f32 v1, v0, v1 row_shr:1
What it's doing is shifting v0 right by one within each row and adding...
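
To make the row_shr:1 behaviour concrete, here is a small Python model of one wavefront. It assumes a 64-lane GCN wavefront split into four 16-lane DPP rows, uses a plain add in place of the placeholder v_foo_f32, and assumes no bound_ctrl modifier, so a lane whose source would fall outside its own row is simply not written. That matches the description of v1 staying the same in the first lane of each row. The helper names are hypothetical; this is a sketch, not backend code.

```python
from typing import List, Optional

WAVE_SIZE = 64  # lanes per GCN wavefront
ROW_SIZE = 16   # DPP rows are 16 lanes wide


def row_shr_source(lane: int, shift: int) -> Optional[int]:
    """Lane read by `lane` under row_shr:shift, or None when that lane
    would fall outside `lane`'s own 16-lane row."""
    row_start = lane - lane % ROW_SIZE
    src = lane - shift
    return src if src >= row_start else None


def dpp_add_row_shr(v0: List[float], v1: List[float], shift: int = 1) -> List[float]:
    """Model `v_add_f32 v1, v0, v1 row_shr:shift` without bound_ctrl:
    lanes with no valid source are not written and keep their old v1."""
    out = list(v1)
    for lane in range(WAVE_SIZE):
        src = row_shr_source(lane, shift)
        if src is not None:
            out[lane] = v0[src] + v1[lane]
    return out


v0 = [float(lane + 1) for lane in range(WAVE_SIZE)]
v1 = [100.0] * WAVE_SIZE
result = dpp_add_row_shr(v0, v1)
print(result[:4], result[16:18])  # prints [100.0, 101.0, 102.0, 103.0] [100.0, 117.0]
```

Lanes 0 and 16 (the first lane of rows 0 and 1) keep 100.0, while every other lane adds the v0 value of its left neighbour in the same row.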

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 14

...v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7
The problem is that the way these instructions use the DPP word isn't currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp intrinsic, but it isn't enough. For example, take the first instruction:
v_foo_f32 v1, v0, v1 row_shr:1
What it's doing is shifting v0 right...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

...t:15 row_mask:0xa // Instruction 6
v_nop // Add two independent instructions to avoid a data hazard
v_nop
v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7
The problem is that the way these instructions use the DPP word isn't currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp intrinsic, but it isn't enough. For example, take the first instruction:
v_foo_f32 v1, v0, v1 row_shr:1
What it's doing is shifting v0 right by one within each row and adding it to v1. v1 stays the same in the first lane of each row, however. With llvm.amdgcn.mov_dpp, we could try to expr...
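
The last two steps of the sequence combine a row broadcast with a row_mask. The Python sketch below models them, assuming the GCN DPP semantics that row_bcast:15 sends lane 15 of each row to every lane of the next row, row_bcast:31 sends lane 31 to every lane of rows 2 and 3, and bit r of row_mask enables row r (so 0xa selects rows 1 and 3 and 0xc selects rows 2 and 3). Two further assumptions are labelled in the code: Instruction 6 is truncated to "...t:15" above, so treating it as row_bcast:15 is a guess, and the earlier instructions are assumed to leave an inclusive prefix sum inside each 16-lane row. The names are hypothetical.

```python
from itertools import accumulate
from typing import Callable, List, Optional

WAVE_SIZE = 64  # lanes per GCN wavefront
ROW_SIZE = 16   # lanes per DPP row


def row_bcast_source(lane: int, bcast: int) -> Optional[int]:
    """Lane read by `lane` under row_bcast:15 or row_bcast:31."""
    row = lane // ROW_SIZE
    if bcast == 15:
        # Lane 15 of each row feeds every lane of the following row.
        return None if row == 0 else (row - 1) * ROW_SIZE + 15
    if bcast == 31:
        # Lane 31 feeds every lane of rows 2 and 3.
        return 31 if row >= 2 else None
    raise ValueError("only row_bcast:15 and row_bcast:31 are modelled")


def dpp_bcast(op: Callable[[float, float], float], v1: List[float],
              bcast: int, row_mask: int) -> List[float]:
    """Model `v_foo_f32 v1, v1, v1 row_bcast:N row_mask:M` with foo = op.
    Every lane reads the old v1; rows whose row_mask bit is clear, or that
    have no source lane, keep their old value."""
    out = list(v1)
    for lane in range(WAVE_SIZE):
        src = row_bcast_source(lane, bcast)
        if (row_mask >> (lane // ROW_SIZE)) & 1 and src is not None:
            out[lane] = op(v1[src], v1[lane])
    return out


def add(a: float, b: float) -> float:
    return a + b


values = [float(lane % 7) for lane in range(WAVE_SIZE)]

# Assumption: the earlier instructions (not shown in the snippet) leave an
# inclusive prefix sum within each 16-lane row.
v1: List[float] = []
for start in range(0, WAVE_SIZE, ROW_SIZE):
    v1.extend(accumulate(values[start:start + ROW_SIZE]))

v1 = dpp_bcast(add, v1, 15, 0xA)  # Instruction 6 (assumed row_bcast:15): rows 1 and 3
v1 = dpp_bcast(add, v1, 31, 0xC)  # Instruction 7: rows 2 and 3
print(v1 == list(accumulate(values)))  # prints True
```

Under these assumptions, Instruction 6 folds the total of row 0 into row 1 and the total of row 2 into row 3. Lane 31 then holds the sum of rows 0 and 1, which Instruction 7 adds into rows 2 and 3, so lane 63 ends up with the reduction of the whole wavefront.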

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

...Add two independent instructions to avoid a data hazard
v_nop
v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7
The problem is that the way these instructions use the DPP word isn't currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp intrinsic, but it isn't enough. For example, take the first instruction:
v_foo_f32 v1, v0, v1 row_shr:1
What it's doing is shifting v0 right by one within each row and adding it to v1. v1 stays the same in the first lane of each...
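
The property that v1 stays the same in the first lane of each row is easy to lose if the fused DPP instruction is split into a separate move followed by an ordinary operation. The Python sketch below is our own illustration, not necessarily the argument the truncated message goes on to make. It assumes the split-out move fills lanes that have no row_shr:1 source with 0 (the behaviour the DPP bound_ctrl control selects); treat that lowering as a hypothetical. With addition the results agree, because 0 is the identity, but with max over negative inputs the first lane of each row changes.

```python
from typing import Callable, List, Optional

WAVE_SIZE = 64  # lanes per GCN wavefront
ROW_SIZE = 16   # lanes per DPP row


def shr1_source(lane: int) -> Optional[int]:
    """Source lane under row_shr:1; the first lane of each row has none."""
    return None if lane % ROW_SIZE == 0 else lane - 1


def fused_dpp(op: Callable[[float, float], float],
              v0: List[float], v1: List[float]) -> List[float]:
    """`v_op v1, v0, v1 row_shr:1`: lanes without a source keep v1."""
    out = []
    for lane in range(WAVE_SIZE):
        src = shr1_source(lane)
        out.append(v1[lane] if src is None else op(v0[src], v1[lane]))
    return out


def mov_then_op(op: Callable[[float, float], float],
                v0: List[float], v1: List[float]) -> List[float]:
    """Hypothetical two-step form: a zero-filling row_shr:1 move, then op."""
    moved = []
    for lane in range(WAVE_SIZE):
        src = shr1_source(lane)
        moved.append(0.0 if src is None else v0[src])
    return [op(m, b) for m, b in zip(moved, v1)]


def add(a: float, b: float) -> float:
    return a + b


v0 = [-1.0 - lane for lane in range(WAVE_SIZE)]
v1 = [-100.0 - lane for lane in range(WAVE_SIZE)]

# Addition: 0 is the identity, so the zero-filled lanes still come out as v1.
print(fused_dpp(add, v0, v1) == mov_then_op(add, v0, v1))  # prints True
# Max over negative inputs: the zero-filled lanes become 0.0 instead of v1.
print(fused_dpp(max, v0, v1)[0], mov_then_op(max, v0, v1)[0])  # prints -100.0 0.0
```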

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 15

...v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7
The problem is that the way these instructions use the DPP word isn't currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp intrinsic, but it isn't enough. For example, take the first instruction:
v_foo_f32 v1, v0, v1 row_shr:1...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 15

...7
The problem is that the way these instructions use the DPP word isn't currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp intrinsic, but it isn't enough. For example, take the first instruction:
v_foo_f32 v1, v0, v1 row_shr:1...
