search for: row_bcast

Displaying 7 results from an estimated 7 matches for "row_bcast".

2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...a data hazard >>>>>> v_nop >>>>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5 >>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>> v_nop >>>>>> v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6 >>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>> v_nop >>>>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >>>>>> >>>>>> The pr...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...dd two independent instructions to avoid a data hazard >>>> v_nop >>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5 >>>> v_nop // Add two independent instructions to avoid a data hazard >>>> v_nop >>>> v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6 >>>> v_nop // Add two independent instructions to avoid a data hazard >>>> v_nop >>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >>>> >>>> The problem is that the way these instructions...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...data hazard >>>>>> v_nop >>>>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5 >>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>> v_nop >>>>>> v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6 >>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>> v_nop >>>>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >>>>>> >>>>>> The...
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...hazard v_nop v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4 v_nop // Add two independent instructions to avoid a data hazard v_nop v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5 v_nop // Add two independent instructions to avoid a data hazard v_nop v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6 v_nop // Add two independent instructions to avoid a data hazard v_nop v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 The problem is that the way these instructions use the DPP word isn't currently expressible in LLVM. We have the llvm.amdgcn.m...
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...0xe // Instruction 4 >> v_nop // Add two independent instructions to avoid a data hazard >> v_nop >> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5 >> v_nop // Add two independent instructions to avoid a data hazard >> v_nop >> v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6 >> v_nop // Add two independent instructions to avoid a data hazard >> v_nop >> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >> >> The problem is that the way these instructions use the DPP word isn't >> cur...
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...>> v_nop >>>>>>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5 >>>>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>>>> v_nop >>>>>>>> v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6 >>>>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>>>> v_nop >>>>>>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >>>>>>>>...
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...>>>>>>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5 >>>>>>>>> v_nop // Add two independent instructions to avoid a data >>>>>>>>> hazard v_nop >>>>>>>>> v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction >>>>>>>>> 6 v_nop // Add two independent instructions to avoid a data >>>>>>>>> hazard v_nop >>>>>>>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction >>>>>...