search for: bank_mask

Displaying 7 results from an estimated 7 matches for "bank_mask".

2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
I'm wondering about the focus on bound_cntl. Any cleared bit in the row_mask or bank_mask will also disable updating the result. Brian -----Original Message----- From: Connor Abbott [mailto:cwabbott0 at gmail.com] Sent: Wednesday, June 14, 2017 6:13 PM To: tstellar at redhat.com Cc: Matt Arsenault; llvm-dev at lists.llvm.org; Kolton, Sam; Sumner, Brian; Pykhtin, Valery Subject: Re: [l...
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...; v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3 >>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>> v_nop >>>>>> v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4 >>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>> v_nop >>>>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5 >>>>>> v_nop // Add two independent instructions to...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...// Instruction 1 >>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3 >>>> v_nop // Add two independent instructions to avoid a data hazard >>>> v_nop >>>> v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4 >>>> v_nop // Add two independent instructions to avoid a data hazard >>>> v_nop >>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5 >>>> v_nop // Add two independent instructions to avoid a data hazard >>>...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3 >>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>> v_nop >>>>>> v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4 >>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>> v_nop >>>>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5 >>>>>> v_nop // Add two independent instructions...
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...ration): ; v0 is the input register v_mov_b32 v1, v0 v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3 v_nop // Add two independent instructions to avoid a data hazard v_nop v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4 v_nop // Add two independent instructions to avoid a data hazard v_nop v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5 v_nop // Add two independent instructions to avoid a data hazard v_nop v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6 v_nop // A...
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3 >> v_nop // Add two independent instructions to avoid a data hazard >> v_nop >> v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4 >> v_nop // Add two independent instructions to avoid a data hazard >> v_nop >> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5 >> v_nop // Add two independent instructions to avoid a data hazard >> v_nop >> v_foo_f32 v1, v1, v1...
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...// Instruction 2 >>>>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3 >>>>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>>>> v_nop >>>>>>>> v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4 >>>>>>>> v_nop // Add two independent instructions to avoid a data hazard >>>>>>>> v_nop >>>>>>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5 >>>>>>>> v_nop // Add...
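
The 2017 Jun 15 reply notes that any cleared bit in row_mask or bank_mask disables updating the result. The Python sketch below is a rough model of that lane selection, not code from the thread. It assumes a 64-lane wavefront split into four rows of 16 lanes, each row split into four banks of 4 lanes, with bit N of each mask enabling row N or bank N. The function names are hypothetical, and the layout should be checked against the ISA documentation for the target hardware.

```python
# Assumed layout: 64-lane wavefront, 4 rows of 16 lanes, 4 banks of 4 lanes per row.
WAVE_SIZE = 64
ROW_SIZE = 16
BANK_SIZE = 4


def lane_write_enabled(lane, row_mask=0xF, bank_mask=0xF):
    """Return True if the destination write for `lane` is enabled by both masks."""
    row = lane // ROW_SIZE                  # which 16-lane row the lane is in
    bank = (lane % ROW_SIZE) // BANK_SIZE   # which 4-lane bank inside that row
    row_on = (row_mask >> row) & 1
    bank_on = (bank_mask >> bank) & 1
    return bool(row_on and bank_on)


def enabled_lanes(row_mask=0xF, bank_mask=0xF):
    """List every lane whose result would be updated under the given masks."""
    return [lane for lane in range(WAVE_SIZE)
            if lane_write_enabled(lane, row_mask, bank_mask)]


if __name__ == "__main__":
    # Masks taken from the quoted instruction sequence.
    for label, kwargs in [
        ("bank_mask:0xe", {"bank_mask": 0xE}),
        ("bank_mask:0xc", {"bank_mask": 0xC}),
        ("row_mask:0xa", {"row_mask": 0xA}),
    ]:
        print(label, enabled_lanes(**kwargs))
```

Under this model, bank_mask:0xe leaves lanes 0-3 of every row unchanged, bank_mask:0xc leaves lanes 0-7 of every row unchanged, and row_mask:0xa updates only rows 1 and 3 (lanes 16-31 and 48-63).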