Displaying 7 results from an estimated 7 matches for "row_bcast".
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...a data hazard
>>>>>> v_nop
>>>>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5
>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>> v_nop
>>>>>> v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6
>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>> v_nop
>>>>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7
>>>>>>
>>>>>> The pr...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...dd two independent instructions to avoid a data hazard
>>>> v_nop
>>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5
>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>> v_nop
>>>> v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6
>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>> v_nop
>>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7
>>>>
>>>> The problem is that the way these instructions...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...data hazard
>>>>>> v_nop
>>>>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5
>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>> v_nop
>>>>>> v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6
>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>> v_nop
>>>>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7
>>>>>>
>>>>>> The...
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...hazard
v_nop
v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4
v_nop // Add two independent instructions to avoid a data hazard
v_nop
v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5
v_nop // Add two independent instructions to avoid a data hazard
v_nop
v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6
v_nop // Add two independent instructions to avoid a data hazard
v_nop
v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7
The problem is that the way these instructions use the DPP word isn't
currently expressible in LLVM. We have the llvm.amdgcn.m...
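To make the role of the DPP modifiers in Instructions 4 to 7 above concrete, here is a minimal Python sketch that models them on a 64-lane wavefront. Several things are assumptions rather than facts from the thread: `v_foo_f32` is treated as integer addition; a lane whose `row_shr` source falls outside its 16-lane row is treated as simply not written; and the state of `v1` before Instruction 4 (each lane holding the sum of itself and up to three preceding lanes in its row) is a guess at what the earlier instructions, not shown in this excerpt, produce. All function names are hypothetical.

```python
from itertools import accumulate

WAVE, ROW, BANK = 64, 16, 4  # lanes per wavefront, per DPP row, per bank


def enabled(lane, row_mask, bank_mask):
    """A lane is written only if both its row bit and its bank bit are set."""
    row, bank = lane // ROW, (lane % ROW) // BANK
    return bool((row_mask >> row) & 1 and (bank_mask >> bank) & 1)


def row_shr(v1, n, row_mask=0xF, bank_mask=0xF):
    """Model 'v_foo v1, v1, v1 row_shr:n' with foo assumed to be addition."""
    old, out = list(v1), list(v1)  # all lanes read before any lane is written
    for lane in range(WAVE):
        src = lane - n
        # Assumption: a source outside the lane's own row leaves the lane untouched.
        if src // ROW == lane // ROW and enabled(lane, row_mask, bank_mask):
            out[lane] = old[src] + old[lane]
    return out


def row_bcast(v1, which, row_mask=0xF, bank_mask=0xF):
    """Model row_bcast:15 (last lane of the previous row) and row_bcast:31."""
    old, out = list(v1), list(v1)
    for lane in range(WAVE):
        row = lane // ROW
        if which == 15:
            src = row * ROW - 1 if row >= 1 else None
        else:  # which == 31: lane 31 feeds rows 2 and 3
            src = 31 if row >= 2 else None
        if src is not None and enabled(lane, row_mask, bank_mask):
            out[lane] = old[src] + old[lane]
    return out


def assumed_state_before_instruction_4(v0):
    """Assumed input: each lane sums itself and up to 3 earlier lanes in its row."""
    return [sum(v0[max(lane - 3, (lane // ROW) * ROW):lane + 1])
            for lane in range(WAVE)]


def instructions_4_to_7(v1):
    v1 = row_shr(v1, 4, bank_mask=0xE)     # Instruction 4
    v1 = row_shr(v1, 8, bank_mask=0xC)     # Instruction 5
    v1 = row_bcast(v1, 15, row_mask=0xA)   # Instruction 6
    v1 = row_bcast(v1, 31, row_mask=0xC)   # Instruction 7
    return v1


v0 = list(range(1, WAVE + 1))
result = instructions_4_to_7(assumed_state_before_instruction_4(v0))
print(result == list(accumulate(v0)))
```

Under these assumptions the four steps turn the per-row partial sums into an inclusive prefix sum over the whole wavefront, so the last lane ends up holding the full reduction. The model says nothing about the data hazards that the `v_nop` pairs are there to avoid.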
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...0xe // Instruction 4
>> v_nop // Add two independent instructions to avoid a data hazard
>> v_nop
>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5
>> v_nop // Add two independent instructions to avoid a data hazard
>> v_nop
>> v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6
>> v_nop // Add two independent instructions to avoid a data hazard
>> v_nop
>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7
>>
>> The problem is that the way these instructions use the DPP word isn't
>> cur...
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...>> v_nop
>>>>>>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5
>>>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>>>> v_nop
>>>>>>>> v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6
>>>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>>>> v_nop
>>>>>>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7
>>>>>>>>...
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...>>>>>>>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5
>>>>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>>>>> v_nop
>>>>>>>>> v_foo_f32 v1, v1, v1 row_bcast:15 row_mask:0xa // Instruction 6
>>>>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>>>>> v_nop
>>>>>>>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction
>>>>>...