Displaying 7 results from an estimated 7 matches for "v_foo_f32".
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...rns/questions. First of all, to implement
>>>>>> the prefix scan, we'll need to do a code sequence that looks like
>>>>>> this, modified from
>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>
>>>>>> ; v0 is the input register
>>>>>> v_mov_b32 v1, v0
>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...>>>> I can think of a few concerns/questions. First of all, to implement
>>>> the prefix scan, we'll need to do a code sequence that looks like
>>>> this, modified from
>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>>>> v_foo_f32 with the appropriate operation):
>>>>
>>>> ; v0 is the input register
>>>> v_mov_b32 v1, v0
>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>> v_foo_f32 v1, v0, v1 row...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...>>>>>> to implement the prefix scan, we'll need to do a code sequence
>>>>>> that looks like this, modified from
>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
>>>>>> (replace
>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>
>>>>>> ; v0 is the input register
>>>>>> v_mov_b32 v1, v0
>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
...
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...ic
low-level shuffle intrinsics implemented that you need to do this, but
I can think of a few concerns/questions. First of all, to implement
the prefix scan, we'll need to do a code sequence that looks like
this, modified from
http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
v_foo_f32 with the appropriate operation):
; v0 is the input register
v_mov_b32 v1, v0
v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
v_nop // Add two independent instructions to avoid a data hazard
v_nop
v_foo_...
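As a rough illustration of what Instructions 1 through 3 above compute, here is a small Python model of the first part of the sequence. It is only a sketch under stated assumptions, not the backend's behavior verbatim: it assumes a DPP row is 16 lanes wide, that row_shr:k makes each lane read the first operand from the lane k positions lower in the same row, and that a lane with no such source lane keeps its previous v1 value. The names row_shr and partial_scan are hypothetical helpers, and op stands in for v_foo_f32.

```python
import operator

ROW = 16  # assumed number of lanes in one DPP row


def row_shr(values, k):
    """For each lane, return the value k lanes lower in the same row, or None."""
    shifted = []
    for lane in range(len(values)):
        src = lane - k
        in_row = src >= 0 and src // ROW == lane // ROW
        shifted.append(values[src] if in_row else None)
    return shifted


def partial_scan(v0, op):
    """Model v_mov_b32 followed by Instructions 1-3 (row_shr:1, :2, :3)."""
    v1 = list(v0)  # v_mov_b32 v1, v0
    for k in (1, 2, 3):
        shifted = row_shr(v0, k)
        # v_foo_f32 v1, v0, v1 row_shr:k
        # Lanes without an in-row source keep their old v1 (assumption).
        v1 = [old if s is None else op(s, old) for s, old in zip(shifted, v1)]
    return v1


if __name__ == "__main__":
    print(partial_scan(list(range(1, 9)), operator.add))
```

With addition as the operation, each lane ends up holding the combination of its own input and up to three inputs from the lanes before it in the same row, which is the partial result the later row_shr steps in the quoted sequence build on.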
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...hat you need to do this, but
>> I can think of a few concerns/questions. First of all, to implement
>> the prefix scan, we'll need to do a code sequence that looks like
>> this, modified from
>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>> v_foo_f32 with the appropriate operation):
>>
>> ; v0 is the input register
>> v_mov_b32 v1, v0
>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>> v_nop // Add tw...
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...implement
>>>>>>>> the prefix scan, we'll need to do a code sequence that looks like
>>>>>>>> this, modified from
>>>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>>>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>>>
>>>>>>>> ; v0 is the input register
>>>>>>>> v_mov_b32 v1, v0
>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>>>> v_foo_f32...
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...x scan, we'll need to do a
>>>>>>>>> code sequence that looks like this, modified from
>>>>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
>>>>>>>>> (replace
>>>>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>>>>
>>>>>>>>> ; v0 is the input register
>>>>>>>>> v_mov_b32 v1, v0
>>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>>>...