Displaying 7 results from an estimated 7 matches for "v_nop".
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...nput register
>>>>>> v_mov_b32 v1, v0
>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>> v_nop
>>>>>> v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4
>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>> v_nop
...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...>>>
>>>> ; v0 is the input register
>>>> v_mov_b32 v1, v0
>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>> v_nop
>>>> v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4
>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>> v_nop
>>>> v_foo_f32 v1, v1...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...gt;>> ; v0 is the input register
>>>>>> v_mov_b32 v1, v0
>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>> v_nop
>>>>>> v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4
>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>> v_no...
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
v_foo_f32 with the appropriate operation):
; v0 is the input register
v_mov_b32 v1, v0
v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
v_nop // Add two independent instructions to avoid a data hazard
v_nop
v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4
v_nop // Add two independent instructions to avoid a data hazard
v_nop
v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruction 5
v_nop // Add two independent instructio...
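
The effect of the five `row_shr` steps quoted above can be checked with a small lane-level model. The following is only a sketch in Python, not AMD's or LLVM's implementation: `dpp_row_shr` and `row_scan` are hypothetical helper names, the model covers a single 16-lane row, and it assumes that a lane whose shifted source falls outside the row is simply left unwritten (no `bound_ctrl`). The `v_nop` instructions only address a hardware data hazard and have no counterpart here.

```python
import operator

ROW = 16  # lanes in one DPP row (assumption: only a single row is modelled)


def dpp_row_shr(dst, src0, src1, shift, op, bank_mask=0xF):
    """Model `v_foo_f32 dst, src0, src1 row_shr:shift bank_mask:mask` on one row.

    All lanes read their inputs before any lane is written, so the result
    is built in a fresh list instead of updating `dst` in place.
    """
    out = list(dst)
    for lane in range(ROW):
        bank = lane // 4  # each bank covers 4 consecutive lanes
        if not (bank_mask >> bank) & 1:
            continue  # bank masked off: lane keeps its old value
        src_lane = lane - shift
        if src_lane < 0:
            continue  # assumed: out-of-row source disables the lane
        out[lane] = op(src0[src_lane], src1[lane])
    return out


def row_scan(v0, op):
    """Replay instructions 1 to 5 of the quoted sequence for one row."""
    v1 = list(v0)                                        # v_mov_b32 v1, v0
    v1 = dpp_row_shr(v1, v0, v1, 1, op)                  # Instruction 1
    v1 = dpp_row_shr(v1, v0, v1, 2, op)                  # Instruction 2
    v1 = dpp_row_shr(v1, v0, v1, 3, op)                  # Instruction 3
    v1 = dpp_row_shr(v1, v1, v1, 4, op, bank_mask=0xE)   # Instruction 4
    v1 = dpp_row_shr(v1, v1, v1, 8, op, bank_mask=0xC)   # Instruction 5
    return v1


if __name__ == "__main__":
    values = list(range(1, 17))
    print(row_scan(values, operator.add))
```

Under these assumptions each lane ends up holding the combination of all lanes at or below it in the row, so the last lane of the row carries the reduction of the whole row. Passing `max` or `min` instead of `operator.add` stands in for other choices of `v_foo_f32`.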
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...>> v_foo_f32 with the appropriate operation):
>>
>> ; v0 is the input register
>> v_mov_b32 v1, v0
>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>> v_nop // Add two independent instructions to avoid a data hazard
>> v_nop
>> v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4
>> v_nop // Add two independent instructions to avoid a data hazard
>> v_nop
>> v_foo_f32 v1, v1, v1 row_shr:8 bank_mask:0xc // Instruct...
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...>> v_mov_b32 v1, v0
>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>>>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>>>> v_nop
>>>>>>>> v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4
>>>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>...
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...>>>>>>> v_mov_b32 v1, v0
>>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>>>>>>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>>>>>> v_nop
>>>>>>>>> v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4
>>>>>>>>> v_nop // Add two independent instructions to avoid a data...