Displaying 7 results from an estimated 7 matches for "bound_ctrl".
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...ect. If I'm reading the source correctly, this will
>>>>>>>> make %tmp garbage in lane 0 (since it just turns into a normal move
>>>>>>>> with the dpp modifier, and no restrictions on the destination). We
>>>>>>>> could set bound_ctrl to 0 to work around this, since it will make %tmp
>>>>>>>> 0 in lane 0, but that won't work with operations whose identity is
>>>>>>>> non-0 like min and max. What we need is something like:
>>>>>>>>
>>>>>>...
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...> but this is incorrect. If I'm reading the source correctly, this will
>>>>>> make %tmp garbage in lane 0 (since it just turns into a normal move
>>>>>> with the dpp modifier, and no restrictions on the destination). We
>>>>>> could set bound_ctrl to 0 to work around this, since it will make %tmp
>>>>>> 0 in lane 0, but that won't work with operations whose identity is
>>>>>> non-0 like min and max. What we need is something like:
>>>>>>
>>>>
>>>> Why is %tmp g...
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...reading the source correctly,
>>>>>>>>> this will make %tmp garbage in lane 0 (since it just turns
>>>>>>>>> into a normal move with the dpp modifier, and no restrictions
>>>>>>>>> on the destination). We could set bound_ctrl to 0 to work
>>>>>>>>> around this, since it will make %tmp
>>>>>>>>> 0 in lane 0, but that won't work with operations whose
>>>>>>>>> identity is
>>>>>>>>> non-0 like min and max. What...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...> but this is incorrect. If I'm reading the source correctly, this
>>>>>> will make %tmp garbage in lane 0 (since it just turns into a
>>>>>> normal move with the dpp modifier, and no restrictions on the
>>>>>> destination). We could set bound_ctrl to 0 to work around this,
>>>>>> since it will make %tmp
>>>>>> 0 in lane 0, but that won't work with operations whose identity
>>>>>> is
>>>>>> non-0 like min and max. What we need is something like:
>>>>>...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...>>>
>>>> but this is incorrect. If I'm reading the source correctly, this will
>>>> make %tmp garbage in lane 0 (since it just turns into a normal move
>>>> with the dpp modifier, and no restrictions on the destination). We
>>>> could set bound_ctrl to 0 to work around this, since it will make %tmp
>>>> 0 in lane 0, but that won't work with operations whose identity is
>>>> non-0 like min and max. What we need is something like:
>>>>
>>
>> Why is %tmp garbage? I thought the two options were...
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...>> %result = foo %tmp, %input
>>
>> but this is incorrect. If I'm reading the source correctly, this will
>> make %tmp garbage in lane 0 (since it just turns into a normal move
>> with the dpp modifier, and no restrictions on the destination). We
>> could set bound_ctrl to 0 to work around this, since it will make %tmp
>> 0 in lane 0, but that won't work with operations whose identity is
>> non-0 like min and max. What we need is something like:
>>
Why is %tmp garbage? I thought the two options were 0 (bound_ctrl = 0)
or %input (bound_ctrl =...
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...e:
%tmp = call llvm.amdgcn.mov_dpp %input row_shr:1
%result = foo %tmp, %input
but this is incorrect. If I'm reading the source correctly, this will
make %tmp garbage in lane 0 (since it just turns into a normal move
with the dpp modifier, and no restrictions on the destination). We
could set bound_ctrl to 0 to work around this, since it will make %tmp
0 in lane 0, but that won't work with operations whose identity is
non-0 like min and max. What we need is something like:
%result = call llvm.amdgcn.foo_dpp %result, %input, %result row_shr:1
where llvm.amdgcn.foo_dpp copies the first argumen...
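
The lane 0 problem described in these snippets can be illustrated with a small software model. This is only a sketch under stated assumptions, not the AMDGPU backend's behaviour: op_dpp is a hypothetical helper, rows are assumed to span 16 lanes, and a lane with no source lane is assumed either to see 0 (the bound_ctrl workaround) or to keep the value passed in as old.

```python
ROW_SIZE = 16  # assumption: a DPP row spans 16 lanes


def op_dpp(op, old, src0, src1, shift=1, bound_ctrl_zero=False):
    """Hypothetical model of `op` with a row_shr:<shift> modifier on src0.

    Each lane combines src0 from the lane `shift` positions lower in its
    row with its own src1. A lane with no source lane either sees 0
    (bound_ctrl_zero=True) or keeps its value from `old` (assumed here).
    """
    result = []
    for lane in range(len(src1)):
        if lane % ROW_SIZE >= shift:
            result.append(op(src0[lane - shift], src1[lane]))
        elif bound_ctrl_zero:
            result.append(op(0, src1[lane]))
        else:
            result.append(old[lane])
    return result


values = [5, 3, 8, 1] + [9] * 12  # one row of 16 lanes

# Zero fill breaks min, because 0 is not the identity of min.
zero_fill = op_dpp(min, values, values, values, bound_ctrl_zero=True)
# Keeping the old value leaves lane 0 unchanged.
keep_old = op_dpp(min, values, values, values)

print(zero_fill[:4])  # [0, 3, 3, 1]
print(keep_old[:4])   # [5, 3, 3, 1]
```

In this model, zero fill is harmless for an operation such as addition, whose identity is 0, which matches the point in the thread that the workaround only fails for operations like min and max.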