search for: bound_ctrl

Displaying 7 results from an estimated 7 matches for "bound_ctrl".

2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...ect. If I'm reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What we need is something like: ...
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...but this is incorrect. If I'm reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What we need is something like: Why is %tmp g...
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...but this is incorrect. If I'm reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What we need is something like: ...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...but this is incorrect. If I'm reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What we need is something like: Why is %tmp garbage? I thought the two options were...
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...%result = foo %tmp, %input but this is incorrect. If I'm reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What we need is something like: Why is %tmp garbage? I thought the two options were 0 (bound_ctrl =0) or %input (bound_ctrl =...
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...e: %tmp = call llvm.amdgcn.mov_dpp %input row_shr:1 %result = foo %tmp, %input but this is incorrect. If I'm reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What we need is something like: %result = call llvm.amdgcn.foo_dpp %result, %input, %result row_shr:1 where llvm.amdgcn.foo_dpp copies the first argumen...
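The point made in these snippets, that zero-filling lane 0 is harmless for add but wrong for min and max, can be illustrated with a small host-side model. This is only a sketch in Python and not AMDGPU code: the helper names `row_shr1_zero_fill` and `scan_step` are hypothetical, the row width of 16 lanes is an assumption, and the model covers only the "bound_ctrl set to 0" case in which a lane with no source lane reads 0.

```python
# Toy model of one scan step built from a row_shr:1 move.
# Assumption: shifts happen within rows of 16 lanes, and a lane with no
# source lane reads 0 (the "set bound_ctrl to 0" workaround in the thread).
ROW_SIZE = 16


def row_shr1_zero_fill(values):
    """Shift each lane's value one lane to the right; lane 0 of a row gets 0."""
    return [0 if lane % ROW_SIZE == 0 else values[lane - 1]
            for lane in range(len(values))]


def scan_step(op, values):
    """Compute result[lane] = op(tmp[lane], input[lane]) with tmp zero-filled."""
    shifted = row_shr1_zero_fill(values)
    return [op(t, v) for t, v in zip(shifted, values)]


inputs = [5, 3, 7, 2]

# Add has identity 0, so the zero in lane 0 leaves inputs[0] unchanged.
add_step = scan_step(lambda a, b: a + b, inputs)

# Min does not have identity 0, so lane 0 becomes min(0, 5) = 0 instead of 5.
min_step = scan_step(min, inputs)

print(add_step)
print(min_step)
```

In this model lane 0 of `add_step` still equals `inputs[0]`, while lane 0 of `min_step` is clobbered by the filled-in zero, which is the failure the original poster describes for operations whose identity is non-0.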