thr3ads.net - search: "bound

Displaying 7 results from an estimated 7 matches for "bound_ctrl".

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 15

...ect. If I'm reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What we need is something like: ...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 14

... but this is incorrect. If I'm reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What we need is something like:
Why is %tmp g...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 15

...reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 14

... but this is incorrect. If I'm reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What we need is something like: ...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 13

... but this is incorrect. If I'm reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What we need is something like:
Why is %tmp garbage? I thought the two options were...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

... %result = foo %tmp, %input
but this is incorrect. If I'm reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What we need is something like:
Why is %tmp garbage? I thought the two options were 0 (bound_ctrl =0) or %input (bound_ctrl =...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

...e:

    %tmp = call llvm.amdgcn.mov_dpp %input row_shr:1
    %result = foo %tmp, %input

but this is incorrect. If I'm reading the source correctly, this will make %tmp garbage in lane 0 (since it just turns into a normal move with the dpp modifier, and no restrictions on the destination). We could set bound_ctrl to 0 to work around this, since it will make %tmp 0 in lane 0, but that won't work with operations whose identity is non-0 like min and max. What we need is something like:

    %result = call llvm.amdgcn.foo_dpp %result, %input, %result row_shr:1

where llvm.amdgcn.foo_dpp copies the first argumen...
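The zero-fill concern in this snippet can be illustrated with a small lane-level simulation. This is only a sketch in Python, not the AMDGPU backend or the real intrinsics: the 16-lane row size, the helper names (`mov_row_shr1_zero_fill`, `step_with_zero_fill`, `step_fused`) and the "keep the old value" behaviour of the fused step are assumptions made for illustration, loosely following the foo_dpp idea whose full description is truncated above.

```python
from typing import Callable, List

ROW_SIZE = 16  # assumption: row_shr shifts within rows of 16 lanes

BinOp = Callable[[int, int], int]


def mov_row_shr1_zero_fill(src: List[int]) -> List[int]:
    """Shift each row right by one lane; lanes with no source lane read 0."""
    return [0 if lane % ROW_SIZE == 0 else src[lane - 1] for lane in range(len(src))]


def step_with_zero_fill(op: BinOp, inp: List[int]) -> List[int]:
    """Model of: %tmp = shifted move with zero fill; %result = op(%tmp, %input)."""
    tmp = mov_row_shr1_zero_fill(inp)
    return [op(t, x) for t, x in zip(tmp, inp)]


def step_fused(op: BinOp, old: List[int], inp: List[int]) -> List[int]:
    """Hypothetical fused step: lanes with no source lane keep old[lane]."""
    return [
        old[lane] if lane % ROW_SIZE == 0 else op(inp[lane - 1], inp[lane])
        for lane in range(len(inp))
    ]


inp = [-(lane + 1) for lane in range(ROW_SIZE)]  # -1, -2, ..., -16

add_zero = step_with_zero_fill(lambda a, b: a + b, inp)
max_zero = step_with_zero_fill(max, inp)
max_fused = step_fused(max, inp, inp)

print(add_zero[:3])   # [-1, -3, -5]: 0 is the identity of add, so lane 0 is fine
print(max_zero[:3])   # [0, -1, -2]: lane 0 is wrong, 0 is not the identity of max
print(max_fused[:3])  # [-1, -1, -2]: lane 0 keeps its own input
```

With all-negative inputs, the zero-filled lane 0 yields 0 for max even though no lane holds 0, while the add case is unaffected because 0 is its identity. Passing the input as the old value in the fused sketch leaves lane 0 untouched regardless of the operation.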
