Displaying 3 results from an estimated 3 matches for "mininvocationsinclusivescanamd".
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...ic that returns the first argument with inactive lanes set to
the second argument. We'd also need something like WQM to make all the
lanes active during the sequence. But that raises some hairy
requirements for register allocation. For example, in something like:
    foo = ...
    if (...) {
        bar = minInvocationsInclusiveScanAMD(...)
    } else {
        ... = foo;
    }
we have to make sure that foo isn't allocated to the same register as
one of the temporaries used inside minInvocationsInclusiveScanAMD(),
though they don't interfere. That's because the implementation of
minInvocationsInclusiveScanAMD() will do funny thi...
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...>> the second argument. We'd also need something like WQM to make all the
>> lanes active during the sequence. But that raises some hairy
>> requirements for register allocation. For example, in something like:
>>
>>     foo = ...
>>     if (...) {
>>         bar = minInvocationsInclusiveScanAMD(...)
>>     } else {
>>         ... = foo;
>>     }
>>
>> we have to make sure that foo isn't allocated to the same register as
>> one of the temporaries used inside minInvocationsInclusiveScanAMD(),
>> though they don't interfere. That's because the implem...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...something like WQM to make all the
>>>> lanes active during the sequence. But that raises some hairy
>>>> requirements for register allocation. For example, in something like:
>>>>
>>>>     foo = ...
>>>>     if (...) {
>>>>         bar = minInvocationsInclusiveScanAMD(...)
>>>>     } else {
>>>>         ... = foo;
>>>>     }
>>>>
>>>> we have to make sure that foo isn't allocated to the same register as
>>>> one of the temporaries used inside minInvocationsInclusiveScanAMD(),
>>>> though...
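
The register-allocation concern quoted in these results can be illustrated with a small lane-level simulation. This is only a sketch under stated assumptions: a wave is modelled as a Python list with one entry per lane, `set_inactive`, `inclusive_min_scan`, and `min_scan_active` are hypothetical helper names (not AMDGPU backend or LLVM APIs), and the hazard shown assumes that the scan's temporaries are written with all lanes enabled, as the "make all the lanes active during the sequence" remark suggests.

```python
INF = float("inf")  # identity element for min


def set_inactive(values, exec_mask, identity):
    """Hypothetical helper: return `values` with inactive lanes set to `identity`."""
    return [v if active else identity for v, active in zip(values, exec_mask)]


def inclusive_min_scan(values):
    """Inclusive prefix minimum over every lane of the wave."""
    out, running = [], INF
    for v in values:
        running = min(running, v)
        out.append(running)
    return out


def min_scan_active(values, exec_mask):
    """Scan the active lanes only; the temporary is written in ALL lanes."""
    temp = set_inactive(values, exec_mask, INF)
    scanned = inclusive_min_scan(temp)
    result = [s if active else None for s, active in zip(scanned, exec_mask)]
    return temp, result


# 4-lane wave: lanes 0 and 2 take the if-branch, lanes 1 and 3 the else-branch.
exec_if = [True, False, True, False]
foo = [10, 11, 12, 13]  # still needed by the else-branch lanes
x = [5, 9, 3, 7]        # per-lane input to the scan

temp, bar = min_scan_active(x, exec_if)

# Assumed hazard: if `foo` and `temp` shared one physical register, the
# all-lanes write of `temp` would overwrite `foo` in lanes 1 and 3 as well.
shared_register = list(foo)
for lane in range(len(shared_register)):
    shared_register[lane] = temp[lane]

else_lanes = [lane for lane, active in enumerate(exec_if) if not active]
clobbered = [shared_register[lane] != foo[lane] for lane in else_lanes]
```

In this model `foo` and the temporary are never live at the same time within a single lane's control flow, which matches the "they don't interfere" observation, yet the else-branch lanes still lose their `foo` values once the temporary is written wave-wide.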