2018 Jul 02
Rotates, once again
...here is no EXTRV.
- NVPTX has SHF
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#logic-and-shift-instructions-shf)
with both left/right shift variants and with both "clamp" (clamps shift
count at 32) and "wrap" (shift count taken mod 32) modes.
- GCN has v_alignbit_b32 which is a right funnel shift, and it seems to
be defined to take shift distances mod 32.
Based on that sampling, modulo behavior seems like a good choice for a
generic IR instruction, and if you're going to pick one direction, right
shifts are the one to use. Not sure about other ISAs.
-F...
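For readers who want the semantics spelled out, here is a minimal C sketch of a 32-bit right funnel shift in the two modes described above ("wrap", count taken mod 32, and "clamp", count clamped at 32). It is only a model of the behavior as described in the post, not the definition of any particular instruction; the function names are hypothetical and do not come from any ISA or from LLVM.

```c
#include <stdint.h>

/* Right funnel shift, "wrap" mode: the shift count is taken mod 32.
   Conceptually: concatenate hi:lo into a 64-bit value, shift it right,
   and keep the low 32 bits. (Hypothetical helper name.) */
static uint32_t funnel_shift_right_wrap(uint32_t hi, uint32_t lo, uint32_t n)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (n & 31));
}

/* Right funnel shift, "clamp" mode: counts above 32 are treated as 32,
   so a count of 32 or more returns hi. (Hypothetical helper name.) */
static uint32_t funnel_shift_right_clamp(uint32_t hi, uint32_t lo, uint32_t n)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    uint32_t count = n > 32 ? 32 : n;
    return (uint32_t)(wide >> count);
}

/* A rotate right falls out of the wrap-mode funnel shift by passing the
   same value as both inputs. */
static uint32_t rotate_right(uint32_t x, uint32_t n)
{
    return funnel_shift_right_wrap(x, x, n);
}
```

With modulo behavior a count of 32 acts like a count of 0 and returns lo unchanged, whereas clamp mode returns hi; that is the one place where the two modes visibly differ for counts up to 32.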
2018 Jul 02
Rotates, once again
1. I'm not sure what you mean by "full vector" here - using the same
shift distance for all lanes (as opposed to per-lane distances), or
doing a treat-the-vector-as-bag-of-bits shift that doesn't have any
internal lane boundaries? If the latter, that doesn't really help you
much with implementing a per-lane rotate.
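To make the three readings concrete, here is a rough C sketch over a 4-lane vector of 32-bit values: a rotate with per-lane distances, a rotate with one distance shared by all lanes, and a bag-of-bits shift that ignores lane boundaries. The lane count, the function names, and the choice of lane 0 as least significant are all assumptions made for illustration.

```c
#include <stdint.h>

#define LANES 4  /* assumed vector width, for illustration only */

/* Scalar rotate left with the distance taken mod 32. */
static uint32_t rotl32(uint32_t x, uint32_t n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Per-lane rotate: each lane has its own distance. */
static void rotl_per_lane(uint32_t out[LANES], const uint32_t v[LANES],
                          const uint32_t dist[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = rotl32(v[i], dist[i]);
}

/* Per-lane rotate with one distance shared by all lanes. */
static void rotl_uniform(uint32_t out[LANES], const uint32_t v[LANES],
                         uint32_t dist)
{
    for (int i = 0; i < LANES; i++)
        out[i] = rotl32(v[i], dist);
}

/* Bag-of-bits left shift: the vector is one 128-bit value (lane 0 is
   assumed least significant), so bits cross lane boundaries.
   The distance is reduced mod 128 here as a simplifying assumption. */
static void shl_whole_vector(uint32_t out[LANES], const uint32_t v[LANES],
                             uint32_t n)
{
    int words = (int)((n & 127) / 32);
    uint32_t bits = n % 32;
    for (int i = LANES - 1; i >= 0; i--) {
        int src = i - words;
        uint32_t val = 0;
        if (src >= 0)
            val = v[src] << bits;
        if (bits != 0 && src - 1 >= 0)
            val |= v[src - 1] >> (32 - bits);
        out[i] = val;
    }
}
```

Shifting the vector {0x80000001, 0, 0, 0} left by 1 as a bag of bits moves the top bit of lane 0 into lane 1, while the per-lane rotate wraps it back into the bottom of lane 0. That difference is why the whole-vector shift does not directly give you a per-lane rotate.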
I think the most useful generalization of a vector