2016 Oct 03
Is this undefined behavior optimization legal?
...a common way to implement this function on a target
with 32-bit registers would be to zero-initialize a 32-bit register to hold
the initial vector, then mask the inserted value and 'or' it into the
initial vector. In AMDGPU assembly it would look something like:
v_mov_b32 v0, 0
v_cvt_u32_f32_e32 v1, s0
v_and_b32 v1, v1, 0x000000ff
v_or_b32 v0, v0, v1
The optimization that the SelectionDAG performs on this function, though,
ends up removing the mask operation, which gives us:
v_mov_b32 v0, 0
v_cvt_u32_f32_e32 v1, s0
v_or_b32 v0, v0, v1
The reason the SelectionDAG is doing this is because...
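
To make the difference between the two instruction sequences above concrete, here is a minimal C sketch that models each one on a plain 32-bit integer. The function names are hypothetical (the post does not show source code), and the C cast only approximates v_cvt_u32_f32: it is undefined for negative or out-of-range inputs, so the sketch assumes 0 <= x < 2^32.

```c
#include <stdint.h>

/* Hypothetical helpers modeling the two assembly sequences.
 * Assumption: 0 <= x < 2^32, so the float-to-uint32_t cast is defined in C. */

/* With the mask: only the low 8 bits of the converted value reach the vector. */
uint32_t insert_byte_masked(float x) {
    uint32_t vec = 0;               /* v_mov_b32 v0, 0 */
    uint32_t val = (uint32_t)x;     /* v_cvt_u32_f32_e32 v1, s0 */
    val &= 0x000000ffu;             /* v_and_b32 v1, v1, 0x000000ff */
    return vec | val;               /* v_or_b32 v0, v0, v1 */
}

/* Without the mask: any bits above bit 7 spill into the other lanes. */
uint32_t insert_byte_unmasked(float x) {
    uint32_t vec = 0;               /* v_mov_b32 v0, 0 */
    uint32_t val = (uint32_t)x;     /* v_cvt_u32_f32_e32 v1, s0 */
    return vec | val;               /* v_or_b32 v0, v0, v1 */
}
```

For an input that fits in 8 bits, such as 200.0f, both functions return 200. For 300.0f the masked version returns 44 (300 & 0xff) while the unmasked version returns 300, so the two sequences agree only when the converted value fits in the 8-bit element.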