2014 Oct 24
[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs
...add.s64 %rd4, %rd4, 12;
setp.lt.s32 %p2, %r6, %r3;
@%p2 bra BB0_2;
in which %r6 is the induction variable i.
With widening, the loop body becomes:
BB0_2: // =>This Inner Loop Header: Depth=1
mul.lo.s64 %rd8, %rd10, %rd10;
st.u32 [%rd9], %rd8;
add.s64 %rd10, %rd10, 3;
add.s64 %rd9, %rd9, 12;
setp.lt.s64 %p2, %rd10, %rd1;
@%p2 bra BB0_2;
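The C sketch below shows the two loops at the source level. The loop itself is an assumption reconstructed from the PTX: it starts at 0 (not visible in the snippet), steps by 3, moves the pointer by 12 bytes (3 * sizeof(int32_t)), and stores i * i. The names store_squares_narrow, store_squares_wide and check_same are hypothetical. The widened function is a hand-written analogue of what induction-variable widening produces, not actual compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical narrow loop: 32-bit induction variable, like %r6 in the
 * first listing. Start value 0 is an assumption. */
void store_squares_narrow(int32_t *out, int32_t n) {
    for (int32_t i = 0; i < n; i += 3)
        out[i] = i * i;
}

/* Hand-written analogue of the widened loop: i is 64-bit (like %rd10), so the
 * multiply, increment and compare are all 64-bit. Only the stored value is
 * truncated back to 32 bits (the st.u32 in the second listing). Widening is
 * legal because signed overflow of i in the narrow loop is undefined, so the
 * compiler may assume i + 3 never wraps. */
void store_squares_wide(int32_t *out, int32_t n) {
    int64_t limit = n;               /* sign-extended once, outside the loop */
    for (int64_t i = 0; i < limit; i += 3)
        out[i] = (int32_t)(i * i);   /* 64-bit multiply, truncating store */
}

/* Runs both versions on fresh 64-element buffers and checks that they agree. */
void check_same(int32_t n) {
    int32_t a[64] = {0};
    int32_t b[64] = {0};
    assert(n >= 0 && n <= 64);       /* keep every index inside the buffers */
    store_squares_narrow(a, n);
    store_squares_wide(b, n);
    for (int32_t k = 0; k < 64; k++)
        assert(a[k] == b[k]);
}
```

For small n both functions fill the buffer identically. The concern in the post is the cost of the 64-bit arithmetic on the GPU, not correctness.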
Although the number of PTX instructions is the same in both versions, the
version with widening uses...