Chad Rosier
2014-Aug-28 16:38 UTC
[LLVMdev] Rewriting compare instructions to avoid materializing previous induction variable
All,

I've noticed cases where LSR generates IR like the following:

    for.cond:                                            ; preds = %for.body
      %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1  ;; i++
      %2 = add i64 %indvars.iv.next, -1                  ;; previous i for cmp
      %tmp = trunc i64 %2 to i32
      %cmp = icmp slt i32 %tmp, %0                       ;; i < e
      br i1 %cmp, label %for.body, label %for.end.loopexit

Basically, the comparison happens after the induction variable is incremented, so LSR derives the previous induction variable by subtracting 1. (Without LSR we actually use a register to save the previous value of the induction variable, so I think deriving the value from the incremented induction variable is goodness; there is no need to keep a register live across loop iterations.)

For my test case (on AArch64), we generate assembly like this:

    .LBB0_2:
      ldr w12, [x10, x11, lsl #2]
      cbz w12, .LBB0_4
      add x11, x11, #1
      sub w12, w11, #1
      cmp w12, w9
      b.lt .LBB0_2

However, I believe this is equivalent to:

    .LBB0_2:
      ldr w12, [x10, x11, lsl #2]
      cbz w12, .LBB0_4
      add x11, x11, #1
      cmp w11, w9
      b.le .LBB0_2

That is, we transform the comparison from (i < e) to (i+1 <= e), so that we don't have to materialize the previous value of i.

If my assumptions are correct, my question is: how should this be implemented? My first thought was to try something in CodeGenPrepare (as LSR is run rather late), but I have limited experience with this pass. Alternatively, I think I could write this as an InstCombine, which I believe will be called by the CodeGenPrepare pass.

Thoughts?

Chad
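
The proposed rewrite from (i < e) to (i+1 <= e) can be sanity-checked by brute force at small bit widths. What follows is a minimal sketch and not LLVM code: the helper names `to_signed` and `mismatches` are hypothetical, and the 8-bit and 4-bit widths are assumptions that stand in for i64 and i32 so the search stays small. It models `trunc(iv.next - 1) <s e` against `trunc(iv.next) <=s e` with two's-complement wrapping and collects the inputs where the two forms disagree.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mismatches(wide_bits=8, narrow_bits=4):
    """Return (iv_next, e) pairs where the original and rewritten compares differ.

    wide_bits models the induction variable type (i64 in the IR above),
    narrow_bits models the compare type after the trunc (i32 in the IR above).
    """
    bad = []
    wide_min, wide_max = -(1 << (wide_bits - 1)), (1 << (wide_bits - 1))
    narrow_min, narrow_max = -(1 << (narrow_bits - 1)), (1 << (narrow_bits - 1))
    for iv_next in range(wide_min, wide_max):
        for e in range(narrow_min, narrow_max):
            # Original form: trunc(iv.next - 1) <s e
            original = to_signed(iv_next - 1, narrow_bits) < e
            # Rewritten form: trunc(iv.next) <=s e
            rewritten = to_signed(iv_next, narrow_bits) <= e
            if original != rewritten:
                bad.append((iv_next, e))
    return bad


if __name__ == "__main__":
    bad = mismatches()
    # Truncated iv.next values for which the two forms disagree.
    print(sorted({to_signed(iv_next, 4) for iv_next, _ in bad}))
```

In this model the two forms disagree only when the truncated incremented value is the most negative narrow integer, because subtracting 1 then wraps to the largest positive value. The `nsw` flag on the wide add does not by itself rule that case out once the trunc is involved, so a real implementation would presumably need to establish that the narrow value cannot wrap before swapping the predicate.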
Chad Rosier
2014-Aug-29 18:31 UTC
[LLVMdev] Rewriting compare instructions to avoid materializing previous induction variable
FWIW, InstCombineCmp already has a similar solution, but it isn't able to handle the intervening trunc. Working on a fix now. :D