Hello,

For a very simple loop where all IV users are post-inc users, I observed redundant add instructions on AArch64.

From the LSR debug output, I can see that the initial formula for the icmp is the one that was transformed to post-inc form in OptimizeLoopTermCond() and later expanded in post-inc mode. Based on the observation that the icmp is already a post-inc user, I hacked LSR to prevent the icmp from being transformed to post-inc form in OptimizeLoopTermCond() before the initial formulae are determined. Luckily, I was able to remove the redundant add instructions with this hack, but I really doubt whether it makes sense to prevent a loop terminating condition from being changed to post-inc form when it is already a post-inc user.

# Input IR :

  define void @foo(i32 %n, i32* %P) {
  entry:
    %cmp7 = icmp sgt i32 %n, 1
    br i1 %cmp7, label %for.body.preheader, label %for.end

  for.body.preheader:                               ; preds = %entry
    %n_sext = sext i32 %n to i64
    br label %for.body

  for.body:
    %K.in = phi i64 [ %n_sext, %for.body.preheader ], [ %K, %for.body ]
    %K = add i64 %K.in, 1

    %StoredAddr = getelementptr i32, i32* %P, i64 %K
    %StoredValue = trunc i64 %K to i32
    store volatile i32 %StoredValue, i32* %StoredAddr
    %cmp = icmp sgt i64 %K, 1
    br i1 %cmp, label %for.body, label %for.end

  for.end:
    ret void
  }

# Output on AArch64, where you can see redundant add instructions for the stored value, the store address, and the cmp :

  foo:
          .cfi_startproc
  // BB#0:
          cmp     w0, #2
          b.lt    .LBB0_3
  // BB#1:
          sxtw    x9, w0
          add     w8, w0, #1
  .LBB0_2:
          add     x10, x1, x9, lsl #2
          add     x9, x9, #1
          str     w8, [x10, #4]
          add     w8, w8, #1
          cmp     x9, #1
          b.gt    .LBB0_2
  .LBB0_3:
          ret
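As a reading aid for the assembly above, here is a rough C sketch of the two loop shapes. It is an illustration under stated assumptions, not LSR output: the function names and the `limit` parameter are hypothetical, and the exit test is replaced by `< limit` only so that the sketch terminates (the loop in the message exits on `K > 1`). The first function uses one induction variable that every user reads after the increment; the second mirrors the emitted loop, with a pre-inc address IV, a +4 byte store offset, and a separate IV for the stored value.

```c
#include <stdint.h>
#include <string.h>

/* One IV: incremented first, then read by the address, the stored value,
   and the exit test. `limit` is a hypothetical bound added for termination. */
static void single_iv(int64_t start, int64_t limit, volatile int32_t *P) {
    int64_t K = start;
    do {
        K = K + 1;              /* %K = add i64 %K.in, 1 */
        P[K] = (int32_t)K;      /* store volatile (trunc %K), P + K */
    } while (K < limit);
}

/* Shape of the emitted loop: the register names follow the assembly. */
static void emitted_shape(int64_t start, int64_t limit, volatile int32_t *P) {
    int64_t x9 = start;                  /* sxtw x9, w0 */
    int32_t w8 = (int32_t)start + 1;     /* add  w8, w0, #1 */
    do {
        volatile int32_t *x10 = P + x9;  /* add  x10, x1, x9, lsl #2 */
        x9 = x9 + 1;                     /* add  x9, x9, #1 */
        x10[1] = w8;                     /* str  w8, [x10, #4] */
        w8 = w8 + 1;                     /* add  w8, w8, #1 */
    } while (x9 < limit);                /* stands in for cmp x9, #1; b.gt */
}

/* Returns 1 if both shapes perform the same stores for the given range. */
int stores_match(int64_t start, int64_t limit) {
    int32_t a[64] = {0}, b[64] = {0};
    if (start < 0 || limit >= 64 || start >= limit)
        return 0;                        /* keep all indices inside the buffers */
    single_iv(start, limit, a);
    emitted_shape(start, limit, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both functions write the same values to the same slots; the second simply spends three adds per iteration where the first needs one increment plus address arithmetic that could fold into the store.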
> On May 27, 2016, at 2:50 PM, via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> From the LSR debug output, I can see that the initial formula for the icmp is the one that was transformed to post-inc form in OptimizeLoopTermCond() and later expanded in post-inc mode. Based on the observation that the icmp is already a post-inc user, I hacked LSR to prevent the icmp from being transformed to post-inc form in OptimizeLoopTermCond() before the initial formulae are determined. Luckily, I was able to remove the redundant add instructions with this hack, but I really doubt whether it makes sense to prevent a loop terminating condition from being changed to post-inc form when it is already a post-inc user.

I agree, but don’t have a better suggestion. You could file a bug. Anyone have time to try out some fixes?

Andy

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Thanks Andy for your response. We already have a related bug open at https://llvm.org/bugs/show_bug.cgi?id=26913 . I would be happy to prepare a fix for it. However, as I don’t have much experience with LSR, I first need to understand some fundamentals.

It seems to me that LSR tries to handle a loop terminating condition in post-inc form, while handling other IV users as pre-inc. If this is true, what is the reasoning behind the use of post-inc? Is there any assumption about using post-inc or pre-inc form in the cost model?

Thanks,
Jun

-----Original Message-----
From: atrick at apple.com [mailto:atrick at apple.com]
Sent: Friday, May 27, 2016 6:15 PM
To: junbuml at codeaurora.org
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Handling post-inc users in LSR

I agree, but don’t have a better suggestion. You could file a bug. Anyone have time to try out some fixes?

Andy