Boris Boesler via llvm-dev
2020-Jun-10 08:58 UTC
[llvm-dev] LoopStrengthReduction generates false code
The IR after LSR is:

*** IR Dump After Loop Strength Reduction ***
; Preheader:
entry:
  tail call void @fill_array(i32* getelementptr inbounds ([10 x i32], [10 x i32]* @buffer, i32 0, i32 0)) #2
  br label %while.body

; Loop:
while.body:                                       ; preds = %while.body, %entry
  %lsr.iv = phi i32 [ %lsr.iv.next, %while.body ], [ 0, %entry ]
  %uglygep = getelementptr i8, i8* bitcast ([10 x i32]* @buffer to i8*), i32 %lsr.iv
  %uglygep1 = bitcast i8* %uglygep to i32*
  %0 = load i32, i32* %uglygep1, align 4, !tbaa !2
  %cmp1 = icmp ne i32 %0, -559038737
  %cmp11 = icmp eq i32 %lsr.iv, 0
  %cmp = or i1 %cmp11, %cmp1
  %lsr.iv.next = add nuw i32 %lsr.iv, 8
  br i1 %cmp, label %while.body, label %while.end

; Exit blocks
while.end:                                        ; preds = %while.body
  store volatile i32 %0, i32* %result, align 4, !tbaa !2
  ret void

I guess "%uglygep = getelementptr ..." will be lowered to @buffer + (%lsr.iv * StoreSize(i32)). That's what I see in the final code. But then %lsr.iv should be incremented by 1; instead, it is incremented by 8.

Incrementing %lsr.iv by 8 would only make sense if the getelementptr were lowered to @buffer + %lsr.iv.

Thanks for your help,
Boris

> Am 09.06.2020 um 21:56 schrieb Eli Friedman <efriedma at quicinc.com>:
>
> Hmm. Then I'm not sure; at first glance, the debug output looks fine either way. Could you show the IR after LSR, and explain why it's wrong?
>
> -Eli
>
>> -----Original Message-----
>> From: Boris Boesler <baembel at gmx.de>
>> Sent: Tuesday, June 9, 2020 11:59 AM
>> To: Eli Friedman <efriedma at quicinc.com>
>> Cc: llvm-dev at lists.llvm.org
>> Subject: [EXT] Re: [llvm-dev] LoopStrengthReduction generates false code
>>
>> Hm, no. I expect byte addresses - everywhere. The compiler should not know that the arch needs word addresses. During lowering, LOAD and STORE get explicit conversion operations for the memory address. Even if my arch were byte-addressed, the code would be false/illegal.
>>
>> Boris
>>
>>> Am 09.06.2020 um 19:36 schrieb Eli Friedman <efriedma at quicinc.com>:
>>>
>>> Blindly guessing here: "memory is not byte addressed", but you never fixed ScalarEvolution to handle that, so it's modeling the GEP in a way you're not expecting.
>>>
>>> -Eli
>>>
>>>> -----Original Message-----
>>>> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Boris Boesler via llvm-dev
>>>> Sent: Tuesday, June 9, 2020 1:17 AM
>>>> To: llvm-dev at lists.llvm.org
>>>> Subject: [EXT] [llvm-dev] LoopStrengthReduction generates false code
>>>>
>>>> Hi.
>>>>
>>>> In my backend I get false code after using LoopStrengthReduction. In the generated code the loop index variable is multiplied by 8 (correct, everything is 64-bit aligned) to get an address offset, and the index variable is incremented by 1*8, which is not correct. It should be incremented by 1 only. The factor 8 appears again.
>>>>
>>>> I compared the debug output (-debug-only=scalar-evolution,loop-reduce) for my backend and the ARM backend, but simply can't read/understand it. They differ in factors 4 vs. 8 (ok), but there are more differences, probably caused by the implementation of TargetTransformInfo for ARM, while I haven't implemented it for my arch yet.
>>>>
>>>> How can I debug this further? In my arch everything is 64-bit aligned (factor 8 in many passes) and the memory is not byte-addressed.
>>>>
>>>> Thanks,
>>>> Boris
>>>>
>>>> ----8<----
>>>>
>>>> LLVM assembly:
>>>>
>>>> @buffer = common dso_local global [10 x i32] zeroinitializer, align 4
>>>>
>>>> ; Function Attrs: nounwind
>>>> define dso_local void @some_main(i32* %result) local_unnamed_addr #0 {
>>>> entry:
>>>>   tail call void @fill_array(i32* getelementptr inbounds ([10 x i32], [10 x i32]* @buffer, i32 0, i32 0)) #2
>>>>   br label %while.body
>>>>
>>>> while.body:                                       ; preds = %entry, %while.body
>>>>   %i.010 = phi i32 [ 0, %entry ], [ %inc, %while.body ]
>>>>   %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* @buffer, i32 0, i32 %i.010
>>>>   %0 = load i32, i32* %arrayidx, align 4, !tbaa !2
>>>>   %cmp1 = icmp ne i32 %0, -559038737
>>>>   %inc = add nuw nsw i32 %i.010, 1
>>>>   %cmp11 = icmp eq i32 %i.010, 0
>>>>   %cmp = or i1 %cmp11, %cmp1
>>>>   br i1 %cmp, label %while.body, label %while.end
>>>>
>>>> while.end:                                        ; preds = %while.body
>>>>   %arrayidx2 = getelementptr inbounds [10 x i32], [10 x i32]* @buffer, i32 0, i32 %i.010
>>>>   %1 = load i32, i32* %arrayidx2, align 4, !tbaa !2
>>>>   store volatile i32 %1, i32* %result, align 4, !tbaa !2
>>>>   ret void
>>>> }
>>>>
>>>> declare dso_local void @fill_array(i32*) local_unnamed_addr #1
>>>>
>>>> attributes #0 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
>>>> attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
>>>> attributes #2 = { nounwind }
>>>>
>>>> !llvm.module.flags = !{!0}
>>>> !llvm.ident = !{!1}
>>>>
>>>> !0 = !{i32 1, !"wchar_size", i32 4}
>>>> !1 = !{!"clang version 7.0.1 (tags/RELEASE_701/final)"}
>>>> !2 = !{!3, !3, i64 0}
>>>> !3 = !{!"int", !4, i64 0}
>>>> !4 = !{!"omnipotent char", !5, i64 0}
>>>> !5 = !{!"Simple C/C++ TBAA"}
>>>>
>>>>
>>>> (-debug-only=scalar-evolution,loop-reduce) for my arch:
>>>>
>>>> LSR on loop %while.body:
>>>> Collecting IV Chains.
>>>>   IV Chain#0 Head: (%0 = load i32, i32* %arrayidx, align 4, !tbaa !2) IV={@buffer,+,8}<nsw><%while.body>
>>>>   IV Chain#1 Head: (%cmp11 = icmp eq i32 %i.010, 0) IV={0,+,1}<nuw><nsw><%while.body>
>>>>   IV Chain#1 Inc: (%i.010 = phi i32 [ 0, %entry ], [ %inc, %while.body ]) IV+1
>>>> Chain: %cmp11 = icmp eq i32 %i.010, 0 Cost: 0
>>>> LSR has identified the following interesting factors and types: *8
>>>> LSR is examining the following fixup sites:
>>>>   UserInst=%cmp11, OperandValToReplace=%i.010
>>>>   UserInst=%0, OperandValToReplace=%arrayidx
>>>> LSR found 2 uses:
>>>> LSR is examining the following uses:
>>>>   LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>>     reg({0,+,-1}<nw><%while.body>)
>>>>   LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
>>>>     reg({@buffer,+,8}<nsw><%while.body>)
>>>>
>>>> After generating reuse formulae:
>>>> LSR is examining the following uses:
>>>>   LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>>     reg({0,+,-1}<nw><%while.body>)
>>>>     reg({0,+,8}<nuw><nsw><%while.body>)
>>>>     reg({0,+,1}<nuw><nsw><%while.body>)
>>>>   LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
>>>>     reg({@buffer,+,8}<nsw><%while.body>)
>>>>     reg(@buffer) + 1*reg({0,+,8}<nuw><nsw><%while.body>)
>>>> Filtering for use LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>>   Filtering out formula reg({0,+,1}<nuw><nsw><%while.body>)
>>>>     in favor of formula reg({0,+,-1}<nw><%while.body>)
>>>> Filtering for use LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
>>>>
>>>> After filtering out undesirable candidates:
>>>> LSR is examining the following uses:
>>>>   LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>>     reg({0,+,-1}<nw><%while.body>)
>>>>     reg({0,+,8}<nuw><nsw><%while.body>)
>>>>   LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
>>>>     reg({@buffer,+,8}<nsw><%while.body>)
>>>>     reg(@buffer) + 1*reg({0,+,8}<nuw><nsw><%while.body>)
>>>> New best at 2 instructions 2 regs, with addrec cost 2.
>>>>   Regs: {0,+,-1}<nw><%while.body> {@buffer,+,8}<nsw><%while.body>
>>>> New best at 2 instructions 2 regs, with addrec cost 1, plus 1 base add.
>>>>   Regs: {0,+,8}<nuw><nsw><%while.body> @buffer
>>>>
>>>> The chosen solution requires 2 instructions 2 regs, with addrec cost 1, plus 1 base add:
>>>>   LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>>     reg({0,+,8}<nuw><nsw><%while.body>)
>>>>   LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
>>>>     reg(@buffer) + 1*reg({0,+,8}<nuw><nsw><%while.body>)
>>>>
>>>>
>>>> (-debug-only=scalar-evolution,loop-reduce) for ARM:
>>>>
>>>> LSR on loop %while.body:
>>>> Collecting IV Chains.
>>>>   IV Chain#0 Head: (%0 = load i32, i32* %arrayidx, align 4, !tbaa !2) IV={@buffer,+,4}<nsw><%while.body>
>>>>   IV Chain#1 Head: (%cmp11 = icmp eq i32 %i.010, 0) IV={0,+,1}<nuw><nsw><%while.body>
>>>>   IV Chain#1 Inc: (%i.010 = phi i32 [ 0, %entry ], [ %inc, %while.body ]) IV+1
>>>> Chain: %cmp11 = icmp eq i32 %i.010, 0 Cost: 0
>>>> LSR has identified the following interesting factors and types: *4
>>>> LSR is examining the following fixup sites:
>>>>   UserInst=%cmp11, OperandValToReplace=%i.010
>>>>   UserInst=%0, OperandValToReplace=%arrayidx
>>>> LSR found 2 uses:
>>>> LSR is examining the following uses:
>>>>   LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>>     reg({0,+,-1}<nw><%while.body>)
>>>>   LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
>>>>     reg({@buffer,+,4}<nsw><%while.body>)
>>>>
>>>> After generating reuse formulae:
>>>> LSR is examining the following uses:
>>>>   LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>>     reg({0,+,-1}<nw><%while.body>)
>>>>     reg({0,+,4}<nuw><nsw><%while.body>)
>>>>     reg({0,+,1}<nuw><nsw><%while.body>)
>>>>   LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
>>>>     reg({@buffer,+,4}<nsw><%while.body>)
>>>>     reg(@buffer) + 1*reg({0,+,4}<nuw><nsw><%while.body>)
>>>>     -1*reg({(-1 * @buffer),+,-4}<nw><%while.body>)
>>>>     reg(@buffer) + 4*reg({0,+,1}<nuw><nsw><%while.body>)
>>>>     reg(@buffer) + -4*reg({0,+,-1}<nw><%while.body>)
>>>>     reg(@buffer) + -1*reg({0,+,-4}<nw><%while.body>)
>>>> Filtering for use LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>> Filtering for use LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
>>>>   Filtering out formula -1*reg({(-1 * @buffer),+,-4}<nw><%while.body>)
>>>>     in favor of formula reg({@buffer,+,4}<nsw><%while.body>)
>>>>   Filtering out formula reg(@buffer) + -1*reg({0,+,-4}<nw><%while.body>)
>>>>     in favor of formula reg({@buffer,+,4}<nsw><%while.body>)
>>>>
>>>> After filtering out undesirable candidates:
>>>> LSR is examining the following uses:
>>>>   LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>>     reg({0,+,-1}<nw><%while.body>)
>>>>     reg({0,+,4}<nuw><nsw><%while.body>)
>>>>     reg({0,+,1}<nuw><nsw><%while.body>)
>>>>   LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
>>>>     reg({@buffer,+,4}<nsw><%while.body>)
>>>>     reg(@buffer) + 1*reg({0,+,4}<nuw><nsw><%while.body>)
>>>>     reg(@buffer) + -4*reg({0,+,-1}<nw><%while.body>)
>>>>     reg(@buffer) + 4*reg({0,+,1}<nuw><nsw><%while.body>)
>>>> New best at 1 instruction 2 regs, with addrec cost 1.
>>>>   Regs: {0,+,-1}<nw><%while.body> @buffer
>>>>
>>>> The chosen solution requires 1 instruction 2 regs, with addrec cost 1:
>>>>   LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
>>>>     reg({0,+,-1}<nw><%while.body>)
>>>>   LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
>>>>     reg(@buffer) + -4*reg({0,+,-1}<nw><%while.body>)
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
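[Editorial note: the two address recurrences in the dump above describe the same sequence of byte addresses. A minimal Python sketch, not part of the thread, modeling the arithmetic under the poster's stated assumption that i32 occupies 8 bytes on his target (the base address 0x1000 is hypothetical):]

```python
# Compare the pre-LSR and post-LSR address recurrences.
BASE = 0x1000        # hypothetical address of @buffer
ALLOC_SIZE_I32 = 8   # bytes per i32 on this (64-bit aligned) target

# Before LSR: index %i.010 steps by 1 and is scaled by 8 at each use.
before = [BASE + i * ALLOC_SIZE_I32 for i in range(10)]

# After LSR: %lsr.iv steps by 8 and is added to the base unscaled,
# because "getelementptr i8" indexes in byte-sized units.
after = [BASE + iv for iv in range(0, 10 * ALLOC_SIZE_I32, ALLOC_SIZE_I32)]

assert before == after  # both recurrences visit the same byte addresses
```

This is why the dump itself is internally consistent: LSR folded the *8 scale out of the address computation and into the induction-variable increment.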
Eli Friedman via llvm-dev
2020-Jun-10 19:04 UTC
[llvm-dev] LoopStrengthReduction generates false code
"getelementptr i8" is a GEP over byte-size elements, so it is in fact just "@buffer + %lsr.iv". Note we bitcast the operand to i8*, then bitcast the result from i8* back to i32*.

-Eli

> -----Original Message-----
> From: Boris Boesler <baembel at gmx.de>
> Sent: Wednesday, June 10, 2020 1:59 AM
> To: Eli Friedman <efriedma at quicinc.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: [EXT] Re: [llvm-dev] LoopStrengthReduction generates false code
>
> I guess "%uglygep = getelementptr ..." will be lowered to @buffer + (%lsr.iv * StoreSize(i32)). That's what I see in the final code. But then %lsr.iv should be incremented by 1; instead, it is incremented by 8.
>
> Incrementing %lsr.iv by 8 would only make sense if the getelementptr were lowered to @buffer + %lsr.iv.
>
> Thanks for your help,
> Boris
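[Editorial note: Eli's point is that the GEP's source element type fixes the scale of the index. A short Python sketch, not part of the thread, of the flat-offset rule for a one-index GEP, where `alloc_size` comes from the target DataLayout:]

```python
# Flat-offset rule for a one-index GEP: the index is multiplied by the
# alloc size of the element type being indexed.
def gep_offset(index: int, alloc_size: int) -> int:
    return index * alloc_size

# On a byte-addressed target with 4-byte i32 (like ARM above), an i32
# index of 3 corresponds to a byte (i8) index of 12:
assert gep_offset(3, 4) == gep_offset(12, 1)

# On the arch in question i32 has an alloc size of 8, so the matching
# byte index steps by 8 -- exactly the "add nuw i32 %lsr.iv, 8":
assert gep_offset(3, 8) == gep_offset(24, 1)
```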
Boris Boesler via llvm-dev
2020-Jun-12 08:37 UTC
[llvm-dev] LoopStrengthReduction generates false code
Sorry, no. First, I made some small changes to my backend, so the code is slightly different. But I think that is irrelevant: the while loop is unchanged, only the end block differs. (Generated IR below.)

I added some debug printing in SelectionDAGBuilder::visitGetElementPtr(), and the 3 GEPs are lowered to two "base + %lsr.iv * 8" and one "base + offset". Also, if I align i32 to 32 bits (which is illegal on this arch!), then the expected code is generated.

I'll have a closer look at SelectionDAGBuilder::visitGetElementPtr().

Boris

New code:

@buffer = common dso_local global [10 x i32] zeroinitializer, align 4

; Function Attrs: nounwind
define dso_local void @some_main(i32* %result) local_unnamed_addr #0 {
entry:
  tail call void @fill_array(i32* getelementptr inbounds ([10 x i32], [10 x i32]* @buffer, i32 0, i32 0)) #2
  br label %while.body

while.body:                                       ; preds = %while.body, %entry
  %i.010 = phi i32 [ 0, %entry ], [ %inc, %while.body ]
  %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* @buffer, i32 0, i32 %i.010
  %0 = load i32, i32* %arrayidx, align 4, !tbaa !2
  %cmp1 = icmp ne i32 %0, -559038737
  %inc = add nuw nsw i32 %i.010, 1
  %cmp11 = icmp eq i32 %i.010, 0
  %cmp = or i1 %cmp11, %cmp1
  br i1 %cmp, label %while.body, label %while.end

while.end:                                        ; preds = %while.body
  %arrayidx2 = getelementptr inbounds [10 x i32], [10 x i32]* @buffer, i32 0, i32 %i.010
  %1 = load i32, i32* %arrayidx2, align 4, !tbaa !2
  store volatile i32 %1, i32* %result, align 4, !tbaa !2
  ret void
}

declare dso_local void @fill_array(i32*) local_unnamed_addr #1

*** IR Dump After Loop Strength Reduction ***
; Preheader:
entry:
  tail call void @fill_array(i32* getelementptr inbounds ([10 x i32], [10 x i32]* @buffer, i32 0, i32 0)) #2
  br label %while.body

; Loop:
while.body:                                       ; preds = %while.body, %entry
  %lsr.iv = phi i32 [ %lsr.iv.next, %while.body ], [ 0, %entry ]
  %uglygep2 = getelementptr i8, i8* bitcast ([10 x i32]* @buffer to i8*), i32 %lsr.iv
  %uglygep23 = bitcast i8* %uglygep2 to i32*
  %0 = load i32, i32* %uglygep23, align 4, !tbaa !2
  %cmp1 = icmp ne i32 %0, -559038737
  %cmp11 = icmp eq i32 %lsr.iv, 0
  %cmp = or i1 %cmp11, %cmp1
  %lsr.iv.next = add nuw i32 %lsr.iv, 8
  br i1 %cmp, label %while.body, label %while.end

; Exit blocks
while.end:                                        ; preds = %while.body
  %uglygep = getelementptr i8, i8* bitcast ([10 x i32]* @buffer to i8*), i32 %lsr.iv.next
  %uglygep1 = bitcast i8* %uglygep to i32*
  %scevgep = getelementptr i32, i32* %uglygep1, i32 -1
  %1 = load i32, i32* %scevgep, align 4, !tbaa !2
  store volatile i32 %1, i32* %result, align 4, !tbaa !2
  ret void

> Am 10.06.2020 um 21:04 schrieb Eli Friedman <efriedma at quicinc.com>:
>
> "getelementptr i8" is a GEP over byte-size elements, so it is in fact just "@buffer + %lsr.iv". Note we bitcast the operand to i8*, then bitcast the result from i8* back to i32*.
>
> -Eli
Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup >>>> type: >>>>>> i32* >>>>>> reg({@buffer,+,8}<nsw><%while.body>) >>>>>> reg(@buffer) + 1*reg({0,+,8}<nuw><nsw><%while.body>) >>>>>> New best at 2 instructions 2 regs, with addrec cost 2. >>>>>> Regs: {0,+,-1}<nw><%while.body> {@buffer,+,8}<nsw><%while.body> >>>>>> New best at 2 instructions 2 regs, with addrec cost 1, plus 1 base add. >>>>>> Regs: {0,+,8}<nuw><nsw><%while.body> @buffer >>>>>> >>>>>> The chosen solution requires 2 instructions 2 regs, with addrec cost 1, >>>> plus 1 >>>>>> base add: >>>>>> LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32 >>>>>> reg({0,+,8}<nuw><nsw><%while.body>) >>>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup >>>> type: >>>>>> i32* >>>>>> reg(@buffer) + 1*reg({0,+,8}<nuw><nsw><%while.body>) >>>>>> >>>>>> >>>>>> (-debug-only=scalar-evolution,loop-reduce) for ARM: >>>>>> >>>>>> LSR on loop %while.body: >>>>>> Collecting IV Chains. >>>>>> IV Chain#0 Head: ( %0 = load i32, i32* %arrayidx, align 4, !tbaa !2) >>>>>> IV={@buffer,+,4}<nsw><%while.body> >>>>>> IV Chain#1 Head: ( %cmp11 = icmp eq i32 %i.010, 0) >>>>>> IV={0,+,1}<nuw><nsw><%while.body> >>>>>> IV Chain#1 Inc: ( %i.010 = phi i32 [ 0, %entry ], [ %inc, %while.body ]) >> IV+1 >>>>>> Chain: %cmp11 = icmp eq i32 %i.010, 0 Cost: 0 >>>>>> LSR has identified the following interesting factors and types: *4 >>>>>> LSR is examining the following fixup sites: >>>>>> UserInst=%cmp11, OperandValToReplace=%i.010 >>>>>> UserInst=%0, OperandValToReplace=%arrayidx >>>>>> LSR found 2 uses: >>>>>> LSR is examining the following uses: >>>>>> LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32 >>>>>> reg({0,+,-1}<nw><%while.body>) >>>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup >>>> type: >>>>>> i32* >>>>>> reg({@buffer,+,4}<nsw><%while.body>) >>>>>> >>>>>> After generating reuse formulae: >>>>>> LSR is examining the following uses: >>>>>> LSR Use: 
Kind=ICmpZero, Offsets={0}, widest fixup type: i32 >>>>>> reg({0,+,-1}<nw><%while.body>) >>>>>> reg({0,+,4}<nuw><nsw><%while.body>) >>>>>> reg({0,+,1}<nuw><nsw><%while.body>) >>>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup >>>> type: >>>>>> i32* >>>>>> reg({@buffer,+,4}<nsw><%while.body>) >>>>>> reg(@buffer) + 1*reg({0,+,4}<nuw><nsw><%while.body>) >>>>>> -1*reg({(-1 * @buffer),+,-4}<nw><%while.body>) >>>>>> reg(@buffer) + 4*reg({0,+,1}<nuw><nsw><%while.body>) >>>>>> reg(@buffer) + -4*reg({0,+,-1}<nw><%while.body>) >>>>>> reg(@buffer) + -1*reg({0,+,-4}<nw><%while.body>) >>>>>> Filtering for use LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: >>>> i32 >>>>>> Filtering for use LSR Use: Kind=Address of i32 in addrspace(0), >> Offsets={0}, >>>>>> widest fixup type: i32* >>>>>> Filtering out formula -1*reg({(-1 * @buffer),+,-4}<nw><%while.body>) >>>>>> in favor of formula reg({@buffer,+,4}<nsw><%while.body>) >>>>>> Filtering out formula reg(@buffer) + -1*reg({0,+,-4}<nw><%while.body>) >>>>>> in favor of formula reg({@buffer,+,4}<nsw><%while.body>) >>>>>> >>>>>> After filtering out undesirable candidates: >>>>>> LSR is examining the following uses: >>>>>> LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32 >>>>>> reg({0,+,-1}<nw><%while.body>) >>>>>> reg({0,+,4}<nuw><nsw><%while.body>) >>>>>> reg({0,+,1}<nuw><nsw><%while.body>) >>>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup >>>> type: >>>>>> i32* >>>>>> reg({@buffer,+,4}<nsw><%while.body>) >>>>>> reg(@buffer) + 1*reg({0,+,4}<nuw><nsw><%while.body>) >>>>>> reg(@buffer) + -4*reg({0,+,-1}<nw><%while.body>) >>>>>> reg(@buffer) + 4*reg({0,+,1}<nuw><nsw><%while.body>) >>>>>> New best at 1 instruction 2 regs, with addrec cost 1. 
>>>>>> Regs: {0,+,-1}<nw><%while.body> @buffer >>>>>> >>>>>> The chosen solution requires 1 instruction 2 regs, with addrec cost 1: >>>>>> LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32 >>>>>> reg({0,+,-1}<nw><%while.body>) >>>>>> LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup >>>> type: >>>>>> i32* >>>>>> reg(@buffer) + -4*reg({0,+,-1}<nw><%while.body>) >>>>>> >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> llvm-dev at lists.llvm.org >>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >