Sanjoy Das
2015-Jan-08 07:20 UTC
[LLVMdev] missing optimization for icmps in induction variables?
Hi Nick,

I checked in something towards (1) yesterday -- http://reviews.llvm.org/D6748

I was under the impression that (2) is exactly the kind of predicate
ScalarEvolution::isKnownPredicate is designed to solve (using
isImpliedCondXXX or something like that). Is there a reason to prefer
GVN over that?

On Wed, Jan 7, 2015 at 10:06 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
> Sanjoy Das wrote:
>>
>> Hi all,
>>
>> I'm trying to get llvm to optimize away the %cmp to true in
>>
>> define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) {
>>  entry:
>>   %length = load i32* %length_ptr, !range !0
>>   %len.sub.1 = sub i32 %length, 1
>>   %upper = icmp slt i32 %init, %len.sub.1
>>   br i1 %upper, label %loop, label %exit
>>
>>  loop:
>>   %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
>>   %civ.inc = add i32 %civ, 1
>>   %cmp = icmp slt i32 %civ.inc, %length
>>   br i1 %cmp, label %latch, label %break
>>
>>  latch:
>>   store i32 0, i32* %array
>>   %check = icmp slt i32 %civ.inc, %len.sub.1
>>   br i1 %check, label %loop, label %break
>>
>>  break:
>>   ret i32 %civ.inc
>>
>>  exit:
>>   ret i32 42
>> }
>>
>> !0 = !{i32 0, i32 2147483647}
>>
>> One way to prove "%cmp == true" in two steps:
>>
>> 1. notice that, on both the entry and the backedge, %civ is known to
>>    be less than %len.sub.1, which is not i32_signed_max. This means
>>    %civ.inc is an "add nsw".
>>
>> 2. on both the entry and the backedge, we know "%civ `slt` %len.sub.1".
>>    This implies "(%civ nsw+ 1) `slt` (%len.sub.1 nsw+ 1)" ==>
>>    "%civ.inc `slt` %length".
>>
>> Currently neither of these happens (i.e. even if I make transformation
>> (1) manually, (2) does not kick in).
>>
>> Is the above reasoning correct? If it is, what is the right place to
>> add this logic to? I'm thinking ScalarEvolution (so that this gets
>> picked up by SimplifyIndVar), but maybe there is a more idiomatic
>> place? The case above is a simplified form of my real workload, which
>> involves smin and smax expressions, so the implementation has to be
>> easily generalizable to such cases.
>
> Before reading your two steps I was going to suggest jump threading.
> Jump threading is where we optimize redundant tests across blocks that
> feed into branches (block A tests property X and branches to block B
> which also tests property X). However, jump threading is powered by
> lazy value info, which I don't think is suited for the sort of
> reasoning in your two steps.
>
> One option is GVN. GVN does have x < y expressions, but it doesn't try
> to deduce nuw/nsw bits. It might be possible to add that, but it isn't
> immediately obvious to me how. GVN also does path-sensitive expression
> commoning.
>
> Nick
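
For concreteness, a minimal sketch (not taken from D6748 itself) of what
the example's loop block would look like once both steps fire: step (1)
tags the increment with nsw, and step (2) folds %cmp to true so the
conditional branch into %latch becomes unconditional:

 loop:
  %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
  %civ.inc = add nsw i32 %civ, 1            ; step (1): increment proven not to overflow
  br i1 true, label %latch, label %break    ; step (2): %cmp is known to be true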
Nick Lewycky
2015-Jan-13 06:05 UTC
[LLVMdev] missing optimization for icmps in induction variables?
Sanjoy Das wrote:
> Hi Nick,
>
> I checked in something towards (1) yesterday -- http://reviews.llvm.org/D6748

Ah, I missed that. Catching up on email.

> I was under the impression that (2) is exactly the kind of predicate
> ScalarEvolution::isKnownPredicate is designed to solve (using
> isImpliedCondXXX or something like that). Is there a reason to prefer
> GVN over that?

If you think of the problem as a loop problem, then yes,
ScalarEvolution::isKnownPredicate makes sense. If you think of it as a
redundancy elimination problem, then GVN makes sense. Does this sort of
problem occur outside of loops?

> On Wed, Jan 7, 2015 at 10:06 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
>> Sanjoy Das wrote:
>>>
>>> Hi all,
>>>
>>> I'm trying to get llvm to optimize away the %cmp to true in
>>>
>>> define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) {
>>>  entry:
>>>   %length = load i32* %length_ptr, !range !0
>>>   %len.sub.1 = sub i32 %length, 1
>>>   %upper = icmp slt i32 %init, %len.sub.1
>>>   br i1 %upper, label %loop, label %exit
>>>
>>>  loop:
>>>   %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
>>>   %civ.inc = add i32 %civ, 1
>>>   %cmp = icmp slt i32 %civ.inc, %length
>>>   br i1 %cmp, label %latch, label %break
>>>
>>>  latch:
>>>   store i32 0, i32* %array
>>>   %check = icmp slt i32 %civ.inc, %len.sub.1
>>>   br i1 %check, label %loop, label %break
>>>
>>>  break:
>>>   ret i32 %civ.inc
>>>
>>>  exit:
>>>   ret i32 42
>>> }
>>>
>>> !0 = !{i32 0, i32 2147483647}
>>>
>>> One way to prove "%cmp == true" in two steps:
>>>
>>> 1. notice that, on both the entry and the backedge, %civ is known to
>>>    be less than %len.sub.1, which is not i32_signed_max. This means
>>>    %civ.inc is an "add nsw".
>>>
>>> 2. on both the entry and the backedge, we know "%civ `slt` %len.sub.1".
>>>    This implies "(%civ nsw+ 1) `slt` (%len.sub.1 nsw+ 1)" ==>
>>>    "%civ.inc `slt` %length".
>>>
>>> Currently neither of these happens (i.e. even if I make transformation
>>> (1) manually, (2) does not kick in).
>>>
>>> Is the above reasoning correct? If it is, what is the right place to
>>> add this logic to? I'm thinking ScalarEvolution (so that this gets
>>> picked up by SimplifyIndVar), but maybe there is a more idiomatic
>>> place? The case above is a simplified form of my real workload, which
>>> involves smin and smax expressions, so the implementation has to be
>>> easily generalizable to such cases.
>>
>> Before reading your two steps I was going to suggest jump threading.
>> Jump threading is where we optimize redundant tests across blocks that
>> feed into branches (block A tests property X and branches to block B
>> which also tests property X). However, jump threading is powered by
>> lazy value info, which I don't think is suited for the sort of
>> reasoning in your two steps.
>>
>> One option is GVN. GVN does have x < y expressions, but it doesn't try
>> to deduce nuw/nsw bits. It might be possible to add that, but it isn't
>> immediately obvious to me how. GVN also does path-sensitive expression
>> commoning.
>>
>> Nick
>
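
A minimal sketch of the redundant-test shape that jump threading (as
described in the quoted text) is built for; the blocks and values here
are hypothetical, not taken from the example above:

 A:
  %x = icmp slt i32 %a, %b
  br i1 %x, label %B, label %other

 B:                                    ; reached from A only when %x is true
  %y = icmp slt i32 %a, %b             ; same test as in A, so this branch can be folded
  br i1 %y, label %then, label %else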
Sanjoy Das
2015-Jan-14 23:23 UTC
[LLVMdev] missing optimization for icmps in induction variables?
> If you think of the problem as a loop problem, then yes,
> ScalarEvolution::isKnownPredicate makes sense. If you think of it as a
> redundancy elimination problem, then GVN makes sense. Does this sort of
> problem occur outside of loops?

I've only observed this happen inside loops, but that is only because
I've been looking at loops. :)  At least in theory, if LLVM's GVN can be
made to do this in a path-sensitive manner, then there is no need to do
such tricks using SCEV.

I've only cursorily glanced into GVN.cpp, but given that we want LLVM to
conclude that '(x nsw+ 5) SLT L' implies 'x SLT (L nsw- 5)', will the
value numbering for cmp expressions have to become richer? The operands
to the icmps themselves won't value number to the same value, because
they're not the same; but maybe there is some canonical form we want to
reduce the icmp expressions to before numbering them?

-- Sanjoy

>
>
>> On Wed, Jan 7, 2015 at 10:06 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
>>>
>>> Sanjoy Das wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I'm trying to get llvm to optimize away the %cmp to true in
>>>>
>>>> define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) {
>>>>  entry:
>>>>   %length = load i32* %length_ptr, !range !0
>>>>   %len.sub.1 = sub i32 %length, 1
>>>>   %upper = icmp slt i32 %init, %len.sub.1
>>>>   br i1 %upper, label %loop, label %exit
>>>>
>>>>  loop:
>>>>   %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
>>>>   %civ.inc = add i32 %civ, 1
>>>>   %cmp = icmp slt i32 %civ.inc, %length
>>>>   br i1 %cmp, label %latch, label %break
>>>>
>>>>  latch:
>>>>   store i32 0, i32* %array
>>>>   %check = icmp slt i32 %civ.inc, %len.sub.1
>>>>   br i1 %check, label %loop, label %break
>>>>
>>>>  break:
>>>>   ret i32 %civ.inc
>>>>
>>>>  exit:
>>>>   ret i32 42
>>>> }
>>>>
>>>> !0 = !{i32 0, i32 2147483647}
>>>>
>>>> One way to prove "%cmp == true" in two steps:
>>>>
>>>> 1. notice that, on both the entry and the backedge, %civ is known to
>>>>    be less than %len.sub.1, which is not i32_signed_max. This means
>>>>    %civ.inc is an "add nsw".
>>>>
>>>> 2. on both the entry and the backedge, we know "%civ `slt` %len.sub.1".
>>>>    This implies "(%civ nsw+ 1) `slt` (%len.sub.1 nsw+ 1)" ==>
>>>>    "%civ.inc `slt` %length".
>>>>
>>>> Currently neither of these happens (i.e. even if I make transformation
>>>> (1) manually, (2) does not kick in).
>>>>
>>>> Is the above reasoning correct? If it is, what is the right place to
>>>> add this logic to? I'm thinking ScalarEvolution (so that this gets
>>>> picked up by SimplifyIndVar), but maybe there is a more idiomatic
>>>> place? The case above is a simplified form of my real workload, which
>>>> involves smin and smax expressions, so the implementation has to be
>>>> easily generalizable to such cases.
>>>
>>> Before reading your two steps I was going to suggest jump threading.
>>> Jump threading is where we optimize redundant tests across blocks that
>>> feed into branches (block A tests property X and branches to block B
>>> which also tests property X). However, jump threading is powered by
>>> lazy value info, which I don't think is suited for the sort of
>>> reasoning in your two steps.
>>>
>>> One option is GVN. GVN does have x < y expressions, but it doesn't try
>>> to deduce nuw/nsw bits. It might be possible to add that, but it isn't
>>> immediately obvious to me how. GVN also does path-sensitive expression
>>> commoning.
>>>
>>> Nick
>>
>>
>
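
A minimal sketch (hypothetical values %x and %L, assuming both arithmetic
operations carry nsw) of the kind of comparison pair that would only
value-number together after such a canonicalization, since the icmp
operands themselves are distinct:

  %t  = add nsw i32 %x, 5
  %c1 = icmp slt i32 %t, %L      ; (x nsw+ 5) slt L
  %u  = sub nsw i32 %L, 5
  %c2 = icmp slt i32 %x, %u      ; x slt (L nsw- 5): equivalent predicate, different operands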