Sanjoy Das
2014-Dec-18 20:31 UTC
[LLVMdev] missing optimization for icmps in induction variables?
Hi all,

I'm trying to get llvm to fold %cmp to true in

define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) {
 entry:
  %length = load i32* %length_ptr, !range !0
  %len.sub.1 = sub i32 %length, 1
  %upper = icmp slt i32 %init, %len.sub.1
  br i1 %upper, label %loop, label %exit

 loop:
  %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
  %civ.inc = add i32 %civ, 1
  %cmp = icmp slt i32 %civ.inc, %length
  br i1 %cmp, label %latch, label %break

 latch:
  store i32 0, i32* %array
  %check = icmp slt i32 %civ.inc, %len.sub.1
  br i1 %check, label %loop, label %break

 break:
  ret i32 %civ.inc

 exit:
  ret i32 42
}

!0 = !{i32 0, i32 2147483647}

One way to prove "%cmp == true" is in two steps:

 1. notice that, on both the backedge and the entry, %civ is known to
    be less than %len.sub.1, which is not i32_signed_max.  This means
    %civ.inc is an "add nsw".

 2. on both the entry and the backedge, we know "%civ `slt` %len.sub.1".
    This implies "(%civ nsw+ 1) `slt` (%len.sub.1 nsw+ 1)" ==>
    "%civ.inc `slt` %length".

Currently neither of these happens (i.e. even if I make transformation
(1) manually, (2) does not kick in).

Is the above reasoning correct?  If it is, what is the right place to
add this logic?  I'm thinking ScalarEvolution (so that this gets picked
up by SimplifyIndVar), but maybe there is a more idiomatic place?  The
case above is a simplified form of my real workload, which involves
smin and smax expressions, so the implementation has to be easily
generalizable to such cases.

Thanks,
-- Sanjoy
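P.S. For concreteness, the loop block I'd like to end up with (a
hand-written sketch of the desired result, not the output of any
existing pass) is:

 loop:
  %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
  %civ.inc = add nsw i32 %civ, 1            ; step (1)
  br i1 true, label %latch, label %break    ; step (2): %cmp folded to true

Once %cmp is the constant true, the conditional branch folds to an
unconditional branch to %latch, and the only remaining exit test is
%check in the latch.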
Nick Lewycky
2015-Jan-08 06:06 UTC
[LLVMdev] missing optimization for icmps in induction variables?
Sanjoy Das wrote:
> [...]
>
> Is the above reasoning correct?  If it is, what is the right place to
> add this logic?  I'm thinking ScalarEvolution (so that this gets picked
> up by SimplifyIndVar), but maybe there is a more idiomatic place?  The
> case above is a simplified form of my real workload, which involves
> smin and smax expressions, so the implementation has to be easily
> generalizable to such cases.

Before reading your two steps I was going to suggest jump threading.
Jump threading is where we optimize redundant tests across blocks that
feed into branches (block A tests property X and branches to block B,
which also tests property X). However, jump threading is powered by
lazy value info, which I don't think is suited for the sort of
reasoning in your two steps.

One option is GVN. GVN does have x < y expressions, but it doesn't try
to deduce nuw/nsw bits. It might be possible to add that, but it isn't
immediately obvious to me how. GVN also does path-sensitive expression
commoning.

Nick
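P.S. To make "redundant tests across blocks" concrete, here is a
hand-written toy example (purely illustrative, not taken from your IR).
When %b is entered from %a, %t2 is already known to be true, so jump
threading can send the %a -> %b edge straight to %then:

define i32 @example(i32 %x) {
 a:
  %t1 = icmp slt i32 %x, 10
  br i1 %t1, label %b, label %out

 b:
  %t2 = icmp slt i32 %x, 10
  br i1 %t2, label %then, label %out

 then:
  ret i32 1

 out:
  ret i32 0
}

In your IR the corresponding compares are against %len.sub.1 and
%length rather than a shared bound, and relating two non-constant
values like that is, as far as I know, not something lazy value info
tracks.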
Sanjoy Das
2015-Jan-08 07:20 UTC
[LLVMdev] missing optimization for icmps in induction variables?
Hi Nick,

I checked in something towards (1) yesterday --
http://reviews.llvm.org/D6748

I was under the impression that (2) is exactly the kind of predicate
ScalarEvolution::isKnownPredicate is designed to solve (using
isImpliedCondXXX or something like that).  Is there a reason to prefer
GVN over that?

On Wed, Jan 7, 2015 at 10:06 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
> Sanjoy Das wrote:
>> [...]
>
> Before reading your two steps I was going to suggest jump threading.
> Jump threading is where we optimize redundant tests across blocks that
> feed into branches (block A tests property X and branches to block B,
> which also tests property X). However, jump threading is powered by
> lazy value info, which I don't think is suited for the sort of
> reasoning in your two steps.
>
> One option is GVN. GVN does have x < y expressions, but it doesn't try
> to deduce nuw/nsw bits. It might be possible to add that, but it isn't
> immediately obvious to me how. GVN also does path-sensitive expression
> commoning.
>
> Nick