Chawla, Pankaj via llvm-dev
2019-Aug-21 00:35 UTC
[llvm-dev] missing simplification in ScalarEvolution?
Thanks for the suggestion but datalayout info did not solve the problem! -Pankaj -----Original Message----- From: Philip Reames <listmail at philipreames.com> Sent: Tuesday, August 20, 2019 5:26 PM To: Chawla, Pankaj <pankaj.chawla at intel.com>; llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] missing simplification in ScalarEvolution? Try adding a datalayout with pointer size information to your reduced case, and see what happens. Not sure this is your problem, but I've been bitten by this before with hand reduced examples... Philip On 8/20/19 3:43 PM, Chawla, Pankaj via llvm-dev wrote:> Hi, > > I have this small test case- > > %struct1 = type { i32, i32 } > > @glob_const = internal constant [4 x %struct1] [%struct1 { i32 4, i32 > 5 }, %struct1 { i32 8, i32 9 }, %struct1 { i32 16, i32 0 }, %struct1 { > i32 32, i32 10 }], align 16 > > define void @foo() { > entry: > br label %loop > > loop: ; preds = %loop, %entry > %iv = phi %struct1* [ getelementptr inbounds ([4 x %struct1], [4 x %struct1]* @glob_const, i64 0, i64 0), %entry ], [ %iv.inc, %loop ] > %gep = getelementptr inbounds %struct1, %struct1* %iv, i64 0, i32 0 > %ld = load i32, i32* %gep, align 8 > %iv.inc = getelementptr inbounds %struct1, %struct1* %iv, i64 1 > %cmp = icmp ult %struct1* %iv.inc, getelementptr inbounds ([4 x %struct1], [4 x %struct1]* @glob_const, i64 1, i64 0) > br i1 %cmp, label %loop, label %exit > > exit: > ret void > } > > I was expecting the backedge taken count of the loop to be evaluated > to 3 by ScalarEvolution but instead it evaluates to this- > > Loop %loop: backedge-taken count is ((-1 + (-1 * @glob_const) + ((8 + > @glob_const)<nuw><nsw> umax (32 + @glob_const)<nsw>)) /u 8) Loop %loop: max backedge-taken count is 3, actual taken count either this or zero. > > > Can we deduce the final value of the IV (32 + @glob_const) to be greater than the initial value (8 + @glob_const) and simplify the backedge taken count to 3? > > Thanks, > Pankaj > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Sanjoy Das via llvm-dev
2019-Aug-25 20:52 UTC
[llvm-dev] missing simplification in ScalarEvolution?
I didn't step through this in a debugger, but I suspect we could fix the problem by: 1. Teaching SCEV to infer nuw for (32 + @glob_const)<nsw> (if legal) 2. Generalize ScalarEvolution::isKnownPredicateViaNoOverflow to work with this case That should fold the umax and give us a more precise trip count. -- Sanjoy On Tue, Aug 20, 2019 at 5:36 PM Chawla, Pankaj via llvm-dev <llvm-dev at lists.llvm.org> wrote:> > Thanks for the suggestion but datalayout info did not solve the problem! > > -Pankaj > > -----Original Message----- > From: Philip Reames <listmail at philipreames.com> > Sent: Tuesday, August 20, 2019 5:26 PM > To: Chawla, Pankaj <pankaj.chawla at intel.com>; llvm-dev at lists.llvm.org > Subject: Re: [llvm-dev] missing simplification in ScalarEvolution? > > Try adding a datalayout with pointer size information to your reduced case, and see what happens. Not sure this is your problem, but I've been bitten by this before with hand reduced examples... > > Philip > > On 8/20/19 3:43 PM, Chawla, Pankaj via llvm-dev wrote: > > Hi, > > > > I have this small test case- > > > > %struct1 = type { i32, i32 } > > > > @glob_const = internal constant [4 x %struct1] [%struct1 { i32 4, i32 > > 5 }, %struct1 { i32 8, i32 9 }, %struct1 { i32 16, i32 0 }, %struct1 { > > i32 32, i32 10 }], align 16 > > > > define void @foo() { > > entry: > > br label %loop > > > > loop: ; preds = %loop, %entry > > %iv = phi %struct1* [ getelementptr inbounds ([4 x %struct1], [4 x %struct1]* @glob_const, i64 0, i64 0), %entry ], [ %iv.inc, %loop ] > > %gep = getelementptr inbounds %struct1, %struct1* %iv, i64 0, i32 0 > > %ld = load i32, i32* %gep, align 8 > > %iv.inc = getelementptr inbounds %struct1, %struct1* %iv, i64 1 > > %cmp = icmp ult %struct1* %iv.inc, getelementptr inbounds ([4 x %struct1], [4 x %struct1]* @glob_const, i64 1, i64 0) > > br i1 %cmp, label %loop, label %exit > > > > exit: > > ret void > > } > > > > I was expecting the backedge taken count of the loop to be evaluated > > to 3 by ScalarEvolution but instead it evaluates to this- > > > > Loop %loop: backedge-taken count is ((-1 + (-1 * @glob_const) + ((8 + > > @glob_const)<nuw><nsw> umax (32 + @glob_const)<nsw>)) /u 8) Loop %loop: max backedge-taken count is 3, actual taken count either this or zero. > > > > > > Can we deduce the final value of the IV (32 + @glob_const) to be greater than the initial value (8 + @glob_const) and simplify the backedge taken count to 3? > > > > Thanks, > > Pankaj > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Chawla, Pankaj via llvm-dev
2019-Aug-26 01:19 UTC
[llvm-dev] missing simplification in ScalarEvolution?
Hi Sanjoy, Thanks for the reply! Your approach sounds good to me! I think 1) is legal as address wraparound in unsigned range doesn't make sense given a positive offset, but I am not sure. I think umax will not be added if we can prove the predicate as known. I am not sure whether umax will get simplified if we add nuw to the expressions. -Pankaj -----Original Message----- From: Sanjoy Das <sanjoy at playingwithpointers.com> Sent: Sunday, August 25, 2019 1:53 PM To: Chawla, Pankaj <pankaj.chawla at intel.com> Cc: Philip Reames <listmail at philipreames.com>; llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] missing simplification in ScalarEvolution? I didn't step through this in a debugger, but I suspect we could fix the problem by: 1. Teaching SCEV to infer nuw for (32 + @glob_const)<nsw> (if legal) 2. Generalize ScalarEvolution::isKnownPredicateViaNoOverflow to work with this case That should fold the umax and give us a more precise trip count. -- Sanjoy On Tue, Aug 20, 2019 at 5:36 PM Chawla, Pankaj via llvm-dev <llvm-dev at lists.llvm.org> wrote:> > Thanks for the suggestion but datalayout info did not solve the problem! > > -Pankaj > > -----Original Message----- > From: Philip Reames <listmail at philipreames.com> > Sent: Tuesday, August 20, 2019 5:26 PM > To: Chawla, Pankaj <pankaj.chawla at intel.com>; llvm-dev at lists.llvm.org > Subject: Re: [llvm-dev] missing simplification in ScalarEvolution? > > Try adding a datalayout with pointer size information to your reduced case, and see what happens. Not sure this is your problem, but I've been bitten by this before with hand reduced examples... > > Philip > > On 8/20/19 3:43 PM, Chawla, Pankaj via llvm-dev wrote: > > Hi, > > > > I have this small test case- > > > > %struct1 = type { i32, i32 } > > > > @glob_const = internal constant [4 x %struct1] [%struct1 { i32 4, > > i32 > > 5 }, %struct1 { i32 8, i32 9 }, %struct1 { i32 16, i32 0 }, %struct1 > > { > > i32 32, i32 10 }], align 16 > > > > define void @foo() { > > entry: > > br label %loop > > > > loop: ; preds = %loop, %entry > > %iv = phi %struct1* [ getelementptr inbounds ([4 x %struct1], [4 x %struct1]* @glob_const, i64 0, i64 0), %entry ], [ %iv.inc, %loop ] > > %gep = getelementptr inbounds %struct1, %struct1* %iv, i64 0, i32 0 > > %ld = load i32, i32* %gep, align 8 > > %iv.inc = getelementptr inbounds %struct1, %struct1* %iv, i64 1 > > %cmp = icmp ult %struct1* %iv.inc, getelementptr inbounds ([4 x %struct1], [4 x %struct1]* @glob_const, i64 1, i64 0) > > br i1 %cmp, label %loop, label %exit > > > > exit: > > ret void > > } > > > > I was expecting the backedge taken count of the loop to be evaluated > > to 3 by ScalarEvolution but instead it evaluates to this- > > > > Loop %loop: backedge-taken count is ((-1 + (-1 * @glob_const) + ((8 > > + @glob_const)<nuw><nsw> umax (32 + @glob_const)<nsw>)) /u 8) Loop %loop: max backedge-taken count is 3, actual taken count either this or zero. > > > > > > Can we deduce the final value of the IV (32 + @glob_const) to be greater than the initial value (8 + @glob_const) and simplify the backedge taken count to 3? > > > > Thanks, > > Pankaj > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev