Zhou Sheng
2008-Dec-09 06:48 UTC
[LLVMdev] scalar-evolution + indvars fail to get the loop trip count?
Hi, Seems pass scalar-evolution+indvars fail to get the loop trip count of the following case: int foo(int x, int y, int lam[256], int alp[256]) { int i; int z = y; for (i = 255; i >= 0; i--) { z += x; lam[i] = alp[i]; } return z; } The final optimized ll code is : define i32 @foo(i32 %x, i32 %y, i32* %lam, i32* %alp) nounwind { entry: br label %bb bb: ; preds = %bb, %entry %indvar = phi i32 [ 0, %entry ], [ %indvar.next, %bb ] ; <i32> [#uses=4] %i.0.reg2mem.0 = sub i32 255, %indvar ; <i32> [#uses=2] %tmp7 = getelementptr i32* %alp, i32 %i.0.reg2mem.0 ; <i32*> [#uses=1] %tmp8 = load i32* %tmp7, align 4 ; <i32> [#uses=1] %tmp11 = getelementptr i32* %lam, i32 %i.0.reg2mem.0 ; <i32*> [#uses=1] store i32 %tmp8, i32* %tmp11, align 4 %tmp13 = sub i32 254, %indvar ; <i32> [#uses=1] %tmp16 = icmp slt i32 %tmp13, 0 ; <i1> [#uses=1] %indvar.next = add i32 %indvar, 1 ; <i32> [#uses=1] br i1 %tmp16, label %bb17, label %bb bb17: ; preds = %bb %indvar.lcssa = phi i32 [ %indvar, %bb ] ; <i32> [#uses=1] %tmp28 = mul i32 %indvar.lcssa, %x ; <i32> [#uses=1] %z.0.reg2mem.0 = add i32 %y, %x ; <i32> [#uses=1] %tmp4 = add i32 %z.0.reg2mem.0, %tmp28 ; <i32> [#uses=1] ret i32 %tmp4 } Surely the loop trip count is 256, but the Loop::getTripCount() will return NULL. This is due to the exit condition: %tmp16 = icmp slt i32 %tmp13, 0 ; <i1> [#uses=1] pass indvars should transform it into a canonical exit condtion EQ or NE, but as scalarevolution returned NonComputedItCount, it failed to do that. In pass ScalarEvolution three algorithms failed to compute the trip count: 1. In SCEVAddRecExpr::getNumIterationsInRange(), as the loop stride is negative, the exit value expression will be zero: (Here A is the stride value) // The exit value should be (End+A)/A. APInt ExitVal = (End + A).udiv(A); ConstantInt *ExitValue = ConstantInt::get(ExitVal); 2. In Line 1953, the switch-case computes the trip count due to different Cond code. But in this case, as it exits on true, the slt is converted to sge, not switch-case for this condtion. 3. As the loop count is over 100, the Brute-Force algorithm also doent work Can we improve this? Thanks, Sheng. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20081209/26c31c44/attachment.html>
Nick Lewycky
2008-Dec-09 07:25 UTC
[LLVMdev] scalar-evolution + indvars fail to get the loop trip count?
Zhou Sheng wrote:> Hi, > > Seems pass scalar-evolution+indvars fail to get the loop trip count of > the following case: > > int foo(int x, int y, int lam[256], int alp[256]) { > int i; > int z = y; > > for (i = 255; i >= 0; i--) { > z += x; > lam[i] = alp[i]; > } > return z; > } > > > The final optimized ll code is : > > define i32 @foo(i32 %x, i32 %y, i32* %lam, i32* %alp) nounwind { > entry: > br label %bb > > bb: ; preds = %bb, %entry > %indvar = phi i32 [ 0, %entry ], [ %indvar.next, %bb ] > ; <i32> [#uses=4] > %i.0.reg2mem.0 = sub i32 255, %indvar ; <i32> [#uses=2] > %tmp7 = getelementptr i32* %alp, i32 %i.0.reg2mem.0 > ; <i32*> [#uses=1] > %tmp8 = load i32* %tmp7, align 4 ; <i32> [#uses=1] > %tmp11 = getelementptr i32* %lam, i32 %i.0.reg2mem.0 > ; <i32*> [#uses=1] > store i32 %tmp8, i32* %tmp11, align 4 > %tmp13 = sub i32 254, %indvar ; <i32> [#uses=1] > %tmp16 = icmp slt i32 %tmp13, 0 ; <i1> [#uses=1] > %indvar.next = add i32 %indvar, 1 ; <i32> [#uses=1] > br i1 %tmp16, label %bb17, label %bb > > bb17: ; preds = %bb > %indvar.lcssa = phi i32 [ %indvar, %bb ] ; <i32> > [#uses=1] > %tmp28 = mul i32 %indvar.lcssa, %x ; <i32> [#uses=1] > %z.0.reg2mem.0 = add i32 %y, %x ; <i32> [#uses=1] > %tmp4 = add i32 %z.0.reg2mem.0, %tmp28 ; <i32> [#uses=1] > ret i32 %tmp4 > }Having the final .ll file doesn't help debug this. If you run opt -analyze -scalar-evolution on the .ll you pasted, it will correctly print out the loop trip count. I've modified llvm-gcc to remove all the passes after indvars.> Surely the loop trip count is 256, but the Loop::getTripCount() will > return NULL. > This is due to the exit condition: > > %tmp16 = icmp slt i32 %tmp13, 0 ; <i1> [#uses=1] > > pass indvars should transform it into a canonical exit condtion EQ or > NE, but as scalarevolution returned NonComputedItCount, it failed to do > that. > > In pass ScalarEvolution three algorithms failed to compute the trip count: > > 1. In SCEVAddRecExpr::getNumIterationsInRange(), as the loop stride is > negative, the exit value expression will be zero: (Here A is the stride > value) > > // The exit value should be (End+A)/A. > APInt ExitVal = (End + A).udiv(A); > ConstantInt *ExitValue = ConstantInt::get(ExitVal);Yes, that code path will fail for this example.> 2. In Line 1953, the switch-case computes the trip count due to > different Cond code. > But in this case, as it exits on true, the slt is converted to sge, > not switch-case for this condtion.Not sure what you mean. The problem is that HowManyLessThans was refusing to work on a signed less-than or equal to, even in the trivial case (just because I haven't yet gotten around to working on the complex case of a non-unit stride).> 3. As the loop count is over 100, the Brute-Force algorithm also doent work > > Can we improve this?Done. Committed in r60748. Nick Lewycky> Thanks, > Sheng. > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev