On Feb 13, 2014, at 2:55 PM, Matt Arsenault <Matthew.Arsenault at amd.com> wrote:
> Hi,
>
> I'm trying to solve a problem with loop induction variables being larger
> than they need to be. It looks like IndVarSimplify or LoopStrengthReduce is
> supposed to do what I want, but it isn't happening.
>
> If I have a function like this, the local pointers are 32 bits, but size_t
> is 64. 64-bit integers require an extra register (though i64 is a legal
> type), and i64 add isn't a legal operation for the target, so it should be
> avoided. The loop induction variable should be replaced with a cheaper
> 32-bit integer, since the bound and the pointer are both 32 bits. Instead,
> the bound is extended to i64, the induction variable and bounds check stay
> i64, and the index then has to be truncated to the pointer size. LSR does
> nothing after concluding that there are no "interesting" IV users.
>
> How / where should I go about fixing this? I don't really understand the
> difference between indvars and LSR.
Interesting problem. I don’t have a solution but can make a few observations.
The IR coming out of your frontend has already promoted the compare, icmp ult
%inc, %conv (where %conv is the sign-extended %num), to i64. Demoting it
requires reasoning that %inc does not overflow within the loop.
loop. The indvars pass could in theory do this. It already uses SCEV to
determine that sext(trunc(%inc)) == %inc. But demoting compares isn’t something
indvars currently does. Instead, indvars tries to promote induction variables to
hoist sext/zext outside the loop, promoting compares in the process. It does
this as long as i64 is a legal type. So even if your frontend were to generate
the i32 IV, you may need to teach indvars *not* to promote the IV by checking
the cost model in addition to isLegalInteger.
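For what it’s worth, here is a hand-written sketch of the loop body you’d want
to end up with, assuming the demotion were implemented and SCEV could prove the
IV stays within 32 bits and %num is non-negative. This is hypothetical output,
not what any current pass produces:

for.body:
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %entry ]
  %sum.07 = phi double [ %add, %for.body ], [ 0.000000e+00, %entry ]
  %arrayidx = getelementptr inbounds double addrspace(3)* %partialSums, i32 %i.08
  %0 = load double addrspace(3)* %arrayidx, align 8, !tbaa !2
  %add = fadd double %sum.07, %0
  %inc = add nuw nsw i32 %i.08, 1    ; legal i32 add; flags justified by the overflow proof
  %cmp = icmp ult i32 %inc, %num     ; compare demoted to i32; no sext of %num needed
  br i1 %cmp, label %for.body, label %for.end

With that, the trunc disappears, the sext of %num is dead, and nothing in the
loop touches an i64.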
It looks to me like the trunc is an interesting user of %i.08, and the SCEV
expression for %i.08 is a recurrence (it evolves from a loop-invariant base,
adding a stride at each iteration). So I don’t know why LSR claims there are no
interesting users. That said, I’m not surprised LSR doesn’t do anything: it
mainly tries to reduce the number of registers live in the loop, and there
isn’t much it can do here.
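For concreteness, here is roughly what opt -analyze -scalar-evolution would
print for the interesting values (I’m reconstructing these from the IR rather
than pasting an actual run, so treat them as illustrative):

  %i.08 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
    -->  {0,+,1}<%for.body>
  %inc = add i64 %i.08, 1
    -->  {1,+,1}<%for.body>
  %idxprom = trunc i64 %i.08 to i32
    -->  {0,+,1}<%for.body>

Note that the trunc folds into the addrec (the last recurrence is in i32),
which is why %idxprom is still a well-formed affine IV expression rather than
an opaque user.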
A new target-sensitive IV optimization based on SCEV could go in either indvars
or LSR. Both passes already contain assorted loop-exit optimizations that don’t
really fit their main purpose, so there is precedent. It’s mainly a question of
whether you want the transformation to run before or after vectorization.
-Andy
> void matrixSum(local double* partialSums, local double* finalSum, int num)
> {
>     double sum = 0.0;
>
>     for (size_t i = 0; i < num; ++i) // size_t is i64
>     {
>         sum += partialSums[i];
>     }
>
>     finalSum[0] = sum;
> }
>
>
> define void @matrixSum(double addrspace(3)* nocapture readonly %partialSums, double addrspace(3)* nocapture %finalSum, i32 %num) #0 {
> entry:
>   %conv = sext i32 %num to i64
>   %cmp6 = icmp eq i32 %num, 0
>   br i1 %cmp6, label %for.end, label %for.body
>
> for.body:                                        ; preds = %entry, %for.body
>   %i.08 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
>   %sum.07 = phi double [ %add, %for.body ], [ 0.000000e+00, %entry ]
>   %idxprom = trunc i64 %i.08 to i32
>   %arrayidx = getelementptr inbounds double addrspace(3)* %partialSums, i32 %idxprom
>   %0 = load double addrspace(3)* %arrayidx, align 8, !tbaa !2
>   %add = fadd double %sum.07, %0
>   %inc = add i64 %i.08, 1
>   %cmp = icmp ult i64 %inc, %conv
>   br i1 %cmp, label %for.body, label %for.end
>
> for.end:                                         ; preds = %for.body, %entry
>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %for.body ]
>   store double %sum.0.lcssa, double addrspace(3)* %finalSum, align 8, !tbaa !2
>   ret void
> }
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev