Hi,
I'm trying to get the following loop to vectorize (simple reduction):
unsigned int sum2(unsigned int *a, int len){
  unsigned int s = 0;
  for (int i = 0; i < len; i += 4)
    s += *a++;
  return s;
}
The loop fails to vectorize because SCEV could not compute the loop exit
count. It appears SCEV cannot handle the non-unit increment of the loop
counter. Is this a known limitation of SCEV or is there something that can
be improved (and if so where should I start looking?)
Thanks,
Paul
Here's the IR for the loop and the SCEV dump:
for.body:                                         ; preds %for.body.lr.ph,
%for.body
  %i.07 = phi i32 [ 0, %for.body.lr.ph ], [ %add1, %for.body ]
  %s.06 = phi i32 [ 0, %for.body.lr.ph ], [ %add, %for.body ]
  %a.addr.05 = phi i32* [ %a, %for.body.lr.ph ], [ %incdec.ptr, %for.body ]
  %incdec.ptr = getelementptr inbounds i32* %a.addr.05, i64 1
  %0 = load i32* %a.addr.05, align 4, !tbaa !0
  %add = add i32 %0, %s.06
  %add1 = add nsw i32 %i.07, 4
  %cmp = icmp slt i32 %add1, %len
  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge
Classifying expressions for: @_Z4sum2Pji
%i.07 = phi i32 [ 0, %for.body.lr.ph ], [ %add1, %for.body ]
--> {0,+,4}<nuw><nsw><%for.body>	 Exits:
<<Unknown>>
%s.06 = phi i32 [ 0, %for.body.lr.ph ], [ %add, %for.body ]
--> %s.06	 Exits: <<Unknown>>
%a.addr.05 = phi i32* [ %a, %for.body.lr.ph ], [ %incdec.ptr, %for.body ]
--> {%a,+,4}<nw><%for.body>	 Exits: <<Unknown>>
%incdec.ptr = getelementptr inbounds i32* %a.addr.05, i64 1
--> {(4 + %a),+,4}<nw><%for.body>	 Exits: <<Unknown>>
%0 = load i32* %a.addr.05, align 4, !tbaa !0
--> %0	 Exits: <<Unknown>>
%add = add i32 %0, %s.06
--> (%0 + %s.06)	 Exits: <<Unknown>>
%add1 = add nsw i32 %i.07, 4
--> {4,+,4}<nuw><nsw><%for.body>	 Exits:
<<Unknown>>
%add.lcssa = phi i32 [ %add, %for.body ]
--> %add.lcssa
%s.0.lcssa = phi i32 [ %add.lcssa, %for.cond.for.end_crit_edge ], [ 0,
%entry ]
--> %s.0.lcssa
Determining loop execution counts for: @_Z4sum2Pji
Loop %for.body: Unpredictable backedge-taken count.
Loop %for.body: Unpredictable max backedge-taken count.
On 22 August 2013 13:24, Redmond, Paul <paul.redmond at intel.com> wrote:> Hi, > > I'm trying to get the following loop to vectorize (simple reduction): > > unsigned int sum2(unsigned int *a, int len){ > unsigned int s = 0; > for (int i = 0; i < len; i += 4) > s += *a++; > return s; > } > > > The loop fails to vectorize because SCEV could not compute the loop exit > count. It appears SCEV cannot handle the non-unit increment of the loop > counter. Is this a known limitation of SCEV or is there something that can > be improved (and if so where should I start looking?) >It's a known limitation, see ScalarEvolution.cpp:5568. The fundamental problem is that len in your example could be (unsigned) -1, -2 or -3, in which case your loop is infinite. Nick> Here's the IR for the loop and the SCEV dump: > > for.body: ; preds > %for.body.lr.ph, %for.body > %i.07 = phi i32 [ 0, %for.body.lr.ph ], [ %add1, %for.body ] > %s.06 = phi i32 [ 0, %for.body.lr.ph ], [ %add, %for.body ] > %a.addr.05 = phi i32* [ %a, %for.body.lr.ph ], [ %incdec.ptr, %for.body > ] > %incdec.ptr = getelementptr inbounds i32* %a.addr.05, i64 1 > %0 = load i32* %a.addr.05, align 4, !tbaa !0 > %add = add i32 %0, %s.06 > %add1 = add nsw i32 %i.07, 4 > %cmp = icmp slt i32 %add1, %len > br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge > > > Classifying expressions for: @_Z4sum2Pji > %i.07 = phi i32 [ 0, %for.body.lr.ph ], [ %add1, %for.body ] > --> {0,+,4}<nuw><nsw><%for.body> Exits: <<Unknown>> > %s.06 = phi i32 [ 0, %for.body.lr.ph ], [ %add, %for.body ] > --> %s.06 Exits: <<Unknown>> > %a.addr.05 = phi i32* [ %a, %for.body.lr.ph ], [ %incdec.ptr, %for.body ] > --> {%a,+,4}<nw><%for.body> Exits: <<Unknown>> > %incdec.ptr = getelementptr inbounds i32* %a.addr.05, i64 1 > --> {(4 + %a),+,4}<nw><%for.body> Exits: <<Unknown>> > %0 = load i32* %a.addr.05, align 4, !tbaa !0 > --> %0 Exits: <<Unknown>> > %add = add i32 %0, %s.06 > --> (%0 + %s.06) Exits: <<Unknown>> > %add1 = add nsw i32 %i.07, 4 > --> {4,+,4}<nuw><nsw><%for.body> Exits: <<Unknown>> > %add.lcssa = phi i32 [ %add, %for.body ] > --> %add.lcssa > %s.0.lcssa = phi i32 [ %add.lcssa, %for.cond.for.end_crit_edge ], [ 0, > %entry ] > --> %s.0.lcssa > Determining loop execution counts for: @_Z4sum2Pji > Loop %for.body: Unpredictable backedge-taken count. > Loop %for.body: Unpredictable max backedge-taken count. > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130822/3dff4ced/attachment.html>
On Thu, Aug 22, 2013 at 6:40 PM, Nick Lewycky <nlewycky at google.com> wrote:> It's a known limitation, see ScalarEvolution.cpp:5568. > > The fundamental problem is that len in your example could be (unsigned) > -1, -2 or -3, in which case your loop is infinite. >Unless I'm missing something, if len is -1 (or otherwise less than 0) the loop has 0 trip count. Did you mean INT_MAX-1, INT_MAX-2 etc? In this case, I believe, the behavior is undefined, because adds are marked with "nsw". Eugene -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130822/74d2c0f9/attachment.html>