Hi, I'm trying to get the following loop to vectorize (simple reduction): unsigned int sum2(unsigned int *a, int len){ unsigned int s = 0; for (int i = 0; i < len; i += 4) s += *a++; return s; } The loop fails to vectorize because SCEV could not compute the loop exit count. It appears SCEV cannot handle the non-unit increment of the loop counter. Is this a known limitation of SCEV or is there something that can be improved (and if so where should I start looking?) Thanks, Paul Here's the IR for the loop and the SCEV dump: for.body: ; preds %for.body.lr.ph, %for.body %i.07 = phi i32 [ 0, %for.body.lr.ph ], [ %add1, %for.body ] %s.06 = phi i32 [ 0, %for.body.lr.ph ], [ %add, %for.body ] %a.addr.05 = phi i32* [ %a, %for.body.lr.ph ], [ %incdec.ptr, %for.body ] %incdec.ptr = getelementptr inbounds i32* %a.addr.05, i64 1 %0 = load i32* %a.addr.05, align 4, !tbaa !0 %add = add i32 %0, %s.06 %add1 = add nsw i32 %i.07, 4 %cmp = icmp slt i32 %add1, %len br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge Classifying expressions for: @_Z4sum2Pji %i.07 = phi i32 [ 0, %for.body.lr.ph ], [ %add1, %for.body ] --> {0,+,4}<nuw><nsw><%for.body> Exits: <<Unknown>> %s.06 = phi i32 [ 0, %for.body.lr.ph ], [ %add, %for.body ] --> %s.06 Exits: <<Unknown>> %a.addr.05 = phi i32* [ %a, %for.body.lr.ph ], [ %incdec.ptr, %for.body ] --> {%a,+,4}<nw><%for.body> Exits: <<Unknown>> %incdec.ptr = getelementptr inbounds i32* %a.addr.05, i64 1 --> {(4 + %a),+,4}<nw><%for.body> Exits: <<Unknown>> %0 = load i32* %a.addr.05, align 4, !tbaa !0 --> %0 Exits: <<Unknown>> %add = add i32 %0, %s.06 --> (%0 + %s.06) Exits: <<Unknown>> %add1 = add nsw i32 %i.07, 4 --> {4,+,4}<nuw><nsw><%for.body> Exits: <<Unknown>> %add.lcssa = phi i32 [ %add, %for.body ] --> %add.lcssa %s.0.lcssa = phi i32 [ %add.lcssa, %for.cond.for.end_crit_edge ], [ 0, %entry ] --> %s.0.lcssa Determining loop execution counts for: @_Z4sum2Pji Loop %for.body: Unpredictable backedge-taken count. Loop %for.body: Unpredictable max backedge-taken count.
On 22 August 2013 13:24, Redmond, Paul <paul.redmond at intel.com> wrote:> Hi, > > I'm trying to get the following loop to vectorize (simple reduction): > > unsigned int sum2(unsigned int *a, int len){ > unsigned int s = 0; > for (int i = 0; i < len; i += 4) > s += *a++; > return s; > } > > > The loop fails to vectorize because SCEV could not compute the loop exit > count. It appears SCEV cannot handle the non-unit increment of the loop > counter. Is this a known limitation of SCEV or is there something that can > be improved (and if so where should I start looking?) >It's a known limitation, see ScalarEvolution.cpp:5568. The fundamental problem is that len in your example could be (unsigned) -1, -2 or -3, in which case your loop is infinite. Nick> Here's the IR for the loop and the SCEV dump: > > for.body: ; preds > %for.body.lr.ph, %for.body > %i.07 = phi i32 [ 0, %for.body.lr.ph ], [ %add1, %for.body ] > %s.06 = phi i32 [ 0, %for.body.lr.ph ], [ %add, %for.body ] > %a.addr.05 = phi i32* [ %a, %for.body.lr.ph ], [ %incdec.ptr, %for.body > ] > %incdec.ptr = getelementptr inbounds i32* %a.addr.05, i64 1 > %0 = load i32* %a.addr.05, align 4, !tbaa !0 > %add = add i32 %0, %s.06 > %add1 = add nsw i32 %i.07, 4 > %cmp = icmp slt i32 %add1, %len > br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge > > > Classifying expressions for: @_Z4sum2Pji > %i.07 = phi i32 [ 0, %for.body.lr.ph ], [ %add1, %for.body ] > --> {0,+,4}<nuw><nsw><%for.body> Exits: <<Unknown>> > %s.06 = phi i32 [ 0, %for.body.lr.ph ], [ %add, %for.body ] > --> %s.06 Exits: <<Unknown>> > %a.addr.05 = phi i32* [ %a, %for.body.lr.ph ], [ %incdec.ptr, %for.body ] > --> {%a,+,4}<nw><%for.body> Exits: <<Unknown>> > %incdec.ptr = getelementptr inbounds i32* %a.addr.05, i64 1 > --> {(4 + %a),+,4}<nw><%for.body> Exits: <<Unknown>> > %0 = load i32* %a.addr.05, align 4, !tbaa !0 > --> %0 Exits: <<Unknown>> > %add = add i32 %0, %s.06 > --> (%0 + %s.06) Exits: <<Unknown>> > %add1 = add nsw i32 %i.07, 4 > --> {4,+,4}<nuw><nsw><%for.body> Exits: <<Unknown>> > %add.lcssa = phi i32 [ %add, %for.body ] > --> %add.lcssa > %s.0.lcssa = phi i32 [ %add.lcssa, %for.cond.for.end_crit_edge ], [ 0, > %entry ] > --> %s.0.lcssa > Determining loop execution counts for: @_Z4sum2Pji > Loop %for.body: Unpredictable backedge-taken count. > Loop %for.body: Unpredictable max backedge-taken count. > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130822/3dff4ced/attachment.html>
On Thu, Aug 22, 2013 at 6:40 PM, Nick Lewycky <nlewycky at google.com> wrote:> It's a known limitation, see ScalarEvolution.cpp:5568. > > The fundamental problem is that len in your example could be (unsigned) > -1, -2 or -3, in which case your loop is infinite. >Unless I'm missing something, if len is -1 (or otherwise less than 0) the loop has 0 trip count. Did you mean INT_MAX-1, INT_MAX-2 etc? In this case, I believe, the behavior is undefined, because adds are marked with "nsw". Eugene -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130822/74d2c0f9/attachment.html>
Possibly Parallel Threads
- [LLVMdev] scev questions
- [LLVMdev] alias analysis on llvm internal globals
- [LLVMdev] Virtual register def doesn't dominate all uses
- [LLVMdev] Virtual register def doesn't dominate all uses
- [LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'