Andy, et al., If I start with C code like this: void foo(long k, int * restrict a, int * restrict b, int * restrict c) { if (k > 0) { for (int i = 0; i < 2047; i++) { a[i] = a[i + k] + b[i] * c[i]; } } } Clang -O3 will produce code like this: entry: %cmp = icmp sgt i64 %k, 0 br i1 %cmp, label %for.body.preheader, label %if.end for.body.preheader: br label %for.body for.body: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ] %add = add nsw i64 %indvars.iv, %k %arrayidx = getelementptr inbounds i32* %a, i64 %add %0 = load i32* %arrayidx, align 4, !tbaa !1 ... %arrayidx7 = getelementptr inbounds i32* %a, i64 %indvars.iv store i32 %add5, i32* %arrayidx7, align 4, !tbaa !1 ... My goal here is to detect that, within the loop, (&a[i] - &a[i + k]) is negative, as the explicit loop guard guarantees. On the path toward that goal, I've run into the following: The SCEV for &a[i] looks like this: {%a,+,4}<nsw><%for.body> but the SCEV for &a[i + k] looks like this: {((4 * %k) + %a),+,4}<%for.body> and the problem is that this expression, even though it comes from a inbounds GEP, with nsw on all contributing instructions, does not have the <nsw> flag. I think that the problem is the (4 * %k) 'Index' expression, which lacks the no-wrap flags, and when createNodeForGEP uses it to construct the overall expression for the address, the result also lacks the nsw flag. The reason that the (4 * %k) 'Index' expression lacks the flags is explained in createSCEV: // Don't apply this instruction's NSW or NUW flags to the new // expression. The instruction may be guarded by control flow that the // no-wrap behavior depends on. Non-control-equivalent instructions can be // mapped to the same SCEV expression, and it would be incorrect to transfer // NSW/NUW semantics to those operations. and I appreciate this concern, but I suspect it is supposed to be possible to recover the flags on the resulting AddRec, and how is that supposed to happen? One issue may be that the control flow mentioned in the comment could be in the loop, although in this one-BB loop with no calls, I think that can be ruled out. Thanks again, Hal -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
On Oct 31, 2013, at 1:24 PM, Hal Finkel <hfinkel at anl.gov> wrote:> Andy, et al., > > If I start with C code like this: > > void foo(long k, int * restrict a, int * restrict b, int * restrict c) { > if (k > 0) { > for (int i = 0; i < 2047; i++) { > a[i] = a[i + k] + b[i] * c[i]; > } > } > } > > Clang -O3 will produce code like this: > > entry: > %cmp = icmp sgt i64 %k, 0 > br i1 %cmp, label %for.body.preheader, label %if.end > > for.body.preheader: > br label %for.body > > for.body: > %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ] > %add = add nsw i64 %indvars.iv, %k > %arrayidx = getelementptr inbounds i32* %a, i64 %add > %0 = load i32* %arrayidx, align 4, !tbaa !1 > ... > %arrayidx7 = getelementptr inbounds i32* %a, i64 %indvars.iv > store i32 %add5, i32* %arrayidx7, align 4, !tbaa !1 > ... > > My goal here is to detect that, within the loop, (&a[i] - &a[i + k]) is negative, as the explicit loop guard guarantees. On the path toward that goal, I've run into the following: > > The SCEV for &a[i] looks like this: {%a,+,4}<nsw><%for.body> > > but the SCEV for &a[i + k] looks like this: {((4 * %k) + %a),+,4}<%for.body> > > and the problem is that this expression, even though it comes from a inbounds GEP, with nsw on all contributing instructions, does not have the <nsw> flag. I think that the problem is the (4 * %k) 'Index' expression, which lacks the no-wrap flags, and when createNodeForGEP uses it to construct the overall expression for the address, the result also lacks the nsw flag. > > The reason that the (4 * %k) 'Index' expression lacks the flags is explained in createSCEV: > > // Don't apply this instruction's NSW or NUW flags to the new > // expression. The instruction may be guarded by control flow that the > // no-wrap behavior depends on. Non-control-equivalent instructions can be > // mapped to the same SCEV expression, and it would be incorrect to transfer > // NSW/NUW semantics to those operations. > > and I appreciate this concern, but I suspect it is supposed to be possible to recover the flags on the resulting AddRec, and how is that supposed to happen? One issue may be that the control flow mentioned in the comment could be in the loop, although in this one-BB loop with no calls, I think that can be ruled out.I'm sorry, I don't understand why the scale and addition on an inbounds GEP can be considered NSW. Can we assume malloc won't span the high/low address boundary? Then were left with folks doing mmap... Well, based on my understanding, createNodeForGEP looks wrong--I really hope I'm the one that's wrong. As an aside, you could still prove no-self-wrap for this recurrence. If the base of the GEP is an NW recurrence (note that NSW or NUW implies NW), the GEP itself is inbounds, and the GEP simplifies to a recurrences, then I think you can infer that the recurrence for the GEP is also NW. I don’t think that adds useful information to this query though :( Along those lines, I’m a little baffled as to why you want an NSW recurrence. You already know the difference is 4*k and k > 0. Isn’t that enough? Is the problem that 4*k may overflow? Maybe this sort of alias analysis needs to reason about array indices, not addresses. -Andy> > Thanks again, > Hal > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory
----- Original Message -----> > On Oct 31, 2013, at 1:24 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > > Andy, et al., > > > > If I start with C code like this: > > > > void foo(long k, int * restrict a, int * restrict b, int * restrict > > c) { > > if (k > 0) { > > for (int i = 0; i < 2047; i++) { > > a[i] = a[i + k] + b[i] * c[i]; > > } > > } > > } > > > > Clang -O3 will produce code like this: > > > > entry: > > %cmp = icmp sgt i64 %k, 0 > > br i1 %cmp, label %for.body.preheader, label %if.end > > > > for.body.preheader: > > br label %for.body > > > > for.body: > > %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, > > %for.body.preheader ] > > %add = add nsw i64 %indvars.iv, %k > > %arrayidx = getelementptr inbounds i32* %a, i64 %add > > %0 = load i32* %arrayidx, align 4, !tbaa !1 > > ... > > %arrayidx7 = getelementptr inbounds i32* %a, i64 %indvars.iv > > store i32 %add5, i32* %arrayidx7, align 4, !tbaa !1 > > ... > > > > My goal here is to detect that, within the loop, (&a[i] - &a[i + > > k]) is negative, as the explicit loop guard guarantees. On the > > path toward that goal, I've run into the following: > > > > The SCEV for &a[i] looks like this: {%a,+,4}<nsw><%for.body> > > > > but the SCEV for &a[i + k] looks like this: {((4 * %k) + > > %a),+,4}<%for.body> > > > > and the problem is that this expression, even though it comes from > > a inbounds GEP, with nsw on all contributing instructions, does > > not have the <nsw> flag. I think that the problem is the (4 * %k) > > 'Index' expression, which lacks the no-wrap flags, and when > > createNodeForGEP uses it to construct the overall expression for > > the address, the result also lacks the nsw flag. > > > > The reason that the (4 * %k) 'Index' expression lacks the flags is > > explained in createSCEV: > > > > // Don't apply this instruction's NSW or NUW flags to the new > > // expression. The instruction may be guarded by control flow > > that the > > // no-wrap behavior depends on. Non-control-equivalent > > instructions can be > > // mapped to the same SCEV expression, and it would be incorrect > > to transfer > > // NSW/NUW semantics to those operations. > > > > and I appreciate this concern, but I suspect it is supposed to be > > possible to recover the flags on the resulting AddRec, and how is > > that supposed to happen? One issue may be that the control flow > > mentioned in the comment could be in the loop, although in this > > one-BB loop with no calls, I think that can be ruled out. > > I'm sorry, I don't understand why the scale and addition on an > inbounds GEP can be considered NSW. Can we assume malloc won't span > the high/low address boundary?I think that we're okay on this front. The language reference defines inbounds GEPs in terms of "infinitely precise signed arithmetic", so I think that wrapping and being inbounds are incompatible.>Then were left with folks doing > mmap... Well, based on my understanding, createNodeForGEP looks > wrong--I really hope I'm the one that's wrong.I think that the code is proabably okay: it is not forcing the result to have the nsw flag (if it did, I probably would not have written my e-mail), but applying it only to the 'address computation' itself. On the other hand, maybe that violates SCEV's general philosophy of being control-flow independent (maybe the logic that prohibits us from applying an add's nsw flag to an SCEVAddExpr should prohibit this as well).> > As an aside, you could still prove no-self-wrap for this recurrence. > If the base of the GEP is an NW recurrence (note that NSW or NUW > implies NW), the GEP itself is inbounds, and the GEP simplifies to a > recurrences, then I think you can infer that the recurrence for the > GEP is also NW.Okay, so you mean that I could force set the NW flag on the resulting GEP in that case?> I don’t think that adds useful information to this > query though :( > > Along those lines, I’m a little baffled as to why you want an NSW > recurrence. You already know the difference is 4*k and k > 0. Isn’t > that enough? Is the problem that 4*k may overflow?Ah, I forgot to explain, one problem is that I can't divide {((4 * %k) + %a),+,4}<%for.body> by 4 to get ride of the type size. I was trying to do this in order to make it easier for isLoopEntryGuardedByCond to match the condition, but even dividing by 4 does not work if the recurrence is not <nsw> (as far as I can tell).> > Maybe this sort of alias analysis needs to reason about array > indices, not addresses.Being able to divide by the type sizes would make this easier ;) Thanks again, Hal> > -Andy > > > > > Thanks again, > > Hal > > > > -- > > Hal Finkel > > Assistant Computational Scientist > > Leadership Computing Facility > > Argonne National Laboratory > >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory