thr3ads.net - llvm dev - [LLVMdev] SCEV and GEP NSW flag [Oct 2013]

If this information is useful, please help other people find it:
Share via:

Hal Finkel

2013-Oct-31 20:24 UTC

[LLVMdev] SCEV and GEP NSW flag

Andy, et al.,

If I start with C code like this:

void foo(long k, int * restrict a, int * restrict b, int * restrict c) {
  if (k > 0) {
    for (int i = 0; i < 2047; i++) {
      a[i] = a[i + k] + b[i] * c[i];
    }
  }
}

Clang -O3 will produce code like this:

entry:
  %cmp = icmp sgt i64 %k, 0
  br i1 %cmp, label %for.body.preheader, label %if.end

for.body.preheader:
  br label %for.body

for.body:
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0,
%for.body.preheader ]
  %add = add nsw i64 %indvars.iv, %k
  %arrayidx = getelementptr inbounds i32* %a, i64 %add
  %0 = load i32* %arrayidx, align 4, !tbaa !1
...
  %arrayidx7 = getelementptr inbounds i32* %a, i64 %indvars.iv
  store i32 %add5, i32* %arrayidx7, align 4, !tbaa !1
...

My goal here is to detect that, within the loop, (&a[i] - &a[i + k]) is
negative, as the explicit loop guard guarantees. On the path toward that goal,
I've run into the following:

The SCEV for &a[i] looks like this: {%a,+,4}<nsw><%for.body>

but the SCEV for &a[i + k] looks like this: {((4 * %k) +
%a),+,4}<%for.body>

and the problem is that this expression, even though it comes from a inbounds
GEP, with nsw on all contributing instructions, does not have the <nsw>
flag. I think that the problem is the (4 * %k) 'Index' expression, which
lacks the no-wrap flags, and when createNodeForGEP uses it to construct the
overall expression for the address, the result also lacks the nsw flag.

The reason that the (4 * %k) 'Index' expression lacks the flags is
explained in createSCEV:

    // Don't apply this instruction's NSW or NUW flags to the new
    // expression. The instruction may be guarded by control flow that the
    // no-wrap behavior depends on. Non-control-equivalent instructions can be
    // mapped to the same SCEV expression, and it would be incorrect to transfer
    // NSW/NUW semantics to those operations.

and I appreciate this concern, but I suspect it is supposed to be possible to
recover the flags on the resulting AddRec, and how is that supposed to happen?
One issue may be that the control flow mentioned in the comment could be in the
loop, although in this one-BB loop with no calls, I think that can be ruled out.

Thanks again,
Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Andrew Trick

2013-Nov-02 06:27 UTC

head link

[LLVMdev] SCEV and GEP NSW flag

On Oct 31, 2013, at 1:24 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> Andy, et al.,
> 
> If I start with C code like this:
> 
> void foo(long k, int * restrict a, int * restrict b, int * restrict c) {
>  if (k > 0) {
>    for (int i = 0; i < 2047; i++) {
>      a[i] = a[i + k] + b[i] * c[i];
>    }
>  }
> }
> 
> Clang -O3 will produce code like this:
> 
> entry:
>  %cmp = icmp sgt i64 %k, 0
>  br i1 %cmp, label %for.body.preheader, label %if.end
> 
> for.body.preheader:
>  br label %for.body
> 
> for.body:
>  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0,
%for.body.preheader ]
>  %add = add nsw i64 %indvars.iv, %k
>  %arrayidx = getelementptr inbounds i32* %a, i64 %add
>  %0 = load i32* %arrayidx, align 4, !tbaa !1
> ...
>  %arrayidx7 = getelementptr inbounds i32* %a, i64 %indvars.iv
>  store i32 %add5, i32* %arrayidx7, align 4, !tbaa !1
> ...
> 
> My goal here is to detect that, within the loop, (&a[i] - &a[i +
k]) is negative, as the explicit loop guard guarantees. On the path toward that
goal, I've run into the following:
> 
> The SCEV for &a[i] looks like this:
{%a,+,4}<nsw><%for.body>
> 
> but the SCEV for &a[i + k] looks like this: {((4 * %k) +
%a),+,4}<%for.body>
> 
> and the problem is that this expression, even though it comes from a
inbounds GEP, with nsw on all contributing instructions, does not have the
<nsw> flag. I think that the problem is the (4 * %k) 'Index'
expression, which lacks the no-wrap flags, and when createNodeForGEP uses it to
construct the overall expression for the address, the result also lacks the nsw
flag.
> 
> The reason that the (4 * %k) 'Index' expression lacks the flags is
explained in createSCEV:
> 
>    // Don't apply this instruction's NSW or NUW flags to the new
>    // expression. The instruction may be guarded by control flow that the
>    // no-wrap behavior depends on. Non-control-equivalent instructions can
be
>    // mapped to the same SCEV expression, and it would be incorrect to
transfer
>    // NSW/NUW semantics to those operations.
> 
> and I appreciate this concern, but I suspect it is supposed to be possible
to recover the flags on the resulting AddRec, and how is that supposed to
happen? One issue may be that the control flow mentioned in the comment could be
in the loop, although in this one-BB loop with no calls, I think that can be
ruled out.
I'm sorry, I don't understand why the scale and addition on an inbounds
GEP can be considered NSW. Can we assume malloc won't span the high/low
address boundary? Then were left with folks doing mmap... Well, based on my
understanding, createNodeForGEP looks wrong--I really hope I'm the one
that's wrong.

As an aside, you could still prove no-self-wrap for this recurrence. If the base
of the GEP is an NW recurrence (note that NSW or NUW implies NW), the GEP itself
is inbounds, and the GEP simplifies to a recurrences, then I think you can infer
that the recurrence for the GEP is also NW. I don’t think that adds useful
information to this query though :(

Along those lines, I’m a little baffled as to why you want an NSW recurrence.
You already know the difference is 4*k and k > 0. Isn’t that enough? Is the
problem that 4*k may overflow?

Maybe this sort of alias analysis needs to reason about array indices, not
addresses.

-Andy
> 
> Thanks again,
> Hal
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory

Hal Finkel

2013-Nov-02 12:43 UTC

head link

[LLVMdev] SCEV and GEP NSW flag

----- Original Message -----> 
> On Oct 31, 2013, at 1:24 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > Andy, et al.,
> > 
> > If I start with C code like this:
> > 
> > void foo(long k, int * restrict a, int * restrict b, int * restrict
> > c) {
> >  if (k > 0) {
> >    for (int i = 0; i < 2047; i++) {
> >      a[i] = a[i + k] + b[i] * c[i];
> >    }
> >  }
> > }
> > 
> > Clang -O3 will produce code like this:
> > 
> > entry:
> >  %cmp = icmp sgt i64 %k, 0
> >  br i1 %cmp, label %for.body.preheader, label %if.end
> > 
> > for.body.preheader:
> >  br label %for.body
> > 
> > for.body:
> >  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0,
> >  %for.body.preheader ]
> >  %add = add nsw i64 %indvars.iv, %k
> >  %arrayidx = getelementptr inbounds i32* %a, i64 %add
> >  %0 = load i32* %arrayidx, align 4, !tbaa !1
> > ...
> >  %arrayidx7 = getelementptr inbounds i32* %a, i64 %indvars.iv
> >  store i32 %add5, i32* %arrayidx7, align 4, !tbaa !1
> > ...
> > 
> > My goal here is to detect that, within the loop, (&a[i] - &a[i
+
> > k]) is negative, as the explicit loop guard guarantees. On the
> > path toward that goal, I've run into the following:
> > 
> > The SCEV for &a[i] looks like this:
{%a,+,4}<nsw><%for.body>
> > 
> > but the SCEV for &a[i + k] looks like this: {((4 * %k) +
> > %a),+,4}<%for.body>
> > 
> > and the problem is that this expression, even though it comes from
> > a inbounds GEP, with nsw on all contributing instructions, does
> > not have the <nsw> flag. I think that the problem is the (4 *
%k)
> > 'Index' expression, which lacks the no-wrap flags, and when
> > createNodeForGEP uses it to construct the overall expression for
> > the address, the result also lacks the nsw flag.
> > 
> > The reason that the (4 * %k) 'Index' expression lacks the
flags is
> > explained in createSCEV:
> > 
> >    // Don't apply this instruction's NSW or NUW flags to the
new
> >    // expression. The instruction may be guarded by control flow
> >    that the
> >    // no-wrap behavior depends on. Non-control-equivalent
> >    instructions can be
> >    // mapped to the same SCEV expression, and it would be incorrect
> >    to transfer
> >    // NSW/NUW semantics to those operations.
> > 
> > and I appreciate this concern, but I suspect it is supposed to be
> > possible to recover the flags on the resulting AddRec, and how is
> > that supposed to happen? One issue may be that the control flow
> > mentioned in the comment could be in the loop, although in this
> > one-BB loop with no calls, I think that can be ruled out.
> 
> I'm sorry, I don't understand why the scale and addition on an
> inbounds GEP can be considered NSW. Can we assume malloc won't span
> the high/low address boundary? 
I think that we're okay on this front. The language reference defines
inbounds GEPs in terms of "infinitely precise signed arithmetic", so I
think that wrapping and being inbounds are incompatible.
>Then were left with folks doing
> mmap... Well, based on my understanding, createNodeForGEP looks
> wrong--I really hope I'm the one that's wrong.
I think that the code is proabably okay: it is not forcing the result to have
the nsw flag (if it did, I probably would not have written my e-mail), but
applying it only to the 'address computation' itself. On the other hand,
maybe that violates SCEV's general philosophy of being control-flow
independent (maybe the logic that prohibits us from applying an add's nsw
flag to an SCEVAddExpr should prohibit this as well).
> 
> As an aside, you could still prove no-self-wrap for this recurrence.
> If the base of the GEP is an NW recurrence (note that NSW or NUW
> implies NW), the GEP itself is inbounds, and the GEP simplifies to a
> recurrences, then I think you can infer that the recurrence for the
> GEP is also NW.
Okay, so you mean that I could force set the NW flag on the resulting GEP in
that case?
> I don’t think that adds useful information to this
> query though :(
> 
> Along those lines, I’m a little baffled as to why you want an NSW
> recurrence. You already know the difference is 4*k and k > 0. Isn’t
> that enough? Is the problem that 4*k may overflow?
Ah, I forgot to explain, one problem is that I can't divide {((4 * %k) + 
%a),+,4}<%for.body> by 4 to get ride of the type size. I was trying to do
this in order to make it easier for isLoopEntryGuardedByCond to match the
condition, but even dividing by 4 does not work if the recurrence is not
<nsw> (as far as I can tell).
> 
> Maybe this sort of alias analysis needs to reason about array
> indices, not addresses.
Being able to divide by the type sizes would make this easier ;)

Thanks again,
Hal
> 
> -Andy
> 
> > 
> > Thanks again,
> > Hal
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> 
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - Oct 2013 - [LLVMdev] SCEV and GEP NSW flag

[LLVMdev] SCEV and GEP NSW flag

[LLVMdev] SCEV and GEP NSW flag

[LLVMdev] SCEV and GEP NSW flag

Maybe Matching Threads