thr3ads.net - llvm dev - [LLVMdev] First attempt at recognizing pointer reduction [Oct 2013]

If this information is useful, please help other people find it:
Share via:

Renato Golin

2013-Oct-23 14:41 UTC

[LLVMdev] First attempt at recognizing pointer reduction

On 21 October 2013 17:29, Arnold Schwaighofer <aschwaighofer at
apple.com>wrote:
> I don’t think that recognizing this as a reduction is going to get you
> far. A reduction is beneficial if the value reduced is only truly needed
> outside of a loop.
> This is not the case here (we are storing/loading from the pointer).
>
Hi Arnold, Nadav,

Let me resurrect this discussion a bit.

There are few things that I need to check while validating a stride access
loop:
 1. If there is an induction variable, and it has stride > 1. This is easy,
and can be easily extended on isInductionVariable() by adding Step
to InductionInfo;
 2. If the reduction variables' access are sequential and compatible with
the stride. This is the core of the change, and will require SCEV
computation, as extensively described by Arnold.
 3. If the reduction variable has any use outside of the loop. This will
need to account for global variables, which is in itself, a use of the
computed value.

The problem I'm having, and why I'm resurrecting this thread, is in this
simple code:

#define OFFSET 255
extern char b[OFFSET];
char *fn1 (char a[]) {
    unsigned i;

    for (i=0; i<OFFSET; i += 3) {
      a[i] = OFFSET - b[i] - DELTA;
      a[i+1] = OFFSET - b[i+1] - DELTA;
      a[i+2] = OFFSET - b[i+2] - DELTA;
    }
    return &a[0];
}

This is the most trivial example. The exit count is constant, the
operations are the same and compatible with the stride of 3, and the
variable "a" is used outside the loop. However, the optimized code
that the
vectorizer sees is after SCEV has passed through it, leaving the IR very
similar to my original example:

for.body:                                         ; preds = %entry,
%for.body
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds [255 x i8]* @b, i64 0, i64 %indvars.iv
  %0 = load i8* %arrayidx, align 1
  %1 = xor i8 %0, -1
  %2 = load i8* @DELTA, align 1
  %sub2 = sub i8 %1, %2
  %arrayidx5 = getelementptr inbounds i8* %a, i64 %indvars.iv
  store i8 %sub2, i8* %arrayidx5, align 1
  ...
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 3
  %11 = trunc i64 %indvars.iv.next to i32
  %cmp = icmp ult i32 %11, 255
  br i1 %cmp, label %for.body, label %for.end

So, as you can see, my original patch to recognize pointer reductions is
still relevant, since I'll need to act on the list of induction and
reduction variables to discover if the reduction's stride is compatible
with the induction's step.

Also, I'll need that to make sure that the reduction in question has access
outside (via global / extern qualifiers), so ignoring load/store
instructions that won't have any users, because the variables are not used
directly, but through &a[0].

Arnold,

I agree with you that the analysis should not be done exclusively trying to
manipulate the pointer reduction variables (but using SCEV as you
describe), but I can't see how I can get past the validation phase if, at
least, I don't recognize the GEP PHIs as a reduction variable.

cheers,
--renato
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131023/ea1c2899/attachment.html>

Arnold Schwaighofer

2013-Oct-23 15:05 UTC

head link

[LLVMdev] First attempt at recognizing pointer reduction

On Oct 23, 2013, at 9:41 AM, Renato Golin <renato.golin at linaro.org>
wrote:
> On 21 October 2013 17:29, Arnold Schwaighofer <aschwaighofer at
apple.com> wrote:
> I don’t think that recognizing this as a reduction is going to get you far.
A reduction is beneficial if the value reduced is only truly needed outside of a
loop.
> This is not the case here (we are storing/loading from the pointer).
> 
> Hi Arnold, Nadav,
> 
> Let me resurrect this discussion a bit.
> 
> There are few things that I need to check while validating a stride access
loop:
>  1. If there is an induction variable, and it has stride > 1. This is
easy, and can be easily extended on isInductionVariable() by adding Step to
InductionInfo;
Yes.
>  2. If the reduction variables' access are sequential and compatible
with the stride. This is the core of the change, and will require SCEV
computation, as extensively described by Arnold.
>  3. If the reduction variable has any use outside of the loop. This will
need to account for global variables, which is in itself, a use of the computed
value.
> 
In the examples you gave there are no reduction variables in the loop
vectorizer’s sense. But, they all have memory accesses that are strided.

So for your example below we would need two things:

1.) analyze the memory accesses in the loop for strided accesses and deal with
them appropriately during analysis and transformation
2.) be able to handle loops with non-unit stride integer induction variables -
this only makes sense if we have support for 1.)

There is an even simpler example that would just deal with strided accesses:

for (i=0; i < OFFSET/3; i++) {
  a[3*i] = OFFSET - b[3*i] - DELTA
  a[3*i+1] = ….
}

Once we can handle this example we can extend our induction variable checks to
handle loops with stride (because then it would sense to handle them).


Best,
Arnold
> The problem I'm having, and why I'm resurrecting this thread, is in
this simple code:
> 
> #define OFFSET 255
> extern char b[OFFSET];
> char *fn1 (char a[]) {
>     unsigned i;
> 
>     for (i=0; i<OFFSET; i += 3) {
>       a[i] = OFFSET - b[i] - DELTA;
>       a[i+1] = OFFSET - b[i+1] - DELTA;
>       a[i+2] = OFFSET - b[i+2] - DELTA;
>     }
>     return &a[0];
> }
> 
> This is the most trivial example. The exit count is constant, the
operations are the same and compatible with the stride of 3, and the variable
"a" is used outside the loop. However, the optimized code that the
vectorizer sees is after SCEV has passed through it, leaving the IR very similar
to my original example:
> 
> for.body:                                         ; preds = %entry,
%for.body
>   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
>   %arrayidx = getelementptr inbounds [255 x i8]* @b, i64 0, i64 %indvars.iv
>   %0 = load i8* %arrayidx, align 1
>   %1 = xor i8 %0, -1
>   %2 = load i8* @DELTA, align 1
>   %sub2 = sub i8 %1, %2
>   %arrayidx5 = getelementptr inbounds i8* %a, i64 %indvars.iv
>   store i8 %sub2, i8* %arrayidx5, align 1
>   ...
>   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 3
>   %11 = trunc i64 %indvars.iv.next to i32
>   %cmp = icmp ult i32 %11, 255
>   br i1 %cmp, label %for.body, label %for.end
> 
> So, as you can see, my original patch to recognize pointer reductions is
still relevant, since I'll need to act on the list of induction and
reduction variables to discover if the reduction's stride is compatible with
the induction's step.
> 
> Also, I'll need that to make sure that the reduction in question has
access outside (via global / extern qualifiers), so ignoring load/store
instructions that won't have any users, because the variables are not used
directly, but through &a[0].
> 
> Arnold,
> 
> I agree with you that the analysis should not be done exclusively trying to
manipulate the pointer reduction variables (but using SCEV as you describe), but
I can't see how I can get past the validation phase if, at least, I
don't recognize the GEP PHIs as a reduction variable.
> 
> cheers,
> --renato

Renato Golin

2013-Oct-23 20:10 UTC

head link

[LLVMdev] First attempt at recognizing pointer reduction

On 23 October 2013 16:05, Arnold Schwaighofer <aschwaighofer at
apple.com>wrote:
> In the examples you gave there are no reduction variables in the loop
> vectorizer’s sense. But, they all have memory accesses that are strided.
>
This is what I don't get. As far as I understood, a reduction variable is
the one that aggregates the computation done by the loop, and is used
outside the loop.

In my example, I'm aggregating a computation in an array and returning this
array for later use, what am I missing here?

1.) analyze the memory accesses in the loop for strided accesses and
deal> with them appropriately during analysis and transformation
> 2.) be able to handle loops with non-unit stride integer induction
> variables - this only makes sense if we have support for 1.)
>
So, despite all this, we still have to pass validation, so canVectorize()
must return true. As it stands, all my examples can't vectorize because of
the extra memory PHI, and your example below can't vectorize because it
can't find the array bounds.

I'm assuming that, as soon as I teach the validation to accept your loop
(given additional checks), and teach about the strides and costs, it will
vectorize. But if I go back to my original code, it won't, because of the
reduction PHI.

So, again, I agree with you, one step at a time, I'll work with your loop,
because it's the straightest path from here. But at some time, I'll have
to
either identify the memory PHIs or transform the loop to avoid them at all.
Does that make sense?

cheers,
--renato
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131023/47c22e7d/attachment.html>

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - Oct 2013 - [LLVMdev] First attempt at recognizing pointer reduction

[LLVMdev] First attempt at recognizing pointer reduction

[LLVMdev] First attempt at recognizing pointer reduction

[LLVMdev] First attempt at recognizing pointer reduction

Reasonably Related Threads