Renato Golin
2013-Oct-23 14:41 UTC
[LLVMdev] First attempt at recognizing pointer reduction
On 21 October 2013 17:29, Arnold Schwaighofer <aschwaighofer at apple.com>wrote:> I don’t think that recognizing this as a reduction is going to get you > far. A reduction is beneficial if the value reduced is only truly needed > outside of a loop. > This is not the case here (we are storing/loading from the pointer). >Hi Arnold, Nadav, Let me resurrect this discussion a bit. There are few things that I need to check while validating a stride access loop: 1. If there is an induction variable, and it has stride > 1. This is easy, and can be easily extended on isInductionVariable() by adding Step to InductionInfo; 2. If the reduction variables' access are sequential and compatible with the stride. This is the core of the change, and will require SCEV computation, as extensively described by Arnold. 3. If the reduction variable has any use outside of the loop. This will need to account for global variables, which is in itself, a use of the computed value. The problem I'm having, and why I'm resurrecting this thread, is in this simple code: #define OFFSET 255 extern char b[OFFSET]; char *fn1 (char a[]) { unsigned i; for (i=0; i<OFFSET; i += 3) { a[i] = OFFSET - b[i] - DELTA; a[i+1] = OFFSET - b[i+1] - DELTA; a[i+2] = OFFSET - b[i+2] - DELTA; } return &a[0]; } This is the most trivial example. The exit count is constant, the operations are the same and compatible with the stride of 3, and the variable "a" is used outside the loop. However, the optimized code that the vectorizer sees is after SCEV has passed through it, leaving the IR very similar to my original example: for.body: ; preds = %entry, %for.body %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %arrayidx = getelementptr inbounds [255 x i8]* @b, i64 0, i64 %indvars.iv %0 = load i8* %arrayidx, align 1 %1 = xor i8 %0, -1 %2 = load i8* @DELTA, align 1 %sub2 = sub i8 %1, %2 %arrayidx5 = getelementptr inbounds i8* %a, i64 %indvars.iv store i8 %sub2, i8* %arrayidx5, align 1 ... %indvars.iv.next = add nuw nsw i64 %indvars.iv, 3 %11 = trunc i64 %indvars.iv.next to i32 %cmp = icmp ult i32 %11, 255 br i1 %cmp, label %for.body, label %for.end So, as you can see, my original patch to recognize pointer reductions is still relevant, since I'll need to act on the list of induction and reduction variables to discover if the reduction's stride is compatible with the induction's step. Also, I'll need that to make sure that the reduction in question has access outside (via global / extern qualifiers), so ignoring load/store instructions that won't have any users, because the variables are not used directly, but through &a[0]. Arnold, I agree with you that the analysis should not be done exclusively trying to manipulate the pointer reduction variables (but using SCEV as you describe), but I can't see how I can get past the validation phase if, at least, I don't recognize the GEP PHIs as a reduction variable. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131023/ea1c2899/attachment.html>
Arnold Schwaighofer
2013-Oct-23 15:05 UTC
[LLVMdev] First attempt at recognizing pointer reduction
On Oct 23, 2013, at 9:41 AM, Renato Golin <renato.golin at linaro.org> wrote:> On 21 October 2013 17:29, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > I don’t think that recognizing this as a reduction is going to get you far. A reduction is beneficial if the value reduced is only truly needed outside of a loop. > This is not the case here (we are storing/loading from the pointer). > > Hi Arnold, Nadav, > > Let me resurrect this discussion a bit. > > There are few things that I need to check while validating a stride access loop: > 1. If there is an induction variable, and it has stride > 1. This is easy, and can be easily extended on isInductionVariable() by adding Step to InductionInfo;Yes.> 2. If the reduction variables' access are sequential and compatible with the stride. This is the core of the change, and will require SCEV computation, as extensively described by Arnold. > 3. If the reduction variable has any use outside of the loop. This will need to account for global variables, which is in itself, a use of the computed value. >In the examples you gave there are no reduction variables in the loop vectorizer’s sense. But, they all have memory accesses that are strided. So for your example below we would need two things: 1.) analyze the memory accesses in the loop for strided accesses and deal with them appropriately during analysis and transformation 2.) be able to handle loops with non-unit stride integer induction variables - this only makes sense if we have support for 1.) There is an even simpler example that would just deal with strided accesses: for (i=0; i < OFFSET/3; i++) { a[3*i] = OFFSET - b[3*i] - DELTA a[3*i+1] = …. } Once we can handle this example we can extend our induction variable checks to handle loops with stride (because then it would sense to handle them). Best, Arnold> The problem I'm having, and why I'm resurrecting this thread, is in this simple code: > > #define OFFSET 255 > extern char b[OFFSET]; > char *fn1 (char a[]) { > unsigned i; > > for (i=0; i<OFFSET; i += 3) { > a[i] = OFFSET - b[i] - DELTA; > a[i+1] = OFFSET - b[i+1] - DELTA; > a[i+2] = OFFSET - b[i+2] - DELTA; > } > return &a[0]; > } > > This is the most trivial example. The exit count is constant, the operations are the same and compatible with the stride of 3, and the variable "a" is used outside the loop. However, the optimized code that the vectorizer sees is after SCEV has passed through it, leaving the IR very similar to my original example: > > for.body: ; preds = %entry, %for.body > %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] > %arrayidx = getelementptr inbounds [255 x i8]* @b, i64 0, i64 %indvars.iv > %0 = load i8* %arrayidx, align 1 > %1 = xor i8 %0, -1 > %2 = load i8* @DELTA, align 1 > %sub2 = sub i8 %1, %2 > %arrayidx5 = getelementptr inbounds i8* %a, i64 %indvars.iv > store i8 %sub2, i8* %arrayidx5, align 1 > ... > %indvars.iv.next = add nuw nsw i64 %indvars.iv, 3 > %11 = trunc i64 %indvars.iv.next to i32 > %cmp = icmp ult i32 %11, 255 > br i1 %cmp, label %for.body, label %for.end > > So, as you can see, my original patch to recognize pointer reductions is still relevant, since I'll need to act on the list of induction and reduction variables to discover if the reduction's stride is compatible with the induction's step. > > Also, I'll need that to make sure that the reduction in question has access outside (via global / extern qualifiers), so ignoring load/store instructions that won't have any users, because the variables are not used directly, but through &a[0]. > > Arnold, > > I agree with you that the analysis should not be done exclusively trying to manipulate the pointer reduction variables (but using SCEV as you describe), but I can't see how I can get past the validation phase if, at least, I don't recognize the GEP PHIs as a reduction variable. > > cheers, > --renato
Renato Golin
2013-Oct-23 20:10 UTC
[LLVMdev] First attempt at recognizing pointer reduction
On 23 October 2013 16:05, Arnold Schwaighofer <aschwaighofer at apple.com>wrote:> In the examples you gave there are no reduction variables in the loop > vectorizer’s sense. But, they all have memory accesses that are strided. >This is what I don't get. As far as I understood, a reduction variable is the one that aggregates the computation done by the loop, and is used outside the loop. In my example, I'm aggregating a computation in an array and returning this array for later use, what am I missing here? 1.) analyze the memory accesses in the loop for strided accesses and deal> with them appropriately during analysis and transformation > 2.) be able to handle loops with non-unit stride integer induction > variables - this only makes sense if we have support for 1.) >So, despite all this, we still have to pass validation, so canVectorize() must return true. As it stands, all my examples can't vectorize because of the extra memory PHI, and your example below can't vectorize because it can't find the array bounds. I'm assuming that, as soon as I teach the validation to accept your loop (given additional checks), and teach about the strides and costs, it will vectorize. But if I go back to my original code, it won't, because of the reduction PHI. So, again, I agree with you, one step at a time, I'll work with your loop, because it's the straightest path from here. But at some time, I'll have to either identify the memory PHIs or transform the loop to avoid them at all. Does that make sense? cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131023/47c22e7d/attachment.html>
Possibly Parallel Threads
- [LLVMdev] First attempt at recognizing pointer reduction
- [LLVMdev] First attempt at recognizing pointer reduction
- [LLVMdev] First attempt at recognizing pointer reduction
- [LLVMdev] First attempt at recognizing pointer reduction
- [LLVMdev] First attempt at recognizing pointer reduction