Renato Golin
2013-Oct-21 15:23 UTC
[LLVMdev] First attempt at recognizing pointer reduction
Hi Nadav, Arnold, I managed to find some time to work on the pointer reduction, and I got a patch that can make "canVectorize()" pass. Basically what I do is to teach AddReductionVar() about pointers, saying they don't really have an exit instructions, and that (maybe) the final store is a good candidate (is it?). This makes it recognize the writes and reads, but then "canVectorizeMemory()" bails because it can't find the array bounds. This will be my next step, but I'd like to know if I'm in the right direction, with the right assumptions about the reduction detection, before proceeding into more complicated bits. An example of the debug output I get for my earlier example: LV: Checking a loop in "fn1" LV: Found a loop: for.body == Induction = LV: Found an induction variable. == Our write reduction = LV: Found a pointer add reduction PHI. %WRITE913 = phi i8* [ %incdec.ptr18, %for.body ], [ %WRITE, %for.body.preheader ] == Not sure what this is... Will check. = LV: Found an non-int non-pointer PHI. == Our read reduction = LV: Found a pointer add reduction PHI. %READ1012 = phi i8* [ %incdec.ptr2, %for.body ], [ %READ, %for.body.preheader ] == Below are the memory operations's validation, which I'll deal with later = LV: Found an unidentified write ptr: i8* %WRITE LV: Found an unidentified write ptr: i8* %WRITE LV: Found an unidentified write ptr: i8* %WRITE LV: Found an unidentified read ptr: i8* %READ LV: Found an unidentified read ptr: i8* %READ LV: Found an unidentified read ptr: i8* %READ LV: Found a runtime check ptr: %WRITE913 = phi i8* [ %incdec.ptr18, %for.body ], [ %WRITE, %for.body.preheader ] LV: Found a runtime check ptr: %incdec.ptr16 = getelementptr inbounds i8* %WRITE913, i64 1 LV: Found a runtime check ptr: %incdec.ptr17 = getelementptr inbounds i8* %WRITE913, i64 2 LV: Found a runtime check ptr: %READ1012 = phi i8* [ %incdec.ptr2, %for.body ], [ %READ, %for.body.preheader ] LV: Found a runtime check ptr: %incdec.ptr = getelementptr inbounds i8* %READ1012, i64 1 LV: Found a runtime check ptr: %incdec.ptr1 = getelementptr inbounds i8* %READ1012, i64 2 LV: We need to do 18 pointer comparisons. LV: We can't vectorize because we can't find the array bounds. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131021/9dbace7c/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: pointer-reduction.patch Type: application/octet-stream Size: 3694 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131021/9dbace7c/attachment.obj>
Nadav Rotem
2013-Oct-21 16:00 UTC
[LLVMdev] First attempt at recognizing pointer reduction
Renato, This looks like the right direction. Did you run it on the LLVM test suite to check if it finds new loops to vectorize ? Thanks, Nadav On Oct 21, 2013, at 8:23 AM, Renato Golin <renato.golin at linaro.org> wrote:> Hi Nadav, Arnold, > > I managed to find some time to work on the pointer reduction, and I got a patch that can make "canVectorize()" pass. > > Basically what I do is to teach AddReductionVar() about pointers, saying they don't really have an exit instructions, and that (maybe) the final store is a good candidate (is it?). > > This makes it recognize the writes and reads, but then "canVectorizeMemory()" bails because it can't find the array bounds. This will be my next step, but I'd like to know if I'm in the right direction, with the right assumptions about the reduction detection, before proceeding into more complicated bits. > > An example of the debug output I get for my earlier example: > > LV: Checking a loop in "fn1" > LV: Found a loop: for.body > > == Induction => > LV: Found an induction variable. > > == Our write reduction => > LV: Found a pointer add reduction PHI. %WRITE913 = phi i8* [ %incdec.ptr18, %for.body ], [ %WRITE, %for.body.preheader ] > > == Not sure what this is... Will check. => > LV: Found an non-int non-pointer PHI. > > == Our read reduction => > LV: Found a pointer add reduction PHI. %READ1012 = phi i8* [ %incdec.ptr2, %for.body ], [ %READ, %for.body.preheader ] > > == Below are the memory operations's validation, which I'll deal with later => > LV: Found an unidentified write ptr: i8* %WRITE > LV: Found an unidentified write ptr: i8* %WRITE > LV: Found an unidentified write ptr: i8* %WRITE > LV: Found an unidentified read ptr: i8* %READ > LV: Found an unidentified read ptr: i8* %READ > LV: Found an unidentified read ptr: i8* %READ > LV: Found a runtime check ptr: %WRITE913 = phi i8* [ %incdec.ptr18, %for.body ], [ %WRITE, %for.body.preheader ] > LV: Found a runtime check ptr: %incdec.ptr16 = getelementptr inbounds i8* %WRITE913, i64 1 > LV: Found a runtime check ptr: %incdec.ptr17 = getelementptr inbounds i8* %WRITE913, i64 2 > LV: Found a runtime check ptr: %READ1012 = phi i8* [ %incdec.ptr2, %for.body ], [ %READ, %for.body.preheader ] > LV: Found a runtime check ptr: %incdec.ptr = getelementptr inbounds i8* %READ1012, i64 1 > LV: Found a runtime check ptr: %incdec.ptr1 = getelementptr inbounds i8* %READ1012, i64 2 > LV: We need to do 18 pointer comparisons. > LV: We can't vectorize because we can't find the array bounds. > > cheers, > --renato > <pointer-reduction.patch>
Renato Golin
2013-Oct-21 16:08 UTC
[LLVMdev] First attempt at recognizing pointer reduction
On 21 October 2013 17:00, Nadav Rotem <nrotem at apple.com> wrote:> Renato, > > This looks like the right direction. Did you run it on the LLVM test > suite to check if it finds new loops to vectorize ? >Thanks Nadav. No, I haven't. I'm not expecting it to vectorize anything, but you're right, maybe this is enough material on its own. I did run the check-all and found no regressions, but that means very little in this case. I'll run with debug messages on and compare the LV lines. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131021/6cb27cc4/attachment.html>
Arnold Schwaighofer
2013-Oct-21 16:29 UTC
[LLVMdev] First attempt at recognizing pointer reduction
Renato, can you post a hand-created vectorized IR of how a reduction would work on your example? I don’t think that recognizing this as a reduction is going to get you far. A reduction is beneficial if the value reduced is only truly needed outside of a loop. This is not the case here (we are storing/loading from the pointer). Your example is something like WRITEPTR = phi i8* [ outsideval, preheader] [WPTR, loop] store WRITEPTR ... WPTR = getelementptr WRIPTR, 3 If you treat this as a reduction, you will end up with code like (say we used a VF=2): WRITEPTR = phi <2 x i8*> [ <outsideval, 0>, preheader] [WPTR, loop] ptr = WRITEPTR<0> + WRITEPTR<1> store ptr ... WPTR = getelementptr WRIPTR, <3,3> I think the right approach to get examples like this is that we need to recognize strided pointer inductions (we only support strides by one currently). Basically we need to be able to see phi_ptr = phi double* phi_ptr = gep phi_ptr, 2 and make a canonical induction variable out of this phi_ind = phi 0, phi_ind ptr = gep ptr_val_from_outside_loop, 2*phi_ind phi_ind = add, phi_ind, 1 (of course virtually). Or in other words: for (i++) *ptr++ *ptr++ has to look like for (i++) ptr[2*i] ptr[2*i+1] to us when we are vectorizing this. Thanks, Arnold On Oct 21, 2013, at 10:23 AM, Renato Golin <renato.golin at linaro.org> wrote:> Hi Nadav, Arnold, > > I managed to find some time to work on the pointer reduction, and I got a patch that can make "canVectorize()" pass. > > Basically what I do is to teach AddReductionVar() about pointers, saying they don't really have an exit instructions, and that (maybe) the final store is a good candidate (is it?). > > This makes it recognize the writes and reads, but then "canVectorizeMemory()" bails because it can't find the array bounds. This will be my next step, but I'd like to know if I'm in the right direction, with the right assumptions about the reduction detection, before proceeding into more complicated bits. > > An example of the debug output I get for my earlier example: > > LV: Checking a loop in "fn1" > LV: Found a loop: for.body > > == Induction => > LV: Found an induction variable. > > == Our write reduction => > LV: Found a pointer add reduction PHI. %WRITE913 = phi i8* [ %incdec.ptr18, %for.body ], [ %WRITE, %for.body.preheader ] > > == Not sure what this is... Will check. => > LV: Found an non-int non-pointer PHI. > > == Our read reduction => > LV: Found a pointer add reduction PHI. %READ1012 = phi i8* [ %incdec.ptr2, %for.body ], [ %READ, %for.body.preheader ] > > == Below are the memory operations's validation, which I'll deal with later => > LV: Found an unidentified write ptr: i8* %WRITE > LV: Found an unidentified write ptr: i8* %WRITE > LV: Found an unidentified write ptr: i8* %WRITE > LV: Found an unidentified read ptr: i8* %READ > LV: Found an unidentified read ptr: i8* %READ > LV: Found an unidentified read ptr: i8* %READ > LV: Found a runtime check ptr: %WRITE913 = phi i8* [ %incdec.ptr18, %for.body ], [ %WRITE, %for.body.preheader ] > LV: Found a runtime check ptr: %incdec.ptr16 = getelementptr inbounds i8* %WRITE913, i64 1 > LV: Found a runtime check ptr: %incdec.ptr17 = getelementptr inbounds i8* %WRITE913, i64 2 > LV: Found a runtime check ptr: %READ1012 = phi i8* [ %incdec.ptr2, %for.body ], [ %READ, %for.body.preheader ] > LV: Found a runtime check ptr: %incdec.ptr = getelementptr inbounds i8* %READ1012, i64 1 > LV: Found a runtime check ptr: %incdec.ptr1 = getelementptr inbounds i8* %READ1012, i64 2 > LV: We need to do 18 pointer comparisons. > LV: We can't vectorize because we can't find the array bounds. > > cheers, > --renato > <pointer-reduction.patch>
Renato Golin
2013-Oct-21 18:00 UTC
[LLVMdev] First attempt at recognizing pointer reduction
Hi Arnold, To sum up my intentions, I want to understand how the reduction/induction variable detection works in LLVM, so that I can know better how to detect different patterns in memory, not just the stride vectorization. For instance, even if the relationship between each loop would be complicated, I know that in each loop, all three reads are sequential. So, at least, I could use a Load<3> rather than three loads. I guess this is why Nadav was hinting for the SLP vectorizer, not the loop vec. Since the operation on all three loaded values are exactly the same AND the writes are also sequential, I can reduce that to: load<3> -> op<3> -> store<3>. That should work even on machines that don't have de-interleaved access (assuming they're pointers to simple types, etc). On 21 October 2013 17:29, Arnold Schwaighofer <aschwaighofer at apple.com>wrote:> can you post a hand-created vectorized IR of how a reduction would work on > your example? >I'm not there yet, just trying to understand the induction/reduction code first and what comes out of it. I think the right approach to get examples like this is that we need to> recognize strided pointer inductions (we only support strides by one > currently). >I see. I'll have a look at IK_PtrInduction and see what patterns I can spot. Do you envisage a new IK type for strided induction, or just work with the PtrInduction to extend the concept of a non-unit stride (ie. separate Step from ElementSize)? cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131021/4320c3b6/attachment.html>
Renato Golin
2013-Oct-23 14:41 UTC
[LLVMdev] First attempt at recognizing pointer reduction
On 21 October 2013 17:29, Arnold Schwaighofer <aschwaighofer at apple.com>wrote:> I don’t think that recognizing this as a reduction is going to get you > far. A reduction is beneficial if the value reduced is only truly needed > outside of a loop. > This is not the case here (we are storing/loading from the pointer). >Hi Arnold, Nadav, Let me resurrect this discussion a bit. There are few things that I need to check while validating a stride access loop: 1. If there is an induction variable, and it has stride > 1. This is easy, and can be easily extended on isInductionVariable() by adding Step to InductionInfo; 2. If the reduction variables' access are sequential and compatible with the stride. This is the core of the change, and will require SCEV computation, as extensively described by Arnold. 3. If the reduction variable has any use outside of the loop. This will need to account for global variables, which is in itself, a use of the computed value. The problem I'm having, and why I'm resurrecting this thread, is in this simple code: #define OFFSET 255 extern char b[OFFSET]; char *fn1 (char a[]) { unsigned i; for (i=0; i<OFFSET; i += 3) { a[i] = OFFSET - b[i] - DELTA; a[i+1] = OFFSET - b[i+1] - DELTA; a[i+2] = OFFSET - b[i+2] - DELTA; } return &a[0]; } This is the most trivial example. The exit count is constant, the operations are the same and compatible with the stride of 3, and the variable "a" is used outside the loop. However, the optimized code that the vectorizer sees is after SCEV has passed through it, leaving the IR very similar to my original example: for.body: ; preds = %entry, %for.body %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %arrayidx = getelementptr inbounds [255 x i8]* @b, i64 0, i64 %indvars.iv %0 = load i8* %arrayidx, align 1 %1 = xor i8 %0, -1 %2 = load i8* @DELTA, align 1 %sub2 = sub i8 %1, %2 %arrayidx5 = getelementptr inbounds i8* %a, i64 %indvars.iv store i8 %sub2, i8* %arrayidx5, align 1 ... %indvars.iv.next = add nuw nsw i64 %indvars.iv, 3 %11 = trunc i64 %indvars.iv.next to i32 %cmp = icmp ult i32 %11, 255 br i1 %cmp, label %for.body, label %for.end So, as you can see, my original patch to recognize pointer reductions is still relevant, since I'll need to act on the list of induction and reduction variables to discover if the reduction's stride is compatible with the induction's step. Also, I'll need that to make sure that the reduction in question has access outside (via global / extern qualifiers), so ignoring load/store instructions that won't have any users, because the variables are not used directly, but through &a[0]. Arnold, I agree with you that the analysis should not be done exclusively trying to manipulate the pointer reduction variables (but using SCEV as you describe), but I can't see how I can get past the validation phase if, at least, I don't recognize the GEP PHIs as a reduction variable. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131023/ea1c2899/attachment.html>
Apparently Analagous Threads
- [LLVMdev] First attempt at recognizing pointer reduction
- [LLVMdev] First attempt at recognizing pointer reduction
- [LLVMdev] LiveIntervals analysis problem
- LoopVectorizer -- generating bad and unhandled shufflevector sequence
- [LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass