Sanjoy Das
2015-May-06 01:59 UTC
[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
For void test0(unsigned short a, unsigned short * in, unsigned short * out) { for (unsigned short w = 1; w < a - 1; w++) //this will never overflow out[w] = in[w+7] * 2; } I think it will be sufficient to add a couple of new cases to ScalarEvolution::HowManyLessThans -- zext(A) ult zext(B) == A ult B sext(A) slt sext(B) == A slt B Currently it bails out if it sees a non-add recurrence on the LHS of the ult/slt. You could also teach ScalarEvolution::isImpliedCondOperands the following: zext(A) ult zext(B) <=> A ult B sext(A) slt sext(B) <=> A slt B to get more aggressive promotion from ext{A,+,B} to {ext A,+,ext B}. I'll be happy to review these patches. -- Sanjoy
Silviu Baranga
2015-May-06 11:03 UTC
[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
Hi Renato, Sanjoy,> void test0(unsigned short a, unsigned short * in, unsigned short * out) { > for (unsigned short w = 1; w < a - 1; w++) //this will never overflow > out[w] = in[w+7] * 2; > }Turns out this can actually overflow for a = 0, and in this case would be an infinite loop (end condition would be w < MAX_USHORT). Run-time checks could still be added here: if (a > 0) do vectorized version else default to scalar version On 06 May 2015 at 03:00, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:> I think it will be sufficient to add a couple of new cases to > ScalarEvolution::HowManyLessThans -- > > zext(A) ult zext(B) == A ult B > sext(A) slt sext(B) == A slt BI had a go at this but realized that the problem above would affect at least the ult case (we need to have B < max(type(B)) for it to hold), and so I gave up (at least temporarily). I think this should still be possible, but some range checking might be required. On 05 May 2015 at 18:54, Renato Golin < renato.golin at linaro.org > wrote:> > - Promote values that go into the trip count calculation and memory > > access indices to the smallest type > > which would remove sext/zext/trunc operations from the loop body. This > > should remove the sext/zext issue, as SCEV wouldn’t have to deal with > > these operations. > > You mean demote the range specifier (a in your example above), to a more > constrained type than the induction variable, right? > > If possible, this would help static analysis of loop bounds and not require run > time checks.I was thinking about doing something like promoting the induction variable (in my case w) to a wider type (int), but there might be some problems with that. I think we need SCEV working to be able to reason about when that would be safe, and even if we could do that, there is no guarantee that this would give us any benefit - we might end up increasing the code size if vectorization wouldn't be possible even after that transformation. On 05 May 2015 at 18:54, Renato Golin < renato.golin at linaro.org > wrote:> > > - Add runtime checks (outside the loop) to detect overflows in the > > original loop > > I think we do some of that. Maybe the two steps above will make the little > we have today work on most cases already.I think we're only checking for overflows in the trip count variable that's being generated by the loop vectorizer. So any potential overflows in the actual loop would make us bail out in the legality check part. I'm currently trying to work around this issue by teaching SCEV to be able to optimistically fold sext/zext expressions while accumulating the assumptions that it has made doing so (which need to be checked at run-time). This allows us to gather all the assumptions in the vectorization legality check of the loop vectorizer and even allows us to go through the cost model without modifying the IR. This would of course only happen if we would otherwise abort because we didn't get a SCEV expression that we want to be an AddRecExpr. This seems to work at least on a toy example (it can work around issues with the number of back-edges taken and memory checks), so I hope to have a patch sent out for RFC soon. Thanks, Silviu
Sanjoy Das
2015-May-06 19:00 UTC
[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
Hi Silviu,>> void test0(unsigned short a, unsigned short * in, unsigned short * out) { >> for (unsigned short w = 1; w < a - 1; w++) //this will never overflow >> out[w] = in[w+7] * 2; >> } > > Turns out this can actually overflow for a = 0, and in this case would > be an infinite loop (end condition would be w < MAX_USHORT). Run-time > checks could still be added here:Maybe I'm missing something here, but for a = 0 this will run exactly (MAX_USHORT - 1) iterations, it will stop once w is MAX_USHORT. In general, if your loop condition is "I u< N" for any N, I++ cannot unsigned overflow since the only value of I for which I++ overflows is MAX_USHORT (= -1) and MAX_USHORT will not be u< N for any N. A similar fact follows for "I s< N" and sign overflow in I++.>> I think it will be sufficient to add a couple of new cases to >> ScalarEvolution::HowManyLessThans -- >> >> zext(A) ult zext(B) == A ult B >> sext(A) slt sext(B) == A slt B > > I had a go at this but realized that the problem above would affect > at least the ult case (we need to have B < max(type(B)) for it to hold),Why? AFAICT, the above two facts hold for all values of A and B, including INT_MAX, INT_MIN, INT_UMAX etc. -- Sanjoy