Silviu Baranga
2015-Apr-29 15:21 UTC
[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
Hi, This is somewhat similar to the previous thread regarding missed vectorization opportunities (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084765.html), but maybe different enough to require a new thread. I'm seeing some missed vectorization opportunities in the loop vectorizer because SCEV is not able to fold sext/zext expressions into recurrence expressions (AddRecExpr). This can manifest in multiple ways: - We cannot get the back-edges taken count since SCEV because we may have something like (sext (1,+1)) which we can't evaluate as it can overflow - We cannot get SCEV AddRec expressions for pointers which need runtime checks, and the loop vectorizer fails with a "Can't vectorize due to memory conflicts" error. I think there are two cases: 1) It would be possible for SCEV to prove that it is safe to fold the sext/zext nodes into an AddRec expression, but this doesn't happen because either nsw/nuw flags have been lost or the code to make the inference of nsw/nuw flags in some particular case is missing 2) It is actually possible for some operations to overflow, so folding sext/zext nodes into AddRec expressions would be incorrect. Here is an example where we fail to get the number of back-edge branches taken because of sext/zext operations: void test0(unsigned short a, unsigned short * in, unsigned short * out) { for (unsigned short w = 1; w < a - 1; w++) //this will never overflow out[w] = in[w+7] * 2; } In there anyone working on improving the 1) aspect of SCEV? If so, maybe some coordination of effort might be a good idea. Since the issue seems to be that certain operations can overflow and SCEV cannot properly reason about overflows and extend operations, would it make more sense to try and: - Promote values that go into the trip count calculation and memory access indices to the smallest type which would remove sext/zext/trunc operations from the loop body. This should remove the sext/zext issue, as SCEV wouldn't have to deal with these operations. - Add nsw/nuw flags where necessary - Add runtime checks (outside the loop) to detect overflows in the original loop Would there be any fundamental issue with this approach? I think it would it be preferable to point fixes for case 1), so if anyone is working on something similar it would be good to know. Thanks, Silviu -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150429/f91d9ccb/attachment.html>
Renato Golin
2015-May-05 17:53 UTC
[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
On 29 April 2015 at 16:21, Silviu Baranga <Silviu.Baranga at arm.com> wrote:> 1) It would be possible for SCEV to prove that it is safe to fold the > sext/zext nodes into an AddRec expression, but this doesn’t happen because either nsw/nuw flags have been > lost or the codeHi Silviu, It's possible that this is something I spotted two years ago while working on the stride vectorizer draft. I believe the flag loss is still in there.> 2) It is actually possible for some operations to overflow, so folding > sext/zext nodes into AddRec expressions would be incorrect.Exactly. We need to keep the flags as much as possible.> In there anyone working on improving the 1) aspect of SCEV? If so, maybe > some coordination of effort might be a good idea.I haven't heard of anyone so far.> - Promote values that go into the trip count calculation and memory > access indices to the smallest type > which would remove sext/zext/trunc operations from the loop body. This > should remove the sext/zext > issue, as SCEV wouldn’t have to deal with these operations.You mean demote the range specifier (a in your example above), to a more constrained type than the induction variable, right? If possible, this would help static analysis of loop bounds and not require run time checks.> - Add nsw/nuw flags where necessaryOr at least make sure they're not removed. The cases I've seen had then in O0 but not in O3, but I haven't gone through to see (or I can't remember) which pass removed them.> - Add runtime checks (outside the loop) to detect overflows in the > original loopI think we do some of that. Maybe the two steps above will make the little we have today work on most cases already. cheers, --renato
Sanjoy Das
2015-May-06 01:59 UTC
[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
For void test0(unsigned short a, unsigned short * in, unsigned short * out) { for (unsigned short w = 1; w < a - 1; w++) //this will never overflow out[w] = in[w+7] * 2; } I think it will be sufficient to add a couple of new cases to ScalarEvolution::HowManyLessThans -- zext(A) ult zext(B) == A ult B sext(A) slt sext(B) == A slt B Currently it bails out if it sees a non-add recurrence on the LHS of the ult/slt. You could also teach ScalarEvolution::isImpliedCondOperands the following: zext(A) ult zext(B) <=> A ult B sext(A) slt sext(B) <=> A slt B to get more aggressive promotion from ext{A,+,B} to {ext A,+,ext B}. I'll be happy to review these patches. -- Sanjoy
Seemingly Similar Threads
- [LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
- [cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi
- [cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi
- [LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)
- [LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)