Silviu Baranga
2015-Apr-29 15:21 UTC
[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
Hi,
This is somewhat similar to the previous thread regarding missed vectorization
opportunities
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084765.html),
but maybe different enough to require a new thread.
I'm seeing some missed vectorization opportunities in the loop vectorizer
because SCEV
is not able to fold sext/zext expressions into recurrence expressions
(AddRecExpr).
This can manifest in multiple ways:
- We cannot get the back-edges taken count since SCEV because we may
have something like (sext (1,+1))
which we can't evaluate as it can overflow
- We cannot get SCEV AddRec expressions for pointers which need runtime
checks, and the
loop vectorizer fails with a "Can't vectorize due to memory
conflicts" error.
I think there are two cases:
1) It would be possible for SCEV to prove that it is safe to fold the
sext/zext nodes into an AddRec
expression, but this doesn't happen because either nsw/nuw flags have been
lost or the code
to make the inference of nsw/nuw flags in some particular case is missing
2) It is actually possible for some operations to overflow, so folding
sext/zext nodes into AddRec
expressions would be incorrect.
Here is an example where we fail to get the number of back-edge branches taken
because of sext/zext
operations:
void test0(unsigned short a, unsigned short * in, unsigned short * out) {
for (unsigned short w = 1; w < a - 1; w++) //this will never overflow
out[w] = in[w+7] * 2;
}
In there anyone working on improving the 1) aspect of SCEV? If so, maybe some
coordination of effort
might be a good idea.
Since the issue seems to be that certain operations can overflow and SCEV cannot
properly reason about
overflows and extend operations, would it make more sense to try and:
- Promote values that go into the trip count calculation and memory
access indices to the smallest type
which would remove sext/zext/trunc operations from the loop body. This should
remove the sext/zext
issue, as SCEV wouldn't have to deal with these operations.
- Add nsw/nuw flags where necessary
- Add runtime checks (outside the loop) to detect overflows in the
original loop
Would there be any fundamental issue with this approach? I think it would it be
preferable to point fixes
for case 1), so if anyone is working on something similar it would be good to
know.
Thanks,
Silviu
-- IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended recipient,
please notify the sender immediately and do not disclose the contents to any
other person, use it for any purpose, or store or copy the information in any
medium. Thank you.
ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered
in England & Wales, Company No: 2557590
ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
Registered in England & Wales, Company No: 2548782
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150429/f91d9ccb/attachment.html>
Renato Golin
2015-May-05 17:53 UTC
[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
On 29 April 2015 at 16:21, Silviu Baranga <Silviu.Baranga at arm.com> wrote:> 1) It would be possible for SCEV to prove that it is safe to fold the > sext/zext nodes into an AddRec expression, but this doesn’t happen because either nsw/nuw flags have been > lost or the codeHi Silviu, It's possible that this is something I spotted two years ago while working on the stride vectorizer draft. I believe the flag loss is still in there.> 2) It is actually possible for some operations to overflow, so folding > sext/zext nodes into AddRec expressions would be incorrect.Exactly. We need to keep the flags as much as possible.> In there anyone working on improving the 1) aspect of SCEV? If so, maybe > some coordination of effort might be a good idea.I haven't heard of anyone so far.> - Promote values that go into the trip count calculation and memory > access indices to the smallest type > which would remove sext/zext/trunc operations from the loop body. This > should remove the sext/zext > issue, as SCEV wouldn’t have to deal with these operations.You mean demote the range specifier (a in your example above), to a more constrained type than the induction variable, right? If possible, this would help static analysis of loop bounds and not require run time checks.> - Add nsw/nuw flags where necessaryOr at least make sure they're not removed. The cases I've seen had then in O0 but not in O3, but I haven't gone through to see (or I can't remember) which pass removed them.> - Add runtime checks (outside the loop) to detect overflows in the > original loopI think we do some of that. Maybe the two steps above will make the little we have today work on most cases already. cheers, --renato
Sanjoy Das
2015-May-06 01:59 UTC
[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
For
void test0(unsigned short a, unsigned short * in, unsigned short * out) {
for (unsigned short w = 1; w < a - 1; w++) //this will never overflow
out[w] = in[w+7] * 2;
}
I think it will be sufficient to add a couple of new cases to
ScalarEvolution::HowManyLessThans --
zext(A) ult zext(B) == A ult B
sext(A) slt sext(B) == A slt B
Currently it bails out if it sees a non-add recurrence on the LHS of
the ult/slt.
You could also teach ScalarEvolution::isImpliedCondOperands the
following:
zext(A) ult zext(B) <=> A ult B
sext(A) slt sext(B) <=> A slt B
to get more aggressive promotion from ext{A,+,B} to {ext A,+,ext B}.
I'll be happy to review these patches.
-- Sanjoy
Possibly Parallel Threads
- [LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
- [cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi
- [cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi
- [LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)
- [LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)