thr3ads.net - llvm dev - [LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations [Apr 2015]

If this information is useful, please help other people find it:
Share via:

Silviu Baranga

2015-Apr-29 15:21 UTC

[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations

Hi,

This is somewhat similar to the previous thread regarding missed vectorization
opportunities
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084765.html),
but maybe different enough to require a new thread.

I'm seeing some missed vectorization opportunities in the loop vectorizer
because SCEV
is not able to fold sext/zext expressions into recurrence expressions
(AddRecExpr).

This can manifest in multiple ways:

-          We cannot get the back-edges taken count since SCEV  because we may
have something like (sext (1,+1))

which we can't evaluate as it can overflow

-          We cannot get SCEV AddRec expressions for pointers which need runtime
checks, and the

loop vectorizer fails with a "Can't vectorize due to memory
conflicts" error.



I think there are two cases:

1)      It would be possible for SCEV to prove that it is safe to fold the
sext/zext nodes into an AddRec

expression, but this doesn't happen because either nsw/nuw flags have been
lost or the code

to make the inference of nsw/nuw flags in some particular case is missing

2)      It is actually possible for some operations to overflow, so folding
sext/zext nodes into AddRec

expressions would be incorrect.



Here is an example where we fail to get the number of back-edge branches taken
because of sext/zext

operations:


void test0(unsigned short a, unsigned short *  in, unsigned short * out) {
  for (unsigned short w = 1; w < a - 1; w++) //this will never overflow
      out[w] = in[w+7] * 2;

}



In there anyone working on improving the 1) aspect of SCEV? If so, maybe some
coordination of effort

might be a good idea.



Since the issue seems to be that certain operations can overflow and SCEV cannot
properly reason about

overflows and extend operations, would it make more sense to try and:

-          Promote values that go into the trip count calculation and memory
access indices to the smallest type

which would remove sext/zext/trunc operations from the loop body. This should
remove the sext/zext

issue, as SCEV wouldn't have to deal with these operations.

-          Add nsw/nuw flags where necessary

-          Add runtime checks (outside the loop) to detect overflows in the
original loop

Would there be any fundamental issue with this approach? I think it would it be
preferable to point fixes
for case 1), so if anyone is working on something similar it would be good to
know.

Thanks,
Silviu

-- IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended recipient,
please notify the sender immediately and do not disclose the contents to any
other person, use it for any purpose, or store or copy the information in any
medium. Thank you.

ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered
in England & Wales, Company No: 2557590
ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
Registered in England & Wales, Company No: 2548782
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150429/f91d9ccb/attachment.html>

Renato Golin

2015-May-05 17:53 UTC

head link

[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations

On 29 April 2015 at 16:21, Silviu Baranga <Silviu.Baranga at arm.com>
wrote:> 1)      It would be possible for SCEV to prove that it is safe to fold the
> sext/zext nodes into an AddRec expression, but this doesn’t happen because
either nsw/nuw flags have been
> lost or the code
Hi Silviu,

It's possible that this is something I spotted two years ago while
working on the stride vectorizer draft. I believe the flag loss is
still in there.

> 2)      It is actually possible for some operations to overflow, so folding
> sext/zext nodes into AddRec expressions would be incorrect.
Exactly. We need to keep the flags as much as possible.

> In there anyone working on improving the 1) aspect of SCEV? If so, maybe
> some coordination of effort might be a good idea.
I haven't heard of anyone so far.

> -          Promote values that go into the trip count calculation and
memory
> access indices to the smallest type
> which would remove sext/zext/trunc operations from the loop body. This
> should remove the sext/zext
> issue, as SCEV wouldn’t have to deal with these operations.
You mean demote the range specifier (a in your example above), to a
more constrained type than the induction variable, right?

If possible, this would help static analysis of loop bounds and not
require run time checks.

> -          Add nsw/nuw flags where necessary
Or at least make sure they're not removed. The cases I've seen had
then in O0 but not in O3, but I haven't gone through to see (or I
can't remember) which pass removed them.

> -          Add runtime checks (outside the loop) to detect overflows in the
> original loop
I think we do some of that. Maybe the two steps above will make the
little we have today work on most cases already.

cheers,
--renato

Sanjoy Das

2015-May-06 01:59 UTC

head link

[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations

For

void test0(unsigned short a, unsigned short *  in, unsigned short * out) {
  for (unsigned short w = 1; w < a - 1; w++) //this will never overflow
      out[w] = in[w+7] * 2;
}

I think it will be sufficient to add a couple of new cases to
ScalarEvolution::HowManyLessThans --

  zext(A) ult zext(B) == A ult B
  sext(A) slt sext(B) == A slt B

Currently it bails out if it sees a non-add recurrence on the LHS of
the ult/slt.

You could also teach ScalarEvolution::isImpliedCondOperands the
following:

  zext(A) ult zext(B) <=> A ult B
  sext(A) slt sext(B) <=> A slt B

to get more aggressive promotion from ext{A,+,B} to {ext A,+,ext B}.

I'll be happy to review these patches.

-- Sanjoy

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Apr 2015 - [LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations

[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations

[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations

[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations

Possibly Parallel Threads