thr3ads.net - llvm dev - [LLVMdev] Missed vectorization opportunities? [Apr 2015]

If this information is useful, please help other people find it:
Share via:

Vaivaswatha N

2015-Apr-22 11:01 UTC

[LLVMdev] Missed vectorization opportunities?

Hi,
I am trying to understand the limitations of the current vectorizer, and came
upon these test cases that fail to get vectorized.
1. loop1 below (note the increment by 2) fails to get vectorized because the
access a[j+1] is considered to wrap around (the corresponding SCEV doesn't
have nsw/nuw set) and hence isStridedPtr() in LoopAccessAnalysis.cpp return
false.

#define SIZE 100000
void loop1 (float a[SIZE])
{
  long t, j;
  float cc = 23, dd = 34, ee = 233;
#pragma clang loop vectorize(enable)
  for (j = 1; j < SIZE-1; j += 2) {
    float x;
    x = a[j+1] + ee;
    x = x / dd;
    x = (cc + x) / dd;
    a[j] = x + 2 ;
  }
}

2. Of the two loops below, the loop "works" gets vectorized, while the
loop "fails" does not get vectorized.#define SIZE 10000
int a[SIZE+4] = {};
void works(unsigned long m, unsigned long n)
{
  long j;
  // SCEV can compute the exit value for this
loop                                  
  for (j = m; j < n+1; j += 1) {
    a[j] = a[j+1] + 2;
  }
}
void fails(unsigned long m, unsigned long n)
{
  long j;
  // SCEV cannot compute the exit value for this
loop                            
  for (j = m; j <= n; j += 1) {
    a[j] = a[j+1] + 2;
  }
}
The only difference between the two loops is the loop exit condition, both
semantically same. Scalar Evolution is unable to determine the exit value of the
second loop, leading to the vectorizer failing to vectorize. It seemed to me
that this is due to ScalarEvolution::SimplifyICmpOperands failing to
canonicalize the corresponding ICMP_ULE instruction.
Thanks,
- Vaivaswatha
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150422/41a73399/attachment.html>

mats petersson

2015-Apr-22 11:16 UTC

head link

[LLVMdev] Missed vectorization opportunities?

In your first case, I fail to see what the obvious vectorisation would
be - you are reading one and writing one of the every other element -
that is not very easy to made into vectorized operatioons - have you
tried writing code for that, and confirmed that it's actually better
than just straight loop?

--
Mats

On 22 April 2015 at 12:01, Vaivaswatha N <vaivaswatha at yahoo.co.in>
wrote:> Hi,
>
> I am trying to understand the limitations of the current vectorizer, and
> came upon these test cases that fail to get vectorized.
>
> 1. loop1 below (note the increment by 2) fails to get vectorized because
the
> access a[j+1] is considered to wrap around (the corresponding SCEV
doesn't
> have nsw/nuw set) and hence isStridedPtr() in LoopAccessAnalysis.cpp return
> false.
>
> #define SIZE 100000
> void loop1 (float a[SIZE])
> {
>   long t, j;
>   float cc = 23, dd = 34, ee = 233;
> #pragma clang loop vectorize(enable)
>   for (j = 1; j < SIZE-1; j += 2) {
>     float x;
>     x = a[j+1] + ee;
>     x = x / dd;
>     x = (cc + x) / dd;
>     a[j] = x + 2 ;
>   }
> }
>
> 2. Of the two loops below, the loop "works" gets vectorized,
while the loop
> "fails" does not get vectorized.
> #define SIZE 10000
> int a[SIZE+4] = {};
> void works(unsigned long m, unsigned long n)
> {
>   long j;
>   // SCEV can compute the exit value for this loop
>   for (j = m; j < n+1; j += 1) {
>     a[j] = a[j+1] + 2;
>   }
> }
> void fails(unsigned long m, unsigned long n)
> {
>   long j;
>   // SCEV cannot compute the exit value for this loop
>   for (j = m; j <= n; j += 1) {
>     a[j] = a[j+1] + 2;
>   }
> }
>
> The only difference between the two loops is the loop exit condition, both
> semantically same. Scalar Evolution is unable to determine the exit value
of
> the second loop, leading to the vectorizer failing to vectorize. It seemed
> to me that this is due to ScalarEvolution::SimplifyICmpOperands failing to
> canonicalize the corresponding ICMP_ULE instruction.
>
> Thanks,
>
> - Vaivaswatha
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Vaivaswatha N

2015-Apr-22 11:35 UTC

head link

[LLVMdev] Missed vectorization opportunities?

>have you tried writing code for that, and confirmed that it's actually
better than just straight loop?The same loop gets vectorized on icc and runs
faster hence.Also, as an experiment, I tried forcing isStridedPtr() to return true. The
vectorizer proceeds to vectorize the loop, but does a poor job. Instead of using
vector Load/Store instructions and packing/unpacking, it uses scalar Load/Stores
instructions and packs them for the arithmetic operations. Note that without the
pragma, the cost model does advice against the vectorization. My reason for
forcing vectorization was that it did well on icc.
On icc, I tried the above loop without any pragmas and saw that the loop was
vectorized. Also using the pragma "novector" resulted in the loop not
being vectorized. With vectorization, performance was better.

Thanks,
 - Vaivaswatha 


     On Wednesday, 22 April 2015 4:46 PM, mats petersson <mats at
planetcatfish.com> wrote:
   

 In your first case, I fail to see what the obvious vectorisation would
be - you are reading one and writing one of the every other element -
that is not very easy to made into vectorized operatioons - have you
tried writing code for that, and confirmed that it's actually better
than just straight loop?

--
Mats

On 22 April 2015 at 12:01, Vaivaswatha N <vaivaswatha at yahoo.co.in>
wrote:> Hi,
>
> I am trying to understand the limitations of the current vectorizer, and
> came upon these test cases that fail to get vectorized.
>
> 1. loop1 below (note the increment by 2) fails to get vectorized because
the
> access a[j+1] is considered to wrap around (the corresponding SCEV
doesn't
> have nsw/nuw set) and hence isStridedPtr() in LoopAccessAnalysis.cpp return
> false.
>
> #define SIZE 100000
> void loop1 (float a[SIZE])
> {
>  long t, j;
>  float cc = 23, dd = 34, ee = 233;
> #pragma clang loop vectorize(enable)
>  for (j = 1; j < SIZE-1; j += 2) {
>    float x;
>    x = a[j+1] + ee;
>    x = x / dd;
>    x = (cc + x) / dd;
>    a[j] = x + 2 ;
>  }
> }
>
> 2. Of the two loops below, the loop "works" gets vectorized,
while the loop
> "fails" does not get vectorized.
> #define SIZE 10000
> int a[SIZE+4] = {};
> void works(unsigned long m, unsigned long n)
> {
>  long j;
>  // SCEV can compute the exit value for this loop
>  for (j = m; j < n+1; j += 1) {
>    a[j] = a[j+1] + 2;
>  }
> }
> void fails(unsigned long m, unsigned long n)
> {
>  long j;
>  // SCEV cannot compute the exit value for this loop
>  for (j = m; j <= n; j += 1) {
>    a[j] = a[j+1] + 2;
>  }
> }
>
> The only difference between the two loops is the loop exit condition, both
> semantically same. Scalar Evolution is unable to determine the exit value
of
> the second loop, leading to the vectorizer failing to vectorize. It seemed
> to me that this is due to ScalarEvolution::SimplifyICmpOperands failing to
> canonicalize the corresponding ICMP_ULE instruction.
>
> Thanks,
>
> - Vaivaswatha
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu        http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

  
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150422/93152e5e/attachment.html>

Daniel Berlin

2015-Apr-22 15:29 UTC

head link

[LLVMdev] Missed vectorization opportunities?

> 2. Of the two loops below, the loop "works" gets vectorized,
while the loop
> "fails" does not get vectorized.
> #define SIZE 10000
> int a[SIZE+4] = {};
> void works(unsigned long m, unsigned long n)
> {
>   long j;
>   // SCEV can compute the exit value for this loop
>   for (j = m; j < n+1; j += 1) {
>     a[j] = a[j+1] + 2;
>   }
> }
> void fails(unsigned long m, unsigned long n)
> {
>   long j;
>   // SCEV cannot compute the exit value for this loop
>   for (j = m; j <= n; j += 1) {
>     a[j] = a[j+1] + 2;
>   }
> }
>
> The only difference between the two loops is the loop exit condition, both
> semantically same.

I expect SCEV treats them differently because of MAX_INT handling.
Look as the definedness of both if n == MAX_INT. The first has
undefined behavior, the second does not.
If you change the second into the first, you introduce undefined behavior.
(or maybe it's implementation defined, but whatever)


This is the:
  if (!getUnsignedRange(RHS).getUnsignedMax().isMaxValue()) {

check in that function simplify.

But you should file a bug anyway.

Sanjoy Das

2015-Apr-22 18:44 UTC

head link

[LLVMdev] Missed vectorization opportunities?

> I expect SCEV treats them differently because of MAX_INT handling.
> Look as the definedness of both if n == MAX_INT. The first has
> undefined behavior, the second does not.
> If you change the second into the first, you introduce undefined behavior.
> (or maybe it's implementation defined, but whatever)
To elaborate a little further on this:

In the first loop, you can never enter the loop with "j == INT_SMAX"
since INT_SMAX will never be < anything.  This means j + 1 cannot
overflow.  In the second loop you /can/ enter the loop with "j
=INT_SMAX" if "n == INT_SMAX" so j + 1 can potentially overflow.

Ideally SCEV should be able to infer the nsw'ness of the additions
from the nsw bits in the source IR; but that's more complex that it
sounds since SCEV does not have  a notion of control flow within the
loop and it hashes SCEVs by the operands and not by the nsw/nuw bits.
Crude example:

define void @x(i32 %a, i32 %b, i1 %c) {
 entry:
  %m = add i32 %a, %b
  br i1 %c, label %do, label %dont

 do:
  %m1 = add nsw i32 %a, %b
  br label %dont

 dont:
  ret void
}

both %m and %m1 get mapped to the *same* SCEV, and you cannot mark
that SCEV as nsw even though %m1 is nsw.

-- Sanjoy


>
>
> This is the:
>   if (!getUnsignedRange(RHS).getUnsignedMax().isMaxValue()) {
>
> check in that function simplify.
>
> But you should file a bug anyway.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Apr 2015 - [LLVMdev] Missed vectorization opportunities?

[LLVMdev] Missed vectorization opportunities?

[LLVMdev] Missed vectorization opportunities?

[LLVMdev] Missed vectorization opportunities?

[LLVMdev] Missed vectorization opportunities?

[LLVMdev] Missed vectorization opportunities?

Possibly Parallel Threads