thr3ads.net - llvm dev - [llvm-dev] autovectorization of outer loop [May 2017]

Jyotirmoy Bhattacharya via llvm-dev

2017-May-10 07:16 UTC

[llvm-dev] autovectorization of outer loop

I have the following C++ code, which evaluates a Chebyshev polynomial using
Clenshaw's algorithm:

void cheby_eval(double *coeffs,int n,double *xs,double *ys,int m)
{
  #pragma omp simd
  for (int i=0;i<m;i++){
    double x = xs[i];
    double u0=0,u1=0,u2=0;
    for (int k=n;k>=0;k--){
      u2 = u1;
      u1 = u0;
      u0 = 2*x*u1-u2+coeffs[k];
    }
    ys[i] = 0.5*(coeffs[0]+u0-u2);
  }
}

I'm hoping for an autovectorization of the outer loop so that the inner
loop operates on vectors.

When compiled with

clang++ -O3 -march=haswell -Rpass-analysis=loop-vectorize -S chebyshev.cc

using clang++ 3.8.1-23, no vectorization happens and I get the message

chebyshev.cc:19:18: remark: loop not vectorized: cannot identify array bounds
      [-Rpass-analysis=loop-vectorize]
    ys[i] = 0.5*(coeffs[0]+u0-u2);
                 ^
chebyshev.cc:21:1: remark: loop not vectorized: value that could not be
      identified as reduction is used outside the loop
      [-Rpass-analysis=loop-vectorize]


On the same code icc vectorizes the outer loop as expected.

I was wondering if there are small ways in which I can change my code to
help LLVM's autovectorizer to succeed. I would also appreciate any pointers
to documentation or LLVM source that can help me better understand how
autovectorization of outer loops works.

Regards,
Jyotirmoy Bhattacharya

PS. The interesting part of icc's assembler output is

..B1.4:                         # Preds ..B1.8 ..B1.3
        xorl      %r15d, %r15d                                  #14.5
        xorl      %ebx, %ebx                                    #14.21
        testq     %rsi, %rsi                                    #14.21
        vmovupd   (%rdx,%r9,8), %ymm3                           #12.16
        vxorpd    %ymm5, %ymm5, %ymm5                           #13.14
        vmovdqa   %ymm1, %ymm4                                  #13.19
        vmovdqa   %ymm1, %ymm2                                  #13.24
        jl        ..B1.8        # Prob 2%                       #14.21

..B1.5:                         # Preds ..B1.4
        vaddpd    %ymm3, %ymm3, %ymm3                           #17.14

..B1.6:                         # Preds ..B1.6 ..B1.5
        vmovapd   %ymm4, %ymm2                                  #20.3
        incq      %r15                                          #14.5
        vmovapd   %ymm5, %ymm4                                  #20.3
        vfmsub213pd %ymm2, %ymm3, %ymm5                         #17.19
        vbroadcastsd (%r11,%rbx,8), %ymm6                       #17.22
        decq      %rbx
        vaddpd    %ymm5, %ymm6, %ymm5                           #17.22
        cmpq      %r10, %r15                                    #14.5
        jb        ..B1.6        # Prob 82%                      #14.5

..B1.8:                         # Preds ..B1.6 ..B1.4
        vbroadcastsd (%rdi), %ymm3                              #19.18
        vaddpd    %ymm3, %ymm5, %ymm4                           #19.28
        vsubpd    %ymm2, %ymm4, %ymm2                           #19.31
        vmulpd    %ymm2, %ymm0, %ymm5                           #19.31
        vmovupd   %ymm5, (%rcx,%r9,8)                           #19.5
        addq      $4, %r9                                       #11.3
        cmpq      %r8, %r9                                      #11.3
        jb        ..B1.4        # Prob 82%                      #11

Zaks, Ayal via llvm-dev

2017-May-10 21:17 UTC

[llvm-dev] autovectorization of outer loop

> help me better understand how autovectorization of outer loops works.
LLVM’s loop vectorizer currently handles innermost loops only.

> I'm hoping for an autovectorization of the outer loop so that the inner
> loop operates on vectors.
We share that hope and are working to achieve it:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105057.html, but it will
take some time. See https://reviews.llvm.org/D28975 and
https://reviews.llvm.org/D32871. Thanks for the use-case.

> I was wondering if there are small ways in which I can change my code to
> help LLVM's autovectorizer to succeed.
If a doubly-nested loop can be interchanged such that the inner loop becomes
vectorizable, it may help.

Ayal.
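
To make the interchange suggestion concrete, here is a minimal sketch of what it could look like for cheby_eval. The function name cheby_eval_interchanged and the three scratch vectors are my own assumptions, not something from the thread. The loop over k becomes the outer loop and the loop over i the innermost one, so each pass over i is an element-wise update with no dependence between different values of i. I have not checked which compiler versions actually vectorize this form.

```cpp
#include <vector>

// Hypothetical interchanged variant of cheby_eval (name and scratch
// buffers are illustrative assumptions). The Clenshaw state u0, u1, u2,
// which was three scalars per point, becomes three arrays of length m.
void cheby_eval_interchanged(const double *coeffs, int n,
                             const double *xs, double *ys, int m)
{
  if (m <= 0) return;

  // One recurrence state per evaluation point, all starting at zero.
  std::vector<double> u0(m, 0.0), u1(m, 0.0), u2(m, 0.0);

  for (int k = n; k >= 0; k--) {
    const double c = coeffs[k];   // invariant in the inner loop
    // Innermost loop runs over the points; iterations are independent.
    for (int i = 0; i < m; i++) {
      const double next = 2 * xs[i] * u0[i] - u1[i] + c;
      u2[i] = u1[i];
      u1[i] = u0[i];
      u0[i] = next;
    }
  }

  // Same final combination as in the original code.
  for (int i = 0; i < m; i++)
    ys[i] = 0.5 * (coeffs[0] + u0[i] - u2[i]);
}
```

The trade-off is that the recurrence state now lives in memory rather than in registers, so every step of k reads and writes three arrays of length m. For large m it would probably be worth processing the points in fixed-size strips so that the scratch arrays stay in cache.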


