thr3ads.net - similar to: "unable to emit vectorized code in LLVM IR"

unable to emit vectorized code in LLVM IR

2017 Aug 17

I want to vectorize user-supplied inputs: when opt vectorizes the code, the user-supplied inputs (from a text file) should be added using AVX vector instructions. As you pointed out, when I changed my code to the following: int main(int argc, char** argv) { int a[1000], b[1000], c[1000]; int aa=atoi(argv[1]), bb=atoi(argv[2]); for (int i=0; i<1000; i++) { a[i]=aa, b[i]=bb; c[i]=a[i] + b[i];

unable to emit vectorized code in LLVM IR

2017 Aug 17

I removed printf from the loop. Now I get no error, but the IR does not contain vectorized code. The IR output is as follows: ; ModuleID = 'sum-vec.ll' source_filename = "sum-vec.c" target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" ; Function Attrs: norecurse nounwind readnone uwtable define i32 @main(i32, i8**

unable to emit vectorized code in LLVM IR

2017 Aug 17

Even if I change my code as follows, vectorized instructions are not emitted. What should I do? int main(int argc, char** argv) { int a[1000], b[1000], c[1000]; int g=0; int aa=atoi(argv[1]), bb=atoi(argv[2]); for (int i=0; i<1000; i++) { a[i]=aa, b[i]=bb; c[i]=a[i] + b[i]; g+=c[i]; } printf("sum: %d\n", g); return 0; } On Thu, Aug 17, 2017 at 10:03 PM, Craig Topper <craig.topper at

unable to emit vectorized code in LLVM IR

2017 Aug 17

I assume the compiler knows that you only have 2 input values that you just added together 1000 times. Despite the fact that you stored to a[i] and b[i] here, nothing reads them other than the addition in the same loop iteration. So the compiler easily removed the a and b arrays. Same with 'c': it's not read outside the loop, so it doesn't need to exist. So the compiler turned your
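
A minimal sketch of the point above, assuming the goal is simply to see vector instructions in the IR: keep the arrays observable by writing through caller-owned pointers, so the stores cannot be deleted as dead code. The function name `add_arrays` is hypothetical and not from the thread. Compiling with something like `clang -O3 -S -emit-llvm` should typically show `<N x i32>` vector types for this loop, though the exact width depends on the target.

```c
#include <stddef.h>

/* Hypothetical helper: element-wise add over caller-owned arrays.
 * Because c[] is written through a pointer the caller can read later,
 * the stores are observable and cannot be removed as dead code, so the
 * loop survives until the loop vectorizer runs.
 * 'restrict' promises the arrays do not overlap, which may let the
 * vectorizer skip runtime overlap checks. */
void add_arrays(int *restrict c, const int *restrict a,
                const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

The caller (for example `main`, after reading the inputs with `atoi`) then fills `a` and `b`, calls `add_arrays`, and actually uses `c`, e.g. by printing some of its elements.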

unable to emit vectorized code in LLVM IR

2017 Aug 17

OK. I have managed to vectorize the second loop in the following code, but the first loop is still not vectorized. Why? int main(int argc, char** argv) { int a[1000], b[1000], c[1000]; int g=0; int aa=atoi(argv[1]), bb=atoi(argv[2]); for (int i=0; i<1000; i++) { a[i]=aa+i, b[i]=bb+i;} for (int i=0; i<1000; i++) { c[i]=a[i] + b[i]; g+=c[i]; } printf("sum: %d\n", g); return 0;
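
One way to investigate a question like this, sketched as an assumption rather than as the thread's answer: move each loop into its own non-inlined function, so that Clang's vectorizer remarks (`-Rpass=loop-vectorize`, `-Rpass-missed=loop-vectorize`) are reported for each loop separately. The function names `fill_inputs` and `add_and_sum` are hypothetical. `__attribute__((noinline))` is a GCC/Clang extension.

```c
#include <stddef.h>

/* Hypothetical: the first loop from the snippet, kept out of line so its
 * vectorizer remarks are not mixed with those of the second loop. */
__attribute__((noinline))
void fill_inputs(int *restrict a, int *restrict b, int aa, int bb, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = aa + (int)i;
        b[i] = bb + (int)i;
    }
}

/* Hypothetical: the second loop, an element-wise add plus an integer
 * sum reduction into g. */
__attribute__((noinline))
int add_and_sum(int *restrict c, const int *restrict a,
                const int *restrict b, size_t n)
{
    int g = 0;
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
        g += c[i];
    }
    return g;
}
```

With `aa = 5` and `bb = 2`, as in the later `lli sum-vec03.ll 5 2` run, calling `fill_inputs` and then `add_and_sum` over 1000 elements produces a sum of 1006000.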

unable to emit vectorized code in LLVM IR

2017 Aug 17

lli sum-vec03.ll 5 2 #0 0x0000000000c1f818 (lli+0xc1f818) #1 0x0000000000c1d90e (lli+0xc1d90e) #2 0x0000000000c1da5c (lli+0xc1da5c) #3 0x00007f987c2c3d10 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10d10) #4 0x00007f987c6f0038 #5 0x0000000000989f8c (lli+0x989f8c) #6 0x00000000009383dc (lli+0x9383dc) #7 0x000000000057eedd (lli+0x57eedd) #8 0x00007f987b464a40 __libc_start_main

Jacobi 5 Point Stencil Code not Vectorizing

2017 Jul 01

Does it happen due to a loop-carried dependence? If yes, what is the solution for vectorizing such code? Please reply; I am waiting. On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote: > I even tried Polly but still my LLVM IR does not contain vector > instructions. I used the following command: > > clang -S -emit-llvm stencil.c -march=knl -O3

Jacobi 5 Point Stencil Code not Vectorizing

2017 Jul 01

Hello, I am trying to vectorize the following stencil code: #include <stdio.h> #define N 100351 // This function computes 2D-5 point Jacobi stencil void stencil(int a[restrict][N]) { int i, j, k; for (k = 0; k < 100; k++) { for (i = 1; i <= N-2; i++) { for (j = 1; j <= N-2; j++) { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +

Jacobi 5 Point Stencil Code not Vectorizing

2017 Jul 01

I am able to vectorize it with the following code: #include <stdio.h> #define N 100351 // This function computes 2D-5 point Jacobi stencil void stencil(int a[][N], int b[][N]) { int i, j, k; for (k = 0; k < N; k++) { for (i = 1; i <= N-2; i++) for (j = 1; j <= N-2; j++) b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]); for
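
The change works because the earlier in-place version reads `a[i][j-1]`, which the previous `j` iteration has just overwritten. That is a loop-carried dependence, and it prevents vectorizing the `j` loop. The following is a minimal out-of-place sketch of one sweep. It assumes `double` data and a runtime size `n`, both of which differ from the thread's `int` arrays and fixed `N`. The name `jacobi_step` is hypothetical. The 0.25 weight over five terms is kept as written in the thread.

```c
#include <stddef.h>

/* Hypothetical: one out-of-place 5-point Jacobi sweep over an n x n grid.
 * It reads only from a and writes only to b, so no iteration reads a value
 * produced by another iteration of the same sweep, and the inner j-loop
 * carries no dependence. Boundary rows and columns of b are not written. */
void jacobi_step(size_t n, double a[restrict n][n], double b[restrict n][n])
{
    for (size_t i = 1; i + 1 < n; i++)
        for (size_t j = 1; j + 1 < n; j++)
            b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j]
                              + a[i][j-1] + a[i][j+1]);
}
```

To run several sweeps, call it repeatedly and swap the roles of the two buffers each time. Because a sweep never writes the boundary cells, initialize them identically in both buffers first.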

Jacobi 5 Point Stencil Code not Vectorizing

2017 Oct 23

Hello, to me this is an issue in the LLVM loop vectorizer (if N is large enough to prevent complete unrolling of the j-loop). Would you mind sharing stencil.ll? Then I could say more definitively what the issue

LLVM Matrix Multiplication Loop Vectorizer

2017 Jun 27

Hello, I am trying to vectorize a simple matrix multiplication in LLVM. Here is my code: #include <stdio.h> #define N 1000 // This function multiplies A[][] and B[][], and stores // the result in C[][] void multiply(int A[][N], int B[][N], int C[][N]) { int i, j, k; for (i = 0; i < N; i++) { for (j = 0; j < N; j++) { C[i][j] = 0;
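
The snippet is cut off, but it appears to use the classic i-j-k order. In that order the innermost loop reduces over `k` while striding down a column of `B`. A commonly used alternative, not taken from the thread, is to interchange the `j` and `k` loops. Here is a minimal sketch using a hypothetical `multiply_ikj` with a runtime size `n`:

```c
#include <stddef.h>

/* Hypothetical: C = A * B for n x n int matrices in i-k-j loop order.
 * A[i][k] is loaded once per k and is invariant in the inner loop, which
 * then walks B[k][*] and C[i][*] with unit stride. */
void multiply_ikj(size_t n, int A[restrict n][n], int B[restrict n][n],
                  int C[restrict n][n])
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++)
            C[i][j] = 0;
        for (size_t k = 0; k < n; k++) {
            int aik = A[i][k];
            for (size_t j = 0; j < n; j++)
                C[i][j] += aik * B[k][j];
        }
    }
}
```

In this order the inner loop has contiguous loads and stores and a loop-invariant scalar operand. Loop vectorizers generally handle this access pattern better than a strided reduction.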

Jacobi 5 Point Stencil Code not Vectorizing

2017 Oct 24

Your problem is due to GVN partial redundancy elimination (PRE), which introduces a PHI node the current loop vectorizer cannot handle: opt -O3 stencil.ll -pass-remarks=loop-vectorize -pass-remarks-missed=loop-vectorize -pass-remarks-analysis=loop-vectorize remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop remark:

KNL Vectorization with larger vector width

2018 Jul 24

Thank you. Right now, to see the effect, I made the following changes: unsigned X86TTIImpl::getRegisterBitWidth(bool Vector) { if (Vector) { if (ST->hasAVX512()) return 65536; Here I changed 512 to 65536. Then in LoopVectorize.cpp I did the following: assert(MaxVectorSize <= 2048 && "Did not expect to pack so many elements" " into

KNL Vectorization with larger vector width

2018 Jul 24

Hello, I need help here. I am able to adjust the vector width through the WidestRegister value. When the number of iterations is 31 and I set the vector width to 32, it gives <16 x i32> and <8 x i32> instructions. However, if I replicate the same setup with 63 iterations and a vector width of 64, no vector instructions are emitted. It should behave as before and give <32 x i32> and

LLVM opt unable to vectorize PolyBench code

2018 Mar 14

Hello, I am unable to vectorize the following kernel with the opt tool: for (i = 0; i < _PB_NI; i++) for (j = 0; j < _PB_NJ; j++) { tmp[i][j] = 0; for (k = 0; k < _PB_NK; ++k) tmp[i][j] += alpha * A[i][k] * B[k][j]; } for (i = 0; i < _PB_NI; i++) for (j = 0; j < _PB_NL; j++) { D[i][j] *= beta; for (k = 0; k < _PB_NJ; ++k) D[i][j] +=

KNL Vectorization with larger vector width

2018 Jul 23

Thank you, I got it: it was a version issue. TTI.getRegisterBitWidth(true): how do I put my target machine's information into TTI? Please help. On Mon, Jul 23, 2018 at 11:33 PM, Friedman, Eli <efriedma at codeaurora.org> wrote: > On 7/23/2018 10:49 AM, hameeza ahmed via llvm-dev wrote: > > Thank You. > > But I cannot find your mentioned function

Vectorization width not correct using #pragma clang loop vectorize_width

2018 Sep 20

Hello, I am trying to set the vector width using #pragma clang loop vectorize_width(32), but I am getting width 8 for the following kernel: #define M 128 #define N 128 #define SQRT_FUN(x) sqrtf(x) int main(int argc, char** argv) { /* Variable declaration/allocation. */ double float_n = (double)N; double data[N*M]; double corr[M*M]; double mean[M]; double stddev[M]; uint32_t
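
For reference, here is a minimal sketch of how the pragma attaches to a loop. It uses a hypothetical `scale` function rather than the PolyBench kernel. The pragma applies only to the loop that immediately follows it. Passing `-Rpass=loop-vectorize` to Clang should report the width that was actually used. The `__clang__` guard is an assumption added so that other compilers, which do not understand `#pragma clang`, never see it.

```c
#include <stddef.h>

/* Hypothetical kernel: out[i] = in[i] * s.
 * Under Clang, the pragma requests a vectorization width of 32 lanes
 * for the loop directly below it. */
void scale(float *restrict out, const float *restrict in, float s, size_t n)
{
#if defined(__clang__)
#pragma clang loop vectorize_width(32)
#endif
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * s;
}
```

When diagnosing a width mismatch like the one reported, first confirm that the pragma sits directly on the loop you expect and not on an outer or different loop.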

LLVM opt unable to vectorize PolyBench code

2018 Mar 14

It would help if you sent the IR you're giving to opt or at least a complete C function and your clang command line. ~Craig On Wed, Mar 14, 2018 at 3:05 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote: > Hello, > > I m unable to vectorize following kernel by opt tool; > > for (i = 0; i < _PB_NI; i++) > for (j = 0; j < _PB_NJ; j++) > { >

AVX Scheduling and Parallelism

2017 Jun 25

Hi Ahmed, From what can be seen in the code snippet you provided, the reuse of XMM0 and XMM1 across loop-unroll instances does not inhibit instruction-level parallelism. Modern x86 processors use register renaming, which can eliminate the false (write-after-read and write-after-write) dependencies that such register reuse creates in the instruction stream. In the example you provided, the processor should be able to identify the 2-vloads + vadd + vstore sequences as

AVX Scheduling and Parallelism

2017 Jun 25

Hi, Zvi, I agree. In the context of targeting the KNL, however, I'm a bit concerned about the addressing, and specifically, the size of the resulting encoding: > vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ;load b[401280] in > zmm0 > > vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344] > ; zmm1<-zmm1+b[401344] The KNL can only
