Displaying 20 results from an estimated 5000 matches similar to: "unable to emit vectorized code in LLVM IR"
2017 Aug 17
3
unable to emit vectorized code in LLVM IR
I want to vectorize the user given inputs. when opt does vectorization user
supplied inputs (from a text file) will be added using AVX vector
instructions.
as you pointed; When i changed my code to following:
int main(int argc, char** argv) {
int a[1000], b[1000], c[1000];
int aa=atoi(argv[1]), bb=atoi(argv[2]);
for (int i=0; i<1000; i++) {
a[i]=aa, b[i]=bb;
c[i]=a[i] + b[i];
2017 Aug 17
4
unable to emit vectorized code in LLVM IR
i removed printf from loop. Now getting no error. but the IR doesnot
contain vectorized code. IR Output is as follows:
; ModuleID = 'sum-vec.ll'
source_filename = "sum-vec.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind readnone uwtable
define i32 @main(i32, i8**
2017 Aug 17
2
unable to emit vectorized code in LLVM IR
even if i make my code as follows: vectorized instructions not get emitted.
What to do?
int main(int argc, char** argv) {
int a[1000], b[1000], c[1000]; int g=0;
int aa=atoi(argv[1]), bb=atoi(argv[2]);
for (int i=0; i<1000; i++) {
a[i]=aa, b[i]=bb;
c[i]=a[i] + b[i];
g+=c[i];
}
printf("sum: %d\n", g);
return 0;
}
On Thu, Aug 17, 2017 at 10:03 PM, Craig Topper <craig.topper at
2017 Aug 17
4
unable to emit vectorized code in LLVM IR
I assume compiler knows that your only have 2 input values that you just
added together 1000 times.
Despite the fact that you stored to a[i] and b[i] here, nothing reads them
other than the addition in the same loop iteration. So the compiler easily
removed the a and b arrays. Same with 'c', it's not read outside the loop
so it doesn't need to exist. So the compiler turned your
2017 Aug 17
2
unable to emit vectorized code in LLVM IR
Ok. I have managed to vectorize the second loop in the following code. But
the first loop is still not vectorized? Why?
int main(int argc, char** argv) {
int a[1000], b[1000], c[1000]; int g=0;
int aa=atoi(argv[1]), bb=atoi(argv[2]);
for (int i=0; i<1000; i++) {
a[i]=aa+i, b[i]=bb+i;}
for (int i=0; i<1000; i++) {
c[i]=a[i] + b[i];
g+=c[i];
}
printf("sum: %d\n", g);
return 0;
2017 Aug 17
2
unable to emit vectorized code in LLVM IR
lli sum-vec03.ll 5 2 #0 0x0000000000c1f818 (lli+0xc1f818)
#1 0x0000000000c1d90e (lli+0xc1d90e)
#2 0x0000000000c1da5c (lli+0xc1da5c)
#3 0x00007f987c2c3d10 __restore_rt
(/lib/x86_64-linux-gnu/libpthread.so.0+0x10d10)
#4 0x00007f987c6f0038
#5 0x0000000000989f8c (lli+0x989f8c)
#6 0x00000000009383dc (lli+0x9383dc)
#7 0x000000000057eedd (lli+0x57eedd)
#8 0x00007f987b464a40 __libc_start_main
2017 Jul 01
3
Jacobi 5 Point Stencil Code not Vectorizing
Does it happen due to loop carried dependence? if yes what is the solution
to vectorize such codes?
please reply. i m waiting.
On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote:
> I even tried polly but still my llvm IR does not contain vector
> instructions. i used the following command;
>
> clang -S -emit-llvm stencil.c -march=knl -O3
2017 Jul 01
2
Jacobi 5 Point Stencil Code not Vectorizing
Hello,
I am trying to vectorize following stencil code;
#include <stdio.h>
#define N 100351
// This function computes 2D-5 point Jacobi stencil
void stencil(int a[restrict][N])
{
int i, j, k;
for (k = 0; k < 100; k++)
{ for (i = 1; i <= N-2; i++)
{ for (j = 1; j <= N-2; j++)
{ a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
2017 Jul 01
2
Jacobi 5 Point Stencil Code not Vectorizing
I am able to vectorize it with the following code;
#include <stdio.h>
#define N 100351
// This function computes 2D-5 point Jacobi stencil
void stencil(int a[][N], int b[][N])
{
int i, j, k;
for (k = 0; k < N; k++) {
for (i = 1; i <= N-2; i++)
for (j = 1; j <= N-2; j++)
b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
a[i][j+1]);
for
2017 Oct 23
3
Jacobi 5 Point Stencil Code not Vectorizing
<div> </div><div> </div><div>Hello,</div><div> </div><div>To me this is an issue in llvm loop vectorizer (if N is large enough to prevent complete unrolling of j-loop).</div><div> </div><div>Woud you mind to share stencil.ll than I would say more definitely what the issue
2017 Jun 27
3
LLVM Matrix Multiplication Loop Vectorizer
Hello,
i am trying to vectorize a simple matrix multiplication in llvm;
here is my code;
#include <stdio.h>
#define N 1000
// This function multiplies A[][] and B[][], and stores
// the result in C[][]
void multiply(int A[][N], int B[][N], int C[][N])
{
int i, j, k;
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
C[i][j] = 0;
2017 Oct 24
3
Jacobi 5 Point Stencil Code not Vectorizing
Your problem is due to GVN partial reduction elimination (PRE) which
introduces a PHI node the current loop vectorizer cannot handle:
opt -O3 stencil.ll -pass-remarks=loop-vectorize
-pass-remarks-missed=loop-vectorize
-pass-remarks-analysis=loop-vectorize
remark: <unknown>:0:0: loop not vectorized: value that could not be
identified as reduction is used outside the loop
remark:
2018 Jul 24
2
KNL Vectorization with larger vector width
Thank You.
Right now to see the effect i did following changes;
unsigned X86TTIImpl::getRegisterBitWidth(bool Vector) {
if (Vector) {
if (ST->hasAVX512())
return 65536;
here i changed 512 to 65536. Then in loopvectorize.cpp i did following;
assert(MaxVectorSize <= 2048 && "Did not expect to pack so many elements"
" into
2018 Jul 24
2
KNL Vectorization with larger vector width
Hello,
I need help here. I am able to adjust the vector width through
WidestRegister value. When number of iterations=31 and I set vector
width=32 it gives <16xi32> and <8xi32> instructions.
However if i replicate same behavior with number of iterations=63 and I
set vector width=64, no vector instructions are emitted. it should do as
previous and gives <32xi32> and
2018 Mar 14
2
LLVM opt unable to vectorize PolyBench code
Hello,
I m unable to vectorize following kernel by opt tool;
for (i = 0; i < _PB_NI; i++)
for (j = 0; j < _PB_NJ; j++)
{
tmp[i][j] = 0;
for (k = 0; k < _PB_NK; ++k)
tmp[i][j] += alpha * A[i][k] * B[k][j];
}
for (i = 0; i < _PB_NI; i++)
for (j = 0; j < _PB_NL; j++)
{
D[i][j] *= beta;
for (k = 0; k < _PB_NJ; ++k)
D[i][j] +=
2018 Jul 23
2
KNL Vectorization with larger vector width
Thank You. I got it. Version issue.
TTI.getRegisterBitWidth(true)
How to put my target machine info in TTI?
Please help.
On Mon, Jul 23, 2018 at 11:33 PM, Friedman, Eli <efriedma at codeaurora.org>
wrote:
> On 7/23/2018 10:49 AM, hameeza ahmed via llvm-dev wrote:
>
> Thank You.
>
> But I cannot find your mentioned function
2018 Sep 20
2
Vectorization width not correct using #pragma clang loop vectorize_width
Hello,
I m trying to set vector width using #pragma clang loop vectorize_width(32)
but i m getting width 8 for the following kernel;
#define M 128
#define N 128
#define SQRT_FUN(x) sqrtf(x)
int main(int argc, char** argv)
{
/* Variable declaration/allocation. */
double float_n = (double)N;
double data[N*M];
double corr[M*M];
double mean[M];
double stddev[M];
uint32_t
2018 Mar 14
0
LLVM opt unable to vectorize PolyBench code
It would help if you sent the IR you're giving to opt or at least a
complete C function and your clang command line.
~Craig
On Wed, Mar 14, 2018 at 3:05 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
> Hello,
>
> I m unable to vectorize following kernel by opt tool;
>
> for (i = 0; i < _PB_NI; i++)
> for (j = 0; j < _PB_NJ; j++)
> {
>
2017 Jun 25
2
AVX Scheduling and Parallelism
Hi Ahmed,
>From what can be seen in the code snippet you provided, the reuse of XMM0 and XMM1 across loop-unroll instances does not inhibit instruction-level parallelism.
Modern X86 processors use register renaming that can eliminate the dependencies in the instruction stream. In the example you provided, the processor should be able to identify the 2-vloads + vadd + vstore sequences as
2017 Jun 25
0
AVX Scheduling and Parallelism
Hi, Zvi,
I agree. In the context of targeting the KNL, however, I'm a bit
concerned about the addressing, and specifically, the size of the
resulting encoding:
> vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ;load b[401280] in
> zmm0
>
> vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344]
> ; zmm1<-zmm1+b[401344]
The KNL can only