thr3ads.net - similar to: "Regarding Compiler support"

2016 Dec 27

Regarding Compiler support

> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of hameeza ahmed via llvm-dev
> Sent: Saturday, December 24, 2016 2:17 PM
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] Regarding Compiler support
>
> Hello,
> I want to ask some basic questions;
>
> What is meant by compiler support? Like if we say compiler support is
> provided for

2017 Dec 30

Issues with omp simd

I even tried the following:

    int main(int argc, char **argv) {
        const int size = 1000000;
        float a[size], b[size], c[size];
        #pragma omp simd
        for (int i = 0; i < size; ++i) {
            a[i] = 2;
            b[i] = 3;
            c[i] = 4;
            c[i] = a[i] + b[i];
        }
        return 0;
    }

but the output with and without OpenMP simd is the same. Why is that so?

On Sun, Dec 31, 2017 at 12:01

2017 Dec 30

Issues with omp simd

I changed my code to the following:

    #pragma omp simd
    for (int i = 0; i < size; ++i) {
        a[i] = 2;
        b[i] = 3;
        c[i] = 4;
        c[i] = a[i] + b[i];
        printf("c value %f", c[i]);
    }

Still no effect of omp simd?

On Sun, Dec 31, 2017 at 12:26 AM, Craig Topper <craig.topper at gmail.com> wrote:
> The for loop has no effect on the observable behavior of

2017 Oct 24

Jacobi 5 Point Stencil Code not Vectorizing

Your problem is due to GVN partial redundancy elimination (PRE), which introduces a PHI node that the current loop vectorizer cannot handle:

    opt -O3 stencil.ll -pass-remarks=loop-vectorize -pass-remarks-missed=loop-vectorize -pass-remarks-analysis=loop-vectorize

    remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop
    remark:

2017 Oct 23

Jacobi 5 Point Stencil Code not Vectorizing

Hello,

To me this is an issue in the LLVM loop vectorizer (if N is large enough to prevent complete unrolling of the j-loop).

Would you mind sharing stencil.ll, so that I can say more definitely what the issue

2017 Jul 01

Jacobi 5 Point Stencil Code not Vectorizing

Does it happen due to a loop-carried dependence? If yes, what is the solution to vectorize such code? Please reply; I am waiting.

On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote:
> I even tried Polly, but still my LLVM IR does not contain vector
> instructions. I used the following command:
>
> clang -S -emit-llvm stencil.c -march=knl -O3

2017 Aug 07

VBROADCAST Implementation Issues

Thank you. I am still getting errors. I have modified my instructions as you said, as follows:

    def GATHER_256B : I<0x68, MRMSrcMem,
        (outs VR_2048:$dst, VK64WM:$mask_wb),
        (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
        "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}",
        [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32 (masked_gather

2017 Aug 07

VBROADCAST Implementation Issues

Hello, I did as you said. Please tell me whether the following is correct now:

    def GATHER_256B : I<0x68, MRMSrcMem,
        (outs VR_2048:$dst, _.KRCWM:$mask_wb),
        (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
        "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"),
        [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32 (GatherNode

2017 Aug 06

VBROADCAST Implementation Issues

I want to implement gather for v64i32. I wrote the following code:

    def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src),
        "GATHER_256B\t{$src, $dst|$dst, $src}",
        [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))],
        IIC_MOV_MEM>, TA;

    def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B

2017 Jul 01

Jacobi 5 Point Stencil Code not Vectorizing

I am able to vectorize it with the following code:

    #include <stdio.h>
    #define N 100351

    // This function computes 2D-5 point Jacobi stencil
    void stencil(int a[][N], int b[][N])
    {
        int i, j, k;
        for (k = 0; k < N; k++) {
            for (i = 1; i <= N-2; i++)
                for (j = 1; j <= N-2; j++)
                    b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);
            for

2017 Sep 05

Issues in Vector Add Instruction Machine Code Emission

I was getting the same error when I kept both EVEX/EVEX_4V and TA. So I restored my original instructions, and for that I had to include

    bool HasTA = TSFlags & X86II::TA;

in X86MCCodeEmitter.cpp, and then used this condition:

    if (HasTA) ++SrcRegNum;

in order to emit the binary correctly. Is that right?

On Tue, Sep 5, 2017 at 5:45 AM, Craig Topper <craig.topper at gmail.com> wrote:
>

2017 Sep 05

Issues in Vector Add Instruction Machine Code Emission

Thank you. I changed TA to EVEX or EVEX_4V, but now I am getting the following error:

    Invalid prefix!
    UNREACHABLE executed at /lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp:647!

On Tue, Sep 5, 2017 at 4:36 AM, Craig Topper <craig.topper at gmail.com> wrote:
> Not all instructions can use EVEX_4V. Move instructions in particular
> cannot because they don't have 2 sources.
>

2017 Jul 12

Using new types v32f32, v32f64 in llvm backend not possible

I would be very grateful if you could tell me whether there is some way to allocate registers in a different order, or from different register sets, to the same instruction based on the vector width or number of iterations. I have tried several alternatives but could not succeed. I have also asked this question many times, but no one responds. Is there something wrong with it? Kindly guide me.

Thank you

On

2017 Dec 30

Issues with omp simd

Hello, I am trying to optimize an omp simd loop as follows:

    int main(int argc, char **argv) {
        const int size = 1000000;
        float a[size], b[size], c[size];
        #pragma omp simd
        for (int i = 0; i < size; ++i) {
            c[i] = a[i] + b[i];
        }
        return 0;
    }

I run it using the following command:

    g++ -O0 --std=c++14 -fopenmp-simd lab.cpp -Iinclude -S -o lab.s

2017 Jul 07

Error in v64i32 type in x86 backend

Thank you.

On Fri, Jul 7, 2017 at 10:03 AM, Craig Topper <craig.topper at gmail.com> wrote:
> Yes, that error is from instruction selection. I think your legalization
> changes worked fine.
>
> ~Craig
>
> On Thu, Jul 6, 2017 at 8:21 PM, hameeza ahmed via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> also i further run the following command;

2018 Jul 24

KNL Vectorization with larger vector width

Hello, I need help here. I am able to adjust the vector width through the WidestRegister value. When the number of iterations is 31 and I set the vector width to 32, it gives <16xi32> and <8xi32> instructions. However, if I replicate the same behavior with 63 iterations and a vector width of 64, no vector instructions are emitted. It should behave as before and give <32xi32> and

2018 Dec 11

Automatic GPU Code Generation

Thank you. I am asking about generating PTX code directly, either automatically or through directives, without involving CUDA. That is, I want to avoid the source-to-source compiler approach, where C code is automatically converted into CUDA; instead, I mean converting C code directly to PTX assembly.

On Tue, Dec 11, 2018 at 12:19 PM Madhur Amilkanthwar <madhur13490 at gmail.com> wrote:

2017 Sep 04

Issues in Vector Add Instruction Machine Code Emission

Thank you. I used EVEX_4V with all the instructions; I replaced both TA and EVEX with EVEX_4V. Now I am getting the following error:

    llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier(): Assertion `numPhysicalOperands >= 2 + additionalOperands && numPhysicalOperands <= 4 + additionalOperands &&

2017 Aug 17

unable to emit vectorized code in LLVM IR

lli sum-vec03.ll 5 2
#0 0x0000000000c1f818 (lli+0xc1f818)
#1 0x0000000000c1d90e (lli+0xc1d90e)
#2 0x0000000000c1da5c (lli+0xc1da5c)
#3 0x00007f987c2c3d10 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10d10)
#4 0x00007f987c6f0038
#5 0x0000000000989f8c (lli+0x989f8c)
#6 0x00000000009383dc (lli+0x9383dc)
#7 0x000000000057eedd (lli+0x57eedd)
#8 0x00007f987b464a40 __libc_start_main

2017 Jul 14

error:Ran out of lanemask bits to represent subregister

Do your 32768 registers also have sub registers? I can't tell you exactly what to change. I'm not familiar with the code. I would just be running grep or something.

~Craig

On Fri, Jul 14, 2017 at 10:23 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
> Thank you so much. I think there is no issue with my definitions since i
> have to use larger registers i.e 65536 bit
