similar to: Regarding Compiler support

Displaying 20 results from an estimated 20000 matches similar to: "Regarding Compiler support"

2016 Dec 27
0
Regarding Compiler support
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of hameeza ahmed via llvm-dev > Sent: Saturday, December 24, 2016 2:17 PM > To: llvm-dev at lists.llvm.org > Subject: [llvm-dev] Regarding Compiler support > > Hello, > I want to ask some basic questions; > > What is meant by compiler support? Like if we say compiler support is > provided for
2017 Dec 30
3
Issues with omp simd
I even tried following; int main(int argc, char **argv) { const int size = 1000000; float a[size], b[size],c[size]; #pragma omp simd for (int i=0; i<size; ++i) { a[i]=2; b[i]=3; c[i]=4; c[i]= a[i] + b[i]; } return 0; } but the output with and without openmp simd is same. why is that so? On Sun, Dec 31, 2017 at 12:01
2017 Dec 30
1
Issues with omp simd
i changed my code to following; #pragma omp simd for (int i=0; i<size; ++i) { a[i]=2; b[i]=3; c[i]=4; c[i]= a[i] + b[i]; printf("c value %f",c[i]); } still no effect of omp simd? On Sun, Dec 31, 2017 at 12:26 AM, Craig Topper <craig.topper at gmail.com> wrote: > The for loop has no effect on the observable behavior of
2017 Oct 24
3
Jacobi 5 Point Stencil Code not Vectorizing
Your problem is due to GVN partial reduction elimination (PRE) which introduces a PHI node the current loop vectorizer cannot handle: opt -O3 stencil.ll -pass-remarks=loop-vectorize -pass-remarks-missed=loop-vectorize -pass-remarks-analysis=loop-vectorize remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop remark:
2017 Oct 23
3
Jacobi 5 Point Stencil Code not Vectorizing
<div> </div><div> </div><div>Hello,</div><div> </div><div>To me this is an issue in llvm loop vectorizer (if N is large enough to prevent complete unrolling of j-loop).</div><div> </div><div>Woud you mind to share stencil.ll than I would say more definitely what the issue
2017 Jul 01
3
Jacobi 5 Point Stencil Code not Vectorizing
Does it happen due to loop carried dependence? if yes what is the solution to vectorize such codes? please reply. i m waiting. On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote: > I even tried polly but still my llvm IR does not contain vector > instructions. i used the following command; > > clang -S -emit-llvm stencil.c -march=knl -O3
2017 Aug 07
3
VBROADCAST Implementation Issues
Thank You. Still getting errors.I have modified my instructions as you said as follows: def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb), (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2), "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}", [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32 (masked_gather
2017 Aug 07
2
VBROADCAST Implementation Issues
Hello, I did as you said, Please tell me whether the following correct now?? def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2), "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"), [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32 (GatherNode
2017 Aug 06
2
VBROADCAST Implementation Issues
i want to implement gather for v64i32. i wrote following code. def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src), "GATHER_256B\t{$src, $dst|$dst, $src}", [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))], IIC_MOV_MEM>, TA; def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B
2017 Jul 01
2
Jacobi 5 Point Stencil Code not Vectorizing
I am able to vectorize it with the following code; #include <stdio.h> #define N 100351 // This function computes 2D-5 point Jacobi stencil void stencil(int a[][N], int b[][N]) { int i, j, k; for (k = 0; k < N; k++) { for (i = 1; i <= N-2; i++) for (j = 1; j <= N-2; j++) b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]); for
2017 Sep 05
2
Issues in Vector Add Instruction Machine Code Emission
I was getting same error when i keep both EVEX/EVEX_4V and TA. So, i restored my original instructions and for that i have to include bool HasTA = TSFlags & X86II::TA; in x86MCCodeEmitter.cpp then used this condition; if(HasTA) ++SrcRegNum; in order to emit binary correctly. Is it right? On Tue, Sep 5, 2017 at 5:45 AM, Craig Topper <craig.topper at gmail.com> wrote: >
2017 Sep 05
2
Issues in Vector Add Instruction Machine Code Emission
Thank You, I changed TA to EVEX or EVEX_4V. But now i am getting following error: Invalid prefix! UNREACHABLE executed at /lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp:647! On Tue, Sep 5, 2017 at 4:36 AM, Craig Topper <craig.topper at gmail.com> wrote: > Not all instructions can use EVEX_4V. Move instructions in particular > cannot because they don't have 2 sources. >
2017 Jul 12
2
Using new types v32f32, v32f64 in llvm backend not possible
I would be very grateful if you specify whether there is some way to allocate registers (different order) / from different register sets to the same instruction based on the vector width/ no of iterations. I have tried several alternatives but could not succeed. Also I have asked this question many times but no one responds. Is there something wrong with this?? Kindly guide me. Thank You On
2017 Dec 30
2
Issues with omp simd
hello, i am trying to optimize omp simd loop as follows int main(int argc, char **argv) { const int size = 1000000; float a[size], b[size],c[size]; #pragma omp simd for (int i=0; i<size; ++i) { c[i]= a[i] + b[i]; } return 0; } i run it using the following command; g++ -O0 --std=c++14 -fopenmp-simd lab.cpp -Iinclude -S -o lab.s
2017 Jul 07
2
Error in v64i32 type in x86 backend
Thank You. On Fri, Jul 7, 2017 at 10:03 AM, Craig Topper <craig.topper at gmail.com> wrote: > Yes, that error is from instruction selection. I think your legalization > changes worked fine. > > ~Craig > > On Thu, Jul 6, 2017 at 8:21 PM, hameeza ahmed via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> also i further run the following command;
2018 Jul 24
2
KNL Vectorization with larger vector width
Hello, I need help here. I am able to adjust the vector width through WidestRegister value. When number of iterations=31 and I set vector width=32 it gives <16xi32> and <8xi32> instructions. However if i replicate same behavior with number of iterations=63 and I set vector width=64, no vector instructions are emitted. it should do as previous and gives <32xi32> and
2018 Dec 11
2
Automatic GPU Code Generation
Thank You.. I am asking to generate directly PTX code automatically or by directives without involvement of CUDA. This way, I am talking about avoiding source to source compiler approach where c code is converted automatically into CUDA, instead I am saying directly to convert C code to PTX assembly. On Tue, Dec 11, 2018 at 12:19 PM Madhur Amilkanthwar <madhur13490 at gmail.com> wrote:
2017 Sep 04
2
Issues in Vector Add Instruction Machine Code Emission
Thank You. I used EVEX_4V with all the instructions. I replaced TA and EVEX both with EVEX_4V. Now, I am getting following error: llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier(): Assertion `numPhysicalOperands >= 2 + additionalOperands && numPhysicalOperands <= 4 + additionalOperands &&
2017 Aug 17
2
unable to emit vectorized code in LLVM IR
lli sum-vec03.ll 5 2 #0 0x0000000000c1f818 (lli+0xc1f818) #1 0x0000000000c1d90e (lli+0xc1d90e) #2 0x0000000000c1da5c (lli+0xc1da5c) #3 0x00007f987c2c3d10 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10d10) #4 0x00007f987c6f0038 #5 0x0000000000989f8c (lli+0x989f8c) #6 0x00000000009383dc (lli+0x9383dc) #7 0x000000000057eedd (lli+0x57eedd) #8 0x00007f987b464a40 __libc_start_main
2017 Jul 14
3
error:Ran out of lanemask bits to represent subregister
Do your 32768 registers also have sub registers? I can't tell you exactly what to change. I'm not familiar with the code. I would just be running grep or something. ~Craig On Fri, Jul 14, 2017 at 10:23 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote: > Thank you so much. I think there is no issue with my definitions since i > have to use larger registers i.e 65536 bit