Displaying 20 results from an estimated 20000 matches similar to: "Regarding Compiler support"
2016 Dec 27
0
Regarding Compiler support
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of hameeza ahmed via llvm-dev
> Sent: Saturday, December 24, 2016 2:17 PM
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] Regarding Compiler support
>
> Hello,
> I want to ask some basic questions;
>
> What is meant by compiler support? Like if we say compiler support is
> provided for
2017 Dec 30
3
Issues with omp simd
I even tried following;
int main(int argc, char **argv)
{
const int size = 1000000;
float a[size], b[size],c[size];
#pragma omp simd
for (int i=0; i<size; ++i)
{
a[i]=2; b[i]=3; c[i]=4;
c[i]= a[i] + b[i];
}
return 0;
}
but the output with and without openmp simd is same. why is that so?
On Sun, Dec 31, 2017 at 12:01
2017 Dec 30
1
Issues with omp simd
i changed my code to following;
#pragma omp simd
for (int i=0; i<size; ++i)
{
a[i]=2; b[i]=3; c[i]=4;
c[i]= a[i] + b[i];
printf("c value %f",c[i]);
}
still no effect of omp simd?
On Sun, Dec 31, 2017 at 12:26 AM, Craig Topper <craig.topper at gmail.com>
wrote:
> The for loop has no effect on the observable behavior of
2017 Oct 24
3
Jacobi 5 Point Stencil Code not Vectorizing
Your problem is due to GVN partial reduction elimination (PRE) which
introduces a PHI node the current loop vectorizer cannot handle:
opt -O3 stencil.ll -pass-remarks=loop-vectorize
-pass-remarks-missed=loop-vectorize
-pass-remarks-analysis=loop-vectorize
remark: <unknown>:0:0: loop not vectorized: value that could not be
identified as reduction is used outside the loop
remark:
2017 Oct 23
3
Jacobi 5 Point Stencil Code not Vectorizing
<div> </div><div> </div><div>Hello,</div><div> </div><div>To me this is an issue in llvm loop vectorizer (if N is large enough to prevent complete unrolling of j-loop).</div><div> </div><div>Woud you mind to share stencil.ll than I would say more definitely what the issue
2017 Jul 01
3
Jacobi 5 Point Stencil Code not Vectorizing
Does it happen due to loop carried dependence? if yes what is the solution
to vectorize such codes?
please reply. i m waiting.
On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote:
> I even tried polly but still my llvm IR does not contain vector
> instructions. i used the following command;
>
> clang -S -emit-llvm stencil.c -march=knl -O3
2017 Aug 07
3
VBROADCAST Implementation Issues
Thank You. Still getting errors.I have modified my instructions as you said
as follows:
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb),
(ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
"GATHER_256B\t{$src2, {$dst} {${mask}}|${dst}
{${mask}}, $src2}",
[(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
(masked_gather
2017 Aug 07
2
VBROADCAST Implementation Issues
Hello,
I did as you said,
Please tell me whether the following correct now??
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb),
(VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
"GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}},
$src2}"),
[(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
(GatherNode
2017 Aug 06
2
VBROADCAST Implementation Issues
i want to implement gather for v64i32. i wrote following code.
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins
i2048mem:$src),
"GATHER_256B\t{$src, $dst|$dst, $src}",
[(set VR_2048:$dst, (v64i32 (masked_gather
addr:$src)))],
IIC_MOV_MEM>, TA;
def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B
2017 Jul 01
2
Jacobi 5 Point Stencil Code not Vectorizing
I am able to vectorize it with the following code;
#include <stdio.h>
#define N 100351
// This function computes 2D-5 point Jacobi stencil
void stencil(int a[][N], int b[][N])
{
int i, j, k;
for (k = 0; k < N; k++) {
for (i = 1; i <= N-2; i++)
for (j = 1; j <= N-2; j++)
b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
a[i][j+1]);
for
2017 Sep 05
2
Issues in Vector Add Instruction Machine Code Emission
I was getting same error when i keep both EVEX/EVEX_4V and TA. So, i
restored my original instructions and for that i have to include
bool HasTA = TSFlags & X86II::TA; in x86MCCodeEmitter.cpp
then used this condition;
if(HasTA)
++SrcRegNum;
in order to emit binary correctly.
Is it right?
On Tue, Sep 5, 2017 at 5:45 AM, Craig Topper <craig.topper at gmail.com> wrote:
>
2017 Sep 05
2
Issues in Vector Add Instruction Machine Code Emission
Thank You,
I changed TA to EVEX or EVEX_4V. But now i am getting following error:
Invalid prefix!
UNREACHABLE executed at
/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp:647!
On Tue, Sep 5, 2017 at 4:36 AM, Craig Topper <craig.topper at gmail.com> wrote:
> Not all instructions can use EVEX_4V. Move instructions in particular
> cannot because they don't have 2 sources.
>
2017 Jul 12
2
Using new types v32f32, v32f64 in llvm backend not possible
I would be very grateful if you specify whether there is some way to
allocate registers (different order) / from different register sets to the
same instruction based on the vector width/ no of iterations.
I have tried several alternatives but could not succeed.
Also I have asked this question many times but no one responds.
Is there something wrong with this??
Kindly guide me.
Thank You
On
2017 Dec 30
2
Issues with omp simd
hello,
i am trying to optimize omp simd loop as follows
int main(int argc, char **argv)
{
const int size = 1000000;
float a[size], b[size],c[size];
#pragma omp simd
for (int i=0; i<size; ++i)
{
c[i]= a[i] + b[i];
}
return 0;
}
i run it using the following command;
g++ -O0 --std=c++14 -fopenmp-simd lab.cpp -Iinclude -S -o lab.s
2017 Jul 07
2
Error in v64i32 type in x86 backend
Thank You.
On Fri, Jul 7, 2017 at 10:03 AM, Craig Topper <craig.topper at gmail.com>
wrote:
> Yes, that error is from instruction selection. I think your legalization
> changes worked fine.
>
> ~Craig
>
> On Thu, Jul 6, 2017 at 8:21 PM, hameeza ahmed via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> also i further run the following command;
2018 Jul 24
2
KNL Vectorization with larger vector width
Hello,
I need help here. I am able to adjust the vector width through
WidestRegister value. When number of iterations=31 and I set vector
width=32 it gives <16xi32> and <8xi32> instructions.
However if i replicate same behavior with number of iterations=63 and I
set vector width=64, no vector instructions are emitted. it should do as
previous and gives <32xi32> and
2018 Dec 11
2
Automatic GPU Code Generation
Thank You..
I am asking to generate directly PTX code automatically or by directives
without involvement of CUDA. This way, I am talking about avoiding source
to source compiler approach where c code is converted automatically into
CUDA, instead I am saying directly to convert C code to PTX assembly.
On Tue, Dec 11, 2018 at 12:19 PM Madhur Amilkanthwar <madhur13490 at gmail.com>
wrote:
2017 Sep 04
2
Issues in Vector Add Instruction Machine Code Emission
Thank You.
I used EVEX_4V with all the instructions. I replaced TA and EVEX both with
EVEX_4V. Now, I am getting following error:
llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
Assertion `numPhysicalOperands >= 2 + additionalOperands &&
numPhysicalOperands <= 4 + additionalOperands &&
2017 Aug 17
2
unable to emit vectorized code in LLVM IR
lli sum-vec03.ll 5 2 #0 0x0000000000c1f818 (lli+0xc1f818)
#1 0x0000000000c1d90e (lli+0xc1d90e)
#2 0x0000000000c1da5c (lli+0xc1da5c)
#3 0x00007f987c2c3d10 __restore_rt
(/lib/x86_64-linux-gnu/libpthread.so.0+0x10d10)
#4 0x00007f987c6f0038
#5 0x0000000000989f8c (lli+0x989f8c)
#6 0x00000000009383dc (lli+0x9383dc)
#7 0x000000000057eedd (lli+0x57eedd)
#8 0x00007f987b464a40 __libc_start_main
2017 Jul 14
3
error:Ran out of lanemask bits to represent subregister
Do your 32768 registers also have sub registers?
I can't tell you exactly what to change. I'm not familiar with the code. I
would just be running grep or something.
~Craig
On Fri, Jul 14, 2017 at 10:23 AM, hameeza ahmed <hahmed2305 at gmail.com>
wrote:
> Thank you so much. I think there is no issue with my definitions since i
> have to use larger registers i.e 65536 bit