Displaying 20 results from an estimated 9000 matches similar to: "Wrong Register use in directives"
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank You,
It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 =
[8,9,10,11,12,13,14,15] zmm22 will contain 64 bit constant values which are
indexes here zmm22=8, 9, 10, 11, 12,13,14,15. not the values loaded from
these locations. and zmm2 contains constant 4000. so,
vpmuludq zmm14, zmm10, zmm2 ; will multiply the indexes values with 4000,
as for array b the stride is 4000.
zmm14=
2013 Sep 06
0
[LLVMdev] CFI Directives
On 5 September 2013 19:27, Bill Wendling <wendling at apple.com> wrote:
> Hi Rafael,
>
> I've been staring at the CFI directives and have a question. Some background: I want to generate the compact unwind information using just the CFI directives. I *think* that this should be doable. The issue I'm facing right now is that I need to know how much the stack pointer was
2013 Sep 05
2
[LLVMdev] CFI Directives
Hi Rafael,
I've been staring at the CFI directives and have a question. Some background: I want to generate the compact unwind information using just the CFI directives. I *think* that this should be doable. The issue I'm facing right now is that I need to know how much the stack pointer was adjusted. So when I have something like this:
.cfi_startproc
Lfunc_begin175:
2017 Jun 21
2
AVX 512 Assembly Code Generation issues
when i generate code with 72 loop iterations.
the compiler generates code with using avx512 zmm operations 4 times
(16x4=64) and remaining 8 iterations are handled by routine mov operations
with EAX register. wouldn't it be better if it uses ymm for remaining 8
iterations as it does when iteration count is between 8 and 15. same for
xmm and so on.
please correct me if i am wrong.
Thank
2017 Oct 06
2
CFI directives for callee saved registers
Hello,
I've made changes to the prologue to not spill callee saved gprs to the
stack but rather spill them to unused vector registers. I'm not sure how
to handle this in the cfi directives. Originally, we would use cfi_offset
to give an offset of where it is saved on the stack. I tried to instead
use the cfi_restore directive. As the docs say ".cfi_restore says that the
rule
2017 Jul 10
2
Conditional Register Assignment based on the no of loop iterations
Here basically my problem is vector width since i have used v64i32 in my
backend. now if vector width=64. i want the Reg_B class registers to be
assigned and if vector width=2048 i want Reg_A registers to be assigned to
instruction.
Should i incorporate the solution in lowering stage? some thing like;
addRegisterClass(MVT::v2048i32, &X86::Reg_B);
2013 Sep 02
0
[LLVMdev] .globl
Hi Reed,
Still catching up on email, so hope this isn't already covered...
reed kotler <rkotler at mips.com> writes:
> I have a strange issue that I encountered with mips16 hard float.
>
> Part of mips16 hard float is to emit calls to runtime routines with the
> same signature as usual soft float routines, except that they are
> implemented using mips32 code which uses
2012 Mar 20
0
[LLVMdev] Runtime linker issue wtih X11R6 on i386 with -O3 optimization
I was told that my writeup lacked an example and details so I reproduced
the code that X uses and I was able to boil down the issue to a couple
of lines of code. Sorry again for the length of this email.
Code was compiled on OpenBSD with clang 3.0-release.
========================================================================
With -O0 which works as X expects:
2017 Sep 21
1
VSelect Instruction Error
Hello,
I am getting this error. What instruction is required to be implemented?
LLVM ERROR: Cannot select: t22: v32i32 = vselect t724, t11, t16
t724: v32i32,ch = load<LD128[FixedStack1]> t723, FrameIndex:i64<1>,
undef:i64
t659: i64 = FrameIndex<1>
t10: i64 = undef
t11: v32i32,ch = load<LD128[%sunkaddr45](align=4)(tbaa=<0x481f1e8>)> t0,
t8, undef:i64
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround?
/Patrik Hägglund
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker
Sent: den 28 mars 2012 03:18
To: llvmdev
Subject: [LLVMdev] Suboptimal code due to excessive spilling
Hi,
I have run into the following strange behavior
2019 Aug 08
2
Suboptimal code generated by clang+llc in quite a common scenario (?)
I found a something that I quite not understand when compiling a common piece of code using the -Os flags.
I found it while testing my own backend but then I got deeper and found that at least the x86 is affected as well. This is the referred code:
char pp[3];
char *scscx = pp;
int tst( char i, char j, char k )
{
scscx[0] = i;
scscx[1] = j;
scscx[2] = k;
return 0;
}
The above gets
2017 Aug 07
3
VBROADCAST Implementation Issues
Thank You. Still getting errors.I have modified my instructions as you said
as follows:
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb),
(ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
"GATHER_256B\t{$src2, {$dst} {${mask}}|${dst}
{${mask}}, $src2}",
[(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
(masked_gather
2017 Jul 07
2
Unhandled reg/opcode register encoding VR2048 Error in backend
Hello,
I m working towards backend.
Here i need to define vector load and stores for 64 i32 elements. so in
x86instrinfo.td i wrote;
def VMOV_256B_RM : I<0x6F, MRMSrcMem, (outs VR2048:$dst), (ins
i32mem:$src),
"vmov_256B_rm\t{$src, $dst|$dst, $src}",
[(set VR2048:$dst, (v64i32 (scalar_to_vector (loadi32
addr:$src))))],
2019 Apr 23
5
StringRef Iterator Variable Display
Hello,
I want to display the variable names in stringref iterator. But it is not
displayed using following code.
for (set<StringRef>::iterator sit = L.begin(); sit != L.end(); sit++) {
errs() << *sit << " ";
}
How to do this?
Please help..
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
2017 Aug 07
2
VBROADCAST Implementation Issues
Hello,
I did as you said,
Please tell me whether the following correct now??
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb),
(VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
"GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}},
$src2}"),
[(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
(GatherNode
2019 Feb 05
2
clang emits calls to consexpr function.
Hi Devs,
consider below testcase
$cat test.cpp
constexpr int product()
{
return 10*20;
}
int main()
{
const int x = product();
return 0;
}
$./clang test.cpp -std=c++11 -S
$./clang -v
clang version 9.0.0
Target: x86_64-unknown-linux-gnu
$cat test.s
main:
.cfi_startproc
# %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq
2017 Aug 06
2
VBROADCAST Implementation Issues
i want to implement gather for v64i32. i wrote following code.
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins
i2048mem:$src),
"GATHER_256B\t{$src, $dst|$dst, $src}",
[(set VR_2048:$dst, (v64i32 (masked_gather
addr:$src)))],
IIC_MOV_MEM>, TA;
def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
Hi,
I have run into the following strange behavior and wanted to ask for
some advice. For the C program below, function sum() gets inlined in
foo() but the code generated looks very suboptimal (the code is an
extract from a larger program).
Below I show the 32-bit x86 assembly as produced by the demo page on
the llvm home page ("Output A"). As you can see from the assembly,
after
2019 Aug 08
3
Suboptimal code generated by clang+llc in quite a common scenario (?)
This might not be the workaround you want because it is only available in
C, but you can use restrict to allow such optimizations.
https://godbolt.org/z/2gQ26f
Alex
On Thu, Aug 8, 2019 at 11:50 AM Michael Kruse via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi,
>
> char* scscx is an universal pointer and may point to anything,
> including itself. That is, scscx might
2017 Jul 01
2
Jacobi 5 Point Stencil Code not Vectorizing
I am able to vectorize it with the following code;
#include <stdio.h>
#define N 100351
// This function computes 2D-5 point Jacobi stencil
void stencil(int a[][N], int b[][N])
{
int i, j, k;
for (k = 0; k < N; k++) {
for (i = 1; i <= N-2; i++)
for (j = 1; j <= N-2; j++)
b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
a[i][j+1]);
for