thr3ads.net - similar to: "Wrong Register use in directives"

Displaying 20 results from an estimated 9000 matches similar to: "Wrong Register use in directives"

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

Thank you. So vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] means zmm22 will contain 64-bit constant values, which here are indexes (zmm22 = 8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from those locations. And zmm2 contains the constant 4000, so vpmuludq zmm14, zmm10, zmm2 will multiply the index values by 4000, since the stride for array b is 4000. zmm14=
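A scalar C++ model may help check the arithmetic described in this post. It is a sketch, not Intel's definition: it assumes the documented vpmuludq behavior (each 64-bit lane multiplies the unsigned low 32 bits of both sources into a full 64-bit product) and uses the index values the post gives for zmm22 and the stride of 4000 held in zmm2. The function names vpmuludq and row_offsets are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One zmm register viewed as 8 lanes of 64 bits.
using Zmm64 = std::array<std::uint64_t, 8>;

// Hypothetical scalar model of vpmuludq: per lane, multiply the low
// 32 bits of each source as unsigned values, keep the 64-bit product.
Zmm64 vpmuludq(const Zmm64& a, const Zmm64& b) {
    Zmm64 r{};
    for (std::size_t lane = 0; lane < r.size(); ++lane) {
        std::uint64_t lo_a = static_cast<std::uint32_t>(a[lane]);
        std::uint64_t lo_b = static_cast<std::uint32_t>(b[lane]);
        r[lane] = lo_a * lo_b;
    }
    return r;
}

// Scales the indexes 8..15 by the stride 4000, as the post describes.
Zmm64 row_offsets() {
    Zmm64 idx = {8, 9, 10, 11, 12, 13, 14, 15};  // values from zmm22
    Zmm64 stride;
    stride.fill(4000);                            // value from zmm2
    return vpmuludq(idx, stride);
}
```

With these inputs the lanes hold 32000, 36000, ..., 60000: scaled indexes, not loaded data.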

[LLVMdev] CFI Directives

2013 Sep 06

On 5 September 2013 19:27, Bill Wendling <wendling at apple.com> wrote: > Hi Rafael, > > I've been staring at the CFI directives and have a question. Some background: I want to generate the compact unwind information using just the CFI directives. I *think* that this should be doable. The issue I'm facing right now is that I need to know how much the stack pointer was

[LLVMdev] CFI Directives

2013 Sep 05

Hi Rafael, I've been staring at the CFI directives and have a question. Some background: I want to generate the compact unwind information using just the CFI directives. I *think* that this should be doable. The issue I'm facing right now is that I need to know how much the stack pointer was adjusted. So when I have something like this: .cfi_startproc Lfunc_begin175:

AVX 512 Assembly Code Generation issues

2017 Jun 21

When I generate code with 72 loop iterations, the compiler uses AVX-512 zmm operations 4 times (16x4 = 64), and the remaining 8 iterations are handled by scalar mov operations using the EAX register. Wouldn't it be better to use ymm for the remaining 8 iterations, as it does when the iteration count is between 8 and 15? The same goes for xmm and so on. Please correct me if I am wrong. Thank
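The schedule proposed in this post (widest vector first, then progressively narrower ones) can be written as a small C++ helper. This only illustrates the proposal, not what LLVM's loop vectorizer actually does; TailSplit and split_trip_count are hypothetical names, and 32-bit elements (16 per zmm, 8 per ymm, 4 per xmm) are assumed.

```cpp
#include <cassert>

// Hypothetical breakdown of a trip count of 32-bit elements.
struct TailSplit {
    int zmm;     // chunks of 16 elements
    int ymm;     // chunks of 8 elements
    int xmm;     // chunks of 4 elements
    int scalar;  // leftover single elements
};

// Greedily peel off the widest chunks first.
TailSplit split_trip_count(int n) {
    assert(n >= 0);
    TailSplit s{};
    s.zmm = n / 16;
    n %= 16;
    s.ymm = n / 8;
    n %= 8;
    s.xmm = n / 4;
    n %= 4;
    s.scalar = n;
    return s;
}
```

For 72 iterations this gives 4 zmm chunks and 1 ymm chunk with no scalar remainder, instead of the 8 scalar iterations the post observes.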

CFI directives for callee saved registers

2017 Oct 06

Hello, I've made changes to the prologue so that it no longer spills callee-saved GPRs to the stack but instead spills them to unused vector registers. I'm not sure how to handle this in the CFI directives. Originally, we would use cfi_offset to give the offset where each register is saved on the stack. I tried to use the cfi_restore directive instead. As the docs say ".cfi_restore says that the rule

Conditional Register Assignment based on the no of loop iterations

2017 Jul 10

Basically, my problem is the vector width, since I have used v64i32 in my backend. If the vector width is 64, I want registers from the Reg_B class to be assigned, and if the vector width is 2048, I want Reg_A registers to be assigned to the instruction. Should I incorporate the solution in the lowering stage? Something like: addRegisterClass(MVT::v2048i32, &X86::Reg_B);

[LLVMdev] .globl

2013 Sep 02

Hi Reed, Still catching up on email, so hope this isn't already covered... reed kotler <rkotler at mips.com> writes: > I have a strange issue that I encountered with mips16 hard float. > > Part of mips16 hard float is to emit calls to runtime routines with the > same signature as usual soft float routines, except that they are > implemented using mips32 code which uses

[LLVMdev] Runtime linker issue with X11R6 on i386 with -O3 optimization

2012 Mar 20

I was told that my write-up lacked an example and details, so I reproduced the code that X uses and was able to boil the issue down to a couple of lines of code. Sorry again for the length of this email. Code was compiled on OpenBSD with clang 3.0-release. ======================================================================== With -O0, which works as X expects:

VSelect Instruction Error

2017 Sep 21

Hello, I am getting this error. What instruction is required to be implemented? LLVM ERROR: Cannot select: t22: v32i32 = vselect t724, t11, t16 t724: v32i32,ch = load<LD128[FixedStack1]> t723, FrameIndex:i64<1>, undef:i64 t659: i64 = FrameIndex<1> t10: i64 = undef t11: v32i32,ch = load<LD128[%sunkaddr45](align=4)(tbaa=<0x481f1e8>)> t0, t8, undef:i64
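For reference, vselect is a per-lane select: each result lane comes from the second operand when the matching condition lane is true, and from the third operand otherwise. The scalar C++ sketch below models that for v32i32. It treats any nonzero condition lane as true, which is a simplification (how a target interprets boolean vector lanes depends on its boolean-contents setting), and vselect_ref is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V32i32 = std::array<std::int32_t, 32>;

// Hypothetical scalar reference for (v32i32 (vselect cond, t, f)).
V32i32 vselect_ref(const V32i32& cond, const V32i32& t, const V32i32& f) {
    V32i32 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = cond[i] ? t[i] : f[i];  // nonzero lane selects t
    return r;
}
```

Whatever instruction a backend selects for this node has to produce the same lane-by-lane result.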

[LLVMdev] Suboptimal code due to excessive spilling

2012 Apr 05

I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround? /Patrik Hägglund -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker Sent: 28 March 2012 03:18 To: llvmdev Subject: [LLVMdev] Suboptimal code due to excessive spilling Hi, I have run into the following strange behavior

Suboptimal code generated by clang+llc in quite a common scenario (?)

2019 Aug 08

I found something that I don't quite understand when compiling a common piece of code with the -Os flag. I found it while testing my own backend, but on digging deeper I found that at least x86 is affected as well. This is the code in question: char pp[3]; char *scscx = pp; int tst( char i, char j, char k ) { scscx[0] = i; scscx[1] = j; scscx[2] = k; return 0; } The above gets

VBROADCAST Implementation Issues

2017 Aug 07

Thank you. I am still getting errors. I have modified my instructions as you said, as follows: def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb), (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2), "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}", [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32 (masked_gather
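To pin down what a 64-lane masked gather has to compute, here is a scalar C++ reference sketch under stated assumptions: one mask bit per lane (as a 64-bit k-register such as the VK64WM operand would hold), active lanes load base[index[i]], and inactive lanes keep a pass-through value. Real AVX-512 gathers also clear the mask as elements complete, which the $mask_wb output in this definition appears to model; the sketch ignores that. masked_gather_ref and kLanes are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 64;
using V64i32 = std::array<std::int32_t, kLanes>;

// Hypothetical scalar reference for a v64i32 masked gather.
V64i32 masked_gather_ref(const std::int32_t* base,
                         const std::array<std::int64_t, kLanes>& index,
                         std::uint64_t mask,  // bit i enables lane i
                         const V64i32& passthru) {
    V64i32 r = passthru;                      // disabled lanes keep passthru
    for (std::size_t i = 0; i < kLanes; ++i)
        if ((mask >> i) & 1u)
            r[i] = base[index[i]];            // enabled lanes load memory
    return r;
}
```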

Unhandled reg/opcode register encoding VR2048 Error in backend

2017 Jul 07

Hello, I'm working on a backend. Here I need to define vector loads and stores for 64 i32 elements, so in x86instrinfo.td I wrote: def VMOV_256B_RM : I<0x6F, MRMSrcMem, (outs VR2048:$dst), (ins i32mem:$src), "vmov_256B_rm\t{$src, $dst|$dst, $src}", [(set VR2048:$dst, (v64i32 (scalar_to_vector (loadi32 addr:$src))))],
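Note that the pattern in this post matches scalar_to_vector of a single loadi32. In LLVM's SelectionDAG, scalar_to_vector places one scalar in element 0 and leaves the other elements undefined, so the pattern describes a one-element load rather than a 64-element one. If a full-width load was intended (an assumption about the poster's goal), the difference is roughly the following. Both function names are hypothetical, and the sketch zero-fills the undefined lanes only to stay deterministic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V64i32 = std::array<std::int32_t, 64>;

// Roughly what (v64i32 (scalar_to_vector (loadi32 addr))) computes:
// one i32 is read into lane 0. LLVM leaves the other lanes undefined;
// here they are zero only to make the result deterministic.
V64i32 scalar_to_vector_load(const std::int32_t* addr) {
    V64i32 r{};
    r[0] = *addr;
    return r;
}

// A full 64 x i32 load: all 64 consecutive elements are read.
V64i32 full_vector_load(const std::int32_t* addr) {
    V64i32 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = addr[i];
    return r;
}
```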

StringRef Iterator Variable Display

2019 Apr 23

Hello, I want to display the variable names stored in a set of StringRef values, but they are not displayed by the following code: for (set<StringRef>::iterator sit = L.begin(); sit != L.end(); sit++) { errs() << *sit << " "; } How can I do this? Please help.

VBROADCAST Implementation Issues

2017 Aug 07

Hello, I did as you said. Please tell me whether the following is correct now: def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2), "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"), [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32 (GatherNode

clang emits calls to constexpr function.

2019 Feb 05

Hi Devs, consider the test case below: $cat test.cpp constexpr int product() { return 10*20; } int main() { const int x = product(); return 0; } $./clang test.cpp -std=c++11 -S $./clang -v clang version 9.0.0 Target: x86_64-unknown-linux-gnu $cat test.s main: .cfi_startproc # %bb.0: pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset %rbp, -16 movq
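Whether clang should fold the call for a const variable at -O0 is what this thread asks. If compile-time evaluation is actually required, one option is to declare the variable constexpr instead of const. That obliges the compiler to evaluate the initializer as a constant expression, and a static_assert then confirms the value at compile time. use_constexpr is a hypothetical wrapper so that the sketch has no main of its own.

```cpp
#include <cassert>

constexpr int product() { return 10 * 20; }

int use_constexpr() {
    // constexpr (not just const): the initializer must be a constant
    // expression, so it is evaluated at compile time.
    constexpr int x = product();
    static_assert(x == 200, "product() evaluated at compile time");
    return x;
}
```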

VBROADCAST Implementation Issues

2017 Aug 06

I want to implement gather for v64i32. I wrote the following code: def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src), "GATHER_256B\t{$src, $dst|$dst, $src}", [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))], IIC_MOV_MEM>, TA; def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B

[LLVMdev] Suboptimal code due to excessive spilling

2012 Mar 28

Hi, I have run into the following strange behavior and wanted to ask for some advice. For the C program below, function sum() gets inlined in foo() but the code generated looks very suboptimal (the code is an extract from a larger program). Below I show the 32-bit x86 assembly as produced by the demo page on the llvm home page ("Output A"). As you can see from the assembly, after

Suboptimal code generated by clang+llc in quite a common scenario (?)

2019 Aug 08

This might not be the workaround you want because it is only available in C, but you can use restrict to allow such optimizations. https://godbolt.org/z/2gQ26f Alex On Thu, Aug 8, 2019 at 11:50 AM Michael Kruse via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Hi, > > char* scscx is an universal pointer and may point to anything, > including itself. That is, scscx might
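Standard C++ has no restrict, so here is a different technique that may help in C++ code. It is not the restrict approach suggested in this reply: read the global pointer once into a local. The quoted reply notes that scscx may point to anything, including itself, so each store through it may change scscx. A local copy whose address is never taken cannot be changed by those stores. tst_local is a hypothetical name, and the sketch assumes that, as in the post's code, scscx never points into itself.

```cpp
#include <cassert>

char pp[3];
char *scscx = pp;

// Same stores as tst() in the post, but the global pointer is read once.
// Stores through p cannot modify p itself, so no reload is needed.
int tst_local(char i, char j, char k) {
    char *p = scscx;
    p[0] = i;
    p[1] = j;
    p[2] = k;
    return 0;
}
```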

Jacobi 5 Point Stencil Code not Vectorizing

2017 Jul 01

I am able to vectorize it with the following code: #include <stdio.h> #define N 100351 // This function computes 2D-5 point Jacobi stencil void stencil(int a[][N], int b[][N]) { int i, j, k; for (k = 0; k < N; k++) { for (i = 1; i <= N-2; i++) for (j = 1; j <= N-2; j++) b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]); for
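For experimenting with the inner loop nest in isolation, here is a single sweep of the stencil from this post, written as a self-contained C++ function. It keeps the post's formula and loop bounds (i and j from 1 to N-2) but uses a small hypothetical grid size kN. With the post's N = 100351, each N x N int array needs about 40 GB. The name stencil_sweep is also hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical small grid size, so the arrays fit in memory.
constexpr std::size_t kN = 8;

// One sweep over the interior points, same formula as the post.
void stencil_sweep(const int a[][kN], int b[][kN]) {
    for (std::size_t i = 1; i + 1 < kN; ++i)      // i = 1 .. kN-2
        for (std::size_t j = 1; j + 1 < kN; ++j)  // j = 1 .. kN-2
            b[i][j] = static_cast<int>(0.25 * (a[i][j] + a[i - 1][j] + a[i + 1][j] +
                                               a[i][j - 1] + a[i][j + 1]));
}
```

The boundary rows and columns of b are never written, matching the loop bounds in the post.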
