search for: ymm4

Displaying 7 results from an estimated 7 matches for "ymm4".

Did you mean: xmm4
2013 Aug 28
3
[PATCH] x86: AVX instruction emulation fixes
...+ rc = x86_emulate(&ctxt, &emulops); + if ( (rc != X86EMUL_OKAY) || memcmp(res, res + 16, 64) ) + goto fail; + printf("okay\n"); + } + else + printf("skipped\n"); + + printf("%-40s", "Testing vmovdqu (%edx),%ymm4..."); + if ( stack_exec && cpu_has_avx ) + { + extern const unsigned char vmovdqu_from_mem[]; + +#if 0 /* Don''t use AVX2 instructions for now */ + asm volatile ( "vpcmpgtb %%ymm4, %%ymm4, %%ymm4\n" +#else + asm volatile ( "vpcmpgtb %%x...
2016 May 06
3
Unnecessary spill/fill issue
...the constant vectors immediately to stack, then each use references the stack pointer directly: Lots of these at top of function: movabsq $.LCPI0_212, %rbx vmovaps (%rbx), %ymm0 vmovaps %ymm0, 2816(%rsp) # 32-byte Spill Later on, each use references the stack pointer: vpaddd 2816(%rsp), %ymm4, %ymm1 # 32-byte Folded Reload It seems the spill to stack is unnecessary. In one particularly bad kernel, I have 128 8-wide constant vectors, and so there is 4KB of stack use just for these constants. I think a better approach could be to load the constant vector pointers as needed: movabsq $.LC...
2020 Sep 01
2
Vector evolution?
...1eb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 1f0: c5 fc 59 0c 87 vmulps (%rdi,%rax,4),%ymm0,%ymm1 1f5: c5 fc 59 54 87 20 vmulps 0x20(%rdi,%rax,4),%ymm0,%ymm2 1fb: c5 fc 59 5c 87 40 vmulps 0x40(%rdi,%rax,4),%ymm0,%ymm3 201: c5 fc 59 64 87 60 vmulps 0x60(%rdi,%rax,4),%ymm0,%ymm4 207: c5 fc 11 0c 87 vmovups %ymm1,(%rdi,%rax,4) 20c: c5 fc 11 54 87 20 vmovups %ymm2,0x20(%rdi,%rax,4) 212: c5 fc 11 5c 87 40 vmovups %ymm3,0x40(%rdi,%rax,4) 218: c5 fc 11 64 87 60 vmovups %ymm4,0x60(%rdi,%rax,4) 21e: c5 fc 59 8c 87 80 00 vmulps 0x80(%rdi,%rax,4),%ymm0,%ymm1 2...
2012 Jan 09
3
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote: > > On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote: > >> I'll explain what we see in the code. >> 1. The caller saves XMM registers across the call if needed (according to DEFS definition). >> YMMs are not in the set, so caller does not take care. > > This is not how the register allocator
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
...ithSubRegs<"ymm1", [XMM1, XMM1b]>, DwarfRegNum<[18, 22, 22]>; def YMM2: RegisterWithSubRegs<"ymm2", [XMM2, XMM2b]>, DwarfRegNum<[19, 23, 23]>; def YMM3: RegisterWithSubRegs<"ymm3", [XMM3, XMM3b]>, DwarfRegNum<[20, 24, 24]>; def YMM4: RegisterWithSubRegs<"ymm4", [XMM4, XMM4b]>, DwarfRegNum<[21, 25, 25]>; def YMM5: RegisterWithSubRegs<"ymm5", [XMM5, XMM5b]>, DwarfRegNum<[22, 26, 26]>; def YMM6: RegisterWithSubRegs<"ymm6", [XMM6, XMM6b]>, DwarfRegNum<[23, 27, 27]&...
2014 Feb 21
2
[LLVMdev] [lldb-dev] How is variable info retrieved in debugging for executables generated by llvm backend?
...size:256;offset:339;encoding:vector;format:vector-uint8;set:Floating > Point Registers;gcc:19;dwarf:19;#00 > $qRegisterInfo5e#dc > $name:ymm3;bitsize:256;offset:371;encoding:vector;format:vector-uint8;set:Floating > Point Registers;gcc:20;dwarf:20;#00 > $qRegisterInfo5f#dd > $name:ymm4;bitsize:256;offset:403;encoding:vector;format:vector-uint8;set:Floating > Point Registers;gcc:21;dwarf:21;#00 > $qRegisterInfo60#a8 > $name:ymm5;bitsize:256;offset:435;encoding:vector;format:vector-uint8;set:Floating > Point Registers;gcc:22;dwarf:22;#00 > $qRegisterInfo61#a9 > $n...
2014 Feb 20
2
[LLVMdev] [lldb-dev] How is variable info retrieved in debugging for executables generated by llvm backend?
Thank you, Clayton. This is very helpful. We use the LLDB specific GDB remote extensions, and our debugger server supports "qRegisterInfo" package. "reg 0x3c" is the frame pointer. In the example mentioned above, we have SP = FP - 40 for current call frame. And variable "a" is stored at address (FP + -24) from asm instruction [FP + -24] = R3;; Thus we can conclude