thr3ads.net - search: "ymm3"

[LLVMdev] Calling conventions for YMM registers on AVX

2012 Jan 09

3

On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote:
>
> On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote:
>
>> I'll explain what we see in the code.
>> 1. The caller saves XMM registers across the call if needed (according to DEFS definition).
>> YMMs are not in the set, so caller does not take care.
>
> This is not how the register allocator

[LLVMdev] Calling conventions for YMM registers on AVX

2012 Jan 10

0

...test:                                # @test
# BB#0:                                 # %entry
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $64, %rsp
    vmovaps %xmm7, -32(%rbp)            # 16-byte Spill
    vmovaps %xmm6, -16(%rbp)            # 16-byte Spill
    vmovaps %ymm3, %ymm6
    vmovaps %ymm2, %ymm7
    vaddps  %ymm7, %ymm0, %ymm0
    vaddps  %ymm6, %ymm1, %ymm1
    callq   foo
    vsubps  %ymm7, %ymm0, %ymm0
    vsubps  %ymm6, %ymm1, %ymm1
    vmovaps -16(%rbp), %xmm6            # 16-byte Reload
    vmovaps -32(%rbp), %xmm7            #...

Vector evolution?

2020 Sep 01

2

...0x0(%rip),%ymm0        # 1eb <_Z4fct7Pf+0xb>
 1e9: 00 00
 1eb: 0f 1f 44 00 00       nopl    0x0(%rax,%rax,1)
 1f0: c5 fc 59 0c 87       vmulps  (%rdi,%rax,4),%ymm0,%ymm1
 1f5: c5 fc 59 54 87 20    vmulps  0x20(%rdi,%rax,4),%ymm0,%ymm2
 1fb: c5 fc 59 5c 87 40    vmulps  0x40(%rdi,%rax,4),%ymm0,%ymm3
 201: c5 fc 59 64 87 60    vmulps  0x60(%rdi,%rax,4),%ymm0,%ymm4
 207: c5 fc 11 0c 87       vmovups %ymm1,(%rdi,%rax,4)
 20c: c5 fc 11 54 87 20    vmovups %ymm2,0x20(%rdi,%rax,4)
 212: c5 fc 11 5c 87 40    vmovups %ymm3,0x40(%rdi,%rax,4)
 218: c5 fc 11 64 87 60    vmovups %ymm4,0x60(%rdi,%rax,4)
 2...
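The disassembly above multiplies four consecutive groups of eight floats by a value held in ymm0 and stores the products back to the same addresses. The mangled symbol `_Z4fct7Pf` demangles to `fct7(float*)`. A minimal C sketch of a loop that a vectorizer could plausibly compile into this shape is given here; the name `scale_in_place`, the `factor` parameter, and the element count are assumptions, not the poster's source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the poster's fct7(float*):
 * multiply n floats by a constant and store them back in place. */
void scale_in_place(float *data, size_t n, float factor)
{
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        data[i] *= factor;   /* one multiply and one store per element */
}
```

Compiling something like this with `clang -O3 -mavx -S` is one way to inspect the generated assembly. Whether a given compiler version unrolls the loop four times as in the listing is not guaranteed.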

[LLVMdev] AVX code gen

2013 Dec 11

2

...the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I cannot get clang/llvm to generate such instructions (using the 3.3 release or either 3.4 rc1 or 3.4 rc2). I am new to clang / llvm so I may not be invoking the tools correctly but given that -fvectorize and -fslp-vectorize are on by default at 3.4 I would have thought th...

unable to emit vectorized code in LLVM IR

2017 Aug 17

4

I assume the compiler knows that you only have 2 input values that you just added together 1000 times. Despite the fact that you stored to a[i] and b[i] here, nothing reads them other than the addition in the same loop iteration. So the compiler easily removed the a and b arrays. Same with 'c', it's not read outside the loop so it doesn't need to exist. So the compiler turned your
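The effect described in this reply can be illustrated with a small C sketch. The function names, the array size, and the parameters are assumptions made for illustration and are not the original poster's code. In `add_dead` nothing escapes the function, so an optimizer is free to delete the arrays and the loop. In `add_observable` the caller can read `out`, so the additions have to be performed and the loop stays available to the vectorizer.

```c
#include <assert.h>

enum { N = 1000 };

/* Dead version: a, b and c never leave this function, so an optimizing
 * compiler may remove the arrays and the whole loop. */
void add_dead(float x, float y)
{
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) {
        a[i] = x;
        b[i] = y;
        c[i] = a[i] + b[i];   /* result is never read afterwards */
    }
}

/* Observable version: the caller can read out[], so every addition
 * must actually be computed. */
void add_observable(const float *a, const float *b, float *out, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Comparing the optimized output for the two functions (for example with `-O2 -S`) is a quick way to check whether a loop was removed before the vectorizer could act on it.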

[LLVMdev] AVX code gen

2013 Dec 12

0

...the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I cannot get clang/llvm to generate such instructions (using the 3.3 release or either 3.4 rc1 or 3.4 rc2). I am new to clang / llvm so I may not be invoking the tools correctly but given that -fvectorize and -fslp-vectorize are on by default at 3.4 I would have thought th...

RFC: Adding Support For Vectorcall Calling Convention

2016 Nov 30

2

...pes are allocated, in ascending order, to unused vector registers XMM0/YMM0/ZMM0 to XMM5/YMM5/ZMM5.
- Just like in the default x64 CC, shadow space is allocated for vector/HVA types. The size is fixed to 8 bytes per argument.
- HVA types are returned in XMM0/YMM0/ZMM0 to XMM3/YMM3/ZMM3, while vector types are returned in XMM0/YMM0/ZMM0 and integers in RAX.

For more information or examples please see also: https://msdn.microsoft.com/en-us/library/dn375768.aspx

Observations
------------------
- LLVM IR must preserve the original position of the arguments.
- Since HVA...
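The "ascending order, to unused vector registers" step can be sketched as a search for the lowest free register among the six named in the snippet. This models only that single step: the bitmask representation, the name `claim_lowest_unused`, and the `-1` return for an exhausted register file are assumptions, and the position rules and multi-element HVA handling of the real convention are not modelled.

```c
#include <assert.h>

enum { VECTOR_ARG_REGS = 6 };   /* XMM0..XMM5, per the snippet above */

/* Claim the lowest-numbered unused vector register.
 * Bit i of *used_mask is set when register i is already taken.
 * Returns the register index, or -1 if all six are in use
 * (what happens then is outside this sketch). */
int claim_lowest_unused(unsigned *used_mask)
{
    assert(used_mask != 0);
    for (int reg = 0; reg < VECTOR_ARG_REGS; ++reg) {
        if ((*used_mask & (1u << reg)) == 0) {
            *used_mask |= 1u << reg;
            return reg;
        }
    }
    return -1;
}
```

For example, with registers 0 and 2 already taken (mask `0x5`), successive calls hand out registers 1, 3, 4 and 5, and then report exhaustion.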

[LLVMdev] [lldb-dev] How is variable info retrieved in debugging for executables generated by llvm backend?

2014 Feb 21

2

...size:256;offset:307;encoding:vector;format:vector-uint8;set:Floating Point Registers;gcc:18;dwarf:18;#00
> $qRegisterInfo5d#db
> $name:ymm2;bitsize:256;offset:339;encoding:vector;format:vector-uint8;set:Floating Point Registers;gcc:19;dwarf:19;#00
> $qRegisterInfo5e#dc
> $name:ymm3;bitsize:256;offset:371;encoding:vector;format:vector-uint8;set:Floating Point Registers;gcc:20;dwarf:20;#00
> $qRegisterInfo5f#dd
> $name:ymm4;bitsize:256;offset:403;encoding:vector;format:vector-uint8;set:Floating Point Registers;gcc:21;dwarf:21;#00
> $qRegisterInfo60#a8
> $n...
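The two hex digits after `#` in the request packets above are the GDB remote protocol checksum: the sum of the payload bytes between `$` and `#`, modulo 256. A minimal sketch that reproduces the values in the log is given here; the function name `rsp_checksum` is an assumption, and the `#00` on the reply packets is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of all payload bytes (the text between '$' and '#'), modulo 256. */
unsigned char rsp_checksum(const char *payload)
{
    assert(payload != NULL);
    unsigned sum = 0;
    for (size_t i = 0; payload[i] != '\0'; ++i)
        sum += (unsigned char)payload[i];
    return (unsigned char)(sum & 0xffu);
}
```

For the payload `qRegisterInfo5d` this yields `0xdb`, matching `$qRegisterInfo5d#db` in the log. The payloads ending in `5e`, `5f` and `60` give `0xdc`, `0xdd` and `0xa8` respectively.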

[LLVMdev] [lldb-dev] How is variable info retrieved in debugging for executables generated by llvm backend?

2014 Feb 20

2

Thank you, Clayton. This is very helpful. We use the LLDB-specific GDB remote extensions, and our debugger server supports the "qRegisterInfo" packet. "reg 0x3c" is the frame pointer. In the example mentioned above, we have SP = FP - 40 for the current call frame. And variable "a" is stored at address (FP + -24) from asm instruction [FP + -24] = R3;; Thus we can conclude