Displaying 9 results from an estimated 9 matches for "ymm3".
2012 Jan 09
3
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote:
>
> On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote:
>
>> I'll explain what we see in the code.
>> 1. The caller saves XMM registers across the call if needed (according to DEFS definition).
>> YMMs are not in the set, so the caller does not save them.
>
> This is not how the register allocator
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
...test: # @test
# BB#0: # %entry
pushq %rbp
movq %rsp, %rbp
subq $64, %rsp
vmovaps %xmm7, -32(%rbp) # 16-byte Spill
vmovaps %xmm6, -16(%rbp) # 16-byte Spill
vmovaps %ymm3, %ymm6
vmovaps %ymm2, %ymm7
vaddps %ymm7, %ymm0, %ymm0
vaddps %ymm6, %ymm1, %ymm1
callq foo
vsubps %ymm7, %ymm0, %ymm0
vsubps %ymm6, %ymm1, %ymm1
vmovaps -16(%rbp), %xmm6 # 16-byte Reload
vmovaps -32(%rbp), %xmm7 #...
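The listing above spills only the 16-byte %xmm6 and %xmm7, yet keeps the full %ymm6 and %ymm7 live across `callq foo`. A minimal sketch of why that matters, assuming a callee that is only required to preserve the low 128 bits of these registers and restores them with a VEX-encoded 16-byte reload (which zeroes bits 128..255 of the destination). The function names and bit patterns are hypothetical and serve only as illustration.

```python
MASK128 = (1 << 128) - 1

def low128(ymm):
    """Low half of a 256-bit register value (the XMM view)."""
    return ymm & MASK128

def high128(ymm):
    """Upper half of a 256-bit register value."""
    return ymm >> 128

def callee_saving_xmm_only(ymm):
    """Model a callee that spills and reloads only the XMM part.

    Assumption: the reload is a VEX-encoded 128-bit move, so the
    upper 128 bits of the register come back as zero.
    """
    saved_xmm = low128(ymm)   # 16-byte spill in the callee prologue
    return saved_xmm          # 16-byte reload: upper half is zeroed

before = (0xAAAA << 128) | 0x5555   # hypothetical YMM contents
after = callee_saving_xmm_only(before)
print(hex(low128(after)), hex(high128(after)))
```

In this model the low half survives the call and the upper half does not, which is the hazard when a 256-bit value is kept in a register whose preserved part is only 128 bits wide.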
2020 Sep 01
2
Vector evolution?
...0x0(%rip),%ymm0 # 1eb
<_Z4fct7Pf+0xb>
1e9: 00 00
1eb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
1f0: c5 fc 59 0c 87 vmulps (%rdi,%rax,4),%ymm0,%ymm1
1f5: c5 fc 59 54 87 20 vmulps 0x20(%rdi,%rax,4),%ymm0,%ymm2
1fb: c5 fc 59 5c 87 40 vmulps 0x40(%rdi,%rax,4),%ymm0,%ymm3
201: c5 fc 59 64 87 60 vmulps 0x60(%rdi,%rax,4),%ymm0,%ymm4
207: c5 fc 11 0c 87 vmovups %ymm1,(%rdi,%rax,4)
20c: c5 fc 11 54 87 20 vmovups %ymm2,0x20(%rdi,%rax,4)
212: c5 fc 11 5c 87 40 vmovups %ymm3,0x40(%rdi,%rax,4)
218: c5 fc 11 64 87 60 vmovups %ymm4,0x60(%rdi,%rax,4)
2...
2013 Dec 11
2
[LLVMdev] AVX code gen
...the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I cannot get clang/llvm to generate such instructions (using the 3.3 release or either 3.4 rc1 or 3.4 rc2). I am new to clang / llvm so I may not be invoking the tools correctly but given that -fvectorize and -fslp-vectorize are on by default at 3.4 I would have thought th...
2017 Aug 17
4
unable to emit vectorized code in LLVM IR
I assume the compiler knows that you only have 2 input values that you just
added together 1000 times.
Despite the fact that you stored to a[i] and b[i] here, nothing reads them
other than the addition in the same loop iteration. So the compiler easily
removed the a and b arrays. Same with 'c', it's not read outside the loop
so it doesn't need to exist. So the compiler turned your
2013 Dec 12
0
[LLVMdev] AVX code gen
...the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I cannot get clang/llvm to generate such instructions (using the 3.3 release or either 3.4 rc1 or 3.4 rc2). I am new to clang / llvm so I may not be invoking the tools correctly but given that -fvectorize and -fslp-vectorize are on by default at 3.4 I would have thought th...
2016 Nov 30
2
RFC: Adding Support For Vectorcall Calling Convention
...pes are
allocated, in ascending order, to unused vector registers
XMM0/YMM0/ZMM0 to XMM5/YMM5/ZMM5.
- Just like in the default x64 CC, shadow space is allocated for
vector/HVA types. The size is fixed to 8 bytes per argument.
- HVA types are returned in XMM0/YMM0/ZMM0 to XMM3/YMM3/ZMM3 while
vector types are returned in XMM0/YMM0/ZMM0 and integers in RAX
For more information or examples please see also:
https://msdn.microsoft.com/en-us/library/dn375768.aspx
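The allocation rule quoted above (ascending order, unused vector registers, XMM0 through XMM5) can be sketched as follows. This is a rough model, not LLVM's implementation: the function name is hypothetical, only the XMM names are used for brevity, and the handling of an HVA that does not fit (recorded here as `None`) is an assumption, since the excerpt is truncated before it says what happens in that case.

```python
# XMM0..XMM5, the range named in the RFC excerpt
VECTOR_REGS = [f"XMM{i}" for i in range(6)]

def assign_hva_registers(used, hva_sizes):
    """Give each HVA the lowest-numbered unused vector registers.

    used      -- registers already taken by earlier arguments
    hva_sizes -- number of vector elements in each HVA, in argument order
    Returns one list of registers per HVA, or None when it does not fit
    (assumption: such an argument is passed some other way).
    """
    used = set(used)
    assignments = []
    for size in hva_sizes:
        free = [reg for reg in VECTOR_REGS if reg not in used]
        if size <= len(free):
            regs = free[:size]
            used.update(regs)
            assignments.append(regs)
        else:
            assignments.append(None)
    return assignments

print(assign_hva_registers({"XMM0", "XMM2"}, [2, 4, 1]))
```

With XMM0 and XMM2 already taken, the two-element HVA receives XMM1 and XMM3, the four-element HVA does not fit in the two registers left, and the one-element HVA receives XMM4.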
Observations
------------------
- LLVM IR must preserve the original position of the arguments.
- Since HVA...
2014 Feb 21
2
[LLVMdev] [lldb-dev] How is variable info retrieved in debugging for executables generated by llvm backend?
...size:256;offset:307;encoding:vector;format:vector-uint8;set:Floating
> Point Registers;gcc:18;dwarf:18;#00
> $qRegisterInfo5d#db
> $name:ymm2;bitsize:256;offset:339;encoding:vector;format:vector-uint8;set:Floating
> Point Registers;gcc:19;dwarf:19;#00
> $qRegisterInfo5e#dc
> $name:ymm3;bitsize:256;offset:371;encoding:vector;format:vector-uint8;set:Floating
> Point Registers;gcc:20;dwarf:20;#00
> $qRegisterInfo5f#dd
> $name:ymm4;bitsize:256;offset:403;encoding:vector;format:vector-uint8;set:Floating
> Point Registers;gcc:21;dwarf:21;#00
> $qRegisterInfo60#a8
> $n...
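Each qRegisterInfo reply quoted above is a list of `key:value;` pairs. A minimal sketch of reading one into a dictionary, assuming the `$` prefix and `#xx` checksum framing have already been stripped and the quoted line wrap has been rejoined; `parse_register_info` is a hypothetical helper, not part of LLDB.

```python
def parse_register_info(payload):
    """Split 'key:value;key:value;' into a dict.

    Values made only of digits are converted to int (treated as decimal).
    """
    info = {}
    for field in payload.strip().split(";"):
        if not field:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = field.partition(":")
        info[key] = int(value) if value.isdigit() else value
    return info

# The ymm3 reply from the log above, with framing removed
sample = ("name:ymm3;bitsize:256;offset:371;encoding:vector;"
          "format:vector-uint8;set:Floating Point Registers;gcc:20;dwarf:20;")
reg = parse_register_info(sample)
print(reg["name"], reg["bitsize"] // 8, reg["offset"])
```

The 256-bit size corresponds to 32 bytes, which matches the spacing of the offsets in the log (339 for ymm2, 371 for ymm3, 403 for ymm4).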
2014 Feb 20
2
[LLVMdev] [lldb-dev] How is variable info retrieved in debugging for executables generated by llvm backend?
Thank you, Clayton. This is very helpful.
We use the LLDB-specific GDB remote extensions, and our debugger server
supports the "qRegisterInfo" packet. "reg 0x3c" is the frame pointer.
In the example mentioned above, we have SP = FP - 40 for current call frame.
And variable "a" is stored at address (FP + -24) from asm instruction [FP +
-24] = R3;;
Thus we can conclude