Displaying 17 results from an estimated 17 matches for "vaddp".
2014 Jun 25 (2): [LLVMdev] problem with X86's AVX assembler?
Hi,
I am trying to assemble the instruction below with the latest LLVM code, but it fails.
Am I doing something wrong, or is this a bug?
$ echo "vaddps zmm7 {k6}, zmm2, zmm4,
{rd-sae}"|./Release+Asserts/bin/llvm-mc -assemble -triple=x86_64 -mcpu=knl
-show-encoding -x86-asm-syntax=intel
.text
<stdin>:1:31: error: unknown token in expression
vaddps zmm7 {k6}, zmm2, zmm4, {rd-sae}
^
<stdin>:1:31: er...
2014 Jun 26 (2): [LLVMdev] problem with X86's AVX assembler?
...Jun,
>
> On Jun 25, 2014, at 8:14 AM, Jun Koi <junkoi2004 at gmail.com> wrote:
>
> > Hi,
> >
> > I am trying to assemble the instruction below with the latest LLVM code, but
> it fails. Am I doing something wrong, or is this a bug?
> >
> >
> > $ echo "vaddps zmm7 {k6}, zmm2, zmm4,
> {rd-sae}"|./Release+Asserts/bin/llvm-mc -assemble -triple=x86_64 -mcpu=knl
> -show-encoding -x86-asm-syntax=intel
> > .text
> > <stdin>:1:31: error: unknown token in expression
> > vaddps zmm7 {k6}, zmm2, zmm4, {rd-sae}
> >...
2014 Jun 26 (2): [LLVMdev] problem with X86's AVX assembler?
...at 8:14 AM, Jun Koi <junkoi2004 at gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I am trying to assemble the instruction below with the latest LLVM code, but
>> it fails. Am I doing something wrong, or is this a bug?
>> >
>> >
>> > $ echo "vaddps zmm7 {k6}, zmm2, zmm4,
>> {rd-sae}"|./Release+Asserts/bin/llvm-mc -assemble -triple=x86_64 -mcpu=knl
>> -show-encoding -x86-asm-syntax=intel
>> > .text
>> > <stdin>:1:31: error: unknown token in expression
>> > vaddps zmm7 {k6}, zmm2, zmm4, {rd...
2019 Sep 02 (3): AVX2 codegen - question reg. FMA generation
...later types) turning it into AVX2 FMA instructions. Here's the snippet from
the output it generates:
$ llc -O3 -mcpu=skylake
---------------------
.LBB0_2: # =>This Inner Loop Header: Depth=1
vbroadcastss (%rsi,%rdx,4), %ymm0
vmulps (%rdi,%rcx), %ymm0, %ymm0
vaddps (%rax,%rcx), %ymm0, %ymm0
vmovups %ymm0, (%rax,%rcx)
incq %rdx
addq $32, %rcx
cmpq $15, %rdx
jle .LBB0_2
-----------------------
$ llc --version
LLVM (http://llvm.org/):
LLVM version 8.0.0
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: skylake
(llvm commit 198009ae8db...
2012 May 24 (4): [LLVMdev] use AVX automatically if present
...# @_fun1
.cfi_startproc
# BB#0: # %_L1
pushq %rbp
.Ltmp2:
.cfi_def_cfa_offset 16
.Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
.Ltmp4:
.cfi_def_cfa_register %rbp
vmovaps (%rdi), %ymm0
vaddps (%rsi), %ymm0, %ymm0
vmovaps %ymm0, (%rdi)
popq %rbp
vzeroupper
ret
.Ltmp5:
.size _fun1, .Ltmp5-_fun1
.cfi_endproc
.section ".note.GNU-stack","",@progbits
I guess your answer is that I did not...
2012 May 24 (0): [LLVMdev] use AVX automatically if present
...gt; # BB#0: # %_L1
> pushq %rbp
> .Ltmp2:
> .cfi_def_cfa_offset 16
> .Ltmp3:
> .cfi_offset %rbp, -16
> movq %rsp, %rbp
> .Ltmp4:
> .cfi_def_cfa_register %rbp
> vmovaps (%rdi), %ymm0
> vaddps (%rsi), %ymm0, %ymm0
> vmovaps %ymm0, (%rdi)
> popq %rbp
> vzeroupper
> ret
> .Ltmp5:
> .size _fun1, .Ltmp5-_fun1
> .cfi_endproc
>
>
> .section ".note.GNU-stack","",@progbits
>
>...
2012 Jan 10 (0): [LLVMdev] Calling conventions for YMM registers on AVX
...BB#0: # %entry
pushq %rbp
movq %rsp, %rbp
subq $64, %rsp
vmovaps %xmm7, -32(%rbp) # 16-byte Spill
vmovaps %xmm6, -16(%rbp) # 16-byte Spill
vmovaps %ymm3, %ymm6
vmovaps %ymm2, %ymm7
vaddps %ymm7, %ymm0, %ymm0
vaddps %ymm6, %ymm1, %ymm1
callq foo
vsubps %ymm7, %ymm0, %ymm0
vsubps %ymm6, %ymm1, %ymm1
vmovaps -16(%rbp), %xmm6 # 16-byte Reload
vmovaps -32(%rbp), %xmm7 # 16-byte Reload
addq $64, %rsp
p...
2009 Dec 17 (1): [LLVMdev] Merging AVX
...entirely with a set of patterns that covers all SIMD instructions. But
that's going to be gradual so we need to maintain both as we go along.
So these foundational templates need to be somewhere accessible to
both sets of patterns.
Then I'll start with a simple instruction like ADDPS/D / VADDPS/D. I will add
all of the base templates needed to implement that and then add the
pattern itself, replacing the various ADDPS/D patterns in X86InstrSSE.td.
We'll do instructions one by one until we're done.
When we get to things like shuffles where we've identified major rewrites
tha...
2012 Jan 09 (3): [LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote:
>
> On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote:
>
>> I'll explain what we see in the code.
>> 1. The caller saves XMM registers across the call if needed (according to DEFS definition).
>> YMMs are not in the set, so caller does not take care.
>
> This is not how the register allocator
2019 Sep 02 (2): AVX2 codegen - question reg. FMA generation
...in the output it generates:
> >
> > $ llc -O3 -mcpu=skylake
> >
> > ---------------------
> > .LBB0_2: # =>This Inner Loop Header: Depth=1
> > vbroadcastss (%rsi,%rdx,4), %ymm0
> > vmulps (%rdi,%rcx), %ymm0, %ymm0
> > vaddps (%rax,%rcx), %ymm0, %ymm0
> > vmovups %ymm0, (%rax,%rcx)
> > incq %rdx
> > addq $32, %rcx
> > cmpq $15, %rdx
> > jle .LBB0_2
> > -----------------------
> >
> > $ llc --version
> > LLVM (http://llvm.org/):
> > LLVM version 8.0.0
> &g...
2015 Jul 01 (3): [LLVMdev] SLP vectorizer on AVX feature
...ted assembly doesn't seem to
>> support this assumption :-(
>>
>>
>> main:
>> .cfi_startproc
>> xorl %eax, %eax
>> xorl %esi, %esi
>> .align 16, 0x90
>> .LBB0_1:
>> vmovups (%r8,%rax), %xmm0
>> vaddps (%rcx,%rax), %xmm0, %xmm0
>> vmovups %xmm0, (%rdx,%rax)
>> addq $4, %rsi
>> addq $16, %rax
>> cmpq $61, %rsi
>> jb .LBB0_1
>> retq
>>
>> I played with -mcpu and -march switches without success. In any case, t...
2015 Jul 01 (3): [LLVMdev] SLP vectorizer on AVX feature
...r, the generated assembly doesn't seem to support this assumption :-(
>>
>>
>> main:
>> .cfi_startproc
>> xorl %eax, %eax
>> xorl %esi, %esi
>> .align 16, 0x90
>> .LBB0_1:
>> vmovups (%r8,%rax), %xmm0
>> vaddps (%rcx,%rax), %xmm0, %xmm0
>> vmovups %xmm0, (%rdx,%rax)
>> addq $4, %rsi
>> addq $16, %rax
>> cmpq $61, %rsi
>> jb .LBB0_1
>> retq
>>
>> I played with -mcpu and -march switches without success. In any case, the ta...
2013 Dec 12 (0): [LLVMdev] AVX code gen
...i_def_cfa_register %rbp
xorl %eax, %eax
.align 4, 0x90
LBB0_1: ## %vector.body
## =>This Inner Loop Header: Depth=1
vmovups (%rdx,%rax,4), %ymm0
vmulps (%rsi,%rax,4), %ymm0, %ymm0
vaddps (%rdi,%rax,4), %ymm0, %ymm0
vmovups %ymm0, (%rdi,%rax,4)
addq $8, %rax
cmpq $256, %rax ## imm = 0x100
jne LBB0_1
## BB#2: ## %for.end
popq %rbp
vzeroupper
ret
.cfi_endproc
$ ca...
2013 Dec 11 (2): [LLVMdev] AVX code gen
Hello -
I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang/LLVM are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I cannot get clang/llvm to generate such
2015 Jul 01 (3): [LLVMdev] SLP vectorizer on AVX feature
...aybe the YMM registers get used when
lowering the IR to machine code. However, the generated assembly doesn't
seem to support this assumption :-(
main:
.cfi_startproc
xorl %eax, %eax
xorl %esi, %esi
.align 16, 0x90
.LBB0_1:
vmovups (%r8,%rax), %xmm0
vaddps (%rcx,%rax), %xmm0, %xmm0
vmovups %xmm0, (%rdx,%rax)
addq $4, %rsi
addq $16, %rax
cmpq $61, %rsi
jb .LBB0_1
retq
I played with -mcpu and -march switches without success. In any case,
the target architecture should be detected with the -datalayout p...
2012 May 24 (2): [LLVMdev] use AVX automatically if present
...> > pushq %rbp
> > .Ltmp2:
> > .cfi_def_cfa_offset 16
> > .Ltmp3:
> > .cfi_offset %rbp, -16
> > movq %rsp, %rbp
> > .Ltmp4:
> > .cfi_def_cfa_register %rbp
> > vmovaps (%rdi), %ymm0
> > vaddps (%rsi), %ymm0, %ymm0
> > vmovaps %ymm0, (%rdi)
> > popq %rbp
> > vzeroupper
> > ret
> > .Ltmp5:
> > .size _fun1, .Ltmp5-_fun1
> > .cfi_endproc
> >
> >
> > .section ".note....
2013 Oct 15 (0): [LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...lt;16 x float>* %y, align 16
>> %3 = fadd <16 x float> %2, %1
>> ret <16 x float> %3
>> @@ -43,21 +43,21 @@ define <16 x float> @testf16_inp(<16 x f
>> ; preserved ymm6-ymm15
>> ; WIN64: testf16_regs
>> ; WIN64: call
>> -; WIN64: vaddps {{%ymm[6-7]}}, %ymm0, %ymm0
>> -; WIN64: vaddps {{%ymm[6-7]}}, %ymm1, %ymm1
>> +; WIN64: vaddps {{%ymm[6-7]}}, {{%ymm[0-1]}}, {{%ymm[0-1]}}
>> +; WIN64: vaddps {{%ymm[6-7]}}, {{%ymm[0-1]}}, {{%ymm[0-1]}}
>> ; WIN64: ret
>>
>> ; preserved ymm8-ymm15
>>...