search for: vaddp

Displaying 17 results from an estimated 17 matches for "vaddp".

2014 Jun 25 · 2 replies · [LLVMdev] problem with X86's AVX assembler?
Hi, I am trying to assemble below instruction with latest LLVM code, but fail. Am I doing something wrong, or is this a bug? $ echo "vaddps zmm7 {k6}, zmm2, zmm4, {rd-sae}"|./Release+Asserts/bin/llvm-mc -assemble -triple=x86_64 -mcpu=knl -show-encoding -x86-asm-syntax=intel .text <stdin>:1:31: error: unknown token in expression vaddps zmm7 {k6}, zmm2, zmm4, {rd-sae} ^ <stdin>:1:31: er...
2014 Jun 26 · 2 replies · [LLVMdev] problem with X86's AVX assembler?
...Jun, > > On Jun 25, 2014, at 8:14 AM, Jun Koi <junkoi2004 at gmail.com> wrote: > > > Hi, > > > > I am trying to assemble below instruction with latest LLVM code, but > fail. Am I doing something wrong, or is this a bug? > > > > > > $ echo "vaddps zmm7 {k6}, zmm2, zmm4, > {rd-sae}"|./Release+Asserts/bin/llvm-mc -assemble -triple=x86_64 -mcpu=knl > -show-encoding -x86-asm-syntax=intel > > .text > > <stdin>:1:31: error: unknown token in expression > > vaddps zmm7 {k6}, zmm2, zmm4, {rd-sae} > >...
2014 Jun 26 · 2 replies · [LLVMdev] problem with X86's AVX assembler?
...at 8:14 AM, Jun Koi <junkoi2004 at gmail.com> wrote: >> >> > Hi, >> > >> > I am trying to assemble below instruction with latest LLVM code, but >> fail. Am I doing something wrong, or is this a bug? >> > >> > >> > $ echo "vaddps zmm7 {k6}, zmm2, zmm4, >> {rd-sae}"|./Release+Asserts/bin/llvm-mc -assemble -triple=x86_64 -mcpu=knl >> -show-encoding -x86-asm-syntax=intel >> > .text >> > <stdin>:1:31: error: unknown token in expression >> > vaddps zmm7 {k6}, zmm2, zmm4, {rd...
2019 Sep 02 · 3 replies · AVX2 codegen - question reg. FMA generation
...later types) turning it into an AVX2 FMA instructions. Here's the snippet in the output it generates: $ llc -O3 -mcpu=skylake --------------------- .LBB0_2: # =>This Inner Loop Header: Depth=1 vbroadcastss (%rsi,%rdx,4), %ymm0 vmulps (%rdi,%rcx), %ymm0, %ymm0 vaddps (%rax,%rcx), %ymm0, %ymm0 vmovups %ymm0, (%rax,%rcx) incq %rdx addq $32, %rcx cmpq $15, %rdx jle .LBB0_2 ----------------------- $ llc --version LLVM (http://llvm.org/): LLVM version 8.0.0 Optimized build. Default target: x86_64-unknown-linux-gnu Host CPU: skylake (llvm commit 198009ae8db...
2012 May 24 · 4 replies · [LLVMdev] use AVX automatically if present
...# @_fun1 .cfi_startproc # BB#0: # %_L1 pushq %rbp .Ltmp2: .cfi_def_cfa_offset 16 .Ltmp3: .cfi_offset %rbp, -16 movq %rsp, %rbp .Ltmp4: .cfi_def_cfa_register %rbp vmovaps (%rdi), %ymm0 vaddps (%rsi), %ymm0, %ymm0 vmovaps %ymm0, (%rdi) popq %rbp vzeroupper ret .Ltmp5: .size _fun1, .Ltmp5-_fun1 .cfi_endproc .section ".note.GNU-stack","", at progbits I guess your answer is that I did not...
2012 May 24 · 0 replies · [LLVMdev] use AVX automatically if present
...gt; # BB#0: # %_L1 > pushq %rbp > .Ltmp2: > .cfi_def_cfa_offset 16 > .Ltmp3: > .cfi_offset %rbp, -16 > movq %rsp, %rbp > .Ltmp4: > .cfi_def_cfa_register %rbp > vmovaps (%rdi), %ymm0 > vaddps (%rsi), %ymm0, %ymm0 > vmovaps %ymm0, (%rdi) > popq %rbp > vzeroupper > ret > .Ltmp5: > .size _fun1, .Ltmp5-_fun1 > .cfi_endproc > > > .section ".note.GNU-stack","", at progbits > &gt...
2012 Jan 10 · 0 replies · [LLVMdev] Calling conventions for YMM registers on AVX
...BB#0: # %entry pushq %rbp movq %rsp, %rbp subq $64, %rsp vmovaps %xmm7, -32(%rbp) # 16-byte Spill vmovaps %xmm6, -16(%rbp) # 16-byte Spill vmovaps %ymm3, %ymm6 vmovaps %ymm2, %ymm7 vaddps %ymm7, %ymm0, %ymm0 vaddps %ymm6, %ymm1, %ymm1 callq foo vsubps %ymm7, %ymm0, %ymm0 vsubps %ymm6, %ymm1, %ymm1 vmovaps -16(%rbp), %xmm6 # 16-byte Reload vmovaps -32(%rbp), %xmm7 # 16-byte Reload addq $64, %rsp p...
2009 Dec 17 · 1 reply · [LLVMdev] Merging AVX
...entirely with a set of patterns that covers all SIMD instructions. But that's going to be gradual so we need to maintain both as we go along. So these foundational templates need to be somewhere accessible to both sets of patterns. Then I'll start with a simple instruction like ADDPS/D / VADDPS/D. I will add all of the base templates needed to implement that and then add the pattern itself, replacing the various ADDPS/D patterns in X86InstrSSE..td We'll do instructions one by one until we're done. When we get to things like shuffles where we've identified major rewrites tha...
2012 Jan 09 · 3 replies · [LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote: > > On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote: > >> I'll explain what we see in the code. >> 1. The caller saves XMM registers across the call if needed (according to DEFS definition). >> YMMs are not in the set, so caller does not take care. > > This is not how the register allocator
2019 Sep 02 · 2 replies · AVX2 codegen - question reg. FMA generation
...in the output it generates: > > > > $ llc -O3 -mcpu=skylake > > > > --------------------- > > .LBB0_2: # =>This Inner Loop Header: Depth=1 > > vbroadcastss (%rsi,%rdx,4), %ymm0 > > vmulps (%rdi,%rcx), %ymm0, %ymm0 > > vaddps (%rax,%rcx), %ymm0, %ymm0 > > vmovups %ymm0, (%rax,%rcx) > > incq %rdx > > addq $32, %rcx > > cmpq $15, %rdx > > jle .LBB0_2 > > ----------------------- > > > > $ llc --version > > LLVM (http://llvm.org/): > > LLVM version 8.0.0 > &g...
2015 Jul 01 · 3 replies · [LLVMdev] SLP vectorizer on AVX feature
...ted assembly doesn't seem to >> support this assumption :-( >> >> >> main: >> .cfi_startproc >> xorl %eax, %eax >> xorl %esi, %esi >> .align 16, 0x90 >> .LBB0_1: >> vmovups (%r8,%rax), %xmm0 >> vaddps (%rcx,%rax), %xmm0, %xmm0 >> vmovups %xmm0, (%rdx,%rax) >> addq $4, %rsi >> addq $16, %rax >> cmpq $61, %rsi >> jb .LBB0_1 >> retq >> >> I played with -mcpu and -march switches without success. In any case, t...
2015 Jul 01 · 3 replies · [LLVMdev] SLP vectorizer on AVX feature
...r, the generated assembly doesn't seem to support this assumption :-( >> >> >> main: >> .cfi_startproc >> xorl %eax, %eax >> xorl %esi, %esi >> .align 16, 0x90 >> .LBB0_1: >> vmovups (%r8,%rax), %xmm0 >> vaddps (%rcx,%rax), %xmm0, %xmm0 >> vmovups %xmm0, (%rdx,%rax) >> addq $4, %rsi >> addq $16, %rax >> cmpq $61, %rsi >> jb .LBB0_1 >> retq >> >> I played with -mcpu and -march switches without success. In any case, the ta...
2013 Dec 12 · 0 replies · [LLVMdev] AVX code gen
...i_def_cfa_register %rbp xorl %eax, %eax .align 4, 0x90 LBB0_1: ## %vector.body ## =>This Inner Loop Header: Depth=1 vmovups (%rdx,%rax,4), %ymm0 vmulps (%rsi,%rax,4), %ymm0, %ymm0 vaddps (%rdi,%rax,4), %ymm0, %ymm0 vmovups %ymm0, (%rdi,%rax,4) addq $8, %rax cmpq $256, %rax ## imm = 0x100 jne LBB0_1 ## BB#2: ## %for.end popq %rbp vzeroupper ret .cfi_endproc $ ca...
2013 Dec 11 · 2 replies · [LLVMdev] AVX code gen
Hello - I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such
2015 Jul 01 · 3 replies · [LLVMdev] SLP vectorizer on AVX feature
...aybe the YMM registers get used when lowering the IR to machine code. However, the generated assembly doesn't seem to support this assumption :-( main: .cfi_startproc xorl %eax, %eax xorl %esi, %esi .align 16, 0x90 .LBB0_1: vmovups (%r8,%rax), %xmm0 vaddps (%rcx,%rax), %xmm0, %xmm0 vmovups %xmm0, (%rdx,%rax) addq $4, %rsi addq $16, %rax cmpq $61, %rsi jb .LBB0_1 retq I played with -mcpu and -march switches without success. In any case, the target architecture should be detected with the -datalayout p...
2012 May 24 · 2 replies · [LLVMdev] use AVX automatically if present
...> > pushq %rbp > > .Ltmp2: > > .cfi_def_cfa_offset 16 > > .Ltmp3: > > .cfi_offset %rbp, -16 > > movq %rsp, %rbp > > .Ltmp4: > > .cfi_def_cfa_register %rbp > > vmovaps (%rdi), %ymm0 > > vaddps (%rsi), %ymm0, %ymm0 > > vmovaps %ymm0, (%rdi) > > popq %rbp > > vzeroupper > > ret > > .Ltmp5: > > .size _fun1, .Ltmp5-_fun1 > > .cfi_endproc > > > > > > .section ".note....
2013 Oct 15 · 0 replies · [LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...lt;16 x float>* %y, align 16 >> %3 = fadd <16 x float> %2, %1 >> ret <16 x float> %3 >> @@ -43,21 +43,21 @@ define <16 x float> @testf16_inp(<16 x f >> ; preserved ymm6-ymm15 >> ; WIN64: testf16_regs >> ; WIN64: call >> -; WIN64: vaddps {{%ymm[6-7]}}, %ymm0, %ymm0 >> -; WIN64: vaddps {{%ymm[6-7]}}, %ymm1, %ymm1 >> +; WIN64: vaddps {{%ymm[6-7]}}, {{%ymm[0-1]}}, {{%ymm[0-1]}} >> +; WIN64: vaddps {{%ymm[6-7]}}, {{%ymm[0-1]}}, {{%ymm[0-1]}} >> ; WIN64: ret >> >> ; preserved ymm8-ymm15 >>...