search for: vmovsd

Displaying 16 results from an estimated 16 matches for "vmovsd".

2011 Mar 18
2
[LLVMdev] LLVM ERROR: No such instruction: `vmovsd ...' ?
...g a i7 MacBook Pro 2011. If I write:

  @g = global double 0.000000e+00
  define i32 @main() {
  entry:
    %0 = load double* @g
    %1 = fmul double 1.000000e+06, %0
    store double %1, double* @g
    ret i32 0
  }

in test.ll and I run

  > llc test.ll
  > gcc test.s

I get:

  test.s:12:no such instruction: `vmovsd _g(%rip), %xmm0'
  test.s:13:no such instruction: `vmulsd LCPI0_0(%rip), %xmm0, %xmm0'
  test.s:14:no such instruction: `vmovsd %xmm0, _g(%rip)'

I'm completely puzzled. Help? Thanks! N
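Note that the errors above come from the assembler, not from LLVM itself: the generated test.s uses AVX encodings (vmovsd, vmulsd), which the gas bundled with Xcode at the time did not recognize. A minimal workaround sketch, assuming llc's usual target-feature syntax, is to disable AVX codegen so plain SSE movsd/mulsd are emitted instead:

  > llc -mattr=-avx test.ll
  > gcc test.s

(Any assembler that understands AVX encodings also works.)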
2012 Jul 26
1
[LLVMdev] X86 FMA4
..., It's not obvious, but there is a significant scalar performance issue following the GCC intrinsics. Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer...

  vmovsd   fp4_+1088(%rip), %xmm3                # fpppp.f:647
  vmovaps  %xmm3, 18560(%rsp)                    # fpppp.f:647 <= 16-byte spill
  vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3  # fpppp.f:647

The spill here is 16 bytes, but we're only using the low 8 bytes of xmm3. Changing the intrinsics and patte...
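The root of the issue is visible in the intrinsic's type signature: GCC's FMA4 scalar intrinsics take and return full 128-bit vectors, so only the low lane is meaningful yet the register allocator must treat all 16 bytes as live. A minimal sketch of the pattern being discussed (assuming x86intrin.h and a compiler with FMA4 support, e.g. gcc -mfma4):

  #include <x86intrin.h>

  /* Scalar fused multiply-add, a*b + c.  Only the low 64 bits of each
   * __m128d matter, but the intrinsic's vector types make the whole
   * 16-byte register live -- which is what forces a 16-byte spill. */
  double fma4_scalar(double a, double b, double c)
  {
      __m128d va = _mm_set_sd(a);   /* low = a, high = 0.0 */
      __m128d vb = _mm_set_sd(b);
      __m128d vc = _mm_set_sd(c);
      return _mm_cvtsd_f64(_mm_macc_sd(va, vb, vc));  /* emits vfmaddsd */
  }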
2015 Oct 02
2
Register Spill Caused by the Reassociation pass
This conflict arises with many optimizations, including copy propagation, coalescing, hoisting, etc., and each can increase register pressure with a similar impact. Attempts to control register pressure locally (within a single optimization pass) tend to be hard to tune and maintain. Would it be better to describe, e.g. in metadata, how to undo an optimization? Optimizations that attempt to reduce pressure like
2012 Jul 25
6
[LLVMdev] X86 FMA4
We're migrating to LLVM 3.1 and trying to use the upstream FMA patterns. Why is VFMADDSD4 defined with vector types? Is this simply because the gcc intrinsic uses vector types? It's quite unnatural if you have a compiler that generates FMAs as opposed to requiring user intrinsics. -Dave
2011 Mar 18
0
[LLVMdev] LLVM ERROR: No such instruction: `vmovsd ...' ?
...00e+06, %0
>>>  store double %1, double* @g
>>>  ret i32 0
>>> }
>>>
>>> in test.ll and I run
>>>
>>>> llc test.ll
>>>> gcc test.s
>>>
>>> I get:
>>>
>>> test.s:12:no such instruction: `vmovsd _g(%rip), %xmm0'
>>> test.s:13:no such instruction: `vmulsd LCPI0_0(%rip), %xmm0, %xmm0'
>>> test.s:14:no such instruction: `vmovsd %xmm0, _g(%rip)'
>>>
>>> I'm completely puzzled. Help?
>>>
>>> Thanks!
>>> N
>> ...
2012 Jul 26
0
[LLVMdev] X86 FMA4
...rformance issue
> following the GCC intrinsics.
>
> Let's look at the VFMADDSD pattern. We're operating on scalars with
> undefineds as the remaining vector elements of the operands. This sounds
> okay, but when one looks closer...
>
> vmovsd   fp4_+1088(%rip), %xmm3                # fpppp.f:647
> vmovaps  %xmm3, 18560(%rsp)                    # fpppp.f:647 <= 16-byte spill
> vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3  # fpppp.f:647
>
> The spill here is 16-bytes. But, we're only using the low 8-by...
2014 Mar 12
2
[LLVMdev] Autovectorization questions
...e sample with

  $> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive

and the loop body looks like:

  .LBB1_2:                        # %for.body
                                  # =>This Inner Loop Header: Depth=1
      cltq
      vmovsd  (%rsi,%rax,8), %xmm0
      movq    %r9, %r10
      sarq    $32, %r10
      vaddsd  (%rdi,%r10,8), %xmm0, %xmm0
      vmovsd  %xmm0, (%rdi,%r10,8)
      addq    %r8, %r9
      addl    %ecx, %eax
      decl    %edx
      jne     .LBB1_2

so vector instructions for scalars (vaddsd, vm...
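The original test.c is not shown in the thread; a hypothetical C99 loop of the right shape (names and indexing are assumptions) would be:

  /* Hypothetical reconstruction -- not the poster's actual test.c.
   * The int induction variables produce the cltq/sarq sign extensions
   * in the listing, and the runtime-strided store through a[j] keeps
   * the loop scalar: one vmovsd/vaddsd/vmovsd per iteration. */
  void accumulate(double *a, const double *b, int off, int stride, int n)
  {
      int j = off;
      for (int i = 0; i < n; ++i) {
          a[j] += b[i];
          j += stride;
      }
  }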
2012 Jul 27
2
[LLVMdev] X86 FMA4
...Agner Fog for Sandy Bridge for vmovaps/etc for loading/storing from memory.

vmovaps - load takes 1 load µop, 3 latency, with a reciprocal throughput of 0.5.
vmovaps - store takes 1 store µop, 1 load µop for address calculation, 3 latency, with a reciprocal throughput of 1.

He does not list vmovsd, but movsd has the same stats as vmovaps, so I feel it is a safe assumption to make that vmovsd has the same stats as well.

Michael

On Jul 26, 2012, at 11:46 AM, Cameron McInally wrote:

> Ah, bad example. This is a general problem for all (maybe most) SSE and AVX SS/SD patterns though, which...
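For readers not used to Agner Fog's tables: latency is the cycle count from operands ready to result ready, and reciprocal throughput is the average cycles per instruction over a long run of independent instructions, so 0.5 means the core can start two such loads per cycle, while 1 means one store per cycle.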
2012 Jul 27
0
[LLVMdev] X86 FMA4
...s/etc
> for loading/storing from memory.
>
> vmovaps - load takes 1 load µop, 3 latency, with a reciprocal throughput
> of 0.5.
> vmovaps - store takes 1 store µop, 1 load µop for address calculation,
> 3 latency, with a reciprocal throughput of 1.
>
> He does not list vmovsd, but movsd has the same stats as vmovaps, so I
> feel it is a safe assumption to make that vmovsd has the same stats as well.
>
> Michael
>
> On Jul 26, 2012, at 11:46 AM, Cameron McInally wrote:
>
> Ah, bad example. This is a general problem for all (maybe most) SSE and
> A...
2014 Mar 12
4
[LLVMdev] Autovectorization questions
...-march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive
>>
>> and loop body looks like:
>>
>> .LBB1_2:                       # %for.body
>>                                # =>This Inner Loop Header: Depth=1
>>     cltq
>>     vmovsd  (%rsi,%rax,8), %xmm0
>>     movq    %r9, %r10
>>     sarq    $32, %r10
>>     vaddsd  (%rdi,%r10,8), %xmm0, %xmm0
>>     vmovsd  %xmm0, (%rdi,%r10,8)
>>     addq    %r8, %r9
>>     addl    %ecx, %eax
>>     decl    %edx
>> ...
2012 Jul 26
0
[LLVMdev] X86 FMA4
Because the intrinsics use vector types (same as gcc).

- Jan

----- Original Message -----
> From: "dag at cray.com" <dag at cray.com>
> To: llvmdev at cs.uiuc.edu
> Cc:
> Sent: Wednesday, July 25, 2012 3:26 PM
> Subject: [LLVMdev] X86 FMA4
>
> We're migrating to LLVM 3.1 and trying to use the upstream FMA patterns.
>
> Why is VFMADDSD4
2012 Jul 27
3
[LLVMdev] X86 FMA4
...ge for vmovaps/etc for loading/storing from memory.
>
> vmovaps - load takes 1 load µop, 3 latency, with a reciprocal throughput of 0.5.
> vmovaps - store takes 1 store µop, 1 load µop for address calculation, 3 latency, with a reciprocal throughput of 1.
>
> He does not list vmovsd, but movsd has the same stats as vmovaps, so I feel it is a safe assumption to make that vmovsd has the same stats as well.
>
> Michael
>
> On Jul 26, 2012, at 11:46 AM, Cameron McInally wrote:
>
>> Ah, bad example. This is a general problem for all (maybe most) SSE and AVX...
2013 Aug 28
3
[PATCH] x86: AVX instruction emulation fixes
...ed\n");
+    printf("%-40s", "Testing movsd %xmm5,(%ecx)...");
     memset(res, 0x77, 64);
     memset(res + 10, 0x66, 8);
@@ -683,6 +762,59 @@ int main(int argc, char **argv)
     else
         printf("skipped\n");
+    printf("%-40s", "Testing vmovsd %xmm5,(%ecx)...");
+    memset(res, 0x88, 64);
+    memset(res + 10, 0x77, 8);
+    if ( stack_exec && cpu_has_avx )
+    {
+        extern const unsigned char vmovsd_to_mem[];
+
+        asm volatile ( "vbroadcastsd %0, %%ymm5\n"
+                       ".pushsection .t...
2014 Oct 07
4
[LLVMdev] Strange behavior in fp arithmetic on x86 (bug possibly)
...I0_0:
    .quad   -4620693217682128896    # double -0.5
.LCPI0_1:
    .quad   -9223372036854775808    # double -0
    .text
    .globl  main
    .align  16, 0x90
    .type   main,@function
main:                               # @main
    .cfi_startproc
# BB#0:
    vmovsd  g, %xmm0
    vmulsd  .LCPI0_0, %xmm0, %xmm0
    vucomisd .LCPI0_1, %xmm0
    sete    %al
    movzbl  %al, %eax
    ret
.Ltmp0:
    .size   main, .Ltmp0-main
    .cfi_endproc
    .type   g,@object               # @g
    .section .rodata,"a", a...
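Reading the listing back: main loads g, multiplies it by -0.5, compares the product against -0.0 with vucomisd, and returns the zero flag via sete/movzbl. A C sketch consistent with that assembly (the excerpt does not include the source, so this reconstruction is an assumption):

  /* Hypothetical source matching the listing above. */
  const double g = 0.0;     /* placed in .rodata, as in the listing */

  int main(void)
  {
      /* IEEE 754: -0.0 compares equal to 0.0, so this yields 1. */
      return g * -0.5 == -0.0;
  }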
2014 Sep 09
1
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
> On Sep 9, 2014, at 1:47 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> On Tue, Sep 9, 2014 at 12:53 PM, Quentin Colombet <qcolombet at apple.com> wrote:
> Hi Chandler,
>
> I had observed some improvements and regressions with the new lowering.
>
> Here are the numbers for an Ivy Bridge machine fixed at
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...=============================================================================
>> --- llvm/trunk/test/CodeGen/X86/chain_order.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/chain_order.ll Tue Oct 15 18:33:07 2013
>> @@ -3,8 +3,8 @@
>>  ;CHECK-LABEL: cftx020:
>>  ;CHECK: vmovsd (%rdi), %xmm{{.*}}
>>  ;CHECK: vmovsd 16(%rdi), %xmm{{.*}}
>> -;CHECK: vmovhpd 8(%rdi), %xmm{{.*}}
>>  ;CHECK: vmovsd 24(%rdi), %xmm{{.*}}
>> +;CHECK: vmovhpd 8(%rdi), %xmm{{.*}}
>>  ;CHECK: vmovupd %xmm{{.*}}, (%rdi)
>>  ;CHECK: vmovupd %xmm{{.*}}, 16(%rdi)
>> ...