thr3ads.net - search: "fp4_"

Displaying 7 results from an estimated 7 matches for "fp4_".

2012 Jul 26

[LLVMdev] X86 FMA4

...'s not obvious, but there is a significant scalar performance issue following the GCC intrinsics. Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer...

    vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
    vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
    vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647

The spill here is 16-bytes. But, we're only using the low 8-bytes of xmm3. Changing the intrinsics and patterns to...
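
The size mismatch described in this excerpt can be illustrated with a small, portable C sketch. It is not LLVM or FMA4 code: `vec2d`, `muladd_sd_vector_typed`, and `muladd_sd_scalar_typed` are hypothetical names invented here, the struct only stands in for a 128-bit XMM value, and a plain multiply-add stands in for the hardware instruction. The point is just that a scalar operation expressed on a vector type keeps 16 bytes live, while the same operation on a scalar type keeps 8.

```c
#include <stddef.h>

/* Hypothetical stand-in for a 128-bit XMM value holding two doubles. */
typedef struct { double lane[2]; } vec2d;

/* Scalar multiply-add expressed on vector types, in the style of the
   intrinsic-based pattern: only lane 0 carries a meaningful result.
   The upper lane is treated as "don't care" in this sketch and set to 0.
   Note: a * b + c is NOT fused (it rounds twice); it only stands in
   for the real instruction. */
static vec2d muladd_sd_vector_typed(vec2d a, vec2d b, vec2d c) {
    vec2d r;
    r.lane[0] = a.lane[0] * b.lane[0] + c.lane[0];
    r.lane[1] = 0.0;
    return r;
}

/* The same operation expressed on scalar types. */
static double muladd_sd_scalar_typed(double a, double b, double c) {
    return a * b + c;
}

/* Bytes that would have to be saved to spill one value of each kind. */
static size_t spill_bytes_vector_typed(void) { return sizeof(vec2d); }
static size_t spill_bytes_scalar_typed(void) { return sizeof(double); }
```

On a typical x86-64 target the two helpers report 16 and 8 bytes, which matches the 16-byte `vmovaps` spill versus the 8 bytes actually used in the excerpt. How a real backend chooses the spill size depends on the register class assigned to the value, which this sketch does not model.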

2012 Jul 25

[LLVMdev] X86 FMA4

We're migrating to LLVM 3.1 and trying to use the upstream FMA patterns. Why is VFMADDSD4 defined with vector types? Is this simply because the gcc intrinsic uses vector types? It's quite unnatural if you have a compiler that generates FMAs as opposed to requiring user intrinsics. -Dave

2012 Jul 26

[LLVMdev] X86 FMA4

> > ...e issue following the GCC intrinsics.
> >
> > Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer...
> >
> > vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
> > vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
> > vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
> >
> > The spill here is 16-bytes. But, we're only using the low 8-bytes of...

2012 Jul 27

[LLVMdev] X86 FMA4

> > ...alar performance issue following the GCC intrinsics.
> >
> > Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer...
> >
> > vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
> > vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
> > vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
> >
> > The spill here is 16-bytes. But, we're only using the low 8-bytes of...

2012 Jul 26

[LLVMdev] X86 FMA4

Because the intrinsics use vector types (same as gcc). - Jan

----- Original Message -----
> From: "dag at cray.com" <dag at cray.com>
> To: llvmdev at cs.uiuc.edu
> Cc:
> Sent: Wednesday, July 25, 2012 3:26 PM
> Subject: [LLVMdev] X86 FMA4
>
> We're migrating to LLVM 3.1 and trying to use the upstream FMA patterns.
>
> Why is VFMADDSD4

2012 Jul 27

[LLVMdev] X86 FMA4

>> > ...CC intrinsics.
>> >
>> > Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer...
>> >
>> > vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
>> > vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
>> > vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
>> >
>> > The spill here is 16-bytes. But, we're only usin...

2012 Jul 27

[LLVMdev] X86 FMA4

>> > ...ue following the GCC intrinsics.
>> >
>> > Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer...
>> >
>> > vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
>> > vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
>> > vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
>> >
>> > The spill here is 16-bytes. But, we're only usin...
