Displaying 7 results from an estimated 7 matches for "fpppp".

2012 Jul 26
1
[LLVMdev] X86 FMA4
...is a significant scalar performance issue following the GCC intrinsics. Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer...
    vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
    vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
    vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
The spill here is 16-bytes. But, we're only using the low 8-bytes of xmm3. Changing the intrinsics and patterns to accept scalar operands, we...
2012 Jul 25
6
[LLVMdev] X86 FMA4
We're migrating to LLVM 3.1 and trying to use the upstream FMA patterns. Why is VFMADDSD4 defined with vector types? Is this simply because the gcc intrinsic uses vector types? It's quite unnatural if you have a compiler that generates FMAs as opposed to requiring user intrinsics. -Dave
2012 Jul 26
0
[LLVMdev] X86 FMA4
...GCC intrinsics.
> >Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer...
> >    vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
> >    vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
> >    vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
> >The spill here is 16-bytes. But, we're only using the low 8-bytes of xmm3. Changing the in...
2012 Jul 27
2
[LLVMdev] X86 FMA4
...lowing the GCC intrinsics.
> >Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer...
> >    vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
> >    vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
> >    vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
> >The spill here is 16-bytes. But, we're only using the low 8-bytes of xmm3. Changing the intrins...
2012 Jul 26
0
[LLVMdev] X86 FMA4
Because the intrinsics uses vector types (same as gcc).
- Jan
----- Original Message -----
> From: "dag at cray.com" <dag at cray.com>
> To: llvmdev at cs.uiuc.edu
> Cc:
> Sent: Wednesday, July 25, 2012 3:26 PM
> Subject: [LLVMdev] X86 FMA4
>
> We're migrating to LLVM 3.1 and trying to use the upstream FMA patterns.
>
> Why is VFMADDSD4
2012 Jul 27
0
[LLVMdev] X86 FMA4
...
>> >Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer...
>> >    vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
>> >    vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
>> >    vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
>> >The spill here is 16-bytes. But, we're only using the low 8-bytes of...
2012 Jul 27
3
[LLVMdev] X86 FMA4
...nsics.
>> >Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer...
>> >    vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
>> >    vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
>> >    vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
>> >The spill here is 16-bytes. But, we're only using the low 8-bytes of xmm3....
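
The size mismatch described in the thread (a 16-byte vector spill for a value whose only live part is the low 8 bytes) can be sketched in plain C. This is an illustration under stated assumptions, not code from the thread: `vec2d`, `madd_scalar`, `madd_low_lane`, and the two `spill_bytes_*` helpers are hypothetical names, `vec2d` merely stands in for a 128-bit XMM register, and `a * b + c` is an ordinary multiply-add that a compiler may or may not contract into a fused instruction.

```c
#include <stddef.h>

/* Hypothetical stand-in for a 128-bit XMM register holding two doubles. */
typedef struct { double lane[2]; } vec2d;

/* Scalar-typed multiply-add: the only live data is a single double. */
double madd_scalar(double a, double b, double c) {
    return a * b + c;
}

/* Vector-typed multiply-add that uses lane 0 only, in the style of a
 * scalar operation defined on vector operands. The upper lane is set to
 * zero purely for illustration; it is not a claim about what the real
 * instruction or intrinsic leaves there. */
vec2d madd_low_lane(vec2d a, vec2d b, vec2d c) {
    vec2d r;
    r.lane[0] = a.lane[0] * b.lane[0] + c.lane[0];
    r.lane[1] = 0.0;
    return r;
}

/* Bytes that must be saved if the value is kept as a scalar. */
size_t spill_bytes_scalar(void) { return sizeof(double); }

/* Bytes that must be saved if the same value is typed as a vector. */
size_t spill_bytes_vector(void) { return sizeof(vec2d); }
```

Both functions compute the same low-lane result, but the vector-typed form carries twice as many bytes whenever the value has to be saved and restored. That is the cost the thread attributes to defining the scalar VFMADDSD pattern on vector types.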