Displaying 7 results from an estimated 7 matches for "fp4_".
2012 Jul 26
1
[LLVMdev] X86 FMA4
...'s not obvious, but there is a significant scalar performance issue
following the GCC intrinsics.
Let's look at the VFMADDSD pattern. We're operating on scalars with
undefineds as the remaining vector elements of the operands. This sounds
okay, but when one looks closer...
vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
The spill here is 16 bytes, but we're only using the low 8 bytes of
xmm3. Changing the intrinsics and patterns to...
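The size mismatch described in this snippet can be sketched in portable C. This is only an illustration under stated assumptions, not LLVM's or GCC's code: `vec2d`, `madd_low_lane`, `vector_spill_bytes`, and `scalar_spill_bytes` are hypothetical names, the struct merely stands in for the 128-bit contents of an XMM register, and the arithmetic is a plain multiply-add rather than a true fused operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the 128-bit contents of an XMM register:
   two double lanes. Not an LLVM or GCC type. */
typedef struct {
    double lane[2];
} vec2d;

/* Assumption of this sketch: the struct has no padding. */
static_assert(sizeof(vec2d) == 2 * sizeof(double),
              "vec2d should hold exactly two doubles");

/* Models a scalar multiply-add expressed with vector types: only lane 0
   is computed. Lane 1 is carried along from `a` and is not meaningful;
   the thread describes the remaining elements as undefined.
   Written as a * b + c here; a hardware FMA would round only once. */
vec2d madd_low_lane(vec2d a, vec2d b, vec2d c) {
    vec2d r = a;
    r.lane[0] = a.lane[0] * b.lane[0] + c.lane[0];
    return r;
}

/* Bytes written if the whole vector value is spilled to the stack. */
size_t vector_spill_bytes(void) { return sizeof(vec2d); }

/* Bytes that actually carry the scalar result. */
size_t scalar_spill_bytes(void) { return sizeof(double); }
```

On a platform where `double` is 8 bytes, the vector-typed value takes 16 bytes to spill while the scalar needs only 8, which corresponds to the 16-byte `vmovaps` spill in the listing above versus the low 8 bytes of xmm3 that are actually used.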
2012 Jul 25
6
[LLVMdev] X86 FMA4
We're migrating to LLVM 3.1 and trying to use the upstream FMA patterns.
Why is VFMADDSD4 defined with vector types? Is this simply because the
gcc intrinsic uses vector types? It's quite unnatural if you have a
compiler that generates FMAs as opposed to requiring user intrinsics.
-Dave
2012 Jul 26
0
[LLVMdev] X86 FMA4
...e issue
> following the GCC intrinsics.
> >
> >
> >Let's look at the VFMADDSD pattern. We're operating on scalars with
> undefineds as the remaining vector elements of the operands. This sounds
> okay, but when one looks closer...
> >
> > vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
> > vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
> > vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
> >
> >
> >The spill here is 16-bytes. But, we're only using the low 8-bytes of...
2012 Jul 27
2
[LLVMdev] X86 FMA4
...alar performance issue following the GCC intrinsics.
> >
> >
> >Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer...
> >
> > vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
> > vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
> > vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
> >
> >
> >The spill here is 16-bytes. But, we're only using the low 8-bytes of...
2012 Jul 26
0
[LLVMdev] X86 FMA4
Because the intrinsics use vector types (same as GCC).
- Jan
----- Original Message -----
> From: "dag at cray.com" <dag at cray.com>
> To: llvmdev at cs.uiuc.edu
> Cc:
> Sent: Wednesday, July 25, 2012 3:26 PM
> Subject: [LLVMdev] X86 FMA4
>
> We're migrating to LLVM 3.1 and trying to use the upstream FMA patterns.
>
> Why is VFMADDSD4
2012 Jul 27
0
[LLVMdev] X86 FMA4
...CC intrinsics.
>> >
>> >
>> >Let's look at the VFMADDSD pattern. We're operating on scalars with
>> undefineds as the remaining vector elements of the operands. This sounds
>> okay, but when one looks closer...
>> >
>> > vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
>> > vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
>> > vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
>> >
>> >
>> >The spill here is 16-bytes. But, we're only usin...
2012 Jul 27
3
[LLVMdev] X86 FMA4
...ue following the GCC intrinsics.
>> >
>> >
>> >Let's look at the VFMADDSD pattern. We're operating on scalars with undefineds as the remaining vector elements of the operands. This sounds okay, but when one looks closer...
>> >
>> > vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
>> > vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
>> > vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
>> >
>> >
>> >The spill here is 16-bytes. But, we're only usin...