search for: shufpd

Displaying 14 results from an estimated 14 matches for "shufpd".

2008 Nov 17
2
[LLVMdev] Patterns with Multiple Stores
I want to write a pattern that looks something like this: def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst), (MOVSDmr ADD64ri8(addr:$dst, imm:8), ( SHUFPDrri (VR128:$src, (MOVSDmr addr:$dst, FR64:$src))), imm:3) So I want to convert an unaligned vector store to a scalar store, a shuffle and a scalar store. There are several questions I have: - Is the imm:3 syntax correct? Basically I want to hard-code the shuffle mask - The first MOVS...
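
Separate from the TableGen syntax question, the lowering described here can be sketched with SSE2 intrinsics. This is only an illustrative C sketch of the intended sequence (scalar store, SHUFPD, scalar store); the function name is made up and nothing below is taken from the thread.

    #include <emmintrin.h>  /* SSE2 intrinsics */

    /* Illustrative sketch only: store a v2f64 to a (possibly unaligned)
     * address as two 8-byte scalar stores plus a SHUFPD, roughly the
     * sequence the pattern above tries to express. */
    static void store_v2f64_unaligned(double *dst, __m128d src)
    {
        _mm_store_sd(dst, src);                    /* scalar store of element 0 */
        __m128d hi = _mm_shuffle_pd(src, src, 3);  /* SHUFPD imm 3: element 1 into low lane */
        _mm_store_sd(dst + 1, hi);                 /* scalar store to dst + 8 bytes */
    }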
2008 Nov 17
0
[LLVMdev] Patterns with Multiple Stores
On Monday 17 November 2008 14:28, David Greene wrote: > I want to write a pattern that looks something like this: > > def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst), > (MOVSDmr ADD64ri8(addr:$dst, imm:8), ( SHUFPDrri (VR128:$src, > (MOVSDmr addr:$dst, FR64:$src))), imm:3) > > So I want to convert an unaligned vector store to a scalar store, a shuffle > and a scalar store. I got a little further with this: def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst), (MOVSDmr...
2005 Jul 27
3
[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)
...[4xfloat]. The instruction: add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz Explanation: '.a' is a writemask; only the specified component will be updated. '.xxyy' and '.zzzz' are swizzle masks, specifying the component permutation, similar to the Intel SSE permutation instruction SHUFPD. '_bias' and '_x2' are modifiers; they modify the value of source operands and send the modified values to the adder. '_bias' = source - 0.5, '_x2' = source * 2. '_sat' is an instruction modifier; when specified, it saturates (or clamps) the instruction resul...
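
To make those semantics concrete, here is a hedged C sketch of what add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz would compute, following the post's own explanation. The float4 type, the function name, and the assumption that '.a' aliases the w (alpha) component are illustrative, not taken from the thread.

    #include <math.h>

    typedef struct { float x, y, z, w; } float4;   /* hypothetical register type */

    /* Illustrative semantics of: add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz */
    static void add_sat_example(float4 *r0, float4 r1, float4 r3)
    {
        /* '_bias' modifier (source - 0.5) applied to r1, then the '.xxyy' swizzle */
        float4 a = { r1.x - 0.5f, r1.x - 0.5f, r1.y - 0.5f, r1.y - 0.5f };
        /* '_x2' modifier (source * 2) applied to r3, then the '.zzzz' swizzle */
        float4 b = { r3.z * 2.0f, r3.z * 2.0f, r3.z * 2.0f, r3.z * 2.0f };
        /* '_sat' clamps the result to [0, 1]; the '.a' writemask updates only the w lane */
        r0->w = fminf(fmaxf(a.w + b.w, 0.0f), 1.0f);
    }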
2008 Nov 18
1
[LLVMdev] Patterns with Multiple Stores
...008, at 3:50 PM, David Greene wrote: > On Monday 17 November 2008 14:28, David Greene wrote: >> I want to write a pattern that looks something like this: >> >> def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst), >> (MOVSDmr ADD64ri8(addr:$dst, imm:8), ( SHUFPDrri >> (VR128:$src, >> (MOVSDmr addr:$dst, FR64:$src))), imm:3) >> >> So I want to convert an unaligned vector store to a scalar store, a >> shuffle >> and a scalar store. > > I got a little further with this: > > def : Pat<(unalignedst...
2005 Jul 29
0
[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)
..._sat r0.a, r1_bias.xxyy, r3_x2.zzzz > > Explanation: > > '.a' is a writemask; only the specified component will be updated > > '.xxyy' and '.zzzz' are swizzle masks, specifying the component > permutation, similar to the Intel SSE permutation instruction SHUFPD > > '_bias' and '_x2' are modifiers; they modify the value of source > operands and send the modified values to the adder. '_bias' = source - > 0.5, '_x2' = source * 2 > > '_sat' is an instruction modifier; when specified, it saturates (or...
2008 Oct 20
2
[LLVMdev] TableGen Hacking Help
...I've hacked tblgen to handle patterns like this: let AddedComplexity = 40 in { def : Pat<(vector_shuffle (v2f64 (scalar_to_vector (loadf64 addr:$src1))), (v2f64 (scalar_to_vector (loadf64 addr:$src2))), SHUFP_shuffle_mask:$sm), (SHUFPDrri (MOVSD2PDrm addr:$src1), (MOVSD2PDrm addr:$src2), SHUFP_shuffle_mask:$sm)>, Requires<[HasSSE2]>; } // AddedComplexity I believe the problem with the tblgen in trunk is that it doesn't know how to support patterns with two memory operan...
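
For readers less familiar with the pattern syntax, this is roughly what the pattern matches, sketched with SSE2 intrinsics: two scalar f64 loads combined into one SHUFPD. The helper name is invented, and the shuffle mask, which the pattern keeps as an operand ($sm), is hard-coded here for illustration.

    #include <emmintrin.h>

    /* Illustrative intrinsics view of the pattern above: two scalar f64
     * loads (scalar_to_vector of loadf64) combined by a single SHUFPDrri. */
    static __m128d shuffle_two_loads(const double *src1, const double *src2)
    {
        __m128d a = _mm_load_sd(src1);    /* scalar_to_vector (loadf64 addr:$src1) */
        __m128d b = _mm_load_sd(src2);    /* scalar_to_vector (loadf64 addr:$src2) */
        return _mm_shuffle_pd(a, b, 0);   /* SHUFPDrri with mask 0: { a[0], b[0] } */
    }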
2008 Oct 07
2
[LLVMdev] Making Sense of ISel DAG Output
...e following pattern: let AddedComplexity = 40 in { def : Pat<(v2f64 (vector_shuffle (v2f64 (scalar_to_vector (loadf64 addr: $src1))), (v2f64 (scalar_to_vector (loadf64 addr: $src2))), SHUFP_shuffle_mask:$sm)), (SHUFPDrri (v2f64 (MOVSD2PDrm addr:$src1)), (v2f64 (MOVSD2PDrm addr:$src2)), SHUFP_shuffle_mask:$sm)>, Requires<[HasSSE2]>; } // AddedComplexity After much hacking of tblgen, I finally convinced it to generate some somewhat-seemingly-reasonably-corre...
2008 Oct 07
0
[LLVMdev] Making Sense of ISel DAG Output
...ty = 40 in { > def : Pat<(v2f64 (vector_shuffle (v2f64 (scalar_to_vector (loadf64 > addr: > $src1))), > (v2f64 (scalar_to_vector (loadf64 > addr: > $src2))), > SHUFP_shuffle_mask:$sm)), > (SHUFPDrri (v2f64 (MOVSD2PDrm addr:$src1)), > (v2f64 (MOVSD2PDrm addr:$src2)), > SHUFP_shuffle_mask:$sm)>, Requires<[HasSSE2]>; > } // AddedComplexity > > After much hacking of tblgen, I finally convinced it to generate some > somewhat-...
2008 Oct 03
0
[LLVMdev] Making Sense of ISel DAG Output
On Fri, October 3, 2008 9:10 am, David Greene wrote: > On Thursday 02 October 2008 19:32, Dan Gohman wrote: > >> Looking at your dump() output above, it looks like the pre-selection >> loads have multiple uses, so even though you've managed to match a >> larger pattern that incorporates them, they still need to exist to >> satisfy some other users. > > Yes,
2008 Oct 03
3
[LLVMdev] Making Sense of ISel DAG Output
On Thursday 02 October 2008 19:32, Dan Gohman wrote: > Looking at your dump() output above, it looks like the pre-selection > loads have multiple uses, so even though you've managed to match a > larger pattern that incorporates them, they still need to exist to > satisfy some other users. Yes, I looked at that too. It looks like these other uses end up being chains to
2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
...tr [esp+4B0h],xmm3 002B0115 movaps xmmword ptr [esp+4A0h],xmm4 002B011D movaps xmm0,xmmword ptr [esp+4C0h] 002B0125 movaps xmm1,xmmword ptr [esp+4B0h] 002B012D movaps xmm2,xmmword ptr [esp+4A0h] 002B0135 movaps xmm3,xmm1 002B0138 movaps xmm4,xmm1 002B013B shufpd xmm4,xmm4,0 002B0140 movaps xmm5,xmmword ptr [esp+4D0h] 002B0148 subpd xmm5,xmm4 002B014C xorps xmm6,xmm6 002B014F mulpd xmm4,xmm6 002B0153 xorps xmm7,xmm7 002B0156 movaps xmmword ptr [esp+490h],xmm0 002B015E movaps xmm0,xmmword ptr [esp+4F0...
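
For context, the expected behavior under discussion can be shown with a tiny C example: compiled with SSE2 enabled, _mm_sqrt_pd should lower to a single sqrtpd instruction rather than a library call. This is an illustrative sketch, not code from the thread.

    #include <emmintrin.h>

    /* With SSE2 available this should compile to a single sqrtpd; the
     * thread above concerns a case where the intrinsic became a call
     * that clobbered ECX instead. */
    __m128d sqrt_v2f64(__m128d x)
    {
        return _mm_sqrt_pd(x);   /* llvm.x86.sse2.sqrt.pd */
    }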
2016 Apr 01
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
...# xmm0 = xmm0[0],mem[0],xmm0[1],mem[1] movaps %xmm0, (%rsp) # 16-byte Spill movaps 16(%rsp), %xmm0 # 16-byte Reload callq sinf movaps %xmm0, 32(%rsp) # 16-byte Spill movapd 16(%rsp), %xmm0 # 16-byte Reload shufpd $1, %xmm0, %xmm0 # xmm0 = xmm0[1,0] callq sinf movaps 32(%rsp), %xmm1 # 16-byte Reload unpcklps %xmm0, %xmm1 # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1] unpcklps (%rsp), %xmm1 # 16-byte Folded Reload...
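
The assembly above shows the scalar fallback: each sinf call is fed by shufpd/unpcklps lane shuffles. The kind of source loop the RFC targets looks roughly like the C below; with SVML support the vectorizer could replace the four scalar calls per vector with one call to a vector variant (e.g. __svml_sinf4, per the RFC). The function name and restrict qualifiers are illustrative.

    #include <math.h>

    /* A loop of the shape the RFC targets: four sinf calls per 128-bit
     * vector today, one SVML vector call after vectorization. */
    void sin_loop(float * restrict dst, const float * restrict src, int n)
    {
        for (int i = 0; i < n; ++i)
            dst[i] = sinf(src[i]);
    }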
2016 Apr 04
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
...# xmm0 = xmm0[0],mem[0],xmm0[1],mem[1] movaps %xmm0, (%rsp) # 16-byte Spill movaps 16(%rsp), %xmm0 # 16-byte Reload callq sinf movaps %xmm0, 32(%rsp) # 16-byte Spill movapd 16(%rsp), %xmm0 # 16-byte Reload shufpd $1, %xmm0, %xmm0 # xmm0 = xmm0[1,0] callq sinf movaps 32(%rsp), %xmm1 # 16-byte Reload unpcklps %xmm0, %xmm1 # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1] unpcklps (%rsp), %xmm1 # 16-byte Folded Reload...
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I end up with SSE instructions (including sqrtpd) if I don't disable it. On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote: > Is there something specifically required to enable SSE? If it's not > detected as available (based on the target triple?) then I don't think