search for: shufpd

Displaying 14 results from an estimated 14 matches for "shufpd".

2008 Nov 17
2
[LLVMdev] Patterns with Multiple Stores
I want to write a pattern that looks something like this: def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst), (MOVSDmr ADD64ri8(addr:$dst, imm:8), ( SHUFPDrri (VR128:$src, (MOVSDmr addr:$dst, FR64:$src))), imm:3) So I want to convert an unaligned vector store to a scalar store, a shuffle and a scalar store. There are several questions I have: - Is the imm:3 syntax correct? Basically I want to hard-code the shuffle mask - The first MOVS...
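
Separate from the TableGen syntax question, the lowering described here can be sketched with SSE2 intrinsics. This is only an illustrative C sketch of the intended sequence (scalar store, SHUFPD, scalar store); the function name is made up and nothing below is taken from the thread.

    #include <emmintrin.h>  /* SSE2 intrinsics */

    /* Illustrative sketch only: store a v2f64 to a (possibly unaligned)
     * address as two 8-byte scalar stores plus a SHUFPD, roughly the
     * sequence the pattern above tries to express. */
    static void store_v2f64_unaligned(double *dst, __m128d src)
    {
        _mm_store_sd(dst, src);                    /* scalar store of element 0 */
        __m128d hi = _mm_shuffle_pd(src, src, 3);  /* SHUFPD imm 3: element 1 into low lane */
        _mm_store_sd(dst + 1, hi);                 /* scalar store to dst + 8 bytes */
    }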
2008 Nov 17
0
[LLVMdev] Patterns with Multiple Stores
On Monday 17 November 2008 14:28, David Greene wrote: > I want to write a pattern that looks something like this: > > def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst), > (MOVSDmr ADD64ri8(addr:$dst, imm:8), ( SHUFPDrri (VR128:$src, > (MOVSDmr addr:$dst, FR64:$src))), imm:3) > > So I want to convert an unaligned vector store to a scalar store, a shuffle > and a scalar store. I got a little further with this: def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst), (MOVSDmr...
2005 Jul 27
3
[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)
...[4xfloat]. The instruction: add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz Explanation: '.a' is a writemask; only the specified component will be updated. '.xxyy' and '.zzzz' are swizzle masks, specifying the component permutation, similar to the Intel SSE permutation instruction SHUFPD. '_bias' and '_x2' are modifiers; they modify the value of source operands and send the modified values to the adder. '_bias' = source - 0.5, '_x2' = source * 2. '_sat' is an instruction modifier; when specified, it saturates (or clamps) the instruction resul...
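
To make those semantics concrete, here is a hedged C sketch of what add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz would compute, following the post's own explanation. The float4 type, the function name, and the assumption that '.a' aliases the w (alpha) component are illustrative, not taken from the thread.

    #include <math.h>

    typedef struct { float x, y, z, w; } float4;   /* hypothetical register type */

    /* Illustrative semantics of: add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz */
    static void add_sat_example(float4 *r0, float4 r1, float4 r3)
    {
        /* '_bias' modifier (source - 0.5) applied to r1, then the '.xxyy' swizzle */
        float4 a = { r1.x - 0.5f, r1.x - 0.5f, r1.y - 0.5f, r1.y - 0.5f };
        /* '_x2' modifier (source * 2) applied to r3, then the '.zzzz' swizzle */
        float4 b = { r3.z * 2.0f, r3.z * 2.0f, r3.z * 2.0f, r3.z * 2.0f };
        /* '_sat' clamps the result to [0, 1]; the '.a' writemask updates only the w lane */
        r0->w = fminf(fmaxf(a.w + b.w, 0.0f), 1.0f);
    }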
2008 Nov 18
1
[LLVMdev] Patterns with Multiple Stores
...008, at 3:50 PM, David Greene wrote: > On Monday 17 November 2008 14:28, David Greene wrote: >> I want to write a pattern that looks something like this: >> >> def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst), >> (MOVSDmr ADD64ri8(addr:$dst, imm:8), ( SHUFPDrri >> (VR128:$src, >> (MOVSDmr addr:$dst, FR64:$src))), imm:3) >> >> So I want to convert an unaligned vector store to a scalar store, a >> shuffle >> and a scalar store. > > I got a little further with this: > > def : Pat<(unalignedst...
2005 Jul 29
0
[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)
..._sat r0.a, r1_bias.xxyy, r3_x2.zzzz > > Explanation: > > '.a' is a writemask; only the specified component will be updated > > '.xxyy' and '.zzzz' are swizzle masks, specifying the component > permutation, similar to the Intel SSE permutation instruction SHUFPD > > '_bias' and '_x2' are modifiers; they modify the value of source > operands and send the modified values to the adder. '_bias' = source - > 0.5, '_x2' = source * 2 > > '_sat' is an instruction modifier; when specified, it saturates (or...
2008 Oct 20
2
[LLVMdev] TableGen Hacking Help
...I've hacked tblgen to handle patterns like this: let AddedComplexity = 40 in { def : Pat<(vector_shuffle (v2f64 (scalar_to_vector (loadf64 addr:$src1))), (v2f64 (scalar_to_vector (loadf64 addr:$src2))), SHUFP_shuffle_mask:$sm), (SHUFPDrri (MOVSD2PDrm addr:$src1), (MOVSD2PDrm addr:$src2), SHUFP_shuffle_mask:$sm)>, Requires<[HasSSE2]>; } // AddedComplexity I believe the problem with the tblgen in trunk is that it doesn't know how to support patterns with two memory operan...
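
For readers less familiar with the pattern syntax, this is roughly what the pattern matches, sketched with SSE2 intrinsics: two scalar f64 loads combined into one SHUFPD. The helper name is invented, and the shuffle mask, which the pattern keeps as an operand ($sm), is hard-coded here for illustration.

    #include <emmintrin.h>

    /* Illustrative intrinsics view of the pattern above: two scalar f64
     * loads (scalar_to_vector of loadf64) combined by a single SHUFPDrri. */
    static __m128d shuffle_two_loads(const double *src1, const double *src2)
    {
        __m128d a = _mm_load_sd(src1);    /* scalar_to_vector (loadf64 addr:$src1) */
        __m128d b = _mm_load_sd(src2);    /* scalar_to_vector (loadf64 addr:$src2) */
        return _mm_shuffle_pd(a, b, 0);   /* SHUFPDrri with mask 0: { a[0], b[0] } */
    }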
2008 Oct 07
2
[LLVMdev] Making Sense of ISel DAG Output
...e following pattern: let AddedComplexity = 40 in { def : Pat<(v2f64 (vector_shuffle (v2f64 (scalar_to_vector (loadf64 addr: $src1))), (v2f64 (scalar_to_vector (loadf64 addr: $src2))), SHUFP_shuffle_mask:$sm)), (SHUFPDrri (v2f64 (MOVSD2PDrm addr:$src1)), (v2f64 (MOVSD2PDrm addr:$src2)), SHUFP_shuffle_mask:$sm)>, Requires<[HasSSE2]>; } // AddedComplexity After much hacking of tblgen, I finally convinced it to generate some somewhat-seemingly-reasonably-corre...
2008 Oct 07
0
[LLVMdev] Making Sense of ISel DAG Output
...ty = 40 in { > def : Pat<(v2f64 (vector_shuffle (v2f64 (scalar_to_vector (loadf64 > addr: > $src1))), > (v2f64 (scalar_to_vector (loadf64 > addr: > $src2))), > SHUFP_shuffle_mask:$sm)), > (SHUFPDrri (v2f64 (MOVSD2PDrm addr:$src1)), > (v2f64 (MOVSD2PDrm addr:$src2)), > SHUFP_shuffle_mask:$sm)>, Requires<[HasSSE2]>; > } // AddedComplexity > > After much hacking of tblgen, I finally convinced it to generate some > somewhat-...
2008 Oct 03
0
[LLVMdev] Making Sense of ISel DAG Output
On Fri, October 3, 2008 9:10 am, David Greene wrote: > On Thursday 02 October 2008 19:32, Dan Gohman wrote: > >> Looking at your dump() output above, it looks like the pre-selection >> loads have multiple uses, so even though you've managed to match a >> larger pattern that incorporates them, they still need to exist to >> satisfy some other users. > > Yes,
2008 Oct 03
3
[LLVMdev] Making Sense of ISel DAG Output
On Thursday 02 October 2008 19:32, Dan Gohman wrote: > Looking at your dump() output above, it looks like the pre-selection > loads have multiple uses, so even though you've managed to match a > larger pattern that incorporates them, they still need to exist to > satisfy some other users. Yes, I looked at that too. It looks like these other uses end up being chains to
2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
...tr [esp+4B0h],xmm3 002B0115 movaps xmmword ptr [esp+4A0h],xmm4 002B011D movaps xmm0,xmmword ptr [esp+4C0h] 002B0125 movaps xmm1,xmmword ptr [esp+4B0h] 002B012D movaps xmm2,xmmword ptr [esp+4A0h] 002B0135 movaps xmm3,xmm1 002B0138 movaps xmm4,xmm1 002B013B shufpd xmm4,xmm4,0 002B0140 movaps xmm5,xmmword ptr [esp+4D0h] 002B0148 subpd xmm5,xmm4 002B014C xorps xmm6,xmm6 002B014F mulpd xmm4,xmm6 002B0153 xorps xmm7,xmm7 002B0156 movaps xmmword ptr [esp+490h],xmm0 002B015E movaps xmm0,xmmword ptr [esp+4F0...
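
For context, the expected behavior under discussion can be shown with a tiny C example: compiled with SSE2 enabled, _mm_sqrt_pd should lower to a single sqrtpd instruction rather than a library call. This is an illustrative sketch, not code from the thread.

    #include <emmintrin.h>

    /* With SSE2 available this should compile to a single sqrtpd; the
     * thread above concerns a case where the intrinsic became a call
     * that clobbered ECX instead. */
    __m128d sqrt_v2f64(__m128d x)
    {
        return _mm_sqrt_pd(x);   /* llvm.x86.sse2.sqrt.pd */
    }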
2016 Apr 01
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
...# xmm0 = xmm0[0],mem[0],xmm0[1],mem[1] movaps %xmm0, (%rsp) # 16-byte Spill movaps 16(%rsp), %xmm0 # 16-byte Reload callq sinf movaps %xmm0, 32(%rsp) # 16-byte Spill movapd 16(%rsp), %xmm0 # 16-byte Reload shufpd $1, %xmm0, %xmm0 # xmm0 = xmm0[1,0] callq sinf movaps 32(%rsp), %xmm1 # 16-byte Reload unpcklps %xmm0, %xmm1 # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1] unpcklps (%rsp), %xmm1 # 16-byte Folded Reload...
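
The assembly above shows the scalar fallback: each sinf call is fed by shufpd/unpcklps lane shuffles. The kind of source loop the RFC targets looks roughly like the C below; with SVML support the vectorizer could replace the four scalar calls per vector with one call to a vector variant (e.g. __svml_sinf4, per the RFC). The function name and restrict qualifiers are illustrative.

    #include <math.h>

    /* A loop of the shape the RFC targets: four sinf calls per 128-bit
     * vector today, one SVML vector call after vectorization. */
    void sin_loop(float * restrict dst, const float * restrict src, int n)
    {
        for (int i = 0; i < n; ++i)
            dst[i] = sinf(src[i]);
    }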
2016 Apr 04
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
...# xmm0 = xmm0[0],mem[0],xmm0[1],mem[1] movaps %xmm0, (%rsp) # 16-byte Spill movaps 16(%rsp), %xmm0 # 16-byte Reload callq sinf movaps %xmm0, 32(%rsp) # 16-byte Spill movapd 16(%rsp), %xmm0 # 16-byte Reload shufpd $1, %xmm0, %xmm0 # xmm0 = xmm0[1,0] callq sinf movaps 32(%rsp), %xmm1 # 16-byte Reload unpcklps %xmm0, %xmm1 # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1] unpcklps (%rsp), %xmm1 # 16-byte Folded Reload...
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I end up with SSE instructions (including sqrtpd) if I don't disable it. On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote: > Is there something specifically required to enable SSE? If it's not > detected as available (based on the target triple?) then I don't think