Displaying 14 results from an estimated 14 matches for "shufpd".
2008 Nov 17
2
[LLVMdev] Patterns with Multiple Stores
I want to write a pattern that looks something like this:
def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst),
(MOVSDmr ADD64ri8(addr:$dst, imm:8), ( SHUFPDrri (VR128:$src,
(MOVSDmr addr:$dst, FR64:$src))), imm:3)
So I want to convert an unaligned vector store to a scalar store, a shuffle
and a scalar store.
There are several questions I have:
- Is the imm:3 syntax correct? Basically I want to hard-code the shuffle mask
- The first MOVS...
2008 Nov 17
0
[LLVMdev] Patterns with Multiple Stores
On Monday 17 November 2008 14:28, David Greene wrote:
> I want to write a pattern that looks something like this:
>
> def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst),
> (MOVSDmr ADD64ri8(addr:$dst, imm:8), ( SHUFPDrri (VR128:$src,
> (MOVSDmr addr:$dst, FR64:$src))), imm:3)
>
> So I want to convert an unaligned vector store to a scalar store, a shuffle
> and a scalar store.
I got a little further with this:
def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst),
(MOVSDmr...
2005 Jul 27
3
[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)
...[4xfloat].
The instruction:
add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz
Explanation:
'.a' is a writemask. Only the specified component will be updated.
'.xxyy' and '.zzzz' are swizzle masks, specifying the component
permutation, similar to the Intel SSE permutation instruction SHUFPD.
'_bias' and '_x2' are modifiers. They modify the value of source
operands and send the modified values to the adder. '_bias' = source -
0.5, '_x2' = source * 2
'_sat' is an instruction modifier. When specified, it saturates (or
clamps) the instruction resul...
2008 Nov 18
1
[LLVMdev] Patterns with Multiple Stores
...008, at 3:50 PM, David Greene wrote:
> On Monday 17 November 2008 14:28, David Greene wrote:
>> I want to write a pattern that looks something like this:
>>
>> def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst),
>> (MOVSDmr ADD64ri8(addr:$dst, imm:8), ( SHUFPDrri
>> (VR128:$src,
>> (MOVSDmr addr:$dst, FR64:$src))), imm:3)
>>
>> So I want to convert an unaligned vector store to a scalar store, a
>> shuffle
>> and a scalar store.
>
> I got a little further with this:
>
> def : Pat<(unalignedst...
2005 Jul 29
0
[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)
..._sat r0.a, r1_bias.xxyy, r3_x2.zzzz
>
> Explanation:
>
> '.a' is a writemask. Only the specified component will be updated.
>
> '.xxyy' and '.zzzz' are swizzle masks, specifying the component
> permutation, similar to the Intel SSE permutation instruction SHUFPD
>
> '_bias' and '_x2' are modifiers. They modify the value of source
> operands and send the modified values to the adder. '_bias' = source -
> 0.5, '_x2' = source * 2
>
> '_sat' is an instruction modifier. When specified, it saturates (or...
2008 Oct 20
2
[LLVMdev] TableGen Hacking Help
...I've hacked tblgen to handle patterns like this:
let AddedComplexity = 40 in {
def : Pat<(vector_shuffle (v2f64 (scalar_to_vector (loadf64 addr:$src1))),
(v2f64 (scalar_to_vector (loadf64 addr:$src2))),
SHUFP_shuffle_mask:$sm),
(SHUFPDrri (MOVSD2PDrm addr:$src1),
(MOVSD2PDrm addr:$src2),
SHUFP_shuffle_mask:$sm)>, Requires<[HasSSE2]>;
} // AddedComplexity
I believe the problem with the tblgen in trunk is that it doesn't know how to
support patterns with two memory operan...
2008 Oct 07
2
[LLVMdev] Making Sense of ISel DAG Output
...e following pattern:
let AddedComplexity = 40 in {
def : Pat<(v2f64 (vector_shuffle (v2f64 (scalar_to_vector (loadf64 addr:
$src1))),
(v2f64 (scalar_to_vector (loadf64 addr:
$src2))),
SHUFP_shuffle_mask:$sm)),
(SHUFPDrri (v2f64 (MOVSD2PDrm addr:$src1)),
(v2f64 (MOVSD2PDrm addr:$src2)),
SHUFP_shuffle_mask:$sm)>, Requires<[HasSSE2]>;
} // AddedComplexity
After much hacking of tblgen, I finally convinced it to generate some
somewhat-seemingly-reasonably-corre...
2008 Oct 07
0
[LLVMdev] Making Sense of ISel DAG Output
...ty = 40 in {
> def : Pat<(v2f64 (vector_shuffle (v2f64 (scalar_to_vector (loadf64
> addr:
> $src1))),
> (v2f64 (scalar_to_vector (loadf64
> addr:
> $src2))),
> SHUFP_shuffle_mask:$sm)),
> (SHUFPDrri (v2f64 (MOVSD2PDrm addr:$src1)),
> (v2f64 (MOVSD2PDrm addr:$src2)),
> SHUFP_shuffle_mask:$sm)>, Requires<[HasSSE2]>;
> } // AddedComplexity
>
> After much hacking of tblgen, I finally convinced it to generate some
> somewhat-...
2008 Oct 03
0
[LLVMdev] Making Sense of ISel DAG Output
On Fri, October 3, 2008 9:10 am, David Greene wrote:
> On Thursday 02 October 2008 19:32, Dan Gohman wrote:
>
>> Looking at your dump() output above, it looks like the pre-selection
>> loads have multiple uses, so even though you've managed to match a
>> larger pattern that incorporates them, they still need to exist to
>> satisfy some other users.
>
> Yes,
2008 Oct 03
3
[LLVMdev] Making Sense of ISel DAG Output
On Thursday 02 October 2008 19:32, Dan Gohman wrote:
> Looking at your dump() output above, it looks like the pre-selection
> loads have multiple uses, so even though you've managed to match a
> larger pattern that incorporates them, they still need to exist to
> satisfy some other users.
Yes, I looked at that too. It looks like these other uses end up being
chains to
2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
...tr [esp+4B0h],xmm3
002B0115 movaps xmmword ptr [esp+4A0h],xmm4
002B011D movaps xmm0,xmmword ptr [esp+4C0h]
002B0125 movaps xmm1,xmmword ptr [esp+4B0h]
002B012D movaps xmm2,xmmword ptr [esp+4A0h]
002B0135 movaps xmm3,xmm1
002B0138 movaps xmm4,xmm1
002B013B shufpd xmm4,xmm4,0
002B0140 movaps xmm5,xmmword ptr [esp+4D0h]
002B0148 subpd xmm5,xmm4
002B014C xorps xmm6,xmm6
002B014F mulpd xmm4,xmm6
002B0153 xorps xmm7,xmm7
002B0156 movaps xmmword ptr [esp+490h],xmm0
002B015E movaps xmm0,xmmword ptr [esp+4F0...
2016 Apr 01
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
...# xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
movaps %xmm0, (%rsp) # 16-byte Spill
movaps 16(%rsp), %xmm0 # 16-byte Reload
callq sinf
movaps %xmm0, 32(%rsp) # 16-byte Spill
movapd 16(%rsp), %xmm0 # 16-byte Reload
shufpd $1, %xmm0, %xmm0 # xmm0 = xmm0[1,0]
callq sinf
movaps 32(%rsp), %xmm1 # 16-byte Reload
unpcklps %xmm0, %xmm1 # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
unpcklps (%rsp), %xmm1 # 16-byte Folded Reload...
2016 Apr 04
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
...# xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
movaps %xmm0, (%rsp) # 16-byte Spill
movaps 16(%rsp), %xmm0 # 16-byte Reload
callq sinf
movaps %xmm0, 32(%rsp) # 16-byte Spill
movapd 16(%rsp), %xmm0 # 16-byte Reload
shufpd $1, %xmm0, %xmm0 # xmm0 = xmm0[1,0]
callq sinf
movaps 32(%rsp), %xmm1 # 16-byte Reload
unpcklps %xmm0, %xmm1 # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
unpcklps (%rsp), %xmm1 # 16-byte Folded Reload...
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE and I
end up with SSE instructions(including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based from the target triple?) then I don't think