Displaying 18 results from an estimated 18 matches for "unpcklps".
2009 Nov 20
1
[LLVMdev] Spilling & UNPCKLPS Question
I'm working on adding some more annotations to asm and I
came across this odd construct generated for X86/split-vector-rem.ll:
movss %xmm0, 32(%rsp) # Scalar Spill
[...]
unpcklps 48(%rsp), %xmm0 # Vector Folded Reload
[...]
movaps %xmm0, 16(%rsp) # Vector Spill
[...]
unpcklps 32(%rsp), %xmm0 # Vector Folded Reload
How is this possibly legal? First we spill %xmm0 (a 32-bit value)
to a st...
2015 Jul 29
2
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
...# %loop2
# =>This Inner Loop Header: Depth=1
movq offset_array3(,%rsi,8), %rdi
movq offset_array2(,%rsi,8), %r10
movss -28(%rax), %xmm0
movss -8(%rax), %xmm1
movss -4(%rax), %xmm2
unpcklps %xmm0, %xmm2 # xmm2 =
xmm2[0],xmm0[0],xmm2[1],xmm0[1]
movss (%rax), %xmm0
unpcklps %xmm0, %xmm1 # xmm1 =
xmm1[0],xmm0[0],xmm1[1],xmm0[1]
unpcklps %xmm2, %xmm1 # xmm1 =
xmm1[0],xmm2[0],xmm1[1],xmm2[1]
addps (%r9), %xmm1...
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...m1,%xmm0
4005ef: movq %xmm0,%rax
4005f4: movslq %eax,%rcx
4005f7: sar $0x20,%rax
4005fb: punpckhqdq %xmm0,%xmm0
4005ff: movq %xmm0,%rdx
400604: movslq %edx,%rsi
400607: sar $0x20,%rdx
40060b: movss 0x400740(,%rax,4),%xmm0
400614: movss 0x400740(,%rdx,4),%xmm1
40061d: unpcklps %xmm1,%xmm0
400620: movss 0x400740(,%rcx,4),%xmm1
400629: movss 0x400740(,%rsi,4),%xmm2
400632: unpcklps %xmm2,%xmm1
400635: unpcklps %xmm0,%xmm1
400638: mulps 0xf1(%rip),%xmm1 # 400730 <.LCPI0_3>
40063f: movaps %xmm1,0x1a1a(%rip) # 402060 <r>
400646: xor...
2015 Jul 29
0
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
...# =>This Inner Loop Header: Depth=1
> movq offset_array3(,%rsi,8), %rdi
> movq offset_array2(,%rsi,8), %r10
> movss -28(%rax), %xmm0
> movss -8(%rax), %xmm1
> movss -4(%rax), %xmm2
> unpcklps %xmm0, %xmm2 # xmm2 =
> xmm2[0],xmm0[0],xmm2[1],xmm0[1]
> movss (%rax), %xmm0
> unpcklps %xmm0, %xmm1 # xmm1 =
> xmm1[0],xmm0[0],xmm1[1],xmm0[1]
> unpcklps %xmm2, %xmm1 # xmm1 =
> xmm1[0],xmm2[0],xmm1[1],xmm2[1]
>...
2008 Mar 29
5
[LLVMdev] stack alignment (again)
...curious about the state of stack alignment on x86. I noticed
there are a few bugs outstanding on the issue. I recently added some
code which had the effect of throwing an extra function parameter on our
stack at runtime, a 4 byte pointer.
Esp is now not 16-byte aligned, so instructions like unpcklps xmm1,
dword ptr [esp] cause grief. My AllocaInstr instructions are told to be
16 byte aligned, so the addition of a 4-byte parameter shouldn't have
changed alignment on the objects.
The unpcklps instruction is coming from an ExtractElementInst or
InsertElementInst. I can always hard code...
2008 Mar 30
0
[LLVMdev] stack alignment (again)
...f stack alignment on x86. I noticed
> there are a few bugs outstanding on the issue. I recently added
> some code which had the effect of throwing an extra function
> parameter on our stack at runtime, a 4 byte pointer.
>
> Esp is now not 16-byte aligned, so instructions like unpcklps xmm1,
> dword ptr [esp] cause grief. My AllocaInstr instructions are told
> to be 16 byte aligned, so the addition of a 4-byte parameter
> shouldn’t have changed alignment on the objects.
Hi Chuck,
I think the basic problem is that the stack pointer on windows/linux
is not guara...
2007 Sep 28
2
[LLVMdev] Vector troubles
...ret void
}
Assembler (intel format):
15c00010 83ec2c sub esp,2Ch
15c00013 8b442434 mov eax,dword ptr [esp+34h]
15c00017 f30f10400c movss xmm0,dword ptr [eax+0Ch]
15c0001c f30f104804 movss xmm1,dword ptr [eax+4]
15c00021 0f14c8 unpcklps xmm1,xmm0
15c00024 f30f104008 movss xmm0,dword ptr [eax+8]
15c00029 f30f1010 movss xmm2,dword ptr [eax]
15c0002d 0f14d0 unpcklps xmm2,xmm0
15c00030 0f14d1 unpcklps xmm2,xmm1
15c00033 0f291424 movaps xmmword ptr [esp],xmm2
ss:0023:0012f238=0012f2580122e...
2011 Feb 26
0
[LLVMdev] X86 LowerVECTOR_SHUFFLE Question
...e(VT), dl, VT, V1, V2, DAG);
>
> why would this not be:
>
> if (X86::isUNPCKLMask(SVOp))
> return SVOp;
Ok, I discovered that Bruno did this in revisions 112934, 112942 and
113020 but the logs don't really make clear why. I did figure out that
I needed new SDNode defs for VUNPCKLPSY and VUNPCKLPDY and corresponding
patterns. Once I added them everything started working.
I found this all very confusing because it appears there are now two
ways to match certain shuffle instructions in .td files: one through the
traditional shuffle operators like unpckl and shufp and another t...
2011 Feb 25
2
[LLVMdev] X86 LowerVECTOR_SHUFFLE Question
...this code:
if (X86::isUNPCKLMask(SVOp))
getTargetShuffleNode(getUNPCKLOpcode(VT), dl, VT, V1, V2, DAG);
why would this not be:
if (X86::isUNPCKLMask(SVOp))
return SVOp;
I'm trying to add support for VUNPCKL and am getting into trouble
because the existing code ends up creating:
VUNPCKLPS
load
load
which is badness come selection time. Legalize doesn't get a chance to
look below the target shuffle node to see that there are two memory
operands.
Back in the 2.7 days, we used to just return the shuffle as is if it was
already legal. Why the change to create a target node?...
2016 Apr 01
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
...byte Spill
shufps $231, %xmm0, %xmm0 # xmm0 = xmm0[3,1,2,3]
callq sinf
movaps %xmm0, (%rsp) # 16-byte Spill
movaps 16(%rsp), %xmm0 # 16-byte Reload
shufps $229, %xmm0, %xmm0 # xmm0 = xmm0[1,1,2,3]
callq sinf
unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
# xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
movaps %xmm0, (%rsp) # 16-byte Spill
movaps 16(%rsp), %xmm0 # 16-byte Reload
callq sinf
movaps %xmm0, 32(%rsp)...
2016 Apr 04
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
...byte Spill
shufps $231, %xmm0, %xmm0 # xmm0 = xmm0[3,1,2,3]
callq sinf
movaps %xmm0, (%rsp) # 16-byte Spill
movaps 16(%rsp), %xmm0 # 16-byte Reload
shufps $229, %xmm0, %xmm0 # xmm0 = xmm0[1,1,2,3]
callq sinf
unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
# xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
movaps %xmm0, (%rsp) # 16-byte Spill
movaps 16(%rsp), %xmm0 # 16-byte Reload
callq sinf
movaps %xmm0, 32(%rsp)...
2012 Feb 08
2
[LLVMdev] SelectionDAG scalarizes vector operations.
.../VectorOpLegalizer scalarize the code. Next, we allow the dag-combiner to convert the BUILD_VECTOR node into a shuffle. This is possible because all of the inputs of the build vector come from two values (src and (undef or zero)). Finally, the shuffle lowering code lowers the new shuffle node into UNPCKLPS. This sequence should be optimal for all of the sane types.
Once we implement ZEXT and ANYEXT we could issue an INREG_SEXT instruction to support SEXT. Unfortunately, v2i64 SRA is not supported by the hardware and the code will be scalarized ...
Currently we promote vector elements to the wide...
2012 Feb 08
0
[LLVMdev] SelectionDAG scalarizes vector operations.
...ctorOpLegalizer scalarize the code.
Next, we allow the dag-combiner to convert the BUILD_VECTOR node into a shuffle.
This is possible because all of the inputs of the build vector come from two
values (src and (undef or zero)). Finally, the shuffle lowering code lowers the
new shuffle node into UNPCKLPS. This sequence should be optimal for all of the
sane types.
> Once we implement ZEXT and ANYEXT we could issue an INREG_SEXT instruction to support SEXT. Unfortunately, v2i64 SRA is not supported by the hardware and the code will be scalarized ...
>
> Currently we promote vector elements...
2011 Feb 26
2
[LLVMdev] X86 LowerVECTOR_SHUFFLE Question
...t;> why would this not be:
>>
>> if (X86::isUNPCKLMask(SVOp))
>> return SVOp;
>
> Ok, I discovered that Bruno did this in revisions 112934, 112942 and
> 113020 but the logs don't really make clear why. I did figure out that
> I needed new SDNode defs for VUNPCKLPSY and VUNPCKLPDY and corresponding
> patterns. Once I added them everything started working.
>
> I found this all very confusing because it appears there are now two
> ways to match certain shuffle instructions in .td files: one through the
> traditional shuffle operators like unpck...
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Awesome, thanks for all the information!
>
> See below:
>
> On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com>
> wrote:
>>
>> You have already mentioned how the new shuffle lowering is missing
>> some features; for example, you explicitly
2006 Apr 13
3
[LLVMdev] Re: Creating Release 1.7 Branch at 1:00pm PDT
Here's what's left on Linux (GCC 4.1.0), after all updates that went
into the branch:
Running /proj/llvm/build/../llvm/test/Regression/CFrontend/dg.exp ...
FAIL: /proj/llvm/build/../llvm/test/Regression/CFrontend/2004-02-12-
LargeAggregateCopy.c.tr:
gccas: /proj/llvm/build/../llvm/lib/VMCore/Function.cpp:266: unsigned
int llvm::Function::getIntrinsicID() const: Assertion `0 &&
2012 Feb 08
2
[LLVMdev] SelectionDAG scalarizes vector operations.
...VectorOpLegalizer scalarize the code.
Next, we allow the dag-combiner to convert the BUILD_VECTOR node into a shuffle.
This is possible because all of the inputs of the build vector come from two values (src and (undef or zero)). Finally, the shuffle lowering code lowers the new shuffle node into UNPCKLPS. This sequence should be optimal for all of the sane types.
> Once we implement ZEXT and ANYEXT we could issue an INREG_SEXT instruction to support SEXT. Unfortunately, v2i64 SRA is not supported by the hardware and the code will be scalarized ...
>
> Currently we promote vector elements t...
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...; CHECK-NEXT: movapd %xmm0, (%eax)
>> ; CHECK-NEXT: ret
>> }
>> @@ -48,7 +48,7 @@ define void @test3(<4 x float>* %res, <4
>> store <4 x float> %tmp13, <4 x float>* %res
>> ret void
>> ; CHECK: @test3
>> -; CHECK: unpcklps
>> +; CHECK: unpcklps
>> }
>>
>> define void @test4(<4 x float> %X, <4 x float>* %res) nounwind {
>> @@ -85,9 +85,9 @@ define void @test6(<4 x float>* %res, <4
>> %tmp2 = shufflevector <4 x float> %tmp1, <4 x float>...