Displaying 18 results from an estimated 18 matches for "unpcklps".
2009 Nov 20
1
[LLVMdev] Spilling & UNPCKLPS Question
I'm working on adding some more annotations to asm and I
came across this odd construct generated for X86/split-vector-rem.ll:
movss %xmm0, 32(%rsp) # Scalar Spill
[...]
unpcklps 48(%rsp), %xmm0 # Vector Folded Reload
[...]
movaps %xmm0, 16(%rsp) # Vector Spill
[...]
unpcklps 32(%rsp), %xmm0 # Vector Folded Reload
How is this possibly legal? First we spill %xmm0 (a 32-bit value)
to a st...
2015 Jul 29
2
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
...# %loop2
# =>This Inner Loop Header: Depth=1
movq offset_array3(,%rsi,8), %rdi
movq offset_array2(,%rsi,8), %r10
movss -28(%rax), %xmm0
movss -8(%rax), %xmm1
movss -4(%rax), %xmm2
unpcklps %xmm0, %xmm2 # xmm2 =
xmm2[0],xmm0[0],xmm2[1],xmm0[1]
movss (%rax), %xmm0
unpcklps %xmm0, %xmm1 # xmm1 =
xmm1[0],xmm0[0],xmm1[1],xmm0[1]
unpcklps %xmm2, %xmm1 # xmm1 =
xmm1[0],xmm2[0],xmm1[1],xmm2[1]
addps (%r9), %xmm1...
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...m1,%xmm0
4005ef: movq %xmm0,%rax
4005f4: movslq %eax,%rcx
4005f7: sar $0x20,%rax
4005fb: punpckhqdq %xmm0,%xmm0
4005ff: movq %xmm0,%rdx
400604: movslq %edx,%rsi
400607: sar $0x20,%rdx
40060b: movss 0x400740(,%rax,4),%xmm0
400614: movss 0x400740(,%rdx,4),%xmm1
40061d: unpcklps %xmm1,%xmm0
400620: movss 0x400740(,%rcx,4),%xmm1
400629: movss 0x400740(,%rsi,4),%xmm2
400632: unpcklps %xmm2,%xmm1
400635: unpcklps %xmm0,%xmm1
400638: mulps 0xf1(%rip),%xmm1 # 400730 <.LCPI0_3>
40063f: movaps %xmm1,0x1a1a(%rip) # 402060 <r>
400646: xor...
2015 Jul 29
0
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
...# =>This Inner Loop Header: Depth=1
> movq offset_array3(,%rsi,8), %rdi
> movq offset_array2(,%rsi,8), %r10
> movss -28(%rax), %xmm0
> movss -8(%rax), %xmm1
> movss -4(%rax), %xmm2
> unpcklps %xmm0, %xmm2 # xmm2 =
> xmm2[0],xmm0[0],xmm2[1],xmm0[1]
> movss (%rax), %xmm0
> unpcklps %xmm0, %xmm1 # xmm1 =
> xmm1[0],xmm0[0],xmm1[1],xmm0[1]
> unpcklps %xmm2, %xmm1 # xmm1 =
> xmm1[0],xmm2[0],xmm1[1],xmm2[1]
>...
2008 Mar 29
5
[LLVMdev] stack alignment (again)
...curious about the state of stack alignment on x86. I noticed
there are a few bugs outstanding on the issue. I recently added some
code which had the effect of throwing an extra function parameter on our
stack at runtime, a 4 byte pointer.
Esp is now not 16-byte aligned, so instructions like unpcklps xmm1,
dword ptr [esp] cause grief. My AllocaInstr instructions are told to be
16 byte aligned, so the addition of a 4-byte parameter shouldn't have
changed alignment on the objects.
The unpcklps instruction is coming from an ExtractElementInst or
InsertElementInst. I can always hard code...
2008 Mar 30
0
[LLVMdev] stack alignment (again)
...f stack alignment on x86. I noticed
> there are a few bugs outstanding on the issue. I recently added
> some code which had the effect of throwing an extra function
> parameter on our stack at runtime, a 4 byte pointer.
>
> Esp is now not 16-byte aligned, so instructions like unpcklps xmm1,
> dword ptr [esp] cause grief. My AllocaInstr instructions are told
> to be 16 byte aligned, so the addition of a 4-byte parameter
> shouldn’t have changed alignment on the objects.
Hi Chuck,
I think the basic problem is that the stack pointer on windows/linux
is not guara...
2007 Sep 28
2
[LLVMdev] Vector troubles
...ret void
}
Assembler (intel format):
15c00010 83ec2c sub esp,2Ch
15c00013 8b442434 mov eax,dword ptr [esp+34h]
15c00017 f30f10400c movss xmm0,dword ptr [eax+0Ch]
15c0001c f30f104804 movss xmm1,dword ptr [eax+4]
15c00021 0f14c8 unpcklps xmm1,xmm0
15c00024 f30f104008 movss xmm0,dword ptr [eax+8]
15c00029 f30f1010 movss xmm2,dword ptr [eax]
15c0002d 0f14d0 unpcklps xmm2,xmm0
15c00030 0f14d1 unpcklps xmm2,xmm1
15c00033 0f291424 movaps xmmword ptr [esp],xmm2
ss:0023:0012f238=0012f2580122e...
2011 Feb 26
0
[LLVMdev] X86 LowerVECTOR_SHUFFLE Question
...e(VT), dl, VT, V1, V2, DAG);
>
> why would this not be:
>
> if (X86::isUNPCKLMask(SVOp))
> return SVOp;
Ok, I discovered that Bruno did this in revisions 112934, 112942 and
113020 but the logs don't really make clear why. I did figure out that
I needed new SDNode defs for VUNPCKLPSY and VUNPCKLPDY and corresponding
patterns. Once I added them everything started working.
I found this all very confusing because it appears there are now two
ways to match certain shuffle instructions in .td files: one through the
traditional shuffle operators like unpckl and shufp and another t...
2011 Feb 25
2
[LLVMdev] X86 LowerVECTOR_SHUFFLE Question
...this code:
if (X86::isUNPCKLMask(SVOp))
getTargetShuffleNode(getUNPCKLOpcode(VT), dl, VT, V1, V2, DAG);
why would this not be:
if (X86::isUNPCKLMask(SVOp))
return SVOp;
I'm trying to add support for VUNPCKL and am getting into trouble
because the existing code ends up creating:
VUNPCKLPS
load
load
which is badness come selection time. Legalize doesn't get a chance to
look below the target shuffle node to see that there are two memory
operands.
Back in the 2.7 days, we used to just return the shuffle as is if it was
already legal. Why the change to create a target node?...
2016 Apr 01
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
...byte Spill
shufps $231, %xmm0, %xmm0 # xmm0 = xmm0[3,1,2,3]
callq sinf
movaps %xmm0, (%rsp) # 16-byte Spill
movaps 16(%rsp), %xmm0 # 16-byte Reload
shufps $229, %xmm0, %xmm0 # xmm0 = xmm0[1,1,2,3]
callq sinf
unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
# xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
movaps %xmm0, (%rsp) # 16-byte Spill
movaps 16(%rsp), %xmm0 # 16-byte Reload
callq sinf
movaps %xmm0, 32(%rsp)...
2016 Apr 04
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
...byte Spill
shufps $231, %xmm0, %xmm0 # xmm0 = xmm0[3,1,2,3]
callq sinf
movaps %xmm0, (%rsp) # 16-byte Spill
movaps 16(%rsp), %xmm0 # 16-byte Reload
shufps $229, %xmm0, %xmm0 # xmm0 = xmm0[1,1,2,3]
callq sinf
unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
# xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
movaps %xmm0, (%rsp) # 16-byte Spill
movaps 16(%rsp), %xmm0 # 16-byte Reload
callq sinf
movaps %xmm0, 32(%rsp)...
2012 Feb 08
2
[LLVMdev] SelectionDAG scalarizes vector operations.
.../VectorOpLegalizer scalarize the code. Next, we allow the dag-combiner to convert the BUILD_VECTOR node into a shuffle. This is possible because all of the inputs of the build vector come from two values (src and (undef or zero)). Finally, the shuffle lowering code lowers the new shuffle node into UNPCKLPS. This sequence should be optimal for all of the sane types.
Once we implement ZEXT and ANYEXT we could issue an INREG_SEXT instruction to support SEXT. Unfortunately, v2i64 SRA is not supported by the hardware and the code will be scalarized ...
Currently we promote vector elements to the wide...
2012 Feb 08
0
[LLVMdev] SelectionDAG scalarizes vector operations.
...ctorOpLegalizer scalarize the code.
Next, we allow the dag-combiner to convert the BUILD_VECTOR node into a shuffle.
This is possible because all of the inputs of the build vector come from two
values (src and (undef or zero)). Finally, the shuffle lowering code lowers the
new shuffle node into UNPCKLPS. This sequence should be optimal for all of the
sane types.
> Once we implement ZEXT and ANYEXT we could issue an INREG_SEXT instruction to support SEXT. Unfortunately, v2i64 SRA is not supported by the hardware and the code will be scalarized ...
>
> Currently we promote vector elements...
2011 Feb 26
2
[LLVMdev] X86 LowerVECTOR_SHUFFLE Question
...t;> why would this not be:
>>
>> if (X86::isUNPCKLMask(SVOp))
>> return SVOp;
>
> Ok, I discovered that Bruno did this in revisions 112934, 112942 and
> 113020 but the logs don't really make clear why. I did figure out that
> I needed new SDNode defs for VUNPCKLPSY and VUNPCKLPDY and corresponding
> patterns. Once I added them everything started working.
>
> I found this all very confusing because it appears there are now two
> ways to match certain shuffle instructions in .td files: one through the
> traditional shuffle operators like unpck...
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Awesome, thanks for all the information!
>
> See below:
>
> On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com>
> wrote:
>>
>> You have already mentioned how the new shuffle lowering is missing
>> some features; for example, you explicitly
2006 Apr 13
3
[LLVMdev] Re: Creating Release 1.7 Branch at 1:00pm PDT
Here's what's left on Linux (GCC 4.1.0), after all updates that went
into the branch:
Running /proj/llvm/build/../llvm/test/Regression/CFrontend/dg.exp ...
FAIL: /proj/llvm/build/../llvm/test/Regression/CFrontend/2004-02-12-
LargeAggregateCopy.c.tr:
gccas: /proj/llvm/build/../llvm/lib/VMCore/Function.cpp:266: unsigned
int llvm::Function::getIntrinsicID() const: Assertion `0 &&
2012 Feb 08
2
[LLVMdev] SelectionDAG scalarizes vector operations.
...VectorOpLegalizer scalarize the code.
Next, we allow the dag-combiner to convert the BUILD_VECTOR node into a shuffle.
This is possible because all of the inputs of the build vector come from two values (src and (undef or zero)). Finally, the shuffle lowering code lowers the new shuffle node into UNPCKLPS. This sequence should be optimal for all of the sane types.
> Once we implement ZEXT and ANYEXT we could issue an INREG_SEXT instruction to support SEXT. Unfortunately, v2i64 SRA is not supported by the hardware and the code will be scalarized ...
>
> Currently we promote vector elements t...
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...; CHECK-NEXT: movapd %xmm0, (%eax)
>> ; CHECK-NEXT: ret
>> }
>> @@ -48,7 +48,7 @@ define void @test3(<4 x float>* %res, <4
>> store <4 x float> %tmp13, <4 x float>* %res
>> ret void
>> ; CHECK: @test3
>> -; CHECK: unpcklps
>> +; CHECK: unpcklps
>> }
>>
>> define void @test4(<4 x float> %X, <4 x float>* %res) nounwind {
>> @@ -85,9 +85,9 @@ define void @test6(<4 x float>* %res, <4
>> %tmp2 = shufflevector <4 x float> %tmp1, <4 x float>...