Displaying 17 results from an estimated 17 matches for "vinsertps".
2014 Sep 05
3
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
wrote:
> Unfortunately, another team, while doing internal testing, has seen the
> new path generating illegal insertps masks. A sample here:
>
> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
> xmm4[0,1],xmm1[2],x...
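The immediates above are the problem: insertps takes a single byte, so $256 and $416 cannot be encoded and only their low 8 bits survive. A minimal C sketch (not from the thread; field layout per the Intel manual for the register-source form) of how that byte is built:

#include <stdint.h>

/* insertps imm8, register-source form:
     bits 7:6  COUNT_S - which lane of the source to take
     bits 5:4  COUNT_D - which lane of the destination to write
     bits 3:0  ZMASK   - lanes to zero afterwards                        */
static uint8_t insertps_imm(unsigned count_s, unsigned count_d, unsigned zmask)
{
    return (uint8_t)(((count_s & 3) << 6) | ((count_d & 3) << 4) | (zmask & 0xf));
}

/* 256 & 0xff == 0x00 (source lane 0 -> dest lane 0) and 416 & 0xff == 0xa0
   (source lane 2 -> dest lane 2), which matches the disassembly comments,
   but the out-of-range values themselves are not encodable.              */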
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...> On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
>> wrote:
>>>
>>> Unfortunately, another team, while doing internal testing, has seen the
>>> new path generating illegal insertps masks. A sample here:
>>>
>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>...
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...>>
>> On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
>> wrote:
>>
>>
>> Unfortunately, another team, while doing internal testing, has seen the
>> new path generating illegal insertps masks. A sample here:
>>
>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>> xmm4[0...
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...o:rob.lougher at gmail.com>>
>>>> wrote:
>>>>>
>>>>> Unfortunately, another team, while doing internal testing, has seen the
>>>>> new path generating illegal insertps masks. A sample here:
>>>>>
>>>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>>>> vinsertps $416, %xmm1, %xm...
2014 Sep 04
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Greetings all,
As you may have noticed, there is a new vector shuffle lowering path in the
X86 backend. You can try it out with the
'-x86-experimental-vector-shuffle-lowering' flag to llc, or '-mllvm
-x86-experimental-vector-shuffle-lowering' to clang. Please test it out!
There may be some correctness bugs; I'm still fuzz testing it to shake them
out. But I expect fairly few
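For reference, a minimal way to try this, in the invocation style used elsewhere in these threads (the input file names are just illustrative):
llc -mcpu=corei7-avx -x86-experimental-vector-shuffle-lowering test.ll
clang -O2 -mllvm -x86-experimental-vector-shuffle-lowering test.c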
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...tor <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,
i32 5, i32 2, i32 7>
ret <4 x float> %1
}
;;;
llc (-mcpu=corei7-avx):
vblendps $11, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]
llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
vinsertps $-96, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]
3) When a shuffle performs an insert at index 0 we always generate an
insertps, while a movss would do a better job.
;;;
define <4 x float> @baz(<4 x float> %A, <4 x float> %B) {
%1 = shufflevector <4 x float>...
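The two lowerings being compared both pick one lane from the second register and keep the other three; a hedged SSE4.1 intrinsics rendering of that same single-lane merge (variable names invented, not from the thread):

#include <smmintrin.h>  /* SSE4.1 */

/* result = { x[0], x[1], y[2], x[3] }, as in the disassembly comments */
__m128 via_blendps(__m128 x, __m128 y)
{
    return _mm_blend_ps(x, y, 0x4);   /* bit 2 set: take lane 2 from y */
}

__m128 via_insertps(__m128 x, __m128 y)
{
    return _mm_insert_ps(x, y, 0xa0); /* y[2] -> lane 2 of x; 0xa0 is the $-96 above */
}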
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...<__dso_handle+0x28>
40051d: vpsubd %xmm1,%xmm0,%xmm0
400521: vmovq %xmm0,%rax
400526: movslq %eax,%rcx
400529: sar $0x20,%rax
40052d: vpextrq $0x1,%xmm0,%rdx
400533: movslq %edx,%rsi
400536: sar $0x20,%rdx
40053a: vmovss 0x4006c0(,%rcx,4),%xmm0
400543: vinsertps $0x10,0x4006c0(,%rax,4),%xmm0,%xmm0
40054e: vinsertps $0x20,0x4006c0(,%rsi,4),%xmm0,%xmm0
400559: vinsertps $0x30,0x4006c0(,%rdx,4),%xmm0,%xmm0
400564: vmulps 0x144(%rip),%xmm0,%xmm0 # 4006b0
<__dso_handle+0x38>
40056c: vmovaps %xmm0,0x20046c(%rip) # 6009e0 <r...
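A hedged C sketch (function and table names invented) of the lane-extraction-plus-gather pattern in the disassembly above: four 32-bit indices are pulled out of an XMM register one at a time and used to index a float table, and the loaded values are reassembled before the vmulps:

#include <immintrin.h>

extern float table[];   /* stands in for the array at 0x4006c0 */

__m128 gather4(__m128i idx, __m128 scale)
{
    /* extract each 32-bit lane (the vmovq/vpextrq + movslq/sar sequence) */
    int i0 = _mm_extract_epi32(idx, 0);
    int i1 = _mm_extract_epi32(idx, 1);
    int i2 = _mm_extract_epi32(idx, 2);
    int i3 = _mm_extract_epi32(idx, 3);
    /* rebuild a vector from the table entries (vmovss + three vinsertps) */
    __m128 v = _mm_set_ps(table[i3], table[i2], table[i1], table[i0]);
    return _mm_mul_ps(v, scale);      /* vmulps */
}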
2020 Aug 31
2
Vectorization of math function failed?
...00 00 callq 30 <_Z4fct1Dv4_f+0x30>
30: c5 f9 29 44 24 10 vmovapd %xmm0,0x10(%rsp)
36: c4 e3 79 04 04 24 e7 vpermilps $0xe7,(%rsp),%xmm0
3d: e8 00 00 00 00 callq 42 <_Z4fct1Dv4_f+0x42>
42: c5 f8 28 4c 24 30 vmovaps 0x30(%rsp),%xmm1
48: c4 e3 71 21 4c 24 20 vinsertps $0x10,0x20(%rsp),%xmm1,%xmm1
4f: 10
50: c4 e3 71 21 4c 24 10 vinsertps $0x20,0x10(%rsp),%xmm1,%xmm1
57: 20
58: c4 e3 71 21 c0 30 vinsertps $0x30,%xmm0,%xmm1,%xmm0
5e: 48 83 c4 48 add $0x48,%rsp
62: c3 retq
63: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,...
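A hedged C sketch of the kind of source that produces code like this: a scalar math call applied per lane (shown here as expf purely for illustration; the actual callee sits behind a relocation), which without a vector math library is lowered to four scalar calls with the lanes extracted and then reinserted via vinsertps:

#include <math.h>

typedef float v4f __attribute__((vector_size(16)));  /* GCC/Clang extension */

v4f fct1(v4f x)
{
    v4f r;
    for (int i = 0; i < 4; ++i)
        r[i] = expf(x[i]);   /* each iteration becomes one of the callq's */
    return r;
}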
2014 Sep 09
1
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...5, i32 2, i32 7>
>> ret <4 x float> %1
>> }
>> ;;;
>>
>> llc (-mcpu=corei7-avx):
>> vblendps $11, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]
>>
>> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
>> vinsertps $-96, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]
>>
>>
>> 3) When a shuffle performs an insert at index 0 we always generate an
>> insertps, while a movss would do a better job.
>> ;;;
>> define <4 x float> @baz(<4 x float> %A, <4 x f...
2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...%xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 = xmm3[0,2],xmm0[1,2]
>
>
> Also, I see differences when some loads are shuffled, that I'm a bit
> conflicted about:
> vmovaps -0xXX(%rbp), %xmm3
> ...
> vinsertps $0xc0, %xmm4, %xmm3, %xmm5 ## xmm5 = xmm4[3],xmm3[1,2,3]
> becomes:
> vpermilps $-0x6d, -0xXX(%rbp), %xmm2 ## xmm2 = mem[3,0,1,2]
> ...
> vinsertps $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 = xmm4[3],xmm2[1,2,3]
>
> Note that the second version does the shuffle in...
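For readers decoding the immediates: $0xc0 selects source lane 3 into destination lane 0, and $-0x6d is 0x93, i.e. the [3,0,1,2] permute shown in the comment. A hedged intrinsics rendering of each sequence (the two come from different surrounding code, so no equivalence is implied; names invented):

#include <immintrin.h>

__m128 before_seq(const float *slot, __m128 a)   /* vmovaps + vinsertps */
{
    __m128 t = _mm_load_ps(slot);
    return _mm_insert_ps(t, a, 0xc0);            /* a[3] -> lane 0 of t */
}

__m128 after_seq(const float *slot, __m128 a)    /* vpermilps (load folded) + vinsertps */
{
    __m128 t = _mm_permute_ps(_mm_load_ps(slot), 0x93);  /* mem[3,0,1,2] */
    return _mm_insert_ps(t, a, 0xc0);
}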
2015 Jan 30
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...0x68, %xmm0, %xmm3, %xmm0 ## xmm0 =
>>> xmm3[0,2],xmm0[1,2]
>>>
>>>
>>> Also, I see differences when some loads are shuffled, that I'm a bit
>>> conflicted about:
>>> vmovaps -0xXX(%rbp), %xmm3
>>> ...
>>> vinsertps $0xc0, %xmm4, %xmm3, %xmm5 ## xmm5 =
>>> xmm4[3],xmm3[1,2,3]
>>> becomes:
>>> vpermilps $-0x6d, -0xXX(%rbp), %xmm2 ## xmm2 = mem[3,0,1,2]
>>> ...
>>> vinsertps $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 =
>>> xmm4[3],xmm2[1,2,3]...
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...0]
>> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 =
>> xmm3[0,2],xmm0[1,2]
>>
>>
>> Also, I see differences when some loads are shuffled, that I'm a bit
>> conflicted about:
>> vmovaps -0xXX(%rbp), %xmm3
>> ...
>> vinsertps $0xc0, %xmm4, %xmm3, %xmm5 ## xmm5 = xmm4[3],xmm3[1,2,3]
>> becomes:
>> vpermilps $-0x6d, -0xXX(%rbp), %xmm2 ## xmm2 = mem[3,0,1,2]
>> ...
>> vinsertps $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 = xmm4[3],xmm2[1,2,3]
>>
>> Note that the second ver...
2015 Jan 30
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...0 =
>>>> xmm3[0,2],xmm0[1,2]
>>>>
>>>>
>>>> Also, I see differences when some loads are shuffled, that I'm a bit
>>>> conflicted about:
>>>> vmovaps -0xXX(%rbp), %xmm3
>>>> ...
>>>> vinsertps $0xc0, %xmm4, %xmm3, %xmm5 ## xmm5 =
>>>> xmm4[3],xmm3[1,2,3]
>>>> becomes:
>>>> vpermilps $-0x6d, -0xXX(%rbp), %xmm2 ## xmm2 = mem[3,0,1,2]
>>>> ...
>>>> vinsertps $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 =
>>>>...
2015 Jan 23
5
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers!
I would like to flip on another chunk of the new vector shuffling,
specifically the logic to mark ~all shuffles as "legal".
This can be tested today with the flag
"-x86-experimental-vector-shuffle-legality". I would essentially like to
make this the default (by removing the "false" path). Doing this will allow
me to
2014 Sep 10
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...t;4 x i32> <i32 4,
> i32 5, i32 2, i32 7>
> ret <4 x float> %1
> }
> ;;;
>
> llc (-mcpu=corei7-avx):
> vblendps $11, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]
>
> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
> vinsertps $-96, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]
>
>
> 3) When a shuffle performs an insert at index 0 we always generate an
> insertps, while a movss would do a better job.
> ;;;
> define <4 x float> @baz(<4 x float> %A, <4 x float> %B) {
> %1...
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...%B, <4 x i32> <i32 4,
>> i32 1, i32 2, i32 3>
>> ret <4 x float> %1
>> }
>> ;;;
>>
>> llc (-mcpu=corei7-avx):
>> vmovss %xmm1, %xmm0, %xmm0
>>
>> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
>> vinsertps $0, %xmm1, %xmm0, %xmm0 # xmm0 = xmm1[0],xmm0[1,2,3]
>
>
> So, this is hard. I think we should do this in MC after register allocation
> because movss is the worst instruction ever: it switches from blending with
> the destination to zeroing the destination when the source switches f...
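The asymmetry being described can be shown with intrinsics (a hedged sketch, not from the thread): the register form of movss keeps the upper lanes of the destination, while the load form zeroes them.

#include <xmmintrin.h>

__m128 movss_reg(__m128 a, __m128 b)
{
    return _mm_move_ss(a, b);   /* { b[0], a[1], a[2], a[3] } - a blend */
}

__m128 movss_load(const float *p)
{
    return _mm_load_ss(p);      /* { *p, 0, 0, 0 } - upper lanes zeroed */
}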
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...mm, this is base + 8 * 8 + 4.
>> +; STRESS-NEXT: vmovss 68([[BASE]]), [[OUT_Imm:%xmm[0-9]+]]
>> +; Add high slice: out[out_start].imm, this is base + 4.
>> +; STRESS-NEXT: vaddss 4([[BASE]]), [[OUT_Imm]], [[RES_Imm:%xmm[0-9]+]]
>> ; Swap Imm and Real.
>> ; STRESS-NEXT: vinsertps $16, [[RES_Imm]], [[RES_Real]], [[RES_Vec:%xmm[0-9]+]]
>> ; Put the results back into out[out_start].
>> @@ -32,14 +32,14 @@
>> ;
>> ; Same for REGULAR, we eliminate the register bank copy with each slice.
>> ; REGULAR-LABEL: t1:
>> -; Load out[out_start + 8].imm, t...