Displaying 15 results from an estimated 15 matches for "vblendp".
2015 Jan 29 (2 replies) - [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...re
> some raw observations, in case any of it rings a bell:
>
Very cool, and thanks for the analysis!
>
>
> Another problem I'm seeing is that in some cases we can't fold memory
> anymore:
> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
> vblendps $0x1, %xmm2, %xmm0, %xmm0
> becomes:
> vmovaps -0xXX(%rdx), %xmm2
> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 = xmm3[0,2],xmm0[1,2]
>
>
> Also, I see differences when some lo...
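A minimal IR sketch of the pattern under discussion (my reconstruction, not code from the thread; the function name and pointer argument are illustrative): a shuffle that takes one lane from a loaded vector and the remaining lanes from a register, which on AVX can lower to a vpermilps with a folded memory operand plus a vblendps.
;;;
define <4 x float> @fold_sketch(<4 x float>* %p, <4 x float> %v) {
  %m = load <4 x float>, <4 x float>* %p
  ; Index 7 selects lane 3 of the second operand (%m); lanes 1-3 come
  ; from %v, matching the mem[3,0,1,2] + blend-lane-0 pattern above.
  %r = shufflevector <4 x float> %v, <4 x float> %m, <4 x i32> <i32 7, i32 1, i32 2, i32 3>
  ret <4 x float> %r
}
;;;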
2014 Sep 05 (3 replies) - [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
wrote:
> Unfortunately, another team, while doing internal testing, has seen the
> new path generating illegal insertps masks. A sample here:
>
> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>
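For context (my note, not from the thread): insertps encodes an 8-bit immediate, so $256 (0x100) is out of range; a tool that silently truncates to 8 bits would encode $0, which is what the printed masks correspond to, e.g.:

vinsertps $0, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]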
2015 Jan 30 (4 replies) - [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...> Very cool, and thanks for the analysis!
>>
>>
>>>
>>>
>>> Another problem I'm seeing is that in some cases we can't fold memory
>>> anymore:
>>> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>>> vblendps $0x1, %xmm2, %xmm0, %xmm0
>>> becomes:
>>> vmovaps -0xXX(%rdx), %xmm2
>>> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
>>> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 = xmm3[0,2],xmm0[1,2]
>>...
2015 Jan 29 (0 replies) - [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...it rings a bell:
>>
>
> Very cool, and thanks for the analysis!
>
>
>>
>>
>> Another problem I'm seeing is that in some cases we can't fold memory
>> anymore:
>> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>> vblendps $0x1, %xmm2, %xmm0, %xmm0
>> becomes:
>> vmovaps -0xXX(%rdx), %xmm2
>> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
>> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 = xmm3[0,2],xmm0[1,2]
>>
>>
>>...
2014 Sep 09 (5 replies) - [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...red into two shufps
instructions.
Example:
;;;
define <4 x float> @foo(<4 x float> %A, <4 x float> %B) {
%1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 0,
i32 5, i32 2, i32 7>
ret <4 x float> %1
}
;;;
llc (-mcpu=corei7-avx):
vblendps $10, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[5],xmm0[2],xmm1[7]
llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
vshufps $-40, %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0,2],xmm0[1,3]
vshufps $-40, %xmm0, %xmm0, %xmm0 # xmm0[0,2,1,3]
2) On SSE4.1, we should try not to emit an...
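Decoding the immediates above (my arithmetic, not from the thread): shufps uses two selector bits per result lane, low bits first, so $-40 = 0xD8 = 0b11011000 gives lane selectors [0,2,1,3]. In AT&T syntax the low two result lanes come from the second-listed register and the high two from the first-listed one, which matches the printed masks.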
2015 Jan 30 (0 replies) - [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...he analysis!
>>>
>>>
>>>>
>>>>
>>>> Another problem I'm seeing is that in some cases we can't fold memory
>>>> anymore:
>>>> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>>>> vblendps $0x1, %xmm2, %xmm0, %xmm0
>>>> becomes:
>>>> vmovaps -0xXX(%rdx), %xmm2
>>>> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
>>>> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 =
>>...
2014 Sep 05 (2 replies) - [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...1, <4 x i32> <i32 4, i32 1,
> i32 6, i32 7>
> ret <4 x float> %2
> }
>
>
> llc -march=x86-64 -mattr=+avx test.ll -o -
>
> test: # @test
> vxorps %xmm2, %xmm2, %xmm2
> vmovss %xmm0, %xmm2, %xmm2
> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
> retl
>
> test2: # @test2
> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
> vxorp...
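Decoding the immediates above (my arithmetic, not from the thread): vblendps uses one selector bit per lane, so $4 = 0b0100 takes lane 2 from the first-listed register and the rest from the second-listed one; vinsertps $48 (0x30) copies source lane 0 into destination lane 3, again matching the printed masks.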
2014 Sep 08 (2 replies) - [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...<4 x float> %2
>>> }
>>>
>>>
>>> llc -march=x86-64 -mattr=+avx test.ll -o -
>>>
>>> test: # @test
>>> vxorps %xmm2, %xmm2, %xmm2
>>> vmovss %xmm0, %xmm2, %xmm2
>>> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>>> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>>> retl
>>>
>>> test2: # @test2
>>> vinsertps $48, %xmm1, %xmm0, %xmm0 #...
2014 Sep 19 (4 replies) - [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...ternal codebase.
In one particular case I observed a slowdown (around 1%); here is what
I found when investigating this slowdown.
1. With the new shuffle lowering, there is one case where we end up
producing the following sequence:
vmovss .LCPxx(%rip), %xmm1
vxorps %xmm0, %xmm0, %xmm0
vblendps $1, %xmm1, %xmm0, %xmm0
Before, we used to generate a simpler sequence:
vmovss .LCPxx(%rip), %xmm1
In this particular case, the 'vblendps' is redundant since the vmovss
would zero the upper bits in %xmm1. I am not sure why we get this
poor codegen with your new shuffle lowering. I will investi...
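A plausible reduction of case 1 (my guess at the shape; the internal code is not shown in the thread): inserting a loaded scalar into lane 0 of an otherwise-zero vector. A vmovss load already zeroes lanes 1-3, so the vxorps/vblendps pair is redundant.
;;;
define <4 x float> @insert_low(float* %p) {
  %s = load float, float* %p
  ; Only lane 0 is non-zero, so a single vmovss from memory suffices.
  %v = insertelement <4 x float> zeroinitializer, float %s, i32 0
  ret <4 x float> %v
}
;;;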
2014 Sep 06 (2 replies) - [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...> i32 6, i32 7>
>> ret <4 x float> %2
>> }
>>
>>
>> llc -march=x86-64 -mattr=+avx test.ll -o -
>>
>> test: # @test
>> vxorps %xmm2, %xmm2, %xmm2
>> vmovss %xmm0, %xmm2, %xmm2
>> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>> retl
>>
>> test2: # @test2
>> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xm...
2015 Jan 23 (5 replies) - [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers!
I would like to flip on another chunk of the new vector shuffling,
specifically the logic to mark ~all shuffles as "legal".
This can be tested today with the flag
"-x86-experimental-vector-shuffle-legality". I would essentially like to
make this the default (by removing the "false" path). Doing this will allow
me to
2014 Sep 10 (2 replies) - [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...;;
> define <4 x float> @foo(<4 x float> %A, <4 x float> %B) {
> %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 0,
> i32 5, i32 2, i32 7>
> ret <4 x float> %1
> }
> ;;;
>
> llc (-mcpu=corei7-avx):
> vblendps $10, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[5],xmm0[2],xmm1[7]
>
> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
> vshufps $-40, %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0,2],xmm0[1,3]
> vshufps $-40, %xmm0, %xmm0, %xmm0 # xmm0[0,2,1,3]
>
>
> 2) On SS...
2014 Sep 09 (1 reply) - [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...> @foo(<4 x float> %A, <4 x float> %B) {
>> %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 0,
>> i32 5, i32 2, i32 7>
>> ret <4 x float> %1
>> }
>> ;;;
>>
>> llc (-mcpu=corei7-avx):
>> vblendps $10, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[5],xmm0[2],xmm1[7]
>>
>> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
>> vshufps $-40, %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0,2],xmm0[1,3]
>> vshufps $-40, %xmm0, %xmm0, %xmm0 # xmm0[0,2,1,3]
>>
>...
2014 Sep 10 (13 replies) - [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Awesome, thanks for all the information!
>
> See below:
>
> On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com>
> wrote:
>>
>> You have already mentioned how the new shuffle lowering is missing
>> some features; for example, you explicitly
2018 Aug 06 (2 replies) - [PATCH] D50328: [X86][SSE] Combine (some) target shuffles with multiple uses
[NOTE: Removed Phab and reviewers]
> ================
> Comment at: test/CodeGen/X86/2012-01-12-extract-sv.ll:12
> +; CHECK-NEXT: vblendps {{.*#+}} xmm1 = xmm1[0],xmm2[1,2,3]
> +; CHECK-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,0,0,0]
> ; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
> ----------------
> greened wrote:
>> Can we make this test less brittle by using FileCheck variables?
>> This goes for p...
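A sketch of what the reviewer is suggesting (illustrative, not the actual patch): FileCheck string-substitution variables capture the register once and reuse it, so the test survives register renumbering:

; CHECK-NEXT: vblendps {{.*#+}} [[R:xmm[0-9]+]] = [[R]][0],xmm2[1,2,3]
; CHECK-NEXT: vpermilps {{.*#+}} [[P:xmm[0-9]+]] = [[P]][0,0,0,0]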