Displaying 14 results from an estimated 14 matches for "vpshufd".
2014 Sep 19
4
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...2. There are cases where we no longer fold a vector load in one of
the operands of a shuffle.
This is an example:
vmovaps 320(%rsp), %xmm0
vshufps $-27, %xmm0, %xmm0, %xmm0 # %xmm0 = %xmm0[1,1,2,3]
Before, we used to emit the following sequence:
# 16-byte Folded reload.
vpshufd $1, 320(%rsp), %xmm0 # %xmm0 = mem[1,0,0,0]
Note: the shuffle masks are different but both valid because the upper
bits in %xmm0 are unused. Later on, the code uses register %xmm0 in a
'vcvtss2sd' instruction; only the lower 32 bits of
%xmm0 have a meaning in this c...
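For readers less familiar with the encoding: the immediate of vpshufd/vshufps packs four 2-bit source-lane selectors, least-significant pair first, so signed and unsigned spellings denote the same mask. A minimal Python sketch (the helper name is mine) decodes both immediates from this example; plain index decoding applies to the vshufps form here only because both of its operands are the same register:

```python
def shuffle_lanes(imm):
    """Decode a pshufd/shufps-style immediate: bits [1:0] pick the source
    lane for destination lane 0, bits [3:2] for lane 1, and so on."""
    imm &= 0xFF  # assemblers also accept signed spellings such as $-27
    return [(imm >> (2 * i)) & 3 for i in range(4)]

print(shuffle_lanes(1))    # [1, 0, 0, 0] -- the folded-reload vpshufd mask
print(shuffle_lanes(-27))  # [1, 1, 2, 3] -- $-27 is 0xE5, the vshufps mask
```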
2017 Aug 04
2
Status of llvm.experimental.vector.reduce.* intrinsics
...> Hi Renato,
>
> just to make it clear, I didn't implement reductions on x86_64 they just
> worked when I tried to lower an
> llvm.experimental.vector.reduce.or.i1.v8i1 intrinsic. A shuffle pattern
> is generated for the intrinsic.
>
> vpshufd $78, %xmm0, %xmm1 # xmm1 = xmm0[2,3,0,1]
> vpor %xmm1, %xmm0, %xmm0
> vpshufd $229, %xmm0, %xmm1 # xmm1 = xmm0[1,1,2,3]
> vpor %xmm1, %xmm0, %xmm0
> vpsrld $16, %xmm0, %xmm1
> vpor %xmm1, %xmm0, %xm...
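The quoted sequence is a standard logarithmic reduction ladder: each shuffle+vpor folds the upper half of the remaining data onto the lower half ($78 folds dwords 2-3 onto 0-1, $229 folds dword 1 onto dword 0, and the 16-bit shift folds the last pair). A rough Python model, treating the v8i1 payload as eight 16-bit lanes packed into four dwords (helper names are mine, not LLVM's):

```python
def pshufd(v, imm):
    """Model vpshufd on four 32-bit lanes."""
    return [v[(imm >> (2 * i)) & 3] for i in range(4)]

def or_reduce_v8i16(lanes16):
    """OR-reduce eight 16-bit lanes with the shuffle ladder quoted above."""
    # Pack pairs of 16-bit lanes into four little-endian 32-bit dwords.
    v = [lanes16[2 * i] | (lanes16[2 * i + 1] << 16) for i in range(4)]
    v = [a | b for a, b in zip(v, pshufd(v, 78))]    # vpshufd $78  + vpor
    v = [a | b for a, b in zip(v, pshufd(v, 229))]   # vpshufd $229 + vpor
    v = [a | (a >> 16) for a in v]                   # vpsrld $16   + vpor
    return v[0] & 0xFFFF                             # result in the low lane
```

With any nonzero input lane, the low 16 bits of lane 0 end up nonzero, which is all an i1 OR-reduction needs.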
2014 Sep 20
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...uentin,
>
> If AVX is available I would expect the vpermilps/vpermilpd instruction to
> be used for all float/double single vector shuffles, especially as it can
> deal with the folded load case as well - this would avoid the integer/float
> execution domain transfer issue with using vpshufd.
>
Yes, this is the obvious solution to folding memory loads. It just isn't
implemented yet.
Well, actually it is, but I haven't finished writing tests for it. =]
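vpermilps performs the same in-lane dword permute as vpshufd but executes in the floating-point domain, and its immediate form decodes identically; it also has a variable form that takes the selectors from a register. A Python sketch of that variable form (function name is mine); each control dword contributes only its low two bits and selects within its own 128-bit lane:

```python
def vpermilps_var(src, ctrl):
    """Model variable vpermilps: dword i of the result is taken from the
    128-bit lane containing i in src, indexed by ctrl[i] & 3."""
    out = []
    for i, c in enumerate(ctrl):
        lane_base = (i // 4) * 4    # start of this dword's 128-bit lane
        out.append(src[lane_base + (c & 3)])
    return out
```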
2017 Aug 04
3
Status of llvm.experimental.vector.reduce.* intrinsics
...didn't implement reductions on
> x86_64 they just
> > worked when I tried to lower an
> > llvm.experimental.vector.reduce.or.i1.v8i1 intrinsic. A
> shuffle pattern
> > is generated for the intrinsic.
> >
> > vpshufd $78, %xmm0, %xmm1 # xmm1 = xmm0[2,3,0,1]
> > vpor %xmm1, %xmm0, %xmm0
> > vpshufd $229, %xmm0, %xmm1 # xmm1 = xmm0[1,1,2,3]
> > vpor %xmm1, %xmm0, %xmm0
> > vpsrld $16, %xmm0, %xmm1
>...
2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...m_mem_1[0,0]
> vshufps $-0x68, %xmm_mem_1, %xmm0, %xmm0 ## xmm0
> = xmm0[0,2],xmm_mem_1[1,2]
>
>
> I also see a lot of somewhat neutral (focusing on Haswell for now)
> domain changes such as (xmm5 and 0 are initially integers, and are
> dead after the store):
> vpshufd $-0x5c, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,2,2]
> vpalignr $0xc, %xmm0, %xmm5, %xmm0 ## xmm0
> = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
> vmovdqu %xmm0, 0x20(%rax)
> turning into:
> vshufps $0x2, %xmm5, %xmm0, %xmm0 ## xmm0 = xmm0[2,0],xm...
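For reference, vpalignr concatenates its two sources into a 32-byte value and extracts a byte-shifted 16-byte window, which is exactly what the commented byte pattern above shows. A small Python model (argument names are mine):

```python
def palignr(src1, src2, imm):
    """Model vpalignr: shift the 32-byte concat src1:src2 (src1 in the
    high half) right by imm bytes and keep the low 16 bytes."""
    concat = src2 + src1        # src2 holds bytes 0..15, src1 bytes 16..31
    return concat[imm:imm + 16]
```

With src1 = xmm5, src2 = xmm0 and imm = 0xc this yields xmm0[12..15] followed by xmm5[0..11], matching the comment in the quoted assembly.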
2017 Aug 03
2
Status of llvm.experimental.vector.reduce.* intrinsics
Hi Amara,
thank you for the clarification. I tested the intrinsics on x86_64 and
they seemed to work pretty well. Looking forward to trying these
intrinsics with the AArch64 backend. Maybe I'll find the time to look into
codegen to get these intrinsics out of the experimental stage. They seem
pretty useful.
Cheers,
Michael
-----Original Message-----
From: Amara Emerson [amara.emerson at gmail.com]
Received:
2015 Jan 30
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...%xmm0, %xmm0 ## xmm0
>>> = xmm0[0,2],xmm_mem_1[1,2]
>>>
>>>
>>> I also see a lot of somewhat neutral (focusing on Haswell for now)
>>> domain changes such as (xmm5 and 0 are initially integers, and are
>>> dead after the store):
>>> vpshufd $-0x5c, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,2,2]
>>> vpalignr $0xc, %xmm0, %xmm5, %xmm0 ## xmm0
>>> = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>>> vmovdqu %xmm0, 0x20(%rax)
>>> turning into:
>>> vshufps $0x2, %x...
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...s $-0x68, %xmm_mem_1, %xmm0, %xmm0 ## xmm0
>> = xmm0[0,2],xmm_mem_1[1,2]
>>
>>
>> I also see a lot of somewhat neutral (focusing on Haswell for now)
>> domain changes such as (xmm5 and 0 are initially integers, and are
>> dead after the store):
>> vpshufd $-0x5c, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,2,2]
>> vpalignr $0xc, %xmm0, %xmm5, %xmm0 ## xmm0
>> = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>> vmovdqu %xmm0, 0x20(%rax)
>> turning into:
>> vshufps $0x2, %xmm5, %xmm0, %xmm0 ##...
2015 Jan 30
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...>>>> = xmm0[0,2],xmm_mem_1[1,2]
>>>>
>>>>
>>>> I also see a lot of somewhat neutral (focusing on Haswell for now)
>>>> domain changes such as (xmm5 and 0 are initially integers, and are
>>>> dead after the store):
>>>> vpshufd $-0x5c, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,2,2]
>>>> vpalignr $0xc, %xmm0, %xmm5, %xmm0 ## xmm0
>>>> = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>>>> vmovdqu %xmm0, 0x20(%rax)
>>>> turning into:
>>>> vs...
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
...>>>>> zmm1 gets the value of element 5 of zmm5, etc.
>>>>
>>>>
>>>>> * vpaddd zmm0, zmm0, zmm1*
>>>>> * vshufi64x2 zmm1, zmm0, zmm0, 1 # zmm1 = zmm0[2,3,0,1,0,1,0,1]*
>>>>> * vpaddd zmm0, zmm0, zmm1*
>>>>> * vpshufd zmm1, zmm0, 238 # zmm1 =
>>>>> zmm0[2,3,2,3,6,7,6,7,10,11,10,11,14,15,14,15]*
>>>>> * vpaddd zmm0, zmm0, zmm1*
>>>>> * vpshufd zmm1, zmm0, 229 # zmm1 =
>>>>> zmm0[1,1,2,3,5,5,6,7,9,9,10,11,13,13,14,15]*
>>>>>...
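On 512-bit registers the vpshufd immediate is applied independently inside each 128-bit lane, which is why the 16-element comments above repeat the same 4-element pattern four times. A Python sketch (my naming) of the zmm form:

```python
def vpshufd_zmm(v, imm):
    """Model zmm vpshufd: the four 2-bit selectors of imm are applied
    within each of the four 128-bit lanes independently."""
    out = []
    for lane in range(4):
        base = 4 * lane
        out += [v[base + ((imm >> (2 * i)) & 3)] for i in range(4)]
    return out
```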
2014 Sep 23
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...e:
>
> > If AVX is available I would expect the vpermilps/vpermilpd instruction
> to be used for all float/double single vector shuffles, especially as it
> can deal with the folded load case as well - this would avoid the
> integer/float execution domain transfer issue with using vpshufd.
> >
> > Yes, this is the obvious solution to folding memory loads. It just isn't
> implemented yet.
> >
> > Well, actually it is, but I haven't finished writing tests for it. =]
>
> Thanks Chandler - vpermilps/vpermilpd generation looks great now.
>
>...
2015 Jan 23
5
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers!
I would like to flip on another chunk of the new vector shuffling,
specifically the logic to mark ~all shuffles as "legal".
This can be tested today with the flag
"-x86-experimental-vector-shuffle-legality". I would essentially like to
make this the default (by removing the "false" path). Doing this will allow
me to
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Awesome, thanks for all the information!
>
> See below:
>
> On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com>
> wrote:
>>
>> You have already mentioned how the new shuffle lowering is missing
>> some features; for example, you explicitly
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...it's hard to test for the ISEL condition because CodeGen optimizes
>> ; away the bugpointed code. Just ensure the basics are still there.
>> ;CHECK-LABEL: func:
>> -;CHECK: vxorps
>> -;CHECK: vinsertf128
>> +;CHECK: vpxor
>> +;CHECK: vinserti128
>> ;CHECK: vpshufd
>> ;CHECK: vpshufd
>> ;CHECK: vmulps
>>
>> Modified: llvm/trunk/test/CodeGen/X86/3addr-16bit.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/3addr-16bit.ll?rev=192750&r1=192749&r2=192750&view=diff
>> =======================...