search for: vpshufd

Displaying 14 results from an estimated 14 matches for "vpshufd".

2014 Sep 19 (4): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...2. There are cases where we no longer fold a vector load in one of the operands of a shuffle. This is an example:

    vmovaps 320(%rsp), %xmm0
    vshufps $-27, %xmm0, %xmm0, %xmm0  # %xmm0 = %xmm0[1,1,2,3]

Before, we used to emit the following sequence:

    # 16-byte Folded reload.
    vpshufd $1, 320(%rsp), %xmm0  # %xmm0 = mem[1,0,0,0]

Note: the reason why the shuffle masks are different but still valid is that the upper bits in %xmm0 are unused. Later on, the code uses register %xmm0 in a 'vcvtss2sd' instruction; only the lower 32 bits of %xmm0 have a meaning in this c...
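For reference, both sequences above move element 1 of the source into lane 0; the upper lanes differ but are dead, since only lane 0 feeds the vcvtss2sd. A minimal C intrinsics sketch of that dataflow (the function name and use of _MM_SHUFFLE are illustrative, not from the thread):

    #include <immintrin.h>

    /* Place element 1 of v in lane 0 and convert it to double.
     * The upper lanes after the shuffle ([1,1,2,3] here vs. mem[1,0,0,0]
     * in the folded form) are dead: only lane 0 feeds the convert. */
    static double elem1_to_double(__m128 v) {
        __m128 s = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 2, 1, 1)); /* vshufps $-27 ($0xe5) */
        return (double)_mm_cvtss_f32(s);                          /* vcvtss2sd */
    }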
2017 Aug 04 (2): Status of llvm.experimental.vector.reduce.* intrinsics
...> Hi Renato, > > just to make it clear, I didn't implement reductions on x86_64 they just > worked when I tried to lower an > llvm.experimentel.vector.reduce.or.i1.v8i1 intrinsic. A shuffle pattern > is generated for the intrinsic. > > vpshufd $78, %xmm0, %xmm1 # xmm1 = xmm0[2,3,0,1] > vpor %xmm1, %xmm0, %xmm0 > vpshufd $229, %xmm0, %xmm1 # xmm1 = xmm0[1,1,2,3] > vpor %xmm1, %xmm0, %xmm0 > vpsrld $16, %xmm0, %xmm1 > vpor %xmm1, %xmm0, %xm...
2014 Sep 20 (2): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...Quentin,
>
> If AVX is available I would expect the vpermilps/vpermilpd instruction to
> be used for all float/double single vector shuffles, especially as it can
> deal with the folded load case as well - this would avoid the integer/float
> execution domain transfer issue with using vpshufd.

Yes, this is the obvious solution to folding memory loads. It just isn't implemented yet. Well, actually it is, but I haven't finished writing tests for it. =]
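As context for the suggestion: vpermilps takes the same style of immediate as vpshufd but stays in the floating-point execution domain and can fold a memory operand. A small hedged sketch (hypothetical function, assuming AVX is available):

    #include <immintrin.h>

    /* _mm_permute_ps compiles to vpermilps: a float-domain shuffle that,
     * unlike the integer vpshufd, incurs no int/float domain crossing and
     * can take its source straight from memory. */
    static __m128 dup_lane1(__m128 v) {
        return _mm_permute_ps(v, _MM_SHUFFLE(1, 1, 1, 1)); /* vpermilps $0x55 */
    }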
2017 Aug 04 (3): Status of llvm.experimental.vector.reduce.* intrinsics
...didn't implement reductions on
> > x86_64; they just worked when I tried to lower an
> > llvm.experimental.vector.reduce.or.i1.v8i1 intrinsic. A shuffle pattern
> > is generated for the intrinsic.
> >
> >   vpshufd $78, %xmm0, %xmm1   # xmm1 = xmm0[2,3,0,1]
> >   vpor %xmm1, %xmm0, %xmm0
> >   vpshufd $229, %xmm0, %xmm1  # xmm1 = xmm0[1,1,2,3]
> >   vpor %xmm1, %xmm0, %xmm0
> >   vpsrld $16, %xmm0, %xmm1 ...
2015 Jan 29 (2): [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...m_mem_1[0,0]
> vshufps $-0x68, %xmm_mem_1, %xmm0, %xmm0  ## xmm0 = xmm0[0,2],xmm_mem_1[1,2]
>
> I also see a lot of somewhat neutral (focusing on Haswell for now)
> domain changes such as (xmm5 and 0 are initially integers, and are
> dead after the store):
>   vpshufd $-0x5c, %xmm0, %xmm0        ## xmm0 = xmm0[0,1,2,2]
>   vpalignr $0xc, %xmm0, %xmm5, %xmm0  ## xmm0 = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>   vmovdqu %xmm0, 0x20(%rax)
> turning into:
>   vshufps $0x2, %xmm5, %xmm0, %xmm0   ## xmm0 = xmm0[2,0],xm...
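For the domain-change example, here is a hedged C intrinsics rendering of the integer sequence being replaced (AT&T operand order converted to intrinsic argument order; the function name is mine):

    #include <immintrin.h>

    /* vpshufd $-0x5c (= $0xa4) duplicates lane 2 into lane 3, then
     * vpalignr $0xc takes the top 4 bytes of the shuffled xmm0 followed
     * by the low 12 bytes of xmm5, matching the quoted byte map. */
    static __m128i shuffle_then_align(__m128i xmm0, __m128i xmm5) {
        __m128i t = _mm_shuffle_epi32(xmm0, _MM_SHUFFLE(2, 2, 1, 0)); /* xmm0[0,1,2,2] */
        return _mm_alignr_epi8(xmm5, t, 12);                          /* (xmm5:t) >> 12 bytes */
    }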
2017 Aug 03 (2): Status of llvm.experimental.vector.reduce.* intrinsics
Hi Amara,

thank you for the clarification. I tested the intrinsics on x86_64 and they seemed to work pretty well. Looking forward to trying these intrinsics with the AArch64 backend. Maybe I'll find the time to look into codegen to get these intrinsics out of the experimental stage. They seem pretty useful.

Cheers,
Michael
2015 Jan 30 (4): [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...%xmm0, %xmm0  ## xmm0 = xmm0[0,2],xmm_mem_1[1,2]
>>>
>>> I also see a lot of somewhat neutral (focusing on Haswell for now)
>>> domain changes such as (xmm5 and 0 are initially integers, and are
>>> dead after the store):
>>>   vpshufd $-0x5c, %xmm0, %xmm0        ## xmm0 = xmm0[0,1,2,2]
>>>   vpalignr $0xc, %xmm0, %xmm5, %xmm0  ## xmm0 = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>>>   vmovdqu %xmm0, 0x20(%rax)
>>> turning into:
>>>   vshufps $0x2, %x...
2015 Jan 29 (0): [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...s $-0x68, %xmm_mem_1, %xmm0, %xmm0  ## xmm0 = xmm0[0,2],xmm_mem_1[1,2]
>>
>> I also see a lot of somewhat neutral (focusing on Haswell for now)
>> domain changes such as (xmm5 and 0 are initially integers, and are
>> dead after the store):
>>   vpshufd $-0x5c, %xmm0, %xmm0        ## xmm0 = xmm0[0,1,2,2]
>>   vpalignr $0xc, %xmm0, %xmm5, %xmm0  ## xmm0 = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>>   vmovdqu %xmm0, 0x20(%rax)
>> turning into:
>>   vshufps $0x2, %xmm5, %xmm0, %xmm0   ##...
2015 Jan 30 (0): [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...;>> = xmm0[0,2],xmm_mem_1[1,2] >>>> >>>> >>>> I also see a lot of somewhat neutral (focusing on Haswell for now) >>>> domain changes such as (xmm5 and 0 are initially integers, and are >>>> dead after the store): >>>> vpshufd $-0x5c, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,2,2] >>>> vpalignr $0xc, %xmm0, %xmm5, %xmm0 ## xmm0 >>>> = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11] >>>> vmovdqu %xmm0, 0x20(%rax) >>>> turning into: >>>> vs...
2017 Jul 01 (2): KNL Assembly Code for Matrix Multiplication
...;>>> zmm1 gets the value of element 5 of zmm5, etc. >>>> >>>> >>>>> * vpaddd zmm0, zmm0, zmm1* >>>>> * vshufi64x2 zmm1, zmm0, zmm0, 1 # zmm1 = zmm0[2,3,0,1,0,1,0,1]* >>>>> * vpaddd zmm0, zmm0, zmm1* >>>>> * vpshufd zmm1, zmm0, 238 # zmm1 = >>>>> zmm0[2,3,2,3,6,7,6,7,10,11,10,11,14,15,14,15]* >>>>> * vpaddd zmm0, zmm0, zmm1* >>>>> * vpshufd zmm1, zmm0, 229 # zmm1 = >>>>> zmm0[1,1,2,3,5,5,6,7,9,9,10,11,13,13,14,15]* >>>>>...
2014 Sep 23 (2): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...e: > > > If AVX is available I would expect the vpermilps/vpermilpd instruction > to be used for all float/double single vector shuffles, especially as it > can deal with the folded load case as well - this would avoid the > integer/float execution domain transfer issue with using vpshufd. > > > > Yes, this is the obvious solution to folding memory loads. It just isn't > implemented yet. > > > > Well, actually it is, but I haven't finished writing tests for it. =] > > Thanks Chandler - vpermilps/vpermilpd generation looks great now. > >...
2015 Jan 23 (5): [LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers! I would like to flip on another chunk of the new vector shuffling, specifically the logic to mark ~all shuffles as "legal". This can be tested today with the flag "-x86-experimental-vector-shuffle-legality". I would essentially like to make this the default (by removing the "false" path). Doing this will allow me to ...
2014 Sep 10 (13): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Awesome, thanks for all the information!
>
> See below:
>
> On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>>
>> You have already mentioned how the new shuffle lowering is missing
>> some features; for example, you explicitly ...
2013 Oct 15 (0): [LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...'s hard to test for the ISEL condition because CodeGen optimizes
>> ; away the bugpointed code. Just ensure the basics are still there.
>> ;CHECK-LABEL: func:
>> -;CHECK: vxorps
>> -;CHECK: vinsertf128
>> +;CHECK: vpxor
>> +;CHECK: vinserti128
>> ;CHECK: vpshufd
>> ;CHECK: vpshufd
>> ;CHECK: vmulps
>>
>> Modified: llvm/trunk/test/CodeGen/X86/3addr-16bit.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/3addr-16bit.ll?rev=192750&r1=192749&r2=192750&view=diff
>> ======================...