search for: vpermilpd

Displaying 5 results from an estimated 5 matches for "vpermilpd".

2014 Sep 20
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Sat, Sep 20, 2014 at 7:12 AM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> Hi Andrea / Chandler / Quentin,
>
> If AVX is available I would expect the vpermilps/vpermilpd instruction to
> be used for all float/double single vector shuffles, especially as it can
> deal with the folded load case as well - this would avoid the integer/float
> execution domain transfer issue with using vpshufd.
>
Yes, this is the obvious solution to folding memory loads. It...
2014 Sep 23
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Sun, Sep 21, 2014 at 1:15 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> On 20 Sep 2014, at 19:44, Chandler Carruth <chandlerc at google.com> wrote:
>
> > If AVX is available I would expect the vpermilps/vpermilpd instruction
> > to be used for all float/double single vector shuffles, especially as it
> > can deal with the folded load case as well - this would avoid the
> > integer/float execution domain transfer issue with using vpshufd.
> >
>
> Yes, this is the obvious solution to folding me...
2020 Aug 31
2
Vectorization of math function failed?
...00 00 00 00          callq  e <_Z4fct1Dv4_f+0xe>
   e:  c5 f8 29 44 24 30      vmovaps %xmm0,0x30(%rsp)
  14:  c5 fa 16 04 24         vmovshdup (%rsp),%xmm0
  19:  e8 00 00 00 00         callq  1e <_Z4fct1Dv4_f+0x1e>
  1e:  c5 f8 29 44 24 20      vmovaps %xmm0,0x20(%rsp)
  24:  c4 e3 79 05 04 24 01   vpermilpd $0x1,(%rsp),%xmm0
  2b:  e8 00 00 00 00         callq  30 <_Z4fct1Dv4_f+0x30>
  30:  c5 f9 29 44 24 10      vmovapd %xmm0,0x10(%rsp)
  36:  c4 e3 79 04 04 24 e7   vpermilps $0xe7,(%rsp),%xmm0
  3d:  e8 00 00 00 00         callq  42 <_Z4fct1Dv4_f+0x42>
  42:  c5 f8 28 4c 24 30      vmovaps 0x30(%rsp)...
2014 Sep 19
4
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler, I have tested the new shuffle lowering on an AMD Jaguar CPU (which supports AVX but not AVX2). On this particular target, there is a delay when output data from an execution unit is used as input to another execution unit of a different cluster. For example, there are 6 execution units, which are divided into 3 execution clusters: Float (FPM, FPA), Vector Integer (MMXA, MMXB, IMM), and Store
2015 May 04
4
[LLVMdev] Load value and broadcast in LLVM
Is it possible to load a value into a vector register and broadcast it in LLVM? For example, for the following address %x:
    %x = getelementptr inbounds %struct._Ray* %ray, i32 0, i32 0, i32 0
instead of loading the value at %x into a scalar register %0:
    %0 = load double* %x, align 4, !tbaa !0
I want to load it into a <2 x double> vector register %1 and make both of the two elements in %1