thr3ads.net - search: "vpermilpd"

Displaying 5 results from an estimated 5 matches for "vpermilpd".


[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 20


On Sat, Sep 20, 2014 at 7:12 AM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> Hi Andrea / Chandler / Quentin,
>
> If AVX is available I would expect the vpermilps/vpermilpd instruction to be used for all float/double single vector shuffles, especially as it can deal with the folded load case as well - this would avoid the integer/float execution domain transfer issue with using vpshufd.
>

Yes, this is the obvious solution to folding memory loads. It...

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 23


On Sun, Sep 21, 2014 at 1:15 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> On 20 Sep 2014, at 19:44, Chandler Carruth <chandlerc at google.com> wrote:
> > > If AVX is available I would expect the vpermilps/vpermilpd instruction to be used for all float/double single vector shuffles, especially as it can deal with the folded load case as well - this would avoid the integer/float execution domain transfer issue with using vpshufd.
> >
> > Yes, this is the obvious solution to folding me...

Vectorization of math function failed?

2020 Aug 31


...00 00 00 00          callq  e <_Z4fct1Dv4_f+0xe>
   e: c5 f8 29 44 24 30       vmovaps %xmm0,0x30(%rsp)
  14: c5 fa 16 04 24          vmovshdup (%rsp),%xmm0
  19: e8 00 00 00 00          callq  1e <_Z4fct1Dv4_f+0x1e>
  1e: c5 f8 29 44 24 20       vmovaps %xmm0,0x20(%rsp)
  24: c4 e3 79 05 04 24 01    vpermilpd $0x1,(%rsp),%xmm0
  2b: e8 00 00 00 00          callq  30 <_Z4fct1Dv4_f+0x30>
  30: c5 f9 29 44 24 10       vmovapd %xmm0,0x10(%rsp)
  36: c4 e3 79 04 04 24 e7    vpermilps $0xe7,(%rsp),%xmm0
  3d: e8 00 00 00 00          callq  42 <_Z4fct1Dv4_f+0x42>
  42: c5 f8 28 4c 24 30       vmovaps 0x30(%rsp)...
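To make the three shuffles in this listing easier to read, here is a minimal Python sketch of what each one does to a 128-bit vector of four floats. It models only the 128-bit immediate forms as I understand them from the instruction-set documentation. The function names and the lane labels `f0`..`f3` are hypothetical and do not come from the original thread. Under that model, each shuffle moves a different lane into element 0 just before the following `callq`, which would be consistent with a scalar function being called once per lane; the truncated listing does not confirm this.

```python
def vmovshdup(src):
    """Duplicate the odd-indexed float lanes: [f1, f1, f3, f3]."""
    return [src[1], src[1], src[3], src[3]]


def vpermilps_imm(src, imm):
    """128-bit vpermilps with an immediate.

    Bits 2i+1:2i of imm select which of the four source floats
    is placed in destination lane i.
    """
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]


def vpermilpd_imm_on_floats(src, imm):
    """128-bit vpermilpd with an immediate, applied to four floats.

    The instruction works on two 64-bit elements, so the four floats
    are treated as two pairs. Bit i of imm selects the low (0) or
    high (1) source pair for destination pair i.
    """
    pairs = [src[0:2], src[2:4]]
    out = []
    for i in range(2):
        out.extend(pairs[(imm >> i) & 1])
    return out


if __name__ == "__main__":
    v = ["f0", "f1", "f2", "f3"]  # symbolic lanes, element 0 first
    print("vmovshdup       ", vmovshdup(v))
    print("vpermilpd $0x1  ", vpermilpd_imm_on_floats(v, 0x1))
    print("vpermilps $0xe7 ", vpermilps_imm(v, 0xE7))
```

In this model, element 0 of the result is `f1` after `vmovshdup`, `f2` after `vpermilpd $0x1`, and `f3` after `vpermilps $0xe7`.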

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 19


Hi Chandler, I have tested the new shuffle lowering on an AMD Jaguar CPU (which supports AVX but not AVX2). On this particular target, there is a delay when output data from one execution unit is used as input to an execution unit in a different cluster. For example, there are 6 execution units, which are divided into 3 execution clusters: Float (FPM, FPA), Vector Integer (MMXA, MMXB, IMM), and Store

[LLVMdev] Load value and broadcast in LLVM

2015 May 04


Is it possible to load a value into a vector register and broadcast it in LLVM? For example, for the following address %x

  %x = getelementptr inbounds %struct._Ray* %ray, i32 0, i32 0, i32 0

instead of loading the value at %x into a scalar register %0:

  %0 = load double* %x, align 4, !tbaa !0

I want to load it into a <2 x double> vector register %1 and make both of the two elements in %1
