search for: vgatherdps

Displaying 4 results from an estimated 4 matches for "vgatherdps".

2011 Nov 30
2
[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP
...lare float* @lut; define <8 x float> @foo(<8 x float> %indices) { %pointer = getelementptr float* @lut, <8 x i32> %indices %values = load <8 x float*> %pointer ret <8 x float> %values; } And the final AVX2 code I'd expect would consist of a single VGATHERDPS, in both 64-bit and 32-bit addressing modes: foo: VPCMPEQB ymm1, ymm1, ymm1 ; generate all ones VGATHERDPS ymm0, DWORD PTR [ymm0 * 4 + lut], ymm1 RET Jose ----- Original Message ----- > Hi Jose, > > The proposed IR change does not contribute nor hinder the...
2011 Nov 29
0
[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP
Hi Jose, The proposed IR change neither contributes to nor hinders the use case you mentioned. The case of a base + vector-index should be easily addressed by an intrinsic. The pointer-vector proposal is meant to support full scatter/gather instructions (such as the AVX2 gather instructions). Nadav -----Original Message----- From: Jose Fonseca [mailto:jfonseca at vmware.com] Sent: Tuesday, November
2011 Nov 30
0
[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP
...lare float* @lut; define <8 x float> @foo(<8 x float> %indices) { %pointer = getelementptr float* @lut, <8 x i32> %indices %values = load <8 x float*> %pointer ret <8 x float> %values; } And the final AVX2 code I'd expect would consist of a single VGATHERDPS, in both 64-bit and 32-bit addressing modes: foo: VPCMPEQB ymm1, ymm1, ymm1 ; generate all ones VGATHERDPS ymm0, DWORD PTR [ymm0 * 4 + lut], ymm1 RET Jose ----- Original Message ----- > Hi Jose, > > The proposed IR change does not contribute nor hinder the...
2011 Nov 29
4
[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP
----- Original Message ----- > "Rotem, Nadav" <nadav.rotem at intel.com> writes: > > > David, > > > > Thanks for the support! I sent a detailed email with the overall > > plan. But just to reiterate, the GEP would look like this: > > > > %PV = getelementptr <4 x i32*> %base, <4 x i32> <i32 1, i32 2, i32 > > 3, i32
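
The snippets above contrast two addressing forms: a single base plus a vector of indices (the lookup-table case), and a full vector of pointers produced by a vector GEP. As a rough illustration only, the scalar C model below sketches what each form computes per lane. The function names, the 8-lane width, and the byte-per-lane mask are assumptions made for this sketch; they are not LLVM intrinsics or anything proposed in the thread, and real AVX2 gathers take the mask from the sign bit of each mask element.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* assumed width: one 256-bit register of 32-bit floats */

/* Form 1 (hypothetical helper): one base pointer plus a vector of indices.
 * Lanes whose mask entry is zero keep their previous value in dst. */
void gather_base_index(float dst[LANES], const float *base,
                       const int32_t idx[LANES], const uint8_t mask[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        if (mask[i]) {
            dst[i] = base[idx[i]];
        }
    }
}

/* Vector-GEP model (hypothetical helper): each lane has its own base
 * pointer and its own element offset, giving a vector of pointers. */
void vector_gep(const float *out[LANES], const float *bases[LANES],
                const int32_t offs[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        out[i] = bases[i] + offs[i];
    }
}

/* Form 2 (hypothetical helper): gather through a full vector of pointers.
 * Lanes whose mask entry is zero keep their previous value in dst. */
void gather_pointers(float dst[LANES], const float *ptrs[LANES],
                     const uint8_t mask[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        if (mask[i]) {
            dst[i] = *ptrs[i];
        }
    }
}
```

With every base set to the same table, `vector_gep` followed by `gather_pointers` yields the same values as `gather_base_index`, which is why the base-plus-index case can be seen as a special case of the pointer-vector form.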