2011 Nov 30
[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP
...lare float* @lut;
define <8 x float> @foo(<8 x i32> %indices) {
  %pointer = getelementptr float* @lut, <8 x i32> %indices
  %values = load <8 x float*> %pointer
  ret <8 x float> %values
}
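And the final AVX2 code I'd expect would consist of a single VGATHERDPS, in both 64-bit and 32-bit addressing modes:
foo:
  VPCMPEQB ymm1, ymm1, ymm1 ; generate all ones (mask: gather all 8 lanes)
  VGATHERDPS ymm2, DWORD PTR [ymm0 * 4 + lut], ymm1 ; destination must differ from the index and mask registers
  VMOVAPS ymm0, ymm2 ; return the result in ymm0
  RET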
Jose
----- Original Message -----
> Hi Jose,
>
> The proposed IR change neither contributes to nor hinders the...
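To make the expected gather concrete, here is a minimal scalar sketch in C of what VGATHERDPS computes, based on the documented AVX2 semantics: each 32-bit index is sign-extended and scaled, and a lane is loaded only if the sign bit of its mask element is set; other lanes keep their old value. The function name `gather_dps_model` and its parameters are hypothetical, chosen for illustration. It is not how LLVM or the hardware implements the gather, and unlike the hardware it does not clear the mask register or model faults.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8 /* a ymm register holds 8 single-precision floats */

/* Hypothetical scalar model of VGATHERDPS with 32-bit (dword) indices.
 *   dst   : destination lanes; lanes whose mask sign bit is clear keep their value
 *   base  : base address (the "lut" symbol in the example above)
 *   index : signed 32-bit element indices (the ymm index register)
 *   mask  : a lane is loaded only if the sign bit of its mask element is set
 *   scale : byte scale applied to each index; 4 for float elements
 * Unlike the hardware, this model does not clear the mask or handle faults. */
static void gather_dps_model(float dst[LANES], const void *base,
                             const int32_t index[LANES],
                             const uint32_t mask[LANES], int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int i = 0; i < LANES; ++i) {
        if (mask[i] & 0x80000000u) {
            /* sign-extended index times scale, added to the base address */
            const char *addr = (const char *)base + (int64_t)index[i] * scale;
            memcpy(&dst[i], addr, sizeof dst[i]); /* copy the 4-byte element */
        }
    }
}
```

With an all-ones mask and a scale of 4 (what VPCMPEQB and `* 4` set up above), this reduces to `dst[i] = lut[indices[i]]`, which is what `@foo` is meant to compute.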
2011 Nov 29
[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP
Hi Jose,
The proposed IR change neither contributes to nor hinders the use case you mentioned. The case of a base + vector-index should be easy to address with an intrinsic. The pointer-vector proposal is meant to support full scatter/gather instructions (such as the AVX2 gather instructions).
Nadav
-----Original Message-----
From: Jose Fonseca [mailto:jfonseca at vmware.com]
Sent: Tuesday, November
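Nadav's distinction can be illustrated with two scalar C helpers (hypothetical names, for illustration only): one gathers from a single base pointer plus per-lane indices, as in Jose's `@foo`; the other dereferences an independent pointer in each lane. The first is a special case of the second (take `ptrs[i] = base + idx[i]`), but not the other way round when lanes point into unrelated objects, which is presumably why the general form calls for a vector-of-pointers type rather than just an intrinsic.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* Hypothetical: "base + vector-index" gather, as in Jose's @foo example.
 * Every lane reads from the same array, at a per-lane index. */
static void gather_base_index(float out[LANES], const float *base,
                              const int idx[LANES])
{
    assert(base != NULL);
    for (int i = 0; i < LANES; ++i)
        out[i] = base[idx[i]];
}

/* Hypothetical: full gather through a vector of pointers.
 * Each lane may point into a completely unrelated object. */
static void gather_pointers(float out[LANES], const float *const ptrs[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        assert(ptrs[i] != NULL);
        out[i] = *ptrs[i];
    }
}
```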
2011 Nov 29
[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP
----- Original Message -----
> "Rotem, Nadav" <nadav.rotem at intel.com> writes:
>
> > David,
> >
> > Thanks for the support! I sent a detailed email with the overall
> > plan. But just to reiterate, the GEP would look like this:
> >
> > %PV = getelementptr <4 x i32*> %base, <4 x i32> <i32 1, i32 2, i32
> > 3, i32
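The vector GEP in the quoted proposal computes one address per lane without touching memory. Below is a lane-by-lane C model, assuming i32 elements as in the quoted `<4 x i32*>` example; the helper name `vector_gep_i32` is hypothetical, and since the quoted index vector is cut off, the tests use their own arbitrary indices rather than guessing the missing one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/* Hypothetical lane-by-lane model of a vector GEP over i32 elements:
 *   %PV = getelementptr <4 x i32*> %base, <4 x i32> %idx
 * Each result lane is base[i] advanced by idx[i] elements, i.e. by
 * idx[i] * sizeof(int32_t) bytes. No memory is read or written. */
static void vector_gep_i32(int32_t *out[LANES], int32_t *const base[LANES],
                           const int32_t idx[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        assert(base[i] != NULL);
        out[i] = base[i] + idx[i]; /* C pointer arithmetic scales by sizeof(int32_t) */
    }
}
```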