Jose Fonseca
2011-Nov-30 15:59 UTC
[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP
Yes, indeed I can always fall back to intrinsics. But I still believe that the case I described is, in essence, quite commonplace, so it should be a first-class citizen in the LLVM IR. AVX2 is the target ISA I'm thinking of too, BTW.

Let's forget 3D and imagine something as trivial as a vectorized i32 => float table lookup. I'd expect the IR to look something like:

    ; Lookup table with precomputed values
    @lut = external global float

    define <8 x float> @foo(<8 x i32> %indices) {
      %pointer = getelementptr float* @lut, <8 x i32> %indices
      %values = load <8 x float*> %pointer
      ret <8 x float> %values
    }

And I'd expect the final AVX2 code to need just a single VGATHERDPS, in both 64-bit and 32-bit addressing modes:

    foo:
        VPCMPEQB   ymm1, ymm1, ymm1                    ; generate all ones (mask)
        VGATHERDPS ymm2, DWORD PTR [lut + ymm0*4], ymm1 ; dest, index and mask must be distinct
        VMOVAPS    ymm0, ymm2                           ; return value goes in ymm0
        RET

Jose

----- Original Message -----
> Hi Jose,
>
> The proposed IR change neither helps nor hinders the use case you
> mentioned. The case of a base + vector-index should be easily
> addressed by an intrinsic. The pointer-vector proposal is meant to
> support full scatter/gather instructions (such as the AVX2 gather
> instructions).
>
> Nadav
>
>
> -----Original Message-----
> From: Jose Fonseca [mailto:jfonseca at vmware.com]
> Sent: Tuesday, November 29, 2011 22:25
> To: Rotem, Nadav; David A. Greene
> Cc: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] [llvm-commits] Vectors of Pointers and
> Vector-GEP
>
> ----- Original Message -----
> > "Rotem, Nadav" <nadav.rotem at intel.com> writes:
> >
> > > David,
> > >
> > > Thanks for the support! I sent a detailed email with the overall
> > > plan. But just to reiterate, the GEP would look like this:
> > >
> > >   %PV = getelementptr <4 x i32*> %base, <4 x i32> <i32 1, i32 2, i32 3, i32 4>
> > >
> > > Where the index of the GEP is a vector of indices. I am not against
> > > having multiple indices. I just want to start with a basic set of
> > > features.
> >
> > Ah, I see. I actually think multiple indices, as in multiple vectors
> > of indices to the GEP above, would be pretty rare.
>
> Nadav, David,
>
> I'd like to understand a bit better the final role of these pointer
> vector types on 64-bit architectures, where the pointers are often
> bigger than the elements stored/fetched (e.g., 32-bit floats/ints).
>
> Will 64-bit backends be forced to actually operate on 64-bit pointer
> vectors all the time? Or will they be able to retain operations on
> base + 32-bit offsets as such?
>
> In particular, an important use case for 3D software rendering is to
> be able to gather <4 x i32> values from an i32* scalar base pointer
> in a 64-bit address space, indexed by <N x i32> offsets. [1] And it
> is important that the intermediate <N x i32*> pointer vector is
> never actually instantiated, as it wouldn't fit in the hardware SIMD
> registers and would therefore require two gather operations.
>
> It would be nice to see how this use case would look in the proposed
> IR, and to get assurance that backends will be able to emit efficient
> code (i.e., a single gather instruction) from that IR.
>
> Jose
>
> [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-June/040825.html
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
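To make the table lookup concrete, here is a scalar C reference model (a sketch, not LLVM or Intel code) of what the proposed GEP + load, or a single VGATHERDPS, computes per lane. The point it illustrates is that every address is formed as the scalar base plus a sign-extended 32-bit index scaled by 4, so no vector of 64-bit pointers is ever materialized. The names `gather_dps`, `LUT_SIZE`, and the 16-entry contents of `lut` are hypothetical; the mask handling follows the documented VGATHERDPS behavior (a lane is active when the sign bit of its mask element is set, and the mask is cleared on completion).

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* one 256-bit YMM register holds eight 32-bit floats */

/* Scalar model of VGATHERDPS (sketch): for each lane whose mask sign bit
 * is set, load base[index[i]]; inactive lanes keep their old value. */
static void gather_dps(float dst[LANES], const float *base,
                       const int32_t index[LANES], int32_t mask[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        if (mask[i] < 0)             /* lane active iff its sign bit is set */
            dst[i] = base[index[i]]; /* base + sign-extended index * 4 bytes */
        mask[i] = 0;                 /* the mask is zeroed on completion */
    }
}

#define LUT_SIZE 16
/* Hypothetical table standing in for @lut: lut[k] == k / 2.0f. */
static const float lut[LUT_SIZE] = {
    0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 2.5f, 3.0f, 3.5f,
    4.0f, 4.5f, 5.0f, 5.5f, 6.0f, 6.5f, 7.0f, 7.5f
};

/* What @foo computes: <8 x i32> indices in, <8 x float> values out. */
static void foo(float out[LANES], const int32_t indices[LANES])
{
    int32_t mask[LANES];
    for (int i = 0; i < LANES; ++i) {
        assert(indices[i] >= 0 && indices[i] < LUT_SIZE); /* model only */
        mask[i] = -1; /* all ones, as VPCMPEQB produces */
    }
    gather_dps(out, lut, indices, mask);
}
```

With an all-ones mask every lane is loaded, which is why the expected code only needs a mask-generation instruction ahead of the gather.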
Rotem, Nadav
2011-Nov-30 21:16 UTC
[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP
Jose,

The scenario you described is probably the most important/common case. Supporting GEPs with a scalar base pointer and multiple indices can indeed help IR-level optimizations detect these patterns and replace them with intrinsics. But even without a scalar base pointer, optimizations can detect that the base pointer is broadcast from a scalar.

Having said that, I am still not sure how to add codegen support for AVX2 gather of base + 32-bit indices. The problem is that the GEP would return a vector of pointers, which would need to be converted back to the 'base + index' form. I think that replacing the GEP/LOAD sequence with an intrinsic is probably the best choice.

Nadav
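Nadav's concern can be illustrated with a small C sketch of the condition a code generator would have to establish if it only saw the final vector of pointers: every lane must equal one common base plus a sign-extended 32-bit index times the scale (1, 2, 4, or 8). The check is done at run time here purely for illustration; an instruction selector would have to prove it statically, which in practice means seeing the scalar base and the i32 indices before the GEP combines them. The function name `to_base_plus_index` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 8

/* Hypothetical helper: try to express each full-width pointer as
 * base + sext(index[i]) * scale, the only address form a VSIB gather such
 * as VGATHERDPS can encode (displacement aside). Returns false if a lane
 * is not a whole number of elements away from base, or if its element
 * offset does not fit in a signed 32-bit index. */
static bool to_base_plus_index(const void *const ptrs[LANES], const void *base,
                               int64_t scale, int32_t index[LANES])
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int i = 0; i < LANES; ++i) {
        /* Byte distance from the common base, computed on integers. */
        int64_t diff = (int64_t)(intptr_t)ptrs[i] - (int64_t)(intptr_t)base;
        if (diff % scale != 0)
            return false; /* not element-aligned relative to base */
        int64_t idx = diff / scale;
        if (idx < INT32_MIN || idx > INT32_MAX)
            return false; /* would need a 64-bit index */
        index[i] = (int32_t)idx;
    }
    return true;
}
```

For example, pointers to `table[0]`, `table[7]`, `table[14]`, ... convert back to indices 0, 7, 14, ... with scale 4, while pointers two bytes apart cannot be expressed with scale 4 at all.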
David A. Greene
2011-Dec-01 17:15 UTC
[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP
"Rotem, Nadav" <nadav.rotem at intel.com> writes:

> The scenario you described is probably the most important/common
> case. Supporting GEPs with a scalar base pointer and multiple indices
> can indeed help IR-level optimizations detect these patterns
> and replace them with intrinsics. But even without a scalar
> base pointer, optimizations can detect that the base pointer is
> broadcast from a scalar.

I just wrote an extensive reply that covers this. I think we do want a scalar-base GEP to make isel easier and the IR more target-independent. We should also consider a strided GEP (also covered in my reply) for the same reason.

> Having said that, I am still not sure how to add codegen support for
> AVX2 gather of base + 32-bit indices. The problem is that the
> GEP would return a vector of pointers, which would need to be converted
> back to the 'base + index' form. I think that replacing the GEP/LOAD
> sequence with an intrinsic is probably the best choice.

In the same reply I mentioned various index-generation instructions like cidx. These allow you to keep the index set separate from the final addresses. You'd then match load (gep(base, <0>, index set)) as the gather operation in isel, similar to how memops are handled for X86 today (i.e., manual lowering would put the GEP information into special address-match data structures).

The same can be done without cidx, but you need a more complex instruction sequence to generate the index set and to match it in the manual address-lowering code. This does make the manual address-matching code more complicated.

There are lots of options here, but I do think we need something beyond the simple "all vector" GEP.

-Dave
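The approach Dave describes, keeping the index set separate from the base and matching load (gep(base, <0>, index set)) as a gather, can be sketched in C as below. The thread does not spell out cidx's semantics, so `make_index_set` is a hypothetical stand-in that produces start, start + stride, start + 2*stride, ...; the same index set is what a strided GEP would express. Because the scalar base and the i32 index vector are still separate values when the load is reached, they map directly onto a gather's base register and index vector.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8

/* Hypothetical stand-in for an index-generation operation such as cidx
 * (its exact semantics are not given in the thread): fills out[] with
 * start, start + stride, start + 2*stride, ... */
static void make_index_set(int32_t out[LANES], int32_t start, int32_t stride)
{
    for (int i = 0; i < LANES; ++i)
        out[i] = start + i * stride;
}

/* Models load(gep(base, <0>, index set)): the scalar base and the i32
 * index vector remain separate operands, exactly what a base + index
 * gather needs, so no vector of full-width pointers is formed. */
static void gather_base_index(float dst[LANES], const float *base,
                              const int32_t index[LANES])
{
    for (int i = 0; i < LANES; ++i)
        dst[i] = base[index[i]];
}

/* A strided access, as a strided GEP might express it: generate the
 * index set, then gather through the unchanged scalar base. */
static void strided_load(float dst[LANES], const float *base,
                         int32_t start, int32_t stride)
{
    int32_t index[LANES];
    make_index_set(index, start, stride);
    gather_base_index(dst, base, index);
}
```

Without a dedicated index-generation operation, the index set would have to be built from a longer instruction sequence, and the address-matching code would need to recognize that sequence instead.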