----- Original Message -----
> From: "Ayal Zaks" <ayal.zaks at intel.com>
> To: "Philip Reames" <listmail at philipreames.com>, dag at cray.com, "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> Cc: "Robert Khasanov" <robert.khasanov at intel.com>, llvmdev at cs.uiuc.edu
> Sent: Monday, December 22, 2014 8:05:43 AM
> Subject: Re: [LLVMdev] Indexed Load and Store Intrinsics - proposal
>
> > Why shouldn't the IR representation simply be a load from a vector
> > of arbitrary pointers?
>
> Such a load could indeed serve as a general form of a gather or
> scatter. As Elena responded, we can propose two distinct intrinsics:
> one with a vector of pointers, and another with (non-zero) base, a
> vector of indices, and a scale implicitly inferred from the element
> type.
>
> The motivation for the latter stems from vectorizing a load or store
> to "b[i]", where b is invariant. Broadcasting b and using a vector
> gep to feed a vector of pointers, to be pattern matched and folded
> later, may work.

I would like you to explore this direction, where we use a vector GEP and the intrinsic simply takes a vector of pointers. The backend should pattern-match this as appropriate. I see no reason why we can't make this work, especially because we don't have any real uses of vector GEPs now, so we can *define* the canonical optimized form of them to be conducive to the kind of pattern matching we'd like to perform in the backends.

This, I imagine, will require some additional infrastructure work. Currently, GEPs, including vector GEPs, are expanded very early during SDAG building, and the form produced may not be appropriate for reliable pattern matching during later lowering phases. The way this is done is not set in stone, however, and we can certainly change it (including via the introduction of new SDAG nodes) to keep the necessary information together in compact form.

Thanks again,
Hal

> The alternative intrinsic proposed keeps b scalar
> and uses a vector of indices for i. In any case, it's important to
> recognize such common patterns, at least for x86, so could deserve
> an x86 intrinsic. But it's a general pattern that could potentially
> serve other implementations; any other gathers to consider atm?
>
> Documentation indeed needs to be provided.
>
> Ayal.
>
>
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu
> [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Philip Reames
> Sent: Sunday, December 21, 2014 20:25
> To: dag at cray.com; Demikhovsky, Elena
> Cc: Khasanov, Robert; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Indexed Load and Store Intrinsics - proposal
>
>
> On 12/18/2014 11:56 AM, dag at cray.com wrote:
> > "Demikhovsky, Elena" <elena.demikhovsky at intel.com> writes:
> >
> >> Semantics:
> >> For i=0,1,…,N-1: if (Mask[i]) {*(BaseAddr + VectorOfIndices[i]*Scale) = VectorValue[i];}
> >> VectorValue: any float or integer vector type.
> >> BaseAddr: a pointer; may be zero if full address is placed in the
> >> index.
> >> VectorOfIndices: a vector of i32 or i64 signed or unsigned integer
> >> values.
> > What about the case of a gather/scatter where the BaseAddr is zero
> > and the indices are pointers? Must we do a ptrtoint? llvm.org is
> > down at the moment but I don't think we currently have a vector
> > ptrtoint.
> I would be opposed to any representation which required the
> introduction of ptrtoint casts by the vectorizer. If it were the
> only option available, I could be argued around, but I think we
> should try to avoid this.
>
> More generally, I'm somewhat hesitant about representing a scatter
> with explicit base and offsets at all. Why shouldn't the IR
> representation simply be a load from a vector of arbitrary pointers?
> The backend can pattern match the actual gather instructions it
> supports and scalarize the rest. The proposal being made seems very
> specific to the current generation of x86 hardware.
>
> p.s. Where is the documentation for the existing mask load
> intrinsics? I can't find it with a quick search through the LangRef.
>
> Philip
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
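A minimal sketch of the two scatter forms under discussion, written as scalar reference loops in C++ rather than LLVM IR. The function names (`scatter_base_index`, `scatter_pointers`, `vector_gep`) are hypothetical and do not correspond to any LLVM intrinsic or API. The sketch assumes the scale equals the element size, as in "a scale implicitly inferred from the element type", and it only models the per-lane semantics quoted above, not how a backend would lower them.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Form 1 (hypothetical name): scalar base plus a vector of indices.
// For each lane i with mask[i] set, store value[i] to base + index[i] * Scale.
// Scale is implied by the element type, so plain T* indexing models it.
template <typename T, std::size_t N>
void scatter_base_index(T *base, const std::array<std::int64_t, N> &index,
                        const std::array<T, N> &value,
                        const std::array<bool, N> &mask) {
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      base[index[i]] = value[i];
}

// Form 2 (hypothetical name): a vector of arbitrary pointers.
// For each lane i with mask[i] set, store value[i] through ptrs[i].
template <typename T, std::size_t N>
void scatter_pointers(const std::array<T *, N> &ptrs,
                      const std::array<T, N> &value,
                      const std::array<bool, N> &mask) {
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      *ptrs[i] = value[i];
}

// Models "broadcast b and apply a vector GEP": every lane gets base + index[i].
template <typename T, std::size_t N>
std::array<T *, N> vector_gep(T *base, const std::array<std::int64_t, N> &index) {
  std::array<T *, N> ptrs{};
  for (std::size_t i = 0; i < N; ++i)
    ptrs[i] = base + index[i];
  return ptrs;
}
```

Under these assumptions, calling `scatter_pointers` on the result of `vector_gep(b, idx)` writes the same lanes as `scatter_base_index(b, idx, ...)`, which is why the vector-of-pointers form can express the "b[i]" case; what differs is whether the uniform base stays explicit in the operation itself.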
For the non-zero case, most of the time (e.g. a[i]), the "base" is uniform; this property helps to identify and perform neighboring gather/scatter optimizations in some applications. If the information is not preserved, it may not be easy to recover this "uniform" information later. The non-zero case is not designed only for x86; basically, it helps to preserve certain program information.

Thanks,
Xinmin

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Hal Finkel
Sent: Tuesday, December 23, 2014 5:17 PM
To: Zaks, Ayal
Cc: dag at cray.com; Khasanov, Robert; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Indexed Load and Store Intrinsics - proposal

I would like you to explore this direction, where we use a vector GEP and the intrinsic simply takes a vector of pointers. The backend should pattern-match this as appropriate. I see no reason why we can't make this work, especially because we don't have any real uses of vector GEPs now, so we can *define* the canonical optimized form of them to be conducive to the kind of pattern matching we'd like to perform in the backends.

This, I imagine, will require some additional infrastructure work. Currently, GEPs, including vector GEPs, are expanded very early during SDAG building, and the form produced may not be appropriate for reliable pattern matching during later lowering phases. The way this is done is not set in stone, however, and we can certainly change it (including via the introduction of new SDAG nodes) to keep the necessary information together in compact form.

Thanks again,
Hal
----- Original Message -----
> From: "Xinmin Tian" <xinmin.tian at intel.com>
> To: "Hal Finkel" <hfinkel at anl.gov>, "Ayal Zaks" <ayal.zaks at intel.com>
> Cc: dag at cray.com, "Robert Khasanov" <robert.khasanov at intel.com>, llvmdev at cs.uiuc.edu
> Sent: Tuesday, December 23, 2014 7:36:44 PM
> Subject: RE: [LLVMdev] Indexed Load and Store Intrinsics - proposal
>
> For the non-zero case, most of the time (e.g. a[i]), the "base" is uniform;

Agreed.

> this property helps to identify and perform neighboring
> gather/scatter optimizations in some applications. If the
> information is not preserved, it may not be easy to recover this
> "uniform" information later.

I don't understand what you mean. Why do you feel the information will be difficult to recover later from vector GEPs feeding the scatter/gather intrinsics?

 -Hal

> The non-zero case is not designed only for x86;
> basically, it helps to preserve certain program information.
>
> Thanks,
> Xinmin

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
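The recovery question raised here can be illustrated with a toy model, offered only as a sketch under stated assumptions. It assumes the address operand of a gather/scatter still records how it was built: either as a GEP off a broadcast scalar base, or as opaque per-lane pointers. The types `AddrVec` and `BaseIndex` and the function `match_uniform_base` are hypothetical; they are not LLVM or SelectionDAG data structures.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the address operand of a gather/scatter.
struct AddrVec {
  enum Kind { SplatGEP, Opaque } kind;
  std::uintptr_t base;                   // meaningful only for SplatGEP
  std::vector<std::int64_t> indices;     // meaningful only for SplatGEP
  std::vector<std::uintptr_t> pointers;  // meaningful only for Opaque
};

// Hypothetical result of a successful match: one uniform base plus indices.
struct BaseIndex {
  std::uintptr_t base;
  std::vector<std::int64_t> indices;
};

// Returns the uniform base and indices when the operand was produced by a GEP
// off a broadcast scalar base. Returns std::nullopt otherwise, in which case a
// caller would fall back to another strategy such as scalarizing.
std::optional<BaseIndex> match_uniform_base(const AddrVec &addr) {
  if (addr.kind != AddrVec::SplatGEP)
    return std::nullopt;
  return BaseIndex{addr.base, addr.indices};
}
```

In this model the "uniform" property is recoverable exactly as long as the producer of the pointer vector remains visible in the `SplatGEP` form. If earlier expansion reduces it to `Opaque` lanes, the match fails, which is the scenario the concern about losing the information describes.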