> This problem can be solved by sinking the broadcast instruction at codegen-prepare time.

I considered this option. We currently don’t have target-specific optimizations at codegen-prepare time. (Or am I wrong?) And it would be a very X86-directed optimization. Even the gather/scatter intrinsics are considered common to all targets.

The second reason why I’d prefer to generate a splat GEP is compile-time savings. Otherwise I would have to generate 2 (or more, one set for each splat element) redundant instructions (a broadcast is insert + shuffle), which get hoisted outside the loop at some stage. Then I would have to look for them in the CodeGenPrepare pass, sink them back, and rebuild the CFG.

- Elena

From: Nadav Rotem [mailto:nrotem at apple.com]
Sent: Monday, March 02, 2015 19:01
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu; Duncan P. N. Exon Smith; dag at cray.com; Philip Reames (listmail at philipreames.com); Hal Finkel (hfinkel at anl.gov); Chandler Carruth (chandlerc at gmail.com)
Subject: Re: Extending Vector GEP - proposal

I don’t have a strong opinion on this. The current GEP syntax is more restrictive, and the single-base-pointer case can be emulated using a broadcast + vector GEP, which can easily be pattern-matched at codegen time. The problem with the current syntax is that the ‘broadcast’ instruction can be hoisted outside of loops, and this can be a problem with our "one block at a time" codegen implementation. This problem can be solved by sinking the broadcast instruction at codegen-prepare time.

Is there a strong motivation to prefer one representation over the other?

On Mar 1, 2015, at 2:10 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:

Hi,

According to the current GEP syntax, a vector GEP requires each index to be a vector with the same number of elements.

%A = getelementptr <4 x i8*> %ptrs, <4 x i64> %offsets

I propose to relax this requirement. Let each index be either a vector or a scalar. All vector indices must have the same number of elements. A scalar value is treated as a splat vector.

%A = getelementptr i8* %ptr, <4 x i64> %offsets
or
%A = getelementptr <4 x i8*> %ptrs, i64 %offset

In this case we don’t have to add a “broadcast” before the GEP. It will actually be the developer’s decision which form to choose.
I plan to use vector GEP in gather/scatter, and “broadcasting” the scalar value impedes narrowing this operation to the “common base, multiple indices” form in the future.

What do you think?
Thanks.

- Elena

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
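
The splat rule in the proposal can be modelled outside of LLVM. The following is only a sketch in Python, not LLVM code: addresses are plain integers, vectors are lists, and the names `splat` and `vector_gep` as well as the `elem_size` parameter are made up for illustration. It shows that a scalar operand behaves as if it had been broadcast first, for both the "scalar base, vector of offsets" and the "vector of pointers, scalar offset" forms.

```python
from typing import List, Union

# Hypothetical model: an operand is either a scalar (int) or a vector (list of ints).
Operand = Union[int, List[int]]


def splat(value: Operand, lanes: int) -> List[int]:
    """Broadcast a scalar to `lanes` copies; return vectors unchanged (length-checked)."""
    if isinstance(value, list):
        if len(value) != lanes:
            raise ValueError("all vector operands must have the same number of elements")
        return list(value)
    return [value] * lanes


def vector_gep(base: Operand, index: Operand, elem_size: int = 1) -> List[int]:
    """Lane-wise address computation: base[i] + index[i] * elem_size.

    A scalar operand is treated as a splat, as the proposal describes.
    """
    widths = {len(op) for op in (base, index) if isinstance(op, list)}
    if len(widths) != 1:
        raise ValueError("need at least one vector operand, and all vectors must agree in length")
    lanes = widths.pop()
    bases = splat(base, lanes)
    indices = splat(index, lanes)
    return [b + i * elem_size for b, i in zip(bases, indices)]


# Scalar base, vector of offsets (the gather/scatter "common base" form).
common_base = vector_gep(1000, [0, 8, 16, 24])
# Vector of pointers, scalar offset.
common_offset = vector_gep([1000, 2000, 3000, 4000], 4)
# The explicit broadcast gives the same addresses as the scalar form.
explicit = vector_gep(splat(1000, 4), [0, 8, 16, 24])
```

In this model the explicit broadcast and the scalar operand are interchangeable; the difference discussed in the thread is only in how the IR represents them and what later passes have to pattern-match.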
> On Mar 2, 2015, at 11:30 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
>
> > This problem can be solved by sinking the broadcast instruction at codegen-prepare time.
>
> I considered this option. We currently don’t have target-specific optimizations at codegen-prepare time. (Or am I wrong?)
> And it would be a very X86-directed optimization. Even the gather/scatter intrinsics are considered common to all targets.
>
> The second reason why I’d prefer to generate a splat GEP is compile-time savings.
> Otherwise I would have to generate 2 (or more, one set for each splat element) redundant instructions (a broadcast is insert + shuffle), which get hoisted outside the loop at some stage. Then I would have to look for them in the CodeGenPrepare pass, sink them back, and rebuild the CFG.

Okay. I think that it’s reasonable to add support for GEP with a single base pointer and a vector of indices.
----- Original Message -----
> From: "Nadav Rotem" <nrotem at apple.com>
> To: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> Cc: llvmdev at cs.uiuc.edu, "Duncan P. N. Exon Smith" <dexonsmith at apple.com>, dag at cray.com, "Philip Reames (listmail at philipreames.com)" <listmail at philipreames.com>, "Hal Finkel (hfinkel at anl.gov)" <hfinkel at anl.gov>, "Chandler Carruth (chandlerc at gmail.com)" <chandlerc at gmail.com>
> Sent: Tuesday, March 3, 2015 11:38:47 AM
> Subject: Re: Extending Vector GEP - proposal
>
> > On Mar 2, 2015, at 11:30 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
> >
> > > This problem can be solved by sinking the broadcast instruction at codegen-prepare time.
> >
> > I considered this option. We currently don’t have target-specific optimizations at codegen-prepare time. (Or am I wrong?)
> > And it would be a very X86-directed optimization. Even the gather/scatter intrinsics are considered common to all targets.
> >
> > The second reason why I’d prefer to generate a splat GEP is compile-time savings.
> > Otherwise I would have to generate 2 (or more, one set for each splat element) redundant instructions (a broadcast is insert + shuffle), which get hoisted outside the loop at some stage. Then I would have to look for them in the CodeGenPrepare pass, sink them back, and rebuild the CFG.
>
> Okay. I think that it’s reasonable to add support for GEP with a single base pointer and a vector of indices.

I agree; the splat case, especially when you're indexing into a structure, seems as though it will be very common.

-Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
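
To make the structure-indexing case concrete, here is a small Python model of the address arithmetic. It is a sketch under stated assumptions only: the struct layout (`FIELD_OFFSETS`, `STRUCT_SIZE`) and the function `field_addresses` are hypothetical and not taken from the thread. The point it illustrates is that the base pointer and the field selector are the same for every lane, and only the element index varies, so the first two are natural candidates for scalar (splat) operands.

```python
from typing import Dict, List

# Hypothetical layout of a struct with three 4-byte fields: { x, y, z }.
FIELD_OFFSETS: Dict[str, int] = {"x": 0, "y": 4, "z": 8}
STRUCT_SIZE = 12


def field_addresses(base: int, element_indices: List[int], field: str) -> List[int]:
    """Addresses of one field across several array elements.

    `base` and `field` are scalars shared by all lanes; only
    `element_indices` differs per lane.
    """
    field_offset = FIELD_OFFSETS[field]
    return [base + i * STRUCT_SIZE + field_offset for i in element_indices]


# Field "y" of elements 0..3 of an array of structs starting at address 1000.
y_addrs = field_addresses(1000, [0, 1, 2, 3], "y")
```

With the pre-proposal syntax, both the shared base and the shared field selector would have to be materialized as vectors before the address computation.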
"Demikhovsky, Elena" <elena.demikhovsky at intel.com> writes:> I should generate 2 (or more, for each splat element) redundant > instructions (broadcast is insert+shuffle), hoist them outside the > loop on some stage. Then look for them on CodeGenPreare pass, sink > them back and rebuild the CFG.I agree with Elena. These are common operations and ought to be directly representable in the IR. Hoisting and sinking have been constant pain points for us for exactly the reason described. Getting the sinking right isn't trivial. It's not especially hard but it's extra work that supporting the operations actually desired in the IR would eliminate. -David
Nadav Rotem <nrotem at apple.com> writes:

> Okay. I think that it’s reasonable to add support for GEP with a
> single base pointer and a vector of indices.

We should also support a vector of pointers and a scalar index, I think.

-David