Martin J. O'Riordan via llvm-dev
2016-May-16 10:11 UTC
[llvm-dev] sum elements in the vector
This would be really cool. We have several instructions that perform horizontal vector operations, and we have to use built-ins to select them because there is no easy way of expressing them in a TD file. Some, like SUM for a ‘v4i32’, are easy enough to express with a pattern fragment; SUM ‘v8i16’ takes TableGen a long time to compute; and SUM ‘v16i8’ resulted in TableGen disappearing into itself for hours trying to reduce the patterns before I gave up and cancelled it.

If there were ISD nodes for these, they would be far simpler to express in TableGen. In addition, the pattern fragments only match a very specific form of IR to the desired instruction.

The horizontal operations are particularly useful for finalising a vectorised operation. For example, I may want to compute the scalar MAX, MIN or SUM of a large number of items. If the number of items is divisible by the number of vector lanes (e.g. 4, 8, or 16 in our case), then 4, 8 or 16 items at a time can be computed using normal vector operations, and the final scalar value can then be computed using a single horizontal operation.

MartinO

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Chandler Carruth via llvm-dev
Sent: 16 May 2016 2:16
To: Shahid, Asghar-ahmad; Rail Shafigulin; llvm-dev; Hal Finkel
Subject: Re: [llvm-dev] sum elements in the vector

I'm starting to think we should directly implement horizontal operations on vector types.

My suspicion is that coming up with a nice model for this would help us a lot with things like:

- Idiom recognition of reduction patterns that use horizontal arithmetic
- Ability to use horizontal operations in SLPVectorizer
- Significantly easier cost modeling of vectorizing loops with reductions in LoopVectorize
- Other things I've not thought of?

Curious what others think?

-Chandler

On Wed, May 11, 2016 at 10:07 PM Shahid, Asghar-ahmad via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> why in order to add this particular instruction (sum elements in a vector) I need to add an intrinsic?

Adding an intrinsic is not the only way; it is one of the ways, and the user will NOT be required to invoke it explicitly.

Currently LLVM does not have any instruction that directly represents “sum of elements in a vector” and generates your particular instruction. However, you can do it without an intrinsic by pattern matching the LLVM IR that represents “sum of elements in a vector” to your particular instruction in DAGCombiner.

Regards,
Shahid

From: Rail Shafigulin [mailto:rail at esenciatech.com]
Sent: Monday, May 09, 2016 11:59 PM
To: Shahid, Asghar-ahmad; llvm-dev
Cc: Das, Dibyendu
Subject: Re: [llvm-dev] sum elements in the vector

I'm a little confused. Here is why.

I was able to add a vector add instruction to my target without using any intrinsics and without adding any new instructions to LLVM. So here is my question: how come I managed to add a new vector instruction without adding an intrinsic, and why do I need to add an intrinsic in order to add this particular instruction (sum elements in a vector)?

Another question I have is whether the compiler will be able to target this new instruction (sum elements in a vector) if it is implemented as an intrinsic, or whether the user will have to invoke the intrinsic explicitly.

Pardon if the questions seem dumb, I'm still learning things.

Any help is appreciated.

On Fri, May 6, 2016 at 1:51 PM, Rail Shafigulin <rail at esenciatech.com> wrote:

Thanks for the reply. These steps will add an instruction as an intrinsic. Is it possible to add an actual new instruction so that a compiler could target it during an optimization? How hard is it to do? Is that a realistic objective?

Rail

On Mon, Apr 4, 2016 at 9:02 PM, Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com> wrote:

Hi Rail,

We did this for generation of the X86 PSAD (sum of absolute differences) instruction through an LLVM intrinsic. Doing this requires the following:

1. Define an intrinsic, xyz(), for the required instruction and a corresponding SDNode
2. Generate the “call xyz()” IR based on the matched pattern
3. Map the “call xyz()” IR to the corresponding SDNode in SelectionDAGBuilder.cpp
4. Provide a default expansion of the xyz() intrinsic
5. Legalize type and/or operation
6. Provide lowering of the intrinsic/SDNode to generate your target instruction

You can visit http://llvm.org/docs/ExtendingLLVM.html for details.

Regards,
Shahid

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Rail Shafigulin via llvm-dev
Sent: Monday, April 04, 2016 11:00 PM
To: Das, Dibyendu
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] sum elements in the vector

Thanks for the pointers. I looked at the hadd instructions. They seem to do something very similar to what I need. Unfortunately, as I said before, my LLVM experience is limited. My understanding is that when I create a new type of SDNode I need to specify a pattern for it, so that when LLVM is analyzing the code and sees a given pattern it creates this particular node. I'm really struggling to understand how it is done. So here are the problems that I'm having:

1. How do I identify the pattern that should be used?
2. How do I specify a given pattern?

Do you (or someone else) mind helping me out?

Any help is appreciated.

On Mon, Apr 4, 2016 at 9:59 AM, Das, Dibyendu <Dibyendu.Das at amd.com> wrote:

This is roughly along the lines of the x86 hadd* instructions, though the semantics of hadd* may not exactly match what you are looking for. This is probably more in line with x86/ARM SAD-like instructions, but I don’t think llvm generates SAD without intrinsics.

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Rail Shafigulin via llvm-dev
Sent: Monday, April 04, 2016 9:34 AM
To: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] sum elements in the vector

My target has an instruction that adds up all elements in the vector and stores the result in a register. I'm trying to implement it in my compiler but I'm not even sure where to start.

I did look at other targets, but they don't seem to have anything like it (I could be wrong. My experience with LLVM is limited, so if I missed it, I'd appreciate it if someone could point it out).

My understanding is that if an SDNode for such an instruction doesn't exist, I have to define one. Unfortunately, I don't know how to do it. I don't even know where to start looking. Would someone care to point me in the right direction?

Any help is appreciated.

--
Rail Shafigulin
Software Engineer
Esencia Technologies
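To make Martin's point about finalising a vectorised operation concrete, here is a minimal C++ sketch of the scheme he describes: accumulate lane-wise with ordinary vector adds, then collapse the accumulator with one horizontal operation. It only models the semantics in scalar code. The names `Vec4`, `vertical_add`, `horizontal_sum` and `sum_items` are made up for illustration, and a 4-lane 32-bit vector is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kLanes = 4;               // assumed v4i32-style vector
using Vec4 = std::array<std::int32_t, kLanes>;  // stands in for a vector register

// Lane-wise add: models an ordinary (vertical) vector ADD.
Vec4 vertical_add(const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = a[i] + b[i];
  return r;
}

// Horizontal sum: models the single instruction that adds all lanes.
std::int32_t horizontal_sum(const Vec4 &v) {
  std::int32_t s = 0;
  for (std::int32_t x : v)
    s += x;
  return s;
}

// Sum a list of items whose length is a multiple of the lane count.
std::int32_t sum_items(const std::vector<std::int32_t> &items) {
  assert(items.size() % kLanes == 0 && "item count must be divisible by lane count");
  Vec4 acc{};  // every lane starts at zero
  for (std::size_t i = 0; i < items.size(); i += kLanes) {
    Vec4 chunk{items[i], items[i + 1], items[i + 2], items[i + 3]};
    acc = vertical_add(acc, chunk);  // 4 items per step
  }
  return horizontal_sum(acc);  // one horizontal operation at the very end
}
```

Calling `sum_items` on the eight values 1 through 8 performs two vertical adds and one horizontal sum. A real target would do the loop body in vector registers; only the last step needs the horizontal instruction.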
Demikhovsky, Elena via llvm-dev
2016-May-16 18:31 UTC
[llvm-dev] sum elements in the vector
We are also thinking about the need for horizontal intrinsics: sum, mul, min and max for floating point and integers, and logical operations for integers.

This would allow target-specific optimizations to be applied to the reduction tail.

- Elena
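As a rough illustration of the family of operations Elena lists, the following C++ sketch models each horizontal operation as a fold over the lanes of a vector. This is only a semantic model under assumed names (`horizontal_reduce`, `h_sum`, `h_mul`, `h_min`, `h_max`, `h_and`, `h_or`, `h_xor`); it is not a proposed intrinsic signature. Note that for floating point the lane order matters, because floating-point addition and multiplication are not associative; the sketch folds from lane 0 upwards.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Fold all lanes of v into one scalar using op, starting from lane 0.
template <typename T, std::size_t N, typename Op>
T horizontal_reduce(const std::array<T, N> &v, Op op) {
  static_assert(N > 0, "need at least one lane");
  T acc = v[0];
  for (std::size_t i = 1; i < N; ++i)
    acc = op(acc, v[i]);
  return acc;
}

// Arithmetic reductions: meaningful for integer and floating-point lanes.
template <typename T, std::size_t N>
T h_sum(const std::array<T, N> &v) {
  return horizontal_reduce(v, [](T a, T b) { return a + b; });
}
template <typename T, std::size_t N>
T h_mul(const std::array<T, N> &v) {
  return horizontal_reduce(v, [](T a, T b) { return a * b; });
}
template <typename T, std::size_t N>
T h_min(const std::array<T, N> &v) {
  return horizontal_reduce(v, [](T a, T b) { return b < a ? b : a; });
}
template <typename T, std::size_t N>
T h_max(const std::array<T, N> &v) {
  return horizontal_reduce(v, [](T a, T b) { return a < b ? b : a; });
}

// Logical reductions: integer lanes only.
template <typename T, std::size_t N>
T h_and(const std::array<T, N> &v) {
  return horizontal_reduce(v, [](T a, T b) { return a & b; });
}
template <typename T, std::size_t N>
T h_or(const std::array<T, N> &v) {
  return horizontal_reduce(v, [](T a, T b) { return a | b; });
}
template <typename T, std::size_t N>
T h_xor(const std::array<T, N> &v) {
  return horizontal_reduce(v, [](T a, T b) { return a ^ b; });
}
```

Keeping every operation behind the same fold makes it easy to see that a target could map each one either to a dedicated instruction or to a generic expansion.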
Hi,

I've long been an advocate of intrinsics, at least for horizontal operations; first-class instructions I could also get behind. Anything that isn't pattern matching power-of-two shuffles in the backend!

James

On Mon, 16 May 2016 at 19:32, Demikhovsky, Elena via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> We are also thinking about necessity of horizontal intrinsics: sum, mul,
> min, max for floating point and integers, and logical operations for
> integers.
>
> It will allow to apply target specific optimizations for reduction tail.
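For readers unfamiliar with the "power-of-two shuffles" James refers to, the sketch below models that reduction shape in plain C++: at each step the upper half of the live lanes is shuffled down and added lane-wise to the lower half, so after log2(N) steps lane 0 holds the total. The function name `shuffle_reduce_add` is hypothetical, and the code is a scalar model of the idea rather than the exact IR any LLVM pass emits.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Reduce N lanes (N a power of two) to a scalar in log2(N) shuffle+add steps.
template <std::size_t N>
std::int32_t shuffle_reduce_add(std::array<std::int32_t, N> v) {
  static_assert(N > 0 && (N & (N - 1)) == 0, "lane count must be a power of two");
  for (std::size_t half = N / 2; half >= 1; half /= 2) {
    // "Shuffle": bring lanes [half, 2*half) down to [0, half).
    // The remaining lanes are don't-care; zero is used here.
    std::array<std::int32_t, N> shuffled{};
    for (std::size_t i = 0; i < half; ++i)
      shuffled[i] = v[i + half];
    // Ordinary lane-wise vector add of the original and the shuffled value.
    for (std::size_t i = 0; i < N; ++i)
      v[i] += shuffled[i];
  }
  return v[0];  // "extract element 0" yields the horizontal sum
}
```

A backend that wants to select a single horizontal-sum instruction from this form has to recognise the whole shuffle/add chain plus the final extract, which is the pattern-matching burden a dedicated intrinsic or node would remove.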
On Mon, May 16, 2016 at 3:11 AM, Martin J. O'Riordan via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> This would be really cool. We have several instructions that perform
> horizontal vector operations, and have to use built-ins to select them as
> there is no easy way of expressing them in a TD file. Some like SUM for a
> ‘v4i32’ are easy enough to express with a pattern fragment,

Do you mind sharing how to do it with a pattern fragment? I'm not new to TD files, but all the work I've done was very simple.

--
Rail Shafigulin
Software Engineer
Esencia Technologies
Martin J. O'Riordan via llvm-dev
2016-May-18 12:56 UTC
[llvm-dev] sum elements in the vector
Hi Rail,

We used a very simple pattern expansion (actually, not a pattern fragment). For example, for AND, ADD (horizontal sum), OR and XOR of 4 elements we use something like the following TableGen structure:

    class HORIZ_Op4<SDNode opc, RegisterClass regVT, ValueType rt, ValueType vt, string asmstr> :
      SHAVE_Instr<(outs regVT:$dst), (ins VRF128:$src),
                  !strconcat(asmstr, " $dst $src"),
                  [(set regVT:$dst,
                     (opc (rt (vector_extract (vt VRF128:$src), 0)),
                          (opc (rt (vector_extract (vt VRF128:$src), 1)),
                               (opc (rt (vector_extract (vt VRF128:$src), 2)),
                                    (rt (vector_extract (vt VRF128:$src), 3))))))]>;

This is okay for 4 element vectors, and it will get selected if the programmer writes something like:

    vec[0] & vec[1] & vec[2] & vec[3]

but not with a simple variant like:

    vec[0] & vec[2] & vec[1] & vec[3]

If this was properly represented by an ISD node, the other permutations could be more easily handled through normalisation. We “could” write patterns for each of the permutations, but it is verbose, and in practice most people only write it one way anyway.

The 8-lane equivalent has TableGen left thinking for quite a long time, and the 16-lane equivalent seems to hang TableGen.

MartinO

From: Rail Shafigulin [mailto:rail at esenciatech.com]
Sent: 16 May 2016 23:50
To: Martin J. O'Riordan
Cc: LLVM Developers
Subject: Re: [llvm-dev] sum elements in the vector

On Mon, May 16, 2016 at 3:11 AM, Martin J. O'Riordan via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> This would be really cool. We have several instructions that perform horizontal vector operations, and have to use built-ins to select them as there is no easy way of expressing them in a TD file. Some like SUM for a ‘v4i32’ are easy enough to express with a pattern fragment,

Do you mind sharing how to do it with a pattern fragment? I'm not new to TD files but all the work I've done was very simple.
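To make the permutation point concrete, here is a small C++ sketch. It is not LLVM code: `Expr`, `extract`, `op`, `matchesFixedShape` and `matchesNormalised` are hypothetical names invented purely for illustration. It models the nested shape of the 4-lane pattern above and contrasts it with a normalising matcher that only asks whether every lane is extracted exactly once. That shortcut is only valid on the assumption that the operation is associative and commutative (AND, OR, XOR, integer ADD), and it ignores any canonicalisation the DAG combiner may already perform.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical mini-IR (not an LLVM type): a node is either
// "extract lane N of the source vector" or one binary operation.
// There is only one kind of operation, so opcodes are not compared.
struct Expr {
  bool isExtract;
  unsigned lane;                   // meaningful only when isExtract
  std::shared_ptr<Expr> lhs, rhs;  // meaningful only when !isExtract
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr extract(unsigned lane) {
  return std::make_shared<Expr>(Expr{true, lane, nullptr, nullptr});
}
ExprPtr op(ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{false, 0, std::move(a), std::move(b)});
}

// Accepts only the exact nesting used by the pattern:
//   op(e0, op(e1, ... op(e[n-2], e[n-1])))
bool matchesFixedShape(const ExprPtr &e, unsigned lanes) {
  assert(lanes >= 2 && "a horizontal op needs at least two lanes");
  const Expr *cur = e.get();
  for (unsigned i = 0; i + 1 < lanes; ++i) {
    if (!cur || cur->isExtract) return false;
    const Expr *l = cur->lhs.get();
    if (!l || !l->isExtract || l->lane != i) return false;
    cur = cur->rhs.get();
  }
  return cur && cur->isExtract && cur->lane == lanes - 1;
}

// Gathers the lane index of every leaf, whatever the tree shape.
static bool collectLanes(const Expr *e, std::vector<unsigned> &out) {
  if (!e) return false;
  if (e->isExtract) {
    out.push_back(e->lane);
    return true;
  }
  return collectLanes(e->lhs.get(), out) && collectLanes(e->rhs.get(), out);
}

// Normalising match: succeeds when each lane 0..lanes-1 appears exactly
// once among the leaves, regardless of order or nesting.
bool matchesNormalised(const ExprPtr &e, unsigned lanes) {
  std::vector<unsigned> seen;
  if (!collectLanes(e.get(), seen) || seen.size() != lanes) return false;
  std::vector<bool> hit(lanes, false);
  for (unsigned l : seen) {
    if (l >= lanes || hit[l]) return false;
    hit[l] = true;
  }
  return true;
}
```

With these definitions, a tree built in pattern order (lanes 0, 1, 2, 3) passes both checks, while the same tree with lanes 1 and 2 swapped passes only `matchesNormalised`. A dedicated reduction node would let a target match one canonical form instead of enumerating every permutation as a separate pattern.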