Cameron McInally via llvm-dev
2021-Jan-21 20:16 UTC
[llvm-dev] [RFC] Introduce a new stepvector operation
On Thu, Jan 21, 2021 at 3:02 PM Chris Tetreault <ctetreau at quicinc.com> wrote:> > Just to be clear: I think adding another magic named literal would be a mistake. I think being able to write <n, m, ...> would be great, and would obviate the need for many of the named shuffles in Eli's RFC as well as serve in David's use case. > > Thanks, > Christopher Tetreault > > -----Original Message----- > From: Cameron McInally <cameron.mcinally at nyu.edu> > Sent: Thursday, January 21, 2021 11:39 AM > To: Chris Tetreault <ctetreau at quicinc.com> > Cc: David Sherwood <David.Sherwood at arm.com>; llvm-dev at lists.llvm.org > Subject: [EXT] Re: [llvm-dev] [RFC] Introduce a new stepvector operation > > On Wed, Jan 20, 2021 at 4:33 PM Chris Tetreault via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > > David, > > > > > > > > Thanks for writing this up. I’d just like to speak to some concerns I have regarding shufflevector. As many of us know, shufflevector takes two vectors and a constant vector of i32, and does stuff. The constant shuffle mask can be scalable or fixed width. The shuffle mask is supposed to be an arbitrary constant vector, however for scalable vectors only zeroinitializer or undef are accepted. There are reasonable technical reasons for this state of affairs, but it reveals an issue: we don't really handle constant scalable vectors very well. Surely there are other similar issues throughout the codebase, but this is one I struggle with regularly so it sticks out in my mind. > > > > > > > > However, we probably want to be able to use the stepvector in shufflevector. For instance, if we had a stepvector literal, then we could implement vector concatenation in terms of shufflevector: > > > > > > > > %a_concat_b = shufflevector <4 x i16> %a, <4 x i16> %b, <8 x i32> > > stepvector > > > > > > > > In fact, a lot of useful shuffles can be implemented in terms of stepvector multiplied or added to some constants. Pulling from Eli's list in https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.llvm.org_pipermail_llvm-2Ddev_2020-2DJanuary_138762.html&d=DwIGaQ&c=slrrB7dE8n7gBJbeO0g-IQ&r=O_4M49EtSpZ_-BQYeigzGv0P4__noMcSu2RYEjS1vKs&m=oUBhDwLSPYkEyfN2XuWJkl4sT_RMHdQH_vdinEiQb9E&s=jHVhfm-2eFcDjkeFYlDs9RnjI6Vfx5ZgKADkzhRJQ3I&e= , I can see: > > > > > > > > “ > > > > %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x i32> > > %v2, SHUFFLE_NAME > > > > > > > > SHUFFLE_NAME can be one of the following (with examples of the equivalent <4 x i32> shuffles): > > > > concat - Concatenate the two operands (<0, 1, 2, 3, 4, 5, 6, 7>) -> > > see above > > > > split_low - Return the low half of the first operand (<0, 1>) -> > > stepvector of type <vscale x n/2 x i32> > > > > split_high - Return the high half of the first operand (<2, 3>) -> > > (stepvector + splat(n/2)) of type <vscale x n/2 x i32> > > > > zip_low - Zip together the low halves of the two operands (<0, 4, > > 1, 5>) > > > > zip_high - Zip together the high halves of the two operands (<2, 6, > > 3, 7>) > > > > unzip_even - Unzip the even elements of the two operands (<0, 2, 4, > > 6>) (stepvector + stepvector) of type <vscale x n x i32> > > > > unzip_odd - unzip the odd elements of the two operands (<1, 3, 5, > > 7>) (stepvector + stepvector + splat(1)) of type <vscale x n x i32> > > > > “ > > > > > > > > Unfortunately, all of these cannot be done because shufflevector only supports scalable undef or zeroinitializer. In order to support these cases, we would need to extend shufflevector to support stepvector (for concat), and arbitrary constant expressions for the rest. Supporting stepvector might not be so hard with the current scheme: if the shuffle is scalable, and the mask is <0, 1, ..., n - 1>, then the input mask was a scalable stepvector. However, I think this illustrates my proposal: vector pattern literals. They could look like this in IR: > > > > > > > > <0, 1, ...> ; stepvector > > > > > > > > This is also more flexible, because it enables lots of other scalable vector literals: > > > > > > > > <7, 7, ...> ; splat(7) without writing that horrid > > insertelement/shufflevector thing > > > > <0, 2, ...> ; unzip_even mask > > > > <1, 3, ...> ; unzip_odd mask > > > > > > > > The implementation for shufflevector would be straightforward because the mask for the two currently supported cases of zeroinitializer and undef (<0, 0, ...> <=> zeroinitializer and <undef, undef, ...> <=> undef) already follow the proposed scheme. This could also have the side benefits of making some IR easier to read (for very wide vectors, the fixed width stepvector could be more than 80 columns wide), and might result in efficiency gains in the compiler (don't need to walk a very wide vector to see if it is a stepvector; can just canonicalize <0, 1, 2, 3, 4, 5, 6, 7, ..., 2048> once to ConstantPatternVector(0, 1)). > > > > > > > > Sorry for rambling. I think I’ve personally come around to the idea that a constant would be good. However, a more flexible constant would be best if we’re going to use it to add a bunch of special cases to the codebase. Most special cases for scalable undef and zeroinitializer can be replaced with equivalent code that also handles vector pattern literals. > > > > > > > > Thanks, > > > > Christopher Tetreault > > Glad to hear you've come around, Chris. Named shuffle constants are a good compromise, and I would very much like to see that happen as well.Oh, I misunderstood. If you want to go that route, why not use "vscale" to imply that the constants are also scaled? So something like: <vscale x 2 x i32> <i32 0, i32 2> would imply: <i32 0, i32 2, i32 4, i32 6, ...> I'm not sure how you'd formalize that yet though. And reverse would be a little weird, but I suppose it would be a little weird in your proposal too. Actually, after writing this out, I think named constants would be much more clear. There's no room for misinterpretation if they're named.
Chris Tetreault via llvm-dev
2021-Jan-21 20:40 UTC
[llvm-dev] [RFC] Introduce a new stepvector operation
I think the idea I've proposed is cleaner and more general than just inferring it by the type having vscale. It would be a shame if it were limited to just scalable vectors. I listed a few use cases for fixed width vectors as well. I'm not proposing this as a cure-all for shufflevector, it's just a more general version of the proposed stepvector that is useful in more cases. My personal opinion is that all of these "unary shuffles" should be a different instruction, but that's outside the scope of this RFC. As for it being unclear: complicated code is hard to read in general, so complicated pattern vectors will be as well. But to me, "stepvector" only has meaning if I am familiar with the use case and/or read the docs. <0, 1, ...> is obvious to anybody who's taken any sort of higher math class. Thanks, Christopher Tetreault -----Original Message----- From: Cameron McInally <cameron.mcinally at nyu.edu> Sent: Thursday, January 21, 2021 12:17 PM To: Chris Tetreault <ctetreau at quicinc.com> Cc: David Sherwood <David.Sherwood at arm.com>; llvm-dev at lists.llvm.org Subject: [EXT] Re: [llvm-dev] [RFC] Introduce a new stepvector operation On Thu, Jan 21, 2021 at 3:02 PM Chris Tetreault <ctetreau at quicinc.com> wrote:> > Just to be clear: I think adding another magic named literal would be a mistake. I think being able to write <n, m, ...> would be great, and would obviate the need for many of the named shuffles in Eli's RFC as well as serve in David's use case. > > Thanks, > Christopher Tetreault > > -----Original Message----- > From: Cameron McInally <cameron.mcinally at nyu.edu> > Sent: Thursday, January 21, 2021 11:39 AM > To: Chris Tetreault <ctetreau at quicinc.com> > Cc: David Sherwood <David.Sherwood at arm.com>; llvm-dev at lists.llvm.org > Subject: [EXT] Re: [llvm-dev] [RFC] Introduce a new stepvector > operation > > On Wed, Jan 20, 2021 at 4:33 PM Chris Tetreault via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > > David, > > > > > > > > Thanks for writing this up. I’d just like to speak to some concerns I have regarding shufflevector. As many of us know, shufflevector takes two vectors and a constant vector of i32, and does stuff. The constant shuffle mask can be scalable or fixed width. The shuffle mask is supposed to be an arbitrary constant vector, however for scalable vectors only zeroinitializer or undef are accepted. There are reasonable technical reasons for this state of affairs, but it reveals an issue: we don't really handle constant scalable vectors very well. Surely there are other similar issues throughout the codebase, but this is one I struggle with regularly so it sticks out in my mind. > > > > > > > > However, we probably want to be able to use the stepvector in shufflevector. For instance, if we had a stepvector literal, then we could implement vector concatenation in terms of shufflevector: > > > > > > > > %a_concat_b = shufflevector <4 x i16> %a, <4 x i16> %b, <8 x i32> > > stepvector > > > > > > > > In fact, a lot of useful shuffles can be implemented in terms of stepvector multiplied or added to some constants. Pulling from Eli's list in https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.llvm.org_pipermail_llvm-2Ddev_2020-2DJanuary_138762.html&d=DwIGaQ&c=slrrB7dE8n7gBJbeO0g-IQ&r=O_4M49EtSpZ_-BQYeigzGv0P4__noMcSu2RYEjS1vKs&m=oUBhDwLSPYkEyfN2XuWJkl4sT_RMHdQH_vdinEiQb9E&s=jHVhfm-2eFcDjkeFYlDs9RnjI6Vfx5ZgKADkzhRJQ3I&e= , I can see: > > > > > > > > “ > > > > %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x > > i32> %v2, SHUFFLE_NAME > > > > > > > > SHUFFLE_NAME can be one of the following (with examples of the equivalent <4 x i32> shuffles): > > > > concat - Concatenate the two operands (<0, 1, 2, 3, 4, 5, 6, 7>) > > -> see above > > > > split_low - Return the low half of the first operand (<0, 1>) -> > > stepvector of type <vscale x n/2 x i32> > > > > split_high - Return the high half of the first operand (<2, 3>) > > -> (stepvector + splat(n/2)) of type <vscale x n/2 x i32> > > > > zip_low - Zip together the low halves of the two operands (<0, 4, > > 1, 5>) > > > > zip_high - Zip together the high halves of the two operands (<2, > > 6, 3, 7>) > > > > unzip_even - Unzip the even elements of the two operands (<0, 2, > > 4, > > 6>) (stepvector + stepvector) of type <vscale x n x i32> > > > > unzip_odd - unzip the odd elements of the two operands (<1, 3, 5, > > 7>) (stepvector + stepvector + splat(1)) of type <vscale x n x i32> > > > > “ > > > > > > > > Unfortunately, all of these cannot be done because shufflevector only supports scalable undef or zeroinitializer. In order to support these cases, we would need to extend shufflevector to support stepvector (for concat), and arbitrary constant expressions for the rest. Supporting stepvector might not be so hard with the current scheme: if the shuffle is scalable, and the mask is <0, 1, ..., n - 1>, then the input mask was a scalable stepvector. However, I think this illustrates my proposal: vector pattern literals. They could look like this in IR: > > > > > > > > <0, 1, ...> ; stepvector > > > > > > > > This is also more flexible, because it enables lots of other scalable vector literals: > > > > > > > > <7, 7, ...> ; splat(7) without writing that horrid > > insertelement/shufflevector thing > > > > <0, 2, ...> ; unzip_even mask > > > > <1, 3, ...> ; unzip_odd mask > > > > > > > > The implementation for shufflevector would be straightforward because the mask for the two currently supported cases of zeroinitializer and undef (<0, 0, ...> <=> zeroinitializer and <undef, undef, ...> <=> undef) already follow the proposed scheme. This could also have the side benefits of making some IR easier to read (for very wide vectors, the fixed width stepvector could be more than 80 columns wide), and might result in efficiency gains in the compiler (don't need to walk a very wide vector to see if it is a stepvector; can just canonicalize <0, 1, 2, 3, 4, 5, 6, 7, ..., 2048> once to ConstantPatternVector(0, 1)). > > > > > > > > Sorry for rambling. I think I’ve personally come around to the idea that a constant would be good. However, a more flexible constant would be best if we’re going to use it to add a bunch of special cases to the codebase. Most special cases for scalable undef and zeroinitializer can be replaced with equivalent code that also handles vector pattern literals. > > > > > > > > Thanks, > > > > Christopher Tetreault > > Glad to hear you've come around, Chris. Named shuffle constants are a good compromise, and I would very much like to see that happen as well.Oh, I misunderstood. If you want to go that route, why not use "vscale" to imply that the constants are also scaled? So something like: <vscale x 2 x i32> <i32 0, i32 2> would imply: <i32 0, i32 2, i32 4, i32 6, ...> I'm not sure how you'd formalize that yet though. And reverse would be a little weird, but I suppose it would be a little weird in your proposal too. Actually, after writing this out, I think named constants would be much more clear. There's no room for misinterpretation if they're named.