Eli Friedman via llvm-dev
2020-Jan-30 00:48 UTC
[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)
Currently, for scalable vectors, only splat shuffles are allowed; we're considering allowing more different kinds of shuffles. The issue is, essentially, that a shuffle mask is a simple list of integers, and that isn't enough to express a scalable operation. For example, concatenating two fixed-length vectors currently looks like this: shufflevector <2 x i32> %v1, <2 x i32> %v2, <4 x i32> <i32 0, i32 1, i32 2, i32 3> (Note that despite the syntax, the mask is really just a list of constant integers, not a general constant expression.) There isn't any obvious way to extend this to variable shuffles. The mask would need to have "vscale * N" elements, and it's impossible to write out all the elements of a list of unknown size; it would need to be generated with some sort of formula. The motivation here is to express various common patterns that are useful for vectorization and lowering SVE intrinsics, in a target-independent manner: 1.Unzipping/unpacking interleaved data 2.Zipping/packing interleaved data 3.Reversing a vector 4.Concatenating two vectors, or splitting a vector into halves. This isn't a comprehensive list of all shuffles we might want, but it's a reasonable starting point. The way the proposal is written, it should be easy to extend if we add more shuffles. Proposed IR syntax: %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x i32> %v2, SHUFFLE_NAME SHUFFLE_NAME can be one of the following (with examples of the equivalent <4 x i32> shuffles): splat - Splat element 0 of the first operand. (<0, 0, 0, 0>) reverse - Reverse the elements of the first operand (<3, 2, 1, 0>) concat - Concatenate the two operands (<0, 1, 2, 3, 4, 5, 6, 7>) split_low - Return the low half of the first operand (<0, 1>) split_high - Return the high half of the first operand (<2, 3>) zip_low - Zip together the low halves of the two operands (<0, 4, 1, 5>) zip_high - Zip together the high halves of the two operands (<2, 6, 3, 7>) unzip_even - Unzip the even elements of the two operands (<0, 2, 4, 6>) unzip_odd - unzip the odd elements of the two operands (<1, 3, 5, 7>) On SVE targets, all of these shuffles can be lowered to a single instruction for vector types which fit in a single register. I expect that other scalable vector instruction sets will also support these operations, since they're important for many use cases. (If we end up in some weird situation where it's necessary, we could expand a shuffle into an extractelement/insertelement loop, similar to what we do for unsupported fixed-length shuffles. But the vectorizer would probably avoid generating unsupported shuffles, so it's unlikely to come up in practice.) In C++, I expect to represent this list as an enum, and then wrap up the entire description of a fixed or scalable shuffle into a class "ShuffleMask". This would allow checking whether an operation matches one of the above patterns, and can be converted to the existing ArrayRef<int> for fixed shuffles. ShuffleVectorInst::getShuffleMask would then return a ShuffleMask, I think. Then we could add an alternate API getFixedShuffleMask() that only works for fixed shuffles, and just returns the fixed mask as an ArrayRef<int>. I'm working on refactoring the existing shufflevector handling in LLVM to allow making these changes. See https://reviews.llvm.org/D72467 . I haven't tried implementing this proposal yet, though. Alternatives: Instead of extending shufflevector, we could introduce a dedicated intrinsic for each common shuffle. This is less readable, and makes it harder to leverage existing code that reasons about shuffles. But it would mean fewer changes to existing code. We could make shufflevector take a variable shuffle mask operand (any Value*). This has been proposed before. This introduces a lot of complexity: the general case is hard to lower on most targets, and it becomes harder to pattern-match the special cases we actually care about. (It's possible we want to expose this as a target-independent intrinsic at some point, but that wouldn't really serve as a substitute for what's described here.) We could come up with some way to represent a formula for generating a shuffle mask, instead of only allowing specific, known, shuffles. I'm not sure how to do that in a way that's both straightforward to reason about, and covers all the cases we care about. Or also along these lines, we could specify a shuffle mask as a tree of operations: for example, "zip(first_vector, reverse(second_vector))". We can add more shuffles to the list. There are a few SVE shuffle instructions which are not equivalent to any of the basic operations I've listed: ext and trn. And it's possible to construct other shuffles out of sequences of multiple instructions. (Some of the additional shuffles we could specify would require an integer parameter, or an integer parameter list; it should be possible to support that without any major changes to the proposal.) -Eli
Nicolai Hähnle via llvm-dev
2020-Jan-30 08:21 UTC
[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)
Hi Eli, On Thu, Jan 30, 2020 at 1:48 AM Eli Friedman via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Proposed IR syntax: > > %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x i32> %v2, SHUFFLE_NAME > > SHUFFLE_NAME can be one of the following (with examples of the equivalent <4 x i32> shuffles): > splat - Splat element 0 of the first operand. (<0, 0, 0, 0>) > reverse - Reverse the elements of the first operand (<3, 2, 1, 0>) > concat - Concatenate the two operands (<0, 1, 2, 3, 4, 5, 6, 7>) > split_low - Return the low half of the first operand (<0, 1>) > split_high - Return the high half of the first operand (<2, 3>) > zip_low - Zip together the low halves of the two operands (<0, 4, 1, 5>) > zip_high - Zip together the high halves of the two operands (<2, 6, 3, 7>) > unzip_even - Unzip the even elements of the two operands (<0, 2, 4, 6>) > unzip_odd - unzip the odd elements of the two operands (<1, 3, 5, 7>)This fixed list of shuffles makes me uncomfortable, and I wonder if there isn't a much simpler solution to the problem. Specifically, allow the IR form: %result = shufflevector <vscale x n x TY> %v1, <vscale x n x TY> %v2, <m x i32> <mask> yielding a result of type <vscale x m x TY>. (The <mask> could still just be a constant list of integers, i.e. this doesn't require a relaxation to arbitrary Value* masks.) I believe that all the shuffle names you listed above could fit into that scheme, simply by plugging in the mask that you've already written in that list. In other words, the mask isn't vscale, is still represent as just a straightforward list of constant, and it is applied equally to each "vscale part" of the operand vectors. This is conceptually analogous to how `select` on vectors can have either an <n x i1> selection operand to operate as a fully vector select, or an i1 selection operand that is applied equally to each part of the operand vectors. Maybe I'm misunderstanding the intended semantics of your list of shuffles, in which case, please enlighten me :) Cheers, Nicolai -- Lerne, wie die Welt wirklich ist, aber vergiss niemals, wie sie sein sollte.
Renato Golin via llvm-dev
2020-Jan-30 08:56 UTC
[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)
On Thu, 30 Jan 2020 at 08:22, Nicolai Hähnle via llvm-dev <llvm-dev at lists.llvm.org> wrote:> This fixed list of shuffles makes me uncomfortable, and I wonder if > there isn't a much simpler solution to the problem. Specifically, > allow the IR form: > > %result = shufflevector <vscale x n x TY> %v1, <vscale x n x TY> %v2, > <m x i32> <mask> > > yielding a result of type <vscale x m x TY>. (The <mask> could still > just be a constant list of integers, i.e. this doesn't require a > relaxation to arbitrary Value* masks.)Back when Arm was proposing the SVE extensions, another proposal was to use SCEV like patterns. I think they're all valid proposals, but implementation wise, they can get really complicated. Masks and expressions will have to be pattern-matched against target-specific instructions when lowering, and if we're not going to be generating the most unusual patterns to begin with (front-end/middle-end creating them), then I think a fixed list to begin with is quite sensible. For example, I wouldn't want to let the vectoriser generate any random pattern in the shuffle if I know that there is no valid instruction in the back-end that can cope with that, and I'll end up with under-performing code. A fixed list will also allow us to build the infrastructure first, with one less problem to handle. After we're sure it works well, we can extend to different patterns. For example, we can add to SHUFFLE-NAME things like "mask" and "scev" and "expr", and then add whatever to the end of it. In those edge cases, it will be up to the back-end to legalise that in a meaningful, and hopefully performing, way. cheers, --renato
Richard Sandiford via llvm-dev
2020-Jan-30 11:35 UTC
[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)
Eli Friedman via llvm-dev <llvm-dev at lists.llvm.org> writes:> [...] > We can add more shuffles to the list. There are a few SVE shuffle > instructions which are not equivalent to any of the basic operations > I've listed: ext and trn.I realise this wasn't supposed to be an exhaustive list, but FWIW: revb, revh and revw are also useful for reversing each sequence of N elements (as opposed to reversing the whole vector). Thanks, Richard> And it's possible to construct other shuffles out of sequences of > multiple instructions. (Some of the additional shuffles we could > specify would require an integer parameter, or an integer parameter > list; it should be possible to support that without any major changes > to the proposal.) > > -Eli > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Eli Friedman via llvm-dev
2020-Jan-30 21:39 UTC
[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)
> -----Original Message----- > From: Richard Sandiford <richard.sandiford at arm.com> > Sent: Thursday, January 30, 2020 3:36 AM > To: Eli Friedman via llvm-dev <llvm-dev at lists.llvm.org> > Cc: Eli Friedman <efriedma at quicinc.com> > Subject: [EXT] Re: [llvm-dev] [RFC] Extending shufflevector for vscale vectors > (SVE etc.) > > Eli Friedman via llvm-dev <llvm-dev at lists.llvm.org> writes: > > [...] > > We can add more shuffles to the list. There are a few SVE shuffle > > instructions which are not equivalent to any of the basic operations > > I've listed: ext and trn. > > I realise this wasn't supposed to be an exhaustive list, but FWIW: > revb, revh and revw are also useful for reversing each sequence > of N elements (as opposed to reversing the whole vector).revb is technically already exposed as a target-independent intrinsic (llvm.bswap), but yes, I missed those. -Eli
Chris Lattner via llvm-dev
2020-Feb-06 00:01 UTC
[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)
On Jan 29, 2020, at 4:48 PM, Eli Friedman via llvm-dev <llvm-dev at lists.llvm.org> wrote:> > Currently, for scalable vectors, only splat shuffles are allowed; we're considering allowing more different kinds of shuffles. The issue is, essentially, that a shuffle mask is a simple list of integers, and that isn't enough to express a scalable operation. For example, concatenating two fixed-length vectors currently looks like this: > > Proposed IR syntax: > > %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x i32> %v2, SHUFFLE_NAME > > Alternatives: > > Instead of extending shufflevector, we could introduce a dedicated intrinsic for each common shuffle. This is less readable, and makes it harder to leverage existing code that reasons about shuffles. But it would mean fewer changes to existing code.Hi Eli, Did you consider a design point between these two extremes? You could introduce one new instruction, something like “fixed shuffle vector” that takes two vectors and an enum. That would keep the structurally (and representationally) different cases as separate instructions, without creating a new instruction per fixed shuffle kind. Relatedly, how do you foresee canonicalization (in instcombine and inst selection) working for these? If not for compatibility, it would make sense to canonicalize from shuffle vector to the ‘fixed’ formats, but doing that would probably introduce a bunch of regressions for various targets. -Chris
Evandro Menezes via llvm-dev
2020-Feb-06 15:51 UTC
[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)
> On Feb 5, 2020, at 18:01, Chris Lattner via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > On Jan 29, 2020, at 4:48 PM, Eli Friedman via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> Currently, for scalable vectors, only splat shuffles are allowed; we're considering allowing more different kinds of shuffles. The issue is, essentially, that a shuffle mask is a simple list of integers, and that isn't enough to express a scalable operation. For example, concatenating two fixed-length vectors currently looks like this: >> >> Proposed IR syntax: >> >> %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x i32> %v2, SHUFFLE_NAME >> >> Alternatives: >> >> Instead of extending shufflevector, we could introduce a dedicated intrinsic for each common shuffle. This is less readable, and makes it harder to leverage existing code that reasons about shuffles. But it would mean fewer changes to existing code. > > Hi Eli, > > Did you consider a design point between these two extremes? You could introduce one new instruction, something like “fixed shuffle vector” that takes two vectors and an enum. That would keep the structurally (and representationally) different cases as separate instructions, without creating a new instruction per fixed shuffle kind. > > Relatedly, how do you foresee canonicalization (in instcombine and inst selection) working for these? If not for compatibility, it would make sense to canonicalize from shuffle vector to the ‘fixed’ formats, but doing that would probably introduce a bunch of regressions for various targets. > > -ChrisThis is a fair point. It seems to me that the motivation is to make the shuffling of vectors generic enough to support both fixed and scalable vectors, which is a sensible and elegant approach. However, it may wreak havoc throughout parts of the source base and/or regress several targets until the offending code is identified and properly massaged, which would take considerable work. Or not. Though the risk is likely, it would be interesting to get at least a rough assessment of the damage, which can be achieved with a rough prototyping of this approach. Using intrinsics has its merits too, especially in its simpler form suggested by Chris. Though it would be less generic and repeating the existing shuffles for fixed length, it would probably be far less risky, particularly for being less invasive. If anything, it would be easier to prototype with this approach. If my assessment is near the mark, then, strategically, the latter approach seems to allow making some progress sooner. Once everyone gets a better understanding of shuffling scalable vectors, then both approaches could be reconsidered again, but informed by this preliminary implementation. Back to y'all. __ Evandro Menezes ◊ SiFive ◊ Austin, TX
Eli Friedman via llvm-dev
2020-Feb-07 20:39 UTC
[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)
> -----Original Message----- > From: Chris Lattner <clattner at nondot.org> > Sent: Wednesday, February 5, 2020 4:02 PM > To: Eli Friedman <efriedma at quicinc.com> > Cc: llvm-dev <llvm-dev at lists.llvm.org> > Subject: [EXT] Re: [llvm-dev] [RFC] Extending shufflevector for vscale vectors > (SVE etc.) > > On Jan 29, 2020, at 4:48 PM, Eli Friedman via llvm-dev <llvm- > dev at lists.llvm.org> wrote: > > > > Currently, for scalable vectors, only splat shuffles are allowed; we're > considering allowing more different kinds of shuffles. The issue is, > essentially, that a shuffle mask is a simple list of integers, and that isn't > enough to express a scalable operation. For example, concatenating two > fixed-length vectors currently looks like this: > > > > Proposed IR syntax: > > > > %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x i32> %v2, > SHUFFLE_NAME > > > > Alternatives: > > > > Instead of extending shufflevector, we could introduce a dedicated > intrinsic for each common shuffle. This is less readable, and makes it harder > to leverage existing code that reasons about shuffles. But it would mean > fewer changes to existing code. > > Hi Eli, > > Did you consider a design point between these two extremes? You could > introduce one new instruction, something like “fixed shuffle vector” that > takes two vectors and an enum. That would keep the structurally (and > representationally) different cases as separate instructions, without creating > a new instruction per fixed shuffle kind.Well, there are sort of two forms of this. I could add a new instruction, or I could add a new intrinsic (maybe using a metadata string to specify the shuffle). An instruction is a ton of boilerplate. And an intrinsic means we don't get shufflevector constant expressions, which are useful for optimization. Either way, it's a bunch of extra work if we intend to eventually unify the two. I don't see any scenario under which we don't want to eventually unify them. The operations I'm adding are semantically the same as the equivalent fixed-width shuffles; we just can't represent the shuffle masks the same way. And I think if we do end up changing the representation of scalable shufflevectors later, we'll be able to autoupgrade the existing ones. I think I can keep the initial patch relatively small if we wrap the abstract notion of a "ShuffleMask", which is either a fixed shuffle or a named scalable shuffle, in a C++ class. And then we can let optimizations that expect fixed shuffles just convert that to the expected ArrayRef<int>.> Relatedly, how do you foresee canonicalization (in instcombine and inst > selection) working for these? If not for compatibility, it would make sense to > canonicalize from shuffle vector to the ‘fixed’ formats, but doing that would > probably introduce a bunch of regressions for various targets.I'm thinking that we don't use the new named shuffles for fixed-width shuffles at the IR level. Instead, we add helpers to ShuffleVectorInst that match either a scalable shuffle, or an equivalent fixed shuffle. So code that wants to handle both can pretend they're canonicalized, and code that handles fixed shuffles won't be disrupted. In SelectionDAG, shuffles aren't really unified the same way they are in IR. I think we map onto existing operations where they make sense (CONCAT_VECTORS and EXTRACT_SUBVECTOR). For scalable zip/unzip, I haven't really thought deeply about it; it probably makes sense to change SHUFFLE_VECTOR's shuffle mask to use ShuffleMask like IR, but that probably doesn't have a big impact either way. -Eli
Possibly Parallel Threads
- [RFC] Extending shufflevector for vscale vectors (SVE etc.)
- [RFC] Extending shufflevector for vscale vectors (SVE etc.)
- [RFC] Extending shufflevector for vscale vectors (SVE etc.)
- IR canonicalization: shufflevector or vector trunc?
- IR canonicalization: shufflevector or vector trunc?