thr3ads.net - llvm dev - [llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.) [Feb 2020]

If this information is useful, please help other people find it:
Share via:

Eli Friedman via llvm-dev

2020-Jan-30 00:48 UTC

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

Currently, for scalable vectors, only splat shuffles are allowed; we're
considering allowing more different kinds of shuffles.  The issue is,
essentially, that a shuffle mask is a simple list of integers, and that
isn't enough to express a scalable operation.  For example, concatenating
two fixed-length vectors currently looks like this:

shufflevector <2 x i32> %v1, <2 x i32> %v2, <4 x i32> <i32
0, i32 1, i32 2, i32 3>

(Note that despite the syntax, the mask is really just a list of constant
integers, not a general constant expression.)

There isn't any obvious way to extend this to variable shuffles. The mask
would need to have "vscale * N" elements, and it's impossible to
write out all the elements of a list of unknown size; it would need to be
generated with some sort of formula.


The motivation here is to express various common patterns that are useful for
vectorization and lowering SVE intrinsics, in a target-independent manner:

1.Unzipping/unpacking interleaved data
2.Zipping/packing interleaved data
3.Reversing a vector
4.Concatenating two vectors, or splitting a vector into halves.

This isn't a comprehensive list of all shuffles we might want, but it's
a reasonable starting point.  The way the proposal is written, it should be easy
to extend if we add more shuffles.


Proposed IR syntax:

%result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x i32>
%v2, SHUFFLE_NAME

SHUFFLE_NAME can be one of the following (with examples of the equivalent <4
x i32> shuffles):
splat - Splat element 0 of the first operand. (<0, 0, 0, 0>)
reverse - Reverse the elements of the first operand (<3, 2, 1, 0>)
concat - Concatenate the two operands (<0, 1, 2, 3, 4, 5, 6, 7>)
split_low - Return the low half of the first operand (<0, 1>)
split_high - Return the high half of the first operand (<2, 3>)
zip_low - Zip together the low halves of the two operands (<0, 4, 1, 5>)
zip_high - Zip together the high halves of the two operands (<2, 6, 3, 7>)
unzip_even - Unzip the even elements of the two operands (<0, 2, 4, 6>)
unzip_odd - unzip the odd elements of the two operands (<1, 3, 5, 7>)

On SVE targets, all of these shuffles can be lowered to a single instruction for
vector types which fit in a single register.  I expect that other scalable
vector instruction sets will also support these operations, since they're
important for many use cases.  (If we end up in some weird situation where
it's necessary, we could expand a shuffle into an
extractelement/insertelement loop, similar to what we do for unsupported
fixed-length shuffles. But the vectorizer would probably avoid generating
unsupported shuffles, so it's unlikely to come up in practice.)

In C++, I expect to represent this list as an enum, and then wrap up the entire
description of a fixed or scalable shuffle into a class "ShuffleMask".
This would allow checking whether an operation matches one of the above
patterns, and can be converted to the existing ArrayRef<int> for fixed
shuffles.  ShuffleVectorInst::getShuffleMask would then return a ShuffleMask, I
think. Then we could add an alternate API getFixedShuffleMask() that only works
for fixed shuffles, and just returns the fixed mask as an ArrayRef<int>.

I'm working on refactoring the existing shufflevector handling in LLVM to
allow making these changes.  See https://reviews.llvm.org/D72467 .  I
haven't tried implementing this proposal yet, though.


Alternatives:

Instead of extending shufflevector, we could introduce a dedicated intrinsic for
each common shuffle.  This is less readable, and makes it harder to leverage
existing code that reasons about shuffles.  But it would mean fewer changes to
existing code.

We could make shufflevector take a variable shuffle mask operand (any Value*). 
This has been proposed before.  This introduces a lot of complexity: the general
case is hard to lower on most targets, and it becomes harder to pattern-match
the special cases we actually care about.  (It's possible we want to expose
this as a target-independent intrinsic at some point, but that wouldn't
really serve as a substitute for what's described here.)

We could come up with some way to represent a formula for generating a shuffle
mask, instead of only allowing specific, known, shuffles.  I'm not sure how
to do that in a way that's both straightforward to reason about, and covers
all the cases we care about.  Or also along these lines, we could specify a
shuffle mask as a tree  of operations: for example, "zip(first_vector,
reverse(second_vector))".

We can add more shuffles to the list.  There are a few SVE shuffle instructions
which are not equivalent to any of the basic operations I've listed: ext and
trn. And it's possible to construct other shuffles out of sequences of
multiple instructions.  (Some of the additional shuffles we could specify would
require an integer parameter, or an integer parameter list; it should be
possible to support that without any major changes to the proposal.)

-Eli

Nicolai Hähnle via llvm-dev

2020-Jan-30 08:21 UTC

head link

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

Hi Eli,

On Thu, Jan 30, 2020 at 1:48 AM Eli Friedman via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> Proposed IR syntax:
>
> %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x
i32> %v2, SHUFFLE_NAME
>
> SHUFFLE_NAME can be one of the following (with examples of the equivalent
<4 x i32> shuffles):
> splat - Splat element 0 of the first operand. (<0, 0, 0, 0>)
> reverse - Reverse the elements of the first operand (<3, 2, 1, 0>)
> concat - Concatenate the two operands (<0, 1, 2, 3, 4, 5, 6, 7>)
> split_low - Return the low half of the first operand (<0, 1>)
> split_high - Return the high half of the first operand (<2, 3>)
> zip_low - Zip together the low halves of the two operands (<0, 4, 1,
5>)
> zip_high - Zip together the high halves of the two operands (<2, 6, 3,
7>)
> unzip_even - Unzip the even elements of the two operands (<0, 2, 4,
6>)
> unzip_odd - unzip the odd elements of the two operands (<1, 3, 5, 7>)
This fixed list of shuffles makes me uncomfortable, and I wonder if
there isn't a much simpler solution to the problem. Specifically,
allow the IR form:

%result = shufflevector <vscale x n x TY> %v1, <vscale x n x TY>
%v2,
<m x i32> <mask>

yielding a result of type <vscale x m x TY>. (The <mask> could still
just be a constant list of integers, i.e. this doesn't require a
relaxation to arbitrary Value* masks.)

I believe that all the shuffle names you listed above could fit into
that scheme, simply by plugging in the mask that you've already
written in that list. In other words, the mask isn't vscale, is still
represent as just a straightforward list of constant, and it is
applied equally to each "vscale part" of the operand vectors. This is
conceptually analogous to how `select` on vectors can have either an
<n x i1> selection operand to operate as a fully vector select, or an
i1 selection operand that is applied equally to each part of the
operand vectors.

Maybe I'm misunderstanding the intended semantics of your list of
shuffles, in which case, please enlighten me :)

Cheers,
Nicolai

-- 
Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.

Renato Golin via llvm-dev

2020-Jan-30 08:56 UTC

head link

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

On Thu, 30 Jan 2020 at 08:22, Nicolai Hähnle via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> This fixed list of shuffles makes me uncomfortable, and I wonder if
> there isn't a much simpler solution to the problem. Specifically,
> allow the IR form:
>
> %result = shufflevector <vscale x n x TY> %v1, <vscale x n x
TY> %v2,
> <m x i32> <mask>
>
> yielding a result of type <vscale x m x TY>. (The <mask> could
still
> just be a constant list of integers, i.e. this doesn't require a
> relaxation to arbitrary Value* masks.)
Back when Arm was proposing the SVE extensions, another proposal was
to use SCEV like patterns.

I think they're all valid proposals, but implementation wise, they can
get really complicated.

Masks and expressions will have to be pattern-matched against
target-specific instructions when lowering, and if we're not going to
be generating the most unusual patterns to begin with
(front-end/middle-end creating them), then I think a fixed list to
begin with is quite sensible.

For example, I wouldn't want to let the vectoriser generate any random
pattern in the shuffle if I know that there is no valid instruction in
the back-end that can cope with that, and I'll end up with
under-performing code.

A fixed list will also allow us to build the infrastructure first,
with one less problem to handle. After we're sure it works well, we
can extend to different patterns.

For example, we can add to SHUFFLE-NAME things like "mask" and
"scev"
and "expr", and then add whatever to the end of it. In those edge
cases, it will be up to the back-end to legalise that in a meaningful,
and hopefully performing, way.

cheers,
--renato

Richard Sandiford via llvm-dev

2020-Jan-30 11:35 UTC

head link

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

Eli Friedman via llvm-dev <llvm-dev at lists.llvm.org>
writes:> [...]
> We can add more shuffles to the list.  There are a few SVE shuffle
> instructions which are not equivalent to any of the basic operations
> I've listed: ext and trn.
I realise this wasn't supposed to be an exhaustive list, but FWIW:
revb, revh and revw are also useful for reversing each sequence
of N elements (as opposed to reversing the whole vector).

Thanks,
Richard
> And it's possible to construct other shuffles out of sequences of
> multiple instructions.  (Some of the additional shuffles we could
> specify would require an integer parameter, or an integer parameter
> list; it should be possible to support that without any major changes
> to the proposal.)
>
> -Eli
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Eli Friedman via llvm-dev

2020-Jan-30 21:39 UTC

head link

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford at arm.com>
> Sent: Thursday, January 30, 2020 3:36 AM
> To: Eli Friedman via llvm-dev <llvm-dev at lists.llvm.org>
> Cc: Eli Friedman <efriedma at quicinc.com>
> Subject: [EXT] Re: [llvm-dev] [RFC] Extending shufflevector for vscale
vectors
> (SVE etc.)
>
> Eli Friedman via llvm-dev <llvm-dev at lists.llvm.org> writes:
> > [...]
> > We can add more shuffles to the list.  There are a few SVE shuffle
> > instructions which are not equivalent to any of the basic operations
> > I've listed: ext and trn.
>
> I realise this wasn't supposed to be an exhaustive list, but FWIW:
> revb, revh and revw are also useful for reversing each sequence
> of N elements (as opposed to reversing the whole vector).
revb is technically already exposed as a target-independent intrinsic
(llvm.bswap), but yes, I missed those.

-Eli

Chris Lattner via llvm-dev

2020-Feb-06 00:01 UTC

head link

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

On Jan 29, 2020, at 4:48 PM, Eli Friedman via llvm-dev <llvm-dev at
lists.llvm.org> wrote:> 
> Currently, for scalable vectors, only splat shuffles are allowed; we're
considering allowing more different kinds of shuffles.  The issue is,
essentially, that a shuffle mask is a simple list of integers, and that
isn't enough to express a scalable operation.  For example, concatenating
two fixed-length vectors currently looks like this:
> 
> Proposed IR syntax:
> 
> %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x
i32> %v2, SHUFFLE_NAME
> 
> Alternatives:
> 
> Instead of extending shufflevector, we could introduce a dedicated
intrinsic for each common shuffle.  This is less readable, and makes it harder
to leverage existing code that reasons about shuffles.  But it would mean fewer
changes to existing code.
Hi Eli,

Did you consider a design point between these two extremes?  You could introduce
one new instruction, something like “fixed shuffle vector” that takes two
vectors and an enum.  That would keep the structurally (and representationally)
different cases as separate instructions, without creating a new instruction per
fixed shuffle kind.

Relatedly, how do you foresee canonicalization (in instcombine and inst
selection) working for these?  If not for compatibility, it would make sense to
canonicalize from shuffle vector to the ‘fixed’ formats, but doing that would
probably introduce a bunch of regressions for various targets.

-Chris

Evandro Menezes via llvm-dev

2020-Feb-06 15:51 UTC

head link

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

> On Feb 5, 2020, at 18:01, Chris Lattner via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> On Jan 29, 2020, at 4:48 PM, Eli Friedman via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>> 
>> Currently, for scalable vectors, only splat shuffles are allowed;
we're considering allowing more different kinds of shuffles.  The issue is,
essentially, that a shuffle mask is a simple list of integers, and that
isn't enough to express a scalable operation.  For example, concatenating
two fixed-length vectors currently looks like this:
>> 
>> Proposed IR syntax:
>> 
>> %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x
i32> %v2, SHUFFLE_NAME
>> 
>> Alternatives:
>> 
>> Instead of extending shufflevector, we could introduce a dedicated
intrinsic for each common shuffle.  This is less readable, and makes it harder
to leverage existing code that reasons about shuffles.  But it would mean fewer
changes to existing code.
> 
> Hi Eli,
> 
> Did you consider a design point between these two extremes?  You could
introduce one new instruction, something like “fixed shuffle vector” that takes
two vectors and an enum.  That would keep the structurally (and
representationally) different cases as separate instructions, without creating a
new instruction per fixed shuffle kind.
> 
> Relatedly, how do you foresee canonicalization (in instcombine and inst
selection) working for these?  If not for compatibility, it would make sense to
canonicalize from shuffle vector to the ‘fixed’ formats, but doing that would
probably introduce a bunch of regressions for various targets.
> 
> -Chris
This is a fair point.  It seems to me that the motivation is to make the
shuffling of vectors generic enough to support both fixed and scalable vectors,
which is a sensible and elegant approach.  However, it may wreak havoc
throughout parts of the source base and/or regress several targets until the
offending code is identified and properly massaged, which would take
considerable work.  Or not.  Though the risk is likely, it would be interesting
to get at least a rough assessment of the damage, which can be achieved with a
rough prototyping of this approach.

Using intrinsics has its merits too, especially in its simpler form suggested by
Chris.  Though it would be less generic and repeating the existing shuffles for
fixed length, it would probably be far less risky, particularly for being less
invasive.  If anything, it would be easier to prototype with this approach.

If my assessment is near the mark, then, strategically, the latter approach
seems to allow making some progress sooner.  Once everyone gets a better
understanding of shuffling scalable vectors, then both approaches could be
reconsidered again, but informed by this preliminary implementation.

Back to y'all.

__ 
Evandro Menezes ◊ SiFive ◊ Austin, TX

Eli Friedman via llvm-dev

2020-Feb-07 20:39 UTC

head link

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

> -----Original Message-----
> From: Chris Lattner <clattner at nondot.org>
> Sent: Wednesday, February 5, 2020 4:02 PM
> To: Eli Friedman <efriedma at quicinc.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [EXT] Re: [llvm-dev] [RFC] Extending shufflevector for vscale
vectors
> (SVE etc.)
>
> On Jan 29, 2020, at 4:48 PM, Eli Friedman via llvm-dev <llvm-
> dev at lists.llvm.org> wrote:
> >
> > Currently, for scalable vectors, only splat shuffles are allowed;
we're
> considering allowing more different kinds of shuffles.  The issue is,
> essentially, that a shuffle mask is a simple list of integers, and that
isn't
> enough to express a scalable operation.  For example, concatenating two
> fixed-length vectors currently looks like this:
> >
> > Proposed IR syntax:
> >
> > %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x
i32> %v2,
> SHUFFLE_NAME
> >
> > Alternatives:
> >
> > Instead of extending shufflevector, we could introduce a dedicated
> intrinsic for each common shuffle.  This is less readable, and makes it
harder
> to leverage existing code that reasons about shuffles.  But it would mean
> fewer changes to existing code.
>
> Hi Eli,
>
> Did you consider a design point between these two extremes?  You could
> introduce one new instruction, something like “fixed shuffle vector” that
> takes two vectors and an enum.  That would keep the structurally (and
> representationally) different cases as separate instructions, without
creating
> a new instruction per fixed shuffle kind.
Well, there are sort of two forms of this.  I could add a new instruction, or I
could add a new intrinsic (maybe using a metadata string to specify the
shuffle).  An instruction is a ton of boilerplate.  And an intrinsic means we
don't get shufflevector constant expressions, which are useful for
optimization.

Either way, it's a bunch of extra work if we intend to eventually unify the
two.  I don't see any scenario under which we don't want to eventually
unify them.  The operations I'm adding are semantically the same as the
equivalent fixed-width shuffles; we just can't represent the shuffle masks
the same way.  And I think if we do end up changing the representation of
scalable shufflevectors later, we'll be able to autoupgrade the existing
ones.

I think I can keep the initial patch relatively small if we wrap the abstract
notion of a "ShuffleMask", which is either a fixed shuffle or a named
scalable shuffle, in a C++ class.  And then we can let optimizations that expect
fixed shuffles just convert that to the expected ArrayRef<int>.
> Relatedly, how do you foresee canonicalization (in instcombine and inst
> selection) working for these?  If not for compatibility, it would make
sense to
> canonicalize from shuffle vector to the ‘fixed’ formats, but doing that
would
> probably introduce a bunch of regressions for various targets.
I'm thinking that we don't use the new named shuffles for fixed-width
shuffles at the IR level.  Instead, we add helpers to ShuffleVectorInst that
match either a scalable shuffle, or an equivalent fixed shuffle.  So code that
wants to handle both can pretend they're canonicalized, and code that
handles fixed shuffles won't be disrupted.

In SelectionDAG, shuffles aren't really unified the same way they are in IR.
I think we map onto existing operations where they make sense (CONCAT_VECTORS
and EXTRACT_SUBVECTOR).  For scalable zip/unzip, I haven't really thought
deeply about it; it probably makes sense to change SHUFFLE_VECTOR's shuffle
mask to use ShuffleMask like IR, but that probably doesn't have a big impact
either way.

-Eli

Reasonably Related Threads

Search for more maybe matching threads

llvm dev - Feb 2020 - [RFC] Extending shufflevector for vscale vectors (SVE etc.)

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

Reasonably Related Threads