thr3ads.net - llvm dev - [llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.) [Feb 2020]

If this information is useful, please help other people find it:
Share via:

Eli Friedman via llvm-dev

2020-Feb-07 20:39 UTC

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

> -----Original Message-----
> From: Chris Lattner <clattner at nondot.org>
> Sent: Wednesday, February 5, 2020 4:02 PM
> To: Eli Friedman <efriedma at quicinc.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [EXT] Re: [llvm-dev] [RFC] Extending shufflevector for vscale
vectors
> (SVE etc.)
>
> On Jan 29, 2020, at 4:48 PM, Eli Friedman via llvm-dev <llvm-
> dev at lists.llvm.org> wrote:
> >
> > Currently, for scalable vectors, only splat shuffles are allowed;
we're
> considering allowing more different kinds of shuffles.  The issue is,
> essentially, that a shuffle mask is a simple list of integers, and that
isn't
> enough to express a scalable operation.  For example, concatenating two
> fixed-length vectors currently looks like this:
> >
> > Proposed IR syntax:
> >
> > %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x
i32> %v2,
> SHUFFLE_NAME
> >
> > Alternatives:
> >
> > Instead of extending shufflevector, we could introduce a dedicated
> intrinsic for each common shuffle.  This is less readable, and makes it
harder
> to leverage existing code that reasons about shuffles.  But it would mean
> fewer changes to existing code.
>
> Hi Eli,
>
> Did you consider a design point between these two extremes?  You could
> introduce one new instruction, something like “fixed shuffle vector” that
> takes two vectors and an enum.  That would keep the structurally (and
> representationally) different cases as separate instructions, without
creating
> a new instruction per fixed shuffle kind.
Well, there are sort of two forms of this.  I could add a new instruction, or I
could add a new intrinsic (maybe using a metadata string to specify the
shuffle).  An instruction is a ton of boilerplate.  And an intrinsic means we
don't get shufflevector constant expressions, which are useful for
optimization.

Either way, it's a bunch of extra work if we intend to eventually unify the
two.  I don't see any scenario under which we don't want to eventually
unify them.  The operations I'm adding are semantically the same as the
equivalent fixed-width shuffles; we just can't represent the shuffle masks
the same way.  And I think if we do end up changing the representation of
scalable shufflevectors later, we'll be able to autoupgrade the existing
ones.

I think I can keep the initial patch relatively small if we wrap the abstract
notion of a "ShuffleMask", which is either a fixed shuffle or a named
scalable shuffle, in a C++ class.  And then we can let optimizations that expect
fixed shuffles just convert that to the expected ArrayRef<int>.
> Relatedly, how do you foresee canonicalization (in instcombine and inst
> selection) working for these?  If not for compatibility, it would make
sense to
> canonicalize from shuffle vector to the ‘fixed’ formats, but doing that
would
> probably introduce a bunch of regressions for various targets.
I'm thinking that we don't use the new named shuffles for fixed-width
shuffles at the IR level.  Instead, we add helpers to ShuffleVectorInst that
match either a scalable shuffle, or an equivalent fixed shuffle.  So code that
wants to handle both can pretend they're canonicalized, and code that
handles fixed shuffles won't be disrupted.

In SelectionDAG, shuffles aren't really unified the same way they are in IR.
I think we map onto existing operations where they make sense (CONCAT_VECTORS
and EXTRACT_SUBVECTOR).  For scalable zip/unzip, I haven't really thought
deeply about it; it probably makes sense to change SHUFFLE_VECTOR's shuffle
mask to use ShuffleMask like IR, but that probably doesn't have a big impact
either way.

-Eli

Chris Lattner via llvm-dev

2020-Feb-07 22:59 UTC

head link

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

> On Feb 7, 2020, at 12:39 PM, Eli Friedman <efriedma at quicinc.com>
wrote:
>> 
>> Hi Eli,
>> 
>> Did you consider a design point between these two extremes?  You could
>> introduce one new instruction, something like “fixed shuffle vector”
that
>> takes two vectors and an enum.  That would keep the structurally (and
>> representationally) different cases as separate instructions, without
creating
>> a new instruction per fixed shuffle kind.
> 
> Well, there are sort of two forms of this.  I could add a new instruction,
or I could add a new intrinsic (maybe using a metadata string to specify the
shuffle).  An instruction is a ton of boilerplate.  And an intrinsic means we
don't get shufflevector constant expressions, which are useful for
optimization.
Oh yes, I didn’t necessarily mean instruction, an intrinsic would make sense as
well.  What value do use see shuffle vector constant expressions provided?

In my opinion, LLVM constant expressions seem like a bad idea in general, and
I’d rather see them go away over time - as one example “constants that can trap”
often are mishandled, and pattern matching is slower and more complex due to
them.  In my mind, the a better representation would be something that directly
models what we need: global variable initializers can have relocations applied
to them, so they have a very specific grammar of constant expression that
represents this (symbol + constant, and symbol-symbol+constant on macho
targets).  The rest of the constant expr could be dropped, simplifying the
system.
> Either way, it's a bunch of extra work if we intend to eventually unify
the two.  I don't see any scenario under which we don't want to
eventually unify them.  The operations I'm adding are semantically the same
as the equivalent fixed-width shuffles; we just can't represent the shuffle
masks the same way.  And I think if we do end up changing the representation of
scalable shufflevectors later, we'll be able to autoupgrade the existing
ones.
It isn’t obvious to me that these need to be unified: we don’t have to a have a
single operation named “shuffle vector” that does all of the possible element
permutations.  For example, vperm on PPC Altivec supports data dependent shuffle
masks, and merging that into shuffle vector seems like a bad idea.

An alternate design could look like three things (again, ignoring intrinsic vs
instruction):

“Data dependent shuffle” would allow runtime shuffle masks.
  -> “fixed mask shuffle” would require a statically determined shuffle mask.
     -> “well known target-independent shuffle” would support specific
shuffles like zip, even/odd splits, splat, etc.
         -> “target specific shuffles” would be any number of special things
that are shuffle like.

By defining a tower of these, instcombine would canonicalize to the most
specific thing possible, and then targets and other transformations (e.g. at
isel time) would handle the one representation that maps onto the thing their
instruction can do.

The advantage of this approach is that the helper functions (e.g. the methods on
the instruction) can be specific to the use case, e.g. “fixed mask shuffle”
returns an ArrayRef<int> or whatever, and the “well known target
independent shuffle” operation could have a way to turn the enum into an
ArrayRef<int> for the case when you want to expand it out.

Again, to be clear, this is true regardless of whether they are intrinsics or
instructions. If they are instructions, these would be methods, intrinsics would
use global functions that assert on the opcode.

From a historical perspective, the ShuffleVector design was intended to allow
significant mid-level optimizations that merged and transformed shuffle masks. 
In practice though, instcombine (for example) has had to stay very "hands
off" with shuffles in general because it is too easy to produce something
that cannot be efficiently code generated.  If you canonicalize towards “well
known” shuffles, we could improve this, because we can expect targets to
efficiently handle the well known shuffles.

>> Relatedly, how do you foresee canonicalization (in instcombine and inst
>> selection) working for these?  If not for compatibility, it would make
sense to
>> canonicalize from shuffle vector to the ‘fixed’ formats, but doing that
would
>> probably introduce a bunch of regressions for various targets.
> 
> I'm thinking that we don't use the new named shuffles for
fixed-width shuffles at the IR level.
Is this out of conservatism because of the amount of existing code that works on
ShuffleVector?  While variable length vectors will always be special in some
ways, it seems useful to make their representation as consistent with static
length vectors as possible.

Represent even fixed length shuffles with an explicit representation seems like
it would make pattern matching the specific cases easier, makes the IR easier to
read, and provides a single canonical representation for each mask.  The generic
pattern matching stuff (e.g. PatternMatch.h, DAG ISel) could paper over the
representational differences as well.

-Chris

Eli Friedman via llvm-dev

2020-Feb-08 00:42 UTC

head link

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

> -----Original Message-----
> From: Chris Lattner <clattner at nondot.org>
> Sent: Friday, February 7, 2020 3:00 PM
> To: Eli Friedman <efriedma at quicinc.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [EXT] Re: [llvm-dev] [RFC] Extending shufflevector for vscale
vectors
> (SVE etc.)
>
> > On Feb 7, 2020, at 12:39 PM, Eli Friedman <efriedma at
quicinc.com> wrote:
> >>
> >> Hi Eli,
> >>
> >> Did you consider a design point between these two extremes?  You
could
> >> introduce one new instruction, something like “fixed shuffle
vector” that
> >> takes two vectors and an enum.  That would keep the structurally
(and
> >> representationally) different cases as separate instructions,
without
> creating
> >> a new instruction per fixed shuffle kind.
> >
> > Well, there are sort of two forms of this.  I could add a new
instruction, or I
> could add a new intrinsic (maybe using a metadata string to specify the
> shuffle).  An instruction is a ton of boilerplate.  And an intrinsic means
we
> don't get shufflevector constant expressions, which are useful for
> optimization.
>
> Oh yes, I didn’t necessarily mean instruction, an intrinsic would make
sense
> as well.  What value do use see shuffle vector constant expressions
> provided?
>
> In my opinion, LLVM constant expressions seem like a bad idea in general,
> and I’d rather see them go away over time - as one example “constants that
> can trap” often are mishandled, and pattern matching is slower and more
> complex due to them.  In my mind, the a better representation would be
> something that directly models what we need: global variable initializers
can
> have relocations applied to them, so they have a very specific grammar of
> constant expression that represents this (symbol + constant, and symbol-
> symbol+constant on macho targets).  The rest of the constant expr could be
> dropped, simplifying the system.
I think there are some practical advantages to having constant expressions; in
particular, we get automatic "CSE", which I think ends up being a
practical advantage in some cases.  And there are certain issues involving
SelectionDAG (in particular, cloning a constant into every basic block make
pattern-matching more effective, and reduces register pressure in some cases). 
And some of our cost modeling on IR sort of depends on the fact that constants
are not instructions.  I mean, we don't need constants; at an extreme we
could have a constantint instruction, like GlobalISel's G_CONSTANT.  But
it's not clear that would be helpful for midlevel optimizations.

For scalable vectors in particular, currently, the only way to express a vector
with where every element is a simple integer is using a splat shuffle
expression, and those often fold into some instruction.  Continuing to use
constant expressions for that will make that work more smoothly, I think.  That
could be solved in other ways, though (more fun code for CodeGenPrepare!), so
there isn't a hard requirement.

Some of this is what eventually led to the way we lower llvm.vscale: we have the
intrinsic, but we have a hack in codegenprepare to convert it to a GEP constant
expression.

Granted, there are disadvantages to constant expressions in function bodies: we
end up with more than one way to represent the same operation, it's hard to
remember to keep trapping constants in account, cost modeling can get confused
if it doesn't correctly account for an "expensive" constant.  And
the use of constant expressions in global variables is a complete mess, like you
mentioned.
> > Either way, it's a bunch of extra work if we intend to eventually
unify the
> two.  I don't see any scenario under which we don't want to
eventually unify
> them.  The operations I'm adding are semantically the same as the
equivalent
> fixed-width shuffles; we just can't represent the shuffle masks the
same
> way.  And I think if we do end up changing the representation of scalable
> shufflevectors later, we'll be able to autoupgrade the existing ones.
>
> It isn’t obvious to me that these need to be unified: we don’t have to a
have
> a single operation named “shuffle vector” that does all of the possible
> element permutations.  For example, vperm on PPC Altivec supports data
> dependent shuffle masks, and merging that into shuffle vector seems like a
> bad idea.
>
> An alternate design could look like three things (again, ignoring intrinsic
vs
> instruction):
>
> “Data dependent shuffle” would allow runtime shuffle masks.
>   -> “fixed mask shuffle” would require a statically determined shuffle
mask.
>      -> “well known target-independent shuffle” would support specific
> shuffles like zip, even/odd splits, splat, etc.
>          -> “target specific shuffles” would be any number of special
things that
> are shuffle like.
I'm not sure trying to form a strict hierarchy really works out.  The key
aspect that breaks this is the use of "undef" in shuffle masks.  So
there are actually multiple fixed shuffles of the same width which might count
as a "zip_lower", depending on the context.

I think the distinguishing factor here that means we want to integrate these
shuffles into the shufflevector instruction, vs. other sorts of shuffle-ish
operations, is that the shuffle pattern is known at compile-time.  Given that,
optimizations can reason about the value of various lanes, not just a black box.
Knowing a "vperm" is in fact a shuffle isn't very helpful.  All
you can say is that every output lane is equal to some input lane.  I guess you
could figure out *something* based on that, but it doesn't really make sense
to represent vperm, select, and shufflevector as the same instruction; the
operands are different.

We can force any sort of representation to "work" with enough
refactoring and helper methods, but I think my proposal ends up being the most
straightforward.
> From a historical perspective, the ShuffleVector design was intended to
allow
> significant mid-level optimizations that merged and transformed shuffle
> masks.  In practice though, instcombine (for example) has had to stay very
> "hands off" with shuffles in general because it is too easy to
produce
> something that cannot be efficiently code generated.  If you canonicalize
> towards “well known” shuffles, we could improve this, because we can
> expect targets to efficiently handle the well known shuffles.
We actually do a surprising amount here these days... and we're probably
going to expand it more using target-specific information (see discussion on
https://reviews.llvm.org/D73480 ).
> >> Relatedly, how do you foresee canonicalization (in instcombine and
inst
> >> selection) working for these?  If not for compatibility, it would
make sense
> to
> >> canonicalize from shuffle vector to the ‘fixed’ formats, but doing
that
> would
> >> probably introduce a bunch of regressions for various targets.
> >
> > I'm thinking that we don't use the new named shuffles for
fixed-width
> shuffles at the IR level.
>
> Is this out of conservatism because of the amount of existing code that
> works on ShuffleVector?  While variable length vectors will always be
special
> in some ways, it seems useful to make their representation as consistent
> with static length vectors as possible.
>
> Represent even fixed length shuffles with an explicit representation seems
> like it would make pattern matching the specific cases easier, makes the IR
> easier to read, and provides a single canonical representation for each
mask.
> The generic pattern matching stuff (e.g. PatternMatch.h, DAG ISel) could
> paper over the representational differences as well.
The other aspect here is the use of "undef" in fixed shuffles. 
Frontends usually don't do this, but some of our optimizations for shuffles
are built around replacing unused results of shuffles with "undef". 
We don't want to forbid transforming "<0, 4, 1, 5>" ->
"<0, undef, 1, 5>" because we decided the former was the named
shuffle "zip", and we want to allow transforms/analysis for zip-like
shuffles on "<0, undef, 1, 5>".  So I think it ends up being
more straightforward to have a helper for "is this a zip", as opposed
to a helper to transforming a "zip" to a fixed shuffle mask.

Independent of anything we do with scalable vectors, it might make sense in the
IR printer to add some sort of note for known shuffle patterns, so developers
don't have to figure out the meaning of the numerical sequences in IR dumps.

-Eli

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - Feb 2020 - [RFC] Extending shufflevector for vscale vectors (SVE etc.)

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

Reasonably Related Threads