Eli Friedman via llvm-dev
2020-Feb-08 00:42 UTC
[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)
> -----Original Message-----
> From: Chris Lattner <clattner at nondot.org>
> Sent: Friday, February 7, 2020 3:00 PM
> To: Eli Friedman <efriedma at quicinc.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [EXT] Re: [llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)
>
>> On Feb 7, 2020, at 12:39 PM, Eli Friedman <efriedma at quicinc.com> wrote:
>>>
>>> Hi Eli,
>>>
>>> Did you consider a design point between these two extremes? You could introduce one new instruction, something like "fixed shuffle vector", that takes two vectors and an enum. That would keep the structurally (and representationally) different cases as separate instructions, without creating a new instruction per fixed shuffle kind.
>>
>> Well, there are sort of two forms of this. I could add a new instruction, or I could add a new intrinsic (maybe using a metadata string to specify the shuffle). An instruction is a ton of boilerplate. And an intrinsic means we don't get shufflevector constant expressions, which are useful for optimization.
>
> Oh yes, I didn't necessarily mean an instruction; an intrinsic would make sense as well. What value do you see shuffle vector constant expressions providing?
>
> In my opinion, LLVM constant expressions seem like a bad idea in general, and I'd rather see them go away over time - as one example, "constants that can trap" are often mishandled, and pattern matching is slower and more complex due to them. In my mind, a better representation would be something that directly models what we need: global variable initializers can have relocations applied to them, so they have a very specific grammar of constant expression that represents this (symbol + constant, and symbol - symbol + constant on Mach-O targets).
> The rest of the constant expr could be dropped, simplifying the system.

I think there are some practical advantages to having constant expressions; in particular, we get automatic "CSE", which I think ends up being a practical advantage in some cases. And there are certain issues involving SelectionDAG (in particular, cloning a constant into every basic block makes pattern-matching more effective, and reduces register pressure in some cases). And some of our cost modeling on IR sort of depends on the fact that constants are not instructions. I mean, we don't need constants; at an extreme we could have a constantint instruction, like GlobalISel's G_CONSTANT. But it's not clear that would be helpful for midlevel optimizations.

For scalable vectors in particular, currently, the only way to express a vector where every element is a simple integer is using a splat shuffle expression, and those often fold into some instruction. Continuing to use constant expressions for that will make that work more smoothly, I think. That could be solved in other ways, though (more fun code for CodeGenPrepare!), so there isn't a hard requirement. Some of this is what eventually led to the way we lower llvm.vscale: we have the intrinsic, but we have a hack in CodeGenPrepare to convert it to a GEP constant expression.

Granted, there are disadvantages to constant expressions in function bodies: we end up with more than one way to represent the same operation, it's hard to remember to take trapping constants into account, and cost modeling can get confused if it doesn't correctly account for an "expensive" constant. And the use of constant expressions in global variables is a complete mess, like you mentioned.

>> Either way, it's a bunch of extra work if we intend to eventually unify the two. I don't see any scenario under which we don't want to eventually unify them.
>> The operations I'm adding are semantically the same as the equivalent fixed-width shuffles; we just can't represent the shuffle masks the same way. And I think if we do end up changing the representation of scalable shufflevectors later, we'll be able to autoupgrade the existing ones.
>
> It isn't obvious to me that these need to be unified: we don't have to have a single operation named "shuffle vector" that does all of the possible element permutations. For example, vperm on PPC Altivec supports data-dependent shuffle masks, and merging that into shuffle vector seems like a bad idea.
>
> An alternate design could look like four things (again, ignoring intrinsic vs instruction):
>
> "Data dependent shuffle" would allow runtime shuffle masks.
> -> "fixed mask shuffle" would require a statically determined shuffle mask.
> -> "well known target-independent shuffle" would support specific shuffles like zip, even/odd splits, splat, etc.
> -> "target specific shuffles" would be any number of special things that are shuffle like.

I'm not sure trying to form a strict hierarchy really works out. The key aspect that breaks this is the use of "undef" in shuffle masks. So there are actually multiple fixed shuffles of the same width which might count as a "zip_lower", depending on the context.

I think the distinguishing factor here that means we want to integrate these shuffles into the shufflevector instruction, vs. other sorts of shuffle-ish operations, is that the shuffle pattern is known at compile-time. Given that, optimizations can reason about the value of various lanes, not just a black box. Knowing a "vperm" is in fact a shuffle isn't very helpful. All you can say is that every output lane is equal to some input lane. I guess you could figure out *something* based on that, but it doesn't really make sense to represent vperm, select, and shufflevector as the same instruction; the operands are different.
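For reference, the operand difference reads like this in IR (fragments for illustration; the vperm intrinsic signature is as declared by the PPC backend):

```llvm
; Data-dependent shuffle: the byte-permute control %mask is an ordinary
; runtime value, so nothing is known about individual lanes at compile time.
declare <4 x i32> @llvm.ppc.altivec.vperm(<4 x i32>, <4 x i32>, <16 x i8>)
%r1 = call <4 x i32> @llvm.ppc.altivec.vperm(<4 x i32> %a, <4 x i32> %b, <16 x i8> %mask)

; shufflevector: the mask is part of the instruction and known at compile
; time, so optimizations can reason about the value of each output lane.
%r2 = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
```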
We can force any sort of representation to "work" with enough refactoring and helper methods, but I think my proposal ends up being the most straightforward.

> From a historical perspective, the ShuffleVector design was intended to allow significant mid-level optimizations that merged and transformed shuffle masks. In practice though, instcombine (for example) has had to stay very "hands off" with shuffles in general because it is too easy to produce something that cannot be efficiently code generated. If you canonicalize towards "well known" shuffles, we could improve this, because we can expect targets to efficiently handle the well known shuffles.

We actually do a surprising amount here these days... and we're probably going to expand it more using target-specific information (see discussion on https://reviews.llvm.org/D73480).

>>> Relatedly, how do you foresee canonicalization (in instcombine and inst selection) working for these? If not for compatibility, it would make sense to canonicalize from shuffle vector to the 'fixed' formats, but doing that would probably introduce a bunch of regressions for various targets.
>>
>> I'm thinking that we don't use the new named shuffles for fixed-width shuffles at the IR level.
>
> Is this out of conservatism because of the amount of existing code that works on ShuffleVector? While variable length vectors will always be special in some ways, it seems useful to make their representation as consistent with static length vectors as possible.
>
> Representing even fixed length shuffles with an explicit representation seems like it would make pattern matching the specific cases easier, makes the IR easier to read, and provides a single canonical representation for each mask. The generic pattern matching stuff (e.g. PatternMatch.h, DAG ISel) could paper over the representational differences as well.

The other aspect here is the use of "undef" in fixed shuffles.
Frontends usually don't do this, but some of our optimizations for shuffles are built around replacing unused results of shuffles with "undef". We don't want to forbid transforming "<0, 4, 1, 5>" -> "<0, undef, 1, 5>" because we decided the former was the named shuffle "zip", and we want to allow transforms/analysis for zip-like shuffles on "<0, undef, 1, 5>". So I think it ends up being more straightforward to have a helper for "is this a zip", as opposed to a helper for transforming a "zip" to a fixed shuffle mask.

Independent of anything we do with scalable vectors, it might make sense in the IR printer to add some sort of note for known shuffle patterns, so developers don't have to figure out the meaning of the numerical sequences in IR dumps.

-Eli
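As an aside, the scalable splat idiom referenced earlier in this message is today spelled as a shuffle with an all-zeroes mask; a sketch, with the element type chosen for illustration:

```llvm
; Splat of 42 across a scalable vector: insert the scalar into lane 0, then
; shuffle with a zeroinitializer mask so every result lane reads lane 0.
%ins   = insertelement <vscale x 4 x i32> undef, i32 42, i32 0
%splat = shufflevector <vscale x 4 x i32> %ins, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
```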
Chris Lattner via llvm-dev
2020-Feb-08 01:52 UTC
[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)
> On Feb 7, 2020, at 4:42 PM, Eli Friedman <efriedma at quicinc.com> wrote:
>>
>> In my opinion, LLVM constant expressions seem like a bad idea in general, and I'd rather see them go away over time - as one example, "constants that can trap" are often mishandled, and pattern matching is slower and more complex due to them. In my mind, a better representation would be something that directly models what we need: global variable initializers can have relocations applied to them, so they have a very specific grammar of constant expression that represents this (symbol + constant, and symbol - symbol + constant on Mach-O targets). The rest of the constant expr could be dropped, simplifying the system.
>
> I think there are some practical advantages to having constant expressions; in particular, we get automatic "CSE", which I think ends up being a practical advantage in some cases. And there are certain issues involving SelectionDAG (in particular, cloning a constant into every basic block makes pattern-matching more effective, and reduces register pressure in some cases). And some of our cost modeling on IR sort of depends on the fact that constants are not instructions. I mean, we don't need constants; at an extreme we could have a constantint instruction, like GlobalISel's G_CONSTANT. But it's not clear that would be helpful for midlevel optimizations.

Sure, but that cuts both ways. The fact that they end up as unfolded constantexprs implies that they need to be emitted, and the compiler doesn't generally have a cost model to reflect that.

CSE doesn't happen in general because (as you mention below) undefs exist in shuffle masks, and they are not auto-CSE'd.
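Concretely, two shuffles that differ only in an undef mask element are distinct values and do not CSE; a fixed-width sketch:

```llvm
; Both shuffles compute the same values in lanes 0, 2, and 3, but because
; their masks differ in lane 1 they are different instructions and will not
; be merged by CSE:
%a = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
%b = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 0, i32 undef, i32 1, i32 5>
```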
In the comments above, I was referring to LLVM IR - I agree that SelectionDAG has its own issues, and that the LLVM IR design doesn't necessarily translate over.

> For scalable vectors in particular, currently, the only way to express a vector where every element is a simple integer is using a splat shuffle expression, and those often fold into some instruction. Continuing to use constant expressions for that will make that work more smoothly, I think. That could be solved in other ways, though (more fun code for CodeGenPrepare!), so there isn't a hard requirement.

Oh, I see what you're saying. The issue here is that the vector length is modeled as a Constant. Have I ever mentioned that that was always weird and non-obvious? :-)

Joking aside, there is a serious problem for getting the cost model correct. You are (reasonably) wanting to have "something cheap" be modeled as a "Constant*", however a shuffle as a ConstantExpr also admits the cases where an instruction needs to be dynamically computed.

This conflation is exactly the problem I have with the existing constant expressions design. You cannot reasonably write a cost model for this in target-independent code, and in some cases it helps but in others it hurts. It is hugely problematic to me that we represent things as a Constant* that actually emit code (this is why 'constants that can trap' exist, for example).

In my opinion, you'd be much better off by making these be a separate intrinsic that has no constantexpr, and use the standard CGP (or GlobalISel) tricks to duplicate it where necessary on your target. The reason for this is that these will need to be executed on some targets, but can be folded on others.
Having them explicit still allows CSE and other transformations to work on them, which is important for canonicalization and also for targets where they are not folded.

> Some of this is what eventually led to the way we lower llvm.vscale: we have the intrinsic, but we have a hack in codegenprepare to convert it to a GEP constant expression.

Since you have the GEP trick that works with variable length vectors, why do you need llvm.vscale anymore at all?

>> An alternate design could look like four things (again, ignoring intrinsic vs instruction):
>>
>> "Data dependent shuffle" would allow runtime shuffle masks.
>> -> "fixed mask shuffle" would require a statically determined shuffle mask.
>> -> "well known target-independent shuffle" would support specific shuffles like zip, even/odd splits, splat, etc.
>> -> "target specific shuffles" would be any number of special things that are shuffle like.
>
> I'm not sure trying to form a strict hierarchy really works out. The key aspect that breaks this is the use of "undef" in shuffle masks. So there are actually multiple fixed shuffles of the same width which might count as a "zip_lower", depending on the context.

How does it break it? It seems reasonable to have an undef mask even for the 'well known' shuffles.

> I think the distinguishing factor here that means we want to integrate these shuffles into the shufflevector instruction, vs. other sorts of shuffle-ish operations, is that the shuffle pattern is known at compile-time.

I can understand where you're coming from, but I'm making an argument independent of variable length vectors. I'm arguing that the design above is better *even for static vectors* because the original ShuffleVector bet didn't work out for them either.

> Given that, optimizations can reason about the value of various lanes, not just a black box.

I don't see how the above argues for black boxes.
Target specific intrinsics are often black boxes for sure, but I don't see how "well known target-independent shuffles" would be black boxes that prevent understanding lane mappings.

> We can force any sort of representation to "work" with enough refactoring and helper methods,

Yes, agreed on that!

>> Representing even fixed length shuffles with an explicit representation seems like it would make pattern matching the specific cases easier, makes the IR easier to read, and provides a single canonical representation for each mask. The generic pattern matching stuff (e.g. PatternMatch.h, DAG ISel) could paper over the representational differences as well.
>
> The other aspect here is the use of "undef" in fixed shuffles. Frontends usually don't do this, but some of our optimizations for shuffles are built around replacing unused results of shuffles with "undef". We don't want to forbid transforming "<0, 4, 1, 5>" -> "<0, undef, 1, 5>" because we decided the former was the named shuffle "zip", and we want to allow transforms/analysis for zip-like shuffles on "<0, undef, 1, 5>". So I think it ends up being more straightforward to have a helper for "is this a zip", as opposed to a helper for transforming a "zip" to a fixed shuffle mask.

If you take a step back and look at the structure of the problem we're working with, the existing ShuffleVector design is very problematic IMO. Among other things, knowing that shuffle output elements are undef but not knowing that about calls and loads of vectors is pretty weird. Why don't we have something like the !range <http://llvm.org/docs/LangRef.html#range-metadata> metadata that applies to load/call/shuffle to indicate that the result vector elements have some unknown elements? This would be far more general and correct.

In a world where we have two different "shuffle with a constant mask" and "shuffle with well known algorithms" operations, we would not implement undef output elements with "-1 in the shuffle mask".
Instead, we would have a design like:

1) Shuffle masks would always have elements that range from 0 to #elements-1, and the verifier would enforce this. The accessors would return ArrayRef<unsigned> instead of ArrayRef<int>.

2) We would have a unified way (like the !range attribute) to model this, and it would apply to both shuffles (plus loads and calls).

3) Because this applies to calls, instcombine could infer this property for target specific intrinsics, the result could be propagated across function results using IPA, etc.

> Independent of anything we do with scalable vectors, it might make sense in the IR printer to add some sort of note for known shuffle patterns, so developers don't have to figure out the meaning of the numerical sequences in IR dumps.

Sure. +1

-Chris
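For reference, the CodeGenPrepare "GEP trick" discussed above materializes vscale as a constant expression along these lines (a sketch; exact spelling may differ from the in-tree transform):

```llvm
; The allocation size of <vscale x 1 x i8> is exactly vscale bytes, so a call
; to the intrinsic:
%vs = call i64 @llvm.vscale.i64()
; can have its uses rewritten to roughly this GEP-based constant expression:
;   i64 ptrtoint (<vscale x 1 x i8>* getelementptr (<vscale x 1 x i8>, <vscale x 1 x i8>* null, i64 1) to i64)
```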
Sander De Smalen via llvm-dev
2020-Feb-10 15:31 UTC
[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)
> On 8 Feb 2020, at 01:52, Chris Lattner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Some of this is what eventually led to the way we lower llvm.vscale: we have the intrinsic, but we have a hack in codegenprepare to convert it to a GEP constant expression.
>
> Since you have the GEP trick that works with variable length vectors, why do you need llvm.vscale anymore at all?

As far as I understood it, it would be dependent on the target's DataLayout whether the GEP trick would be a valid transformation, because of vector alignment (https://reviews.llvm.org/D71636#1792473).

> If you take a step back and look at the structure of the problem we're working with, the existing ShuffleVector design is very problematic IMO. Among other things, knowing that shuffle output elements are undef but not knowing that about calls and loads of vectors is pretty weird.

The masked load intrinsics have a predicate and a passthru operand. Is that not what you mean?

> Why don't we have something like the !range metadata that applies to load/call/shuffle to indicate that the result vector elements have some unknown elements? This would be far more general and correct.

Instead of adding a new attribute, can we use a `select` (or perhaps a new intrinsic) to express which lanes are undef? That way the same mechanism would also work for scalable vectors, without having to figure out some convoluted way to describe that in metadata. At the moment the `select` would be thrown away by InstCombine, which folds `select %p, %v, undef -> %v`. I'm not sure to what extent this is desirable; it is useful to be folded away for certain purposes, but not when you want to explicitly specify that something can be undef. Perhaps a new intrinsic would be more appropriate?

>> On Feb 7, 2020, at 4:42 PM, Eli Friedman <efriedma at quicinc.com> wrote:
>>> The other aspect here is the use of "undef" in fixed shuffles.
>>> Frontends usually don't do this, but some of our optimizations for shuffles are built around replacing unused results of shuffles with "undef". We don't want to forbid transforming "<0, 4, 1, 5>" -> "<0, undef, 1, 5>" because we decided the former was the named shuffle "zip", and we want to allow transforms/analysis for zip-like shuffles on "<0, undef, 1, 5>". So I think it ends up being more straightforward to have a helper for "is this a zip", as opposed to a helper for transforming a "zip" to a fixed shuffle mask.

For some shufflemask like `<0, undef, undef, undef>`, I guess this means that 'isSplat' and 'isZipLo' would both return true?

Just recapping the discussion here for my understanding (please correct me where I'm wrong):

- The original proposal is to extend `shufflevector` to take either a 'shuffle pattern' or a 'shuffle mask'.
  - For fixed-width shuffles it will retain the mask, in case any of the elements is 'undef' (so as not to lose that information).
  - ShuffleVectorInst is extended with methods to query if something e.g. is a zip/splat/etc.
- An alternative would be to not unify these into a single `shufflevector`, but rather have multiple operations.
  - InstCombine can segment shuffles into two exclusive sets of operations: known shuffle patterns vs unknown pattern with explicit mask.
  - To avoid multiple representations for the same shuffle, we want to map exclusively to either 'explicit mask' or 'known mask' operations. But in order not to lose information about lanes being undef, we'd then need to model the 'undef' lanes separately from the instruction.

For either route, does this mean we'll also want to consider supporting data-dependent shuffle masks? (I don't see why we wouldn't support those if we go through the lengths of adding support for more types of shuffles.) Eli's proposal seems quite neat and sounds like it would take less effort, as this is mostly about extending shufflevector rather than changing it.
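A sketch of the select-based encoding suggested above, using a fixed-width type for illustration: the undef lane is expressed by a separate instruction rather than by -1 in the shuffle mask.

```llvm
; A "zip lower" with all mask elements in range:
%zip = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
; Mark lane 1 as undef via a select instead of via undef in the mask.
; Note InstCombine today folds this to %zip (select %p, %v, undef -> %v).
%res = select <4 x i1> <i1 true, i1 false, i1 true, i1 true>, <4 x i32> %zip, <4 x i32> undef
```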
Going down the path of revisiting shufflevector entirely by removing 'undef' from the shuffle mask seems like a much bigger change. That's not a reason not to do it, but if we end up going down that path, I think we should consider separating concerns and prioritising what's needed for scalable auto-vec first (either using this proposal, or starting off with shuffle intrinsics that work exclusively for scalable vectors, so that we don't need to worry about canonicalisation). As long as we eventually end up with a shuffle representation that puts fixed-width vectors and scalable vectors on somewhat equal footing (or at least as much as possible).

Thanks,

Sander