thr3ads.net - llvm dev - [llvm-dev] [RFC] Introduce a new stepvector operation [Jan 2021]

If this information is useful, please help other people find it:
Share via:

Chris Tetreault via llvm-dev

2021-Jan-27 20:04 UTC

[llvm-dev] [RFC] Introduce a new stepvector operation

Paul,

                Really, the two big use cases for scalable vectors are
stepvector and enabling more uses of shufflevector. shufflevector is the one
constant expression that “doesn’t work” for scalable vectors yet. Realistically,
it can never be made to work with arbitrary constant scalable vectors without a
fundamental redesign. Another use case is that we could simplify more constant
expressions if they are written in terms of ConstantStepVector. Using the
original proposed stepvector literal, (here, `splat(1)` is the
insertelement/shufflevector thing; I’m not going to write it out) `splat(1) +
stepvector` is as simplified as it can get, but `splat(1) + <0, 1, …>` can
be simplified to `<1, 2, …>`.

                That said, there are other uses for fixed width vectors. It
would enable writing more compact LLVM IR code, which is useful in test cases
(stepvector of type <64 x i8> would span multiple lines). Additionally,
pattern matching on a stepvector is linear in the width of the vector, but for
ConstantStepVector it will have constant time. These are, admittedly, minor
advantages. However, I think the level of effort in implementing
ConstantStepVector with a start/stride pair vs ConstantStepVector that starts at
zero and increments by one is not that great.

Thanks,
                Christopher Tetreault

From: Paul Walker <Paul.Walker at arm.com>
Sent: Wednesday, January 27, 2021 4:36 AM
To: Chris Tetreault <ctetreau at quicinc.com>; David Sherwood
<David.Sherwood at arm.com>; llvm-dev at lists.llvm.org
Cc: Sander De Smalen <Sander.DeSmalen at arm.com>; Eli Friedman
<efriedma at quicinc.com>; cameron.mcinally at nyu.edu
Subject: [EXT] Re: [RFC] Introduce a new stepvector operation

Hi Chris,

I just wanted to say that I like this idea (i.e. <i32 0, i32 1, …>) very
much but wonder how much of it is reliant on its potential uses by
shufflevector.  I ask because the shufflevector problem has extra pitfalls and
I'm not convinced opening up a handful of extra shuffles (beyond the current
splat support, which I consider a bit of a hack) is that great long term.

Essentially, I'm not a fan of having a nobbled shufflevector for scalable
vectors as compared to fixed length vectors.  Taking your splat example, I would
much prefer a dedicated splat instruction that works for all vector types and be
just as pretty for constant and variable splats, whereas ConstantStepVector only
helps with constant splats.

As I say, I like the idea and so am really just asking if you think there'll
be enough pull to make it happen without tying it to shufflevector?

Paul!!!

From: Chris Tetreault <ctetreau at quicinc.com<mailto:ctetreau at
quicinc.com>>
Date: Tuesday, 26 January 2021 at 17:34
To: David Sherwood <David.Sherwood at arm.com<mailto:David.Sherwood at
arm.com>>, "llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>" <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>
Cc: Sander De Smalen <Sander.DeSmalen at arm.com<mailto:Sander.DeSmalen at
arm.com>>, Paul Walker <Paul.Walker at arm.com<mailto:Paul.Walker at
arm.com>>, Eli Friedman <efriedma at quicinc.com<mailto:efriedma at
quicinc.com>>, "cameron.mcinally at
nyu.edu<mailto:cameron.mcinally at nyu.edu>" <cameron.mcinally at
nyu.edu<mailto:cameron.mcinally at nyu.edu>>
Subject: RE: [RFC] Introduce a new stepvector operation

David,

                I am proposing a new C++ class (ConstantStepVector is fine; I
don’t really care what color the bike shed is). In my scheme, there would be no
`stepvector` literal, but you could write <0, 1, …> in IR and get the
stepvector returned by llvm::InnerLoopVectorizer::getStepVector().

                I would expect that you could write these ConstantStepVector
literals directly in statements, so instead of:

vector.body:
  %vec.ind = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>,
%vector.ph ], …

                you would have:

vector.body:
  %vec.ind = phi <4 x i32> [ <i32 0, i32 1, …>, %vector.ph ], …

                … and it would also allow you to have:

vector.body:
  %vec.ind = phi <vscale x 4 x i32> [ <i32 0, i32 1, …>, %vector.ph
], …

                At some point, I think it would probably make sense to start
canonicalizing to it. As you know, for scalable splats we do this horrible
insertelement/shufflevector thing, that should be replaced by <n, n, …>
(for all splats except zero and undef). Scalable stepvector would have to use
it, but it may make sense to have fixed width stepvector use it. I mentioned
downthread several potential uses for it. For shufflevector, we currently allow
zeroinitializer and undef for scalable shuffle masks. We could allow
ConstantStepVector, and internally in C++ code produce the fixed subvector mask,
and special case it for scalable shuffles (that code is already a bit gross).

                Overall, I think having a constant literal is useful. However,
if we’re going to add complexity to the language, let’s do it in the most
flexible way possible. It would be silly to have the stepvector literal, and
then decide to also have ConstantStepVector at some point in the future.

Thanks,
                Christopher Tetreault


From: David Sherwood <David.Sherwood at arm.com<mailto:David.Sherwood at
arm.com>>
Sent: Tuesday, January 26, 2021 8:15 AM
To: Chris Tetreault <ctetreau at quicinc.com<mailto:ctetreau at
quicinc.com>>; llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>
Cc: Sander De Smalen <Sander.DeSmalen at arm.com<mailto:Sander.DeSmalen at
arm.com>>; Paul Walker <Paul.Walker at arm.com<mailto:Paul.Walker at
arm.com>>; Eli Friedman <efriedma at quicinc.com<mailto:efriedma at
quicinc.com>>; cameron.mcinally at nyu.edu<mailto:cameron.mcinally at
nyu.edu>
Subject: [EXT] RE: [RFC] Introduce a new stepvector operation

Hi Chris,

I’ve been following the discussions between you and Cameron and just so I
understand
things correctly it looks like you’re proposing a new way of writing vector
literals in IR
where the start and stride are determined from the first two values, i.e. <1,
3, …>
has a start of 1 and stride of 2. I quite like the idea of a vector literal, but
are you
proposing that internally in C++ we have some sort of ConstantStepVector that
contains
a start/stride pair? Or when parsing this vector literal do you expect us to
internally
represent this as something like:

splat(1) + 2 * stepvector

where stepvector would be fixed as <0, 1, 2, 3, …>?

When emitting LLVM IR with something like “clang -S -emit-llvm” would you also
expect to see these vector literals being emitted too? I’m just thinking about
how
this would look in vectorised code, for example normally when vectorising an
induction variable we’d see something like this:

vector.ph:
…
vector.body:
  %vec.ind = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>,
%vector.ph ], …

Are you suggesting the vector literal would be created in it’s own statement
in the preheader, or directly as a constant like before:

vector.body:
  %vec.ind = phi <4 x i32> [ <i32 0, i32 1, …>, %vector.ph ], …


Kind Regards,
David.

From: Chris Tetreault <ctetreau at quicinc.com<mailto:ctetreau at
quicinc.com>>
Sent: 20 January 2021 21:33
To: David Sherwood <David.Sherwood at arm.com<mailto:David.Sherwood at
arm.com>>; llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>
Cc: Sander De Smalen <Sander.DeSmalen at arm.com<mailto:Sander.DeSmalen at
arm.com>>; Paul Walker <Paul.Walker at arm.com<mailto:Paul.Walker at
arm.com>>; Eli Friedman <efriedma at quicinc.com<mailto:efriedma at
quicinc.com>>
Subject: RE: [RFC] Introduce a new stepvector operation

David,

   Thanks for writing this up. I’d just like to speak to some concerns I have
regarding shufflevector. As many of us know, shufflevector takes two vectors and
a constant vector of i32, and does stuff. The constant shuffle mask can be
scalable or fixed width. The shuffle mask is supposed to be an arbitrary
constant vector, however for scalable vectors only zeroinitializer or undef are
accepted. There are reasonable technical reasons for this state of affairs, but
it reveals an issue: we don't really handle constant scalable vectors very
well. Surely there are other similar issues throughout the codebase, but this is
one I struggle with regularly so it sticks out in my mind.

   However, we probably want to be able to use the stepvector in shufflevector.
For instance, if we had a stepvector literal, then we could implement vector
concatenation in terms of shufflevector:

%a_concat_b = shufflevector <4 x i16> %a, <4 x i16> %b, <8 x
i32> stepvector

   In fact, a lot of useful shuffles can be implemented in terms of stepvector
multiplied or added to some constants. Pulling from Eli's list in
https://lists.llvm.org/pipermail/llvm-dev/2020-January/138762.html, I can see:

“
   %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x
i32> %v2, SHUFFLE_NAME

   SHUFFLE_NAME can be one of the following (with examples of the equivalent
<4 x i32> shuffles):
   concat - Concatenate the two operands (<0, 1, 2, 3, 4, 5, 6, 7>) ->
see above
   split_low - Return the low half of the first operand (<0, 1>) ->
stepvector of type <vscale x n/2 x i32>
   split_high - Return the high half of the first operand (<2, 3>) ->
(stepvector + splat(n/2)) of type <vscale x n/2 x i32>
   zip_low - Zip together the low halves of the two operands (<0, 4, 1,
5>)
   zip_high - Zip together the high halves of the two operands (<2, 6, 3,
7>)
   unzip_even - Unzip the even elements of the two operands (<0, 2, 4, 6>)
(stepvector + stepvector) of type <vscale x n x i32>
   unzip_odd - unzip the odd elements of the two operands (<1, 3, 5, 7>)
(stepvector + stepvector + splat(1)) of type <vscale x n x i32>
“

   Unfortunately, all of these cannot be done because shufflevector only
supports scalable undef or zeroinitializer. In order to support these cases, we
would need to extend shufflevector to support stepvector (for concat), and
arbitrary constant expressions for the rest. Supporting stepvector might not be
so hard with the current scheme: if the shuffle is scalable, and the mask is
<0, 1, ..., n - 1>, then the input mask was a scalable stepvector.
However, I think this illustrates my proposal: vector pattern literals. They
could look like this in IR:

   <0, 1, ...> ; stepvector

   This is also more flexible, because it enables lots of other scalable vector
literals:

   <7, 7, ...> ; splat(7) without writing that horrid
insertelement/shufflevector thing
   <0, 2, ...> ; unzip_even mask
   <1, 3, ...> ; unzip_odd mask

   The implementation for shufflevector would be straightforward because the
mask for the two currently supported cases of zeroinitializer and undef (<0,
0, ...> <=> zeroinitializer and <undef, undef, ...> <=>
undef) already follow the proposed scheme. This could also have the side
benefits of making some IR easier to read (for very wide vectors, the fixed
width stepvector could be more than 80 columns wide), and might result in
efficiency gains in the compiler (don't need to walk a very wide vector to
see if it is a stepvector; can just canonicalize <0, 1, 2, 3, 4, 5, 6, 7,
..., 2048> once to ConstantPatternVector(0, 1)).

   Sorry for rambling. I think I’ve personally come around to the idea that a
constant would be good. However, a more flexible constant would be best if we’re
going to use it to add a bunch of special cases to the codebase. Most special
cases for scalable undef and zeroinitializer can be replaced with equivalent
code that also handles vector pattern literals.

Thanks,
   Christopher Tetreault

From: David Sherwood <David.Sherwood at arm.com<mailto:David.Sherwood at
arm.com>>
Sent: Wednesday, January 20, 2021 8:04 AM
To: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Cc: Sander De Smalen <Sander.DeSmalen at arm.com<mailto:Sander.DeSmalen at
arm.com>>; Paul Walker <Paul.Walker at arm.com<mailto:Paul.Walker at
arm.com>>; Chris Tetreault <ctetreau at quicinc.com<mailto:ctetreau
at quicinc.com>>
Subject: [EXT] [RFC] Introduce a new stepvector operation


Hi,



As part of adding support for scalable vectorization we need to update
llvm::InnerLoopVectorizer::getStepVector for scalable vectors. Currently this
just returns a constant vector with the sequence <0, 1, 2, 3, ..>, however
this assumes we are using fixed length vectors. For scalable vectors we need an
operation that does the same thing, but without having to explicitly initalise
all the elements. Any new stepvector operation we provide could also be used for
fixed length vectors too if desired.



I believe the desirable properties of the operation should be:



1. The operation requires no arguments and simply returns a vector with the
numeric sequence <0, 1, 2, …>

2. For types with a large number of elements, e.g. <vscale x 32 x i8>
(vscale = 16), there is the possibility of the sequence value exceeding the
limit of the type midway through the vector. In such cases we define the
operation such that those elements are undefined or poison values.



A simple ‘stepvector’ operation (however we choose to implement it) with the
properties described above can then be used together with additional ‘mul’ and
‘add’ instructions to create any arbitrary linear sequence, i.e. <0, 2, 4, 6,
…> or <1, 3, 5, 7, …>



The first possible implementation with the properties as described above
involves using a new llvm.stepvector() intrinsic with no arguments that simply
returns a vector sequence <0, 1, 2, …> of the requested type, i.e.



  declare <vscale x 4 x i32> @llvm.stepvector.nxv4i32()



Introducing a new intrinsic is simple to implement and we can easily come up
with an appropriate cost model – cheap for fixed width vectors or scalable
vectors using SVE where we have the ‘index’ instruction.



However, since such an intrinsic is effectively returning a constant vector
sequence we could instead implement it using a new ‘stepvector’ constant in a
similar way to how ‘zeroinitializer’ works. This would be done using a new
ConstantStepVector class similar to ConstantAggregateZero and would return a
vector with the numeric sequence <0, 1, 2, …>. The main advantages of
using a constant over an intrinsic are:



1. It is easy to write tests in LLVM IR since ‘stepvector’ would work in the
same way as ‘zeroinitializer’, i.e. “%1 = add <4 x i32> %0, stepvector”

2. Creation of the node is easy with the simple interface:

  static Constant *ConstantStepVector::get(Type Ty)

3. It is easy to do optimisations, e.g. CSE, and pattern matching in IR.



The main disadvantages are:



1. A scalable constant cannot be represented as well in the .data section,
although we can still create a constant based on the architectural maximum for
vscale. It’s worth pointing out that this problem also exists for
zeroinitializer too – we’re just more likely to have cheap instructions to do
the job.

2. Harder to fit into the cost model due to it being a constant.

3. There are some concerns that we might then have to support stepvector as a
constant in the shufflevector operation too and that it should be restricted to
zeroinitializer only.





Any thoughts or feedback you have would be much appreciated!



Kind Regards,

David Sherwood.


-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210127/f2c23b5b/attachment.html>

Paul Walker via llvm-dev

2021-Jan-28 11:46 UTC

head link

[llvm-dev] [RFC] Introduce a new stepvector operation

Thanks Chris, this all sounds great. I just felt the question needed to be asked
as I didn't want to have this new feature automatically imply new
shufflevector features (where I still believe the best solution for scalable
vectors is to just never use the shufflevector instruction, but that's a
story for a different day).

It matches very closely to our earlier design in that we implemented an
indexvector instruction that took start and step values.  This had a
ConstantExpr variant, which other that the way it was printed, pretty much
mapped to this <n, n+1,..> concept.  We switched to stepvector to simplify
the design along with our technical debt, plus it meant we had nicer handling
for wrapping flags[1].  As you say having support for <imm, imm+1,..>
instead of just <0, 1,...> sounds like a perfect upgrade.

[1] What's the plan for wrapping?  Are these constants implicit NSW/NUW?

Paul!!!

From: Chris Tetreault <ctetreau at quicinc.com>
Date: Wednesday, 27 January 2021 at 20:05
To: Paul Walker <Paul.Walker at arm.com>, David Sherwood
<David.Sherwood at arm.com>, "llvm-dev at lists.llvm.org"
<llvm-dev at lists.llvm.org>
Cc: Sander De Smalen <Sander.DeSmalen at arm.com>, Eli Friedman
<efriedma at quicinc.com>, "cameron.mcinally at nyu.edu"
<cameron.mcinally at nyu.edu>
Subject: RE: [RFC] Introduce a new stepvector operation

Paul,

                Really, the two big use cases for scalable vectors are
stepvector and enabling more uses of shufflevector. shufflevector is the one
constant expression that “doesn’t work” for scalable vectors yet. Realistically,
it can never be made to work with arbitrary constant scalable vectors without a
fundamental redesign. Another use case is that we could simplify more constant
expressions if they are written in terms of ConstantStepVector. Using the
original proposed stepvector literal, (here, `splat(1)` is the
insertelement/shufflevector thing; I’m not going to write it out) `splat(1) +
stepvector` is as simplified as it can get, but `splat(1) + <0, 1, …>` can
be simplified to `<1, 2, …>`.

                That said, there are other uses for fixed width vectors. It
would enable writing more compact LLVM IR code, which is useful in test cases
(stepvector of type <64 x i8> would span multiple lines). Additionally,
pattern matching on a stepvector is linear in the width of the vector, but for
ConstantStepVector it will have constant time. These are, admittedly, minor
advantages. However, I think the level of effort in implementing
ConstantStepVector with a start/stride pair vs ConstantStepVector that starts at
zero and increments by one is not that great.

Thanks,
                Christopher Tetreault

From: Paul Walker <Paul.Walker at arm.com>
Sent: Wednesday, January 27, 2021 4:36 AM
To: Chris Tetreault <ctetreau at quicinc.com>; David Sherwood
<David.Sherwood at arm.com>; llvm-dev at lists.llvm.org
Cc: Sander De Smalen <Sander.DeSmalen at arm.com>; Eli Friedman
<efriedma at quicinc.com>; cameron.mcinally at nyu.edu
Subject: [EXT] Re: [RFC] Introduce a new stepvector operation

Hi Chris,

I just wanted to say that I like this idea (i.e. <i32 0, i32 1, …>) very
much but wonder how much of it is reliant on its potential uses by
shufflevector.  I ask because the shufflevector problem has extra pitfalls and
I'm not convinced opening up a handful of extra shuffles (beyond the current
splat support, which I consider a bit of a hack) is that great long term.

Essentially, I'm not a fan of having a nobbled shufflevector for scalable
vectors as compared to fixed length vectors.  Taking your splat example, I would
much prefer a dedicated splat instruction that works for all vector types and be
just as pretty for constant and variable splats, whereas ConstantStepVector only
helps with constant splats.

As I say, I like the idea and so am really just asking if you think there'll
be enough pull to make it happen without tying it to shufflevector?

Paul!!!

From: Chris Tetreault <ctetreau at quicinc.com<mailto:ctetreau at
quicinc.com>>
Date: Tuesday, 26 January 2021 at 17:34
To: David Sherwood <David.Sherwood at arm.com<mailto:David.Sherwood at
arm.com>>, "llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>" <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>
Cc: Sander De Smalen <Sander.DeSmalen at arm.com<mailto:Sander.DeSmalen at
arm.com>>, Paul Walker <Paul.Walker at arm.com<mailto:Paul.Walker at
arm.com>>, Eli Friedman <efriedma at quicinc.com<mailto:efriedma at
quicinc.com>>, "cameron.mcinally at
nyu.edu<mailto:cameron.mcinally at nyu.edu>" <cameron.mcinally at
nyu.edu<mailto:cameron.mcinally at nyu.edu>>
Subject: RE: [RFC] Introduce a new stepvector operation

David,

                I am proposing a new C++ class (ConstantStepVector is fine; I
don’t really care what color the bike shed is). In my scheme, there would be no
`stepvector` literal, but you could write <0, 1, …> in IR and get the
stepvector returned by llvm::InnerLoopVectorizer::getStepVector().

                I would expect that you could write these ConstantStepVector
literals directly in statements, so instead of:

vector.body:
  %vec.ind = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>,
%vector.ph ], …

                you would have:

vector.body:
  %vec.ind = phi <4 x i32> [ <i32 0, i32 1, …>, %vector.ph ], …

                … and it would also allow you to have:

vector.body:
  %vec.ind = phi <vscale x 4 x i32> [ <i32 0, i32 1, …>, %vector.ph
], …

                At some point, I think it would probably make sense to start
canonicalizing to it. As you know, for scalable splats we do this horrible
insertelement/shufflevector thing, that should be replaced by <n, n, …>
(for all splats except zero and undef). Scalable stepvector would have to use
it, but it may make sense to have fixed width stepvector use it. I mentioned
downthread several potential uses for it. For shufflevector, we currently allow
zeroinitializer and undef for scalable shuffle masks. We could allow
ConstantStepVector, and internally in C++ code produce the fixed subvector mask,
and special case it for scalable shuffles (that code is already a bit gross).

                Overall, I think having a constant literal is useful. However,
if we’re going to add complexity to the language, let’s do it in the most
flexible way possible. It would be silly to have the stepvector literal, and
then decide to also have ConstantStepVector at some point in the future.

Thanks,
                Christopher Tetreault


From: David Sherwood <David.Sherwood at arm.com<mailto:David.Sherwood at
arm.com>>
Sent: Tuesday, January 26, 2021 8:15 AM
To: Chris Tetreault <ctetreau at quicinc.com<mailto:ctetreau at
quicinc.com>>; llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>
Cc: Sander De Smalen <Sander.DeSmalen at arm.com<mailto:Sander.DeSmalen at
arm.com>>; Paul Walker <Paul.Walker at arm.com<mailto:Paul.Walker at
arm.com>>; Eli Friedman <efriedma at quicinc.com<mailto:efriedma at
quicinc.com>>; cameron.mcinally at nyu.edu<mailto:cameron.mcinally at
nyu.edu>
Subject: [EXT] RE: [RFC] Introduce a new stepvector operation

Hi Chris,

I’ve been following the discussions between you and Cameron and just so I
understand
things correctly it looks like you’re proposing a new way of writing vector
literals in IR
where the start and stride are determined from the first two values, i.e. <1,
3, …>
has a start of 1 and stride of 2. I quite like the idea of a vector literal, but
are you
proposing that internally in C++ we have some sort of ConstantStepVector that
contains
a start/stride pair? Or when parsing this vector literal do you expect us to
internally
represent this as something like:

splat(1) + 2 * stepvector

where stepvector would be fixed as <0, 1, 2, 3, …>?

When emitting LLVM IR with something like “clang -S -emit-llvm” would you also
expect to see these vector literals being emitted too? I’m just thinking about
how
this would look in vectorised code, for example normally when vectorising an
induction variable we’d see something like this:

vector.ph:
…
vector.body:
  %vec.ind = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>,
%vector.ph ], …

Are you suggesting the vector literal would be created in it’s own statement
in the preheader, or directly as a constant like before:

vector.body:
  %vec.ind = phi <4 x i32> [ <i32 0, i32 1, …>, %vector.ph ], …


Kind Regards,
David.

From: Chris Tetreault <ctetreau at quicinc.com<mailto:ctetreau at
quicinc.com>>
Sent: 20 January 2021 21:33
To: David Sherwood <David.Sherwood at arm.com<mailto:David.Sherwood at
arm.com>>; llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>
Cc: Sander De Smalen <Sander.DeSmalen at arm.com<mailto:Sander.DeSmalen at
arm.com>>; Paul Walker <Paul.Walker at arm.com<mailto:Paul.Walker at
arm.com>>; Eli Friedman <efriedma at quicinc.com<mailto:efriedma at
quicinc.com>>
Subject: RE: [RFC] Introduce a new stepvector operation

David,

   Thanks for writing this up. I’d just like to speak to some concerns I have
regarding shufflevector. As many of us know, shufflevector takes two vectors and
a constant vector of i32, and does stuff. The constant shuffle mask can be
scalable or fixed width. The shuffle mask is supposed to be an arbitrary
constant vector, however for scalable vectors only zeroinitializer or undef are
accepted. There are reasonable technical reasons for this state of affairs, but
it reveals an issue: we don't really handle constant scalable vectors very
well. Surely there are other similar issues throughout the codebase, but this is
one I struggle with regularly so it sticks out in my mind.

   However, we probably want to be able to use the stepvector in shufflevector.
For instance, if we had a stepvector literal, then we could implement vector
concatenation in terms of shufflevector:

%a_concat_b = shufflevector <4 x i16> %a, <4 x i16> %b, <8 x
i32> stepvector

   In fact, a lot of useful shuffles can be implemented in terms of stepvector
multiplied or added to some constants. Pulling from Eli's list in
https://lists.llvm.org/pipermail/llvm-dev/2020-January/138762.html, I can see:

“
   %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x
i32> %v2, SHUFFLE_NAME

   SHUFFLE_NAME can be one of the following (with examples of the equivalent
<4 x i32> shuffles):
   concat - Concatenate the two operands (<0, 1, 2, 3, 4, 5, 6, 7>) ->
see above
   split_low - Return the low half of the first operand (<0, 1>) ->
stepvector of type <vscale x n/2 x i32>
   split_high - Return the high half of the first operand (<2, 3>) ->
(stepvector + splat(n/2)) of type <vscale x n/2 x i32>
   zip_low - Zip together the low halves of the two operands (<0, 4, 1,
5>)
   zip_high - Zip together the high halves of the two operands (<2, 6, 3,
7>)
   unzip_even - Unzip the even elements of the two operands (<0, 2, 4, 6>)
(stepvector + stepvector) of type <vscale x n x i32>
   unzip_odd - unzip the odd elements of the two operands (<1, 3, 5, 7>)
(stepvector + stepvector + splat(1)) of type <vscale x n x i32>
“

   Unfortunately, all of these cannot be done because shufflevector only
supports scalable undef or zeroinitializer. In order to support these cases, we
would need to extend shufflevector to support stepvector (for concat), and
arbitrary constant expressions for the rest. Supporting stepvector might not be
so hard with the current scheme: if the shuffle is scalable, and the mask is
<0, 1, ..., n - 1>, then the input mask was a scalable stepvector.
However, I think this illustrates my proposal: vector pattern literals. They
could look like this in IR:

   <0, 1, ...> ; stepvector

   This is also more flexible, because it enables lots of other scalable vector
literals:

   <7, 7, ...> ; splat(7) without writing that horrid
insertelement/shufflevector thing
   <0, 2, ...> ; unzip_even mask
   <1, 3, ...> ; unzip_odd mask

   The implementation for shufflevector would be straightforward because the
mask for the two currently supported cases of zeroinitializer and undef (<0,
0, ...> <=> zeroinitializer and <undef, undef, ...> <=>
undef) already follow the proposed scheme. This could also have the side
benefits of making some IR easier to read (for very wide vectors, the fixed
width stepvector could be more than 80 columns wide), and might result in
efficiency gains in the compiler (don't need to walk a very wide vector to
see if it is a stepvector; can just canonicalize <0, 1, 2, 3, 4, 5, 6, 7,
..., 2048> once to ConstantPatternVector(0, 1)).

   Sorry for rambling. I think I’ve personally come around to the idea that a
constant would be good. However, a more flexible constant would be best if we’re
going to use it to add a bunch of special cases to the codebase. Most special
cases for scalable undef and zeroinitializer can be replaced with equivalent
code that also handles vector pattern literals.

Thanks,
   Christopher Tetreault

From: David Sherwood <David.Sherwood at arm.com<mailto:David.Sherwood at
arm.com>>
Sent: Wednesday, January 20, 2021 8:04 AM
To: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Cc: Sander De Smalen <Sander.DeSmalen at arm.com<mailto:Sander.DeSmalen at
arm.com>>; Paul Walker <Paul.Walker at arm.com<mailto:Paul.Walker at
arm.com>>; Chris Tetreault <ctetreau at quicinc.com<mailto:ctetreau
at quicinc.com>>
Subject: [EXT] [RFC] Introduce a new stepvector operation


Hi,



As part of adding support for scalable vectorization we need to update
llvm::InnerLoopVectorizer::getStepVector for scalable vectors. Currently this
just returns a constant vector with the sequence <0, 1, 2, 3, ..>, however
this assumes we are using fixed length vectors. For scalable vectors we need an
operation that does the same thing, but without having to explicitly initalise
all the elements. Any new stepvector operation we provide could also be used for
fixed length vectors too if desired.



I believe the desirable properties of the operation should be:



1. The operation requires no arguments and simply returns a vector with the
numeric sequence <0, 1, 2, …>

2. For types with a large number of elements, e.g. <vscale x 32 x i8>
(vscale = 16), there is the possibility of the sequence value exceeding the
limit of the type midway through the vector. In such cases we define the
operation such that those elements are undefined or poison values.



A simple ‘stepvector’ operation (however we choose to implement it) with the
properties described above can then be used together with additional ‘mul’ and
‘add’ instructions to create any arbitrary linear sequence, i.e. <0, 2, 4, 6,
…> or <1, 3, 5, 7, …>



The first possible implementation with the properties as described above
involves using a new llvm.stepvector() intrinsic with no arguments that simply
returns a vector sequence <0, 1, 2, …> of the requested type, i.e.



  declare <vscale x 4 x i32> @llvm.stepvector.nxv4i32()



Introducing a new intrinsic is simple to implement and we can easily come up
with an appropriate cost model – cheap for fixed width vectors or scalable
vectors using SVE where we have the ‘index’ instruction.



However, since such an intrinsic is effectively returning a constant vector
sequence we could instead implement it using a new ‘stepvector’ constant in a
similar way to how ‘zeroinitializer’ works. This would be done using a new
ConstantStepVector class similar to ConstantAggregateZero and would return a
vector with the numeric sequence <0, 1, 2, …>. The main advantages of
using a constant over an intrinsic are:



1. It is easy to write tests in LLVM IR since ‘stepvector’ would work in the
same way as ‘zeroinitializer’, i.e. “%1 = add <4 x i32> %0, stepvector”

2. Creation of the node is easy with the simple interface:

  static Constant *ConstantStepVector::get(Type Ty)

3. It is easy to do optimisations, e.g. CSE, and pattern matching in IR.



The main disadvantages are:



1. A scalable constant cannot be represented as well in the .data section,
although we can still create a constant based on the architectural maximum for
vscale. It’s worth pointing out that this problem also exists for
zeroinitializer too – we’re just more likely to have cheap instructions to do
the job.

2. Harder to fit into the cost model due to it being a constant.

3. There are some concerns that we might then have to support stepvector as a
constant in the shufflevector operation too and that it should be restricted to
zeroinitializer only.





Any thoughts or feedback you have would be much appreciated!



Kind Regards,

David Sherwood.


-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210128/93416ddd/attachment-0001.html>

llvm dev - Jan 2021 - [RFC] Introduce a new stepvector operation

[llvm-dev] [RFC] Introduce a new stepvector operation

[llvm-dev] [RFC] Introduce a new stepvector operation