thr3ads.net - llvm dev - [llvm-dev] [RFC] Introduce a new stepvector operation [Jan 2021]

If this information is useful, please help other people find it:
Share via:

David Sherwood via llvm-dev

2021-Jan-20 16:03 UTC

[llvm-dev] [RFC] Introduce a new stepvector operation

Hi,



As part of adding support for scalable vectorization we need to update
llvm::InnerLoopVectorizer::getStepVector for scalable vectors. Currently this
just returns a constant vector with the sequence <0, 1, 2, 3, ..>, however
this assumes we are using fixed length vectors. For scalable vectors we need an
operation that does the same thing, but without having to explicitly initalise
all the elements. Any new stepvector operation we provide could also be used for
fixed length vectors too if desired.



I believe the desirable properties of the operation should be:



1. The operation requires no arguments and simply returns a vector with the
numeric sequence <0, 1, 2, ...>

2. For types with a large number of elements, e.g. <vscale x 32 x i8>
(vscale = 16), there is the possibility of the sequence value exceeding the
limit of the type midway through the vector. In such cases we define the
operation such that those elements are undefined or poison values.



A simple 'stepvector' operation (however we choose to implement it) with
the properties described above can then be used together with additional
'mul' and 'add' instructions to create any arbitrary linear
sequence, i.e. <0, 2, 4, 6, ...> or <1, 3, 5, 7, ...>



The first possible implementation with the properties as described above
involves using a new llvm.stepvector() intrinsic with no arguments that simply
returns a vector sequence <0, 1, 2, ...> of the requested type, i.e.



  declare <vscale x 4 x i32> @llvm.stepvector.nxv4i32()



Introducing a new intrinsic is simple to implement and we can easily come up
with an appropriate cost model - cheap for fixed width vectors or scalable
vectors using SVE where we have the 'index' instruction.



However, since such an intrinsic is effectively returning a constant vector
sequence we could instead implement it using a new 'stepvector' constant
in a similar way to how 'zeroinitializer' works. This would be done
using a new ConstantStepVector class similar to ConstantAggregateZero and would
return a vector with the numeric sequence <0, 1, 2, ...>. The main
advantages of using a constant over an intrinsic are:



1. It is easy to write tests in LLVM IR since 'stepvector' would work in
the same way as 'zeroinitializer', i.e. "%1 = add <4 x i32>
%0, stepvector"

2. Creation of the node is easy with the simple interface:

  static Constant *ConstantStepVector::get(Type Ty)

3. It is easy to do optimisations, e.g. CSE, and pattern matching in IR.



The main disadvantages are:



1. A scalable constant cannot be represented as well in the .data section,
although we can still create a constant based on the architectural maximum for
vscale. It's worth pointing out that this problem also exists for
zeroinitializer too - we're just more likely to have cheap instructions to
do the job.

2. Harder to fit into the cost model due to it being a constant.

3. There are some concerns that we might then have to support stepvector as a
constant in the shufflevector operation too and that it should be restricted to
zeroinitializer only.





Any thoughts or feedback you have would be much appreciated!



Kind Regards,

David Sherwood.


-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210120/1d3b349b/attachment.html>

Chris Tetreault via llvm-dev

2021-Jan-20 21:32 UTC

head link

[llvm-dev] [RFC] Introduce a new stepvector operation

David,

   Thanks for writing this up. I'd just like to speak to some concerns I
have regarding shufflevector. As many of us know, shufflevector takes two
vectors and a constant vector of i32, and does stuff. The constant shuffle mask
can be scalable or fixed width. The shuffle mask is supposed to be an arbitrary
constant vector, however for scalable vectors only zeroinitializer or undef are
accepted. There are reasonable technical reasons for this state of affairs, but
it reveals an issue: we don't really handle constant scalable vectors very
well. Surely there are other similar issues throughout the codebase, but this is
one I struggle with regularly so it sticks out in my mind.

   However, we probably want to be able to use the stepvector in shufflevector.
For instance, if we had a stepvector literal, then we could implement vector
concatenation in terms of shufflevector:

%a_concat_b = shufflevector <4 x i16> %a, <4 x i16> %b, <8 x
i32> stepvector

   In fact, a lot of useful shuffles can be implemented in terms of stepvector
multiplied or added to some constants. Pulling from Eli's list in
https://lists.llvm.org/pipermail/llvm-dev/2020-January/138762.html, I can see:

"
   %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x
i32> %v2, SHUFFLE_NAME

   SHUFFLE_NAME can be one of the following (with examples of the equivalent
<4 x i32> shuffles):
   concat - Concatenate the two operands (<0, 1, 2, 3, 4, 5, 6, 7>) ->
see above
   split_low - Return the low half of the first operand (<0, 1>) ->
stepvector of type <vscale x n/2 x i32>
   split_high - Return the high half of the first operand (<2, 3>) ->
(stepvector + splat(n/2)) of type <vscale x n/2 x i32>
   zip_low - Zip together the low halves of the two operands (<0, 4, 1,
5>)
   zip_high - Zip together the high halves of the two operands (<2, 6, 3,
7>)
   unzip_even - Unzip the even elements of the two operands (<0, 2, 4, 6>)
(stepvector + stepvector) of type <vscale x n x i32>
   unzip_odd - unzip the odd elements of the two operands (<1, 3, 5, 7>)
(stepvector + stepvector + splat(1)) of type <vscale x n x i32>
"

   Unfortunately, all of these cannot be done because shufflevector only
supports scalable undef or zeroinitializer. In order to support these cases, we
would need to extend shufflevector to support stepvector (for concat), and
arbitrary constant expressions for the rest. Supporting stepvector might not be
so hard with the current scheme: if the shuffle is scalable, and the mask is
<0, 1, ..., n - 1>, then the input mask was a scalable stepvector.
However, I think this illustrates my proposal: vector pattern literals. They
could look like this in IR:

   <0, 1, ...> ; stepvector

   This is also more flexible, because it enables lots of other scalable vector
literals:

   <7, 7, ...> ; splat(7) without writing that horrid
insertelement/shufflevector thing
   <0, 2, ...> ; unzip_even mask
   <1, 3, ...> ; unzip_odd mask

   The implementation for shufflevector would be straightforward because the
mask for the two currently supported cases of zeroinitializer and undef (<0,
0, ...> <=> zeroinitializer and <undef, undef, ...> <=>
undef) already follow the proposed scheme. This could also have the side
benefits of making some IR easier to read (for very wide vectors, the fixed
width stepvector could be more than 80 columns wide), and might result in
efficiency gains in the compiler (don't need to walk a very wide vector to
see if it is a stepvector; can just canonicalize <0, 1, 2, 3, 4, 5, 6, 7,
..., 2048> once to ConstantPatternVector(0, 1)).

   Sorry for rambling. I think I've personally come around to the idea that
a constant would be good. However, a more flexible constant would be best if
we're going to use it to add a bunch of special cases to the codebase. Most
special cases for scalable undef and zeroinitializer can be replaced with
equivalent code that also handles vector pattern literals.

Thanks,
   Christopher Tetreault

From: David Sherwood <David.Sherwood at arm.com>
Sent: Wednesday, January 20, 2021 8:04 AM
To: llvm-dev at lists.llvm.org
Cc: Sander De Smalen <Sander.DeSmalen at arm.com>; Paul Walker
<Paul.Walker at arm.com>; Chris Tetreault <ctetreau at quicinc.com>
Subject: [EXT] [RFC] Introduce a new stepvector operation


Hi,



As part of adding support for scalable vectorization we need to update
llvm::InnerLoopVectorizer::getStepVector for scalable vectors. Currently this
just returns a constant vector with the sequence <0, 1, 2, 3, ..>, however
this assumes we are using fixed length vectors. For scalable vectors we need an
operation that does the same thing, but without having to explicitly initalise
all the elements. Any new stepvector operation we provide could also be used for
fixed length vectors too if desired.



I believe the desirable properties of the operation should be:



1. The operation requires no arguments and simply returns a vector with the
numeric sequence <0, 1, 2, ...>

2. For types with a large number of elements, e.g. <vscale x 32 x i8>
(vscale = 16), there is the possibility of the sequence value exceeding the
limit of the type midway through the vector. In such cases we define the
operation such that those elements are undefined or poison values.



A simple 'stepvector' operation (however we choose to implement it) with
the properties described above can then be used together with additional
'mul' and 'add' instructions to create any arbitrary linear
sequence, i.e. <0, 2, 4, 6, ...> or <1, 3, 5, 7, ...>



The first possible implementation with the properties as described above
involves using a new llvm.stepvector() intrinsic with no arguments that simply
returns a vector sequence <0, 1, 2, ...> of the requested type, i.e.



  declare <vscale x 4 x i32> @llvm.stepvector.nxv4i32()



Introducing a new intrinsic is simple to implement and we can easily come up
with an appropriate cost model - cheap for fixed width vectors or scalable
vectors using SVE where we have the 'index' instruction.



However, since such an intrinsic is effectively returning a constant vector
sequence we could instead implement it using a new 'stepvector' constant
in a similar way to how 'zeroinitializer' works. This would be done
using a new ConstantStepVector class similar to ConstantAggregateZero and would
return a vector with the numeric sequence <0, 1, 2, ...>. The main
advantages of using a constant over an intrinsic are:



1. It is easy to write tests in LLVM IR since 'stepvector' would work in
the same way as 'zeroinitializer', i.e. "%1 = add <4 x i32>
%0, stepvector"

2. Creation of the node is easy with the simple interface:

  static Constant *ConstantStepVector::get(Type Ty)

3. It is easy to do optimisations, e.g. CSE, and pattern matching in IR.



The main disadvantages are:



1. A scalable constant cannot be represented as well in the .data section,
although we can still create a constant based on the architectural maximum for
vscale. It's worth pointing out that this problem also exists for
zeroinitializer too - we're just more likely to have cheap instructions to
do the job.

2. Harder to fit into the cost model due to it being a constant.

3. There are some concerns that we might then have to support stepvector as a
constant in the shufflevector operation too and that it should be restricted to
zeroinitializer only.





Any thoughts or feedback you have would be much appreciated!



Kind Regards,

David Sherwood.


-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210120/17632c6e/attachment.html>

Philip Reames via llvm-dev

2021-Jan-21 20:19 UTC

head link

[llvm-dev] [RFC] Introduce a new stepvector operation

Just a thought on another approach.

Unless I'm missing something, a step vector is simply a prefix sum over 
a constant vector.  Do we have clean ways with scalable vectors of 
representing those concepts?  If we do, we don't need a separate 
intrinsic.  If we don't, we probably need them anyways?

Slightly more specifically:
allones = broadcast 0
sum = prefix_sum allones
stepvec = sum - allones

Out of those, the only tricky one is the prefix sum.  Have we explored that?

Philip

On 1/20/21 8:03 AM, David Sherwood via llvm-dev wrote:>
> Hi,
>
> As part of adding support for scalable vectorization we need to update 
> llvm::InnerLoopVectorizer::getStepVector for scalable vectors. 
> Currently this just returns a constant vector with the sequence <0, 1, 
> 2, 3, ..>, however this assumes we are using fixed length vectors. For 
> scalable vectors we need an operation that does the same thing, but 
> without having to explicitly initalise all the elements. Any new 
> stepvector operation we provide could also be used for fixed length 
> vectors too if desired.
>
> I believe the desirable properties of the operation should be:
>
> 1. The operation requires no arguments and simply returns a vector 
> with the numeric sequence <0, 1, 2, …>
>
> 2. For types with a large number of elements, e.g. <vscale x 32 x i8>
> (vscale = 16), there is the possibility of the sequence value 
> exceeding the limit of the type midway through the vector. In such 
> cases we define the operation such that those elements are undefined 
> or poison values.
>
> A simple ‘stepvector’ operation (however we choose to implement it) 
> with the properties described above can then be used together with 
> additional ‘mul’ and ‘add’ instructions to create any arbitrary linear 
> sequence, i.e. <0, 2, 4, 6, …> or <1, 3, 5, 7, …>
>
> The first possible implementation with the properties as described 
> above involves using a new llvm.stepvector() intrinsic with no 
> arguments that simply returns a vector sequence <0, 1, 2, …> of the 
> requested type, i.e.
>
>   declare <vscale x 4 x i32> @llvm.stepvector.nxv4i32()
>
> Introducing a new intrinsic is simple to implement and we can easily 
> come up with an appropriate cost model – cheap for fixed width vectors 
> or scalable vectors using SVE where we have the ‘index’ instruction.
>
> However, since such an intrinsic is effectively returning a constant 
> vector sequence we could instead implement it using a new ‘stepvector’ 
> constant in a similar way to how ‘zeroinitializer’ works. This would 
> be done using a new ConstantStepVector class similar to 
> ConstantAggregateZero and would return a vector with the numeric 
> sequence <0, 1, 2, …>. The main advantages of using a constant over
an
> intrinsic are:
>
> 1. It is easy to write tests in LLVM IR since ‘stepvector’ would work 
> in the same way as ‘zeroinitializer’, i.e. “%1 = add <4 x i32> %0, 
> stepvector”
>
> 2. Creation of the node is easy with the simple interface:
>
>   static Constant *ConstantStepVector::get(Type Ty)
>
> 3. It is easy to do optimisations, e.g. CSE, and pattern matching in IR.
>
> The main disadvantages are:
>
> 1. A scalable constant cannot be represented as well in the .data 
> section, although we can still create a constant based on the 
> architectural maximum for vscale. It’s worth pointing out that this 
> problem also exists for zeroinitializer too – we’re just more likely 
> to have cheap instructions to do the job.
>
> 2. Harder to fit into the cost model due to it being a constant.
>
> 3. There are some concerns that we might then have to support 
> stepvector as a constant in the shufflevector operation too and that 
> it should be restricted to zeroinitializer only.
>
> Any thoughts or feedback you have would be much appreciated!
>
> Kind Regards,
>
> David Sherwood.
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210121/f43fc76f/attachment.html>

Florian Hahn via llvm-dev

2021-Jan-22 12:56 UTC

head link

[llvm-dev] [RFC] Introduce a new stepvector operation

> On Jan 20, 2021, at 16:03, David Sherwood via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Hi,
>  
> As part of adding support for scalable vectorization we need to update
llvm::InnerLoopVectorizer::getStepVector for scalable vectors. Currently this
just returns a constant vector with the sequence <0, 1, 2, 3, ..>, however
this assumes we are using fixed length vectors. For scalable vectors we need an
operation that does the same thing, but without having to explicitly initalise
all the elements. Any new stepvector operation we provide could also be used for
fixed length vectors too if desired.
>  
> I believe the desirable properties of the operation should be:
>  
> 1. The operation requires no arguments and simply returns a vector with the
numeric sequence <0, 1, 2, …>
> 2. For types with a large number of elements, e.g. <vscale x 32 x i8>
(vscale = 16), there is the possibility of the sequence value exceeding the
limit of the type midway through the vector. In such cases we define the
operation such that those elements are undefined or poison values.
>  
> A simple ‘stepvector’ operation (however we choose to implement it) with
the properties described above can then be used together with additional ‘mul’
and ‘add’ instructions to create any arbitrary linear sequence, i.e. <0, 2,
4, 6, …> or <1, 3, 5, 7, …>
>  
> The first possible implementation with the properties as described above
involves using a new llvm.stepvector() intrinsic with no arguments that simply
returns a vector sequence <0, 1, 2, …> of the requested type, i.e.
>  
>   declare <vscale x 4 x i32> @llvm.stepvector.nxv4i32()
>  
> Introducing a new intrinsic is simple to implement and we can easily come
up with an appropriate cost model – cheap for fixed width vectors or scalable
vectors using SVE where we have the ‘index’ instruction.
Are you planning on updating the existing code that generates vector constants
to use @llvm.stepvector for fixed width vectors?
 >  
> However, since such an intrinsic is effectively returning a constant vector
sequence we could instead implement it using a new ‘stepvector’ constant in a
similar way to how ‘zeroinitializer’ works. This would be done using a new
ConstantStepVector class similar to ConstantAggregateZero and would return a
vector with the numeric sequence <0, 1, 2, …>. The main advantages of
using a constant over an intrinsic are:
>  
> 1. It is easy to write tests in LLVM IR since ‘stepvector’ would work in
the same way as ‘zeroinitializer’, i.e. “%1 = add <4 x i32> %0,
stepvector”
> 2. Creation of the node is easy with the simple interface:
>   static Constant *ConstantStepVector::get(Type Ty)
If there’s IRBuilder::createStepVector(Ty), the creation of the intrinsic should
be also quite simple as well. A potential inconvenience of the intrinsic
compared to a constant is that we may generate redundant intrinsic calls . Not
sure how big of an issue that is going to be, but it shouldn’t be too big in
practice, as long as only a few places create step vectors.
> 3. It is easy to do optimisations, e.g. CSE, and pattern matching in IR.
I think if the intrinsic has the right attributes (e.g. readnone), things like
DCE, GVN & CSE should work automatically for the intrinsic. As for
pattern-matching, you could just add a matcher for stepvector, so patterns would
look similar regardless whether it is a constant or an intrinsic?

Given that we already have an intrinsic for vscale and there are going to be
named shuffle intrinsics for scalable vectors, having another intrinsic for
stepvector seems consistent with the previous decisions and the path of least
resistance.  Personally I am not convinced that mentioned benefits of the named
constant are worth the trouble if we still have a set of intrinsics for vscale
& the named shuffle.

Cheer,
Florian
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210122/143cd4ea/attachment.html>

llvm dev - Jan 2021 - [RFC] Introduce a new stepvector operation

[llvm-dev] [RFC] Introduce a new stepvector operation

[llvm-dev] [RFC] Introduce a new stepvector operation

[llvm-dev] [RFC] Introduce a new stepvector operation

[llvm-dev] [RFC] Introduce a new stepvector operation