thr3ads.net - llvm dev - [llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths [Jun 2018]

Graham Hunter via llvm-dev

2018-Jun-06 09:20 UTC

[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

Hi David,

>>> The name "getSizeExpressionInBits" makes me think that a Value
>>> expression will be returned (something like a ConstantExpr that uses
>>> vscale).  I would be surprised to get a pair of integers back.  Do
>>> clients actually need constant integer values or would a ConstantExpr
>>> suffice?  We could add a ConstantVScale or something to make it work.
>>
>> I agree the name is not ideal and I'm open to suggestions -- I was
>> thinking of the two integers representing the known-at-compile-time
>> terms in an expression: '(scaled_bits * vscale) + unscaled_bits'.
>>
>> Assuming the pair is of the form (unscaled, scaled), then for a type
>> with a size known at compile time like <4 x i32> the size would be
>> (128, 0).
>>
>> For a scalable type like <scalable 4 x i32> the size would be (0, 128).
>>
>> For a struct with, say, a <scalable 32 x i8> and an i64, it would be
>> (64, 256).
>>
>> When calculating the offset for memory addresses, you just need to
>> multiply the scaled part by vscale and add the unscaled as is.
>
> Ok, now I understand what you're getting at.  A ConstantExpr would
> encapsulate this computation.  We already have "non-static-constant"
> values for ConstantExpr like sizeof and offsetof.  I would see
> VScaleConstant in that same tradition.  In your struct example,
> getSizeExpressionInBits would return:
>
> add(mul(256, vscale), 64)
>
> Does that satisfy your needs?
Ah, I think the use of 'expression' in the name definitely confuses the
issue then. This isn't for expressing the size in IR, where you would
indeed just multiply by vscale and add any fixed-length size.

This is for the analysis code around the IR -- lots of code asks for the
size of a Type in bits to determine what it can do to a Value with that
type. Some of them are specific to scalar Types, like determining whether
a sign/zero extend is needed. Others would apply to vector types
(including scalable vectors), such as checking whether two Types have the
exact same size so that a bitcast can be used instead of a more expensive
operation like copying to memory and back to convert.

See 'getTypeSizeInBits' and 'getTypeStoreSizeInBits' in DataLayout --
they're used a few hundred times throughout the codebase, and to properly
support scalable types we'd need to return something that isn't just a
single integer. Since most backends won't support scalable vectors I
suggested having a 'FixedSize' method that just returns the single
integer, but it may be better to just leave the existing method as is and
create a new method with 'Scalable' or 'VariableLength' or similar in the
name to make it more obvious in common code.
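As a rough illustration of the (unscaled, scaled) pair discussed in this
message, here is a minimal C++ sketch. The names ScalableSize,
getFixedSize, evaluate, fixedVector and scalableVector are hypothetical
and are not existing LLVM APIs; struct padding and alignment are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair holding the compile-time terms of
// '(scaled_bits * vscale) + unscaled_bits'. Not an LLVM type.
struct ScalableSize {
  uint64_t Unscaled; // bits that do not depend on vscale
  uint64_t Scaled;   // bits that are multiplied by vscale at run time

  bool isScalable() const { return Scaled != 0; }

  // A 'FixedSize'-style query: only meaningful when nothing is scaled.
  uint64_t getFixedSize() const {
    assert(!isScalable() && "size depends on vscale");
    return Unscaled;
  }

  // Size in bits once vscale is known, e.g. for an address offset.
  uint64_t evaluate(uint64_t VScale) const {
    return Scaled * VScale + Unscaled;
  }
};

// Member sizes of an aggregate add term by term (padding ignored).
inline ScalableSize operator+(ScalableSize A, ScalableSize B) {
  return {A.Unscaled + B.Unscaled, A.Scaled + B.Scaled};
}

// <NumElts x iEltBits>
inline ScalableSize fixedVector(uint64_t NumElts, uint64_t EltBits) {
  return {NumElts * EltBits, 0};
}

// <scalable MinElts x iEltBits>
inline ScalableSize scalableVector(uint64_t MinElts, uint64_t EltBits) {
  return {0, MinElts * EltBits};
}
```

With these helpers, scalableVector(32, 8) + fixedVector(1, 64) gives the
(64, 256) pair from the struct example, and evaluate(vscale) performs the
"multiply the scaled part by vscale and add the unscaled" step.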

There are a few places where changes in IR may be needed;
'lifetime.start' markers in IR embed size data, and we would need to
either add a scalable term to that or find some other way of indicating
the size. That can be dealt with when we try to add support for the SVE
ACLE though.

> Is there anything about vscale or a scalable vector that requires a
> minimum bit width?  For example, is this legal?
>
> <scalable 1 x double>
>
> I know it won't map to an SVE type.  I'm simply curious because
> traditionally Cray machines defined vectors in terms of
> machine-dependent "maxvl" with an element type, so with the above
> vscale would == maxvl.  Not that we make any such things anymore.  But
> maybe someone else does?
That's legal in IR, yes, and we believe it should be usable to represent
the vectors for RISC-V's 'V' extension. The main problem there is that
they have a dynamic vector length within the loop so that they can
perform the last iterations of a loop within vector registers when
there's less than a full register worth of data remaining. SVE uses
predication (masking) to achieve the same effect.

For the 'V' extension, vscale would indeed correspond to 'maxvl', and I'm
hoping that a 'setvl' intrinsic that provides a predicate will avoid the
need for modelling a change in dynamic vector length -- reducing the
vector length is effectively equivalent to an implied predicate on all
operations. This avoids needing to add a token operand to all existing
instructions that work on vector types.
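To make the "reduced vector length is an implied predicate" point
concrete, here is a small scalar C++ model of a strip-mined loop. It is
only a sketch: setvlPredicate and predicatedAdd are hypothetical names,
and the lanes are simulated with ordinary loops rather than real vector
instructions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a 'setvl' that yields a predicate: the first
// min(Remaining, MaxVL) lanes are active, the rest are masked off.
std::vector<bool> setvlPredicate(std::size_t Remaining, std::size_t MaxVL) {
  std::size_t VL = std::min(Remaining, MaxVL);
  std::vector<bool> Pred(MaxVL, false);
  for (std::size_t Lane = 0; Lane < VL; ++Lane)
    Pred[Lane] = true;
  return Pred;
}

// A[i] += B[i], processed MaxVL lanes at a time. Every lane operation is
// guarded by the predicate, so the final partial iteration needs no
// separate scalar tail loop.
void predicatedAdd(std::vector<int> &A, const std::vector<int> &B,
                   std::size_t MaxVL) {
  assert(MaxVL > 0 && B.size() >= A.size());
  for (std::size_t Base = 0; Base < A.size(); Base += MaxVL) {
    std::vector<bool> Pred = setvlPredicate(A.size() - Base, MaxVL);
    for (std::size_t Lane = 0; Lane < MaxVL; ++Lane)
      if (Pred[Lane])
        A[Base + Lane] += B[Base + Lane];
  }
}
```

For ten elements and MaxVL of 4, the last iteration runs with only two
active lanes, which is the same effect a shortened dynamic vector length
would have.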

-Graham





> 
>>> If we went the ConstantExpr route and added ConstantExpr support to
>>> ScalarEvolution, then SCEVs could be compared to do this size
>>> comparison.  We have code here that adds ConstantExpr support to
>>> ScalarEvolution.  We just didn't know if anyone else would be
>>> interested in it since we added it solely for our Fortran frontend.
>>
>> We added a dedicated SCEV expression class for vscale instead; I
>> suspect it works either way.
>
> Yes, that's probably true.  A vscale SCEV is less invasive.
>
>> We've tried it as both an instruction and as a 'Constant', and both
>> work fine with ScalarEvolution. I have not yet tried it with the
>> intrinsic.
>
> vscale as a Constant is interesting.  It's a target-dependent Constant
> like sizeof and offsetof.  It doesn't have a statically known value and
> maybe isn't "constant" across functions.  So it's a strange kind of
> constant.
>
> Ultimately whatever is easier for LLVM to analyze in the long run is
> best.  Intrinsics often block optimization.  I don't know whether
> vscale would be "easier" as a Constant or an Instruction.
>
>>> As above, we could add ConstantVScale and also ConstantStepVector (or
>>> ConstantIota).  They won't fold to compile-time values but the
>>> expressions could be simplified.  I haven't really thought through
>>> the implications of this, just brainstorming ideas.  What does your
>>> downstream compiler require in terms of constant support?  What kinds
>>> of queries does it need to do?
>>
>> It makes things a little easier to pattern match (just looking for a
>> constant to start instead of having to match multiple different forms
>> of vscale or stepvector multiplied and/or added in each place you're
>> looking for them).
>
> Ok.  Normalization could help with this but I certainly understand the
> issue.
>
>> The bigger reason we currently depend on them being constant is that
>> code generation generally looks at a single block at a time, and there
>> are several expressions using vscale that we don't want to be
>> generated in one block and passed around in a register, since many of
>> the load/store addressing forms for instructions will already scale
>> properly.
>
> This is kind of like X86 memop folding.  If a load has multiple uses, it
> won't be folded, on the theory that one load is better than many folded
> loads.  If a load has exactly one use, it will fold.  There's explicit
> predicate code in the X86 backend to enforce this requirement.  I
> suspect if the X86 backend tried to fold a single load into multiple
> places, Bad Things would happen (needed SDNodes might disappear, etc.).
>
> Codegen probably doesn't understand non-statically-constant
> ConstantExprs, since sizeof or offsetof can be resolved by the target
> before instruction selection.
>
>> We've done this downstream by having them be Constants, but if there's
>> a good way of doing them with intrinsics we'd be fine with that too.
>
> If vscale/stepvector as Constants works, it seems fine to me.
>
>                               -David

David A. Greene via llvm-dev

2018-Jun-06 16:36 UTC

[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

Graham Hunter via llvm-dev <llvm-dev at lists.llvm.org> writes:

>> Ok, now I understand what you're getting at.  A ConstantExpr would
>> encapsulate this computation.  We already have "non-static-constant"
>> values for ConstantExpr like sizeof and offsetof.  I would see
>> VScaleConstant in that same tradition.  In your struct example,
>> getSizeExpressionInBits would return:
>>
>> add(mul(256, vscale), 64)
>>
>> Does that satisfy your needs?
>
> Ah, I think the use of 'expression' in the name definitely confuses
> the issue then. This isn't for expressing the size in IR, where you
> would indeed just multiply by vscale and add any fixed-length size.

Ok, thanks for clarifying.  The use of "expression" is confusing.

> This is for the analysis code around the IR -- lots of code asks for
> the size of a Type in bits to determine what it can do to a Value with
> that type. Some of them are specific to scalar Types, like determining
> whether a sign/zero extend is needed. Others would apply to vector
> types (including scalable vectors), such as checking whether two Types
> have the exact same size so that a bitcast can be used instead of a
> more expensive operation like copying to memory and back to convert.

If this method returns two integers, how does LLVM interpret the
comparison?  If the return value is { <unscaled>, <scaled> } then how
do, say { 1024, 0 } and { 0, 128 } compare?  Doesn't it depend on the
vscale?  They could be the same size or not, depending on the target
characteristics.

Are bitcasts between scaled types and non-scaled types disallowed?  I
could certainly see an argument for disallowing it.  I could argue that
for bitcasting purposes the unscaled and scaled parts would have to
exactly match in order to do a legal bitcast.  Is that the intent?

>> Is there anything about vscale or a scalable vector that requires a
>> minimum bit width?  For example, is this legal?
>>
>> <scalable 1 x double>
>>
>> I know it won't map to an SVE type.  I'm simply curious because
>> traditionally Cray machines defined vectors in terms of
>> machine-dependent "maxvl" with an element type, so with the above
>> vscale would == maxvl.  Not that we make any such things anymore.  But
>> maybe someone else does?
>
> That's legal in IR, yes, and we believe it should be usable to
> represent the vectors for RISC-V's 'V' extension. The main problem
> there is that they have a dynamic vector length within the loop so
> that they can perform the last iterations of a loop within vector
> registers when there's less than a full register worth of data
> remaining. SVE uses predication (masking) to achieve the same effect.
>
> For the 'V' extension, vscale would indeed correspond to 'maxvl', and
> I'm hoping that a 'setvl' intrinsic that provides a predicate will
> avoid the need for modelling a change in dynamic vector length --
> reducing the vector length is effectively equivalent to an implied
> predicate on all operations. This avoids needing to add a token
> operand to all existing instructions that work on vector types.

Right.  In that way the RISC V method is very much like what the old
Cray machines did with the Vector Length register.

So in LLVM IR you would have "setvl" return a predicate and then apply
that predicate to operations using the current select method?  How does
instruction selection map that back onto a simple setvl + unpredicated
vector instructions?

For conditional code both vector length and masking must be taken into
account.  If "setvl" returns a predicate then that predicate would have
to be combined in some way with the conditional predicate (typically via
an AND operation in an IR that directly supports predicates).  Since
LLVM IR doesn't have predicates _per_se_, would it turn into nested
selects or something?  Untangling that in instruction selection seems
difficult but perhaps I'm missing something.

                                 -David

Graham Hunter via llvm-dev

2018-Jun-07 16:10 UTC

[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

Hi,

> On 6 Jun 2018, at 17:36, David A. Greene <dag at cray.com> wrote:
>
> Graham Hunter via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
>>> Ok, now I understand what you're getting at.  A ConstantExpr would
>>> encapsulate this computation.  We already have "non-static-constant"
>>> values for ConstantExpr like sizeof and offsetof.  I would see
>>> VScaleConstant in that same tradition.  In your struct example,
>>> getSizeExpressionInBits would return:
>>>
>>> add(mul(256, vscale), 64)
>>>
>>> Does that satisfy your needs?
>>
>> Ah, I think the use of 'expression' in the name definitely confuses
>> the issue then. This isn't for expressing the size in IR, where you
>> would indeed just multiply by vscale and add any fixed-length size.
>
> Ok, thanks for clarifying.  The use of "expression" is confusing.
>
>> This is for the analysis code around the IR -- lots of code asks for
>> the size of a Type in bits to determine what it can do to a Value with
>> that type. Some of them are specific to scalar Types, like determining
>> whether a sign/zero extend is needed. Others would apply to vector
>> types (including scalable vectors), such as checking whether two Types
>> have the exact same size so that a bitcast can be used instead of a
>> more expensive operation like copying to memory and back to convert.
>
> If this method returns two integers, how does LLVM interpret the
> comparison?  If the return value is { <unscaled>, <scaled> } then how
> do, say { 1024, 0 } and { 0, 128 } compare?  Doesn't it depend on the
> vscale?  They could be the same size or not, depending on the target
> characteristics.

I did have a paragraph on that in the RFC, but perhaps a list would be
a better format (assuming X,Y,etc are non-zero):

{ X, 0 } <cmp> { Y, 0 }: Normal unscaled comparison.

{ 0, X } <cmp> { 0, Y }: Normal comparison within a function, or across
                         functions that inherit vector length. Cannot be
                         compared across non-inheriting functions.

{ X, 0 } > { 0, Y }: Cannot return true.

{ X, 0 } = { 0, Y }: Cannot return true.

{ X, 0 } < { 0, Y }: Can return true.

{ Xu, Xs } <cmp> { Yu, Ys }: Gets complicated, need to subtract common
                             terms and try the above comparisons; it
                             may not be possible to get a good answer.

I don't know if we need a 'maybe' result for cases comparing scaled
vs. unscaled; I believe the gcc implementation of SVE allows for such
results, but that supports a generic polynomial length representation.

I think in code, we'd have an inline function to deal with the first case
and a likely-not-taken call to a separate function to handle all the
scalable cases.
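One possible reading of the comparison list above, as a C++ sketch. It
assumes that vscale is a positive integer with no known upper bound and
that both sizes are seen under the same vscale (same function, or
functions that inherit vector length). SizePair, isKnownLessThan,
isKnownLessThanScalable and isKnownEqual are hypothetical names, not
LLVM APIs, and a 'false' result means "not provable", not "provably
false".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical { unscaled, scaled } size pair, in bits.
struct SizePair {
  uint64_t Unscaled;
  uint64_t Scaled;
};

// Out-of-line slow path for anything involving a scaled term.
// A < B holds for every vscale >= 1 iff the scaled difference is
// non-negative (otherwise a large vscale breaks it) and the inequality
// already holds at vscale == 1. Taking differences is the "subtract
// common terms" step. Sizes are assumed to fit in int64_t.
bool isKnownLessThanScalable(SizePair A, SizePair B) {
  int64_t DiffScaled =
      static_cast<int64_t>(B.Scaled) - static_cast<int64_t>(A.Scaled);
  int64_t DiffUnscaled =
      static_cast<int64_t>(B.Unscaled) - static_cast<int64_t>(A.Unscaled);
  return DiffScaled >= 0 && DiffScaled + DiffUnscaled > 0;
}

// Inline fast path for the common fixed-size case; scalable sizes take
// the (expected to be rare) call to the slow path.
inline bool isKnownLessThan(SizePair A, SizePair B) {
  if (A.Scaled == 0 && B.Scaled == 0)
    return A.Unscaled < B.Unscaled;
  return isKnownLessThanScalable(A, B);
}

// Equality is only provable when both terms match exactly.
inline bool isKnownEqual(SizePair A, SizePair B) {
  return A.Unscaled == B.Unscaled && A.Scaled == B.Scaled;
}
```

Under these assumptions { 64, 0 } < { 0, 128 } is provable, while
{ 1024, 0 } and { 0, 128 } cannot be ordered in either direction and are
not provably equal, which matches the three mixed cases in the list.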
> Are bitcasts between scaled types and non-scaled types disallowed?  I
> could certainly see an argument for disallowing it.  I could argue that
> for bitcasting purposes the unscaled and scaled parts would have to
> exactly match in order to do a legal bitcast.  Is that the intent?
I would propose disallowing bitcasts, but allowing extracting a subvector
if the minimum number of scaled bits matches the number of unscaled bits.
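A minimal C++ sketch of that rule, using a hypothetical
{ unscaled, scaled } pair. SizePair, sameSizeForBitcast and
canExtractFixedSubvector are made-up names for illustration, not LLVM
APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical { unscaled, scaled } size pair, in bits.
struct SizePair {
  uint64_t Unscaled;
  uint64_t Scaled;
};

// Bitcast: both terms must match exactly, so a scaled type and an
// unscaled type of non-zero size never qualify.
inline bool sameSizeForBitcast(SizePair From, SizePair To) {
  return From.Unscaled == To.Unscaled && From.Scaled == To.Scaled;
}

// Subvector extraction from a purely scalable vector into a fixed-length
// one: allowed when the minimum (vscale == 1) scaled size equals the
// fixed size.
inline bool canExtractFixedSubvector(SizePair Scalable, SizePair Fixed) {
  return Scalable.Unscaled == 0 && Fixed.Scaled == 0 &&
         Scalable.Scaled != 0 && Scalable.Scaled == Fixed.Unscaled;
}
```

For example, <scalable 4 x i32> as { 0, 128 } and <4 x i32> as { 128, 0 }
fail the bitcast check but pass the subvector check.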
> 
>>> Is there anything about vscale or a scalable vector that requires a
>>> minimum bit width?  For example, is this legal?
>>>
>>> <scalable 1 x double>
>>>
>>> I know it won't map to an SVE type.  I'm simply curious because
>>> traditionally Cray machines defined vectors in terms of
>>> machine-dependent "maxvl" with an element type, so with the above
>>> vscale would == maxvl.  Not that we make any such things anymore.
>>> But maybe someone else does?
>>
>> That's legal in IR, yes, and we believe it should be usable to
>> represent the vectors for RISC-V's 'V' extension. The main problem
>> there is that they have a dynamic vector length within the loop so
>> that they can perform the last iterations of a loop within vector
>> registers when there's less than a full register worth of data
>> remaining. SVE uses predication (masking) to achieve the same effect.
>>
>> For the 'V' extension, vscale would indeed correspond to 'maxvl', and
>> I'm hoping that a 'setvl' intrinsic that provides a predicate will
>> avoid the need for modelling a change in dynamic vector length --
>> reducing the vector length is effectively equivalent to an implied
>> predicate on all operations. This avoids needing to add a token
>> operand to all existing instructions that work on vector types.
>
> Right.  In that way the RISC V method is very much like what the old
> Cray machines did with the Vector Length register.
>
> So in LLVM IR you would have "setvl" return a predicate and then apply
> that predicate to operations using the current select method?  How does
> instruction selection map that back onto a simple setvl + unpredicated
> vector instructions?
>
> For conditional code both vector length and masking must be taken into
> account.  If "setvl" returns a predicate then that predicate would have
> to be combined in some way with the conditional predicate (typically via
> an AND operation in an IR that directly supports predicates).  Since
> LLVM IR doesn't have predicates _per_se_, would it turn into nested
> selects or something?  Untangling that in instruction selection seems
> difficult but perhaps I'm missing something.

My idea is for the RISC-V backend to recognize when a setvl intrinsic has
been used, and replace the use of its value in AND operations with an
all-true value (with constant folding to remove unnecessary ANDs) then
replace any masked instructions (generally loads, stores, anything else
that might generate an exception or modify state that it shouldn't) with
target-specific nodes that understand the dynamic vlen.
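The first half of that idea (replace the setvl predicate with all-true,
then fold away the ANDs) can be sketched on a toy predicate tree in C++.
This is not LLVM IR or SelectionDAG; Mask, leaf, mkAnd and dropSetVL are
hypothetical names, and the later step of swapping masked loads/stores
for vlen-aware target nodes is not modelled.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy predicate expression tree.
struct Mask {
  enum Kind { AllTrue, SetVL, Cond, And } K;
  std::shared_ptr<Mask> LHS, RHS; // only used when K == And
};
using MaskPtr = std::shared_ptr<Mask>;

MaskPtr leaf(Mask::Kind K) {
  return std::make_shared<Mask>(Mask{K, nullptr, nullptr});
}

MaskPtr mkAnd(MaskPtr L, MaskPtr R) {
  return std::make_shared<Mask>(Mask{Mask::And, std::move(L), std::move(R)});
}

// Replace every use of the setvl predicate with all-true, then fold
// 'x AND all-true' to 'x' so that unnecessary ANDs disappear.
MaskPtr dropSetVL(const MaskPtr &M) {
  switch (M->K) {
  case Mask::SetVL:
    return leaf(Mask::AllTrue);
  case Mask::And: {
    MaskPtr L = dropSetVL(M->LHS);
    MaskPtr R = dropSetVL(M->RHS);
    if (L->K == Mask::AllTrue)
      return R;
    if (R->K == Mask::AllTrue)
      return L;
    return mkAnd(L, R);
  }
  default:
    return M; // AllTrue and Cond leaves are left untouched
  }
}
```

In this model AND(setvl, cond) simplifies to just cond, leaving the
vector length itself as the implied predicate.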

This could be part of lowering, or maybe a separate IR pass, rather than
ISel. I *think* this will work, but if someone can come up with some IR
where it wouldn't work then please let me know (e.g.
global-state-changing instructions that could move out of blocks where
one setvl predicate is used and into one where another is used).

Unfortunately, I can't find a description of the instructions included in
the 'V' extension in the online manual (other than setvl or configuring
registers), so I can't tell if there's something I'm missing.

-Graham
