Renato Golin via llvm-dev
2018-Jul-31 11:13 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
On Tue, 31 Jul 2018 at 03:53, David A. Greene <dag at cray.com> wrote:
> I wasn't talking about within an instruction but rather across
> instructions in the same expression tree. Something like this would be
> weird:

Yes, that's what I was referring to as "not in the API", therefore "user error".

> The points where VL would be changed are limited and I think would
> require limited, straightforward additions on top of this proposal.

Indeed. I have a limited view on the spec and even more so on hardware implementations, but it is my understanding that there is no attempt to change VL mid-loop.

If we can assume VL will be "the same" (not constant) throughout every self-contained sub-graph (from scalar|memory->vector to vector->scalar|memory), then we should encode in the IR spec that this is a hard requirement.

This seems consistent with your explanation of the Cray VL change as well as Bruce's description of RISC-V (both seem very similar to me), where VL can change between two loop iterations but not within the same iteration.

We will still have to be careful with access safety (aliasing, loop dependencies, etc.), but that shouldn't be different from the case where VL is required to be constant throughout the program.

> That's right. This proposal doesn't expose a way to change vscale, but
> I don't think it precludes a later addition to do so.

That was my point about this change being harder to do later than now. I think no one wants to do that now, so we're all happy to pay the price later, because that will likely never come.

> I don't see why predicate values would be affected at all. If a machine
> with variable vector length has predicates, then typically the resulting
> operation would operate on the bitwise AND of the predicate and a
> conceptual all 1's predicate of length VL.

I think the problem is that SVE is fully predicated and Cray (RISC-V?) is not, so mixing the two could lead to weird predication situations.

So, if a high-level optimisation pass assumes full predication and changes the loop accordingly, and another pass assumes no predication and adds VL changes (say, loop tails), then we may end up with incompatible IR that will be hard to select down in ISel.

Given that SVE has both predication and vscale change, this could happen in practice. It wouldn't necessarily be wrong, but it would have to be a conscious decision.

> Changing vscale would be no different than changing any other value in
> the program. The dataflow determines its possible values at various
> program points. vscale is an extra (implicit) operand to all vector
> operations with scalable type.

It is, but IIGIR, changing vscale and predicating are similar transformations to achieve similar goals, yet they will not be represented the same way in IR. Also, they're not always interchangeable, so that complicates the IR matching in ISel as well as potential matching in optimisation passes.

> Why? If a user does asm or some other such trick to change what vscale
> means, that's on the user. If a machine has a VL that changes
> iteration-to-iteration, typically the compiler would be responsible for
> controlling it.

Not asm, sorry. Inline asm is "user error". I meant: make sure adding an IR-visible change in VL (say, an intrinsic or instruction) within a self-contained block becomes an IR error.

> If the vendor provides some target intrinsics to let the user write
> low-level vector code that changes vscale in a high-level language, then
> the vendor would be responsible for adding the necessary bits to the
> frontend and LLVM. I would not recommend a vendor try to do this. :)

Not recommending by making it an explicit error. :)

It may sound harsh, but given we're taking some pretty liberal design choices right now, which could have a long-lasting impact on the stability and quality of LLVM's code generation, I'd say we need to be as conservative as possible.

> I don't see why. Anyone adding the ability to change vscale would need to
> add intrinsics and specify their semantics. That shouldn't change
> anything about this proposal and any such additions shouldn't be
> hampered by this proposal.

I don't think it would be hard to do, but it could have consequences for the rest of the optimisation and code generation pipeline. I do not claim to have a clear vision on any of this, but as I said above, it will pay off long term if we start conservative.

> I don't think we should worry about taking IR with dynamic changes to VL
> and trying to generate good code for any random target from it. Such IR
> is very clearly tied to a specific kind of target and we shouldn't
> bother pretending otherwise.

We're preaching for the same goals. :)

But we're trying to represent slightly different techniques (predication, vscale change) which need to be tied down to only exactly what they do. Being conservative and explicit on the semantics is, IMHO, the easiest path to get it right. We can surely expand later.

--
cheers,
--renato
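To make the "self-contained sub-graph" requirement concrete, here is a minimal sketch using the scalable types from this proposal (written in today's IR syntax, with opaque pointers, purely as an illustration): every operation between the loads and the store uses the same <vscale x 4 x i32> type, so the whole memory->vector->memory chain implicitly agrees on a single vector length.

define void @vadd(ptr %a, ptr %b, ptr %c) {
entry:
  ; Loads, add and store all share one scalable type, hence one VL
  ; for the whole sub-graph; nothing in between may alter it.
  %va = load <vscale x 4 x i32>, ptr %a
  %vb = load <vscale x 4 x i32>, ptr %b
  %vc = add <vscale x 4 x i32> %va, %vb
  store <vscale x 4 x i32> %vc, ptr %c
  ret void
}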
Bruce Hoult via llvm-dev
2018-Jul-31 12:48 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
On Tue, Jul 31, 2018 at 9:13 PM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Indeed. I have a limited view on the spec and even more so on hardware
> implementations, but it is my understanding that there is no attempt
> to change VL mid-loop.
>
> If we can assume VL will be "the same" (not constant) throughout every
> self-contained sub-graph (from scalar|memory->vector to
> vector->scalar|memory), then we should encode in the IR spec that
> this is a hard requirement.

I don't see any harm in (very occasionally) making the VL shorter somewhere within an iteration of a loop. Some work that was already done will be wasted, but that's not a correctness problem. Making the VL longer mid-iteration would of course be very bad. The important thing is that the various source and destination pointers are updated by the correct amount at the end of the loop.

> This seems consistent with your explanation of the Cray VL change as
> well as Bruce's description of RISC-V (both seem very similar to me),
> where VL can change between two loop iterations but not within the
> same iteration.

I'm not sure whether it will end up being possible or not, but I did describe two situations where at least some RISC-V implementations might want to change VL within an iteration:

1) a memory protection problem on some trailing part of a vector load or store, causing that iteration to operate only on the accessible part, and the next iteration to start from the first address in the non-accessible part (and actually take a fault)

2) an interrupt/task switch in the middle of a loop iteration. Some implementations may want to save/restore only the vector configuration, not the values of the vector registers.

> > I don't see why predicate values would be affected at all. If a machine
> > with variable vector length has predicates, then typically the resulting
> > operation would operate on the bitwise AND of the predicate and a
> > conceptual all 1's predicate of length VL.
>
> I think the problem is that SVE is fully predicated and Cray (RISC-V?)
> is not, so mixing the two could lead to weird predication
> situations.

The current RISC-V proposal has a 2-bit field in each vector instruction, with the values indicating:

- it's actually scalar
- vector operation with no predication
- vector operation, masked by the predicate register
- vector operation, masked by the inverse of the predicate register
Renato Golin via llvm-dev
2018-Jul-31 13:54 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
On Tue, 31 Jul 2018 at 13:48, Bruce Hoult <brucehoult at sifive.com> wrote:
> I don't see any harm in (very occasionally) making the VL shorter somewhere within an iteration of a loop. Some work that was already done will be wasted, but that's not a correctness problem. Making the VL longer mid-iteration would of course be very bad.
> The important thing is that the various source and destination pointers are updated by the correct amount at the end of the loop.

If this is orthogonal to the IR representation, i.e. it doesn't need the current instructions to *know* about it, but the sequence of IR instructions still represents it, then it should be fine.

> I'm not sure whether it will end up being possible or not, but I did describe two situations where at least some RISC-V implementations might want to change VL within an iteration:

Apologies, I may have misinterpreted them.

> 1) a memory protection problem on some trailing part of a vector load or store, causing that iteration to operate only on the accessible part, and the next iteration to start from the first address in the non-accessible part (and actually take a fault)

SVE deals with those problems with predication and the FFR (first-fault register), not by changing the VL, but I imagine they're semantically similar.

> 2) an interrupt/task switch in the middle of a loop iteration. Some implementations may want to save/restore only the vector configuration, not the values of the vector registers.

I assume the architecture will have to continue the program in the same state it was in when the interrupt occurred. How it does that shouldn't concern code generation.

--
cheers,
--renato
David A. Greene via llvm-dev
2018-Jul-31 15:36 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
Renato Golin <renato.golin at linaro.org> writes:

>> The points where VL would be changed are limited and I think would
>> require limited, straightforward additions on top of this proposal.
>
> Indeed. I have a limited view on the spec and even more so on hardware
> implementations, but it is my understanding that there is no attempt
> to change VL mid-loop.

What does "mid-loop" mean? On traditional vector architectures it was very common to change VL for the last loop iteration. Otherwise you had to have a remainder loop. It was much better to change VL.

> If we can assume VL will be "the same" (not constant) throughout every
> self-contained sub-graph (from scalar|memory->vector to
> vector->scalar|memory), then we should encode in the IR spec that
> this is a hard requirement.
>
> This seems consistent with your explanation of the Cray VL change as
> well as Bruce's description of RISC-V (both seem very similar to me),
> where VL can change between two loop iterations but not within the
> same iteration.

Ok, I think I am starting to grasp what you are saying. If a value flows from memory or some scalar computation to vector and then back to memory or scalar, VL should only ever be set at the start of the vector computation until it finishes and the value is deposited in memory or otherwise extracted. I think this is ok, but note that any vector functions called may change VL for the duration of the call. The change would not be visible to the caller.

Just thinking this through, a case where one might want to change VL mid-stream is something like a half-length set of operations that feeds a vector concat and then a full-length set of operations following. But again I think this would be a strange way to do things. If someone really wants to do this they can predicate away the upper bits of the half-length operations and maintain the same VL throughout the computation. If predication isn't available, they've got more serious problems vectorizing code. :)

> We will still have to be careful with access safety (aliasing, loop
> dependencies, etc.), but that shouldn't be different from the case
> where VL is required to be constant throughout the program.

Yep.

>> That's right. This proposal doesn't expose a way to change vscale, but
>> I don't think it precludes a later addition to do so.
>
> That was my point about this change being harder to do later than now.

I guess I don't see why it would be any harder later.

> I think no one wants to do that now, so we're all happy to pay the
> price later, because that will likely never come.

I am not so sure about that. Power requirements may very well drive more dynamic vector lengths. Even today some AVX-512 implementations falter if there are "too many" 512-bit operations. Scaling back SIMD width statically is very common today and doing so dynamically seems like an obvious extension. I don't know of any efforts to do this so it's all speculative at this point. But the industry has done it in the past and we have a curious pattern of reinventing things we did before.

>> I don't see why predicate values would be affected at all. If a machine
>> with variable vector length has predicates, then typically the resulting
>> operation would operate on the bitwise AND of the predicate and a
>> conceptual all 1's predicate of length VL.
>
> I think the problem is that SVE is fully predicated and Cray (RISC-V?)
> is not, so mixing the two could lead to weird predication
> situations.

Cray vector ISAs were fully predicated and also used a vector length. It didn't cause us any serious issues. In many ways having an adjustable VL and predication makes things easier because you don't have to regenerate predicates to switch to a shorter VL.

> So, if a high-level optimisation pass assumes full predication and
> changes the loop accordingly, and another pass assumes no predication
> and adds VL changes (say, loop tails), then we may end up with
> incompatible IR that will be hard to select down in ISel.
>
> Given that SVE has both predication and vscale change, this could
> happen in practice. It wouldn't necessarily be wrong, but it would
> have to be a conscious decision.

It seems strange to me for an optimizer to operate in such a way. The optimizer should be fully aware of the target's capabilities and use them accordingly.

But let's say this happens. Pass 1 vectorizes the loop with predication (for a conditional loop body) and creates a remainder loop, which would also need to be predicated. Note that such a remainder loop is not necessary with full predication support, but for the sake of argument let's say pass 1 is not too smart. Pass 2 comes along and says, "hey, I have the ability to change VL so we don't need a remainder loop." It rewrites the main loop to use dynamic VL and removes the remainder loop. During that rewrite, pass 2 would have to maintain predication. It can use the very same predicate values pass 1 generated. There is no need to adjust them because the VL is applied "on top of" the predicates. Pass 2 effectively rewrites the code to what the vectorizer should have emitted in the first place.

I'm not seeing how ISel is any more difficult. SVE has an implicit vscale operand on every instruction and ARM seems to have no difficulty selecting instructions for it. Changing the value of vscale shouldn't impact ISel at all. The same instructions are selected.

>> Changing vscale would be no different than changing any other value in
>> the program. The dataflow determines its possible values at various
>> program points. vscale is an extra (implicit) operand to all vector
>> operations with scalable type.
>
> It is, but IIGIR, changing vscale and predicating are similar
> transformations to achieve similar goals, yet they will not be
> represented the same way in IR.

They probably will not be represented the same way, though I think they could be (but probably shouldn't be).

> Also, they're not always interchangeable, so that complicates the IR
> matching in ISel as well as potential matching in optimisation passes.

I'm not sure it does, but I haven't worked something all the way through.

>> Why? If a user does asm or some other such trick to change what vscale
>> means, that's on the user. If a machine has a VL that changes
>> iteration-to-iteration, typically the compiler would be responsible for
>> controlling it.
>
> Not asm, sorry. Inline asm is "user error".

Ok.

> I meant: make sure adding an IR-visible change in VL (say, an
> intrinsic or instruction) within a self-contained block becomes an
> IR error.

What do you mean by "self-contained block?" Assuming I understood it correctly, the restriction you described at the top seems reasonable for now.

>> If the vendor provides some target intrinsics to let the user write
>> low-level vector code that changes vscale in a high-level language, then
>> the vendor would be responsible for adding the necessary bits to the
>> frontend and LLVM. I would not recommend a vendor try to do this. :)
>
> Not recommending by making it an explicit error. :)
>
> It may sound harsh, but given we're taking some pretty liberal design
> choices right now, which could have a long-lasting impact on the
> stability and quality of LLVM's code generation, I'd say we need to be
> as conservative as possible.

Ok, but would the optimizer be prevented from introducing VL changes?

>> I don't see why. Anyone adding the ability to change vscale would need to
>> add intrinsics and specify their semantics. That shouldn't change
>> anything about this proposal and any such additions shouldn't be
>> hampered by this proposal.
>
> I don't think it would be hard to do, but it could have consequences
> for the rest of the optimisation and code generation pipeline.

It could. I don't think any of us has a clear idea of what those might be.

> I do not claim to have a clear vision on any of this, but as I said
> above, it will pay off long term if we start conservative.

Being conservative is fine, but we should have a clear understanding of exactly what that means. I would not want to prohibit all VL changes now and forever, because I see that as unnecessarily restrictive and possibly damaging to supporting future architectures.

If we don't want to provide intrinsics for changing VL right now, I'm all in favor. There would be no reason to add error checks because there would be no way within the IR to change VL. But I don't want to preclude adding such intrinsics in the future.

>> I don't think we should worry about taking IR with dynamic changes to VL
>> and trying to generate good code for any random target from it. Such IR
>> is very clearly tied to a specific kind of target and we shouldn't
>> bother pretending otherwise.
>
> We're preaching for the same goals. :)

Good! :)

> But we're trying to represent slightly different techniques
> (predication, vscale change) which need to be tied down to only
> exactly what they do.

Wouldn't intrinsics to change vscale do exactly that?

> Being conservative and explicit on the semantics is, IMHO, the easiest
> path to get it right. We can surely expand later.

I'm all for being explicit. I think we're basically on the same page, though there are a few things noted above where I need a little more clarity.

-David
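The "predicate away the tail and keep VL fixed" approach described above can be sketched with the scalable types from this proposal plus a few present-day intrinsics (llvm.vscale, llvm.get.active.lane.mask and the masked load/store intrinsics, which postdate this thread and stand in for whatever the final representation becomes): the partial final iteration is handled entirely by the mask, while vscale is read once and never changed.

define void @vadd_predicated(ptr %a, ptr %b, i64 %n) {
entry:
  ; Elements processed per iteration: vscale * 4, fixed for the whole loop.
  %vscale = call i64 @llvm.vscale.i64()
  %step = shl i64 %vscale, 2
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  ; Lanes with %i + lane < %n are active; the final, partial iteration is
  ; handled by this mask rather than by shortening VL.
  %mask = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 %i, i64 %n)
  %pa = getelementptr i32, ptr %a, i64 %i
  %pb = getelementptr i32, ptr %b, i64 %i
  %va = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr %pa, i32 4, <vscale x 4 x i1> %mask, <vscale x 4 x i32> zeroinitializer)
  %vb = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr %pb, i32 4, <vscale x 4 x i1> %mask, <vscale x 4 x i32> zeroinitializer)
  %vc = add <vscale x 4 x i32> %va, %vb
  ; Store a[i] + b[i] back into a, under the same mask.
  call void @llvm.masked.store.nxv4i32.p0(<vscale x 4 x i32> %vc, ptr %pa, i32 4, <vscale x 4 x i1> %mask)
  %i.next = add i64 %i, %step
  %done = icmp uge i64 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret void
}

declare i64 @llvm.vscale.i64()
declare <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64, i64)
declare <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr, i32, <vscale x 4 x i1>, <vscale x 4 x i32>)
declare void @llvm.masked.store.nxv4i32.p0(<vscale x 4 x i32>, ptr, i32, <vscale x 4 x i1>)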
Renato Golin via llvm-dev
2018-Jul-31 18:21 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
Hi David,

Let me put the last two comments up:

>> But we're trying to represent slightly different techniques
>> (predication, vscale change) which need to be tied down to only
>> exactly what they do.
>
> Wouldn't intrinsics to change vscale do exactly that?

You're right. I've been using the same overloaded term and this is probably what caused the confusion.

In some cases, predicating and shortening the vectors are semantically equivalent. In those cases, the IR should also be equivalent. Instructions/intrinsics that handle predication could be used by the backend to simply change VL instead, as long as it's guaranteed that the semantics are identical. There are no problems here.

In other cases, for example widening or splitting the vector, or cases we haven't thought of yet, the semantics are not the same, and having them in IR would be bad. I think we're all in agreement on that.

All I'm asking is that we make a list of what we want to happen and disallow everything else explicitly, until someone comes with a strong case for it. Makes sense?

> I'm all for being explicit. I think we're basically on the same page,
> though there are a few things noted above where I need a little more
> clarity.

Yup, I think we are. :)

> What does "mid-loop" mean? On traditional vector architectures it was
> very common to change VL for the last loop iteration. Otherwise you had
> to have a remainder loop. It was much better to change VL.

You got it below...

> Ok, I think I am starting to grasp what you are saying. If a value
> flows from memory or some scalar computation to vector and then back to
> memory or scalar, VL should only ever be set at the start of the vector
> computation until it finishes and the value is deposited in memory or
> otherwise extracted. I think this is ok, but note that any vector
> functions called may change VL for the duration of the call. The change
> would not be visible to the caller.

If a function is called and changes the length, does it restore it back on return?

> I am not so sure about that. Power requirements may very well drive
> more dynamic vector lengths. Even today some AVX-512 implementations
> falter if there are "too many" 512-bit operations. Scaling back SIMD
> width statically is very common today and doing so dynamically seems
> like an obvious extension. I don't know of any efforts to do this so
> it's all speculative at this point. But the industry has done it in the
> past and we have a curious pattern of reinventing things we did before.

Right, so it's not as clear-cut as I hoped. But we can start implementing the basic idea and then expand as we go. I think trying to hash out all potential scenarios now will drive us crazy.

> It seems strange to me for an optimizer to operate in such a way. The
> optimizer should be fully aware of the target's capabilities and use
> them accordingly.

Mid-end optimisers tend to be fairly agnostic. And when not, they usually ask "is this supported" instead of "which one is better".

> ARM seems to have no difficulty selecting instructions for it. Changing
> the value of vscale shouldn't impact ISel at all. The same instructions
> are selected.

I may very well be getting lost in too many floating future ideas, atm. :)

>> It is, but IIGIR, changing vscale and predicating are similar
>> transformations to achieve similar goals, yet they will not be
>> represented the same way in IR.
>
> They probably will not be represented the same way, though I think they
> could be (but probably shouldn't be).

Maybe in the simple cases (like the last iteration) they should be?

> Ok, but would the optimizer be prevented from introducing VL changes?

In the case where they're represented in similar ways in IR, it wouldn't need to. Otherwise, we'd have to teach IR optimisers about two methods that are virtually identical in semantics. It'd be left for the back end to implement the last-iteration notation as a predicate fill or a vscale change.

> Being conservative is fine, but we should have a clear understanding of
> exactly what that means. I would not want to prohibit all VL changes
> now and forever, because I see that as unnecessarily restrictive and
> possibly damaging to supporting future architectures.
>
> If we don't want to provide intrinsics for changing VL right now, I'm
> all in favor. There would be no reason to add error checks because
> there would be no way within the IR to change VL.

Right, I think we're converging.

How about we don't forbid changes in vscale, but we find a common notation for all the cases where predicating and changing vscale would be semantically identical, and implement those in the same way?

Later on, if there are additional cases where changes in vscale would be beneficial, we can discuss them independently.

Makes sense?

--
cheers,
--renato
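As a rough illustration of that common notation (again using the present-day llvm.get.active.lane.mask and llvm.masked.store intrinsics purely as stand-ins): the last iteration of a loop can be written as a masked store whose mask covers only the remaining elements, and the backend then decides how to lower it, an SVE-style target filling a predicate register and a VL-style (Cray/RISC-V) target shortening VL for that operation, with the IR staying identical in both cases.

define void @store_tail(ptr %p, <vscale x 4 x i32> %v, i64 %i, i64 %n) {
  ; Only lanes with %i + lane < %n are stored: the "last iteration" case.
  %mask = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 %i, i64 %n)
  %q = getelementptr i32, ptr %p, i64 %i
  call void @llvm.masked.store.nxv4i32.p0(<vscale x 4 x i32> %v, ptr %q, i32 4, <vscale x 4 x i1> %mask)
  ret void
}

declare <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64, i64)
declare void @llvm.masked.store.nxv4i32.p0(<vscale x 4 x i32>, ptr, i32, <vscale x 4 x i1>)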