thr3ads.net - llvm dev - [llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths [Jul 2018]

Renato Golin via llvm-dev

2018-Jul-30 20:12 UTC

[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

On Mon, 30 Jul 2018 at 20:57, David A. Greene via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> I'm not sure exactly how the SVE proposal would address this kind of
> operation.
SVE uses predication. The physical number of lanes doesn't have to
change to have the same effect (alignment, tails).

> I think it would be unlikely for anyone to need to change the vector
> length during evaluation of an in-register expression.
The worry here is not within each instruction but across instructions.
SVE (and I think RISC-V) allow register size to be dynamically set.

For example, on the same machine, it may be 256 for one process and
512 for another (for example, to save power).

But the change is via a system register, so in theory, anyone can
write an inline asm in the beginning of a function and change the
vector length to whatever they want.

Worse still, people can do that inside loops, or in a tail loop,
thinking it's a good idea (or this is a Cray machine :).

AFAIK, the interface for changing the register length will not be
exposed programmatically, so in theory, we should not worry about it.
Any inline asm hack can be considered out of scope / user error.

However, Hal's concern seems to be that, in the event of anyone
planning to add it to their APIs, we need to make sure the proposed
semantics can cope with it (do we need to update the predicates again?
what will vscale mean, then and when?).

If not, we may have to enforce that this will not come to pass in its
current form. In this case, changing it later will require *a lot*
more effort than doing it now.

So, it would be good to get a clear response from the two fronts (SVE
and RISC-V) about the future intention to expose that or not.

--
cheers,
--renato

Bruce Hoult via llvm-dev

2018-Jul-31 00:13 UTC

[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

On Mon, Jul 30, 2018 at 1:12 PM, Renato Golin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> The worry here is not within each instruction but across instructions.
> SVE (and I think RISC-V) allow register size to be dynamically set.
>
> For example, on the same machine, it may be 256 for one process and
> 512 for another (for example, to save power).
>
> But the change is via a system register, so in theory, anyone can
> write an inline asm in the beginning of a function and change the
> vector length to whatever they want.
>
> Worse still, people can do that inside loops, or in a tail loop,
> thinking it's a good idea (or this is a Cray machine :).
>
> AFAIK, the interface for changing the register length will not be
> exposed programmatically, so in theory, we should not worry about it.
> Any inline asm hack can be considered out of scope / user error.
>
> However, Hal's concern seems to be that, in the event of anyone
> planning to add it to their APIs, we need to make sure the proposed
> semantics can cope with it (do we need to update the predicates again?
> what will vscale mean, then and when?).
>
> If not, we may have to enforce that this will not come to pass in its
> current form. In this case, changing it later will require *a lot*
> more effort than doing it now.
>
> So, it would be good to get a clear response from the two fronts (SVE
> and RISC-V) about the future intention to expose that or not.
>
Some characteristics of how I believe RISC-V vectors will or could end up:

- the user's data is stored only in normal C "arrays" (which of course can
mean a pointer into the middle of some arbitrary chunk of memory)

- vector register types will be used only within a loop in a single
user-written function. There is no way to pass a vector variable from one
function to another -- there is no effect on ABI.

- there will be some vector intrinsic functions such as transcendentals.
They will use a different, private ABI used only by the compiler and
implemented only in the runtime library. They will probably use the
alternate link register (x5 instead of x1) and will be totally not miscible
with normal functions.

- even within a single function, different loops may have different maximum
vector length, depending on how many vector registers are required and of
what element types (all vectors in a given loop have the same number of
elements).

- the active vector length can change from iteration to iteration of a
loop. In particular, it can be less on the final iteration to deal with
tails.

- the active vector length is set at the head of each iteration of a loop
by the program telling the hardware how many elements are left (possibly
thousands or millions) and the hardware saying "you can have 17 this time"

- (maybe) the active vector length can become shorter during execution of a
loop iteration as a side effect of a vector load or store getting a
protection error and loading/storing only up to the protection boundary. In
this case an actual trap will be taken only if the first element of the
vector causes the problem. Different micro-architectures might handle this
differently. It should be a rare event. An interrupt or task switch during
execution of a vector loop may cause the active vector length to become
zero for that iteration.
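
The loop shape described in the list above can be sketched in plain C. This is only a scalar model under stated assumptions: `grant_vl` and `HW_MAX_VL` are hypothetical names standing in for whatever instruction and hardware limit RISC-V ends up with, and the inner `for` loop models what would be a single vector operation over `vl` lanes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hardware limit, chosen only to match the "17" above. */
enum { HW_MAX_VL = 17 };

/* Hypothetical stand-in for the "set vector length" step: the program
 * reports how many elements remain, the machine grants some of them. */
static size_t grant_vl(size_t remaining)
{
    return remaining < HW_MAX_VL ? remaining : HW_MAX_VL;
}

/* c[i] = a[i] + b[i] for i in [0, n), processed in granted chunks.
 * Returns the number of loop iterations (chunks) that were executed. */
size_t vec_add(int *c, const int *a, const int *b, size_t n)
{
    size_t iterations = 0;
    size_t i = 0;
    while (i < n) {
        /* The active length is set at the head of each iteration. */
        size_t vl = grant_vl(n - i);
        assert(vl <= n - i); /* the grant never exceeds the request */

        /* Models one vector add over the vl active lanes. */
        for (size_t lane = 0; lane < vl; ++lane)
            c[i + lane] = a[i + lane] + b[i + lane];

        i += vl;
        ++iterations;
    }
    return iterations;
}
```

With 40 elements and a limit of 17, the model runs three iterations of 17, 17 and 6 lanes, so the tail needs no separate scalar loop.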


So, this is quite different in detail to ARM's SVE but it should be able to
use the same type system. The main differences are probably that they seem
to intend to be able to pass vector types from one function to another --
but their vector length is fixed for any given processor (or process?).
RISC-V loops may need to query the active vector length at the end of each
loop iteration. That's a different instruction that needs to be emitted,
but has no effect on the type system.

From the point of view of the type system, I think RISC-V is a subset of
SVE, as there is no need to pass vectors between functions and no effect on
the ABI.

David A. Greene via llvm-dev

2018-Jul-31 02:53 UTC

[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

Renato Golin <renato.golin at linaro.org> writes:
> On Mon, 30 Jul 2018 at 20:57, David A. Greene via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> I'm not sure exactly how the SVE proposal would address this kind of
>> operation.
>
> SVE uses predication. The physical number of lanes doesn't have to
> change to have the same effect (alignment, tails).
Right.  My wording was poor.  The current proposal doesn't directly
support a more dynamic vscale target but I believe it could be simply
extended to do so.
>> I think it would be unlikely for anyone to need to change the vector
>> length during evaluation of an in-register expression.
>
> The worry here is not within each instruction but across instructions.
> SVE (and I think RISC-V) allow register size to be dynamically set.
I wasn't talking about within an instruction but rather across
instructions in the same expression tree.  Something like this would be
weird:

A = load with VL
B = load with VL
C = A + B           # VL implicit
VL = <something>
D = ~C              # VL implicit
store D

Here and beyond, read "VL" as "vscale with minimum element count 1."

The points where VL would be changed are limited and I think would
require limited, straightforward additions on top of this proposal.
> For example, on the same machine, it may be 256 for one process and
> 512 for another (for example, to save power).
Sure.
> But the change is via a system register, so in theory, anyone can
> write an inline asm in the beginning of a function and change the
> vector length to whatever they want.
>
> Worse still, people can do that inside loops, or in a tail loop,
> thinking it's a good idea (or this is a Cray machine :).
>
> AFAIK, the interface for changing the register length will not be
> exposed programmatically, so in theory, we should not worry about it.
> Any inline asm hack can be considered out of scope / user error.
That's right.  This proposal doesn't expose a way to change vscale, but
I don't think it precludes a later addition to do so.
> However, Hal's concern seems to be that, in the event of anyone
> planning to add it to their APIs, we need to make sure the proposed
> semantics can cope with it (do we need to update the predicates again?
> what will vscale mean, then and when?).
I don't see why predicate values would be affected at all.  If a machine
with variable vector length has predicates, then typically the resulting
operation would operate on the bitwise AND of the predicate and a
conceptual all 1's predicate of length VL.
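
As a minimal sketch of that AND, assuming predicates are modelled as bit masks with lane 0 in bit 0 and at most 64 lanes (both are assumptions of this example, not anything the proposal specifies), and using the hypothetical helper names `vl_mask` and `effective_predicate`:

```c
#include <assert.h>
#include <stdint.h>

/* Conceptual all-ones predicate covering the first vl lanes.
 * Assumes at most 64 lanes so the mask fits in one word. */
uint64_t vl_mask(unsigned vl)
{
    assert(vl <= 64);
    /* Shifting a 64-bit value by 64 is undefined in C, so handle it. */
    return vl == 64 ? UINT64_MAX : ((UINT64_C(1) << vl) - 1);
}

/* Lanes that actually execute: active in the predicate AND below VL. */
uint64_t effective_predicate(uint64_t pred, unsigned vl)
{
    return pred & vl_mask(vl);
}
```

Shrinking VL from 8 to 4 with a predicate of 0xFF leaves 0x0F active. The predicate value itself is untouched, which is the point being made above.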

As I understand it, vscale is the runtime multiple of some minimal,
guaranteed vector length.  For SVE that minimum is whatever gives a bit
width of 128.  My guess is that for a machine with a more dynamic vector
length, the minimum would be 1.  vscale would then be the vector length
and would change accordingly if the vector length is changed.
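
To make the arithmetic concrete, here is a small sketch under the reading given above: the element count of a scalable vector is vscale times its minimum element count, and for SVE vscale is the register width divided by 128 bits. The function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Runtime element count of a scalable vector type. */
uint64_t element_count(uint64_t vscale, uint64_t min_elems)
{
    return vscale * min_elems;
}

/* SVE-style vscale: the register width as a multiple of 128 bits. */
uint64_t sve_vscale(uint64_t reg_bits)
{
    assert(reg_bits >= 128 && reg_bits % 128 == 0);
    return reg_bits / 128;
}
```

A scalable vector with a minimum of four 32-bit elements therefore holds 16 elements on a 512-bit SVE implementation, while on a machine whose minimum is 1 the element count is simply vscale, that is, the vector length.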

Changing vscale would be no different than changing any other value in
the program.  The dataflow determines its possible values at various
program points.  vscale is an extra (implicit) operand to all vector
operations with scalable type.
> If not, we may have to enforce that this will not come to pass in its
> current form.
Why?  If a user does asm or some other such trick to change what vscale
means, that's on the user.  If a machine has a VL that changes
iteration-to-iteration, typically the compiler would be responsible for
controlling it.

If the vendor provides some target intrinsics to let the user write
low-level vector code that changes vscale in a high-level language, then
the vendor would be responsible for adding the necessary bits to the
frontend and LLVM.  I would not recommend a vendor try to do this.  :)
It wouldn't necessarily be hard to do, but it would be wasted work IMO
because it would be better to improve the vectorizer that already
exists.
> In this case, changing it later will require *a lot* more effort than
> doing it now.
I don't see why.  Anyone adding ability to change vscale would need to
add intrinsics and specify their semantics.  That shouldn't change
anything about this proposal and any such additions shouldn't be
hampered by this proposal.

Another way to think of vscale/vector length is as a different kind of
predicate.  Right now LLVM uses select to track predicate application.
It uses a "top-down" approach in that the root of an expression tree (a
select) applies the predicate and presumably everything under it
operates under that predicate.  It also uses intrinsics for certain
operations (loads, stores, etc.) that absolutely must be predicated no
matter what for safety reasons.  So it's sort of a hybrid approach, with
predicate application at the root, certain leaves and maybe even on
interior nodes (FP operations come to mind).

To my knowledge, there's nothing in LLVM that checks to make sure these
predicate applications are all consistent with one another.  Someone
could do a load with predicate 0011 and then a "select div" with
predicate 1111, likely resulting in a runtime fault but nothing in LLVM
would assert on the predicate mismatch.
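
The kind of consistency check that is missing could look roughly like the following. This is a hypothetical sketch and not something LLVM provides: it models four-lane predicates as bit masks (bit i is lane i) and reports the lanes a consumer treats as active even though the producer never wrote them.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 4 }; /* illustrative lane count for this example */

/* Lanes active in the consumer's predicate but not in the producer's.
 * A non-zero result means the consumer reads lanes holding no data. */
uint8_t unprotected_lanes(uint8_t producer_pred, uint8_t consumer_pred)
{
    const uint8_t all = (uint8_t)((1u << LANES) - 1);
    /* Both predicates must fit in the modelled lane count. */
    assert((producer_pred & ~all) == 0 && (consumer_pred & ~all) == 0);
    return (uint8_t)(consumer_pred & ~producer_pred & all);
}
```

For the load with predicate 0011 followed by a divide under 1111, the result is 1100: the two upper lanes are divided without ever having been loaded.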

Predicates could also be applied only at the leaves and propagated up
the tree.  IIRC, Dan Gohman proposed something like this years back when
the topic of predication came up.  He called it "applymask" but
unfortunately the Google is failing to find it.  

I *could* imagine using select to also convey application of vector
length but that seems odd and unnecessarily complex.

If vector length were applied at the leaves, it would take a bit of work
to get it through instruction selection.  Target opcodes would be one
way to do it.  I think it would be straightforward to walk the DAG and
change generic opcodes to target opcodes when necessary.

I don't think we should worry about taking IR with dynamic changes to VL
and trying to generate good code for any random target from it.  Such IR
is very clearly tied to a specific kind of target and we shouldn't
bother pretending otherwise.  The vectorizer should be aware of the
target's capabilities and generate code accordingly.

                        -David

Renato Golin via llvm-dev

2018-Jul-31 11:13 UTC

[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

On Tue, 31 Jul 2018 at 03:53, David A. Greene <dag at cray.com> wrote:
> I wasn't talking about within an instruction but rather across
> instructions in the same expression tree.  Something like this would be
> weird:
Yes, that's what I was referring to as "not in the API" and therefore
"user error".

> The points where VL would be changed are limited and I think would
> require limited, straightforward additions on top of this proposal.
Indeed. I have a limited view on the spec and even more so on hardware
implementations, but it is my understanding that there is no attempt
to change VL mid-loop.

If we can assume VL will be "the same" (not constant) throughout every
self-contained sub-graph (from scalar|memory->vector to
vector->scalar|memory), then we should encode it in the IR spec that
this is a hard requirement.

This seems consistent with your explanation of the Cray VL change as
well as Bruce's description of RISC-V (both seem very similar to me),
where VL can change between two loop iterations but not within the
same iteration.

We will still have to be careful with access safety (alias, loop
dependencies, etc), but that shouldn't be different than if VL was
required to be constant throughout the program.

> That's right.  This proposal doesn't expose a way to change vscale, but
> I don't think it precludes a later addition to do so.
That was my point about this change being harder to do later than now.

I think no one wants to do that now, so we're all happy to pay the
price later, because that will likely never come.

> I don't see why predicate values would be affected at all.  If a machine
> with variable vector length has predicates, then typically the resulting
> operation would operate on the bitwise AND of the predicate and a
> conceptual all 1's predicate of length VL.
I think the problem is that SVE is fully predicated and Cray (RISC-V?)
is not, so mixing the two could lead into weird predication
situations.

So, if a high level optimisation pass assumes full predication and
changes the loop accordingly, and another pass assumes no predication
and adds VL changes (say, loop tails), then we may end up with
incompatible IR that will be hard to select down in ISel.

Given that SVE has both predication and vscale change, this could
happen in practice. It wouldn't be necessarily wrong, but it would
have to be a conscious decision.

> Changing vscale would be no different than changing any other value in
> the program.  The dataflow determines its possible values at various
> program points.  vscale is an extra (implicit) operand to all vector
> operations with scalable type.
It is, but IIGIR, changing vscale and predicating are similar
transformations to achieve similar goals, but will not be
represented the same way in IR.

Also, they're not always interchangeable, so that complicates the IR
matching in ISel as well as potential matching in optimisation passes.

> Why?  If a user does asm or some other such trick to change what vscale
> means, that's on the user.  If a machine has a VL that changes
> iteration-to-iteration, typically the compiler would be responsible for
> controlling it.
Not asm, sorry. Inline asm is "user error".

I meant: make sure adding an IR visible change in VL (say, an
intrinsic or instruction), within a self-contained block, becomes an
IR error.

> If the vendor provides some target intrinsics to let the user write
> low-level vector code that changes vscale in a high-level language, then
> the vendor would be responsible for adding the necessary bits to the
> frontend and LLVM.  I would not recommend a vendor try to do this.  :)
Not recommending by making it an explicit error. :)

It may sound harsh, but given we're taking some pretty liberal design
choices right now, which could have long lasting impact on the
stability and quality of LLVM's code generation, I'd say we need to be
as conservative as possible.

> I don't see why.  Anyone adding ability to change vscale would need to
> add intrinsics and specify their semantics.  That shouldn't change
> anything about this proposal and any such additions shouldn't be
> hampered by this proposal.
I don't think it would be hard to do, but it could have consequences
to the rest of the optimisation and code generation pipeline.

I do not claim to have a clear vision on any of this, but as I said
above, it will pay off long term if we start conservative.

> I don't think we should worry about taking IR with dynamic changes to VL
> and trying to generate good code for any random target from it.  Such IR
> is very clearly tied to a specific kind of target and we shouldn't
> bother pretending otherwise.
We're preaching for the same goals. :)

But we're trying to represent slightly different techniques
(predication, vscale change) which need to be tied down to only
exactly what they do.

Being conservative and explicit on the semantics is, IMHO, the easiest
path to get it right. We can surely expand later.

-- 
cheers,
--renato
