thr3ads.net - llvm dev - [llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths [Jul 2018]

Renato Golin via llvm-dev

2018-Jul-31 18:21 UTC

[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

Hi David,

Let me put the last two comments up:
> > But we're trying to represent slightly different techniques
> > (predication, vscale change) which need to be tied down to only
> > exactly what they do.
>
> Wouldn't intrinsics to change vscale do exactly that?
You're right. I've been using the same overloaded term and this is
probably what caused the confusion.

In some cases, predicating and shortening the vectors are semantically
equivalent. In this case, the IR should also be equivalent.
Instructions/intrinsics that handle predication could be used by the
backend to simply change VL instead, as long as it's guaranteed that
the semantics are identical. There are no problems here.
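
A minimal sketch of that equivalence, in Python rather than IR. The function names are hypothetical and lanes are modelled as list elements; the only claim is that a predicate whose first `vl` lanes are active gives the same result as running the operation at a shorter vector length.

```python
def add_predicated(a, b, out, mask):
    """Full-width add; only lanes whose predicate bit is set write a result."""
    for i in range(len(a)):
        if mask[i]:
            out[i] = a[i] + b[i]
    return out

def add_with_vl(a, b, out, vl):
    """Add that operates on the first `vl` lanes only."""
    for i in range(vl):
        out[i] = a[i] + b[i]
    return out

def prefix_mask(vl, width):
    """Predicate with the first `vl` of `width` lanes active."""
    return [i < vl for i in range(width)]

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
vl = 5

via_mask = add_predicated(a, b, [0] * 8, prefix_mask(vl, 8))
via_vl = add_with_vl(a, b, [0] * 8, vl)
assert via_mask == via_vl
```

The equivalence only holds for prefix-shaped predicates. A mask such as `[True, False, True, ...]` has no VL counterpart, which is one reason to list the equivalent cases explicitly.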

In other cases, for example widening or splitting the vector, or cases
we haven't thought of yet, the semantics are not the same, and having
them in IR would be bad. I think we're all in agreement on that.

All I'm asking is that we make a list of what we want to happen and
disallow everything else explicitly, until someone comes with a strong
case for it. Makes sense?

> I'm all for being explicit.  I think we're basically on the same page,
> though there are a few things noted above where I need a little more
> clarity.
Yup, I think we are. :)


> What does "mid-loop" mean?  On traditional vector architectures it was
> very common to change VL for the last loop iteration.  Otherwise you had
> to have a remainder loop.  It was much better to change VL.
You got it below...

> Ok, I think I am starting to grasp what you are saying.  If a value
> flows from memory or some scalar computation to vector and then back to
> memory or scalar, VL should only ever be set at the start of the vector
> computation until it finishes and the value is deposited in memory or
> otherwise extracted.  I think this is ok, but note that any vector
> functions called may change VL for the duration of the call.  The change
> would not be visible to the caller.
If a function is called and changes the length, does it restore back on return?

> I am not so sure about that.  Power requirements may very well drive
> more dynamic vector lengths.  Even today some AVX 512 implementations
> falter if there are "too many" 512-bit operations.  Scaling back SIMD
> width statically is very common today and doing so dynamically seems
> like an obvious extension.  I don't know of any efforts to do this so
> it's all speculative at this point.  But the industry has done it in the
> past and we have a curious pattern of reinventing things we did before.
Right, so it's not as clear cut as I hoped. But we can start
implementing the basic idea and then expand as we go. I think trying
to hash out all potential scenarios now will drive us crazy.

> It seems strange to me for an optimizer to operate in such a way.  The
> optimizer should be fully aware of the target's capabilities and use
> them accordingly.
Mid-end optimisers tend to be fairly agnostic. And when not, they
usually ask "is this supported" instead of "which one is better".

> ARM seems to have no difficulty selecting instructions for it.  Changing
> the value of vscale shouldn't impact ISel at all.  The same instructions
> are selected.
I may very well be getting lost in too many floating future ideas, atm. :)

> > It is, but IIGIR, changing vscale and predicating are similar
> > transformations to achieve the similar goals, but will not be
> > represented the same way in IR.
>
> They probably will not be represented the same way, though I think they
> could be (but probably shouldn't be).
Maybe in the simple cases (like last iteration) they should be?

> Ok, but would the optimizer be prevented from introducing VL changes?
In the case where they're represented in similar ways in IR, it
wouldn't need to.

Otherwise, we'd have to teach the two methods to IR optimisers that
are virtually identical in semantics. It'd be left for the back end to
implement the last iteration notation as a predicate fill or a vscale
change.

> Being conservative is fine, but we should have a clear understanding of
> exactly what that means.  I would not want to prohibit all VL changes
> now and forever, because I see that as unnecessarily restrictive and
> possibly damaging to supporting future architectures.
>
> If we don't want to provide intrinsics for changing VL right now, I'm
> all in favor.  There would be no reason to add error checks because
> there would be no way within the IR to change VL.
Right, I think we're converging.

How about we don't forbid changes in vscale, but we find a common
notation for all the cases where predicating and changing vscale would
be semantically identical, and implement those in the same way.

Later on, if there are additional cases where changes in vscale would
be beneficial, we can discuss them independently.

Makes sense?

-- 
cheers,
--renato

David A. Greene via llvm-dev

2018-Jul-31 19:10 UTC

Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> writes:
> Hi David,
>
> Let me put the last two comments up:
>
>> > But we're trying to represent slightly different techniques
>> > (predication, vscale change) which need to be tied down to only
>> > exactly what they do.
>>
>> Wouldn't intrinsics to change vscale do exactly that?
>
> You're right. I've been using the same overloaded term and this is
> probably what caused the confusion.
Me too.  Thanks Robin for clarifying this for all of us!  I'll try to
follow this terminology:

VL/active vector length - The software notion of how many elements to
                          operate on; a special case of predication

vscale - The hardware notion of how big a vector register is

TL;DR - Changing VL in a function doesn't affect anything about this
        proposal, but changing vscale might.  Changing VL shouldn't
        impact things like ISel at all but changing vscale might.
        Changing vscale is (much) more difficult than changing VL.
> In some cases, predicating and shortening the vectors are semantically
> equivalent. In this case, the IR should also be equivalent.
> Instructions/intrinsics that handle predication could be used by the
> backend to simply change VL instead, as long as it's guaranteed that
> the semantics are identical. There are no problems here.
Right.  Changing VL is no problem.  I think even reducing vscale is ok
from an IR perspective, if a little strange.
> In other cases, for example widening or splitting the vector, or cases
> we haven't thought of yet, the semantics are not the same, and having
> them in IR would be bad. I think we're all in agreement on that.
You mean going from a shorter active vector length to a longer active
vector length?  Or smaller vscale to larger vscale?  The latter would be
bad.  The former seems ok if the dataflow is captured and the vectorizer
generates correct code to account for it.  Presumably it would if it is
the thing changing the active vector length.
> All I'm asking is that we make a list of what we want to happen and
> disallow everything else explicitly, until someone comes with a strong
> case for it. Makes sense?
Yes.
>> Ok, I think I am starting to grasp what you are saying.  If a value
>> flows from memory or some scalar computation to vector and then back to
>> memory or scalar, VL should only ever be set at the start of the vector
>> computation until it finishes and the value is deposited in memory or
>> otherwise extracted.  I think this is ok, but note that any vector
>> functions called may change VL for the duration of the call.  The change
>> would not be visible to the caller.
>
> If a function is called and changes the length, does it restore back on return?
If a function changes VL, it would typically restore it before return.
This would be an ABI guarantee just like any other callee-save register.
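
A rough sketch of that callee-save behaviour, assuming a hypothetical `VectorUnit` object that holds the active vector length. None of these names come from a real ABI; the point is only that the callee's change is invisible once it returns.

```python
from contextlib import contextmanager

class VectorUnit:
    """Hypothetical model of a vector unit with an active vector length."""
    def __init__(self, maxvl):
        self.maxvl = maxvl
        self.vl = maxvl

@contextmanager
def saved_vl(unit, new_vl):
    """Set VL for the duration of a block and restore the old value on exit."""
    old = unit.vl
    unit.vl = min(new_vl, unit.maxvl)
    try:
        yield unit
    finally:
        unit.vl = old  # restore, like any other callee-saved register

def callee(unit, data):
    # The callee shortens VL for its own work; the caller never sees it.
    with saved_vl(unit, len(data)):
        return [x * 2 for x in data[:unit.vl]]

unit = VectorUnit(maxvl=8)
unit.vl = 6                      # caller's VL before the call
result = callee(unit, [1, 2, 3])
assert unit.vl == 6              # unchanged after the call
```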

If a function changes vscale, I don't know.  The RISC-V people seem to
have thought the most about this.  I have no point of reference here.
> Right, so it's not as clear cut as I hoped. But we can start
> implementing the basic idea and then expand as we go. I think trying
> to hash out all potential scenarios now will drive us crazy.
Sure.
>> It seems strange to me for an optimizer to operate in such a way.  The
>> optimizer should be fully aware of the target's capabilities and use
>> them accordingly.
>
> Mid-end optimisers tend to be fairly agnostic. And when not, they
> usually ask "is this supported" instead of "which one is better".
Yes, the "is this supported" question is common.  Isn't the whole point
of VPlan to get the "which one is better" question answered for
vectorization?  That would be necessarily tied to the target.  The
questions asked can be agnostic, like the target-agnostic bits of
codegen use, but the answers would be target-specific.
>> ARM seems to have no difficulty selecting instructions for it.  Changing
>> the value of vscale shouldn't impact ISel at all.  The same instructions
>> are selected.
>
> I may very well be getting lost in too many floating future ideas, atm. :)
Given our clearer terminology, my statement above is maybe not correct.
Changing vscale *would* impact the IR and codegen (stack allocation,
etc.).  Changing VL would not, other than adding some Instructions to
capture the semantics.  I suspect neither would change ISel (I know VL
would not) but as you say I don't think we need concern ourselves with
changing vscale right now, unless others have a dire need to support it.
>> > It is, but IIGIR, changing vscale and predicating are similar
>> > transformations to achieve the similar goals, but will not be
>> > represented the same way in IR.
>>
>> They probably will not be represented the same way, though I think they
>> could be (but probably shouldn't be).
>
> Maybe in the simple cases (like last iteration) they should be?
Perhaps changing VL could be modeled the same way but I have a feeling
it will be awkward.  Changing vscale is something totally different and
likely should be represented differently if allowed at all.
>> Ok, but would the optimizer be prevented from introducing VL changes?
>
> In the case where they're represented in similar ways in IR, it
> wouldn't need to.
It would have to generate IR code to effect the software change in VL
somehow, by altering predicates or by using special intrinsics or some
other way.
> Otherwise, we'd have to teach the two methods to IR optimisers that
> are virtually identical in semantics. It'd be left for the back end to
> implement the last iteration notation as a predicate fill or a vscale
> change.
I suspect that is too late.  The vectorizer needs to account for the
choice and pick the most profitable course.  That's one of the reasons I
think modeling VL changes like predicates is maybe unnecessarily
complex.  If VL is modeled as "just another predicate" then there's no
guarantee that ISel will honor the choices the vectorizer made to use VL
over predication.  If it's modeled explicitly, ISel should have an
easier time generating the code the vectorizer expects.

VL changes aren't always on the last iteration.  The Cray X1 had an
instruction (I would have to dust off old manuals to remember the
mnemonic) with somewhat strange semantics to get the desired VL for an
iteration.  Code would look something like this:

loop top:
  vl = getvl N      #  N contains the number of iterations left
  <do computation>
  N = N - vl
  branch N > 0, loop top

The "getvl" instruction would usually return the full hardware vector
register length (MAXVL), except on the 2nd-to-last iteration if N was
larger than MAXVL but less than 2*MAXVL it would return something like
<N % 2 == 0 ? N/2 : N/2 + 1>, so in the range (0, MAXVL).  The last
iteration would then run at the same VL or one less depending on whether
N was odd or even.  So the last two iterations would often run at less
than MAXVL and often at different VLs from each other.

And no, I don't know why the hardware operated this way.  :)
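
The loop above can be modelled in a few lines of Python. This is a sketch of the behaviour as described from memory, not of the real Cray X1 instruction: `getvl` and `strip_mine` are hypothetical names, and the rounding rule is the "something like" formula given above.

```python
def getvl(n, maxvl):
    """Hypothetical model of the getvl behaviour described above."""
    if n <= maxvl:
        return n                 # final iteration: whatever is left
    if n < 2 * maxvl:
        return (n + 1) // 2      # second-to-last iteration: N/2 rounded up
    return maxvl                 # otherwise the full register length

def strip_mine(n, maxvl):
    """Return the VL used on each iteration of the loop."""
    vls = []
    while n > 0:
        vl = getvl(n, maxvl)     # vl = getvl N
        vls.append(vl)           # <do computation>
        n -= vl                  # N = N - vl
    return vls

print(strip_mine(10, 4))  # [4, 3, 3]
print(strip_mine(11, 4))  # [4, 4, 3]
```

With 10 elements and MAXVL of 4, the last two iterations split the remaining 6 elements evenly instead of running 4 then 2. With 11 elements the remaining 7 are split as 4 and 3, so the two final VLs differ by one.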
>> Being conservative is fine, but we should have a clear understanding of
>> exactly what that means.  I would not want to prohibit all VL changes
>> now and forever, because I see that as unnecessarily restrictive and
>> possibly damaging to supporting future architectures.
>>
>> If we don't want to provide intrinsics for changing VL right now, I'm
>> all in favor.  There would be no reason to add error checks because
>> there would be no way within the IR to change VL.
>
> Right, I think we're converging.
Agreed.
> How about we don't forbid changes in vscale, but we find a common
> notation for all the cases where predicating and changing vscale would
> be semantically identical, and implement those in the same way.
>
> Later on, if there are additional cases where changes in vscale would
> be beneficial, we can discuss them independently.
>
> Makes sense?
Again trying to use the VL/vscale terminology:

Changing vscale - no IR support currently and less likely in the future
Changing VL     - no IR support currently but more likely in the future

The second seems like a straightforward extension to me.  There will be
some questions about how to represent VL semantics in IR but those don't
impact the proposal under discussion at all.

The first seems much harder, at least within a function.  It may or may
not impact the proposal under discussion.  It sounds like the RISC-V
people have some use cases so those should probably be the focal point
of this discussion.

                           -David

Renato Golin via llvm-dev

2018-Jul-31 19:36 UTC

On Tue, 31 Jul 2018 at 20:10, David A. Greene <dag at cray.com> wrote:
> Me too.  Thanks Robin for clarifying this for all of us!  I'll try to
> follow this terminology:
+1

> TL;DR - Changing VL in a function doesn't affect anything about this
>         proposal, but changing vscale might.  Changing VL shouldn't
>         impact things like ISel at all but changing vscale might.
>         Changing vscale is (much) more difficult than changing VL.
Absolutely agreed. :)

> Right.  Changing VL is no problem.  I think even reducing vscale is ok
> from an IR perspective, if a little strange.
Yup.

> You mean going from a shorter active vector length to a longer active
> vector length?  Or smaller vscale to larger vscale?  The latter would be
> bad.
The latter. Bad indeed.

> If a function changes vscale, I don't know.  The RISC-V people seem to
> have thought the most about this.  I have no point of reference here.
I think the consensus is that this would be bad. So we should maybe
encode it as an error.

> Yes, the "is this supported" question is common.  Isn't the whole point
> of VPlan to get the "which one is better" question answered for
> vectorization?
Yes, but the cost is high. We can have that in the vectoriser, as it's
a heavy pass and we're conscious, but we shouldn't make all other
passes "that smart".

> Changing vscale *would* impact the IR and codegen (stack allocation,
> etc.).  Changing VL would not, other than adding some Instructions to
> capture the semantics.  I suspect neither would change ISel (I know VL
> would not) but as you say I don't think we need concern ourselves with
> changing vscale right now, unless others have a dire need to support it.
Perfect! :)

> Perhaps changing VL could be modeled the same way but I have a feeling
> it will be awkward.  Changing vscale is something totally different and
> likely should be represented differently if allowed at all.
Right, I was talking about vscale.

It would be awkward, but if this is the only thing the hardware
supports (i.e. no predication), then it's up to the back-end to lower
how it sees fit.

In IR, we still see it as predication.

> Again trying to use the VL/vscale terminology:
>
> Changing vscale - no IR support currently and less likely in the future
> Changing VL     - no IR support currently but more likely in the future
SGTM.

> The second seems like a straightforward extension to me.  There will be
> some questions about how to represent VL semantics in IR but those don't
> impact the proposal under discussion at all.
Should be equivalent to predication, I imagine.

> The first seems much harder, at least within a function.
And it would require exposing the instruction to change it in IR.

>  It may or may not impact the proposal under discussion.
As per Robin's email, it doesn't. Functions are vscale boundaries in
their current proposal.

-- 
cheers,
--renato

Robin Kruppe via llvm-dev

2018-Jul-31 20:17 UTC

On 31 July 2018 at 21:10, David A. Greene via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
>> Hi David,
>>
>> Let me put the last two comments up:
>>
>>> > But we're trying to represent slightly different techniques
>>> > (predication, vscale change) which need to be tied down to only
>>> > exactly what they do.
>>>
>>> Wouldn't intrinsics to change vscale do exactly that?
>>
>> You're right. I've been using the same overloaded term and this is
>> probably what caused the confusion.
>
> Me too.  Thanks Robin for clarifying this for all of us!  I'll try to
> follow this terminology:
>
> VL/active vector length - The software notion of how many elements to
>                           operate on; a special case of predication
>
> vscale - The hardware notion of how big a vector register is
>
> TL;DR - Changing VL in a function doesn't affect anything about this
>         proposal, but changing vscale might.  Changing VL shouldn't
>         impact things like ISel at all but changing vscale might.
>         Changing vscale is (much) more difficult than changing VL.
Great, seems like we're all in violent agreement that VL changes are a
non-issue for the discussion at hand.
>> In some cases, predicating and shortening the vectors are semantically
>> equivalent. In this case, the IR should also be equivalent.
>> Instructions/intrinsics that handle predication could be used by the
>> backend to simply change VL instead, as long as it's guaranteed that
>> the semantics are identical. There are no problems here.
>
> Right.  Changing VL is no problem.  I think even reducing vscale is ok
> from an IR perspective, if a little strange.
>
>> In other cases, for example widening or splitting the vector, or cases
>> we haven't thought of yet, the semantics are not the same, and having
>> them in IR would be bad. I think we're all in agreement on that.
>
> You mean going from a shorter active vector length to a longer active
> vector length?  Or smaller vscale to larger vscale?  The latter would be
> bad.  The former seems ok if the dataflow is captured and the vectorizer
> generates correct code to account for it.  Presumably it would if it is
> the thing changing the active vector length.
>
>> All I'm asking is that we make a list of what we want to happen and
>> disallow everything else explicitly, until someone comes with a strong
>> case for it. Makes sense?
>
> Yes.
>
>>> Ok, I think I am starting to grasp what you are saying.  If a value
>>> flows from memory or some scalar computation to vector and then back to
>>> memory or scalar, VL should only ever be set at the start of the vector
>>> computation until it finishes and the value is deposited in memory or
>>> otherwise extracted.  I think this is ok, but note that any vector
>>> functions called may change VL for the duration of the call.  The change
>>> would not be visible to the caller.
>>
>> If a function is called and changes the length, does it restore back on return?
>
> If a function changes VL, it would typically restore it before return.
> This would be an ABI guarantee just like any other callee-save register.
>
> If a function changes vscale, I don't know.  The RISC-V people seem to
> have thought the most about this.  I have no point of reference here.
>
>> Right, so it's not as clear cut as I hoped. But we can start
>> implementing the basic idea and then expand as we go. I think trying
>> to hash out all potential scenarios now will drive us crazy.
>
> Sure.
>
>>> It seems strange to me for an optimizer to operate in such a way.  The
>>> optimizer should be fully aware of the target's capabilities and use
>>> them accordingly.
>>
>> Mid-end optimisers tend to be fairly agnostic. And when not, they
>> usually ask "is this supported" instead of "which one is better".
>
> Yes, the "is this supported" question is common.  Isn't the whole point
> of VPlan to get the "which one is better" question answered for
> vectorization?  That would be necessarily tied to the target.  The
> questions asked can be agnostic, like the target-agnostic bits of
> codegen use, but the answers would be target-specific.
Just like the old loop vectorizer, VPlan will need a cost model that
is based on properties of the target, exposed to the optimizer in the
form of e.g. TargetLowering hooks. But we should try really hard to
avoid having a hard distinction between e.g. predication- and VL-based
loops in the VPlan representation. Duplicating or triplicating
vectorization logic would be really bad, and there are a lot of
similarities that we can exploit to avoid that. For a simple example,
SVE and RVV both want the same basic loop skeleton: strip-mining with
predication of the loop body derived from the induction variable.
Hopefully we can have a 99% unified VPlan pipeline and most
differences can be delegated to the final VPlan->IR step and the
respective backends.
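
As a rough illustration of that shared skeleton, here is a strip-mined loop in Python whose predicate is derived from the induction variable. The helper is named after SVE's WHILELT instruction, but the code is only a lane-by-lane model with hypothetical function names, not how either backend would emit it.

```python
def whilelt(i, n, width):
    """Predicate with lane k active while i + k < n."""
    return [i + k < n for k in range(width)]

def saxpy_strip_mined(alpha, x, y, width):
    """y = alpha * x + y, processed `width` lanes at a time."""
    n = len(x)
    i = 0
    while i < n:
        pred = whilelt(i, n, width)      # predicate from the induction variable
        for k in range(width):           # one predicated vector operation
            if pred[k]:
                y[i + k] = alpha * x[i + k] + y[i + k]
        i += width
    return y

y = saxpy_strip_mined(2, [1, 2, 3, 4, 5], [10, 10, 10, 10, 10], 4)
assert y == [12, 14, 16, 18, 20]
```

A VL-based target could keep the same loop and replace the predicate with a count of active lanes, `min(n - i, width)`, which is the kind of difference that could be left to the final lowering step.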

+ Diego, Florian and others that have been discussing this previously
>>> ARM seems to have no difficulty selecting instructions for it.  Changing
>>> the value of vscale shouldn't impact ISel at all.  The same instructions
>>> are selected.
>>
>> I may very well be getting lost in too many floating future ideas, atm. :)
>
> Given our clearer terminology, my statement above is maybe not correct.
> Changing vscale *would* impact the IR and codegen (stack allocation,
> etc.).  Changing VL would not, other than adding some Instructions to
> capture the semantics.  I suspect neither would change ISel (I know VL
> would not) but as you say I don't think we need concern ourselves with
> changing vscale right now, unless others have a dire need to support it.
>
>>> > It is, but IIGIR, changing vscale and predicating are similar
>>> > transformations to achieve the similar goals, but will not be
>>> > represented the same way in IR.
>>>
>>> They probably will not be represented the same way, though I think they
>>> could be (but probably shouldn't be).
>>
>> Maybe in the simple cases (like last iteration) they should be?
>
> Perhaps changing VL could be modeled the same way but I have a feeling
> it will be awkward.  Changing vscale is something totally different and
> likely should be represented differently if allowed at all.
>
>>> Ok, but would the optimizer be prevented from introducing VL changes?
>>
>> In the case where they're represented in similar ways in IR, it
>> wouldn't need to.
>
> It would have to generate IR code to effect the software change in VL
> somehow, by altering predicates or by using special intrinsics or some
> other way.
>
>> Otherwise, we'd have to teach the two methods to IR optimisers that
>> are virtually identical in semantics. It'd be left for the back end to
>> implement the last iteration notation as a predicate fill or a vscale
>> change.
>
> I suspect that is too late.  The vectorizer needs to account for the
> choice and pick the most profitable course.  That's one of the reasons I
> think modeling VL changes like predicates is maybe unnecessarily
> complex.  If VL is modeled as "just another predicate" then there's no
> guarantee that ISel will honor the choices the vectorizer made to use VL
> over predication.  If it's modeled explicitly, ISel should have an
> easier time generating the code the vectorizer expects.
>
> VL changes aren't always on the last iteration.  The Cray X1 had an
> instruction (I would have to dust off old manuals to remember the
> mnemonic) with somewhat strange semantics to get the desired VL for an
> iteration.  Code would look something like this:
>
> loop top:
>   vl = getvl N      #  N contains the number of iterations left
>   <do computation>
>   N = N - vl
>   branch N > 0, loop top
>
> The "getvl" instruction would usually return the full hardware vector
> register length (MAXVL), except on the 2nd-to-last iteration if N was
> larger than MAXVL but less than 2*MAXVL it would return something like
> <N % 2 == 0 ? N/2 : N/2 + 1>, so in the range (0, MAXVL).  The last
> iteration would then run at the same VL or one less depending on whether
> N was odd or even.  So the last two iterations would often run at less
> than MAXVL and often at different VLs from each other.
FWIW this is exactly how the RISC-V vector unit works --
unsurprisingly, since it owes a lot to Cray-style processors :)
> And no, I don't know why the hardware operated this way.  :)
>
>>> Being conservative is fine, but we should have a clear understanding of
>>> exactly what that means.  I would not want to prohibit all VL changes
>>> now and forever, because I see that as unnecessarily restrictive and
>>> possibly damaging to supporting future architectures.
>>>
>>> If we don't want to provide intrinsics for changing VL right now, I'm
>>> all in favor.  There would be no reason to add error checks because
>>> there would be no way within the IR to change VL.
>>
>> Right, I think we're converging.
>
> Agreed.
+1, there is no need to deal with VL at all at this point. I would
even say there isn't even any concept of VL in IR at all at this time.

At some point in the future I will propose something in this space to
support RISC-V vectors, but we'll cross that bridge when we come to
it.
>> How about we don't forbid changes in vscale, but we find a common
>> notation for all the cases where predicating and changing vscale would
>> be semantically identical, and implement those in the same way.
>>
>> Later on, if there are additional cases where changes in vscale would
>> be beneficial, we can discuss them independently.
>>
>> Makes sense?
>
> Again trying to use the VL/vscale terminology:
>
> Changing vscale - no IR support currently and less likely in the future
> Changing VL     - no IR support currently but more likely in the future
>
> The second seems like a straightforward extension to me.  There will be
> some questions about how to represent VL semantics in IR but those don't
> impact the proposal under discussion at all.
>
> The first seems much harder, at least within a function.  It may or may
> not impact the proposal under discussion.  It sounds like the RISC-V
> people have some use cases so those should probably be the focal point
> of this discussion.
Yes, for RISC-V we definitely need vscale to vary a bit, but are fine
with limiting that to function boundaries. The use case is *not*
"changing how large vectors are" in the middle of a loop or something
like that, which we all agree is very dubious at best. The RISC-V
vector unit is just very configurable (number of registers, vector
element sizes, etc.) and this configuration can impact how large the
vector registers are. For any given vectorized loop nest we want to
configure the vector unit to suit that piece of code and run the loop
with whatever register size that configuration yields. And when that
loop is done, we stop using the vector unit entirely and disable it,
so that the next loop can use it differently, possibly with a
different register size. For IR modeling purposes, I propose to
enlarge "loop nest" to "function" but the same principle applies, it
just means all vectorized loops in the function will have to share a
configuration.
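
To make the trade-off concrete, here is a toy model in Python. It assumes, purely for illustration, that a fixed amount of vector storage is divided evenly among however many registers are enabled; the storage size and function names are made up and are not taken from the RISC-V vector specification.

```python
STORAGE_BITS = 4096  # assumed total vector storage; not a real RISC-V figure

def register_bits(num_regs):
    """Assumed model: enabling fewer registers leaves more bits for each one."""
    return STORAGE_BITS // num_regs

def function_config(loop_reg_needs):
    """All vectorized loops in a function share one configuration, so it
    must enable enough registers for the most demanding loop."""
    num_regs = max(loop_reg_needs)
    return num_regs, register_bits(num_regs)

# Two loops needing 4 and 16 vector registers respectively.
per_loop = [register_bits(n) for n in (4, 16)]  # [1024, 256]
shared = function_config([4, 16])               # (16, 256)
```

Under this model, per-loop configuration lets the first loop run with 1024-bit registers, while a per-function configuration gives both loops 256-bit registers. That is the cost of enlarging the boundary from loop nest to function, in exchange for a register size that is fixed throughout the function.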

Without getting too far into the details, does this make sense as a use case?


Cheers,
Robin
>                            -David
