thr3ads.net - llvm dev - [llvm-dev] Adding support for vscale [Oct 2019]


Luke Kenneth Casson Leighton via llvm-dev

2019-Sep-30 21:53 UTC

[llvm-dev] Adding support for vscale

On Tuesday, October 1, 2019, Jacob Lifshay <programmerjake at gmail.com>
wrote:
> On Mon, Sep 30, 2019 at 2:30 AM Sander De Smalen via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > I've posted two patches on Phabricator to add support for VScale
> > in LLVM.

Excellent!

> >
> > A brief recap on `vscale`:
> > The scalable vector type in LLVM IR is defined as `<vscale x n x m>`, to
> > create types such as `<vscale x 16 x i8>` for a scalable vector with at
> > least 16 bytes. In the definition of the scalable type, `vscale` is
> > specified as a positive constant of type integer that will only be known
> > at runtime but is guaranteed to be constant throughout the program.

Ah.  Right.  There is something known as data-dependent fail-on-first,
which does not match with the assertion that vscale will be constant.

Yes, any given vector would be vscale long, and it is good to be able to
declare such vectors at runtime: loops may be generated in assembler which
set VL (a Control and Status Register declaring the number of elements to be
processed in any given loop iteration).

However, for e.g. memcpy or strcpy, or anything else which is *not* fixed
length and where not even the program knows how long the vector will be,
there is data-dependent fail-on-first.
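
A minimal sketch of what data-dependent fail-on-first means for a
strlen-style loop, written as a plain Python model rather than real vector
code. The names `ffirst_load`, `vector_strlen` and `readable_end` are
hypothetical and chosen only for illustration; the model assumes that a fault
on the first element is a real fault, while a fault on any later element
merely shortens the vector length for that iteration.

```python
def ffirst_load(memory, start, requested_vl, readable_end):
    """Toy model of a fail-on-first load (illustrative, not a real ISA).

    Tries to load `requested_vl` elements from memory[start:]. Addresses at or
    beyond `readable_end` would fault. A fault on the first element is raised;
    a fault on a later element only truncates the returned length.
    Returns (elements, actual_vl).
    """
    if start >= readable_end:
        raise MemoryError("fault on the first element is a real fault")
    actual_vl = min(requested_vl, readable_end - start)
    return memory[start:start + actual_vl], actual_vl


def vector_strlen(memory, start, max_vl, readable_end):
    """strlen where the usable vector length is only known per iteration."""
    pos = start
    while True:
        chunk, vl = ffirst_load(memory, pos, max_vl, readable_end)
        for i, byte in enumerate(chunk):
            if byte == 0:              # terminator found inside this chunk
                return pos + i - start
        pos += vl                      # advance by what was actually loaded
```

The point of the model is that the length processed in each iteration depends
on the data and on where the readable memory ends, not on anything known when
the loop starts.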

A related thread goes through this; pay attention to Stephen's questions
and it becomes clear:
https://groups.google.com/forum/?nomobile=true#!topic/comp.arch/3z3PlCwdq8U

A link to the ARM SVE ffirst capability is also provided in that thread.
Yes, SVE has ffirst, although it is a SIMD variant rather than one that
affects VL.

> RISC-V RVV explicitly allows changing VL (which I am assuming is the
> same as vscale) at runtime, so VL wouldn't be a constant.

This would be good to clarify, Sander. On first reading it seems to me that
vscale is intended to be the actual full vector size, not related to VL.

Regardless, setting it even as a *runtime* constant seems to be a red flag.

What is vscale intended for, and how does it relate to Cray-like Vector
Length?

> Additionally, we (libre-riscv) are working on a similar scalar vectors
> ISA called SimpleV that also allows changing VL at runtime and we are
> planning on basing it on LLVM's scalable vector support.

Both SV and RVV are based on Cray VL which is a runtime global CSR setting
the number of elements to be processed in any given vector loop.

The difference is that RVV *requests* a VL and is arbitrarily *allocated*
an actual VL (less than or equal to the requested VL), whereas in SV you get
exactly what is requested, and if overallocated an illegal instruction is
raised.
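
The difference between the two VL-setting behaviours can be sketched in a
few lines of Python. This is a toy model under stated assumptions, not either
ISA's definition: `rvv_set_vl`, `sv_set_vl` and `IllegalInstruction` are
hypothetical names, `min()` stands in for whatever value real RVV hardware
chooses, and the SV bound is taken to be MVL, as clarified later in this
thread.

```python
class IllegalInstruction(Exception):
    """Stands in for the illegal-instruction trap in this toy model."""


def rvv_set_vl(requested, vlmax):
    """RVV-style: the program requests a VL, the hardware grants one that is
    less than or equal to the request. min() is a simplification."""
    return min(requested, vlmax)


def sv_set_vl(requested, mvl):
    """SV-style: VL is set to exactly the requested value, or the
    instruction traps when the request exceeds the bound (assumed: MVL)."""
    if requested > mvl:
        raise IllegalInstruction(f"VL {requested} exceeds MVL {mvl}")
    return requested
```

With a limit of 16, a request for 100 elements yields 16 in the RVV-style
model and a trap in the SV-style model; a request for 5 yields 5 in both.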

>
> >
> > [1] https://reviews.llvm.org/D68202
> > [2] https://reviews.llvm.org/D68203
>
> Jacob Lifshay
>

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

Robin Kruppe via llvm-dev

2019-Oct-01 07:08 UTC


Hello Jacob and Luke,

First off, even if a dynamically changing vscale was truly necessary for
RVV or SV, this thread would be far too late to raise the question. That
vscale is constant -- that the number of elements in a scalable vector does
not change during program execution -- is baked into the accepted scalable
vector type proposal from top to bottom and in fact was one of the
conditions for its acceptance (runtime-variable type sizes create many more
headaches which nobody has worked out how to solve to a satisfactory degree
in the context of LLVM). *This* thread is just about whether vscale should
be exposed to programs in the form of a Constant or as an intrinsic which
always returns the same value during one program execution.

Luckily, this is not a problem for RVV. I do not know anything about this
"SV" extension you are working on so I cannot comment on that, but I'll
sketch the reasons for why it's not an issue with RVV and maybe that helps
you with SV too. As mentioned above, this is tangential to the focus of
this thread, so if you want to discuss further I'd prefer you do that in a
new thread.

The dynamically-changing VL is a kind of predication in that it limits
processing to a subset of lanes, and like masks it can just be another SSA
value that is an input to the computations it affects. You may be aware of
Simon Moll's vector predication (previously: explicit vector length)
proposal which does just that. In contrast, the vscale concept is more
about how many elements a vector register contains, regardless of whether
some operations process only a subset of them. In RVV terms that means it's
related not to VL but more to VBITS, which is indeed a constant (and has
been for many months).
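
A minimal Python model of that distinction, offered as a sketch only: the
register always holds `lanes` elements (the part that stays constant for the
whole run), while `vl` is an ordinary value that changes per iteration and is
passed into the operation like a mask would be. The function names and the use
of `None` for inactive lanes are assumptions made for illustration.

```python
def predicated_add(a, b, vl):
    """Add two full-width registers, processing only the first `vl` lanes.
    Inactive lanes are shown as None (a choice made for this sketch)."""
    assert len(a) == len(b)
    return [x + y if i < vl else None
            for i, (x, y) in enumerate(zip(a, b))]


def vector_add(xs, ys, lanes):
    """Strip-mined loop: `lanes` never changes, `vl` does."""
    out = []
    for base in range(0, len(xs), lanes):
        vl = min(lanes, len(xs) - base)            # per-iteration value
        pad = [0] * (lanes - vl)                   # register is always full width
        a = list(xs[base:base + lanes]) + pad
        b = list(ys[base:base + lanes]) + pad
        out.extend(predicated_add(a, b, vl)[:vl])  # keep active lanes only
    return out
```

For five input elements and four lanes, the loop runs twice with `vl` equal to
4 and then 1, yet every register in the model has exactly four lanes.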

The only dynamic thing about "how many elements are there in a vector
register" is that varying the width of the elements (8b, 16b, etc.) and the
length multiplier (grouping together 1/2/4/8 registers) causes a
predictable, relative increase or decrease (x2, x8, x0.5, etc.)  of the
number of elements, regardless of the specific value of VBITS. But this is
perfectly compatible with a constant vscale because vscale only is the
unknown-at-compile-time *factor* in the size of a scalable vector type.
Varying the other components, the compile-time-constant factor and the
element type, results in scalable vectors with different *relative* sizes
in exactly the same way we need to handle RVV's element width and LMUL
concepts. For example <vscale x 4 x i16> has four times as many elements
and twice as many bits as <vscale x 1 x i32>, so it captures the
distinction between a SEW=16,LMUL=2 vtype setting and a SEW=32,LMUL=1 vtype
setting.
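
The arithmetic behind that example can be checked with a short Python
sketch. `ScalableVectorType` is a hypothetical helper written for this
illustration, not an LLVM class; it only models the type `<vscale x n x ty>`
as the pair (n, element width in bits).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalableVectorType:
    min_elements: int   # the compile-time constant n in <vscale x n x ty>
    element_bits: int   # width of the element type, e.g. 16 for i16

    def elements(self, vscale):
        """Element count once the runtime-constant vscale is known."""
        return vscale * self.min_elements

    def bits(self, vscale):
        """Total size in bits for a given vscale."""
        return self.elements(vscale) * self.element_bits


nxv4i16 = ScalableVectorType(min_elements=4, element_bits=16)  # <vscale x 4 x i16>
nxv1i32 = ScalableVectorType(min_elements=1, element_bits=32)  # <vscale x 1 x i32>

# The ratios hold for every vscale, which is why vscale itself can stay unknown.
for vscale in (1, 2, 8):
    assert nxv4i16.elements(vscale) == 4 * nxv1i32.elements(vscale)
    assert nxv4i16.bits(vscale) == 2 * nxv1i32.bits(vscale)
```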

Regards,
Robin


Jacob Lifshay via llvm-dev

2019-Oct-01 07:21 UTC


On Tue, Oct 1, 2019, 00:08 Robin Kruppe <robin.kruppe at gmail.com> wrote:
> Hello Jacob and Luke,
>
> First off, even if a dynamically changing vscale was truly necessary for
> RVV or SV, this thread would be far too late to raise the question. That
> vscale is constant -- that the number of elements in a scalable vector does
> not change during program execution -- is baked into the accepted scalable
> vector type proposal from top to bottom and in fact was one of the
> conditions for its acceptance (runtime-variable type sizes create many more
> headaches which nobody has worked out how to solve to a satisfactory degree
> in the context of LLVM). *This* thread is just about whether vscale should
> be exposed to programs in the form of a Constant or as an intrinsic which
> always returns the same value during one program execution.
>
> Luckily, this is not a problem for RVV. I do not know anything about this
> "SV" extension you are working on so I cannot comment on that, but I'll
> sketch the reasons for why it's not an issue with RVV and maybe that helps
> you with SV too. As mentioned above, this is tangential to the focus of
> this thread, so if you want to discuss further I'd prefer you do that in a
> new thread.
>
> The dynamically-changing VL is a kind of predication in that it limits
> processing to a subset of lanes, and like masks it can just be another SSA
> value that is an input to the computations it affects. You may be aware of
> Simon Moll's vector predication (previously: explicit vector length)
> proposal which does just that. In contrast, the vscale concept is more
> about how many elements a vector register contains, regardless of whether
> some operations process only a subset of them. In RVV terms that means it's
> related not to VL but more to VBITS, which is indeed a constant (and has
> been for many months).
>
> The only dynamic thing about "how many elements are there in a vector
> register" is that varying the width of the elements (8b, 16b, etc.) and the
> length multiplier (grouping together 1/2/4/8 registers) causes a
> predictable, relative increase or decrease (x2, x8, x0.5, etc.)  of the
> number of elements, regardless of the specific value of VBITS. But this is
> perfectly compatible with a constant vscale because vscale only is the
> unknown-at-compile-time *factor* in the size of a scalable vector type.
> Varying the other components, the compile-time-constant factor and the
> element type, results in scalable vectors with different *relative* sizes
> in exactly the same way we need to handle RVV's element width and LMUL
> concepts. For example <vscale x 4 x i16> has four times as many elements
> and twice as many bits as <vscale x 1 x i32>, so it captures the
> distinction between a SEW=16,LMUL=2 vtype setting and a SEW=32,LMUL=1 vtype
> setting.
>
Ah, ok. So vscale is basically calculated based off of the type and vlmax
rather than being VL.

SV works mostly like that except it supports more than one vlmax, since
vlmax is derived from the number of contiguous int/fp registers that the
register allocator assigns to that particular vector (which can be part of
the vector type rather than leaving it entirely up to the register
allocator).

So, SV may not be able to use scalable vectors directly but may work better
with fixed-length vectors where all the vector ops take a VL parameter.
Perhaps it could use scalable vectors and then translate them to
fixed-length vectors + VL.

Jacob Lifshay

Luke Kenneth Casson Leighton via llvm-dev

2019-Oct-01 08:21 UTC


On Tue, Oct 1, 2019 at 8:08 AM Robin Kruppe <robin.kruppe at gmail.com> wrote:
>
> Hello Jacob and Luke,
>
> First off, even if a dynamically changing vscale was truly necessary
> for RVV or SV, this thread would be far too late to raise the question.
> That vscale is constant -- that the number of elements in a scalable
> vector does not change during program execution -- is baked into the
> accepted scalable vector type proposal from top to bottom and in fact
> was one of the conditions for its acceptance...
that should be explicitly made clear in the patches.  it sounds very
much like it's only suitable for statically-allocated
arrays-of-vectorisable-types:

typedef float vec4[4]; // SEW=32,LMUL=4 probably
static vec4 globalvec[1024]; // vscale == 1024 here

or, would it be intended for use inside functions - again statically-allocated?

int somefn(void) {
  static vec4 localvec[1024]; // vscale == 1024 here
}

*or*, would it be intended to be used like this?
int somefn(int num_of_vec4s) {
  // a C99 variable-length array: static storage cannot have a runtime size
  vec4 localvec[num_of_vec4s]; // vscale == dynamic, here
}

clarifying this in the documentation strings on vscale, perhaps even
providing c-style examples, would be extremely useful, and avoid
misunderstandings.
> ... (runtime-variable type
> sizes create many more headaches which nobody has worked out
> how to solve to a satisfactory degree in the context of LLVM).
hmmmm.  so it looks like data-dependent fail-on-first is something
that's going to come up later, rather than right now.
> *This* thread is just about whether vscale should be exposed to programs
> in the form of a Constant or as an intrinsic which always returns the same
> value during one program execution.
>
> Luckily, this is not a problem for RVV. I do not know anything about this
> "SV" extension you are working on
SV has been designed specifically to help with the creation of
*Hybrid* CPU / VPU / GPUs.  it's very similar to RVV except that there
are no new instructions added.

a typical GPU would be happy to have 128-bit-wide SIMD or VLIW-style
instructions, on the basis that (A) the shader programs are usually no
greater than 1K in size and (B) those 128-bit-wide instructions have
an extremely high bang-per-buck ratio, of 32x FP32 operations issued
at once.

in a *hybrid* CPU - VPU - GPU context even a 1k shader program hits a
significant portion of the 1st level cache which is *not* separate
from a *GPU*'s 1st level cache because the CPU *is* the GPU.

consequently, SV has been specifically designed to "compactify"
instruction effectiveness by "prefixing" even RVC 16-bit opcodes with
vectorisation "tags".

this has the side-effect of reducing executable size by over 10% in
many cases when compared to RVV.

> so I cannot comment on that, but I'll sketch the reasons for why it's not
> an issue with RVV and maybe that helps you with SV too.
looks like it does: Jacob explains (in another reply) that MVL is
exactly the same concept, except that in RVV it is hard-coded (baked)
into the hardware, where in SV it is explicitly set as a CSR, and i
explained in the previous reply that in RVV the VL CSR is requested
(and the hardware chooses a value), whereas in SV, the VL CSR *must*
be set to exactly what is requested [within the bounds of MVL, sorry,
left that out earlier].

> As mentioned above, this is tangential to the focus of this thread, so if
> you want to discuss further I'd prefer you do that in a new thread.
it's not yet clear whether vscale is intended for use in
static-allocation involving fixed constants or whether it's intended
for use with runtime-dependent variables inside functions.

with that not being clear, my questions are not tangential to the
focus of the thread.

however yes i would agree that data-dependent fail-on-first is
definitely not the focus of this thread, and would need to be
discussed later.

we are a very small team at the moment, we may end up missing valuable
discussions: how can it be ensured that we are included in future
discussions?
> [...]
> You may be aware of Simon Moll's vector predication (previously:
> explicit vector length) proposal which does just that.
ah yehyehyeh.  i remember.
> In contrast, the vscale concept is more about how many elements a
> vector register contains, regardless of whether some operations process
> only a subset of them.
ok so this *might* be answering my question about vscale being
relate-able to a function parameter (the latter of the c examples), it
would be good to clarify.
> In RVV terms that means it's related not to VL but more to VBITS,
> which is indeed a constant (and has been for many months).
ok so VL is definitely "assembly-level" rather than something that
actually is exposed to the intrinsics.  that may turn out to be a
mistake when it comes to data-dependent fail-on-first capability
(which is present in a *DIFFERENT* form in ARM SVE, btw), but would,
yes, need discussion separately.
> For example <vscale x 4 x i16> has four times as many elements and
> twice as many bits as <vscale x 1 x i32>, so it captures the distinction
> between a SEW=16,LMUL=2 vtype setting and a SEW=32,LMUL=1
> vtype setting.
hang on - so this may seem like a silly question: is it intended that
the *word* vscale would actually appear in LLVM-IR i.e. it is a new
compiler "keyword"?  or did you use it here in the context of just
"an example", where actually the idea is that actual value would be
<5 x 4 x i16> or <5 x 1 x i32>?

let me re-read the summary:

"This patch adds vscale as a symbolic constant to the IR, similar to
undef and zeroinitializer, so that it can be used in constant
expressions."

it's a keyword, isn't it?

so, that "vscale" keyword would be substituted at runtime by either a
constant (1024) *or* a runtime-calculated variable or function
parameter (num_of_vec4s), is that correct?

apologies for asking: these are precisely the kinds of
from-zero-prior-knowledge questions that help with any review process
to clarify things for other users/devs.

l.
