Thanks @Robin and @Graham for giving some background on scalable vectors and
clarifying some of the details!

Apologies if I'm repeating things here, but it is probably good to emphasize
the conceptually different, but complementary models for scalable vectors:

  1. Vectors of unknown, but constant size throughout the program.
  2. Vectors of changing size throughout the program.

Where (2) basically builds on (1).

LLVM's scalable vectors support (1) directly. The scalable type is defined
using the concept `vscale` that is constant throughout the program and
expresses the unknown, but maximum size of a scalable vector. My patch builds
on that definition by adding `vscale` as a keyword that can be used in
expressions. For this model, predication can be used to disable the lanes that
are not needed. Given that `vscale` is defined as inherently constant and a
corner-stone of the scalable type, it makes no sense to describe the `vscale`
keyword as an intrinsic.

The other model for scalable vectors (2) requires additional intrinsics to
get/set the `active VL` at runtime. This model would be complementary to
`vscale`, as it still requires the same scalable vector type to describe a
vector of unknown size. `vscale` can be used to express the maximum vector
length, but the `active vector length` would need to be handled through
explicit intrinsics. As Robin explained, it would also need Simon Moll's
vector predication proposal to express operations on the `active VL` elements.

> apologies for asking: these are precisely the kinds of
> from-zero-prior-knowledge questions that help with any review process
> to clarify things for other users/devs.

No apologies required; the discussion on scalable types has been going on for
quite a while, so there are many email threads to read through. It is
important these concepts are clear and well understood!

> clarifying this in the documentation strings on vscale, perhaps even
> providing c-style examples, would be extremely useful, and avoid
> misunderstandings.

I wonder if we should add a separate document about scalable vectors that
describes these concepts in more detail, with some examples.

Given that (2) is a very different use-case, I hope we can keep discussions on
that model separate from this thread, if possible.

Thanks,

Sander

> On 1 Oct 2019, at 11:07, Graham Hunter <Graham.Hunter at arm.com> wrote:
>
> Hi Luke,
>
>> On 1 Oct 2019, at 09:21, Luke Kenneth Casson Leighton via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> First off, even if a dynamically changing vscale was truly necessary
>>> for RVV or SV, this thread would be far too late to raise the question.
>>> That vscale is constant -- that the number of elements in a scalable
>>> vector does not change during program execution -- is baked into the
>>> accepted scalable vector type proposal from top to bottom and in fact
>>> was one of the conditions for its acceptance...
>>
>> that should be explicitly made clear in the patches. it sounds very
>> much like it's only suitable for statically-allocated
>> arrays-of-vectorisable-types:
>>
>> typedef vec4 float[4]; // SEW=32,LMUL=4 probably
>> static vec4 globalvec[1024]; // vscale == 1024 here
>
> 'vscale' just refers to the scaling factor that gives the maximum size of
> the vector at runtime, not the number of currently active elements.
>
> SVE will be using predication alone to deal with data that doesn't fill an
> entire vector, whereas RVV and SX-Aurora want to use a separate mechanism
> that fits with their hardware having a changeable active length.
>
> The scalable type tells you the maximum number of elements that could be
> operated on, and individual operations can constrain that to a smaller
> number of elements. The latter is what Simon Moll's proposal addresses.
>
>>> ... (runtime-variable type
>>> sizes create many more headaches which nobody has worked out
>>> how to solve to a satisfactory degree in the context of LLVM).
>>
>> hmmmm. so it looks like data-dependent fail-on-first is something
>> that's going to come up later, rather than right now.
>
> Arm's downstream compiler has been able to use the scalable type and a
> constant vscale with first-faulting loads for around 4 years, so there's
> no conflict here.
>
> We will need to figure out exactly what form the first faulting intrinsics
> take of course, as I think SVE's predication-only approach doesn't quite
> fit with others -- maybe we'll end up with two intrinsics? Or maybe we'll
> be able to synthesize a predicate from an active vlen and pattern match?
> Something to discuss later I guess. (I'm not even sure AVX512 has a
> first-faulting form, possibly just no-faulting and check the first predicate
> element?)
>
>>> As mentioned above, this is tangential to the focus of this thread, so if
>>> you want to discuss further I'd prefer you do that in a new thread.
>>
>> it's not yet clear whether vscale is intended for use in
>> static-allocation involving fixed constants or whether it's intended
>> for use with runtime-dependent variables inside functions.
>
> Runtime-dependent, though you could use C-level types and intrinsics to
> try a static approach.
>
>> ok so this *might* be answering my question about vscale being
>> relate-able to a function parameter (the latter of the c examples), it
>> would be good to clarify.
>>
>>> In RVV terms that means it's related not to VL but more to VBITS,
>>> which is indeed a constant (and has been for many months).
>>
>> ok so VL is definitely "assembly-level" rather than something that
>> actually is exposed to the intrinsics. that may turn out to be a
>> mistake when it comes to data-dependent fail-on-first capability
>> (which is present in a *DIFFERENT* form in ARM SVE, btw), but would,
>> yes, need discussion separately.
>>
>>> For example <vscale x 4 x i16> has four times as many elements and
>>> twice as many bits as <vscale x 1 x i32>, so it captures the distinction
>>> between a SEW=16,LMUL=2 vtype setting and a SEW=32,LMUL=1
>>> vtype setting.
>>
>> hang on - so this may seem like a silly question: is it intended that
>> the *word* vscale would actually appear in LLVM-IR i.e. it is a new
>> compiler "keyword"? or did you use it here in the context of just "an
>> example", where actually the idea is that the actual value would be
>> <5 x 4 x i16> or <5 x 1 x i32>?
>
> If you're referring to the '<vscale x 4 x i32>' syntax, that's already part
> of LLVM IR now (though effectively still in 'beta'). You can see a few
> examples in .ll tests now, e.g. llvm/test/Bitcode/compatibility.ll
>
> It's also documented in the langref.
>
> Sander's patch takes the existing 'vscale' keyword and allows it to be
> used outside the type, to serve as an integer constant that represents the
> same runtime value as it does in the type.
>
> Some previous discussions proposed using an intrinsic to start with for this,
> and that may still happen depending on community reaction, but the Arm
> hpc compiler team felt it was important to at least start a wider discussion
> on this topic before proceeding.
>
> From our experience, using an intrinsic makes it harder to work with
> shufflevector or get good code generation. If someone can spot a problem
> with our reasoning on that please let us know.
>
> -Graham
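As a concrete illustration of model (1), here is a minimal C sketch using the
Arm SVE ACLE (arm_sve.h), which already follows this model: the vector length
is unknown at compile time but constant for the life of the program, and the
loop tail is handled purely with predication. The mapping to vscale in the
comments assumes SVE's <vscale x 4 x float> IR type for 32-bit elements.

  #include <arm_sve.h>
  #include <stdint.h>

  /* Model (1): svcntw() returns the number of 32-bit lanes per vector
     (vscale x 4 for a <vscale x 4 x float>); it is unknown at compile
     time but never changes while the program runs. The tail is handled
     with a predicate rather than by changing the vector length. */
  void vec_add(float *dst, const float *a, const float *b, int64_t n) {
    for (int64_t i = 0; i < n; i += svcntw()) {
      svbool_t pg = svwhilelt_b32(i, n);        /* lanes [i, n) active  */
      svfloat32_t va = svld1(pg, a + i);        /* predicated loads     */
      svfloat32_t vb = svld1(pg, b + i);
      svst1(pg, dst + i, svadd_x(pg, va, vb));  /* predicated add/store */
    }
  }

The same binary runs unchanged whether the hardware vector is 128 or 2048 bits
wide, which is exactly the "unknown but constant" property that vscale
captures.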
Luke Kenneth Casson Leighton via llvm-dev
2019-Oct-01 13:42 UTC
[llvm-dev] Adding support for vscale
(readers note this, copied from the end before writing!

"Given that (2) is a very different use-case, I hope we can keep discussions on
that model separate from this thread, if possible.")

On Tue, Oct 1, 2019 at 12:45 PM Sander De Smalen
<Sander.DeSmalen at arm.com> wrote:

> Thanks @Robin and @Graham for giving some background on scalable vectors and clarifying some of the details!

hi sander, thanks for chipping in. um, just a point of order: was it
intentional to leave out both jacob and myself? my understanding is
that inclusive and welcoming language is supposed to be used within this
community, and it *might* be mistaken as being exclusionary and
unwelcoming.

if that was a misunderstanding or an oversight i apologise for raising it.

> Apologies if I'm repeating things here, but it is probably good to emphasize
> the conceptually different, but complementary models for scalable vectors:
> 1. Vectors of unknown, but constant size throughout the program.

... which matches with both hardware-fixed per-implementation
variations in potential [max] SIMD-width for any given architecture as
well as Vector-based "Maximum Vector Length", typically representing
the "Lanes" of a [traditional] Vector Architecture.

> 2. Vectors of changing size throughout the program.

... representing VL in "Cray-style" Vector Engines (NEC SX-Aurora, RVV,
SV) and representing the (rather unfortunate) corner-case cleanup -
and predication - deployed in SIMD
(https://www.sigarch.org/simd-instructions-considered-harmful/)

> Where (2) basically builds on (1).
>
> LLVM's scalable vectors support (1) directly. The scalable type is defined
> using the concept `vscale` that is constant throughout the program and
> expresses the unknown, but maximum size of a scalable vector.
> My patch builds on that definition by adding `vscale` as a keyword that
> can be used in expressions.

ah HA! excccellent. *that* was the sentence giving the key piece of
information needed to understand what is going on here. i appreciate
it does actually say that, "This patch adds vscale as a symbolic
constant to the IR, similar to undef and zeroinitializer, so that it
can be used in constant expressions", however without the context about
what vscale is based *on*, it's just not possible to understand.

can i therefore recommend a change, here:

"Scalable vector types are defined as <vscale x #elts x #eltty>,
where vscale itself is defined as a positive symbolic constant
of type integer, representing a platform-dependent (fixed but
implementor-specific) limit of any given hardware's maximum
simultaneous "element processing" capacity"

you could add, in brackets, "(typically the SIMD element width)" at
the end there. then, this starts to make sense, but could be further
made explicit:

"This patch adds vscale as a symbolic constant to the IR, similar to
undef and zeroinitializer, so that vscale - representing the
runtime-detected "element processing" capacity - can be used in
constant expressions"

> For this model, predication can be used to disable the lanes
> that are not needed. Given that `vscale` is defined as inherently
> constant and a corner-stone of the scalable type, it makes no
> sense to describe the `vscale` keyword as an intrinsic.

indeed: if it's intended near-exclusively for SIMD-style hardware,
then yes, absolutely.

my only concern would be: some circumstances (some algorithms) may
perform better with MMX, some with SSE, some with different levels of
performance on e.g. AMD or Intel, which would, with benchmarking, show
that some algorithms perform better if vscale=8 (resulting in some
other MMX/SSE subset being utilised) than if vscale=16.

in particular, on hardware which doesn't *have* predication, they're
definitely in trouble if vscale is fixed (SIMD considered harmful).
it may even be the case, for whatever reason, that performance sucks
for AVX512 instructions with a low predicate bitcount, if compared to
using smaller-range SIMD operations, perhaps due to the vastly-greater
size of the AVX instructions themselves.

honestly i don't know: i'm just throwing ideas out here.

would it be reasonable to assume that predication *always* is to be
used in combination with vscale? or is it the intention to
[eventually] be able to auto-generate the kinds of [painful in
retrospect] SIMD assembly shown in the above article?

> The other model for scalable vectors (2) requires additional intrinsics
> to get/set the `active VL` at runtime.

ok. with you here.

> This model would be complementary to `vscale`, as it still requires the
> same scalable vector type to describe a vector of unknown size.

ah. that's where the assumption breaks down: because SV allows
its vectors to "sit" on top of the *actual* scalar regfile(s), we do
in fact permit an [immediate-specified] vscale to be set, arbitrarily,
at any time.

now, we mmmiiiight be able to get away with assuming that vscale is
equal to the absolute maximum possible setting (64 for RV64, 32 for
RV32), then use / play with the "runtime active VL get/set"
intrinsics.

i'm kiinda wary of saying "absolutely yes that's the way forward" for
us, particularly without some input from Jacob here.

> `vscale` can be used to express the maximum vector length,

wait... hang on: with RVV i am pretty certain there is not supposed to be
any kind of assumption of knowledge about MVL. in SV that's fine, but
in RVV i don't believe it is.

bruce, andrew, robin, can you comment here?

> but the `active vector length` would need to be handled through
> explicit intrinsics. As Robin explained, it would also need Simon Moll's
> vector predication proposal to express operations on `active VL` elements.

ok, a link to that would be handy... let me see if i can find it...
what comes up is this: https://reviews.llvm.org/D57504 - is that right?

>> apologies for asking: these are precisely the kinds of
>> from-zero-prior-knowledge questions that help with any review process
>> to clarify things for other users/devs.
> No apologies required; the discussion on scalable types has been going on
> for quite a while, so there are many email threads to read through. It is
> important these concepts are clear and well understood!

:)

>> clarifying this in the documentation strings on vscale, perhaps even
>> providing c-style examples, would be extremely useful, and avoid
>> misunderstandings.
> I wonder if we should add a separate document about scalable vectors
> that describes these concepts in more detail, with some examples.

it's exceptionally complex, with so many variants; i feel this is
almost essential.

> Given that (2) is a very different use-case, I hope we can keep discussions on
> that model separate from this thread, if possible.

good idea, if there's a new thread started please do cc me.
cross-relationship between (2) and vscale may make it slightly
unavoidable though to involve this one.

l.
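For contrast with the predicated sketch earlier in the thread, here is a rough
C-level picture of the strip-mined loop structure that model (2) implies.
set_active_vl() is purely a hypothetical placeholder for whatever
get/set-active-VL intrinsic a separate proposal would define (it is not an
existing LLVM or RVV API); the stand-in definition just clamps the request to
an arbitrary maximum so the sketch is self-contained.

  #include <stddef.h>

  /* Hypothetical placeholder: real hardware (e.g. RVV's vsetvli) would
     answer with however many elements it will process on this trip,
     never more than requested. */
  static size_t set_active_vl(size_t requested) {
    const size_t hw_max = 8;              /* arbitrary stand-in maximum */
    return requested < hw_max ? requested : hw_max;
  }

  void vec_add_vl(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ) {
      size_t vl = set_active_vl(n - i);   /* active VL can change per trip */
      for (size_t j = 0; j < vl; ++j)     /* stands in for one vector op on
                                             the first vl lanes */
        dst[i + j] = a[i + j] + b[i + j];
      i += vl;
    }
  }

The point of the contrast: in model (1) the trip count is advanced by a
program-wide constant (vscale-derived), whereas here the increment is a value
the hardware hands back each iteration.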
Hi Luke,

> was it intentional to leave out both jacob and myself?
> [...]
> if that was a misunderstanding or an oversight i apologise for raising it.

It was definitely not my intention to be non-inclusive, my apologies if that
seemed the case!

> can i therefore recommend a change, here:
> [...]
> "This patch adds vscale as a symbolic constant to the IR, similar to
> undef and zeroinitializer, so that vscale - representing the
> runtime-detected "element processing" capacity - can be used in
> constant expressions"

Thanks for the suggestion! I like the use of the word `capacity`, especially
now that the term 'vector length' has overloaded meanings. I'll add some
extra words to the vscale patch to clarify its meaning.

> my only concern would be: some circumstances (some algorithms) may
> perform better with MMX, some with SSE, some with different levels of
> performance on e.g. AMD or Intel, which would, with benchmarking, show
> that some algorithms perform better if vscale=8 (resulting in some
> other MMX/SSE subset being utilised) than if vscale=16.

If fixed-width/short vectors are more beneficial for some algorithm, I'd
recommend using fixed-width vectors directly. It would be up to the target to
lower those to the vector instruction set. For AArch64, this can be done using
Neon (max 128 bits) or with SVE/SVE2 using a 'fixed-width' predicate mask,
e.g. vl4 for a predicate of 4 elements, even when the vector capacity is
larger than 4.

> would it be reasonable to assume that predication *always* is to be
> used in combination with vscale? or is it the intention to
> [eventually] be able to auto-generate the kinds of [painful in
> retrospect] SIMD assembly shown in the above article?

When the size of a vector is constant throughout the program, but unknown at
compile time, some form of masking is required for loads and stores (or other
instructions that may cause an exception). So it is reasonable to assume that
predication is used for such vectors.

>> This model would be complementary to `vscale`, as it still requires the
>> same scalable vector type to describe a vector of unknown size.
>
> ah. that's where the assumption breaks down: because SV allows
> its vectors to "sit" on top of the *actual* scalar regfile(s), we do
> in fact permit an [immediate-specified] vscale to be set, arbitrarily,
> at any time.

Maybe I'm missing something here, but if SV uses an immediate to define
vscale, that implies the value of vscale is known at compile time, and thus
regular (fixed-width) vector types can be used?

> now, we mmmiiiight be able to get away with assuming that vscale is
> equal to the absolute maximum possible setting (64 for RV64, 32 for
> RV32), then use / play with the "runtime active VL get/set"
> intrinsics.
>
> i'm kiinda wary of saying "absolutely yes that's the way forward" for
> us, particularly without some input from Jacob here.

Note that there isn't a requirement to use `vscale` as proposed in my first
patch. If RV only cares about the runtime active VL, then some explicit,
separate mechanism to get/set the active VL would be needed anyway. I imagine
the resulting runtime value (instead of `vscale`) would then be used in loop
indvar updates, address computations, etc.

> ok, a link to that would be handy... let me see if i can find it...
> what comes up is this: https://reviews.llvm.org/D57504 - is that right?

Yes, that's the one!

Thanks,

Sander
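To illustrate the 'fixed-width predicate mask' point above: with the Arm SVE
ACLE, a sketch like the following operates on exactly four 32-bit lanes
regardless of how wide the hardware vector actually is, by building the
predicate from the SV_VL4 pattern. The intrinsics are the standard arm_sve.h
ones; the function name is illustrative only.

  #include <arm_sve.h>

  /* Only the first four lanes are enabled, even if the vector capacity
     (vscale x 4 elements) is much larger than four. */
  void add4(float *dst, const float *a, const float *b) {
    svbool_t pg = svptrue_pat_b32(SV_VL4);
    svst1(pg, dst, svadd_x(pg, svld1(pg, a), svld1(pg, b)));
  }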
> "Given that (2) is a very different use-case, I hope we can keep discussions on > that model separate from this thread, if possible.") > > > On Tue, Oct 1, 2019 at 12:45 PM Sander De Smalen > <Sander.DeSmalen at arm.com> wrote: > >> Thanks @Robin and @Graham for giving some background on scalable vectors and clarifying some of the details! > > hi sander, thanks for chipping in. um, just a point of order: was it > intentional to leave out both jacob and myself? my understanding is > that inclusive and welcoming language is supposed to used within this > community, and it *might* be mistaken as being exclusionary and > unwelcoming. > > if that was a misunderstanding or an oversight i apologise for raising it. > >> Apologies if I'm repeating things here, but it is probably good to emphasize >> the conceptually different, but complementary models for scalable vectors: >> 1. Vectors of unknown, but constant size throughout the program. > > ... which matches with both hardware-fixed per-implementation > variations in potential [max] SIMD-width for any given architecture as > well as Vector-based "Maximum Vector Length", typically representing > the "Lanes" of a [traditional] Vector Architecture. > >> 2. Vectors of changing size throughout the program. > > ...representing VL in "Cray-style" Vector Engines (NEC SX-Aurora, RVV, > SV) and representing the (rather unfortunate) corner-case cleanup - > and predication - deployed in SIMD > (https://www.sigarch.org/simd-instructions-considered-harmful/) > >> Where (2) basically builds on (1). >> >> LLVM's scalable vectors support (1) directly. The scalable type is defined >> using the concept `vscale` that is constant throughout the program and >> expresses the unknown, but maximum size of a scalable vector. >> My patch builds on that definition by adding `vscale` as a keyword that >> can be used in expressions. > > ah HA! excccellent. *that* was the sentence giving the key piece of > information needed to understand what is going on, here. i appreciate > it does actually say that, "This patch adds vscale as a symbolic > constant to the IR, similar to > undef and zeroinitializer, so that it can be used in constant > expressions" however without the context about what vscale is based > *on*, it's just not possible to understand. > > can i therefore recommend a change, here: > > "Scalable vector types are defined as <vscale x #elts x #eltty>, > where vscale itself is defined as a positive symbolic constant > of type integer, representing a platform-dependent (fixed but > implementor-specific) limit of any given hardware's maximum > simultaneous "element processing" capacity" > > you could add, in brackets, "(typically the SIMD element width)" at > the end there. then, this starts to make sense, but could be further > made explicit: > > "This patch adds vscale as a symbolic constant to the IR, similar to > undef and zeroinitializer, so that vscale - representing the > runtime-detected "element processing" capacity - can be used in > constant expressions" > > >> For this model, predication can be used to disable the lanes >> that are not needed. Given that `vscale` is defined as inherently >> constant and a corner-stone of the scalable type, it makes no >> sense to describe the `vscale` keyword as an intrinsic. > > indeed: if it's intended near-exclusively for SIMD-style hardware, > then yes, absolutely. 
> > my only concern would be: some circumstances (some algorithms) may > perform better with MMX, some with SSE, some with different levels of > performance on e.g. AMD or Intel, which would, with benchmarking, show > that some algorithms perform better if vscale=8 (resulting in some > other MMX/SSE subset being utilised) than if vscale=16. > > in particular, on hardware which doesn't *have* predication, they're > definitely in trouble if vscale is fixed (SIMD considered harmful). > it may even be the case, for whatever reason, that performance sucks > for AVX512 instructions with a low predicate bitcount, if compared to > using smaller-range SIMD operations, perhaps due to the vastly-greater > size of the AVX instructions themselves. > > honestly i don't know: i'm just throwing ideas out, here. > > would it be reasonable to assume that predication *always* is to be > used in combination with vscale? or is it the intention to > [eventually] be able to auto-generate the kinds of [painful in > retrospect] SIMD assembly shown in the above article? > >> The other model for scalable vectors (2) requires additional intrinsics >> to get/set the `active VL` at runtime. > > ok. with you here. > >> This model would be complementary to `vscale`, as it still requires the >> same scalable vector type to describe a vector of unknown size. > > ah. that's where the assumption breaks down, because of SV allowing > its vectors to "sit" on top of the *actual* scalar regfile(s), we do > in fact permit an [immediate-specified] vscale to be set, arbitrarily, > at any time. > > now, we mmmiiiight be able to get away with assuming that vscale is > equal to the absolute maximum possible setting (64 for RV64, 32 for > RV32), then use / play-with the "runtime active VL get/set" > intrinsics. > > i'm kiinda wary of saying "absolutely yes that's the way forward" for > us, particularly without some input from Jacob here. > > >> `vscale` can be used to express the maximum vector length, > > wait... hang on: RVV i am pretty certain there is not supposed to be > any kind of assumption of knowledge about MVL. in SV that's fine, but > in RVV i don't believe it is. > > bruce, andrew, robin, can you comment here? > >> but the `active vector length` would need to be handled through >> explicit intrinsics. As Robin explained, it would also need Simon Moll's >> vector predication proposal to express operations on `active VL` elements. > > ok, a link to that would be handy... let me see if i can find it... > what comes up is this: https://reviews.llvm.org/D57504 is that right? > >>> apologies for asking: these are precisely the kinds of >>> from-zero-prior-knowledge questions that help with any review process >>> to clarify things for other users/devs. >> No apologies required, the discussion on scalable types have been going on for quite a while so there are much email threads to read through. It is important these concepts are clear and well understood! > > :) > >>> clarifying this in the documentation strings on vscale, perhaps even >>> providing c-style examples, would be extremely useful, and avoid >>> misunderstandings. >> I wonder if we should add a separate document about scalable vectors >> that describe these concepts in more detail with some examples. > > it's exceptionally complex, with so many variants, i feel this is > almost essential. > >> Given that (2) is a very different use-case, I hope we can keep discussions on >> that model separate from this thread, if possible. 
> > good idea, if there's a new thread started please do cc me. > cross-relationship between (2) and vscale may make it slightly > unavoidable though to involve this one. > > l.