On Wed, 2 Oct 2019 at 05:09, Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:

> My general feeling on this then is that both RVV and SV should avoid using
> vscale.
>
> In the case of RVV, MVL is a hardware defined constant that is never
> *intended* to be known by applications. There's no published detection
> mechanism. Loops are supposed to be designed to run a few more times on
> lower spec'd hardware.
>
> Robin, what's your thoughts there?

Software should be portable across different RVV implementations, in particular across different values of the impl-defined constants VLEN, ELEN, SLEN. But being portable does not mean software must never mention these (and derived quantities such as vscale or, in the RVV spec, VLMAX) at all, just that it has to work correctly no matter which value they have. And in fact, there is a published (written out in the spec) mechanism for obtaining VLMAX, which is directly related to VLEN (so you can obtain VLEN with a little more arithmetic, though for most purposes VLMAX is more useful): requesting the vector length of -1 (unsigned: 2^XLEN - 1) is guaranteed to result in vl=VLMAX.

For regular strip-mined loops, the vsetvl instruction takes care of everything so there's simply no need for the program to do this. But for other tasks, it's required (i.e., you can't sensibly write the program otherwise) and perfectly fine w.r.t. portability. One example is the stack frame layout when there's any vectors on the stack (e.g. for spills), since the vector stack slots must in general be large enough to hold a full vector (= VLEN*LMUL bits).

Granted, I don't think this or other examples will normally occur in LLVM IR generated by a loop vectorizer, so vscale will probably not occur very frequently in RVV. Nevertheless, there is nothing inherently non-portable about it.

Regards
Robin

PS: I don't want to read too much into your repeated use of "MVL", but FWIW the design of RVV has changed quite radically since "MVL" was last used in any spec draft. If you haven't read any version since v0.6 (~ December 2018) with a "clean slate", may I suggest you do that when you find the time? You can find the latest draft at https://github.com/riscv/riscv-v-spec/
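
For readers following along, the VLMAX query and the strip-mining pattern described in the message above can be sketched in a few lines of Python. This is only a toy model, not real hardware or a real API: the function names, the choice of XLEN = 64, and the simplification that vsetvl grants min(AVL, VLMAX) are all assumptions made for illustration (the spec gives implementations more latitude when AVL is between VLMAX and 2*VLMAX).

```python
# Toy model of an RVV implementation. All names here are hypothetical and
# exist only for illustration; they are not part of any real toolchain.
XLEN = 64  # assumed scalar register width in bits


def vlmax(vlen: int, sew: int, lmul: int) -> int:
    """VLMAX = VLEN * LMUL / SEW, the element count of one register group."""
    return vlen * lmul // sew


def vsetvl(avl: int, vlen: int, sew: int, lmul: int) -> int:
    """Simplified vsetvl: grant min(AVL, VLMAX) elements (an assumption).

    AVL is treated as an unsigned XLEN-bit value, so -1 becomes 2**XLEN - 1.
    """
    avl &= (1 << XLEN) - 1
    return min(avl, vlmax(vlen, sew, lmul))


def query_vlmax(vlen: int, sew: int, lmul: int) -> int:
    """Request a vector length of -1; the granted vl is VLMAX."""
    return vsetvl(-1, vlen, sew, lmul)


def vlen_from_vlmax(vlmax_value: int, sew: int, lmul: int) -> int:
    """The 'little more arithmetic': VLEN = VLMAX * SEW / LMUL."""
    return vlmax_value * sew // lmul


def vector_stack_slot_bytes(vlen: int, lmul: int) -> int:
    """A spill slot must hold a full register group: VLEN * LMUL bits."""
    return vlen * lmul // 8


def strip_mined_add(a, b, vlen: int, sew: int = 32, lmul: int = 1):
    """Strip-mined loop: the code never needs to know VLEN or VLMAX."""
    out = []
    i = 0
    n = len(a)
    while i < n:
        vl = vsetvl(n - i, vlen, sew, lmul)  # elements handled this pass
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out
```

The loop produces the same result whatever value of `vlen` is passed in and only runs more iterations on narrower hardware, whereas the stack-slot size has to be computed from VLEN explicitly. That is the distinction drawn above between strip-mined loops and tasks such as frame layout.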
Luke Kenneth Casson Leighton via llvm-dev
2019-Oct-02 08:45 UTC
[llvm-dev] Adding support for vscale
On Wednesday, October 2, 2019, Robin Kruppe <robin.kruppe at gmail.com> wrote:

> On Wed, 2 Oct 2019 at 05:09, Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> wrote:
>
>> My general feeling on this then is that both RVV and SV should avoid
>> using vscale.
>>
>> In the case of RVV, MVL is a hardware defined constant that is never
>> *intended* to be known by applications. There's no published detection
>> mechanism. Loops are supposed to be designed to run a few more times on
>> lower spec'd hardware.
>>
>> Robin, what's your thoughts there?
>
> Software should be portable across different RVV implementations, in
> particular across different values of the impl-defined constants VLEN,
> ELEN, SLEN. But being portable does not mean software must never
> mention these (and derived quantities such as vscale or, in the RVV spec,
> VLMAX) at all, just that it has to work correctly no matter which value
> they have. And in fact, there is a published (written out in the spec)
> mechanism for obtaining VLMAX,

Ah excellent. It's a little obtuse (the wording is very indirect). As the RVV WG is a closed list, can I leave it with you to raise that as an issue?

> which is directly related to VLEN (so you can obtain VLEN with a little
> more arithmetic, though for most purposes VLMAX is more useful): requesting
> the vector length of -1 (unsigned: 2^XLEN - 1) is guaranteed to result in
> vl=VLMAX.
>
> For regular strip-mined loops, the vsetvl instruction takes care of
> everything so there's simply no need for the program to do this. But for
> other tasks, it's required (i.e., you can't sensibly write the program
> otherwise) and perfectly fine w.r.t. portability. One example is the stack
> frame layout when there's any vectors on the stack (e.g. for spills), since
> the vector stack slots must in general be large enough to hold a full
> vector (= VLEN*LMUL bits).

Kernel context switch as well. Both would likely be written in assembler.

> Granted, I don't think this or other examples will normally occur in LLVM
> IR generated by a loop vectorizer, so vscale will probably not occur very
> frequently in RVV.

Interesting. It is sort-of what I had a hunch would be the case.

> Nevertheless, there is nothing inherently non-portable about it.

Indeed. Thank you for the insights, Robin.

> Regards
> Robin
>
> PS: I don't want to read too much into your repeated use of "MVL", but
> FWIW the design of RVV has changed quite radically since "MVL" was last
> used in any spec draft. If you haven't read any version since v0.6 (~
> December 2018) with a "clean slate", may I suggest you do that when you
> find the time? You can find the latest draft at
> https://github.com/riscv/riscv-v-spec/

Ah yes thank you, I reference it at least three times a week: it is such a large document that it is easy to miss things. I will replace MVL in SV with VLMAX. Appreciated the heads-up.

L.

--
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
Luke Kenneth Casson Leighton via llvm-dev
2019-Oct-05 11:36 UTC
[llvm-dev] Adding support for vscale
On Wednesday, October 2, 2019, Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:

> On Wednesday, October 2, 2019, Robin Kruppe <robin.kruppe at gmail.com>
> wrote:
>
>> Granted, I don't think this or other examples will normally occur in LLVM
>> IR generated by a loop vectorizer, so vscale will probably not occur very
>> frequently in RVV.
>
> Interesting. It is sort-of what I had a hunch would be the case.

Ok so taking the RISCV developers off cc, because it looks like neither SV nor RVV would use vscale, as we basically identified, eventually, that it is a way to express the "architectural SIMD width".

The rest of this is therefore nothing to do with vector engines, and is purely some constructive input for future consideration.

Let us take a scenario where the data is short vectors, well below vscale, and where there is also some inter-element dependence (cross product or other) which makes laying multiple short vectors into a single vscale-long SIMD awkward.

Under such circumstances having a fixed vscale is extremely wasteful, particularly if there is an out-of-order engine which could use mixed scalar or MMX/SSE with AVX512, for example.

Thus for the longer operations the idea is to throw those at AVX512, and the shorter ones at 64 bit MMX/SSE.

The point is: *both could benefit from vscale*, except that, unfortunately, there is only *one* vscale and it can therefore only be applied to *one* of the SIMD ALUs.

This tends to suggest that either vscale should be a variable (and applicable on a per-group basis, separated by LD/STs)

OR

that there should be more than one vscale, i.e. that vscale, instead of being a fixed global type, should be morphed into %vscaleN, similar to %regN, conveying the context of its intended scope and use.

Thus, certain groups of operations intended to be farmed to a SPECIFIC SIMD suite (AVX512) may be *specifically* separated from those intended to be targeted at another suite (MMX/SSE).

Of course, on architectures which have no such distinction, a simple pass would merge them all into one global vscale.

A thought for consideration.

L.
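
To make the %vscaleN idea a little more concrete, here is a minimal Python sketch of what per-group vscale values and the "simple merge pass" might look like. Everything in it is hypothetical: `Op`, `vscale_group`, `merge_vscale_groups`, `resolve_widths` and the `base_lanes` parameter are invented for illustration and do not correspond to any existing LLVM type, pass, or API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Op:
    """One vector operation tagged with a hypothetical %vscaleN group index."""
    name: str
    vscale_group: int


def merge_vscale_groups(ops):
    """Merge pass for targets with a single SIMD suite: everything joins group 0."""
    return [replace(op, vscale_group=0) for op in ops]


def resolve_widths(ops, group_vscale, base_lanes=2):
    """Lanes used by each op = base_lanes * the vscale of the op's group."""
    return {op.name: base_lanes * group_vscale[op.vscale_group] for op in ops}


# Two groups: long operations aimed at a wide suite, short ones at a narrow one.
ops = [Op("long_fma", 0), Op("short_cross", 1)]
split = resolve_widths(ops, {0: 8, 1: 1})

# On a target without that distinction, merge first, then use one global vscale.
merged = resolve_widths(merge_vscale_groups(ops), {0: 8})
```

With two groups, `long_fma` and `short_cross` resolve to different lane counts. After the merge pass both share the single global value, which is the fallback behaviour described above for architectures with only one SIMD suite.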