Renato Golin via llvm-dev
2018-Jul-30 20:12 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
On Mon, 30 Jul 2018 at 20:57, David A. Greene via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> I'm not sure exactly how the SVE proposal would address this kind of
> operation.

SVE uses predication. The physical number of lanes doesn't have to
change to have the same effect (alignment, tails).

> I think it would be unlikely for anyone to need to change the vector
> length during evaluation of an in-register expression.

The worry here is not within each instruction but across instructions.
SVE (and I think RISC-V) allow the register size to be set dynamically.

For example, on the same machine, it may be 256 for one process and
512 for another (for example, to save power).

But the change is via a system register, so in theory, anyone can
write inline asm at the beginning of a function and change the
vector length to whatever they want.

Worse still, people can do that inside loops, or in a tail loop,
thinking it's a good idea (or that this is a Cray machine :).

AFAIK, the interface for changing the register length will not be
exposed programmatically, so in theory, we should not worry about it.
Any inline asm hack can be considered out of scope / user error.

However, Hal's concern seems to be that, in the event of anyone
planning to add it to their APIs, we need to make sure the proposed
semantics can cope with it (do we need to update the predicates again?
what will vscale mean then, and when?).

If not, we may have to enforce that this does not come to pass in its
current form. In that case, changing it later will require *a lot*
more effort than doing it now.

So, it would be good to get a clear response from the two fronts (SVE
and RISC-V) about the future intention to expose that or not.

--
cheers,
--renato
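To illustrate the point about predication handling tails without changing
the physical number of lanes, here is a minimal C sketch. It simulates a
fixed-width vector unit with a per-lane predicate (the names LANES,
vadd_predicated, and the scalar inner loops are all illustrative stand-ins,
not part of SVE or of the proposal):

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* stand-in for the hardware vector length */

/* Predicated vector add: every iteration processes a full LANES-wide
 * "register", and a per-lane predicate masks off the lanes that fall
 * past n -- so the tail needs no separate scalar loop and the lane
 * count never changes. */
static void vadd_predicated(const int *a, const int *b, int *c, size_t n) {
    for (size_t i = 0; i < n; i += LANES) {
        int pred[LANES];
        for (int l = 0; l < LANES; ++l)
            pred[l] = (i + l < n);  /* roughly what SVE's whilelt computes */
        for (int l = 0; l < LANES; ++l)
            if (pred[l])
                c[i + l] = a[i + l] + b[i + l];
    }
}
```

With n = 7 and LANES = 4, the second iteration runs with predicate
{1,1,1,0}: same lane count, partial effect.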
Bruce Hoult via llvm-dev
2018-Jul-31 00:13 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
On Mon, Jul 30, 2018 at 1:12 PM, Renato Golin via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> The worry here is not within each instruction but across instructions.
> SVE (and I think RISC-V) allow the register size to be set dynamically.
>
> For example, on the same machine, it may be 256 for one process and
> 512 for another (for example, to save power).
>
> But the change is via a system register, so in theory, anyone can
> write inline asm at the beginning of a function and change the
> vector length to whatever they want.
>
> Worse still, people can do that inside loops, or in a tail loop,
> thinking it's a good idea (or that this is a Cray machine :).
>
> AFAIK, the interface for changing the register length will not be
> exposed programmatically, so in theory, we should not worry about it.
> Any inline asm hack can be considered out of scope / user error.
>
> However, Hal's concern seems to be that, in the event of anyone
> planning to add it to their APIs, we need to make sure the proposed
> semantics can cope with it (do we need to update the predicates again?
> what will vscale mean then, and when?).
>
> If not, we may have to enforce that this does not come to pass in its
> current form. In that case, changing it later will require *a lot*
> more effort than doing it now.
>
> So, it would be good to get a clear response from the two fronts (SVE
> and RISC-V) about the future intention to expose that or not.

Some characteristics of how I believe RISC-V vectors will or could end up:

- The user's data is stored only in normal C "arrays" (which of course
  can mean a pointer into the middle of some arbitrary chunk of memory).

- Vector register types will be used only within a loop in a single
  user-written function. There is no way to pass a vector variable from
  one function to another -- there is no effect on the ABI.

- There will be some vector intrinsic functions, such as
  transcendentals. They will use a different, private ABI used only by
  the compiler and implemented only in the runtime library. They will
  probably use the alternate link register (x5 instead of x1) and will
  be totally not miscible with normal functions.

- Even within a single function, different loops may have different
  maximum vector lengths, depending on how many vector registers are
  required and of what element types (all vectors in a given loop have
  the same number of elements).

- The active vector length can change from iteration to iteration of a
  loop. In particular, it can be less on the final iteration to deal
  with tails.

- The active vector length is set at the head of each iteration of a
  loop by the program telling the hardware how many elements are left
  (possibly thousands or millions) and the hardware saying "you can
  have 17 this time".

- (Maybe) the active vector length can become shorter during execution
  of a loop iteration as a side effect of a vector load or store
  getting a protection error and loading/storing only up to the
  protection boundary. In this case an actual trap will be taken only
  if the first element of the vector causes the problem. Different
  micro-architectures might handle this differently. It should be a
  rare event. An interrupt or task switch during execution of a vector
  loop may cause the active vector length to become zero for that
  iteration.

So, this is quite different in detail to ARM's SVE, but it should be
able to use the same type system. The main difference is probably that
SVE seems to intend to allow passing vector types from one function to
another -- but its vector length is fixed for any given processor (or
process?). RISC-V loops may need to query the active vector length at
the end of each loop iteration.
That's a different instruction that needs to be emitted, but it has no
effect on the type system.

From the point of view of the type system, I think RISC-V is a subset
of SVE, as there is no need to pass vectors between functions and no
effect on the ABI.
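The "program tells the hardware how many elements are left, hardware
grants some count" handshake described above can be sketched in C. This
is only a simulation of the semantics, under the assumption that the
grant is simply min(remaining, MAXVL); the names setvl, MAXVL, and vsum
are hypothetical, and real hardware may grant less than the maximum:

```c
#include <assert.h>
#include <stddef.h>

#define MAXVL 8  /* the hardware's maximum; implementation-defined in reality */

/* Stand-in for the setvl handshake: the program reports how many
 * elements remain, the hardware replies with how many it will process
 * this iteration. */
static size_t setvl(size_t remaining) {
    return remaining < MAXVL ? remaining : MAXVL;
}

static long vsum(const long *a, size_t n) {
    long acc = 0;
    size_t i = 0;
    while (i < n) {
        size_t vl = setvl(n - i);       /* active VL set at the loop head */
        for (size_t l = 0; l < vl; ++l) /* one "vector" op of length vl */
            acc += a[i + l];
        i += vl;                        /* advance by whatever was granted */
    }
    return acc;
}
```

The loop never needs a separate tail: the final iteration simply
receives a shorter grant.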
David A. Greene via llvm-dev
2018-Jul-31 02:53 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
Renato Golin <renato.golin at linaro.org> writes:

> On Mon, 30 Jul 2018 at 20:57, David A. Greene via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> I'm not sure exactly how the SVE proposal would address this kind of
>> operation.
>
> SVE uses predication. The physical number of lanes doesn't have to
> change to have the same effect (alignment, tails).

Right. My wording was poor. The current proposal doesn't directly
support a more dynamic vscale target, but I believe it could be simply
extended to do so.

>> I think it would be unlikely for anyone to need to change the vector
>> length during evaluation of an in-register expression.
>
> The worry here is not within each instruction but across instructions.
> SVE (and I think RISC-V) allow register size to be dynamically set.

I wasn't talking about within an instruction but rather across
instructions in the same expression tree. Something like this would be
weird:

  A = load with VL
  B = load with VL
  C = A + B          # VL implicit
  VL = <something>
  D = ~C             # VL implicit
  store D

Here and beyond, read "VL" as "vscale with minimum element count 1."

The points where VL would be changed are limited and I think would
require limited, straightforward additions on top of this proposal.

> For example, on the same machine, it may be 256 for one process and
> 512 for another (for example, to save power).

Sure.

> But the change is via a system register, so in theory, anyone can
> write inline asm at the beginning of a function and change the
> vector length to whatever they want.
>
> Worse still, people can do that inside loops, or in a tail loop,
> thinking it's a good idea (or that this is a Cray machine :).
>
> AFAIK, the interface for changing the register length will not be
> exposed programmatically, so in theory, we should not worry about it.
> Any inline asm hack can be considered out of scope / user error.

That's right.
This proposal doesn't expose a way to change vscale, but I don't think
it precludes a later addition to do so.

> However, Hal's concern seems to be that, in the event of anyone
> planning to add it to their APIs, we need to make sure the proposed
> semantics can cope with it (do we need to update the predicates again?
> what will vscale mean then, and when?).

I don't see why predicate values would be affected at all. If a machine
with variable vector length has predicates, then typically the resulting
operation would operate on the bitwise AND of the predicate and a
conceptual all-1's predicate of length VL.

As I understand it, vscale is the runtime multiple of some minimal,
guaranteed vector length. For SVE that minimum is whatever gives a bit
width of 128. My guess is that for a machine with a more dynamic vector
length, the minimum would be 1. vscale would then be the vector length
and would change accordingly if the vector length is changed.

Changing vscale would be no different than changing any other value in
the program. The dataflow determines its possible values at various
program points. vscale is an extra (implicit) operand to all vector
operations with scalable type.

> If not, we may have to enforce that this will not come to pass in its
> current form.

Why? If a user does asm or some other such trick to change what vscale
means, that's on the user. If a machine has a VL that changes
iteration-to-iteration, typically the compiler would be responsible for
controlling it.

If the vendor provides some target intrinsics to let the user write
low-level vector code that changes vscale in a high-level language, then
the vendor would be responsible for adding the necessary bits to the
frontend and LLVM. I would not recommend a vendor try to do this. :)
It wouldn't necessarily be hard to do, but it would be wasted work IMO
because it would be better to improve the vectorizer that already
exists.

> In this case, changing it later will require *a lot* more effort than
> doing it now.

I don't see why. Anyone adding the ability to change vscale would need
to add intrinsics and specify their semantics. That shouldn't change
anything about this proposal, and any such additions shouldn't be
hampered by this proposal.

Another way to think of vscale/vector length is as a different kind of
predicate. Right now LLVM uses select to track predicate application.
It uses a "top-down" approach in that the root of an expression tree (a
select) applies the predicate and presumably everything under it
operates under that predicate. It also uses intrinsics for certain
operations (loads, stores, etc.) that absolutely must be predicated no
matter what for safety reasons. So it's sort of a hybrid approach, with
predicate application at the root, at certain leaves, and maybe even on
interior nodes (FP operations come to mind).

To my knowledge, there's nothing in LLVM that checks to make sure these
predicate applications are all consistent with one another. Someone
could do a load with predicate 0011 and then a "select div" with
predicate 1111, likely resulting in a runtime fault, but nothing in
LLVM would assert on the predicate mismatch.

Predicates could also be applied only at the leaves and propagated up
the tree. IIRC, Dan Gohman proposed something like this years back when
the topic of predication came up. He called it "applymask" but
unfortunately Google is failing to find it.

I *could* imagine using select to also convey application of vector
length, but that seems odd and unnecessarily complex. If vector length
were applied at the leaves, it would take a bit of work to get it
through instruction selection. Target opcodes would be one way to do
it. I think it would be straightforward to walk the DAG and change
generic opcodes to target opcodes when necessary.

I don't think we should worry about taking IR with dynamic changes to
VL and trying to generate good code for any random target from it. Such
IR is very clearly tied to a specific kind of target and we shouldn't
bother pretending otherwise. The vectorizer should be aware of the
target's capabilities and generate code accordingly.

-David
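The "bitwise AND of the predicate and a conceptual all-1's predicate of
length VL" idea above can be made concrete with a small sketch. Here
predicates are modeled as bitmasks, one bit per lane; the function name
and the 32-lane limit are illustrative assumptions, not anything defined
by the proposal:

```c
#include <assert.h>
#include <stdint.h>

/* Combine an explicit lane predicate with the active vector length:
 * lanes at index >= vl behave as if masked off, i.e.
 * effective = pred & ((1 << vl) - 1). Supports up to 32 lanes here. */
static uint32_t effective_predicate(uint32_t pred, unsigned vl) {
    uint32_t vl_mask = (vl >= 32) ? 0xFFFFFFFFu : ((1u << vl) - 1u);
    return pred & vl_mask;
}
```

So a predicate of 1111 under an active VL of 2 acts like 0011, which is
why the two mechanisms compose without the predicate values themselves
having to change.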
Renato Golin via llvm-dev
2018-Jul-31 11:13 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
On Tue, 31 Jul 2018 at 03:53, David A. Greene <dag at cray.com> wrote:
> I wasn't talking about within an instruction but rather across
> instructions in the same expression tree. Something like this would be
> weird:

Yes, that's what I was referring to as "not in the API" and therefore
"user error".

> The points where VL would be changed are limited and I think would
> require limited, straightforward additions on top of this proposal.

Indeed. I have a limited view of the spec and even more so of hardware
implementations, but it is my understanding that there is no attempt to
change VL mid-loop.

If we can assume VL will be "the same" (not constant) throughout every
self-contained sub-graph (from scalar|memory->vector to
vector->scalar|memory), then we should encode in the IR spec that this
is a hard requirement.

This seems consistent with your explanation of the Cray VL change as
well as Bruce's description of RISC-V (both seem very similar to me),
where VL can change between two loop iterations but not within the same
iteration.

We will still have to be careful with access safety (aliasing, loop
dependencies, etc.), but that shouldn't be different than if VL were
required to be constant throughout the program.

> That's right. This proposal doesn't expose a way to change vscale, but
> I don't think it precludes a later addition to do so.

That was my point about this change being harder to do later than now.
I think no one wants to do it now, so we're all happy to pay the price
later, because that will likely never come.

> I don't see why predicate values would be affected at all. If a machine
> with variable vector length has predicates, then typically the resulting
> operation would operate on the bitwise AND of the predicate and a
> conceptual all-1's predicate of length VL.

I think the problem is that SVE is fully predicated and Cray (RISC-V?)
is not, so mixing the two could lead to weird predication situations.

So, if a high-level optimisation pass assumes full predication and
changes the loop accordingly, and another pass assumes no predication
and adds VL changes (say, loop tails), then we may end up with
incompatible IR that will be hard to select down in ISel.

Given that SVE has both predication and vscale change, this could
happen in practice. It wouldn't necessarily be wrong, but it would have
to be a conscious decision.

> Changing vscale would be no different than changing any other value in
> the program. The dataflow determines its possible values at various
> program points. vscale is an extra (implicit) operand to all vector
> operations with scalable type.

It is, but IIGIR, changing vscale and predicating are similar
transformations to achieve similar goals, but they will not be
represented the same way in IR. Also, they're not always
interchangeable, so that complicates the IR matching in ISel as well as
potential matching in optimisation passes.

> Why? If a user does asm or some other such trick to change what vscale
> means, that's on the user. If a machine has a VL that changes
> iteration-to-iteration, typically the compiler would be responsible for
> controlling it.

Not asm, sorry. Inline asm is "user error". I meant: make sure that
adding an IR-visible change in VL (say, an intrinsic or instruction)
within a self-contained block becomes an IR error.

> If the vendor provides some target intrinsics to let the user write
> low-level vector code that changes vscale in a high-level language, then
> the vendor would be responsible for adding the necessary bits to the
> frontend and LLVM. I would not recommend a vendor try to do this. :)

Not recommending by making it an explicit error. :)

It may sound harsh, but given that we're making some pretty liberal
design choices right now, which could have a long-lasting impact on the
stability and quality of LLVM's code generation, I'd say we need to be
as conservative as possible.

> I don't see why. Anyone adding the ability to change vscale would need
> to add intrinsics and specify their semantics. That shouldn't change
> anything about this proposal and any such additions shouldn't be
> hampered by this proposal.

I don't think it would be hard to do, but it could have consequences
for the rest of the optimisation and code generation pipeline. I do not
claim to have a clear vision on any of this, but as I said above, it
will pay off long term if we start conservative.

> I don't think we should worry about taking IR with dynamic changes to VL
> and trying to generate good code for any random target from it. Such IR
> is very clearly tied to a specific kind of target and we shouldn't
> bother pretending otherwise.

We're preaching for the same goals. :)

But we're trying to represent slightly different techniques
(predication, vscale change) which need to be tied down to exactly what
they do. Being conservative and explicit about the semantics is, IMHO,
the easiest path to get it right. We can surely expand later.

--
cheers,
--renato
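The two techniques being contrasted in this exchange -- handling a loop
tail with full predication versus trimming the active VL -- can be put
side by side in a small C sketch. Both compute the same result; the
point is that they are distinct transformations an optimisation pass
could pick, and a pass pipeline mixing them needs to do so deliberately.
All names and the width W are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define W 4  /* stand-in vector width */

/* Tail via predication: every pass is W wide, out-of-range lanes are
 * masked off by a per-lane predicate (the SVE-style approach). */
static void double_predicated(const int *in, int *out, size_t n) {
    for (size_t i = 0; i < n; i += W)
        for (size_t l = 0; l < W; ++l)
            if (i + l < n)              /* the predicate */
                out[i + l] = in[i + l] * 2;
}

/* Tail via VL trimming: the active length is shortened on the final
 * pass (the Cray/RISC-V-style approach). */
static void double_vl(const int *in, int *out, size_t n) {
    size_t i = 0;
    while (i < n) {
        size_t vl = (n - i < W) ? n - i : W;  /* trimmed active VL */
        for (size_t l = 0; l < vl; ++l)
            out[i + l] = in[i + l] * 2;
        i += vl;
    }
}
```

Both functions touch exactly the same elements, so for a target like SVE
that supports both mechanisms, the choice between them is a code
generation decision rather than a semantic one, which is exactly why the
IR needs to pin down which representation is legal where.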