Renato Golin via llvm-dev
2018-Jul-31 11:13 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
On Tue, 31 Jul 2018 at 03:53, David A. Greene <dag at cray.com> wrote:

> I wasn't talking about within an instruction but rather across
> instructions in the same expression tree. Something like this would be
> weird:

Yes, that's what I was referring to as "not in the API" and therefore "user error".

> The points where VL would be changed are limited and I think would
> require limited, straightforward additions on top of this proposal.

Indeed. I have a limited view on the spec and even more so on hardware implementations, but it is my understanding that there is no attempt to change VL mid-loop.

If we can assume VL will be "the same" (not constant) throughout every self-contained sub-graph (from scalar|memory->vector to vector->scalar|memory), then we should encode in the IR spec that this is a hard requirement.

This seems consistent with your explanation of the Cray VL change as well as Bruce's description of RISC-V (both seem very similar to me), where VL can change between two loop iterations but not within the same iteration.

We will still have to be careful with access safety (aliasing, loop dependencies, etc.), but that shouldn't be different than if VL were required to be constant throughout the program.

> That's right. This proposal doesn't expose a way to change vscale, but
> I don't think it precludes a later addition to do so.

That was my point about this change being harder to do later than now. I think no one wants to do that now, so we're all happy to pay the price later, because that will likely never come.

> I don't see why predicate values would be affected at all. If a machine
> with variable vector length has predicates, then typically the resulting
> operation would operate on the bitwise AND of the predicate and a
> conceptual all 1's predicate of length VL.

I think the problem is that SVE is fully predicated and Cray (RISC-V?) is not, so mixing the two could lead to weird predication situations.

So, if a high-level optimisation pass assumes full predication and changes the loop accordingly, and another pass assumes no predication and adds VL changes (say, loop tails), then we may end up with incompatible IR that will be hard to select down in ISel.

Given that SVE has both predication and vscale change, this could happen in practice. It wouldn't necessarily be wrong, but it would have to be a conscious decision.

> Changing vscale would be no different than changing any other value in
> the program. The dataflow determines its possible values at various
> program points. vscale is an extra (implicit) operand to all vector
> operations with scalable type.

It is, but IIGIR, changing vscale and predicating are similar transformations to achieve similar goals, but will not be represented the same way in IR. Also, they're not always interchangeable, so that complicates the IR matching in ISel as well as potential matching in optimisation passes.

> Why? If a user does asm or some other such trick to change what vscale
> means, that's on the user. If a machine has a VL that changes
> iteration-to-iteration, typically the compiler would be responsible for
> controlling it.

Not asm, sorry. Inline asm is "user error". I meant: make sure adding an IR-visible change in VL (say, an intrinsic or instruction) within a self-contained block becomes an IR error.

> If the vendor provides some target intrinsics to let the user write
> low-level vector code that changes vscale in a high-level language, then
> the vendor would be responsible for adding the necessary bits to the
> frontend and LLVM. I would not recommend a vendor try to do this. :)

Not recommending by making it an explicit error. :)

It may sound harsh, but given we're taking some pretty liberal design choices right now, which could have a long-lasting impact on the stability and quality of LLVM's code generation, I'd say we need to be as conservative as possible.

> I don't see why. Anyone adding the ability to change vscale would need to
> add intrinsics and specify their semantics. That shouldn't change
> anything about this proposal and any such additions shouldn't be
> hampered by this proposal.

I don't think it would be hard to do, but it could have consequences for the rest of the optimisation and code generation pipeline. I do not claim to have a clear vision on any of this, but as I said above, it will pay off long term if we start conservative.

> I don't think we should worry about taking IR with dynamic changes to VL
> and trying to generate good code for any random target from it. Such IR
> is very clearly tied to a specific kind of target and we shouldn't
> bother pretending otherwise.

We're preaching for the same goals. :)

But we're trying to represent slightly different techniques (predication, vscale change) which need to be tied down to only exactly what they do.

Being conservative and explicit on the semantics is, IMHO, the easiest path to get it right. We can surely expand later.

--
cheers,
--renato
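For readers less familiar with the loop shape being discussed, here is a minimal C sketch (my illustration, not tied to any real ISA; MAXVL and the inner scalar loop stand in for a hardware-reported maximum vector length and a single vector operation) of the pattern where VL is chosen once at the top of each iteration and stays fixed for the whole self-contained body, shrinking only on the final trip:

```c
#include <stddef.h>

#define MAXVL 16  /* hypothetical hardware maximum for this element type */

void vadd(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; /* advanced by vl below */) {
        size_t remaining = n - i;
        size_t vl = remaining < MAXVL ? remaining : MAXVL;  /* VL set once per iteration */
        for (size_t j = 0; j < vl; ++j)   /* models one vector operation at this VL */
            dst[i + j] = a[i + j] + b[i + j];
        i += vl;                          /* pointers advance by the VL actually used */
    }
}
```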
Bruce Hoult via llvm-dev
2018-Jul-31 12:48 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
On Tue, Jul 31, 2018 at 9:13 PM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Indeed. I have a limited view on the spec and even more so on hardware
> implementations, but it is my understanding that there is no attempt
> to change VL mid-loop.
>
> If we can assume VL will be "the same" (not constant) throughout every
> self-contained sub-graph (from scalar|memory->vector to
> vector->scalar|memory), then we should encode in the IR spec that
> this is a hard requirement.

I don't see any harm in (very occasionally) making the VL shorter somewhere within an iteration of a loop. Some work that was already done will be wasted, but that's not a correctness problem. Making the VL longer mid-iteration would of course be very bad.

The important thing is that the various source and destination pointers are updated by the correct amount at the end of the loop.

> This seems consistent with your explanation of the Cray VL change as
> well as Bruce's description of RISC-V (both seem very similar to me),
> where VL can change between two loop iterations but not within the
> same iteration.

I'm not sure whether it will end up being possible or not, but I did describe two situations where at least some RISC-V implementations might want to change VL within an iteration:

1) a memory protection problem on some trailing part of a vector load or store, causing that iteration to operate only on the accessible part, and the next iteration to start from the first address in the non-accessible part (and actually take a fault)

2) an interrupt/task switch in the middle of a loop iteration. Some implementations may want to save/restore only the vector configuration, not the values of the vector registers.

> > I don't see why predicate values would be affected at all. If a machine
> > with variable vector length has predicates, then typically the resulting
> > operation would operate on the bitwise AND of the predicate and a
> > conceptual all 1's predicate of length VL.
>
> I think the problem is that SVE is fully predicated and Cray (RISC-V?)
> is not, so mixing the two could lead to weird predication
> situations.

The current RISC-V proposal has a 2-bit field in each vector instruction, with the values indicating:

- it's actually scalar
- vector operation with no predication
- vector operation, masked by the predicate register
- vector operation, masked by the inverse of the predicate register
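As a rough per-lane model of how such a 2-bit field and the VL could interact (the enum names and helper below are invented for illustration and are not taken from the RISC-V proposal), this is essentially the "bitwise AND of the predicate and a length-VL all-ones predicate" David described:

```c
/* Hypothetical encoding of a per-instruction predication mode. */
enum vpred_mode {
    VPRED_SCALAR  = 0,  /* instruction is actually scalar              */
    VPRED_NONE    = 1,  /* vector operation, no predication            */
    VPRED_MASK    = 2,  /* vector operation, masked by predicate reg   */
    VPRED_INVMASK = 3   /* vector operation, masked by ~predicate reg  */
};

/* Whether lane `i` of a vector operation takes effect, given VL and mode. */
static int lane_active(enum vpred_mode m, const unsigned char *pred,
                       unsigned i, unsigned vl) {
    if (i >= vl) return 0;               /* VL always applies on top   */
    switch (m) {
    case VPRED_NONE:    return 1;
    case VPRED_MASK:    return pred[i] != 0;
    case VPRED_INVMASK: return pred[i] == 0;
    default:            return 0;        /* scalar: no vector lanes    */
    }
}
```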
Renato Golin via llvm-dev
2018-Jul-31 13:54 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
On Tue, 31 Jul 2018 at 13:48, Bruce Hoult <brucehoult at sifive.com> wrote:

> I don't see any harm in (very occasionally) making the VL shorter
> somewhere within an iteration of a loop. Some work that was already done
> will be wasted, but that's not a correctness problem. Making the VL longer
> mid-iteration would of course be very bad.
> The important thing is that the various source and destination pointers
> are updated by the correct amount at the end of the loop.

If this is orthogonal to the IR representation, i.e. it doesn't need current instructions to *know* about it, but the sequence of IR instructions will represent it, then it should be fine.

> I'm not sure whether it will end up being possible or not, but I did
> describe two situations where at least some RISC-V implementations might
> want to change VL within an iteration:

Apologies, I may have misinterpreted them.

> 1) a memory protection problem on some trailing part of a vector load or
> store, causing that iteration to operate only on the accessible part, and
> the next iteration to start from the first address in the non-accessible
> part (and actually take a fault)

SVE deals with those problems with predication and the FFR (first-fault register), not by changing the VL, but I imagine they're semantically similar.

> 2) an interrupt/task switch in the middle of a loop iteration. Some
> implementations may want to save/restore only the vector configuration,
> not the values of the vector registers.

I assume the architecture will have to continue the program in the same state it was in when the interrupt occurred. How it does that shouldn't concern code generation.

--
cheers,
--renato
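To make that semantic similarity concrete: both the FFR-style and the VL-truncation-style mechanisms amount to "complete only the leading lanes known to be accessible, then let the next iteration start at the faulting element". A hypothetical C sketch of that shape; `lanes_until_fault()` is invented here to stand in for whatever the hardware reports (via FFR, or via a shortened VL), and is only declared, not defined:

```c
#include <stddef.h>

/* Hypothetical query: how many leading elements of [p, p + vl) can be
 * accessed without faulting. In real hardware, if even the first element
 * is inaccessible the access itself traps, so the result is assumed > 0. */
size_t lanes_until_fault(const float *p, size_t vl);

void copy_ff(float *dst, const float *src, size_t n, size_t maxvl) {
    for (size_t i = 0; i < n; ) {
        size_t vl = n - i < maxvl ? n - i : maxvl;
        size_t ok = lanes_until_fault(src + i, vl);  /* may be < vl      */
        for (size_t j = 0; j < ok; ++j)              /* models one masked/
                                                        shortened vector op */
            dst[i + j] = src[i + j];
        i += ok;  /* next iteration begins at the first inaccessible
                     element and takes the real fault there, if any   */
    }
}
```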
David A. Greene via llvm-dev
2018-Jul-31 15:36 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
Renato Golin <renato.golin at linaro.org> writes:

>> The points where VL would be changed are limited and I think would
>> require limited, straightforward additions on top of this proposal.
>
> Indeed. I have a limited view on the spec and even more so on hardware
> implementations, but it is my understanding that there is no attempt
> to change VL mid-loop.

What does "mid-loop" mean? On traditional vector architectures it was very common to change VL for the last loop iteration. Otherwise you had to have a remainder loop. It was much better to change VL.

> If we can assume VL will be "the same" (not constant) throughout every
> self-contained sub-graph (from scalar|memory->vector to
> vector->scalar|memory), then we should encode it in the IR spec that
> this is a hard requirement.
>
> This seems consistent with your explanation of the Cray VL change as
> well as Bruce's description of RISC-V (both seem very similar to me),
> where VL can change between two loop iterations but not within the
> same iteration.

Ok, I think I am starting to grasp what you are saying. If a value flows from memory or some scalar computation to vector and then back to memory or scalar, VL should only ever be set at the start of the vector computation until it finishes and the value is deposited in memory or otherwise extracted.

I think this is ok, but note that any vector functions called may change VL for the duration of the call. The change would not be visible to the caller.

Just thinking this through, a case where one might want to change VL mid-stream is something like a half-length set of operations that feeds a vector concat and then a full-length set of operations following. But again I think this would be a strange way to do things. If someone really wants to do this they can predicate away the upper bits of the half-length operations and maintain the same VL throughout the computation. If predication isn't available then they've got more serious problems vectorizing code. :)

> We will still have to be careful with access safety (aliasing, loop
> dependencies, etc.), but that shouldn't be different than if VL was
> required to be constant throughout the program.

Yep.

>> That's right. This proposal doesn't expose a way to change vscale, but
>> I don't think it precludes a later addition to do so.
>
> That was my point about this change being harder to do later than now.

I guess I don't see why it would be any harder later.

> I think no one wants to do that now, so we're all happy to pay the
> price later, because that will likely never come.

I am not so sure about that. Power requirements may very well drive more dynamic vector lengths. Even today some AVX-512 implementations falter if there are "too many" 512-bit operations. Scaling back SIMD width statically is very common today and doing so dynamically seems like an obvious extension. I don't know of any efforts to do this so it's all speculative at this point. But the industry has done it in the past and we have a curious pattern of reinventing things we did before.

>> I don't see why predicate values would be affected at all. If a machine
>> with variable vector length has predicates, then typically the resulting
>> operation would operate on the bitwise AND of the predicate and a
>> conceptual all 1's predicate of length VL.
>
> I think the problem is that SVE is fully predicated and Cray (RISC-V?)
> is not, so mixing the two could lead to weird predication
> situations.

Cray vector ISAs were fully predicated and also used a vector length. It didn't cause us any serious issues. In many ways having an adjustable VL and predication makes things easier, because you don't have to regenerate predicates to switch to a shorter VL.

> So, if a high-level optimisation pass assumes full predication and
> changes the loop accordingly, and another pass assumes no predication
> and adds VL changes (say, loop tails), then we may end up with
> incompatible IR that will be hard to select down in ISel.
>
> Given that SVE has both predication and vscale change, this could
> happen in practice. It wouldn't necessarily be wrong, but it would
> have to be a conscious decision.

It seems strange to me for an optimizer to operate in such a way. The optimizer should be fully aware of the target's capabilities and use them accordingly.

But let's say this happens. Pass 1 vectorizes the loop with predication (for a conditional loop body) and creates a remainder loop, which would also need to be predicated. Note that such a remainder loop is not necessary with full predication support, but for the sake of argument let's say pass 1 is not too smart.

Pass 2 comes along and says, "hey, I have the ability to change VL so we don't need a remainder loop." It rewrites the main loop to use dynamic VL and removes the remainder loop. During that rewrite, pass 2 would have to maintain predication. It can use the very same predicate values pass 1 generated. There is no need to adjust them because the VL is applied "on top of" the predicates. Pass 2 effectively rewrites the code to what the vectorizer should have emitted in the first place.

I'm not seeing how ISel is any more difficult. SVE has an implicit vscale operand on every instruction and ARM seems to have no difficulty selecting instructions for it. Changing the value of vscale shouldn't impact ISel at all. The same instructions are selected.

>> Changing vscale would be no different than changing any other value in
>> the program. The dataflow determines its possible values at various
>> program points. vscale is an extra (implicit) operand to all vector
>> operations with scalable type.
>
> It is, but IIGIR, changing vscale and predicating are similar
> transformations to achieve similar goals, but will not be
> represented the same way in IR.

They probably will not be represented the same way, though I think they could be (but probably shouldn't be).

> Also, they're not always interchangeable, so that complicates the IR
> matching in ISel as well as potential matching in optimisation passes.

I'm not sure it does, but I haven't worked something all the way through.

>> Why? If a user does asm or some other such trick to change what vscale
>> means, that's on the user. If a machine has a VL that changes
>> iteration-to-iteration, typically the compiler would be responsible for
>> controlling it.
>
> Not asm, sorry. Inline asm is "user error".

Ok.

> I meant: make sure adding an IR-visible change in VL (say, an
> intrinsic or instruction), within a self-contained block, becomes an
> IR error.

What do you mean by "self-contained block?" Assuming I understood it correctly, the restriction you described at the top seems reasonable for now.

>> If the vendor provides some target intrinsics to let the user write
>> low-level vector code that changes vscale in a high-level language, then
>> the vendor would be responsible for adding the necessary bits to the
>> frontend and LLVM. I would not recommend a vendor try to do this. :)
>
> Not recommending by making it an explicit error. :)
>
> It may sound harsh, but given we're taking some pretty liberal design
> choices right now, which could have a long-lasting impact on the
> stability and quality of LLVM's code generation, I'd say we need to be
> as conservative as possible.

Ok, but would the optimizer be prevented from introducing VL changes?

>> I don't see why. Anyone adding the ability to change vscale would need to
>> add intrinsics and specify their semantics. That shouldn't change
>> anything about this proposal and any such additions shouldn't be
>> hampered by this proposal.
>
> I don't think it would be hard to do, but it could have consequences
> for the rest of the optimisation and code generation pipeline.

It could. I don't think any of us has a clear idea of what those might be.

> I do not claim to have a clear vision on any of this, but as I said
> above, it will pay off long term if we start conservative.

Being conservative is fine, but we should have a clear understanding of exactly what that means. I would not want to prohibit all VL changes now and forever, because I see that as unnecessarily restrictive and possibly damaging to supporting future architectures.

If we don't want to provide intrinsics for changing VL right now, I'm all in favor. There would be no reason to add error checks because there would be no way within the IR to change VL. But I don't want to preclude adding such intrinsics in the future.

>> I don't think we should worry about taking IR with dynamic changes to VL
>> and trying to generate good code for any random target from it. Such IR
>> is very clearly tied to a specific kind of target and we shouldn't
>> bother pretending otherwise.
>
> We're preaching for the same goals. :)

Good! :)

> But we're trying to represent slightly different techniques
> (predication, vscale change) which need to be tied down to only
> exactly what they do.

Wouldn't intrinsics to change vscale do exactly that?

> Being conservative and explicit on the semantics is, IMHO, the easiest
> path to get it right. We can surely expand later.

I'm all for being explicit. I think we're basically on the same page, though there are a few things noted above where I need a little more clarity.

-David
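A scalar C model of the pass 1 to pass 2 rewrite described above, under my own simplifying assumption that the effective lane condition is just "predicate lane is true AND lane index < VL": the predicate computed by pass 1 is reused untouched, and pass 2's only contribution is the per-iteration VL that makes the remainder loop unnecessary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Pass-1 shape: full-width predicated loop plus a predicated remainder loop.
 * Pass-2 shape (shown here): same predicate, VL shortened on the final trip,
 * remainder loop gone. Both compute the same result. */
void saxpy_if(float *y, const float *x, const bool *cond, float a,
              size_t n, size_t maxvl) {
    for (size_t i = 0; i < n; ) {
        size_t vl = n - i < maxvl ? n - i : maxvl;  /* pass 2's only change   */
        for (size_t j = 0; j < vl; ++j)             /* one predicated vector op */
            if (cond[i + j])                        /* pass 1's predicate, as-is */
                y[i + j] += a * x[i + j];
        i += vl;
    }
}
```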
Renato Golin via llvm-dev
2018-Jul-31 18:21 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
Hi David,

Let me put the last two comments up:

> > But we're trying to represent slightly different techniques
> > (predication, vscale change) which need to be tied down to only
> > exactly what they do.
>
> Wouldn't intrinsics to change vscale do exactly that?

You're right. I've been using the same overloaded term and this is probably what caused the confusion.

In some cases, predicating and shortening the vectors are semantically equivalent. In those cases, the IR should also be equivalent. Instructions/intrinsics that handle predication could be used by the back end to simply change VL instead, as long as it's guaranteed that the semantics are identical. There are no problems here.

In other cases, for example widening or splitting the vector, or cases we haven't thought of yet, the semantics are not the same, and having them in IR would be bad. I think we're all in agreement on that.

All I'm asking is that we make a list of what we want to happen and disallow everything else explicitly, until someone comes up with a strong case for it. Makes sense?

> I'm all for being explicit. I think we're basically on the same page,
> though there are a few things noted above where I need a little more
> clarity.

Yup, I think we are. :)

> What does "mid-loop" mean? On traditional vector architectures it was
> very common to change VL for the last loop iteration. Otherwise you had
> to have a remainder loop. It was much better to change VL.

You got it below...

> Ok, I think I am starting to grasp what you are saying. If a value
> flows from memory or some scalar computation to vector and then back to
> memory or scalar, VL should only ever be set at the start of the vector
> computation until it finishes and the value is deposited in memory or
> otherwise extracted. I think this is ok, but note that any vector
> functions called may change VL for the duration of the call. The change
> would not be visible to the caller.

If a function is called and changes the length, does it restore it on return?

> I am not so sure about that. Power requirements may very well drive
> more dynamic vector lengths. Even today some AVX-512 implementations
> falter if there are "too many" 512-bit operations. Scaling back SIMD
> width statically is very common today and doing so dynamically seems
> like an obvious extension. I don't know of any efforts to do this so
> it's all speculative at this point. But the industry has done it in the
> past and we have a curious pattern of reinventing things we did before.

Right, so it's not as clear-cut as I hoped. But we can start implementing the basic idea and then expand as we go. I think trying to hash out all potential scenarios now will drive us crazy.

> It seems strange to me for an optimizer to operate in such a way. The
> optimizer should be fully aware of the target's capabilities and use
> them accordingly.

Mid-end optimisers tend to be fairly agnostic. And when not, they usually ask "is this supported" instead of "which one is better".

> ARM seems to have no difficulty selecting instructions for it. Changing
> the value of vscale shouldn't impact ISel at all. The same instructions
> are selected.

I may very well be getting lost in too many floating future ideas at the moment. :)

> > It is, but IIGIR, changing vscale and predicating are similar
> > transformations to achieve similar goals, but will not be
> > represented the same way in IR.
>
> They probably will not be represented the same way, though I think they
> could be (but probably shouldn't be).

Maybe in the simple cases (like the last iteration) they should be?

> Ok, but would the optimizer be prevented from introducing VL changes?

In the case where they're represented in similar ways in IR, it wouldn't need to. Otherwise, we'd have to teach IR optimisers about two methods that are virtually identical in semantics.

It'd be left to the back end to implement the last-iteration notation as either a predicate fill or a vscale change.

> Being conservative is fine, but we should have a clear understanding of
> exactly what that means. I would not want to prohibit all VL changes
> now and forever, because I see that as unnecessarily restrictive and
> possibly damaging to supporting future architectures.
>
> If we don't want to provide intrinsics for changing VL right now, I'm
> all in favor. There would be no reason to add error checks because
> there would be no way within the IR to change VL.

Right, I think we're converging.

How about we don't forbid changes in vscale, but we find a common notation for all the cases where predicating and changing vscale would be semantically identical, and implement those in the same way?

Later on, if there are additional cases where changes in vscale would be beneficial, we can discuss them independently.

Makes sense?

--
cheers,
--renato
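One possible shape for that common notation, sketched in C purely as my own illustration (nothing here comes from the RFC): the loop tail is described once, target-neutrally, as "the first `live` lanes are active", and each back end picks its lowering.

```c
#include <stddef.h>

/* Target-neutral description of the tail: how many leading lanes are live. */
static size_t live_lanes(size_t remaining, size_t width) {
    return remaining < width ? remaining : width;
}

/* Lowering for a fully predicated target (SVE-style): build a mask whose
 * first `live` lanes are set and keep the vector length fixed. */
static void lower_as_predicate(unsigned char *mask, size_t width, size_t live) {
    for (size_t j = 0; j < width; ++j)
        mask[j] = (j < live);
}

/* Lowering for a VL-changing target (Cray/RISC-V-style): no mask needed,
 * the iteration simply runs with VL equal to the live lane count. */
static size_t lower_as_vl(size_t live) {
    return live;
}
```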