Renato Golin via llvm-dev
2018-Jul-31 18:21 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
Hi David,

Let me put the last two comments up:

> > But we're trying to represent slightly different techniques
> > (predication, vscale change) which need to be tied down to only
> > exactly what they do.
>
> Wouldn't intrinsics to change vscale do exactly that?

You're right. I've been using the same overloaded term and this is
probably what caused the confusion.

In some cases, predicating and shortening the vectors are semantically
equivalent. In this case, the IR should also be equivalent.
Instructions/intrinsics that handle predication could be used by the
backend to simply change VL instead, as long as it's guaranteed that
the semantics are identical. There are no problems here.

In other cases, for example widening or splitting the vector, or cases
we haven't thought of yet, the semantics are not the same, and having
them in IR would be bad. I think we're all in agreement on that.

All I'm asking is that we make a list of what we want to happen and
disallow everything else explicitly, until someone comes up with a
strong case for it. Makes sense?

> I'm all for being explicit. I think we're basically on the same page,
> though there are a few things noted above where I need a little more
> clarity.

Yup, I think we are. :)

> What does "mid-loop" mean? On traditional vector architectures it was
> very common to change VL for the last loop iteration. Otherwise you had
> to have a remainder loop. It was much better to change VL.

You got it below...

> Ok, I think I am starting to grasp what you are saying. If a value
> flows from memory or some scalar computation to vector and then back to
> memory or scalar, VL should only ever be set at the start of the vector
> computation until it finishes and the value is deposited in memory or
> otherwise extracted. I think this is ok, but note that any vector
> functions called may change VL for the duration of the call. The change
> would not be visible to the caller.

If a function is called and changes the length, does it restore it on
return?

> I am not so sure about that. Power requirements may very well drive
> more dynamic vector lengths. Even today some AVX 512 implementations
> falter if there are "too many" 512-bit operations. Scaling back SIMD
> width statically is very common today and doing so dynamically seems
> like an obvious extension. I don't know of any efforts to do this so
> it's all speculative at this point. But the industry has done it in the
> past and we have a curious pattern of reinventing things we did before.

Right, so it's not as clear cut as I hoped. But we can start
implementing the basic idea and then expand as we go. I think trying
to hash out all potential scenarios now will drive us crazy.

> It seems strange to me for an optimizer to operate in such a way. The
> optimizer should be fully aware of the target's capabilities and use
> them accordingly.

Mid-end optimisers tend to be fairly agnostic. And when not, they
usually ask "is this supported" instead of "which one is better".

> ARM seems to have no difficulty selecting instructions for it. Changing
> the value of vscale shouldn't impact ISel at all. The same instructions
> are selected.

I may very well be getting lost in too many floating future ideas, atm. :)

> > It is, but IIGIR, changing vscale and predicating are similar
> > transformations to achieve similar goals, but will not be
> > represented the same way in IR.
>
> They probably will not be represented the same way, though I think they
> could be (but probably shouldn't be).

Maybe in the simple cases (like last iteration) they should be?

> Ok, but would the optimizer be prevented from introducing VL changes?

In the case where they're represented in similar ways in IR, it
wouldn't need to. Otherwise, we'd have to teach IR optimisers about
two methods that are virtually identical in semantics. It'd be left
for the back end to implement the last iteration notation as a
predicate fill or a vscale change.

> Being conservative is fine, but we should have a clear understanding of
> exactly what that means. I would not want to prohibit all VL changes
> now and forever, because I see that as unnecessarily restrictive and
> possibly damaging to supporting future architectures.
>
> If we don't want to provide intrinsics for changing VL right now, I'm
> all in favor. There would be no reason to add error checks because
> there would be no way within the IR to change VL.

Right, I think we're converging.

How about we don't forbid changes in vscale, but we find a common
notation for all the cases where predicating and changing vscale would
be semantically identical, and implement those in the same way.

Later on, if there are additional cases where changes in vscale would
be beneficial, we can discuss them independently.

Makes sense?

--
cheers,
--renato
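[Editor's note: the predication/VL equivalence discussed in this message can be sketched in C. This is a minimal illustration, not code from the thread; `VLMAX`, `vadd`, and the scalar inner loop standing in for one vector operation at the active length are all illustrative.]

```c
#include <stddef.h>

/* Strip-mined loop whose tail is handled by shortening the active
   vector length. A predicated version that masks off the same trailing
   lanes would compute an identical result, which is the semantic
   equivalence being discussed. */
void vadd(float *a, const float *b, size_t n) {
    const size_t VLMAX = 8;  /* stand-in for the hardware vector length */
    for (size_t i = 0; i < n; ) {
        /* Shorten VL on the last iteration instead of predicating. */
        size_t vl = (n - i < VLMAX) ? (n - i) : VLMAX;
        for (size_t j = 0; j < vl; ++j)  /* models one vector op at length vl */
            a[i + j] += b[i + j];
        i += vl;
    }
}
```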
David A. Greene via llvm-dev
2018-Jul-31 19:10 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> writes:

> Hi David,
>
> Let me put the last two comments up:
>
>> > But we're trying to represent slightly different techniques
>> > (predication, vscale change) which need to be tied down to only
>> > exactly what they do.
>>
>> Wouldn't intrinsics to change vscale do exactly that?
>
> You're right. I've been using the same overloaded term and this is
> probably what caused the confusion.

Me too. Thanks Robin for clarifying this for all of us! I'll try to
follow this terminology:

VL/active vector length - The software notion of how many elements to
operate on; a special case of predication

vscale - The hardware notion of how big a vector register is

TL;DR - Changing VL in a function doesn't affect anything about this
proposal, but changing vscale might. Changing VL shouldn't impact
things like ISel at all but changing vscale might. Changing vscale is
(much) more difficult than changing VL.

> In some cases, predicating and shortening the vectors are semantically
> equivalent. In this case, the IR should also be equivalent.
> Instructions/intrinsics that handle predication could be used by the
> backend to simply change VL instead, as long as it's guaranteed that
> the semantics are identical. There are no problems here.

Right. Changing VL is no problem. I think even reducing vscale is ok
from an IR perspective, if a little strange.

> In other cases, for example widening or splitting the vector, or cases
> we haven't thought of yet, the semantics are not the same, and having
> them in IR would be bad. I think we're all in agreement on that.

You mean going from a shorter active vector length to a longer active
vector length? Or smaller vscale to larger vscale? The latter would be
bad. The former seems ok if the dataflow is captured and the
vectorizer generates correct code to account for it. Presumably it
would if it is the thing changing the active vector length.

> All I'm asking is that we make a list of what we want to happen and
> disallow everything else explicitly, until someone comes up with a
> strong case for it. Makes sense?

Yes.

>> Ok, I think I am starting to grasp what you are saying. If a value
>> flows from memory or some scalar computation to vector and then back to
>> memory or scalar, VL should only ever be set at the start of the vector
>> computation until it finishes and the value is deposited in memory or
>> otherwise extracted. I think this is ok, but note that any vector
>> functions called may change VL for the duration of the call. The change
>> would not be visible to the caller.
>
> If a function is called and changes the length, does it restore it on
> return?

If a function changes VL, it would typically restore it before return.
This would be an ABI guarantee just like any other callee-save
register.

If a function changes vscale, I don't know. The RISC-V people seem to
have thought the most about this. I have no point of reference here.

> Right, so it's not as clear cut as I hoped. But we can start
> implementing the basic idea and then expand as we go. I think trying
> to hash out all potential scenarios now will drive us crazy.

Sure.

>> It seems strange to me for an optimizer to operate in such a way. The
>> optimizer should be fully aware of the target's capabilities and use
>> them accordingly.
>
> Mid-end optimisers tend to be fairly agnostic. And when not, they
> usually ask "is this supported" instead of "which one is better".

Yes, the "is this supported" question is common. Isn't the whole point
of VPlan to get the "which one is better" question answered for
vectorization? That would be necessarily tied to the target. The
questions asked can be agnostic, like the target-agnostic bits of
codegen use, but the answers would be target-specific.

>> ARM seems to have no difficulty selecting instructions for it. Changing
>> the value of vscale shouldn't impact ISel at all. The same instructions
>> are selected.
>
> I may very well be getting lost in too many floating future ideas, atm. :)

Given our clearer terminology, my statement above is maybe not
correct. Changing vscale *would* impact the IR and codegen (stack
allocation, etc.). Changing VL would not, other than adding some
Instructions to capture the semantics. I suspect neither would change
ISel (I know VL would not) but as you say I don't think we need
concern ourselves with changing vscale right now, unless others have a
dire need to support it.

>> > It is, but IIGIR, changing vscale and predicating are similar
>> > transformations to achieve similar goals, but will not be
>> > represented the same way in IR.
>>
>> They probably will not be represented the same way, though I think they
>> could be (but probably shouldn't be).
>
> Maybe in the simple cases (like last iteration) they should be?

Perhaps changing VL could be modeled the same way but I have a feeling
it will be awkward. Changing vscale is something totally different and
likely should be represented differently if allowed at all.

>> Ok, but would the optimizer be prevented from introducing VL changes?
>
> In the case where they're represented in similar ways in IR, it
> wouldn't need to.

It would have to generate IR code to effect the software change in VL
somehow, by altering predicates or by using special intrinsics or some
other way.

> Otherwise, we'd have to teach IR optimisers about two methods that
> are virtually identical in semantics. It'd be left for the back end to
> implement the last iteration notation as a predicate fill or a vscale
> change.

I suspect that is too late. The vectorizer needs to account for the
choice and pick the most profitable course. That's one of the reasons
I think modeling VL changes like predicates is maybe unnecessarily
complex. If VL is modeled as "just another predicate" then there's no
guarantee that ISel will honor the choices the vectorizer made to use
VL over predication. If it's modeled explicitly, ISel should have an
easier time generating the code the vectorizer expects.

VL changes aren't always on the last iteration. The Cray X1 had an
instruction (I would have to dust off old manuals to remember the
mnemonic) with somewhat strange semantics to get the desired VL for an
iteration. Code would look something like this:

  loop top:
    vl = getvl N      # N contains the number of iterations left
    <do computation>
    N = N - vl
    branch N > 0, loop top

The "getvl" instruction would usually return the full hardware vector
register length (MAXVL), except on the 2nd-to-last iteration: if N was
larger than MAXVL but less than 2*MAXVL it would return something like
<N % 2 == 0 ? N/2 : N/2 + 1>, so in the range (0, MAXVL). The last
iteration would then run at the same VL or one less depending on
whether N was odd or even. So the last two iterations would often run
at less than MAXVL and often at different VLs from each other.

And no, I don't know why the hardware operated this way. :)

>> Being conservative is fine, but we should have a clear understanding of
>> exactly what that means. I would not want to prohibit all VL changes
>> now and forever, because I see that as unnecessarily restrictive and
>> possibly damaging to supporting future architectures.
>>
>> If we don't want to provide intrinsics for changing VL right now, I'm
>> all in favor. There would be no reason to add error checks because
>> there would be no way within the IR to change VL.
>
> Right, I think we're converging.

Agreed.

> How about we don't forbid changes in vscale, but we find a common
> notation for all the cases where predicating and changing vscale would
> be semantically identical, and implement those in the same way.
>
> Later on, if there are additional cases where changes in vscale would
> be beneficial, we can discuss them independently.
>
> Makes sense?

Again trying to use the VL/vscale terminology:

Changing vscale - no IR support currently and less likely in the future
Changing VL - no IR support currently but more likely in the future

The second seems like a straightforward extension to me. There will be
some questions about how to represent VL semantics in IR but those
don't impact the proposal under discussion at all.

The first seems much harder, at least within a function. It may or may
not impact the proposal under discussion. It sounds like the RISC-V
people have some use cases so those should probably be the focal point
of this discussion.

-David
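[Editor's note: the X1-style "getvl" behaviour described in this message can be written out in C. This is a reconstruction from the description in the mail, not from the actual ISA manual; the mnemonic and the `MAXVL` value of 64 are assumptions.]

```c
#include <stddef.h>

#define MAXVL 64  /* assumed hardware vector length; the real X1 value may differ */

/* Sketch of the described semantics: return full MAXVL while plenty of
   work remains, split the remainder roughly in half on the 2nd-to-last
   iteration, and take whatever is left on the last one. */
size_t getvl(size_t n) {
    if (n <= MAXVL)
        return n;                /* last iteration: take what's left */
    if (n < 2 * MAXVL)
        return (n + 1) / 2;      /* n/2 if n is even, n/2 + 1 if odd */
    return MAXVL;                /* plenty left: full vector length */
}
```

With MAXVL = 64 and N = 101, the loop in the mail would run the last two iterations at VL = 51 and VL = 50: "the same VL or one less", as described.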
Renato Golin via llvm-dev
2018-Jul-31 19:36 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
On Tue, 31 Jul 2018 at 20:10, David A. Greene <dag at cray.com> wrote:

> Me too. Thanks Robin for clarifying this for all of us! I'll try to
> follow this terminology:

+1

> TL;DR - Changing VL in a function doesn't affect anything about this
> proposal, but changing vscale might. Changing VL shouldn't
> impact things like ISel at all but changing vscale might.
> Changing vscale is (much) more difficult than changing VL.

Absolutely agreed. :)

> Right. Changing VL is no problem. I think even reducing vscale is ok
> from an IR perspective, if a little strange.

Yup.

> You mean going from a shorter active vector length to a longer active
> vector length? Or smaller vscale to larger vscale? The latter would
> be bad.

The latter. Bad indeed.

> If a function changes vscale, I don't know. The RISC-V people seem to
> have thought the most about this. I have no point of reference here.

I think the consensus is that this would be bad. So we should maybe
encode it as an error.

> Yes, the "is this supported" question is common. Isn't the whole point
> of VPlan to get the "which one is better" question answered for
> vectorization?

Yes, but the cost is high. We can have that in the vectoriser, as it's
a heavy pass and we're conscious of the cost, but we shouldn't make
all other passes "that smart".

> Changing vscale *would* impact the IR and codegen (stack allocation,
> etc.). Changing VL would not, other than adding some Instructions to
> capture the semantics. I suspect neither would change ISel (I know VL
> would not) but as you say I don't think we need concern ourselves with
> changing vscale right now, unless others have a dire need to support it.

Perfect! :)

> Perhaps changing VL could be modeled the same way but I have a feeling
> it will be awkward. Changing vscale is something totally different and
> likely should be represented differently if allowed at all.

Right, I was talking about vscale. It would be awkward, but if this is
the only thing the hardware supports (ie. no predication), then it's
up to the back-end to lower it how it sees fit. In IR, we still see it
as predication.

> Again trying to use the VL/vscale terminology:
>
> Changing vscale - no IR support currently and less likely in the future
> Changing VL - no IR support currently but more likely in the future

SGTM.

> The second seems like a straightforward extension to me. There will be
> some questions about how to represent VL semantics in IR but those don't
> impact the proposal under discussion at all.

Should be equivalent to predication, I imagine.

> The first seems much harder, at least within a function.

And it would require exposing the instruction to change it in IR.

> It may or may not impact the proposal under discussion.

As per Robin's email, it doesn't. Functions are vscale boundaries in
their current proposal.

--
cheers,
--renato
Robin Kruppe via llvm-dev
2018-Jul-31 20:17 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
On 31 July 2018 at 21:10, David A. Greene via llvm-dev
<llvm-dev at lists.llvm.org> wrote:

> Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
>> Hi David,
>>
>> Let me put the last two comments up:
>>
>>> > But we're trying to represent slightly different techniques
>>> > (predication, vscale change) which need to be tied down to only
>>> > exactly what they do.
>>>
>>> Wouldn't intrinsics to change vscale do exactly that?
>>
>> You're right. I've been using the same overloaded term and this is
>> probably what caused the confusion.
>
> Me too. Thanks Robin for clarifying this for all of us! I'll try to
> follow this terminology:
>
> VL/active vector length - The software notion of how many elements to
> operate on; a special case of predication
>
> vscale - The hardware notion of how big a vector register is
>
> TL;DR - Changing VL in a function doesn't affect anything about this
> proposal, but changing vscale might. Changing VL shouldn't
> impact things like ISel at all but changing vscale might.
> Changing vscale is (much) more difficult than changing VL.

Great, seems like we're all in violent agreement that VL changes are a
non-issue for the discussion at hand.

>> In some cases, predicating and shortening the vectors are semantically
>> equivalent. In this case, the IR should also be equivalent.
>> Instructions/intrinsics that handle predication could be used by the
>> backend to simply change VL instead, as long as it's guaranteed that
>> the semantics are identical. There are no problems here.
>
> Right. Changing VL is no problem. I think even reducing vscale is ok
> from an IR perspective, if a little strange.
>
>> In other cases, for example widening or splitting the vector, or cases
>> we haven't thought of yet, the semantics are not the same, and having
>> them in IR would be bad. I think we're all in agreement on that.
>
> You mean going from a shorter active vector length to a longer active
> vector length? Or smaller vscale to larger vscale? The latter would be
> bad. The former seems ok if the dataflow is captured and the vectorizer
> generates correct code to account for it. Presumably it would if it is
> the thing changing the active vector length.
>
>> All I'm asking is that we make a list of what we want to happen and
>> disallow everything else explicitly, until someone comes up with a
>> strong case for it. Makes sense?
>
> Yes.
>
>>> Ok, I think I am starting to grasp what you are saying. If a value
>>> flows from memory or some scalar computation to vector and then back to
>>> memory or scalar, VL should only ever be set at the start of the vector
>>> computation until it finishes and the value is deposited in memory or
>>> otherwise extracted. I think this is ok, but note that any vector
>>> functions called may change VL for the duration of the call. The change
>>> would not be visible to the caller.
>>
>> If a function is called and changes the length, does it restore it on
>> return?
>
> If a function changes VL, it would typically restore it before return.
> This would be an ABI guarantee just like any other callee-save register.
>
> If a function changes vscale, I don't know. The RISC-V people seem to
> have thought the most about this. I have no point of reference here.
>
>> Right, so it's not as clear cut as I hoped. But we can start
>> implementing the basic idea and then expand as we go. I think trying
>> to hash out all potential scenarios now will drive us crazy.
>
> Sure.
>
>>> It seems strange to me for an optimizer to operate in such a way. The
>>> optimizer should be fully aware of the target's capabilities and use
>>> them accordingly.
>>
>> Mid-end optimisers tend to be fairly agnostic. And when not, they
>> usually ask "is this supported" instead of "which one is better".
>
> Yes, the "is this supported" question is common. Isn't the whole point
> of VPlan to get the "which one is better" question answered for
> vectorization? That would be necessarily tied to the target. The
> questions asked can be agnostic, like the target-agnostic bits of
> codegen use, but the answers would be target-specific.

Just like the old loop vectorizer, VPlan will need a cost model that
is based on properties of the target, exposed to the optimizer in the
form of e.g. TargetLowering hooks. But we should try really hard to
avoid having a hard distinction between e.g. predication- and VL-based
loops in the VPlan representation. Duplicating or triplicating
vectorization logic would be really bad, and there are a lot of
similarities that we can exploit to avoid that. For a simple example,
SVE and RVV both want the same basic loop skeleton: strip-mining with
predication of the loop body derived from the induction variable.
Hopefully we can have a 99% unified VPlan pipeline and most
differences can be delegated to the final VPlan->IR step and the
respective backends.

+ Diego, Florian and others that have been discussing this previously

>>> ARM seems to have no difficulty selecting instructions for it. Changing
>>> the value of vscale shouldn't impact ISel at all. The same instructions
>>> are selected.
>>
>> I may very well be getting lost in too many floating future ideas, atm. :)
>
> Given our clearer terminology, my statement above is maybe not correct.
> Changing vscale *would* impact the IR and codegen (stack allocation,
> etc.). Changing VL would not, other than adding some Instructions to
> capture the semantics. I suspect neither would change ISel (I know VL
> would not) but as you say I don't think we need concern ourselves with
> changing vscale right now, unless others have a dire need to support it.
>
>>> > It is, but IIGIR, changing vscale and predicating are similar
>>> > transformations to achieve similar goals, but will not be
>>> > represented the same way in IR.
>>>
>>> They probably will not be represented the same way, though I think they
>>> could be (but probably shouldn't be).
>>
>> Maybe in the simple cases (like last iteration) they should be?
>
> Perhaps changing VL could be modeled the same way but I have a feeling
> it will be awkward. Changing vscale is something totally different and
> likely should be represented differently if allowed at all.
>
>>> Ok, but would the optimizer be prevented from introducing VL changes?
>>
>> In the case where they're represented in similar ways in IR, it
>> wouldn't need to.
>
> It would have to generate IR code to effect the software change in VL
> somehow, by altering predicates or by using special intrinsics or some
> other way.
>
>> Otherwise, we'd have to teach IR optimisers about two methods that
>> are virtually identical in semantics. It'd be left for the back end to
>> implement the last iteration notation as a predicate fill or a vscale
>> change.
>
> I suspect that is too late. The vectorizer needs to account for the
> choice and pick the most profitable course. That's one of the reasons I
> think modeling VL changes like predicates is maybe unnecessarily
> complex. If VL is modeled as "just another predicate" then there's no
> guarantee that ISel will honor the choices the vectorizer made to use VL
> over predication. If it's modeled explicitly, ISel should have an
> easier time generating the code the vectorizer expects.
>
> VL changes aren't always on the last iteration. The Cray X1 had an
> instruction (I would have to dust off old manuals to remember the
> mnemonic) with somewhat strange semantics to get the desired VL for an
> iteration. Code would look something like this:
>
>   loop top:
>     vl = getvl N      # N contains the number of iterations left
>     <do computation>
>     N = N - vl
>     branch N > 0, loop top
>
> The "getvl" instruction would usually return the full hardware vector
> register length (MAXVL), except on the 2nd-to-last iteration: if N was
> larger than MAXVL but less than 2*MAXVL it would return something like
> <N % 2 == 0 ? N/2 : N/2 + 1>, so in the range (0, MAXVL). The last
> iteration would then run at the same VL or one less depending on whether
> N was odd or even. So the last two iterations would often run at less
> than MAXVL and often at different VLs from each other.

FWIW this is exactly how the RISC-V vector unit works -- unsurprisingly,
since it owes a lot to Cray-style processors :)

> And no, I don't know why the hardware operated this way. :)
>
>>> Being conservative is fine, but we should have a clear understanding of
>>> exactly what that means. I would not want to prohibit all VL changes
>>> now and forever, because I see that as unnecessarily restrictive and
>>> possibly damaging to supporting future architectures.
>>>
>>> If we don't want to provide intrinsics for changing VL right now, I'm
>>> all in favor. There would be no reason to add error checks because
>>> there would be no way within the IR to change VL.
>>
>> Right, I think we're converging.
>
> Agreed.

+1, there is no need to deal with VL at all at this point. I would
even say there isn't any concept of VL in IR at all at this time. At
some point in the future I will propose something in this space to
support RISC-V vectors, but we'll cross that bridge when we come to
it.

>> How about we don't forbid changes in vscale, but we find a common
>> notation for all the cases where predicating and changing vscale would
>> be semantically identical, and implement those in the same way.
>>
>> Later on, if there are additional cases where changes in vscale would
>> be beneficial, we can discuss them independently.
>>
>> Makes sense?
>
> Again trying to use the VL/vscale terminology:
>
> Changing vscale - no IR support currently and less likely in the future
> Changing VL - no IR support currently but more likely in the future
>
> The second seems like a straightforward extension to me. There will be
> some questions about how to represent VL semantics in IR but those don't
> impact the proposal under discussion at all.
>
> The first seems much harder, at least within a function. It may or may
> not impact the proposal under discussion. It sounds like the RISC-V
> people have some use cases so those should probably be the focal point
> of this discussion.

Yes, for RISC-V we definitely need vscale to vary a bit, but we are
fine with limiting that to function boundaries. The use case is *not*
"changing how large vectors are" in the middle of a loop or something
like that, which we all agree is very dubious at best. The RISC-V
vector unit is just very configurable (number of registers, vector
element sizes, etc.) and this configuration can impact how large the
vector registers are. For any given vectorized loop nest we want to
configure the vector unit to suit that piece of code and run the loop
with whatever register size that configuration yields. And when that
loop is done, we stop using the vector unit entirely and disable it,
so that the next loop can use it differently, possibly with a
different register size. For IR modeling purposes, I propose to
enlarge "loop nest" to "function", but the same principle applies; it
just means all vectorized loops in the function will have to share a
configuration.

Without getting too far into the details, does this make sense as a
use case?

Cheers,
Robin

> -David
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
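[Editor's note: the per-loop configuration model Robin describes can be illustrated with a toy C model. All names and numbers here are illustrative, not the RISC-V V specification: the point is only that the register length in elements, i.e. what the IR sees as vscale, falls out of how the unit is configured.]

```c
#include <stddef.h>

#define VECTOR_REG_BYTES 64  /* assumed physical register size in bytes */

/* Configuring the vector unit for a given element width yields the
   usable register length in elements -- the size the vectorized loop
   (or, per the proposal, the whole function) then runs at. */
size_t configure_vector_unit(size_t element_bytes) {
    return VECTOR_REG_BYTES / element_bytes;
}
```

Two loops configured for different element widths would thus see different register sizes, which is why, under the proposal, all vectorized loops within one function must share a configuration.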