Sander De Smalen via llvm-dev
2021-Jun-10 20:50 UTC
[llvm-dev] LoopVectorizer: Should the cost-model be used for legalisation?
Hi,

Last year we added the InstructionCost class, which adds the ability to represent that an operation cannot be costed, i.e. operations that cannot be expanded by the code-generator will have an invalid cost.

We started using this information in the Loop Vectorizer for scalable auto-vectorization. The LV has a legality stage and a cost-model stage, which are conceptually separate and serve different purposes. But with the introduction of valid/invalid costs it is more tempting to use the cost-model as 'legalisation', which leads us to the following question:

Should we be using the cost-model to do legalisation?

'Legalisation' in this context means asking beforehand whether the code-generator can handle the IR emitted from the LV. Examples of operations that need such legalisation are predicated divides (at least until we can use the llvm.vp intrinsics), or intrinsic calls that have no scalable-vector equivalent. For fixed-width vectors this legalisation issue is mostly moot, since operations on fixed-width vectors can be scalarised. For scalable vectors this is neither supported nor feasible [1].

This means there's the option to do one of two things:

[Option 1]

Add checks to the LV legalisation to see if scalable-vectorisation is feasible. If so, assert the cost must be valid. Otherwise discard scalable VFs as possible candidates.
* This has the benefit that the compiler can avoid calculating/considering VPlans that we know cannot be costed.
* Legalisation and cost-model keep each other in check. If something cannot be costed then either the cost-model or legalisation was incomplete.

[Option 2]

Leave the question about legalisation to the CostModel, i.e. if the CostModel says that <operation> for `VF=vscale x N` is Invalid, then avoid selecting that VF.
* This has the benefit that we don't need to do work up-front to discard scalable VFs, keeping the LV design simpler.
* This makes gaps in the cost-model more difficult to spot.

Note that it's not useful to combine Option 1 and Option 2, because having two ways to choose from takes away the need to do legalisation beforehand, and so that's basically a choice for Option 2.

Both approaches lead to the same end-result, but we currently have a few patches in flight that have taken Option 1, and this led to some questions about the approach from both Florian and David Green. So we're looking to reach a consensus and a decision on which way to move forward.

I've tentatively added this as a topic to the agenda of the upcoming LLVM SVE/Scalable Vector Sync-up meeting next Tuesday (June 15th, [2]) as an opportunity to discuss this more freely, if we can get enough people who actively work on the LV together in that meeting (like Florian and David, although please forward to anyone else who might have input on this).

Thanks,

Sander

[1] Expanding the vector operation into a scalarisation loop is currently not supported. It could be done, but our extensive experimentation with loops that handle each element of a scalable vector sequentially has never proved beneficial, even when using special instructions to efficiently increment the predicate vector. I doubt this will be any different for other scalable vector architectures, because of the loop control overhead. Also, the insertion/extraction of elements from a scalable vector is unlikely to be as cheap as for fixed-width vectors.

[2] https://docs.google.com/document/d/1UPH2Hzou5RgGT8XfO39OmVXKEibWPfdYLELSaHr3xzo/edit?usp=sharing
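To make the difference between the two options concrete, here is a minimal, self-contained C++ sketch. It does not use LLVM's real API: `Cost` is a toy stand-in for InstructionCost (an empty optional models Invalid), and `Op`, `isLegalForScalableVF`, `scalableCost`, `loopCost` and the two `considerScalableVF_*` functions are hypothetical names invented purely for illustration.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy stand-in for an instruction cost: an empty optional models "Invalid".
struct Cost {
  std::optional<unsigned> Value;
  bool isValid() const { return Value.has_value(); }
};

// Adding anything to an Invalid cost yields an Invalid cost.
inline Cost operator+(Cost A, Cost B) {
  if (!A.isValid() || !B.isValid())
    return Cost{std::nullopt};
  return Cost{*A.Value + *B.Value};
}

// Hypothetical operation kinds, for illustration only.
enum class Op { Add, Load, PredicatedDiv };

// Hypothetical legality hook: can codegen handle Op at a scalable VF?
inline bool isLegalForScalableVF(Op O) { return O != Op::PredicatedDiv; }

// Hypothetical cost query: ops that cannot be expanded are Invalid.
inline Cost scalableCost(Op O) {
  switch (O) {
  case Op::Add:
    return Cost{1u};
  case Op::Load:
    return Cost{2u};
  case Op::PredicatedDiv:
    return Cost{std::nullopt};
  }
  return Cost{std::nullopt};
}

// Sum the per-op costs; one Invalid op makes the whole loop Invalid.
inline Cost loopCost(const std::vector<Op> &Loop) {
  Cost Total{0u};
  for (Op O : Loop)
    Total = Total + scalableCost(O);
  return Total;
}

// Option 1: decide up-front via legality, then assert the cost is valid.
inline bool considerScalableVF_Option1(const std::vector<Op> &Loop) {
  for (Op O : Loop)
    if (!isLegalForScalableVF(O))
      return false; // discard scalable VFs before any costing
  assert(loopCost(Loop).isValid() && "legal loop must have a valid cost");
  return true;
}

// Option 2: no up-front check; an Invalid cost rejects the scalable VF.
inline bool considerScalableVF_Option2(const std::vector<Op> &Loop) {
  return loopCost(Loop).isValid();
}
```

In this sketch both functions reject a loop containing a predicated divide. They differ in where the decision is made, and in what happens when the cost table has a gap: under Option 1 the assertion fires, under Option 2 the scalable VF is rejected without any diagnostic.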
Sjoerd Meijer via llvm-dev
2021-Jun-11 06:35 UTC
[llvm-dev] LoopVectorizer: Should the cost-model be used for legalisation?
Please correct me if I am wrong, but I thought this discussion was brought up by a temporary workaround in the cost-model, working around current codegen limitations that need fixing. I am asking because Option 1 is what we currently have, and I don't see reasons to depart from this general idea, even if the cost-model can return Invalid due to a workaround that would hopefully disappear soon. That would mean the assert that the legalisation and cost-model are in sync would need to be skipped, and while that is not ideal, I don't see that as a big problem and I don't see it as a total departure from Option 1, especially if this is all temporary.

And does this discussion disappear if the codegen issues are fixed? I don't know the scale of the problem/work, but is it not easier to fix that, and so avoid this cost-model vs. legalisation discussion?
Vineet Kumar via llvm-dev
2021-Jun-11 18:36 UTC
[llvm-dev] LoopVectorizer: Should the cost-model be used for legalisation?
Hi,

IIUC, for option 1, unless the legalizer and the cost-model follow the same logic to determine the feasibility of scalable-vectorization, it seems inaccurate to assert that if something is legal it must have a valid cost. But if that assertion is relaxed, then as you mentioned in another reply, it would be a de facto choice for option 2. If we don't relax this assertion, then for it to be accurate the legalizer and the cost-model will both do redundant work. Also, IIRC, an invalid cost also models a too-expensive-to-be-useful operation, but that doesn't really imply an illegal operation.

I am not sure if this makes sense, but with option 2, since VPlans are not discarded upfront, there is a chance that a transformation on a VPlan might actually make it feasible.

I didn't think deeply about this so I might be missing some obvious and key facts. Please correct me if I am wrong.

Best,
Vineet
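The concern in this reply, that an up-front legality answer and a cost-model answer can drift apart unless they follow the same logic, can be illustrated with a small C++ sketch. Everything here is hypothetical: `OpInfo`, its fields, `findMismatches` and the operation names are invented for illustration and are not part of LLVM.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-operation record holding both answers side by side.
struct OpInfo {
  std::string Name;
  bool LegalForScalableVF;              // answer from the legality stage
  std::optional<unsigned> ScalableCost; // cost-model answer; empty = Invalid
};

// Return the names of operations where the two stages disagree, i.e. where
// "legal implies valid cost" (or the converse) does not hold.
inline std::vector<std::string>
findMismatches(const std::vector<OpInfo> &Ops) {
  std::vector<std::string> Mismatched;
  for (const OpInfo &Op : Ops)
    if (Op.LegalForScalableVF != Op.ScalableCost.has_value())
      Mismatched.push_back(Op.Name);
  return Mismatched;
}
```

An operation that is legal but deliberately given an Invalid cost because it is too expensive to be useful would show up as a mismatch here, which is exactly the case where the Option 1 assertion would be inaccurate.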
Florian Hahn via llvm-dev
2021-Jun-15 11:48 UTC
[llvm-dev] LoopVectorizer: Should the cost-model be used for legalisation?
Hi,

Thanks for bringing this up!

> On Jun 10, 2021, at 21:50, Sander De Smalen <Sander.DeSmalen at arm.com> wrote:
>
> Hi,
>
> Last year we added the InstructionCost class which adds the ability to
> represent that an operation cannot be costed, i.e. operations that cannot
> be expanded by the code-generator will have an invalid cost.
>
> We started using this information in the Loop Vectorizer for scalable
> auto-vectorization. The LV has a legality- and a cost-model stage, which are
> conceptually separate concepts with different purposes. But with the
> introduction of having valid/invalid costs it's more inviting to use the
> cost-model as 'legalisation', which leads us to the following question:
>
> Should we be using the cost-model to do legalisation?
>
> 'Legalisation' in this context means asking the question beforehand if the
> code-generator can handle the IR emitted from the LV. Examples of
> operations that need such legalisation are predicated divides (at least
> until we can use the llvm.vp intrinsics), or intrinsic calls that have no
> scalable-vector equivalent. For fixed-width vectors this legalisation issue
> is mostly moot, since operations on fixed-width vectors can be scalarised.
> For scalable vectors this is neither supported nor feasible [1].

I think this is one of the key points. LoopVectorLegality at the moment is mostly concerned with whether the loop is unvectorizable due to conceptual issues (not sure what the best term would be) that prevent vectorization (like unvectorizable dependences or FP constraints), not target-specific instruction legality constraints like whether there is a vector version of a given instruction. It is also independent of any given concrete VF.

IIUC this is also not limited to predication, e.g. we also need to check whether intrinsic calls are supported natively. Are there any others?

As you mentioned, predicated instructions that are not supported by a target can be scalarized and predicated. Even for fixed vectors, this is quite expensive in general, but it allows LoopVectorLegality to work mostly independently of other target constraints and focus on general legality checks.

IIUC conceptually there’s nothing preventing us from scalarizing operations on scalable vectors in a similar fashion, but it requires an explicit loop dependent on the vector width at runtime, and deciding the cost depends on an unknown (the target vector width). I don’t have any insight into hardware-specific costs for scalable vector operations, so I can’t comment on any specifics there.

I am wondering if the following hypothetical scenario is feasible/realistic: consider a loop where all operations except one can be widened directly and a single operation needs scalarizing. I would expect there to be a point where the number of widenable operations gets large enough to offset the cost of emitting a loop to scalarize the single operation needing predication. Granted, depending on the maximum vector width of the hardware this number might be quite large, but I could imagine a scenario where we want to optimize given a tighter upper bound on the maximum vector width than allowed by the hardware spec. E.g. considering an upper bound of 256 for the vector width, we’d only need to execute 2 iterations of such a loop on AArch64. It would also be interesting to get additional perspectives on the cost question from people familiar with other hardware supporting scalable vectors.

As a side-note, there is a use of TTI in LVL, but it is rather problematic. At the moment, LVL considers loops with non-temporal memory operations as unvectorizable up-front, if the vector version for an arbitrary VF (in that case 2 was chosen) is illegal on the target. This has the unfortunate side effect that using non-temporal stores flat out blocks vectorization if there’s no legal non-temporal load/store for VF = 2, which can be very surprising to our users, especially on AArch64 where a single element non-temporal memory operation may not be legal to start with and non-temporal stores may be legal at higher VFs (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp#L788). It’s not a perfect example, but it illustrates one of the issues of bailing out too early.

> This means there's the option to do one of two things:
>
> [Option 1]
>
> Add checks to the LV legalisation to see if scalable-vectorisation is
> feasible. If so, assert the cost must be valid. Otherwise discard scalable
> VFs as possible candidates.
> * This has the benefit that the compiler can avoid
> calculating/considering VPlans that we know cannot be costed.
> * Legalisation and cost-model keep each other in check. If something
> cannot be costed then either the cost-model or legalisation was
> incomplete.
>
> [Option 2]
>
> Leave the question about legalisation to the CostModel, i.e. if the
> CostModel says that <operation> for `VF=vscale x N` is Invalid, then avoid
> selecting that VF.
> * This has the benefit that we don't need to do work up-front to
> discard scalable VFs, keeping the LV design simpler.
> * This makes gaps in the cost-model more difficult to spot.

I think if we would support scalarization for scalable vectors via a loop, then the cost of predicating any given instruction would not be invalid, just possibly quite high (depending on the upper bound for the vector), right? So we should still be able to assert that each result != Invalid.

> Note that it's not useful to combine Option 1 and Option 2, because having
> two ways to choose from takes away the need to do legalisation beforehand,
> and so that's basically a choice for Option 2.
>
> Both approaches lead to the same end-result, but we currently have a few
> patches in flight that have taken Option 1, and this led to some questions
> about the approach from both Florian and David Green. So we're looking to
> reach to a consensus and decision on what way to move forward.

I think one concern that came up during the discussion was that option 1 means that we need to add multiple isLegalToXXX helpers to TTI, which need to be kept in sync with the already existing cost functions. It also more closely couples legality checks and cost-modeling. I’m not sure if/how much native predication support will complicate things, and there may be more places that need to be taught about native predication support.

I’m not saying this is a blocker, just a trade-off to consider. Also, if we missed a case or add a new case that may require scalarization, we would need to update/add additional checks.

Cheers,
Florian
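The hypothetical break-even scenario in this reply (many widenable operations plus one operation scalarized via a loop) can be written down as a deliberately simplistic C++ cost comparison. This is only a sketch under loudly stated assumptions: the linear cost model, the `Scenario` fields and every number fed into it are invented for illustration and do not reflect measured hardware costs or LLVM's actual cost model.

```cpp
// Hypothetical, abstract cost units; nothing here is measured.
struct Scenario {
  unsigned WidenableOps;    // ops that can be widened directly
  unsigned Lanes;           // assumed upper bound on elements per vector
  unsigned WideOpCost;      // assumed cost of one widened op
  unsigned ScalarOpCost;    // assumed cost of one scalar op
  unsigned PerLaneLoopCost; // assumed per-lane cost of the scalarization
                            // loop (extract, op, insert, loop control)
};

// Cost of one vector iteration: widened ops plus the scalarization loop.
inline unsigned vectorCost(const Scenario &S) {
  return S.WidenableOps * S.WideOpCost + S.Lanes * S.PerLaneLoopCost;
}

// Cost of the scalar loop processing the same number of elements.
inline unsigned scalarCost(const Scenario &S) {
  return S.Lanes * (S.WidenableOps + 1) * S.ScalarOpCost;
}

// Under this toy model, vectorization pays off only when it is cheaper.
inline bool vectorizationPays(const Scenario &S) {
  return vectorCost(S) < scalarCost(S);
}
```

With made-up inputs of 4 lanes, unit op costs and a per-lane loop cost of 5, one widenable operation gives 21 versus 8 (not worthwhile), while eight widenable operations give 28 versus 36 (worthwhile). The point of the sketch is only that, once scalarization has a finite cost, the answer becomes a comparison rather than an Invalid result.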