Saito, Hideki via llvm-dev
2017-Dec-06 00:21 UTC
[llvm-dev] [LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us
Status Update on VPlan ---- where we are currently, and what's ahead of us
==========================================================================

Goal:
-----
Extending Loop Vectorizer (LV) such that it can handle outer loops, via uplifting its infrastructure with VPlan.
The goal of this status update is to summarize the progress and the future steps needed.

Background:
-----------
This is related to the VPlan infrastructure project we started a while back, a project to extend the (inner loop vectorization focused) Loop Vectorizer to support outer loop vectorization. VPlan is the vectorization planner that records the decisions and candidate directions to pursue in order to drive cost modeling and vector code generation. When it is fully integrated into LV (i.e., at the end of this big project), VPlan will use a Hierarchical-CFG (HCFG) and transform it starting from the abstraction of the input IR to reflect current vectorization decisions being made. The HCFG eventually becomes the abstraction of the output IR, and the vector code generation is driven by this abstract representation.

Please refer to the following for more detailed background:

RFCs
  http://lists.llvm.org/pipermail/llvm-dev/2016-September/105057.html (Extending LV to vectorize outerloops)
  http://lists.llvm.org/pipermail/llvm-dev/2017-February/110159.html (Introducing VPlan to model the vectorized code and drive its transformation)

"Extending LoopVectorizer: OpenMP4.5 SIMD and Outer Loop Auto-Vectorization" (Saito, et al.)
2016 LLVM Developers' Meeting
https://www.youtube.com/watch?v=XXAvdUwO7kQ

"Introducing VPlan to the LoopVectorizer" (Rapaport and Zaks)
2017 EuroLLVM Developers' Meeting
https://www.youtube.com/watch?v=IqzJRs6tb7Y

"Vectorizing Loops with VPlan - Current State and Next Steps" (Zaks and Rapaport)
2017 LLVM Developers' Meeting
https://www.youtube.com/watch?v=BjBSJFzYDVk

Patches Committed:
------------------
Two big patches have been submitted/committed.

https://reviews.llvm.org/D28975 by Gil Rapaport. (Introducing VPlan to model the vectorized code and drive its transformation)
  Has been broken down into a series of smaller patches and went in. The last (re)commit of the series is
  https://reviews.llvm.org/rL311849

https://reviews.llvm.org/D38676 by Gil Rapaport. (Modeling masking in VPlan, introducing VPInstructions)
  This is also being broken down into a series of smaller patches to facilitate the review.
  Committed as https://reviews.llvm.org/rL318645

Where We Are:
-------------
With the first patch, we introduced the concept of VPlan to LV and started explicitly recording decisions like interleave memory access optimization and serialization. In the first patch, we resisted introducing VPInstructions ----- and introduced VPRecipes instead, in an attempt to avoid duplicating Instructions in the abstract HCFG representation (i.e., abstract Instructions in HCFG that are separate from incoming IR Instructions). As we moved on, it became more and more apparent that we need to introduce new abstract Instructions (see https://reviews.llvm.org/D38676 for more details), which also requires representation of new use-def relations that do not exist in incoming IR Instructions. As a result, with the second patch, as part of explicitly modeling masking in VPlan, we introduced VPInstruction, which is an abstraction of IR Instruction.

All these, so far, are the refactoring of (still innermost loop vectorization centric) Loop Vectorizer's existing functionality to explicitly model what was implicitly handled before.

Future Refactoring Needed:
--------------------------
The following aspects of LV still need to be refactored into the VPlan based representation. This list is non-exhaustive, but should give you a ballpark of the amount of work left here.
* Predication
* Cost model
* Remainder Loop
* Runtime Guards
* External Users
* Reduction Epilog
* Interleave Grouping
* Sink Scalar Operands

Work Needed for Simple Outer Loop Vectorization:
------------------------------------------------
* Improve uniformity/divergence analysis ----- Uniformity in innermost loop vectorization is
  invariance. For outer loop vectorization, there are uniform values that are not invariant.
* Better predication ---- Retaining uniform backedge is a must-have. Retaining uniform forward
  branch is good for inner loop vectorization as well.
* Masking on HCFG
* Code Generation driven by VPlan/HCFG

Additional Work Needed to Handle Higher Complexity:
---------------------------------------------------
* Construct VPlan near the beginning of LV (right after Legal or Must-Vectorize directive check)
* VPlan to VPlan transform of divergent inner loop control flow into uniform loop control
  flow + divergent acyclic control flow (all vector elements have to iterate the same number of times)
* Predication on the transformed VPlan.

Additional Work Needed for Outer Loop Auto-Vectorization:
---------------------------------------------------------
* Legality check
* Cost modeling (compare it to inner loop vectorization strategy in apples-to-apples manner).

Other Enhancements (out of the scope of this doc):
--------------------------------------------------
* Remainder Loop Vectorization
* SLP and LV in one Vectorizer
* Nested Vectorization
* ...

Related Work:
-------------
In the previous RFC, we went with the direction to convert Function Vectorization into Loop Vectorization. When such a function has a loop inside, the loop vectorization needed in that scenario is "outer loop vectorization".
http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html (X. Tian, RFC for vectorizing a call --- caller side and callee side)
https://reviews.llvm.org/D22792 (M. Masten, Converting Function Vectorization to Loop Vectorization)
https://reviews.llvm.org/D40575 (M. Masten, Caller side support for invoking vector function from vector loop)

Related work of related work. Math lib vectorization using SVML.
http://lists.llvm.org/pipermail/llvm-dev/2016-March/097862.html (M. Masten, RFC for vector math lib call using Intel SVML)
https://reviews.llvm.org/D19544 (M. Masten, vector math lib call using Intel SVML)

Summary:
--------
A summary of the current state of the VPlan infrastructure project is presented, and the remaining steps towards outer loop vectorization are listed. We are currently at a point where we can slow down the refactoring effort for the purpose of expediting the big functionality boost: outer loop vectorization ----- and by doing so encourage more participation from the wider LLVM community in the refactoring effort to expedite the overall transition to the VPlan framework.
Shortly, we will send out an RFC to solicit community feedback on our plan to trade off between 1) making concurrent progress on refactoring and outer loop vectorization and 2) finishing refactoring and then adding outer loop vectorization.
Please stay tuned.

Thanks,
Hideki Saito
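The status update asks for a transform of divergent inner loop control flow into uniform loop control flow plus divergent acyclic control flow, and notes that outer loop vectorization has uniform values that are not invariant. As a rough illustration only, the Python sketch below simulates vectorizing the outer loop of a small nest by hand. The function names, the dictionary-based "lanes", and the shortened last chunk standing in for a remainder loop are all assumptions made for this example; none of it is LLVM or VPlan code.

```python
def scalar_nest(trip_counts, a):
    """Reference loop nest: out[i] = sum of a[i][j] for j < trip_counts[i]."""
    out = []
    for i, n in enumerate(trip_counts):
        s = 0
        for j in range(n):
            s += a[i][j]
        out.append(s)
    return out


def outer_vectorized(trip_counts, a, vf):
    """Simulate vectorizing the OUTER loop with `vf` lanes (hypothetical model).

    The inner loop keeps a single, uniform back edge that is taken while any
    lane is still active. The per-lane exit test becomes a mask that
    predicates the loop body.
    """
    out = [0] * len(trip_counts)
    for base in range(0, len(trip_counts), vf):
        # Simplification: the last chunk just uses fewer lanes.
        lanes = range(base, min(base + vf, len(trip_counts)))
        acc = {i: 0 for i in lanes}
        j = 0  # same value in every lane (uniform), but it changes each iteration
        while True:
            # Divergent condition: each lane has its own trip count.
            mask = {i: j < trip_counts[i] for i in lanes}
            if not any(mask.values()):
                break  # uniform loop exit: all lanes leave together
            for i in lanes:
                if mask[i]:  # masked (predicated) loop body
                    acc[i] += a[i][j]
            j += 1
        for i in lanes:
            out[i] = acc[i]
    return out
```

In this model every lane iterates the inner loop the same number of times (the maximum trip count in the chunk), and lanes that have finished are simply masked off. The inner induction variable `j` is the same across lanes without being loop invariant.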
Renato Golin via llvm-dev
2017-Dec-06 09:37 UTC
[llvm-dev] [LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us
Hi Hideki/Ayal/Gil et. al,

First of all, thank you very much for the (past, current and future) efforts in the vectoriser. It's much appreciated!

On 6 December 2017 at 00:21, Saito, Hideki via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> With the first patch, we introduced the concept of VPlan to LV and started explicitly recording decisions like interleave memory access optimization and serialization. In the first patch, we resisted introducing VPInstructions ----- and introduced VPRecipes instead, in an attempt to avoid duplicating Instructions in the abstract HCFG Representation (i.e., abstract Instructions in HCFG that is separate from incoming IR Instructions). As we moved on, it became more and more apparent that we have a need to introduce new abstract Instructions (see https://reviews.llvm.org/D38676 for more details) which also requires representation of new use-def relations that does not exist in incoming IR Instructions. As a result, with the second patch, as part of explicitly modeling masking in VPlan, we introduced VPInstruction, which is an abstraction of IR Instruction.

This was expected, as we move into a radically different model. I think the current approach to implement & refactor is a good one and we must continue like that. Pushing for too many features will break the compiler and too much refactoring will break the spirits of everyone involved.

> Additional Work Needed to Handle Higher Complexity:
> ---------------------------------------------------
> * Construct VPlan near the beginning of LV (right after Legal or Must-Vectorize directive check)
>
> Additional Work Needed for Outer Loop Auto-Vectorization:
> ---------------------------------------------------------
> * Legality check
> * Cost modeling (compare it to inner loop vectorization strategy in apples-to-apples manner).

On these points, we may need to make it more clear what happens when. There is an overall legality check, but there also may be VPlan-specific legality issues (especially as we move to outer-loop vectorisation) that will not be obvious before we create the VPlans.

I'm not too worried about illegal transformations made legal by VPlans (for example Polyhedral or inner-loop LICM), but the other way round, where we may break things outside a VPlan (for instance, A->C is legal but A->B->C is not). I can't think of anything right now (why I used "A" and "B"), but I'd welcome thoughts on the impact of more complex VPlans on the whole legality->cost->transform model.

> Summary of the current state of VPlan infrastructure project is presented, and the remaining steps towards outer loop vectorization is listed. We are currently at a point where we can slow down the refactoring effort for the purpose of expediting the big functionality boost: outer loop vectorization ----- and by doing so encourage more participation from the wider LLVM community in the refactoring effort to expedite the overall transition to the VPlan framework.

Sounds like a plan!

cheers,
--renato
Florian Hahn via llvm-dev
2017-Dec-06 14:21 UTC
[llvm-dev] [LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us
Hi,

On 06/12/2017 09:37, Renato Golin via llvm-dev wrote:
> On 6 December 2017 at 00:21, Saito, Hideki via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Summary of the current state of VPlan infrastructure project is presented, and the remaining steps towards outer loop vectorization is listed. We are currently at a point where we can slow down the refactoring effort for the purpose of expediting the big functionality boost: outer loop vectorization ----- and by doing so encourage more participation from the wider LLVM community in the refactoring effort to expedite the overall transition to the VPlan framework.

That sounds like an excellent idea! Any concrete ideas/plans how people could get involved, besides doing reviews?

Cheers,
Florian
Saito, Hideki via llvm-dev
2017-Dec-06 20:10 UTC
[llvm-dev] [LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us
> This was expected, as we move into a radically different model. I think the current approach to implement & refactor is a good one and we must continue like that.

The outer loop vectorization implementation plan (http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html) is also like that. Since outer loops were never supported in LV, we can safely do everything on the VPlan infrastructure w/o causing any regressions in functionality/performance. That should help everyone think about where the best places in VPlan will be to fit the (remaining) non-VPlan aspects of LV.

> On these points, we may need to make it more clear what happens when.
> There is an overall legality check, but there also may be VPlan-specific legality issues (especially as we move to outer-loop
> vectorisation) that will not be obvious before we create the VPlans.
>
> I'm not too worried about illegal transformations made legal by VPlans (for example Polyhedral or inner-loop LICM), but the other way round, where we may break things outside a VPlan (for instance, A->C is legal but A->B->C is not). I can't
> think of anything right now (why I used "A" and "B"), but I'd welcome thoughts on the impact of more complex VPlans on the whole legality->cost->transform model.

I'm not 100% sure if you and I are talking about the same thing here, but one of the easy ways to transform a legal-to-vectorize loop into an illegal-to-vectorize loop is THEN and ELSE flipping, when a THEN ==> ELSE forward dependence exists. After vectorization legality is "ensured" (OpenMP simd is one like that, ensured by the programmer before clang parses the code), we can't flip THEN and ELSE w/o making sure that no dependence exists between THEN and ELSE ---- and this is relevant in the inner loop vectorization scenario also. I think we need to document when/where the different kinds of vectorization legality assurance happen and what transformations can break that before the actual vectorization transformation kicks in.
Thanks,
Hideki
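The THEN/ELSE flipping hazard described in Hideki's reply can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the loop body, the array names, and the `then_first` switch are hypothetical, and if-conversion is modelled as running the whole masked THEN block for all lanes followed by the whole masked ELSE block. The THEN block of iteration i stores to `a[i + 1]` and the ELSE block of iteration i + 1 loads it, which gives a THEN ==> ELSE forward dependence.

```python
def scalar(c, b, a0):
    """Reference scalar loop with a THEN ==> ELSE forward dependence."""
    a = list(a0)
    out = [None] * len(c)
    for i in range(len(c)):
        if c[i]:
            a[i + 1] = b[i]   # THEN: store that the next iteration may read
        else:
            out[i] = a[i]     # ELSE: load of a value stored by an earlier THEN
    return out


def vectorized(c, b, a0, vf, then_first=True):
    """Simulated if-converted vector loop (hypothetical model, not LLVM code).

    Each block runs for all lanes of a chunk under its mask. `then_first`
    selects the original block order; False models the flipped order.
    """
    a = list(a0)
    n = len(c)
    out = [None] * n
    for base in range(0, n, vf):
        lanes = range(base, min(base + vf, n))

        def then_block():
            for i in lanes:
                if c[i]:                      # masked store
                    a[i + 1] = b[i]

        def else_block():
            loaded = {i: a[i] for i in lanes if not c[i]}  # masked load
            for i, v in loaded.items():
                out[i] = v

        order = (then_block, else_block) if then_first else (else_block, then_block)
        for block in order:
            block()
    return out
```

With `c = [True, False, True, False]`, the original block order reproduces the scalar result, while the flipped order makes the ELSE lanes read stale values of `a`. In this model the flip is safe only when no such dependence exists.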
Serge Preis via llvm-dev
2017-Dec-14 11:14 UTC
[llvm-dev] [LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us
Hello,

Just a minor comment.

> * Improve uniformity/divergence analysis ----- Uniformity in innermost loop vectorization is
>   invariance. For outer loop vectorization, there are uniform values that are not invariant.

I believe that uniformity/divergence analysis is one of the key technologies for efficient vectorization, so I appreciate you bringing this up and am looking forward to an extensive and comprehensive framework here.

In fact, there is uniformity in inner loop vectorization that is not invariance. Expressions like a[i/16] are uniform under certain conditions (namely i starts with 0 mod min(VL, 16), and 16 % VL == 0) while not invariant. It is unfortunate for many media codes operating on blocks that the loop vectorizer (at least in my experience) cannot detect and harness this uniformity. I may even try to look into improving this if someone gives me pointers on where to start.

Regards,
Serge Preis
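Serge's a[i/16] observation can be checked by brute force. The sketch below is a minimal illustration, not an analysis the loop vectorizer performs: the function name and parameters are made up for this example, and it only inspects full vector iterations of a concrete loop range rather than proving anything symbolically.

```python
def index_is_uniform(start, n, vl, block=16):
    """Return True if i // block has one value per full vector iteration.

    The loop runs i = start .. start + n - 1 and is cut into chunks of
    `vl` consecutive iterations (the vector lanes).
    """
    for base in range(start, start + n - vl + 1, vl):
        lane_values = {i // block for i in range(base, base + vl)}
        if len(lane_values) != 1:
            return False  # some chunk straddles a block boundary
    return True
```

For example, `index_is_uniform(0, 64, 4)` holds because 16 % 4 == 0 and the start is aligned, whereas a start of 2 or a vector length of 32 makes some chunk straddle a block boundary. The index is still not invariant: it takes the value 0 for i in 0..15 and 1 for i in 16..31.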
Saito, Hideki via llvm-dev
2017-Dec-14 23:08 UTC
[llvm-dev] [LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us
We are working with Univ. of Saarland folks for this aspect. What you wrote is true (and you know I know that) ---- I just didn’t write too much details in that one-liner explanation on why we need to work in that area, as I expect Simon Moll (U. Saarland) to be sending in his RFC on this topic in not too distant future. We think Divergence Analysis (DA) code from Region Vectorizer (RV) project has good potential for reuse in Outer Loop Vectorization project (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html), and good divergence analysis should also help innermost loop vectorization (e.g., gather/scatter versus unit-stride). I suggest first trying to get in touch with Simon if you are interested in this aspect of vectorization to see what DA in RV already has. Let us know if you are also interested in the outer loop vectorization. There are plenty of things for everyone interested. Thanks, Hideki From: Serge Preis [mailto:spreis at yandex-team.ru] Sent: Thursday, December 14, 2017 3:15 AM To: Saito, Hideki <hideki.saito at intel.com>; llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] [LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us Hello, Just minor comment. * Improve uniformity/divergence analysis ----- Uniformity in innermost loop vectorization is invariance. For outer loop vectorization, there are uniform values that are not invariant. I believe that uniformity/divergence analysis is one of key technologies for efficient vectorization, so I appreciate you bringing this up and looking forward to extensive and comprehensive framework here. In fact there is uniformity in inner loop vectorization that is not invariance. Expressions like a[i/16] are uniform under certain conditions (namely i starts with 0 mod min(VL, 16), and 16 % VL == 0) while not invariant. 
It is unfortunate for many media codes operating on blocks that loop vectorizer (at least in my experience) cannot detect and harness this uniformity. I may even try to look into improving this if someone give me pointers where to start. Regards, Serge Preis 06.12.2017, 07:22, "Saito, Hideki via llvm-dev" <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>: Status Update on VPlan ---- where we are currently, and what's ahead of us ========================================================= Goal: ----- Extending Loop Vectorizer (LV) such that it can handle outer loops, via uplifting its infrastructure with VPlan. The goal of this status update is to summarize the progress and the future steps needed. Background: ----------- This is related to the VPlan infrastructure project we started a while back, a project to extend the (inner loop vectorization focused) Loop Vectorizer to support outer loop vectorization. VPlan is the vectorization planner that records the decisions and candidate directions to pursue in order to drive cost modeling and vector code generation. When it is fully integrated into LV (i.e., at the end of this big project), VPlan will use a Hierarchical-CFG (HCFG) and transform it starting from the abstraction of the input IR to reflect current vectorization decisions being made. The HCFG eventually becomes the abstraction of the output IR, and the vector code generation is driven by this abstract representation. Please refer to the following for more detailed background: RFCs http://lists.llvm.org/pipermail/llvm-dev/2016-September/105057.html (Extending LV to vectorize outerloops) http://lists.llvm.org/pipermail/llvm-dev/2017-February/110159.html (Introducing VPlan to model the vectorized code and drive its transformation) "Extending LoopVectorizer: OpenMP4.5 SIMD and Outer Loop Auto-Vectorization" (Saito, et.al.) 
2016 LLVM Developers' Meeting
https://www.youtube.com/watch?v=XXAvdUwO7kQ

"Introducing VPlan to the LoopVectorizer" (Rapaport and Zaks)
2017 EuroLLVM Developers' Meeting
https://www.youtube.com/watch?v=IqzJRs6tb7Y

"Vectorizing Loops with VPlan - Current State and Next Steps" (Zaks and Rapaport)
2017 LLVM Developers' Meeting
https://www.youtube.com/watch?v=BjBSJFzYDVk

Patches Committed:
------------------
Two big patches have been submitted/committed.

https://reviews.llvm.org/D28975 by Gil Rapaport.
(Introducing VPlan to model the vectorized code and drive its transformation)
Has been broken down into a series of smaller patches and went in.
The last (re)commit of the series is https://reviews.llvm.org/rL311849

https://reviews.llvm.org/D38676 by Gil Rapaport.
(Modeling masking in VPlan, introducing VPInstructions)
This is also being broken down into a series of smaller patches to facilitate the review.
Committed as https://reviews.llvm.org/rL318645

Where We Are:
-------------
With the first patch, we introduced the concept of VPlan to LV and started explicitly recording decisions like interleave memory access optimization and serialization. In the first patch, we resisted introducing VPInstructions ----- and introduced VPRecipes instead, in an attempt to avoid duplicating Instructions in the abstract HCFG representation (i.e., abstract Instructions in the HCFG that are separate from the incoming IR Instructions). As we moved on, it became more and more apparent that we need to introduce new abstract Instructions (see https://reviews.llvm.org/D38676 for more details), which also requires representing new use-def relations that do not exist in the incoming IR Instructions. As a result, with the second patch, as part of explicitly modeling masking in VPlan, we introduced VPInstruction, which is an abstraction of IR Instruction.
All of this, so far, is refactoring of the (still innermost loop vectorization centric) Loop Vectorizer's existing functionality to explicitly model what was implicitly handled before.

Future Refactoring Needed:
--------------------------
The following aspects of LV still need to be refactored into the VPlan based representation. This list is non-exhaustive, but should give you a ballpark of the amount of work left here.
* Predication
* Cost model
* Remainder Loop
* Runtime Guards
* External Users
* Reduction Epilog
* Interleave Grouping
* Sink Scalar Operands

Work Needed for Simple Outer Loop Vectorization:
------------------------------------------------
* Improve uniformity/divergence analysis ----- Uniformity in innermost loop vectorization is invariance. For outer loop vectorization, there are uniform values that are not invariant.
* Better predication ---- Retaining uniform backedge is a must-have. Retaining uniform forward branch is good for inner loop vectorization as well.
* Masking on HCFG
* Code Generation driven by VPlan/HCFG

Additional Work Needed to Handle Higher Complexity:
---------------------------------------------------
* Construct VPlan near the beginning of LV (right after Legal or Must-Vectorize directive check)
* VPlan to VPlan transform of divergent inner loop control flow into uniform loop control flow + divergent acyclic control flow (all vector elements have to iterate the same number of times)
* Predication on the transformed VPlan.

Additional Work Needed for Outer Loop Auto-Vectorization:
---------------------------------------------------------
* Legality check
* Cost modeling (compare it to inner loop vectorization strategy in apples-to-apples manner).

Other Enhancements (out of the scope of this doc):
--------------------------------------------------
* Remainder Loop Vectorization
* SLP and LV in one Vectorizer
* Nested Vectorization
* ...
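The transform of divergent inner loop control flow listed under "Additional Work Needed to Handle Higher Complexity" can be pictured with a small simulation. This is a hypothetical Python sketch, not VPlan code and not the planned implementation: the function names and the per-lane trip counts are invented for illustration, Python lists stand in for vector registers, and any() stands in for a uniform backedge condition.

```python
def scalar_reference(trip_counts):
    """Original nest: the inner trip count depends on the outer iteration."""
    sums = []
    for n in trip_counts:          # outer loop (the one being vectorized)
        acc = 0
        for j in range(n):         # inner loop, divergent across outer iterations
            acc += j
        sums.append(acc)
    return sums


def vectorized_outer(trip_counts):
    """One vector iteration of the outer loop; each lane is one outer iteration."""
    vl = len(trip_counts)
    acc = [0] * vl
    j = 0                                   # uniform across lanes, but not invariant
    mask = [j < n for n in trip_counts]     # per-lane "still active" predicate
    while any(mask):                        # uniform backedge: all lanes iterate together
        for lane in range(vl):
            if mask[lane]:                  # divergent acyclic part: masked loop body
                acc[lane] += j
        j += 1
        mask = [j < n for n in trip_counts]
    return acc


if __name__ == "__main__":
    counts = [3, 0, 5, 1]
    print(scalar_reference(counts))
    print(vectorized_outer(counts))
```

All lanes execute the inner loop the same number of times (the maximum trip count), and the mask keeps finished lanes from updating their results. The counter j also illustrates a value that is uniform across lanes without being loop invariant.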
Related Work:
-------------
In the previous RFC, we went with the direction to convert Function Vectorization into Loop Vectorization. When such a function has a loop inside, the loop vectorization needed in that scenario is "outer loop vectorization".
  http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html (X. Tian, RFC for vectorizing a call --- caller side and callee side)
  https://reviews.llvm.org/D22792 (M. Masten, Converting Function Vectorization to Loop Vectorization)
  https://reviews.llvm.org/D40575 (M. Masten, Caller side support for invoking vector function from vector loop)

Related work of related work. Math lib vectorization using SVML.
  http://lists.llvm.org/pipermail/llvm-dev/2016-March/097862.html (M. Masten, RFC for vector math lib call using Intel SVML)
  https://reviews.llvm.org/D19544 (M. Masten, vector math lib call using Intel SVML)

Summary:
--------
A summary of the current state of the VPlan infrastructure project is presented, and the remaining steps towards outer loop vectorization are listed. We are currently at a point where we can slow down the refactoring effort for the purpose of expediting the big functionality boost: outer loop vectorization ----- and by doing so encourage more participation from the wider LLVM community in the refactoring effort to expedite the overall transition to the VPlan framework. Shortly, we will send out an RFC to solicit community feedback on our plan to trade off between 1) making concurrent progress on refactoring and outer loop vectorization and 2) finishing refactoring and then adding outer loop vectorization. Please stay tuned.

Thanks,
Hideki Saito