thr3ads.net - llvm dev - [llvm-dev] [LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us [Dec 2017]

Saito, Hideki via llvm-dev

2017-Dec-06 00:21 UTC

[llvm-dev] [LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us

Status Update on VPlan ---- where we are currently, and what's ahead of us
========================================================= 
Goal:
-----
Extend the Loop Vectorizer (LV) so that it can handle outer loops, by
uplifting its infrastructure with VPlan.
This status update summarizes the progress so far and the steps still needed.
 
Background:
-----------
This is related to the VPlan infrastructure project we started a while back, a
project to extend the (inner loop vectorization focused) Loop Vectorizer to
support outer loop vectorization. VPlan is the vectorization planner that
records the decisions and candidate directions to pursue in order to drive cost
modeling and vector code generation. When it is fully integrated into LV (i.e.,
at the end of this big project), VPlan will use a Hierarchical-CFG (HCFG) and
transform it starting from the abstraction of the input IR to reflect current
vectorization decisions being made. The HCFG eventually becomes the abstraction
of the output IR, and the vector code generation is driven by this abstract
representation.
 
Please refer to the following for more detailed background:
 
RFCs
       http://lists.llvm.org/pipermail/llvm-dev/2016-September/105057.html
(Extending LV to vectorize outerloops)
       http://lists.llvm.org/pipermail/llvm-dev/2017-February/110159.html 
(Introducing VPlan to model the vectorized code and drive its transformation)
 
"Extending LoopVectorizer: OpenMP4.5 SIMD and Outer Loop
Auto-Vectorization" (Saito et al.)
2016 LLVM Developers' Meeting
https://www.youtube.com/watch?v=XXAvdUwO7kQ
 
"Introducing VPlan to the LoopVectorizer"     (Rapaport and Zaks)
2017 EuroLLVM Developers' Meeting
https://www.youtube.com/watch?v=IqzJRs6tb7Y
 
"Vectorizing Loops with VPlan - Current State and Next Steps" (Zaks and Rapaport)
2017 LLVM Developers' Meeting
https://www.youtube.com/watch?v=BjBSJFzYDVk
 
Patches Committed:
------------------
Two big patches have been submitted/committed.
https://reviews.llvm.org/D28975 by Gil Rapaport. (Introducing VPlan to model the
vectorized code and drive its transformation)
     This was broken down into a series of smaller patches, all of which went
     in. The last (re)commit of the series is
     https://reviews.llvm.org/rL311849
https://reviews.llvm.org/D38676 by Gil Rapaport. (Modeling masking in VPlan,
introducing VPInstructions)
     This is also being broken down into a series of smaller patches to
     facilitate the review.
     Committed as https://reviews.llvm.org/rL318645
 
Where We Are:
-------------
With the first patch, we introduced the concept of VPlan to LV and started
explicitly recording decisions such as interleaved memory access optimization
and serialization. In the first patch, we resisted introducing VPInstructions
and introduced VPRecipes instead, in an attempt to avoid duplicating
Instructions in the abstract HCFG representation (i.e., abstract Instructions
in the HCFG that are separate from the incoming IR Instructions). As we moved
on, it became more and more apparent that we need new abstract Instructions
(see https://reviews.llvm.org/D38676 for more details), which in turn require
representing new use-def relations that do not exist among the incoming IR
Instructions. As a result, with the second patch, as part of explicitly
modeling masking in VPlan, we introduced VPInstruction, which is an abstraction
of an IR Instruction.
 
All of this, so far, is refactoring of the (still innermost loop vectorization
centric) Loop Vectorizer's existing functionality to explicitly model what
was implicitly handled before.
 
Future Refactoring Needed:
--------------------------
The following aspects of LV still need to be refactored into the VPlan-based
representation. This list is non-exhaustive, but it should give a ballpark
idea of the amount of work left here.
* Predication
* Cost model
* Remainder Loop
* Runtime Guards
* External Users
* Reduction Epilog
* Interleave Grouping
* Sink Scalar Operands
 
Work Needed for Simple Outer Loop Vectorization:
------------------------------------------------
* Improve uniformity/divergence analysis: uniformity in innermost loop
  vectorization is invariance. For outer loop vectorization, there are uniform
  values that are not invariant.
* Better predication: retaining a uniform backedge is a must-have. Retaining a
  uniform forward branch is good for inner loop vectorization as well.
* Masking on HCFG
* Code generation driven by VPlan/HCFG
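
To make the "uniform but not invariant" case concrete, here is a minimal
Python sketch that simulates the lanes of a vectorized outer loop. It is an
illustration only, not LV code: the function names, the choice of values
(`n`, `j`, `i*n+j`) and the simplified definition of "invariant" (same value
at every inner iteration) are assumptions made for this example.

```python
def is_uniform(per_lane):
    """A value is uniform if every lane holds the same value."""
    return len(set(per_lane)) == 1


def classify(vf=4, n=3):
    """Simulate one vector iteration of an outer loop over i with vf lanes.

    Each lane runs the inner loop `for j in range(n)`. For three values we
    record what every lane sees at each inner iteration:
      n      : the inner trip count
      j      : the inner induction variable
      i*n+j  : a row-major index such as the one used for a[i][j]
    Returns {name: (uniform, invariant)}.
    """
    lanes = range(vf)  # lane k handles outer iteration i = k
    history = {"n": [], "j": [], "i*n+j": []}
    for j in range(n):
        history["n"].append([n for _ in lanes])
        history["j"].append([j for _ in lanes])
        history["i*n+j"].append([i * n + j for i in lanes])

    result = {}
    for name, steps in history.items():
        uniform = all(is_uniform(step) for step in steps)
        # Simplified notion of invariance: identical at every inner iteration.
        invariant = uniform and all(step == steps[0] for step in steps)
        result[name] = (uniform, invariant)
    return result
```

In this model the inner induction variable `j` is the same in every lane at
any given time, so it is uniform, yet it changes from one inner iteration to
the next, so an analysis that equates uniformity with invariance would miss it.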
 
Additional Work Needed to Handle Higher Complexity:
---------------------------------------------------
* Construct VPlan near the beginning of LV (right after the Legal or
  Must-Vectorize directive check)
* VPlan-to-VPlan transform of divergent inner loop control flow into uniform
  loop control flow + divergent acyclic control flow (all vector elements have
  to iterate the same number of times)
* Predication on the transformed VPlan.
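
The second item above can be pictured with a small Python simulation. This is
a sketch under stated assumptions, not the planned VPlan transform: the
function names and the toy inner-loop body (summing `j`) are made up for
illustration. Each lane has its own inner trip count (divergent control flow);
the rewritten loop iterates while any lane is still active (uniform backedge)
and guards the body with a per-lane mask (divergent acyclic control flow).

```python
def scalar_sums(trip_counts):
    """Reference: each outer iteration runs its own inner loop."""
    out = []
    for tc in trip_counts:
        s = 0
        for j in range(tc):
            s += j
        out.append(s)
    return out


def masked_vector_sums(trip_counts):
    """All lanes iterate the same number of times; a mask disables the
    lanes whose own inner loop has already finished."""
    vf = len(trip_counts)
    sums = [0] * vf
    j = 0  # inner induction variable, shared by all lanes
    mask = [j < tc for tc in trip_counts]
    while any(mask):  # uniform backedge: one branch decision for all lanes
        for lane in range(vf):
            if mask[lane]:  # masked (predicated) loop body
                sums[lane] += j
        j += 1
        mask = [j < tc for tc in trip_counts]
    return sums
```

For trip counts `[3, 0, 5, 1]` both versions compute the same per-lane sums,
while the masked version executes five iterations for every lane.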
 
Additional Work Needed for Outer Loop Auto-Vectorization:
---------------------------------------------------------
* Legality check
* Cost modeling (compare it to the inner loop vectorization strategy in an
  apples-to-apples manner).
 
Other Enhancements (out of the scope of this doc):
--------------------------------------------------
* Remainder Loop Vectorization
* SLP and LV in one Vectorizer
* Nested Vectorization
* ...

Related Work:
-------------
In the previous RFC, we went with the direction to convert Function
Vectorization into Loop Vectorization. When such a function has a loop inside,
the loop vectorization needed in that scenario is "outer loop
vectorization".
http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html (X. Tian, RFC for
vectorizing a call --- caller side and callee side)
https://reviews.llvm.org/D22792 (M. Masten, Converting Function Vectorization to
Loop Vectorization)
https://reviews.llvm.org/D40575 (M. Masten, Caller side support for invoking
vector function from vector loop)

Related work of related work. Math lib vectorization using SVML.
http://lists.llvm.org/pipermail/llvm-dev/2016-March/097862.html (M. Masten, RFC
for vector math lib call using Intel SVML)
https://reviews.llvm.org/D19544 (M. Masten, vector math lib call using Intel
SVML)
 
Summary:
--------
This update presents the current state of the VPlan infrastructure project and
lists the remaining steps towards outer loop vectorization. We are currently
at a point where we can slow down the refactoring effort in order to expedite
the big functionality boost, outer loop vectorization, and by doing so
encourage more participation from the wider LLVM community in the refactoring
effort, which should expedite the overall transition to the VPlan framework.
Shortly, we will send out an RFC to solicit community feedback on our plan to
trade off between 1) making concurrent progress on refactoring and outer loop
vectorization and 2) finishing the refactoring and then adding outer loop
vectorization.
Please stay tuned.
 
Thanks,
Hideki Saito

Renato Golin via llvm-dev

2017-Dec-06 09:37 UTC

Hi Hideki/Ayal/Gil et. al,

First of all, thank you very much for the (past, current and future)
efforts in the vectoriser. It's much appreciated!

On 6 December 2017 at 00:21, Saito, Hideki via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> With the first patch, we introduced the concept of VPlan to LV and started
> explicitly recording decisions like interleave memory access optimization and
> serialization. In the first patch, we resisted introducing VPInstructions -----
> and introduced VPRecipes instead, in an attempt to avoid duplicating
> Instructions in the abstract HCFG Representation (i.e., abstract Instructions in
> HCFG that is separate from incoming IR Instructions). As we moved on, it became
> more and more apparent that we have a need to introduce new abstract
> Instructions (see https://reviews.llvm.org/D38676 for more details)  which also
> requires representation of new use-def relations that does not exist in incoming
> IR Instructions. As a result, with the second patch, as part of explicitly
> modeling masking in VPlan, we introduced VPInstruction, which is an abstraction
> of IR Instruction.

This was expected, as we move into a radically different model. I
think the current approach to implement & refactor is a good one and
we must continue like that. Pushing for too many features will break
the compiler and too much refactoring will break the spirits of
everyone involved.

> Additional Work Needed to Handle Higher Complexity:
> ---------------------------------------------------
> * Construct VPlan near the beginning of LV (right after Legal or
>   Must-Vectorize directive check)
>
> Additional Work Needed for Outer Loop Auto-Vectorization:
> ---------------------------------------------------------
> * Legality check
> * Cost modeling (compare it to inner loop vectorization strategy in
>   apples-to-apples manner).

On these points, we may need to make it more clear what happens when.
There is an overall legality check, but there also may be
VPlan-specific legality issues (especially as we move to outer-loop
vectorisation) that will not be obvious before we create the VPlans.

I'm not too worried about illegal transformations made legal by VPlans
(for example Polyhedral or inner-loop LICM), but the other way round,
where we may break things outside a VPlan (for instance, A->C is legal
but A->B->C is not). I can't think of anything right now (which is why I
used "A" and "B"), but I'd welcome thoughts on the impact of more complex
VPlans on the whole legality->cost->transform model.

> Summary of the current state of VPlan infrastructure project is presented,
> and the remaining steps towards outer loop vectorization is listed. We are
> currently at a point where we can slow down the refactoring effort for the
> purpose of expediting the big functionality boost: outer loop vectorization
> ----- and by doing so encourage more participation from the wider LLVM community
> in the refactoring effort to expedite the overall transition to the VPlan
> framework.

Sounds like a plan!

cheers,
--renato

Florian Hahn via llvm-dev

2017-Dec-06 14:21 UTC

Hi,

On 06/12/2017 09:37, Renato Golin via llvm-dev wrote:
>> Summary of the current state of VPlan infrastructure project is
>> presented, and the remaining steps towards outer loop vectorization is listed.
>> We are currently at a point where we can slow down the refactoring effort for
>> the purpose of expediting the big functionality boost: outer loop vectorization
>> ----- and by doing so encourage more participation from the wider LLVM community
>> in the refactoring effort to expedite the overall transition to the VPlan
>> framework.

That sounds like an excellent idea! Any concrete ideas/plans how people 
could get involved, besides doing reviews?

Cheers,
Florian

Saito, Hideki via llvm-dev

2017-Dec-06 20:10 UTC

> This was expected, as we move into a radically different model. I think the
> current approach to implement & refactor is a good one and we must continue
> like that.

The outer loop vectorization implementation plan
(http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html) follows
the same approach. Since outer loops were never supported in LV, we can safely
do everything on the VPlan infrastructure without causing any regressions in
functionality or performance. That should help everyone think about where in
VPlan the remaining non-VPlan aspects of LV fit best.

> On these points, we may need to make it more clear what happens when.
> There is an overall legality check, but there also may be VPlan-specific
> legality issues (especially as we move to outer-loop vectorisation) that
> will not be obvious before we create the VPlans.
>
> I'm not too worried about illegal transformations made legal by VPlans
> (for example Polyhedral or inner-loop LICM), but the other way round, where
> we may break things outside a VPlan (for instance, A->C is legal but
> A->B->C is not). I can't think of anything right now (why I used "A" and
> "B"), but I'd welcome thoughts on the impact of more complex VPlans on the
> whole legality->cost->transform model.

I'm not 100% sure that you and I are talking about the same thing here, but
one easy way to turn a legal-to-vectorize loop into an illegal-to-vectorize
loop is flipping THEN and ELSE when a THEN ==> ELSE forward dependence exists.
Once vectorization legality is "ensured" (OpenMP simd is one such case, where
the programmer ensures it before clang parses the code), we can't flip THEN
and ELSE without making sure that no dependence exists between THEN and ELSE,
and this is relevant in the inner loop vectorization scenario as well. I think
we need to document when and where the different kinds of vectorization
legality assurance happen, and which transformations can break that assurance
before the actual vectorization transformation kicks in.
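
The THEN/ELSE flipping hazard can be reproduced with a small Python
simulation. This is only a sketch under assumptions: the loop body, the array
names and the model of predicated execution (the whole loop treated as one
vector, each block run for all lanes under a mask) are invented for
illustration and are not taken from LV. THEN stores to `a[i + 1]`, and ELSE of
the next iteration loads that element, which gives a THEN ==> ELSE forward
dependence.

```python
def scalar_loop(a, x, c):
    """Reference semantics: iterations run one after another."""
    a = list(a)
    out = [0] * len(c)
    for i in range(len(c)):
        if c[i]:
            a[i + 1] = x[i]  # THEN: store read by ELSE of iteration i + 1
        else:
            out[i] = a[i]    # ELSE: load
    return a, out


def predicated_vector_loop(a, x, c, then_first=True):
    """Predicated execution: each block runs for all lanes under a mask,
    and the two blocks run one after the other in the chosen order."""
    a = list(a)
    n = len(c)
    out = [0] * n

    def then_block():
        for i in range(n):  # masked store
            if c[i]:
                a[i + 1] = x[i]

    def else_block():
        loaded = a[:n]      # vector load of a[0:n]
        for i in range(n):  # masked write of the loaded values
            if not c[i]:
                out[i] = loaded[i]

    blocks = [then_block, else_block] if then_first else [else_block, then_block]
    for block in blocks:
        block()
    return a, out
```

With `a = [0] * 5`, `x = [10, 20, 30, 40]` and
`c = [True, False, True, False]`, running THEN before ELSE matches the scalar
loop, while the flipped order makes ELSE load the old values of `a`.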

Thanks,
Hideki

Serge Preis via llvm-dev

2017-Dec-14 11:14 UTC

Hello,

Just a minor comment.

> * Improve uniformity/divergence analysis ----- Uniformity in innermost loop
>   vectorization is invariance. For outer loop vectorization, there are uniform
>   values that are not invariant.

I believe that uniformity/divergence analysis is one of the key technologies
for efficient vectorization, so I appreciate you bringing this up and look
forward to an extensive and comprehensive framework here.

In fact, there is uniformity in inner loop vectorization that is not invariance.
Expressions like a[i/16] are uniform under certain conditions (namely i starts
with 0 mod min(VL, 16), and 16 % VL == 0) while not invariant. It is unfortunate
for many media codes operating on blocks that the loop vectorizer (at least in
my experience) cannot detect and harness this uniformity. I may even try to look
into improving this if someone gives me pointers on where to start.

Regards,
Serge Preis
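
The a[i/16] case can be checked by brute force with a short Python sketch. The
helper names are hypothetical, and the condition is read here as "every vector
starts at a multiple of VL, and VL divides 16", which is an interpretation of
the wording above rather than a statement about what LV implements.

```python
def is_uniform_block_index(start, vl, block=16):
    """True if i // block is the same for all lanes i in [start, start + vl)."""
    return len({i // block for i in range(start, start + vl)}) == 1


def all_vectors_uniform(n, vl, block=16, first=0):
    """Check every vector of vl consecutive iterations starting at `first`."""
    return all(
        is_uniform_block_index(start, vl, block)
        for start in range(first, n, vl)
    )
```

With an aligned start and VL in {4, 8, 16}, every vector sees a single value
of i/16, although that value changes from one block of 16 iterations to the
next. A misaligned start or a VL of 32 makes some vector straddle two blocks.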

Saito, Hideki via llvm-dev

2017-Dec-14 23:08 UTC

We are working with the Univ. of Saarland folks on this aspect. What you wrote
is true (and you know I know that). I just didn't go into much detail in that
one-line explanation of why we need to work in that area, as I expect Simon
Moll (U. Saarland) to send in his RFC on this topic in the not too distant
future. We think the Divergence Analysis (DA) code from the Region Vectorizer
(RV) project has good potential for reuse in the Outer Loop Vectorization
project (RFC:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html), and good
divergence analysis should also help innermost loop vectorization (e.g.,
gather/scatter versus unit-stride).

If you are interested in this aspect of vectorization, I suggest first getting
in touch with Simon to see what the DA in RV already has. Let us know if you
are also interested in outer loop vectorization. There are plenty of things to
do for everyone interested.

Thanks,
Hideki

From: Serge Preis [mailto:spreis at yandex-team.ru]
Sent: Thursday, December 14, 2017 3:15 AM
To: Saito, Hideki <hideki.saito at intel.com>; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [LV][VPlan] Status Update on VPlan ----- where we are
currently, and what's ahead of us

Hello,

Just minor comment.

* Improve uniformity/divergence analysis  ----- Uniformity in innermost loop
vectorization is
   invariance. For outer loop vectorization, there are uniform values that are
not invariant.

I believe that uniformity/divergence analysis is one of key technologies for
efficient vectorization, so I appreciate you bringing this up and looking
forward to extensive and comprehensive framework here.

In fact there is uniformity in inner loop vectorization that is not invariance.
Expressions like a[i/16] are uniform under certain conditions (namely i starts
with 0 mod min(VL, 16), and 16 % VL == 0) while not invariant. It is unfortunate
for many media codes operating on blocks that loop vectorizer (at least in my
experience) cannot detect and harness this uniformity. I may even try to look
into improving this if someone give me pointers where to start.

Regards,
Serge Preis



06.12.2017, 07:22, "Saito, Hideki via llvm-dev" <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>:

Status Update on VPlan ---- where we are currently, and what's ahead of us
=========================================================
Goal:
-----
Extending Loop Vectorizer (LV) such that it can handle outer loops, via
uplifting its infrastructure with VPlan.
The goal of this status update is to summarize the progress and the future steps
needed.

Background:
-----------
This is related to the VPlan infrastructure project we started a while back, a
project to extend the (inner loop vectorization focused) Loop Vectorizer to
support outer loop vectorization. VPlan is the vectorization planner that
records the decisions and candidate directions to pursue in order to drive cost
modeling and vector code generation. When it is fully integrated into LV (i.e.,
at the end of this big project), VPlan will use a Hierarchical-CFG (HCFG) and
transform it starting from the abstraction of the input IR to reflect current
vectorization decisions being made. The HCFG eventually becomes the abstraction
of the output IR, and the vector code generation is driven by this abstract
representation.

Please refer to the following for more detailed background:

RFCs
       http://lists.llvm.org/pipermail/llvm-dev/2016-September/105057.html
(Extending LV to vectorize outerloops)
       http://lists.llvm.org/pipermail/llvm-dev/2017-February/110159.html 
(Introducing VPlan to model the vectorized code and drive its transformation)

"Extending LoopVectorizer: OpenMP4.5 SIMD and Outer Loop
Auto-Vectorization" (Saito et al.)
2016 LLVM Developers' Meeting
https://www.youtube.com/watch?v=XXAvdUwO7kQ

"Introducing VPlan to the LoopVectorizer" (Rapaport and Zaks)
2017 EuroLLVM Developers' Meeting
https://www.youtube.com/watch?v=IqzJRs6tb7Y

"Vectorizing Loops with VPlan - Current State and Next Steps" (Zaks and Rapaport)
2017 LLVM Developers' Meeting
https://www.youtube.com/watch?v=BjBSJFzYDVk

Patches Committed:
------------------
Two big patches have been submitted/committed.
https://reviews.llvm.org/D28975 by Gil Rapaport. (Introducing VPlan to model the
vectorized code and drive its transformation)
     This was broken down into a series of smaller patches, which went in. The
     last (re)commit of the series is https://reviews.llvm.org/rL311849
https://reviews.llvm.org/D38676 by Gil Rapaport. (Modeling masking in VPlan,
introducing VPInstructions)
     This is also being broken down into a series of smaller patches to
     facilitate the review. Committed as https://reviews.llvm.org/rL318645

Where We Are:
-------------
With the first patch, we introduced the concept of VPlan to LV and started
explicitly recording decisions like interleave memory access optimization and
serialization. In the first patch, we resisted introducing VPInstructions -----
and introduced VPRecipes instead, in an attempt to avoid duplicating
Instructions in the abstract HCFG representation (i.e., abstract Instructions in
the HCFG that are separate from incoming IR Instructions). As we moved on, it
became more and more apparent that we need to introduce new abstract
Instructions (see https://reviews.llvm.org/D38676 for more details), which also
requires representing new use-def relations that do not exist in the incoming
IR Instructions. As a result, with the second patch, as part of explicitly
modeling masking in VPlan, we introduced VPInstruction, which is an abstraction
of IR Instruction.

All of this, so far, is refactoring of the (still innermost loop vectorization
centric) Loop Vectorizer's existing functionality to explicitly model what
was implicitly handled before.

Future Refactoring Needed:
--------------------------
The following aspects of LV still need to be refactored into the VPlan-based
representation. This list is non-exhaustive, but should give you a ballpark
idea of the amount of work left here.
* Predication
* Cost model
* Remainder Loop
* Runtime Guards
* External Users
* Reduction Epilog
* Interleave Grouping
* Sink Scalar Operands

Work Needed for Simple Outer Loop Vectorization:
------------------------------------------------
* Improve uniformity/divergence analysis ----- Uniformity in innermost loop
   vectorization is invariance. For outer loop vectorization, there are uniform
   values that are not invariant.
* Better predication ---- Retaining uniform backedge is a must-have. Retaining
   uniform forward branch is good for inner loop vectorization as well.
* Masking on HCFG
* Code Generation driven by VPlan/HCFG
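
To illustrate the first item in the list above, here is a minimal C++ sketch
(not LLVM code) that emulates vectorizing the outer loop of a small loop nest in
scalar code. The kernel, the row-major layout, the function name
rowDotOuterVectorized, and the vector length VL = 4 are all assumptions made for
illustration. The load B[J] reads the same address in every lane, so it is
uniform, yet J changes on every inner iteration, so it is not invariant in the
vectorized loop. The load from A uses a different address in each lane.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t VL = 4; // hypothetical vector length

// Computes Out[i] = sum over j of A[i * M + j] * B[j] for an N x M
// row-major matrix A, emulating vectorization of the outer loop i.
std::vector<float> rowDotOuterVectorized(const std::vector<float> &A,
                                         const std::vector<float> &B,
                                         std::size_t N, std::size_t M) {
  std::vector<float> Out(N, 0.0f);
  for (std::size_t IBase = 0; IBase < N; IBase += VL) {
    std::array<float, VL> Acc{}; // one accumulator per lane
    for (std::size_t J = 0; J < M; ++J) {
      // Uniform but not invariant: one scalar load serves all lanes,
      // but its address changes with J.
      const float Bj = B[J];
      for (std::size_t Lane = 0; Lane < VL; ++Lane) {
        const std::size_t I = IBase + Lane;
        if (I < N) // lanes past the end of the outer loop are masked off
          Acc[Lane] += A[I * M + J] * Bj; // per-lane address, stride M
      }
    }
    for (std::size_t Lane = 0; Lane < VL && IBase + Lane < N; ++Lane)
      Out[IBase + Lane] = Acc[Lane];
  }
  return Out;
}
```

In this sketch the inner loop backedge is also uniform, since every lane runs
the same M iterations; that is the simple case the list above targets.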

Additional Work Needed to Handle Higher Complexity:
---------------------------------------------------
* Construct VPlan near the beginning of LV (right after Legal or Must-Vectorize
   directive check)
* VPlan to VPlan transform of divergent inner loop control flow into uniform
   loop control flow + divergent acyclic control flow (all vector elements have
   to iterate the same number of times)
* Predication on the transformed VPlan.
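
The second item above can be pictured with a minimal C++ sketch (not LLVM code)
that emulates the lanes in scalar code. The example loop, the function name
countShiftsMasked, and the vector length VL = 4 are assumptions made for
illustration. Each lane originally runs `while (X != 0) { X >>= 1; ++Count; }`
with its own trip count. After the transform, all lanes iterate together until
no lane is active (uniform loop control), and the loop body is guarded by a
per-lane mask (divergent acyclic control flow).

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t VL = 4; // hypothetical vector length

// Per-lane scalar semantics: while (X != 0) { X >>= 1; ++Count; }
std::array<unsigned, VL> countShiftsMasked(std::array<unsigned, VL> X) {
  std::array<unsigned, VL> Count{};
  std::array<bool, VL> Active{};
  for (std::size_t Lane = 0; Lane < VL; ++Lane)
    Active[Lane] = X[Lane] != 0;

  // Uniform loop control: all lanes take the backedge together as long
  // as at least one lane is still active.
  while (std::any_of(Active.begin(), Active.end(),
                     [](bool B) { return B; })) {
    for (std::size_t Lane = 0; Lane < VL; ++Lane) {
      // Divergent acyclic control flow, expressed as a per-lane mask.
      if (Active[Lane]) {
        X[Lane] >>= 1;
        ++Count[Lane];
      }
      // A lane that has finished stays inactive for the remaining trips.
      Active[Lane] = Active[Lane] && X[Lane] != 0;
    }
  }
  return Count;
}
```

The loop runs as many times as the longest-running lane needs; lanes that
finish earlier simply have their mask bit cleared.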

Additional Work Needed for Outer Loop Auto-Vectorization:
---------------------------------------------------------
* Legality check
* Cost modeling (compare it with the inner loop vectorization strategy in an
   apples-to-apples manner).

Other Enhancements (out of the scope of this doc):
--------------------------------------------------
* Remainder Loop Vectorization
* SLP and LV in one Vectorizer
* Nested Vectorization
* ...

Related Work:
-------------
In the previous RFC, we went with the direction to convert Function
Vectorization into Loop Vectorization. When such a function has a loop inside,
the loop vectorization needed in that scenario is "outer loop
vectorization".
http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html (X. Tian, RFC for
vectorizing a call --- caller side and callee side)
https://reviews.llvm.org/D22792 (M. Masten, Converting Function Vectorization to
Loop Vectorization)
https://reviews.llvm.org/D40575 (M. Masten, Caller side support for invoking
vector function from vector loop)

Work related to the above: math library vectorization using SVML.
http://lists.llvm.org/pipermail/llvm-dev/2016-March/097862.html (M. Masten, RFC
for vector math lib call using Intel SVML)
https://reviews.llvm.org/D19544 (M. Masten, vector math lib call using Intel
SVML)

Summary:
--------
A summary of the current state of the VPlan infrastructure project is presented,
and the remaining steps towards outer loop vectorization are listed. We are
currently at a point where we can slow down the refactoring effort for the
purpose of expediting the big functionality boost: outer loop vectorization
----- and by doing so encourage more participation from the wider LLVM community
in the refactoring effort to expedite the overall transition to the VPlan
framework. Shortly, we will send out an RFC to solicit community feedback on our
plan for the trade-off between 1) making concurrent progress on refactoring and
outer loop vectorization and 2) finishing the refactoring and then adding outer
loop vectorization.
Please stay tuned.

Thanks,
Hideki Saito
