thr3ads.net - llvm dev - [llvm-dev] Vectorizing multiple exit loops [Sep 2019]

If this information is useful, please help other people find it:
Share via:

Philip Reames via llvm-dev

2019-Sep-18 23:19 UTC

[llvm-dev] Vectorizing multiple exit loops

On 9/17/2019 3:17 AM, Renato Golin wrote:> Hi Philip,
>
> Apologies for leaving this thread linger so long. It was in my
> back-burner but Alex's weekly remind me to reply (thanks again,
> Alex!). Starting from the end...
>
> On Mon, 9 Sep 2019 at 18:53, Philip Reames via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Current Plans
>>
>> At the current moment, I'm reasonable sure that I'm going to
get the resources to at least tackle some of the cases where we bail out
unnecessarily.  This will be a huge practical improvement in vectorizing
robustness, at (I hope) relatively little implementation cost.
> This makes a lot of sense to me. It doesn't sound it will need a huge
> refactoring to the current code and it can be done in a piece-wise way
> so that progress is consistent and non-regressable.
>
> It would be good to know if any of the test-suite benchmarks will be
> affected, or if we could insert a new one there, just to keep track of
> things.
>
>> I'm also going to continue playing around with enhancements to our
current dereferenceability logic.  I see that as being a key building block to
make any predication based approach practical.
> This is *always* good work. There has been some work in the past to
> common up LV with VPLan analysis stages (separating classes, moving
> code), and I think if we keep working on the common infrastructure,
> we'll benefit more than just LV or VPlan.
>
>
>> I'm not sure I'm going to get to the predication support. 
I'd like to, but am not sure my resource constraints allow it.  I'll
also mention that I'm not at all sure how all of this might fit in with the
VPLAN work.  I'd really welcome feedback on that; is what I'm proposing
at all consistent with others plans?
> As I said, the analysis work can play well with VPlan, and if it
> doesn't, we can make it work by following the same cleanups that has
> been done in the past. These are complementary parts of the work and
> it's great you can spend some time on some of it.
>
>
>
>> The use cases I'm looking at basically fall into two buckets:
>>
>> for (int i = 0; i < N; i++) {
>>    if (expr(a[i])) break;
>>    ... other vectorizable stuff ...
>> }
>>
>> for (int i = 0; i < N; i++) {
>>    if (expr(i)) break;
>>    ... other vectorizable stuff ...
>> }
> The safety analysis is different, but the vectorizer would have to
> behave similarly on both cases. Specilative loads, boundary
> conditions, run time checks (unless the bounds are compile time
> constants?).
>
> Doing the latter first would make the second almost *only* about
> reference analysis.
>
>
>> First, there are all the cases where we could have handled the multiple
exit loop, but chose not to for implementation reasons.  A great example is:
>>
>> for (int i = 0; i < N; i++) {
>>    if (i > M) break;
>>    a[i] = i;
>> }
>>
>> In this case, SCEV can happily compute the actual exit bound of the
loop, and we could use the existing vector-body/scalar slow-path structure.  The
only change needed would be to exit the vector body earlier.  (There are some
details here, but it's almost as easy as I'm describing if my POC patch
isn't missing something major.)
> If M < N, then this could potentially be converted to iterate over M
> and make the loop unconditional, no? I know this is a silly case, but
> if a more complex loop simplifies to this one by other optimisations,
> this would be a very easy change to make.If M < N (provably), SCEV would return an exit count for the loop which 
reflects this.  If not, we'd get umin(M,N).  We can still generate the 
vector body at the cost of inserting the umin computation above the loop 
body.  Both cases can be handled by running the vector loop up to the 
minimum trip count (well, one less to handle mid-loop exits and side 
effects).  The M < N case is falls out of the more general
one.>
> We could also look at it from a loop-splitting point of view, for
> example the common pattern:
>
> for (i: M) {
>    if (a == 0)
>      startup(i);
>    else
>      remainder(i);
> }Iteration spaces splitting is a separate optimization.  IRCE implements 
a form of this.>
> Or more complicated examples, like matrix operations that are
"easy"
> in the bulk of it, but complicated (non-vectorizeable) at the edges.
> This would probably be something for the VPlan side of things, but
> it's good to keep in mind that a simple example can get very
> interesting. :)I don't know of any plans to incorporate iteration space splitting 
w/VPLan.  I don't have any plans to go that far; if I had such test 
cases - I don't - I'd want to start with separately factored transforms 
if we could.  Doing everything within one transform is undesirable. 
:)>
>
>> // Assume a and b are both statically sized.
>> for (int i = 0; i < N; i++) {
>>    t = b[i];
>>    if (t > M) throw();
>>    sum += a[t];
>> }
>>
>> (This is a classic a[b[i]] pattern, but with range checks added.)
> This looks like a clear pattern for predicated semantics. :)
>
> In the absence of that, if the condition is simple, we can try
> splitting or tail loops, but only predication would give us an extra
> mile on preventing speculative loads to crash.
>
>
>> // assume a[0:N] is known dereferenceable
>> for (int i = 0; i < N; i++)
>>    if (a[i] == 0) return false;
>> return true;
>>
>> If N is large, and the array is non-zero, then this is profitable to
vectorize.  If a[0] == 0, then it isn't, regardless of the value of N.
>>
>> Figuring out when to vectorize vs not for cases like this will require
some thought.  I don't really have a good answer for this yet, other than
when the profile on the early exit tells us it's rarely taken.
> We can already bail on small constant N, and generally, code that
> assumes it's small, usually use it as a constant (like RGB or xyz
> handling). In those cases, it may also be profitable to do strided
> vectorisation (which we do).
>
>
>> Third, for both of the former cases, we need to be able to compute exit
values along the early exit edges.  We can get a lot of mileage out of simply
skipping loops with exit values (i.e. lcssa phis), as our loop exit value
rewriting will tend to eliminate them.  However, we will eventually want to
handle this case, which will require generating some interesting complicated
code down the exit path to figure out which iteration actually exit.
> I think getting a real case in the test-suite and make that work would
> make a lot of people happy. :)I wish.  Unfortunately, my "real test case" is a java benchmark.   I 
doubt I'll be able to get it into the test suite. 
:(>
>
>> 1) Use the vector-body/scalar-body idiom of today, and have the vector
body exit with IV = I when any iteration in [I, I+VF-1] would exit.  (i.e. roll
back)
>>
>> 2) Insert dedicated exit blocks which recompute exit conditions to
determine exact exit value, and then let the vector body run all iterations in
VF which contain the exiting iteration.  (Assumes predicated stores, and that
the exit blocks skip the scalar loop)
> The second option sounds easier to reason with, but it could also
> generate more clutter for other passes to get lost in. I don't have a
> strong opinion either, but I would like to limit the damage of the
> current vectoriser (to other passes, or itself), if we are to move to
> VPlan any time soon.
>
> Thanks for working on this, I think a lot of good stuff will come out of
it! :)
>
> cheers,
> --renato

Renato Golin via llvm-dev

2019-Sep-19 10:16 UTC

head link

[llvm-dev] Vectorizing multiple exit loops

On Thu, 19 Sep 2019 at 00:19, Philip Reames <listmail at philipreames.com>
wrote:> If M < N (provably), SCEV would return an exit count for the loop which
> reflects this.  If not, we'd get umin(M,N).  We can still generate the
> vector body at the cost of inserting the umin computation above the loop
> body.  Both cases can be handled by running the vector loop up to the
> minimum trip count (well, one less to handle mid-loop exits and side
> effects).  The M < N case is falls out of the more general one.
Yup.
> I don't know of any plans to incorporate iteration space splitting
> w/VPLan.  I don't have any plans to go that far; if I had such test
> cases - I don't - I'd want to start with separately factored
transforms
> if we could.  Doing everything within one transform is undesirable.  :)
AFAIK this isn't strictly in the immediate plans, but loop splitting
was one of the aims for doing outer-loop vectorisation.

My point was that if we want to do that, vplan would be a good place,
because we could add this as its own plan, which would expose other
vectorisation opportunities (including fusion with other loops).
> > I think getting a real case in the test-suite and make that work would
> > make a lot of people happy. :)
> I wish.  Unfortunately, my "real test case" is a java benchmark. 
I
> doubt I'll be able to get it into the test suite.  :(
Hm, not likely. Getting those loops as IR in the lit tests would
probably be enough, if these transformations don't affect anything in
the test-suite.

Zaks, Ayal (Mobileye) via llvm-dev

2019-Sep-30 19:59 UTC

head link

[llvm-dev] Vectorizing multiple exit loops

> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
Renato
> Golin via llvm-dev
> Sent: Thursday, September 19, 2019 13:16
> To: Philip Reames <listmail at philipreames.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] Vectorizing multiple exit loops
> 
> On Thu, 19 Sep 2019 at 00:19, Philip Reames <listmail at
philipreames.com>
> wrote:
> > If M < N (provably), SCEV would return an exit count for the loop
> > which reflects this.  If not, we'd get umin(M,N).  We can still
> > generate the vector body at the cost of inserting the umin computation
> > above the loop body.  Both cases can be handled by running the vector
> > loop up to the minimum trip count (well, one less to handle mid-loop
> > exits and side effects).  The M < N case is falls out of the more
general one.
> 
> Yup.
> 
> > I don't know of any plans to incorporate iteration space splitting
> > w/VPLan.  I don't have any plans to go that far; if I had such
test
> > cases - I don't - I'd want to start with separately factored
> > transforms if we could.  Doing everything within one transform is
> > undesirable.  :)
> 
> AFAIK this isn't strictly in the immediate plans, but loop splitting
was one of
> the aims for doing outer-loop vectorisation.
The aim being to vectorize an outer-loop when an inner-loop is difficult /
requires loop splitting to vectorize?
> My point was that if we want to do that, vplan would be a good place,
because
> we could add this as its own plan, which would expose other vectorisation
> opportunities (including fusion with other loops).
VPlan is designed to support the tentative decisions and alternatives when
vectorizing. E.g., when different types of instructions are to be generated for
different VF's. Loop index splitting, and turning a multiple exit loop into
a (countable) single exit loop, seem like preparatory transformations that can
enable vectorization and/or interleaving for any VF/UF, similar to loop
distribution.

VPlan currently models the instructions inside the vector loop. If a
"while" or multiple exit loop has stores above the "break
point", it may be possible to sink them in order to avoid issuing
speculative stores. LV applies similar "SinkAfter" code motion when
needed to facilitate 1st order recurrence. Loads may indeed be handled
speculatively by dereferencability, which may in turn require peeling the first
iterations to reach an  aligned address. Such peeling may be folded into the
vector loop by masking, if desired, analogous to LV's foldTailByMasking.

Perhaps a simplest example is an str[n]len() loop. "Exploiting the AltiVec
Unit for Commercial Applications", CAECW-9, 2006 demonstrated the speedups
and slowdowns of vectorizing it on PowerPC970.
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

llvm dev - Sep 2019 - Vectorizing multiple exit loops

[llvm-dev] Vectorizing multiple exit loops

[llvm-dev] Vectorizing multiple exit loops

[llvm-dev] Vectorizing multiple exit loops