thr3ads.net - llvm dev - [LLVMdev] Loop vectorizer preamble [Aug 2014]

If this information is useful, please help other people find it:
Share via:

James Molloy

2014-Aug-30 11:47 UTC

[LLVMdev] Loop vectorizer preamble

Hi Arnold, Nadav,

I've been taking a look at the preamble and bailout tests created by the
loop vectorizer, and I can't help but feel it could be rather conservative.
I'm not a vectorization expert, so I apologise in advance if say something
obviously wrong...

I'm looking in particular at the overflow check and the trip count
computation. From my reading, it goes something like:

  take the backedge taken count and add one -> Count
  emit code to check Count didn't overflow
  // pointer aliasing checks, if any
  calculate vector trip count = Count - (Count % Step)

It seems to me that there should be cases when we don't need to check for
overflow. In a well-formed loop, which this should be at this point, there
is an increment of the indvar before the backedge. If this increment is
marked 'nuw', we should be guaranteed that we don't get an overflow
when
calculating numBackedges + 1.

Also, many many loops don't have a single point-test for exit (x != 0).
Instead, they have a greater-than or less-than condition. If this is the
case, we should be able to elide all of our logic with Count and just count
down until the test is broken. For example:

for (i = 0; i < n; ++i)
  ...

->

count = 0
loop:
  ...
  count += VF * UF
  if count >= n goto scalar_loop else goto loop

This could remove a lot of overflow checks and "urem"s from real code.

Also, we don't currently coalesce overflow checks, vector memchecks and
trip count computation for adjacent and similar loops. For example:

  for (i = 0; i < n/2; ++i)
    p[i] = q[i+1];
  for (i = n/2; i < n; ++i)
    p[i] = q[i-1];

Really, these two loops should share a common preamble which checks between
the range [0, n). Now, they have two preambles, one checking [0, n/2) and
the other [n/2, n).

Sorry for the braindump, and for probably missing many issues!

Cheers,

James
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140830/bba752a5/attachment.html>

Arnold Schwaighofer

2014-Aug-30 17:08 UTC

head link

[LLVMdev] Loop vectorizer preamble

Hi James,

thank you for looking into this!
> On Aug 30, 2014, at 4:47 AM, James Molloy <james at
jamesmolloy.co.uk> wrote:
> 
> Hi Arnold, Nadav,
> 
> I've been taking a look at the preamble and bailout tests created by
the loop vectorizer, and I can't help but feel it could be rather
conservative. I'm not a vectorization expert, so I apologise in advance if
say something obviously wrong...
> 
> I'm looking in particular at the overflow check and the trip count
computation. From my reading, it goes something like:
> 
>   take the backedge taken count and add one -> Count
>   emit code to check Count didn't overflow
>   // pointer aliasing checks, if any
>   calculate vector trip count = Count - (Count % Step)
> 
> It seems to me that there should be cases when we don't need to check
for overflow. In a well-formed loop, which this should be at this point, there
is an increment of the indvar before the backedge. If this increment is marked
'nuw', we should be guaranteed that we don't get an overflow when
calculating numBackedges + 1.
> 

Lets say we have a loop that iterates 256 times (if n == 0):

unsigned char i = n;
do {

  —i;
} while (i);


We need to guard against such cases (see also PR17288).

However, I agree that by looking at the IR (in addition to the SCEV expression
for the back-edge)  we should be able to do better in some cases.

> Also, many many loops don't have a single point-test for exit (x != 0).
Instead, they have a greater-than or less-than condition. If this is the case,
we should be able to elide all of our logic with Count and just count down until
the test is broken. For example:
> 
> for (i = 0; i < n; ++i)
>   ...
> 
> ->
> 
> count = 0
> loop:
>   ...
>   count += VF * UF
>   if count >= n goto scalar_loop else goto loop
> 
> This could remove a lot of overflow checks and "urem"s from real
code.
This would overflow for say n = 255, VF = 2, char ?

start = 0

loop:
  i = phi (start, next)
  next = + (i, VF)
  stop = >= (n, next)
  br stop, exit, loop

exit

> Also, we don't currently coalesce overflow checks, vector memchecks and
trip count computation for adjacent and similar loops. For example:
> 
>   for (i = 0; i < n/2; ++i)
>     p[i] = q[i+1];
>   for (i = n/2; i < n; ++i)
>     p[i] = q[i-1];
> 
> Really, these two loops should share a common preamble which checks between
the range [0, n). Now, they have two preambles, one checking [0, n/2) and the
other [n/2, n).

You are right currently we only look at one loop at a time.

James Molloy

2014-Aug-30 18:44 UTC

head link

[LLVMdev] Loop vectorizer preamble

Hi Arnold,

Thanks for the reply. Taking your example:

This would overflow for say n = 255, VF = 2, char ?> start = 0
> loop:
>   i = phi (start, next)
>   next = + (i, VF)
>   stop = >= (n, next)
>   br stop, exit, loop
> exit

Why do we have to use i8 for the induction variable type? If the original
add is no-unsigned-wrap, then we could safely extend the induction variable
to i32/i64 and have an accurate >= comparison.

We couldn't do this with an i64 type because there'd be nothing to
extend
it to, but in practice people still write loops with 32-bit variables a lot
of the time.

Cheers,

James


On 30 August 2014 18:08, Arnold Schwaighofer <aschwaighofer at apple.com>
wrote:
> Hi James,
>
> thank you for looking into this!
>
> > On Aug 30, 2014, at 4:47 AM, James Molloy <james at
jamesmolloy.co.uk>
> wrote:
> >
> > Hi Arnold, Nadav,
> >
> > I've been taking a look at the preamble and bailout tests created
by the
> loop vectorizer, and I can't help but feel it could be rather
conservative.
> I'm not a vectorization expert, so I apologise in advance if say
something
> obviously wrong...
> >
> > I'm looking in particular at the overflow check and the trip count
> computation. From my reading, it goes something like:
> >
> >   take the backedge taken count and add one -> Count
> >   emit code to check Count didn't overflow
> >   // pointer aliasing checks, if any
> >   calculate vector trip count = Count - (Count % Step)
> >
> > It seems to me that there should be cases when we don't need to
check
> for overflow. In a well-formed loop, which this should be at this point,
> there is an increment of the indvar before the backedge. If this increment
> is marked 'nuw', we should be guaranteed that we don't get an
overflow when
> calculating numBackedges + 1.
> >
>
>
> Lets say we have a loop that iterates 256 times (if n == 0):
>
> unsigned char i = n;
> do {
>
>   —i;
> } while (i);
>
>
> We need to guard against such cases (see also PR17288).
>
> However, I agree that by looking at the IR (in addition to the SCEV
> expression for the back-edge)  we should be able to do better in some
cases.
>
>
> > Also, many many loops don't have a single point-test for exit (x
!= 0).
> Instead, they have a greater-than or less-than condition. If this is the
> case, we should be able to elide all of our logic with Count and just count
> down until the test is broken. For example:
> >
> > for (i = 0; i < n; ++i)
> >   ...
> >
> > ->
> >
> > count = 0
> > loop:
> >   ...
> >   count += VF * UF
> >   if count >= n goto scalar_loop else goto loop
> >
> > This could remove a lot of overflow checks and "urem"s from
real code.
>
> This would overflow for say n = 255, VF = 2, char ?
>
> start = 0
>
> loop:
>   i = phi (start, next)
>   next = + (i, VF)
>   stop = >= (n, next)
>   br stop, exit, loop
>
> exit
>
>
> > Also, we don't currently coalesce overflow checks, vector
memchecks and
> trip count computation for adjacent and similar loops. For example:
> >
> >   for (i = 0; i < n/2; ++i)
> >     p[i] = q[i+1];
> >   for (i = n/2; i < n; ++i)
> >     p[i] = q[i-1];
> >
> > Really, these two loops should share a common preamble which checks
> between the range [0, n). Now, they have two preambles, one checking [0,
> n/2) and the other [n/2, n).
>
>
> You are right currently we only look at one loop at a time.
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140830/1681cdc9/attachment.html>

llvm dev - Aug 2014 - [LLVMdev] Loop vectorizer preamble

[LLVMdev] Loop vectorizer preamble

[LLVMdev] Loop vectorizer preamble

[LLVMdev] Loop vectorizer preamble