Hi Arnold, Nadav, I've been taking a look at the preamble and bailout tests created by the loop vectorizer, and I can't help but feel it could be rather conservative. I'm not a vectorization expert, so I apologise in advance if say something obviously wrong... I'm looking in particular at the overflow check and the trip count computation. From my reading, it goes something like: take the backedge taken count and add one -> Count emit code to check Count didn't overflow // pointer aliasing checks, if any calculate vector trip count = Count - (Count % Step) It seems to me that there should be cases when we don't need to check for overflow. In a well-formed loop, which this should be at this point, there is an increment of the indvar before the backedge. If this increment is marked 'nuw', we should be guaranteed that we don't get an overflow when calculating numBackedges + 1. Also, many many loops don't have a single point-test for exit (x != 0). Instead, they have a greater-than or less-than condition. If this is the case, we should be able to elide all of our logic with Count and just count down until the test is broken. For example: for (i = 0; i < n; ++i) ... -> count = 0 loop: ... count += VF * UF if count >= n goto scalar_loop else goto loop This could remove a lot of overflow checks and "urem"s from real code. Also, we don't currently coalesce overflow checks, vector memchecks and trip count computation for adjacent and similar loops. For example: for (i = 0; i < n/2; ++i) p[i] = q[i+1]; for (i = n/2; i < n; ++i) p[i] = q[i-1]; Really, these two loops should share a common preamble which checks between the range [0, n). Now, they have two preambles, one checking [0, n/2) and the other [n/2, n). Sorry for the braindump, and for probably missing many issues! Cheers, James -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140830/bba752a5/attachment.html>
Hi James, thank you for looking into this!> On Aug 30, 2014, at 4:47 AM, James Molloy <james at jamesmolloy.co.uk> wrote: > > Hi Arnold, Nadav, > > I've been taking a look at the preamble and bailout tests created by the loop vectorizer, and I can't help but feel it could be rather conservative. I'm not a vectorization expert, so I apologise in advance if say something obviously wrong... > > I'm looking in particular at the overflow check and the trip count computation. From my reading, it goes something like: > > take the backedge taken count and add one -> Count > emit code to check Count didn't overflow > // pointer aliasing checks, if any > calculate vector trip count = Count - (Count % Step) > > It seems to me that there should be cases when we don't need to check for overflow. In a well-formed loop, which this should be at this point, there is an increment of the indvar before the backedge. If this increment is marked 'nuw', we should be guaranteed that we don't get an overflow when calculating numBackedges + 1. >Lets say we have a loop that iterates 256 times (if n == 0): unsigned char i = n; do { —i; } while (i); We need to guard against such cases (see also PR17288). However, I agree that by looking at the IR (in addition to the SCEV expression for the back-edge) we should be able to do better in some cases.> Also, many many loops don't have a single point-test for exit (x != 0). Instead, they have a greater-than or less-than condition. If this is the case, we should be able to elide all of our logic with Count and just count down until the test is broken. For example: > > for (i = 0; i < n; ++i) > ... > > -> > > count = 0 > loop: > ... > count += VF * UF > if count >= n goto scalar_loop else goto loop > > This could remove a lot of overflow checks and "urem"s from real code.This would overflow for say n = 255, VF = 2, char ? start = 0 loop: i = phi (start, next) next = + (i, VF) stop = >= (n, next) br stop, exit, loop exit> Also, we don't currently coalesce overflow checks, vector memchecks and trip count computation for adjacent and similar loops. For example: > > for (i = 0; i < n/2; ++i) > p[i] = q[i+1]; > for (i = n/2; i < n; ++i) > p[i] = q[i-1]; > > Really, these two loops should share a common preamble which checks between the range [0, n). Now, they have two preambles, one checking [0, n/2) and the other [n/2, n).You are right currently we only look at one loop at a time.
Hi Arnold, Thanks for the reply. Taking your example: This would overflow for say n = 255, VF = 2, char ?> start = 0 > loop: > i = phi (start, next) > next = + (i, VF) > stop = >= (n, next) > br stop, exit, loop > exitWhy do we have to use i8 for the induction variable type? If the original add is no-unsigned-wrap, then we could safely extend the induction variable to i32/i64 and have an accurate >= comparison. We couldn't do this with an i64 type because there'd be nothing to extend it to, but in practice people still write loops with 32-bit variables a lot of the time. Cheers, James On 30 August 2014 18:08, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:> Hi James, > > thank you for looking into this! > > > On Aug 30, 2014, at 4:47 AM, James Molloy <james at jamesmolloy.co.uk> > wrote: > > > > Hi Arnold, Nadav, > > > > I've been taking a look at the preamble and bailout tests created by the > loop vectorizer, and I can't help but feel it could be rather conservative. > I'm not a vectorization expert, so I apologise in advance if say something > obviously wrong... > > > > I'm looking in particular at the overflow check and the trip count > computation. From my reading, it goes something like: > > > > take the backedge taken count and add one -> Count > > emit code to check Count didn't overflow > > // pointer aliasing checks, if any > > calculate vector trip count = Count - (Count % Step) > > > > It seems to me that there should be cases when we don't need to check > for overflow. In a well-formed loop, which this should be at this point, > there is an increment of the indvar before the backedge. If this increment > is marked 'nuw', we should be guaranteed that we don't get an overflow when > calculating numBackedges + 1. > > > > > Lets say we have a loop that iterates 256 times (if n == 0): > > unsigned char i = n; > do { > > —i; > } while (i); > > > We need to guard against such cases (see also PR17288). > > However, I agree that by looking at the IR (in addition to the SCEV > expression for the back-edge) we should be able to do better in some cases. > > > > Also, many many loops don't have a single point-test for exit (x != 0). > Instead, they have a greater-than or less-than condition. If this is the > case, we should be able to elide all of our logic with Count and just count > down until the test is broken. For example: > > > > for (i = 0; i < n; ++i) > > ... > > > > -> > > > > count = 0 > > loop: > > ... > > count += VF * UF > > if count >= n goto scalar_loop else goto loop > > > > This could remove a lot of overflow checks and "urem"s from real code. > > This would overflow for say n = 255, VF = 2, char ? > > start = 0 > > loop: > i = phi (start, next) > next = + (i, VF) > stop = >= (n, next) > br stop, exit, loop > > exit > > > > Also, we don't currently coalesce overflow checks, vector memchecks and > trip count computation for adjacent and similar loops. For example: > > > > for (i = 0; i < n/2; ++i) > > p[i] = q[i+1]; > > for (i = n/2; i < n; ++i) > > p[i] = q[i-1]; > > > > Really, these two loops should share a common preamble which checks > between the range [0, n). Now, they have two preambles, one checking [0, > n/2) and the other [n/2, n). > > > You are right currently we only look at one loop at a time. > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140830/1681cdc9/attachment.html>