Nemanja Ivanovic via llvm-dev
2019-Mar-22 18:50 UTC
[llvm-dev] Is SCEV unable to determine obvious max trip count?
Given a simple test where the maximum trip count is quite obvious, we end up producing a vectorized loop (or excessively complex control flow) seemingly because we are unable to determine the maximum trip count of the loop. Here's an example: #ifndef _START #define _START 3 #endif unsigned long test(unsigned long *a, unsigned long b) { for (unsigned long i = _START; i < (b & 7); i++) a[i] = b; return b; } If we compile that with -D_START=0, the loop is fully unrolled with a comparison and exit if the result of the and is equal to any of the 7 possible values at each unrolled iteration. However, if we do not have that define (i.e. loop doesn't start from zero), we will vectorize the loop. There's even a "min.iters" check that is very obviously redundant: %and = and i64 %b, 7 ... %0 = add nsw i64 %and, -3 %min.iters.check = icmp ult i64 %0, 24 The constant here will be different on different targets, but redundant nonetheless. Does anyone think this is important to fix this or have any ideas on how to improve the handling here? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190322/0f4592de/attachment.html>
Eli Friedman via llvm-dev
2019-Mar-25 18:44 UTC
[llvm-dev] [EXT] Is SCEV unable to determine obvious max trip count?
If you look at the SCEV output (opt -analyze -scalar-evolution) on the optimized loop without vectorization/unrolling, it looks something like this: Determining loop execution counts for: @test Loop %for.body: backedge-taken count is (-4 + (zext i3 (trunc i64 %b to i3) to i64))<nsw> Loop %for.body: max backedge-taken count is -2 Loop %for.body: Predicated backedge-taken count is (-4 + (zext i3 (trunc i64 %b to i3) to i64))<nsw> If you exclude the guard outside the loop, the max backedge-taken count is completely unknown: the trip count can be -1UL, -2UL, -3UL, or -4UL (i.e. numbers close to ULONG_MAX). But there’s important information outside the loop: there’s a guard “icmp ugt i64 %and, 3”, which excludes all those cases. SCEV currently has some very limited support for reducing the max backedge-taken count based on conditions outside the loop; see ScalarEvolution::howFarToZero, in particular, the bit that calls isLoopEntryGuardedByCond. But that support isn’t very complete; it could definitely be improved. As far as I know, nobody is working on this at the moment. -Eli From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Nemanja Ivanovic via llvm-dev Sent: Friday, March 22, 2019 11:51 AM To: llvm-dev <llvm-dev at lists.llvm.org> Subject: [EXT] [llvm-dev] Is SCEV unable to determine obvious max trip count? Given a simple test where the maximum trip count is quite obvious, we end up producing a vectorized loop (or excessively complex control flow) seemingly because we are unable to determine the maximum trip count of the loop. Here's an example: #ifndef _START #define _START 3 #endif unsigned long test(unsigned long *a, unsigned long b) { for (unsigned long i = _START; i < (b & 7); i++) a[i] = b; return b; } If we compile that with -D_START=0, the loop is fully unrolled with a comparison and exit if the result of the and is equal to any of the 7 possible values at each unrolled iteration. However, if we do not have that define (i.e. loop doesn't start from zero), we will vectorize the loop. There's even a "min.iters" check that is very obviously redundant: %and = and i64 %b, 7 ... %0 = add nsw i64 %and, -3 %min.iters.check = icmp ult i64 %0, 24 The constant here will be different on different targets, but redundant nonetheless. Does anyone think this is important to fix this or have any ideas on how to improve the handling here? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190325/170a2a13/attachment.html>