Adhemerval Zanella via llvm-dev
2018-Oct-02 18:00 UTC
[llvm-dev] IC value with reduction on LoopVectorizationCostModel::selectInterleaveCount
Hi all,

I am trying to understand on what basis the loop vectorizer's Interleave Count (IC) selection for loops with reductions was added:

lib/Transforms/Vectorize/LoopVectorize.cpp:

    // Interleave if we vectorized this loop and there is a reduction that could
    // benefit from interleaving.
    if (VF > 1 && !Legal->getReductionVars()->empty()) {
      LLVM_DEBUG(dbgs() << "LV: Interleaving because of reductions.\n");
      return IC;
    }

The IC in this context will be within [1, MaxInterleaveCount], and MaxInterleaveCount is set based on target defaults.

The issue is with loops whose trip count can't be inferred at compile time, where vectorization is beneficial even for small element counts: the loop vectorizer will use the architecture-defined IC. If the arch-defined IC is 2 or higher, the vectorized code path won't be used for element counts less than IC*VF.

For instance, take this code snippet:

---
#include <float.h>

struct vec
{
  float x, y;
};

struct polyshape
{
  int count;
  struct vec v[8];
};

float foo (const polyshape *poly, float x1, float x2, float y1, float y2)
{
  float si = FLT_MAX;
  for (int j=0; j<poly->count; j++)
    {
      float sij = x1 * (poly->v[j].x + x2) + y1 * (poly->v[j].y - y2);
      if (sij < si)
        si = sij;
    }
  return si;
}
---

When building for aarch64-linux-gnu (which has a default IC of 2), the loop-vectorizer debug output shows:

    LV: Interleaving because of reductions.
    LV: Found a vectorizable loop (4) in test.cc
    LV: Interleave Count is 2
    Setting best plan to VF=4, UF=2
    LV: Interleaving disabled by the pass manager

And then the vectorized code path won't be used for polyshape->count between 4 and 8.

Is this optimization for the reduction case indeed beneficial in all cases? Can't the rest of LoopVectorizationCostModel::selectInterleaveCount infer a better IC for reduction cases?
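To make the IC*VF threshold concrete, here is a minimal sketch of the arithmetic only. The helper names are hypothetical and are not LLVM APIs. It assumes the vector loop body consumes VF*IC elements per iteration and that all leftover elements go to the scalar remainder loop, and it ignores runtime checks and any required scalar epilogue.

```cpp
// Hypothetical helper (not an LLVM API): smallest trip count for which the
// interleaved vector loop body executes at least once.
constexpr unsigned minVectorTripCount(unsigned VF, unsigned IC) {
  return VF * IC;
}

// Hypothetical helper: number of scalar iterations covered by the vector
// loop body; the remaining (Count % (VF * IC)) run in the scalar loop.
constexpr unsigned iterationsInVectorLoop(unsigned Count, unsigned VF,
                                          unsigned IC) {
  return Count - Count % (VF * IC);
}

// VF=4, IC=2 (the plan chosen in the debug output above): the vector body
// needs at least 8 elements, so a count of 7 runs entirely in scalar code.
static_assert(minVectorTripCount(4, 2) == 8, "threshold is VF * IC");
static_assert(iterationsInVectorLoop(7, 4, 2) == 0, "all scalar");

// VF=4, IC=1: the same count of 7 would process 4 elements in the vector body.
static_assert(iterationsInVectorLoop(7, 4, 1) == 4, "one vector iteration");
```

Under these assumptions, with v[8] bounding the count, only count == 8 would reach the vector body when IC is 2, whereas every count from 4 to 8 would reach it with IC of 1.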