Oleg Ranevskyy via llvm-dev
2016-Apr-20 23:10 UTC
[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code
Hi, Would you be able to kindly check and assist with the IndVarSimplify / SCEV problem I got in the latest LLVM, please? Sometimes IndVarSimplify may not eliminate narrow IV's when there actually exists such a possibility. This may affect other LLVM passes and result in inefficient code. The reproducing test 'indvar_test.cpp' is attached. The problem is with the second 'for' loop that accesses array elements with different indexes on each iteration. The latest LLVM fails to reuse array element values from previous iterations and generates an unnecessary GEP. The generated IR is shown in the attached file 'bad.ir'. This happens because IndVarSimplify fails to eliminate '%idxprom7' and '%idxprom10'. The clang command line we use: clang++ -mllvm -debug -S -emit-llvm -O3 --target=aarch64-linux-elf indvar_test.cpp -o bad.ir I found that 'WidenIV::widenIVUse' (IndVarSimplify.cpp) may fail to widen narrow IV uses. When the function gets a NarrowUse such as '{(-2 + %inc.lcssa),+,1}<nsw><%for.body3>', it first tries to get a wide recurrence for it via the 'getWideRecurrence' call. 'getWideRecurrence' returns recurrence like this: '{(sext i32 (-2 + %inc.lcssa) to i64),+,1}<nsw><%for.body3>', which is fine by itself. Then a wide use operation is generated by 'cloneIVUser'. The generated wide use is evaluated to '{(-2 + (sext i32 %inc.lcssa to i64))<nsw>,+,1}<nsw><%for.body3>', which is different from 'getWideRecurrence' result (please note the position of -2). 'cloneIVUser' sees the difference and returns nullptr. I attached a test patch 'indvar.patch', which is not correct for all cases, but it fixes the specific 'indvar_test.cpp' scenario to demonstrate the efficient code that could have been generated (good.ir). It transforms expressions like '(sext i32 (-2 + %inc.lcssa) to i64)' into '-2 + (sext i32 %inc.lcssa to i64)' making expressions comparison succeed. IV's are successfully eliminated, which can be seen in the '-mllvm -debug' output. The problem with the patch is that it uses wrong extend logic for the '-2' operand. It must be sign or zero extended depending on the context. Could you check and confirm the problem, and give any hints how this might be fixed properly, please? Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160421/c1d1ee9f/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: bad.ir Type: application/octet-stream Size: 2791 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160421/c1d1ee9f/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: good.ir Type: application/octet-stream Size: 1929 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160421/c1d1ee9f/attachment-0001.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: indvar.patch Type: application/octet-stream Size: 989 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160421/c1d1ee9f/attachment-0002.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: indvar_test.cpp Type: text/x-c++src Size: 229 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160421/c1d1ee9f/attachment.cpp>
Sanjoy Das via llvm-dev
2016-Apr-22 08:02 UTC
[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code
Hi Oleg, I think the problem here is that SCEV forgets to propagate no-wrap flags when folding "{S,+,X}+T ==> {S+T,+,X}". I haven't carefully thought about the implications and whether the change is even correct, but the appended patch fixes the test case you've attached. I'll give it some more thought and if it holds up I'll check it in in the next few days. Meanwhile if you have a larger test case that you extracted indvar_test.cpp from, I'd be interested in hearing if this change works there as well. diff --git a/lib/Analysis/ScalarEvolution.cpp b/lib/Analysis/ScalarEvolution.cpp index 39ced1e..2e87902 100644 --- a/lib/Analysis/ScalarEvolution.cpp +++ b/lib/Analysis/ScalarEvolution.cpp @@ -2274,19 +2274,19 @@ const SCEV *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops, } // If we found some loop invariants, fold them into the recurrence. if (!LIOps.empty()) { // NLI + LI + {Start,+,Step} --> NLI + {LI+Start,+,Step} LIOps.push_back(AddRec->getStart()); SmallVector<const SCEV *, 4> AddRecOps(AddRec->op_begin(), AddRec->op_end()); - AddRecOps[0] = getAddExpr(LIOps); + AddRecOps[0] = getAddExpr(LIOps, Flags); // Build the new addrec. Propagate the NUW and NSW flags if both the // outer add and the inner addrec are guaranteed to have no overflow. // Always propagate NW. Flags = AddRec->getNoWrapFlags(setFlags(Flags, SCEV::FlagNW)); const SCEV *NewRec = getAddRecExpr(AddRecOps, AddRecLoop, Flags); // If all of the other operands were loop invariant, we are done. if (Ops.size() == 1) return NewRec; Thanks! -- Sanjoy
Oleg Ranevskyy via llvm-dev
2016-Apr-23 13:48 UTC
[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code
Hi Sanjoy, Thank you for looking into this! Yes, your patch does fix my larger test case too. My algorithm gets double performance improvement with the patch, as the loop now has a smaller instruction set and succeeds to unroll w/o any extra #pragma's. I also ran the LLVM tests against the patch. There are 6 new failures: Analysis/LoopAccessAnalysis/number-of-memchecks.ll Analysis/LoopAccessAnalysis/reverse-memcheck-bounds.ll Analysis/ScalarEvolution/flags-from-poison.ll Analysis/ScalarEvolution/nsw-offset-assume.ll Analysis/ScalarEvolution/nsw-offset.ll Analysis/ScalarEvolution/nsw.ll I haven't inspected these failures in detail yet, but it's likely the tests merely need to be adjusted to handle the new no-wrap flags the patch introduced. I will double-check this soon. Kind regards, Oleg On Fri, Apr 22, 2016 at 11:02 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:> Hi Oleg, > > I think the problem here is that SCEV forgets to propagate no-wrap > flags when folding "{S,+,X}+T ==> {S+T,+,X}". > > I haven't carefully thought about the implications and whether the > change is even correct, but the appended patch fixes the test case > you've attached. I'll give it some more thought and if it holds up > I'll check it in in the next few days. Meanwhile if you have a larger > test case that you extracted indvar_test.cpp from, I'd be interested > in hearing if this change works there as well. > > diff --git a/lib/Analysis/ScalarEvolution.cpp > b/lib/Analysis/ScalarEvolution.cpp > index 39ced1e..2e87902 100644 > --- a/lib/Analysis/ScalarEvolution.cpp > +++ b/lib/Analysis/ScalarEvolution.cpp > @@ -2274,19 +2274,19 @@ const SCEV > *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops, > } > > // If we found some loop invariants, fold them into the recurrence. > if (!LIOps.empty()) { > // NLI + LI + {Start,+,Step} --> NLI + {LI+Start,+,Step} > LIOps.push_back(AddRec->getStart()); > > SmallVector<const SCEV *, 4> AddRecOps(AddRec->op_begin(), > AddRec->op_end()); > - AddRecOps[0] = getAddExpr(LIOps); > + AddRecOps[0] = getAddExpr(LIOps, Flags); > > // Build the new addrec. Propagate the NUW and NSW flags if both the > // outer add and the inner addrec are guaranteed to have no > overflow. > // Always propagate NW. > Flags = AddRec->getNoWrapFlags(setFlags(Flags, SCEV::FlagNW)); > const SCEV *NewRec = getAddRecExpr(AddRecOps, AddRecLoop, Flags); > > // If all of the other operands were loop invariant, we are done. > if (Ops.size() == 1) return NewRec; > > Thanks! > -- Sanjoy >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160423/178fcd8d/attachment.html>