Oleg Ranevskyy via llvm-dev
2016-Apr-23 13:48 UTC
[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code
Hi Sanjoy, Thank you for looking into this! Yes, your patch does fix my larger test case too. My algorithm gets double performance improvement with the patch, as the loop now has a smaller instruction set and succeeds to unroll w/o any extra #pragma's. I also ran the LLVM tests against the patch. There are 6 new failures: Analysis/LoopAccessAnalysis/number-of-memchecks.ll Analysis/LoopAccessAnalysis/reverse-memcheck-bounds.ll Analysis/ScalarEvolution/flags-from-poison.ll Analysis/ScalarEvolution/nsw-offset-assume.ll Analysis/ScalarEvolution/nsw-offset.ll Analysis/ScalarEvolution/nsw.ll I haven't inspected these failures in detail yet, but it's likely the tests merely need to be adjusted to handle the new no-wrap flags the patch introduced. I will double-check this soon. Kind regards, Oleg On Fri, Apr 22, 2016 at 11:02 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:> Hi Oleg, > > I think the problem here is that SCEV forgets to propagate no-wrap > flags when folding "{S,+,X}+T ==> {S+T,+,X}". > > I haven't carefully thought about the implications and whether the > change is even correct, but the appended patch fixes the test case > you've attached. I'll give it some more thought and if it holds up > I'll check it in in the next few days. Meanwhile if you have a larger > test case that you extracted indvar_test.cpp from, I'd be interested > in hearing if this change works there as well. > > diff --git a/lib/Analysis/ScalarEvolution.cpp > b/lib/Analysis/ScalarEvolution.cpp > index 39ced1e..2e87902 100644 > --- a/lib/Analysis/ScalarEvolution.cpp > +++ b/lib/Analysis/ScalarEvolution.cpp > @@ -2274,19 +2274,19 @@ const SCEV > *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops, > } > > // If we found some loop invariants, fold them into the recurrence. > if (!LIOps.empty()) { > // NLI + LI + {Start,+,Step} --> NLI + {LI+Start,+,Step} > LIOps.push_back(AddRec->getStart()); > > SmallVector<const SCEV *, 4> AddRecOps(AddRec->op_begin(), > AddRec->op_end()); > - AddRecOps[0] = getAddExpr(LIOps); > + AddRecOps[0] = getAddExpr(LIOps, Flags); > > // Build the new addrec. Propagate the NUW and NSW flags if both the > // outer add and the inner addrec are guaranteed to have no > overflow. > // Always propagate NW. > Flags = AddRec->getNoWrapFlags(setFlags(Flags, SCEV::FlagNW)); > const SCEV *NewRec = getAddRecExpr(AddRecOps, AddRecLoop, Flags); > > // If all of the other operands were loop invariant, we are done. > if (Ops.size() == 1) return NewRec; > > Thanks! > -- Sanjoy >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160423/178fcd8d/attachment.html>
Oleg Ranevskyy via llvm-dev
2016-Apr-28 20:51 UTC
[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code
Hi Sanjoy, Attached is the patch that fixes the LLVM test regressions caused by this fix. It just adds <nsw> entries the fix has introduced. Would you have a couple of minutes to check it, please? I would also like to share two differences in the opt logs I noticed for these tests. *1.* The fixed LLVM might split sext of a sum into a sum of two sext's while doing SCEV analysis. This does not affect the final IR however - it is the same for all the patched tests with and w/o your fix. E.g., this can be seen on Analysis/ScalarEvolution/flags-from-poison.ll. LLVM w/o the fix prints this: -------- Printing analysis 'Scalar Evolution Analysis' for function 'test-sub-nsw': Classifying expressions for: @test-sub-nsw ... %index64 = sext i32 %index32 to i64 --> {(sext i32 ((-1 * %halfsub)<nsw> + %start) to i64),+,1}<nsw><%loop> U: [-2147483648,6442450943) S: [-2147483648,6442450943) Exits: ((zext i32 (-1 + (-1 * %start) + %numIterations) to i64) + (sext i32 ((-1 * %halfsub)<nsw> + %start) to i64)) -------- The fixed one prints a sum of sext's: -------- %index64 = sext i32 %index32 to i64 --> {((sext i32 (-1 * %halfsub)<nsw> to i64) + (sext i32 %start to i64))<nsw>,+,1}<nsw><%loop> U: [-3221225471,7516192767) S: [-3221225471,7516192767) Exits: ((zext i32 (-1 + (-1 * %start) + %numIterations) to i64) + (sext i32 (-1 * %halfsub)<nsw> to i64) + (sext i32 %start to i64)) -------- *2.* The fixed LLVM seems to be more aggressive in extracting constants from sext. E.g., the same test flags-from-poison.ll, the test-sub-with-add function. W/o the fix: -------- %ptr = getelementptr inbounds float, float* %input, i32 %index32 --> {((4 * (sext i32 (1 + (-1 * %offset)) to i64)) + %input),+,4}<nsw><%loop> U: full-set S: full-set Exits: ((4 * (zext i32 (-1 + %numIterations) to i64)) + (4 * (sext i32 (1 + (-1 * %offset)) to i64)) + %input) -------- With the fix: -------- %ptr = getelementptr inbounds float, float* %input, i32 %index32 --> {(4 + (4 * (sext i32 (-1 * %offset) to i64)) + %input),+,4}<nsw><%loop> U: full-set S: full-set Exits: (4 + (4 * (zext i32 (-1 + %numIterations) to i64)) + (4 * (sext i32 (-1 * %offset) to i64)) + %input) -------- Hope this helps. On Sat, Apr 23, 2016 at 4:48 PM, Oleg Ranevskyy <llvm.mail.list at gmail.com> wrote:> Hi Sanjoy, > > Thank you for looking into this! > Yes, your patch does fix my larger test case too. My algorithm gets double > performance improvement with the patch, as the loop now has a smaller > instruction set and succeeds to unroll w/o any extra #pragma's. > > I also ran the LLVM tests against the patch. There are 6 new failures: > Analysis/LoopAccessAnalysis/number-of-memchecks.ll > Analysis/LoopAccessAnalysis/reverse-memcheck-bounds.ll > Analysis/ScalarEvolution/flags-from-poison.ll > Analysis/ScalarEvolution/nsw-offset-assume.ll > Analysis/ScalarEvolution/nsw-offset.ll > Analysis/ScalarEvolution/nsw.ll > > I haven't inspected these failures in detail yet, but it's likely the > tests merely need to be adjusted to handle the new no-wrap flags the patch > introduced. I will double-check this soon. > > Kind regards, > Oleg > > On Fri, Apr 22, 2016 at 11:02 AM, Sanjoy Das < > sanjoy at playingwithpointers.com> wrote: > >> Hi Oleg, >> >> I think the problem here is that SCEV forgets to propagate no-wrap >> flags when folding "{S,+,X}+T ==> {S+T,+,X}". >> >> I haven't carefully thought about the implications and whether the >> change is even correct, but the appended patch fixes the test case >> you've attached. I'll give it some more thought and if it holds up >> I'll check it in in the next few days. Meanwhile if you have a larger >> test case that you extracted indvar_test.cpp from, I'd be interested >> in hearing if this change works there as well. >> >> diff --git a/lib/Analysis/ScalarEvolution.cpp >> b/lib/Analysis/ScalarEvolution.cpp >> index 39ced1e..2e87902 100644 >> --- a/lib/Analysis/ScalarEvolution.cpp >> +++ b/lib/Analysis/ScalarEvolution.cpp >> @@ -2274,19 +2274,19 @@ const SCEV >> *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops, >> } >> >> // If we found some loop invariants, fold them into the recurrence. >> if (!LIOps.empty()) { >> // NLI + LI + {Start,+,Step} --> NLI + {LI+Start,+,Step} >> LIOps.push_back(AddRec->getStart()); >> >> SmallVector<const SCEV *, 4> AddRecOps(AddRec->op_begin(), >> AddRec->op_end()); >> - AddRecOps[0] = getAddExpr(LIOps); >> + AddRecOps[0] = getAddExpr(LIOps, Flags); >> >> // Build the new addrec. Propagate the NUW and NSW flags if both >> the >> // outer add and the inner addrec are guaranteed to have no >> overflow. >> // Always propagate NW. >> Flags = AddRec->getNoWrapFlags(setFlags(Flags, SCEV::FlagNW)); >> const SCEV *NewRec = getAddRecExpr(AddRecOps, AddRecLoop, Flags); >> >> // If all of the other operands were loop invariant, we are done. >> if (Ops.size() == 1) return NewRec; >> >> Thanks! >> -- Sanjoy >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160428/f41485df/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: scev-tests.patch Type: application/octet-stream Size: 8858 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160428/f41485df/attachment.obj>
Oleg Ranevskyy via llvm-dev
2016-May-06 23:47 UTC
[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code
Hi Sanjoy, Could you let me know if you have had a chance to check the patch and the LLVM test corrections, please? Thanks! On Thu, Apr 28, 2016 at 11:51 PM, Oleg Ranevskyy <llvm.mail.list at gmail.com> wrote:> Hi Sanjoy, > > Attached is the patch that fixes the LLVM test regressions caused by this > fix. It just adds <nsw> entries the fix has introduced. > > Would you have a couple of minutes to check it, please? > > I would also like to share two differences in the opt logs I noticed for > these tests. > > *1.* The fixed LLVM might split sext of a sum into a sum of two sext's > while doing SCEV analysis. This does not affect the final IR however - it > is the same for all the patched tests with and w/o your fix. E.g., this can > be seen on Analysis/ScalarEvolution/flags-from-poison.ll. LLVM w/o the fix > prints this: > > -------- > Printing analysis 'Scalar Evolution Analysis' for function 'test-sub-nsw': > Classifying expressions for: @test-sub-nsw > ... > %index64 = sext i32 %index32 to i64 > --> {(sext i32 ((-1 * %halfsub)<nsw> + %start) to i64),+,1}<nsw><%loop> > U: [-2147483648,6442450943) S: [-2147483648,6442450943) Exits: ((zext i32 > (-1 + (-1 * %start) + %numIterations) to i64) + (sext i32 ((-1 * > %halfsub)<nsw> + %start) to i64)) > -------- > > The fixed one prints a sum of sext's: > > -------- > %index64 = sext i32 %index32 to i64 > --> {((sext i32 (-1 * %halfsub)<nsw> to i64) + (sext i32 %start to > i64))<nsw>,+,1}<nsw><%loop> U: [-3221225471,7516192767) S: > [-3221225471,7516192767) Exits: ((zext i32 (-1 + (-1 * %start) + > %numIterations) to i64) + (sext i32 (-1 * %halfsub)<nsw> to i64) + (sext > i32 %start to i64)) > -------- > > *2.* The fixed LLVM seems to be more aggressive in extracting constants > from sext. E.g., the same test flags-from-poison.ll, the test-sub-with-add > function. W/o the fix: > > -------- > %ptr = getelementptr inbounds float, float* %input, i32 %index32 > --> {((4 * (sext i32 (1 + (-1 * %offset)) to i64)) + > %input),+,4}<nsw><%loop> U: full-set S: full-set Exits: ((4 * (zext i32 > (-1 + %numIterations) to i64)) + (4 * (sext i32 (1 + (-1 * %offset)) to > i64)) + %input) > -------- > > With the fix: > > -------- > %ptr = getelementptr inbounds float, float* %input, i32 %index32 > --> {(4 + (4 * (sext i32 (-1 * %offset) to i64)) + > %input),+,4}<nsw><%loop> U: full-set S: full-set Exits: (4 + (4 * (zext > i32 (-1 + %numIterations) to i64)) + (4 * (sext i32 (-1 * %offset) to i64)) > + %input) > -------- > > Hope this helps. > > On Sat, Apr 23, 2016 at 4:48 PM, Oleg Ranevskyy <llvm.mail.list at gmail.com> > wrote: > >> Hi Sanjoy, >> >> Thank you for looking into this! >> Yes, your patch does fix my larger test case too. My algorithm gets >> double performance improvement with the patch, as the loop now has a >> smaller instruction set and succeeds to unroll w/o any extra #pragma's. >> >> I also ran the LLVM tests against the patch. There are 6 new failures: >> Analysis/LoopAccessAnalysis/number-of-memchecks.ll >> Analysis/LoopAccessAnalysis/reverse-memcheck-bounds.ll >> Analysis/ScalarEvolution/flags-from-poison.ll >> Analysis/ScalarEvolution/nsw-offset-assume.ll >> Analysis/ScalarEvolution/nsw-offset.ll >> Analysis/ScalarEvolution/nsw.ll >> >> I haven't inspected these failures in detail yet, but it's likely the >> tests merely need to be adjusted to handle the new no-wrap flags the patch >> introduced. I will double-check this soon. >> >> Kind regards, >> Oleg >> >> On Fri, Apr 22, 2016 at 11:02 AM, Sanjoy Das < >> sanjoy at playingwithpointers.com> wrote: >> >>> Hi Oleg, >>> >>> I think the problem here is that SCEV forgets to propagate no-wrap >>> flags when folding "{S,+,X}+T ==> {S+T,+,X}". >>> >>> I haven't carefully thought about the implications and whether the >>> change is even correct, but the appended patch fixes the test case >>> you've attached. I'll give it some more thought and if it holds up >>> I'll check it in in the next few days. Meanwhile if you have a larger >>> test case that you extracted indvar_test.cpp from, I'd be interested >>> in hearing if this change works there as well. >>> >>> diff --git a/lib/Analysis/ScalarEvolution.cpp >>> b/lib/Analysis/ScalarEvolution.cpp >>> index 39ced1e..2e87902 100644 >>> --- a/lib/Analysis/ScalarEvolution.cpp >>> +++ b/lib/Analysis/ScalarEvolution.cpp >>> @@ -2274,19 +2274,19 @@ const SCEV >>> *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops, >>> } >>> >>> // If we found some loop invariants, fold them into the recurrence. >>> if (!LIOps.empty()) { >>> // NLI + LI + {Start,+,Step} --> NLI + {LI+Start,+,Step} >>> LIOps.push_back(AddRec->getStart()); >>> >>> SmallVector<const SCEV *, 4> AddRecOps(AddRec->op_begin(), >>> AddRec->op_end()); >>> - AddRecOps[0] = getAddExpr(LIOps); >>> + AddRecOps[0] = getAddExpr(LIOps, Flags); >>> >>> // Build the new addrec. Propagate the NUW and NSW flags if both >>> the >>> // outer add and the inner addrec are guaranteed to have no >>> overflow. >>> // Always propagate NW. >>> Flags = AddRec->getNoWrapFlags(setFlags(Flags, SCEV::FlagNW)); >>> const SCEV *NewRec = getAddRecExpr(AddRecOps, AddRecLoop, Flags); >>> >>> // If all of the other operands were loop invariant, we are done. >>> if (Ops.size() == 1) return NewRec; >>> >>> Thanks! >>> -- Sanjoy >>> >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160507/6f25149b/attachment.html>