Oleg Ranevskyy via llvm-dev
2016-Apr-20  23:10 UTC
[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code
Hi,
Would you be able to kindly check and assist with the IndVarSimplify / SCEV
problem I got in the latest LLVM, please?
Sometimes IndVarSimplify may not eliminate narrow IV's when there actually
exists such a possibility. This may affect other LLVM passes and result in
inefficient code. The reproducing test 'indvar_test.cpp' is attached.
The problem is with the second 'for' loop that accesses array elements
with
different indexes on each iteration.
The latest LLVM fails to reuse array element values from previous
iterations and generates an unnecessary GEP. The generated IR is shown in
the attached file 'bad.ir'.
This happens because IndVarSimplify fails to eliminate '%idxprom7' and
'%idxprom10'.
The clang command line we use:
    clang++ -mllvm -debug -S -emit-llvm -O3 --target=aarch64-linux-elf
indvar_test.cpp -o bad.ir
I found that 'WidenIV::widenIVUse' (IndVarSimplify.cpp) may fail to
widen
narrow IV uses.
When the function gets a NarrowUse such as '{(-2 +
%inc.lcssa),+,1}<nsw><%for.body3>', it first tries to get a wide
recurrence
for it via the 'getWideRecurrence' call.
'getWideRecurrence' returns recurrence like this: '{(sext i32 (-2 +
%inc.lcssa) to i64),+,1}<nsw><%for.body3>', which is fine by
itself.
Then a wide use operation is generated by 'cloneIVUser'. The generated
wide
use is evaluated to '{(-2 + (sext i32 %inc.lcssa to
i64))<nsw>,+,1}<nsw><%for.body3>', which is different from
'getWideRecurrence' result (please note the position of -2).
'cloneIVUser'
sees the difference and returns nullptr.
I attached a test patch 'indvar.patch', which is not correct for all
cases,
but it fixes the specific 'indvar_test.cpp' scenario to demonstrate the
efficient code that could have been generated (good.ir).
It transforms expressions like '(sext i32 (-2 + %inc.lcssa) to i64)'
into
'-2 + (sext i32 %inc.lcssa to i64)' making expressions comparison
succeed.
IV's are successfully eliminated, which can be seen in the '-mllvm
-debug'
output.
The problem with the patch is that it uses wrong extend logic for the
'-2'
operand. It must be sign or zero extended depending on the context.
Could you check and confirm the problem, and give any hints how this might
be fixed properly, please?
Thank you.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160421/c1d1ee9f/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bad.ir
Type: application/octet-stream
Size: 2791 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160421/c1d1ee9f/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: good.ir
Type: application/octet-stream
Size: 1929 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160421/c1d1ee9f/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: indvar.patch
Type: application/octet-stream
Size: 989 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160421/c1d1ee9f/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: indvar_test.cpp
Type: text/x-c++src
Size: 229 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160421/c1d1ee9f/attachment.cpp>
Sanjoy Das via llvm-dev
2016-Apr-22  08:02 UTC
[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code
Hi Oleg,
I think the problem here is that SCEV forgets to propagate no-wrap
flags when folding "{S,+,X}+T ==> {S+T,+,X}".
I haven't carefully thought about the implications and whether the
change is even correct, but the appended patch fixes the test case
you've attached.  I'll give it some more thought and if it holds up
I'll check it in in the next few days.  Meanwhile if you have a larger
test case that you extracted indvar_test.cpp from, I'd be interested
in hearing if this change works there as well.
diff --git a/lib/Analysis/ScalarEvolution.cpp b/lib/Analysis/ScalarEvolution.cpp
index 39ced1e..2e87902 100644
--- a/lib/Analysis/ScalarEvolution.cpp
+++ b/lib/Analysis/ScalarEvolution.cpp
@@ -2274,19 +2274,19 @@ const SCEV
*ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops,
        }
      // If we found some loop invariants, fold them into the recurrence.
      if (!LIOps.empty()) {
        //  NLI + LI + {Start,+,Step}  -->  NLI + {LI+Start,+,Step}
        LIOps.push_back(AddRec->getStart());
        SmallVector<const SCEV *, 4> AddRecOps(AddRec->op_begin(),
                                               AddRec->op_end());
-      AddRecOps[0] = getAddExpr(LIOps);
+      AddRecOps[0] = getAddExpr(LIOps, Flags);
        // Build the new addrec. Propagate the NUW and NSW flags if both the
        // outer add and the inner addrec are guaranteed to have no overflow.
        // Always propagate NW.
        Flags = AddRec->getNoWrapFlags(setFlags(Flags, SCEV::FlagNW));
        const SCEV *NewRec = getAddRecExpr(AddRecOps, AddRecLoop, Flags);
        // If all of the other operands were loop invariant, we are done.
        if (Ops.size() == 1) return NewRec;
Thanks!
-- Sanjoy
Oleg Ranevskyy via llvm-dev
2016-Apr-23  13:48 UTC
[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code
Hi Sanjoy,
Thank you for looking into this!
Yes, your patch does fix my larger test case too. My algorithm gets double
performance improvement with the patch, as the loop now has a smaller
instruction set and succeeds to unroll w/o any extra #pragma's.
I also ran the LLVM tests against the patch. There are 6 new failures:
    Analysis/LoopAccessAnalysis/number-of-memchecks.ll
    Analysis/LoopAccessAnalysis/reverse-memcheck-bounds.ll
    Analysis/ScalarEvolution/flags-from-poison.ll
    Analysis/ScalarEvolution/nsw-offset-assume.ll
    Analysis/ScalarEvolution/nsw-offset.ll
    Analysis/ScalarEvolution/nsw.ll
I haven't inspected these failures in detail yet, but it's likely the
tests
merely need to be adjusted to handle the new no-wrap flags the patch
introduced. I will double-check this soon.
Kind regards,
Oleg
On Fri, Apr 22, 2016 at 11:02 AM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
> Hi Oleg,
>
> I think the problem here is that SCEV forgets to propagate no-wrap
> flags when folding "{S,+,X}+T ==> {S+T,+,X}".
>
> I haven't carefully thought about the implications and whether the
> change is even correct, but the appended patch fixes the test case
> you've attached.  I'll give it some more thought and if it holds up
> I'll check it in in the next few days.  Meanwhile if you have a larger
> test case that you extracted indvar_test.cpp from, I'd be interested
> in hearing if this change works there as well.
>
> diff --git a/lib/Analysis/ScalarEvolution.cpp
> b/lib/Analysis/ScalarEvolution.cpp
> index 39ced1e..2e87902 100644
> --- a/lib/Analysis/ScalarEvolution.cpp
> +++ b/lib/Analysis/ScalarEvolution.cpp
> @@ -2274,19 +2274,19 @@ const SCEV
> *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops,
>        }
>
>      // If we found some loop invariants, fold them into the recurrence.
>      if (!LIOps.empty()) {
>        //  NLI + LI + {Start,+,Step}  -->  NLI + {LI+Start,+,Step}
>        LIOps.push_back(AddRec->getStart());
>
>        SmallVector<const SCEV *, 4> AddRecOps(AddRec->op_begin(),
>                                               AddRec->op_end());
> -      AddRecOps[0] = getAddExpr(LIOps);
> +      AddRecOps[0] = getAddExpr(LIOps, Flags);
>
>        // Build the new addrec. Propagate the NUW and NSW flags if both the
>        // outer add and the inner addrec are guaranteed to have no
> overflow.
>        // Always propagate NW.
>        Flags = AddRec->getNoWrapFlags(setFlags(Flags, SCEV::FlagNW));
>        const SCEV *NewRec = getAddRecExpr(AddRecOps, AddRecLoop, Flags);
>
>        // If all of the other operands were loop invariant, we are done.
>        if (Ops.size() == 1) return NewRec;
>
> Thanks!
> -- Sanjoy
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160423/178fcd8d/attachment.html>
Apparently Analagous Threads
- [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code
- [LLVMdev] ScalarEvolution Patch
- [LLVMdev] SCEV getMulExpr() not propagating Wrap flags
- [LLVMdev] SCEV getMulExpr() not propagating Wrap flags
- [LLVMdev] SCEV getMulExpr() not propagating Wrap flags