thr3ads.net - llvm dev - [llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code [May 2016]

If this information is useful, please help other people find it:
Share via:

Oleg Ranevskyy via llvm-dev

2016-Apr-23 13:48 UTC

[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code

Hi Sanjoy,

Thank you for looking into this!
Yes, your patch does fix my larger test case too. My algorithm gets double
performance improvement with the patch, as the loop now has a smaller
instruction set and succeeds to unroll w/o any extra #pragma's.

I also ran the LLVM tests against the patch. There are 6 new failures:
    Analysis/LoopAccessAnalysis/number-of-memchecks.ll
    Analysis/LoopAccessAnalysis/reverse-memcheck-bounds.ll
    Analysis/ScalarEvolution/flags-from-poison.ll
    Analysis/ScalarEvolution/nsw-offset-assume.ll
    Analysis/ScalarEvolution/nsw-offset.ll
    Analysis/ScalarEvolution/nsw.ll

I haven't inspected these failures in detail yet, but it's likely the
tests
merely need to be adjusted to handle the new no-wrap flags the patch
introduced. I will double-check this soon.

Kind regards,
Oleg

On Fri, Apr 22, 2016 at 11:02 AM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
> Hi Oleg,
>
> I think the problem here is that SCEV forgets to propagate no-wrap
> flags when folding "{S,+,X}+T ==> {S+T,+,X}".
>
> I haven't carefully thought about the implications and whether the
> change is even correct, but the appended patch fixes the test case
> you've attached.  I'll give it some more thought and if it holds up
> I'll check it in in the next few days.  Meanwhile if you have a larger
> test case that you extracted indvar_test.cpp from, I'd be interested
> in hearing if this change works there as well.
>
> diff --git a/lib/Analysis/ScalarEvolution.cpp
> b/lib/Analysis/ScalarEvolution.cpp
> index 39ced1e..2e87902 100644
> --- a/lib/Analysis/ScalarEvolution.cpp
> +++ b/lib/Analysis/ScalarEvolution.cpp
> @@ -2274,19 +2274,19 @@ const SCEV
> *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops,
>        }
>
>      // If we found some loop invariants, fold them into the recurrence.
>      if (!LIOps.empty()) {
>        //  NLI + LI + {Start,+,Step}  -->  NLI + {LI+Start,+,Step}
>        LIOps.push_back(AddRec->getStart());
>
>        SmallVector<const SCEV *, 4> AddRecOps(AddRec->op_begin(),
>                                               AddRec->op_end());
> -      AddRecOps[0] = getAddExpr(LIOps);
> +      AddRecOps[0] = getAddExpr(LIOps, Flags);
>
>        // Build the new addrec. Propagate the NUW and NSW flags if both the
>        // outer add and the inner addrec are guaranteed to have no
> overflow.
>        // Always propagate NW.
>        Flags = AddRec->getNoWrapFlags(setFlags(Flags, SCEV::FlagNW));
>        const SCEV *NewRec = getAddRecExpr(AddRecOps, AddRecLoop, Flags);
>
>        // If all of the other operands were loop invariant, we are done.
>        if (Ops.size() == 1) return NewRec;
>
> Thanks!
> -- Sanjoy
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160423/178fcd8d/attachment.html>

Oleg Ranevskyy via llvm-dev

2016-Apr-28 20:51 UTC

head link

[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code

Hi Sanjoy,

Attached is the patch that fixes the LLVM test regressions caused by this
fix. It just adds <nsw> entries the fix has introduced.

Would you have a couple of minutes to check it, please?

I would also like to share two differences in the opt logs I noticed for
these tests.

*1.* The fixed LLVM might split sext of a sum into a sum of two sext's
while doing SCEV analysis. This does not affect the final IR however - it
is the same for all the patched tests with and w/o your fix. E.g., this can
be seen on Analysis/ScalarEvolution/flags-from-poison.ll. LLVM w/o the fix
prints this:

--------
Printing analysis 'Scalar Evolution Analysis' for function
'test-sub-nsw':
Classifying expressions for: @test-sub-nsw
  ...
  %index64 = sext i32 %index32 to i64
  -->  {(sext i32 ((-1 * %halfsub)<nsw> + %start) to
i64),+,1}<nsw><%loop>
U: [-2147483648,6442450943) S: [-2147483648,6442450943) Exits: ((zext i32
(-1 + (-1 * %start) + %numIterations) to i64) + (sext i32 ((-1 *
%halfsub)<nsw> + %start) to i64))
--------

The fixed one prints a sum of sext's:

--------
  %index64 = sext i32 %index32 to i64
  -->  {((sext i32 (-1 * %halfsub)<nsw> to i64) + (sext i32 %start to
i64))<nsw>,+,1}<nsw><%loop> U: [-3221225471,7516192767) S:
[-3221225471,7516192767) Exits: ((zext i32 (-1 + (-1 * %start) +
%numIterations) to i64) + (sext i32 (-1 * %halfsub)<nsw> to i64) + (sext
i32 %start to i64))
--------

*2.* The fixed LLVM seems to be more aggressive in extracting constants
from sext. E.g., the same test flags-from-poison.ll, the test-sub-with-add
function. W/o the fix:

--------
  %ptr = getelementptr inbounds float, float* %input, i32 %index32
  -->  {((4 * (sext i32 (1 + (-1 * %offset)) to i64)) +
%input),+,4}<nsw><%loop> U: full-set S: full-set Exits: ((4 * (zext
i32 (-1
+ %numIterations) to i64)) + (4 * (sext i32 (1 + (-1 * %offset)) to i64)) +
%input)
--------

With the fix:

--------
  %ptr = getelementptr inbounds float, float* %input, i32 %index32
  -->  {(4 + (4 * (sext i32 (-1 * %offset) to i64)) +
%input),+,4}<nsw><%loop> U: full-set S: full-set Exits: (4 + (4 *
(zext i32
(-1 + %numIterations) to i64)) + (4 * (sext i32 (-1 * %offset) to i64)) +
%input)
--------

Hope this helps.

On Sat, Apr 23, 2016 at 4:48 PM, Oleg Ranevskyy <llvm.mail.list at
gmail.com>
wrote:
> Hi Sanjoy,
>
> Thank you for looking into this!
> Yes, your patch does fix my larger test case too. My algorithm gets double
> performance improvement with the patch, as the loop now has a smaller
> instruction set and succeeds to unroll w/o any extra #pragma's.
>
> I also ran the LLVM tests against the patch. There are 6 new failures:
>     Analysis/LoopAccessAnalysis/number-of-memchecks.ll
>     Analysis/LoopAccessAnalysis/reverse-memcheck-bounds.ll
>     Analysis/ScalarEvolution/flags-from-poison.ll
>     Analysis/ScalarEvolution/nsw-offset-assume.ll
>     Analysis/ScalarEvolution/nsw-offset.ll
>     Analysis/ScalarEvolution/nsw.ll
>
> I haven't inspected these failures in detail yet, but it's likely
the
> tests merely need to be adjusted to handle the new no-wrap flags the patch
> introduced. I will double-check this soon.
>
> Kind regards,
> Oleg
>
> On Fri, Apr 22, 2016 at 11:02 AM, Sanjoy Das <
> sanjoy at playingwithpointers.com> wrote:
>
>> Hi Oleg,
>>
>> I think the problem here is that SCEV forgets to propagate no-wrap
>> flags when folding "{S,+,X}+T ==> {S+T,+,X}".
>>
>> I haven't carefully thought about the implications and whether the
>> change is even correct, but the appended patch fixes the test case
>> you've attached.  I'll give it some more thought and if it
holds up
>> I'll check it in in the next few days.  Meanwhile if you have a
larger
>> test case that you extracted indvar_test.cpp from, I'd be
interested
>> in hearing if this change works there as well.
>>
>> diff --git a/lib/Analysis/ScalarEvolution.cpp
>> b/lib/Analysis/ScalarEvolution.cpp
>> index 39ced1e..2e87902 100644
>> --- a/lib/Analysis/ScalarEvolution.cpp
>> +++ b/lib/Analysis/ScalarEvolution.cpp
>> @@ -2274,19 +2274,19 @@ const SCEV
>> *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *>
&Ops,
>>        }
>>
>>      // If we found some loop invariants, fold them into the
recurrence.
>>      if (!LIOps.empty()) {
>>        //  NLI + LI + {Start,+,Step}  -->  NLI + {LI+Start,+,Step}
>>        LIOps.push_back(AddRec->getStart());
>>
>>        SmallVector<const SCEV *, 4>
AddRecOps(AddRec->op_begin(),
>>                                               AddRec->op_end());
>> -      AddRecOps[0] = getAddExpr(LIOps);
>> +      AddRecOps[0] = getAddExpr(LIOps, Flags);
>>
>>        // Build the new addrec. Propagate the NUW and NSW flags if both
>> the
>>        // outer add and the inner addrec are guaranteed to have no
>> overflow.
>>        // Always propagate NW.
>>        Flags = AddRec->getNoWrapFlags(setFlags(Flags,
SCEV::FlagNW));
>>        const SCEV *NewRec = getAddRecExpr(AddRecOps, AddRecLoop,
Flags);
>>
>>        // If all of the other operands were loop invariant, we are
done.
>>        if (Ops.size() == 1) return NewRec;
>>
>> Thanks!
>> -- Sanjoy
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160428/f41485df/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scev-tests.patch
Type: application/octet-stream
Size: 8858 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160428/f41485df/attachment.obj>

Oleg Ranevskyy via llvm-dev

2016-May-06 23:47 UTC

head link

[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code

Hi Sanjoy,

Could you let me know if you have had a chance to check the patch and the
LLVM test corrections, please?

Thanks!

On Thu, Apr 28, 2016 at 11:51 PM, Oleg Ranevskyy <llvm.mail.list at
gmail.com>
wrote:
> Hi Sanjoy,
>
> Attached is the patch that fixes the LLVM test regressions caused by this
> fix. It just adds <nsw> entries the fix has introduced.
>
> Would you have a couple of minutes to check it, please?
>
> I would also like to share two differences in the opt logs I noticed for
> these tests.
>
> *1.* The fixed LLVM might split sext of a sum into a sum of two sext's
> while doing SCEV analysis. This does not affect the final IR however - it
> is the same for all the patched tests with and w/o your fix. E.g., this can
> be seen on Analysis/ScalarEvolution/flags-from-poison.ll. LLVM w/o the fix
> prints this:
>
> --------
> Printing analysis 'Scalar Evolution Analysis' for function
'test-sub-nsw':
> Classifying expressions for: @test-sub-nsw
>   ...
>   %index64 = sext i32 %index32 to i64
>   -->  {(sext i32 ((-1 * %halfsub)<nsw> + %start) to
i64),+,1}<nsw><%loop>
> U: [-2147483648,6442450943) S: [-2147483648,6442450943) Exits: ((zext i32
> (-1 + (-1 * %start) + %numIterations) to i64) + (sext i32 ((-1 *
> %halfsub)<nsw> + %start) to i64))
> --------
>
> The fixed one prints a sum of sext's:
>
> --------
>   %index64 = sext i32 %index32 to i64
>   -->  {((sext i32 (-1 * %halfsub)<nsw> to i64) + (sext i32 %start
to
> i64))<nsw>,+,1}<nsw><%loop> U: [-3221225471,7516192767)
S:
> [-3221225471,7516192767) Exits: ((zext i32 (-1 + (-1 * %start) +
> %numIterations) to i64) + (sext i32 (-1 * %halfsub)<nsw> to i64) +
(sext
> i32 %start to i64))
> --------
>
> *2.* The fixed LLVM seems to be more aggressive in extracting constants
> from sext. E.g., the same test flags-from-poison.ll, the test-sub-with-add
> function. W/o the fix:
>
> --------
>   %ptr = getelementptr inbounds float, float* %input, i32 %index32
>   -->  {((4 * (sext i32 (1 + (-1 * %offset)) to i64)) +
> %input),+,4}<nsw><%loop> U: full-set S: full-set Exits: ((4 *
(zext i32
> (-1 + %numIterations) to i64)) + (4 * (sext i32 (1 + (-1 * %offset)) to
> i64)) + %input)
> --------
>
> With the fix:
>
> --------
>   %ptr = getelementptr inbounds float, float* %input, i32 %index32
>   -->  {(4 + (4 * (sext i32 (-1 * %offset) to i64)) +
> %input),+,4}<nsw><%loop> U: full-set S: full-set Exits: (4 + (4
* (zext
> i32 (-1 + %numIterations) to i64)) + (4 * (sext i32 (-1 * %offset) to i64))
> + %input)
> --------
>
> Hope this helps.
>
> On Sat, Apr 23, 2016 at 4:48 PM, Oleg Ranevskyy <llvm.mail.list at
gmail.com>
> wrote:
>
>> Hi Sanjoy,
>>
>> Thank you for looking into this!
>> Yes, your patch does fix my larger test case too. My algorithm gets
>> double performance improvement with the patch, as the loop now has a
>> smaller instruction set and succeeds to unroll w/o any extra
#pragma's.
>>
>> I also ran the LLVM tests against the patch. There are 6 new failures:
>>     Analysis/LoopAccessAnalysis/number-of-memchecks.ll
>>     Analysis/LoopAccessAnalysis/reverse-memcheck-bounds.ll
>>     Analysis/ScalarEvolution/flags-from-poison.ll
>>     Analysis/ScalarEvolution/nsw-offset-assume.ll
>>     Analysis/ScalarEvolution/nsw-offset.ll
>>     Analysis/ScalarEvolution/nsw.ll
>>
>> I haven't inspected these failures in detail yet, but it's
likely the
>> tests merely need to be adjusted to handle the new no-wrap flags the
patch
>> introduced. I will double-check this soon.
>>
>> Kind regards,
>> Oleg
>>
>> On Fri, Apr 22, 2016 at 11:02 AM, Sanjoy Das <
>> sanjoy at playingwithpointers.com> wrote:
>>
>>> Hi Oleg,
>>>
>>> I think the problem here is that SCEV forgets to propagate no-wrap
>>> flags when folding "{S,+,X}+T ==> {S+T,+,X}".
>>>
>>> I haven't carefully thought about the implications and whether
the
>>> change is even correct, but the appended patch fixes the test case
>>> you've attached.  I'll give it some more thought and if it
holds up
>>> I'll check it in in the next few days.  Meanwhile if you have a
larger
>>> test case that you extracted indvar_test.cpp from, I'd be
interested
>>> in hearing if this change works there as well.
>>>
>>> diff --git a/lib/Analysis/ScalarEvolution.cpp
>>> b/lib/Analysis/ScalarEvolution.cpp
>>> index 39ced1e..2e87902 100644
>>> --- a/lib/Analysis/ScalarEvolution.cpp
>>> +++ b/lib/Analysis/ScalarEvolution.cpp
>>> @@ -2274,19 +2274,19 @@ const SCEV
>>> *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *>
&Ops,
>>>        }
>>>
>>>      // If we found some loop invariants, fold them into the
recurrence.
>>>      if (!LIOps.empty()) {
>>>        //  NLI + LI + {Start,+,Step}  -->  NLI +
{LI+Start,+,Step}
>>>        LIOps.push_back(AddRec->getStart());
>>>
>>>        SmallVector<const SCEV *, 4>
AddRecOps(AddRec->op_begin(),
>>>                                               AddRec->op_end());
>>> -      AddRecOps[0] = getAddExpr(LIOps);
>>> +      AddRecOps[0] = getAddExpr(LIOps, Flags);
>>>
>>>        // Build the new addrec. Propagate the NUW and NSW flags if
both
>>> the
>>>        // outer add and the inner addrec are guaranteed to have no
>>> overflow.
>>>        // Always propagate NW.
>>>        Flags = AddRec->getNoWrapFlags(setFlags(Flags,
SCEV::FlagNW));
>>>        const SCEV *NewRec = getAddRecExpr(AddRecOps, AddRecLoop,
Flags);
>>>
>>>        // If all of the other operands were loop invariant, we are
done.
>>>        if (Ops.size() == 1) return NewRec;
>>>
>>> Thanks!
>>> -- Sanjoy
>>>
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160507/6f25149b/attachment.html>

llvm dev - May 2016 - [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code

[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code

[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code

[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code