Sanjoy Das
2015-Jan-08 07:20 UTC
[LLVMdev] missing optimization for icmps in induction variables?
Hi Nick,

I checked in something towards (1) yesterday -- http://reviews.llvm.org/D6748

I was under the impression that (2) is exactly the kind of predicate
ScalarEvolution::isKnownPredicate is designed to solve (using
isImpliedCondXXX or something like that). Is there a reason to prefer
GVN over that?

On Wed, Jan 7, 2015 at 10:06 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
> Sanjoy Das wrote:
>>
>> Hi all,
>>
>> I'm trying to get llvm to optimize away the %cmp to true in
>>
>> define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) {
>>  entry:
>>   %length = load i32* %length_ptr, !range !0
>>   %len.sub.1 = sub i32 %length, 1
>>   %upper = icmp slt i32 %init, %len.sub.1
>>   br i1 %upper, label %loop, label %exit
>>
>>  loop:
>>   %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
>>   %civ.inc = add i32 %civ, 1
>>   %cmp = icmp slt i32 %civ.inc, %length
>>   br i1 %cmp, label %latch, label %break
>>
>>  latch:
>>   store i32 0, i32* %array
>>   %check = icmp slt i32 %civ.inc, %len.sub.1
>>   br i1 %check, label %loop, label %break
>>
>>  break:
>>   ret i32 %civ.inc
>>
>>  exit:
>>   ret i32 42
>> }
>>
>> !0 = !{i32 0, i32 2147483647}
>>
>> One way to prove "%cmp == true" in two steps:
>>
>> 1. notice that, on both the entry and the backedge, %civ is known to
>>    be less than %len.sub.1, which is not i32_signed_max. This means
>>    %civ.inc is an "add nsw".
>>
>> 2. on both the entry and the backedge, we know "%civ `slt` %len.sub.1".
>>    This implies "(%civ nsw+ 1) `slt` (%len.sub.1 nsw+ 1)" ==>
>>    "%civ.inc `slt` %length".
>>
>> Currently neither of these happens (i.e. even if I make transformation
>> (1) manually, (2) does not kick in).
>>
>> Is the above reasoning correct? If it is, what is the right place to
>> add this logic to? I'm thinking ScalarEvolution (so that this gets
>> picked up by SimplifyIndVar), but maybe there is a more idiomatic
>> place? The case above is a simplified form of my real workload, which
>> involves smin and smax expressions, so the implementation has to be
>> easily generalizable to such cases.
>
> Before reading your two steps I was going to suggest jump threading.
> Jump threading is where we optimize redundant tests across blocks that
> feed into branches (block A tests property X and branches to block B
> which also tests property X). However, jump threading is powered by
> lazy value info, which I don't think is suited for the sort of
> reasoning in your two steps.
>
> One option is GVN. GVN does have x < y expressions, but it doesn't try
> to deduce nuw/nsw bits. It might be possible to add that, but it isn't
> immediately obvious to me how. GVN also does path-sensitive expression
> commoning.
>
> Nick
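
For concreteness, a minimal sketch (not taken from D6748 itself) of what
the example's loop block would look like once both steps fire: step (1)
tags the increment with nsw, and step (2) folds %cmp to true so the
conditional branch into %latch becomes unconditional:

 loop:
  %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
  %civ.inc = add nsw i32 %civ, 1            ; step (1): increment proven not to overflow
  br i1 true, label %latch, label %break    ; step (2): %cmp is known to be true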
Nick Lewycky
2015-Jan-13 06:05 UTC
[LLVMdev] missing optimization for icmps in induction variables?
Sanjoy Das wrote:
> Hi Nick,
>
> I checked in something towards (1) yesterday -- http://reviews.llvm.org/D6748

Ah, I missed that. Catching up on email.

> I was under the impression that (2) is exactly the kind of predicate
> ScalarEvolution::isKnownPredicate is designed to solve (using
> isImpliedCondXXX or something like that). Is there a reason to prefer
> GVN over that?

If you think of the problem as a loop problem, then yes,
ScalarEvolution::isKnownPredicate makes sense. If you think of it as a
redundancy elimination problem, then GVN makes sense. Does this sort of
problem occur outside of loops?

> On Wed, Jan 7, 2015 at 10:06 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
>> Sanjoy Das wrote:
>>>
>>> Hi all,
>>>
>>> I'm trying to get llvm to optimize away the %cmp to true in
>>>
>>> define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) {
>>>  entry:
>>>   %length = load i32* %length_ptr, !range !0
>>>   %len.sub.1 = sub i32 %length, 1
>>>   %upper = icmp slt i32 %init, %len.sub.1
>>>   br i1 %upper, label %loop, label %exit
>>>
>>>  loop:
>>>   %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
>>>   %civ.inc = add i32 %civ, 1
>>>   %cmp = icmp slt i32 %civ.inc, %length
>>>   br i1 %cmp, label %latch, label %break
>>>
>>>  latch:
>>>   store i32 0, i32* %array
>>>   %check = icmp slt i32 %civ.inc, %len.sub.1
>>>   br i1 %check, label %loop, label %break
>>>
>>>  break:
>>>   ret i32 %civ.inc
>>>
>>>  exit:
>>>   ret i32 42
>>> }
>>>
>>> !0 = !{i32 0, i32 2147483647}
>>>
>>> One way to prove "%cmp == true" in two steps:
>>>
>>> 1. notice that, on both the entry and the backedge, %civ is known to
>>>    be less than %len.sub.1, which is not i32_signed_max. This means
>>>    %civ.inc is an "add nsw".
>>>
>>> 2. on both the entry and the backedge, we know "%civ `slt` %len.sub.1".
>>>    This implies "(%civ nsw+ 1) `slt` (%len.sub.1 nsw+ 1)" ==>
>>>    "%civ.inc `slt` %length".
>>>
>>> Currently neither of these happens (i.e. even if I make transformation
>>> (1) manually, (2) does not kick in).
>>>
>>> Is the above reasoning correct? If it is, what is the right place to
>>> add this logic to? I'm thinking ScalarEvolution (so that this gets
>>> picked up by SimplifyIndVar), but maybe there is a more idiomatic
>>> place? The case above is a simplified form of my real workload, which
>>> involves smin and smax expressions, so the implementation has to be
>>> easily generalizable to such cases.
>>
>> Before reading your two steps I was going to suggest jump threading.
>> Jump threading is where we optimize redundant tests across blocks that
>> feed into branches (block A tests property X and branches to block B
>> which also tests property X). However, jump threading is powered by
>> lazy value info, which I don't think is suited for the sort of
>> reasoning in your two steps.
>>
>> One option is GVN. GVN does have x < y expressions, but it doesn't try
>> to deduce nuw/nsw bits. It might be possible to add that, but it isn't
>> immediately obvious to me how. GVN also does path-sensitive expression
>> commoning.
>>
>> Nick
>
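
A minimal sketch of the redundant-test shape that jump threading (as
described in the quoted text) is built for; the blocks and values here
are hypothetical, not taken from the example above:

 A:
  %x = icmp slt i32 %a, %b
  br i1 %x, label %B, label %other

 B:                                    ; reached from A only when %x is true
  %y = icmp slt i32 %a, %b             ; same test as in A, so this branch can be folded
  br i1 %y, label %then, label %else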
Sanjoy Das
2015-Jan-14 23:23 UTC
[LLVMdev] missing optimization for icmps in induction variables?
> If you think of the problem as a loop problem, then yes,
> ScalarEvolution::isKnownPredicate makes sense. If you think of it as a
> redundancy elimination problem, then GVN makes sense. Does this sort of
> problem occur outside of loops?

I've only observed this happen inside loops, but that is only because
I've been looking at loops. :)  At least in theory, if LLVM's GVN can be
made to do this in a path-sensitive manner, then there is no need to do
such tricks using SCEV.

I've only cursorily glanced into GVN.cpp, but given that we want LLVM to
conclude that '(x nsw+ 5) SLT L' implies 'x SLT (L nsw- 5)', will the
value numbering for cmp expressions have to become richer? The operands
to the icmps themselves won't value number to the same value, because
they're not the same; but maybe there is some canonical form we want to
reduce the icmp expressions to before numbering them?

-- Sanjoy

>
>
>> On Wed, Jan 7, 2015 at 10:06 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
>>>
>>> Sanjoy Das wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I'm trying to get llvm to optimize away the %cmp to true in
>>>>
>>>> define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) {
>>>>  entry:
>>>>   %length = load i32* %length_ptr, !range !0
>>>>   %len.sub.1 = sub i32 %length, 1
>>>>   %upper = icmp slt i32 %init, %len.sub.1
>>>>   br i1 %upper, label %loop, label %exit
>>>>
>>>>  loop:
>>>>   %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
>>>>   %civ.inc = add i32 %civ, 1
>>>>   %cmp = icmp slt i32 %civ.inc, %length
>>>>   br i1 %cmp, label %latch, label %break
>>>>
>>>>  latch:
>>>>   store i32 0, i32* %array
>>>>   %check = icmp slt i32 %civ.inc, %len.sub.1
>>>>   br i1 %check, label %loop, label %break
>>>>
>>>>  break:
>>>>   ret i32 %civ.inc
>>>>
>>>>  exit:
>>>>   ret i32 42
>>>> }
>>>>
>>>> !0 = !{i32 0, i32 2147483647}
>>>>
>>>> One way to prove "%cmp == true" in two steps:
>>>>
>>>> 1. notice that, on both the entry and the backedge, %civ is known to
>>>>    be less than %len.sub.1, which is not i32_signed_max. This means
>>>>    %civ.inc is an "add nsw".
>>>>
>>>> 2. on both the entry and the backedge, we know "%civ `slt` %len.sub.1".
>>>>    This implies "(%civ nsw+ 1) `slt` (%len.sub.1 nsw+ 1)" ==>
>>>>    "%civ.inc `slt` %length".
>>>>
>>>> Currently neither of these happens (i.e. even if I make transformation
>>>> (1) manually, (2) does not kick in).
>>>>
>>>> Is the above reasoning correct? If it is, what is the right place to
>>>> add this logic to? I'm thinking ScalarEvolution (so that this gets
>>>> picked up by SimplifyIndVar), but maybe there is a more idiomatic
>>>> place? The case above is a simplified form of my real workload, which
>>>> involves smin and smax expressions, so the implementation has to be
>>>> easily generalizable to such cases.
>>>
>>> Before reading your two steps I was going to suggest jump threading.
>>> Jump threading is where we optimize redundant tests across blocks that
>>> feed into branches (block A tests property X and branches to block B
>>> which also tests property X). However, jump threading is powered by
>>> lazy value info, which I don't think is suited for the sort of
>>> reasoning in your two steps.
>>>
>>> One option is GVN. GVN does have x < y expressions, but it doesn't try
>>> to deduce nuw/nsw bits. It might be possible to add that, but it isn't
>>> immediately obvious to me how. GVN also does path-sensitive expression
>>> commoning.
>>>
>>> Nick
>>
>>
>
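
A minimal sketch (hypothetical values %x and %L, assuming both arithmetic
operations carry nsw) of the kind of comparison pair that would only
value-number together after such a canonicalization, since the icmp
operands themselves are distinct:

  %t  = add nsw i32 %x, 5
  %c1 = icmp slt i32 %t, %L      ; (x nsw+ 5) slt L
  %u  = sub nsw i32 %L, 5
  %c2 = icmp slt i32 %x, %u      ; x slt (L nsw- 5): equivalent predicate, different operands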