thr3ads.net - llvm dev - [llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>? [Aug 2018]

If this information is useful, please help other people find it:
Share via:

Alexandre Isoard via llvm-dev

2018-Aug-15 21:27 UTC

[llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

I'm not sure I understand the poison/undef/UB distinctions.

But on this example:

define i32 @func(i1 zeroext %b, i32 %x, i32 %y) {> entry:
>   %adds = add nsw i32 %x, %y
>   %addu = add nuw i32 %x, %y
>   %cond = select i1 %b, i32 %adds, i32 %addu
>   ret i32 %cond
> }

It is important to not propagate the nsw/nuw between the two SCEV
expressions (which unification would do today, can I consider that a bug or
is it a feature?).
So we work-around it by not informing SCEV of the flags:

Printing analysis 'Scalar Evolution Analysis' for function
'func':> Classifying expressions for: @func
>   %adds = add nsw i32 %x, %y
>   -->  (%x + %y) U: full-set S: full-set
>   %addu = add nuw i32 %x, %y
>   -->  (%x + %y) U: full-set S: full-set
>   %cond = select i1 %b, i32 %adds, i32 %addu
>   -->  %cond U: full-set S: full-set
> Determining loop execution counts for: @func

Would there be problems if we properly considered nuw/nsw flags when
unifying SCEVs?

On Wed, Aug 15, 2018 at 1:59 PM Friedman, Eli <efriedma at codeaurora.org>
wrote:
> On 8/15/2018 1:31 PM, Alexandre Isoard wrote:
>
> Is that why we do not deduce +<nsw> from "add nsw" either?
>
>
> Essentially, yes.
>
> Is that an intrinsic limitation of creating a context-invariant
> expressions from a Value* or is that a limitation of our implementation
> (our unification not considering the nsw flags)?
>
>
> It's a consequence of unification not considering nsw.  (nsw on an
> instruction is naturally invariant because violating nsw produces poison,
> not UB.)
>
> -Eli
>
>
> On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli <efriedma at
codeaurora.org>
> wrote:
>
>> On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
>>
>> Hello,
>>
>> If I run clang on the following code:
>>
>> void func(unsigned n) {
>>>   for (unsigned long x = 1; x < n; ++x)
>>>     dummy(x);
>>> }
>>
>>
>> I get the following llvm ir:
>>
>> define void @func(i32 %n) {
>>> entry:
>>>   %conv = zext i32 %n to i64
>>>   %cmp5 = icmp ugt i32 %n, 1
>>>   br i1 %cmp5, label %for.body, label %for.cond.cleanup
>>> for.cond.cleanup:                                 ; preds =
%for.body,
>>> %entry
>>>   ret void
>>> for.body:                                         ; preds = %entry,
>>> %for.body
>>>   %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
>>>   tail call void @dummy(i64 %x.06) #2
>>>   %inc = add nuw nsw i64 %x.06, 1
>>>   %exitcond = icmp eq i64 %inc, %conv
>>>   br i1 %exitcond, label %for.cond.cleanup, label %for.body
>>> }
>>
>>
>> Over which, SCEV will provide the following analysis:
>>
>> Printing analysis 'Scalar Evolution Analysis' for function
'func':
>>> Classifying expressions for: @func
>>>   %conv = zext i32 %n to i64
>>>   -->  (zext i32 %n to i64) U: [0,4294967296) S: [0,4294967296)
>>>   %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
>>>   -->  {1,+,1}<nuw><nsw><%for.body> U:
[1,-9223372036854775808) S:
>>> [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64))
LoopDispositions:
>>> { %for.body: Computable }
>>>   %inc = add nuw nsw i64 %x.06, 1
>>>   -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0)
Exits: (zext i32 %n to
>>> i64) LoopDispositions: { %for.body: Computable }
>>> Determining loop execution counts for: @func
>>> Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to
i64))<nsw>
>>> Loop %for.body: max backedge-taken count is -2
>>> Loop %for.body: Predicated backedge-taken count is (-2 + (zext i32
%n to
>>> i64))<nsw>
>>>  Predicates:
>>> Loop %for.body: Trip multiple is 1
>>
>>
>> Now, I was surprised by the max backedge-taken count being -2, and I
>> suspect it is due to the backedge-taken count being marked as
<nsw> instead
>> of <nuw>.
>>
>> Is that on purpose, is that a bug, or is my analysis incorrect? I am
not
>> sure where to fix that issue.
>>
>>
>> The backedge-taken count isn't nuw because nsw/nuw markings
aren't
>> flow-sensitive: there isn't any way to mark the trip count as nuw
without
>> marking every computation of `(long)n-2` as nuw.
>>
>> There's some code in ScalarEvolution::howFarToZero to try to refine
the
>> max backedge-taken count in some cases, but it isn't very general. 
See
>> https://reviews.llvm.org/D28536 .
>>
>> -Eli
>>
>> --
>> Employee of Qualcomm Innovation Center, Inc.
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project
>>
>>
>
> --
> *Alexandre Isoard*
>
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project
>
>
-- 
*Alexandre Isoard*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180815/460f1c2f/attachment-0001.html>

Friedman, Eli via llvm-dev

2018-Aug-15 21:40 UTC

head link

[llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

On 8/15/2018 2:27 PM, Alexandre Isoard wrote:> I'm not sure I understand the poison/undef/UB distinctions.
>
> But on this example:
>
>     define i32 @func(i1 zeroext %b, i32 %x, i32 %y) {
>     entry:
>       %adds = add nsw i32 %x, %y
>       %addu = add nuw i32 %x, %y
>       %cond = select i1 %b, i32 %adds, i32 %addu
>       ret i32 %cond
>     }
>
>
> It is important to not propagate the nsw/nuw between the two SCEV 
> expressions (which unification would do today, can I consider that a 
> bug or is it a feature?).
It's an intentional design choice.
> So we work-around it by not informing SCEV of the flags:
>
>     Printing analysis 'Scalar Evolution Analysis' for function
'func':
>     Classifying expressions for: @func
>       %adds = add nsw i32 %x, %y
>       -->  (%x + %y) U: full-set S: full-set
>       %addu = add nuw i32 %x, %y
>       -->  (%x + %y) U: full-set S: full-set
>       %cond = select i1 %b, i32 %adds, i32 %addu
>       -->  %cond U: full-set S: full-set
>     Determining loop execution counts for: @func
>
>
> Would there be problems if we properly considered nuw/nsw flags when 
> unifying SCEVs?
There would be other consequences.  For example, `(%x + %y)<nsw>` and 
`(%x + %y)<nuw>` wouldn't compare equal for other simplifications, and
all the places that call setNoWrapFlags would have to be rewritten.  
It's probably possible to come up with some workable design, but nobody 
has actually tried it, so it's not clear how much work it would be to 
implement, or whether it would improve the generated code overall.

-Eli
>
> On Wed, Aug 15, 2018 at 1:59 PM Friedman, Eli <efriedma at
codeaurora.org
> <mailto:efriedma at codeaurora.org>> wrote:
>
>     On 8/15/2018 1:31 PM, Alexandre Isoard wrote:
>>     Is that why we do not deduce +<nsw> from "add nsw"
either?
>
>     Essentially, yes.
>
>>     Is that an intrinsic limitation of creating a context-invariant
>>     expressions from a Value* or is that a limitation of our
>>     implementation (our unification not considering the nsw flags)?
>
>     It's a consequence of unification not considering nsw.  (nsw on an
>     instruction is naturally invariant because violating nsw produces
>     poison, not UB.)
>
>     -Eli
>
>>
>>     On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli
>>     <efriedma at codeaurora.org <mailto:efriedma at
codeaurora.org>> wrote:
>>
>>         On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
>>>         Hello,
>>>
>>>         If I run clang on the following code:
>>>
>>>             void func(unsigned n) {
>>>               for (unsigned long x = 1; x < n; ++x)
>>>                 dummy(x);
>>>             }
>>>
>>>
>>>         I get the following llvm ir:
>>>
>>>             define void @func(i32 %n) {
>>>             entry:
>>>               %conv = zext i32 %n to i64
>>>               %cmp5 = icmp ugt i32 %n, 1
>>>               br i1 %cmp5, label %for.body, label %for.cond.cleanup
>>>             for.cond.cleanup:        ; preds = %for.body, %entry
>>>               ret void
>>>             for.body:        ; preds = %entry, %for.body
>>>               %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
>>>               tail call void @dummy(i64 %x.06) #2
>>>               %inc = add nuw nsw i64 %x.06, 1
>>>               %exitcond = icmp eq i64 %inc, %conv
>>>               br i1 %exitcond, label %for.cond.cleanup, label
%for.body
>>>             }
>>>
>>>
>>>         Over which, SCEV will provide the following analysis:
>>>
>>>             Printing analysis 'Scalar Evolution Analysis'
for
>>>             function 'func':
>>>             Classifying expressions for: @func
>>>               %conv = zext i32 %n to i64
>>>               -->  (zext i32 %n to i64) U: [0,4294967296) S:
>>>             [0,4294967296)
>>>               %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
>>>               --> {1,+,1}<nuw><nsw><%for.body>
U:
>>>             [1,-9223372036854775808) S:
>>>             [1,-9223372036854775808)Exits: (-1 + (zext i32 %n to
>>>             i64))LoopDispositions: { %for.body: Computable }
>>>               %inc = add nuw nsw i64 %x.06, 1
>>>               --> {2,+,1}<nuw><%for.body> U: [2,0)
S: [2,0)Exits:
>>>             (zext i32 %n to i64)LoopDispositions: { %for.body:
>>>             Computable }
>>>             Determining loop execution counts for: @func
>>>             Loop %for.body: backedge-taken count is (-2 + (zext i32
>>>             %n to i64))<nsw>
>>>             Loop %for.body: max backedge-taken count is -2
>>>             Loop %for.body: Predicated backedge-taken count is (-2
+
>>>             (zext i32 %n to i64))<nsw>
>>>              Predicates:
>>>             Loop %for.body: Trip multiple is 1
>>>
>>>
>>>         Now, I was surprised by the max backedge-taken count being
>>>         -2, and I suspect it is due to the backedge-taken count
>>>         being marked as <nsw> instead of <nuw>.
>>>
>>>         Is that on purpose, is that a bug, or is my analysis
>>>         incorrect? I am not sure where to fix that issue.
>>
>>         The backedge-taken count isn't nuw because nsw/nuw markings
>>         aren't flow-sensitive: there isn't any way to mark the
trip
>>         count as nuw without marking every computation of `(long)n-2`
>>         as nuw.
>>
>>         There's some code in ScalarEvolution::howFarToZero to try
to
>>         refine the max backedge-taken count in some cases, but it
>>         isn't very general.  See https://reviews.llvm.org/D28536 .
>>
>>         -Eli
>>
>>         -- 
>>         Employee of Qualcomm Innovation Center, Inc.
>>         Qualcomm Innovation Center, Inc. is a member of Code Aurora
Forum, a Linux Foundation Collaborative Project
>>
>>
>>
>>     -- 
>>     *Alexandre Isoard*
>
>
>     -- 
>     Employee of Qualcomm Innovation Center, Inc.
>     Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project
>
>
>
> -- 
> *Alexandre Isoard*

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180815/1fe7f830/attachment.html>

Alexandre Isoard via llvm-dev

2018-Aug-16 23:09 UTC

head link

[llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Ok.

To go back to the original issue, would it be meaningful to add a
SCEVUMax(0, BTC) on the final BTC computed by SCEV?

So that it does not use "negative values"?

On Wed, Aug 15, 2018 at 2:40 PM Friedman, Eli <efriedma at codeaurora.org>
wrote:
> On 8/15/2018 2:27 PM, Alexandre Isoard wrote:
>
> I'm not sure I understand the poison/undef/UB distinctions.
>
> But on this example:
>
> define i32 @func(i1 zeroext %b, i32 %x, i32 %y) {
>> entry:
>>   %adds = add nsw i32 %x, %y
>>   %addu = add nuw i32 %x, %y
>>   %cond = select i1 %b, i32 %adds, i32 %addu
>>   ret i32 %cond
>> }
>
>
> It is important to not propagate the nsw/nuw between the two SCEV
> expressions (which unification would do today, can I consider that a bug or
> is it a feature?).
>
>
> It's an intentional design choice.
>
> So we work-around it by not informing SCEV of the flags:
>
> Printing analysis 'Scalar Evolution Analysis' for function
'func':
>> Classifying expressions for: @func
>>   %adds = add nsw i32 %x, %y
>>   -->  (%x + %y) U: full-set S: full-set
>>   %addu = add nuw i32 %x, %y
>>   -->  (%x + %y) U: full-set S: full-set
>>   %cond = select i1 %b, i32 %adds, i32 %addu
>>   -->  %cond U: full-set S: full-set
>> Determining loop execution counts for: @func
>
>
> Would there be problems if we properly considered nuw/nsw flags when
> unifying SCEVs?
>
>
> There would be other consequences.  For example, `(%x + %y)<nsw>` and
`(%x
> + %y)<nuw>` wouldn't compare equal for other simplifications, and
all the
> places that call setNoWrapFlags would have to be rewritten.  It's
probably
> possible to come up with some workable design, but nobody has actually
> tried it, so it's not clear how much work it would be to implement, or
> whether it would improve the generated code overall.
>
> -Eli
>
>
> On Wed, Aug 15, 2018 at 1:59 PM Friedman, Eli <efriedma at
codeaurora.org>
> wrote:
>
>> On 8/15/2018 1:31 PM, Alexandre Isoard wrote:
>>
>> Is that why we do not deduce +<nsw> from "add nsw"
either?
>>
>>
>> Essentially, yes.
>>
>> Is that an intrinsic limitation of creating a context-invariant
>> expressions from a Value* or is that a limitation of our implementation
>> (our unification not considering the nsw flags)?
>>
>>
>> It's a consequence of unification not considering nsw.  (nsw on an
>> instruction is naturally invariant because violating nsw produces
poison,
>> not UB.)
>>
>> -Eli
>>
>>
>> On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli <efriedma at
codeaurora.org>
>> wrote:
>>
>>> On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
>>>
>>> Hello,
>>>
>>> If I run clang on the following code:
>>>
>>> void func(unsigned n) {
>>>>   for (unsigned long x = 1; x < n; ++x)
>>>>     dummy(x);
>>>> }
>>>
>>>
>>> I get the following llvm ir:
>>>
>>> define void @func(i32 %n) {
>>>> entry:
>>>>   %conv = zext i32 %n to i64
>>>>   %cmp5 = icmp ugt i32 %n, 1
>>>>   br i1 %cmp5, label %for.body, label %for.cond.cleanup
>>>> for.cond.cleanup:                                 ; preds =
%for.body,
>>>> %entry
>>>>   ret void
>>>> for.body:                                         ; preds =
%entry,
>>>> %for.body
>>>>   %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
>>>>   tail call void @dummy(i64 %x.06) #2
>>>>   %inc = add nuw nsw i64 %x.06, 1
>>>>   %exitcond = icmp eq i64 %inc, %conv
>>>>   br i1 %exitcond, label %for.cond.cleanup, label %for.body
>>>> }
>>>
>>>
>>> Over which, SCEV will provide the following analysis:
>>>
>>> Printing analysis 'Scalar Evolution Analysis' for function
'func':
>>>> Classifying expressions for: @func
>>>>   %conv = zext i32 %n to i64
>>>>   -->  (zext i32 %n to i64) U: [0,4294967296) S:
[0,4294967296)
>>>>   %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
>>>>   -->  {1,+,1}<nuw><nsw><%for.body> U:
[1,-9223372036854775808) S:
>>>> [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64))
LoopDispositions:
>>>> { %for.body: Computable }
>>>>   %inc = add nuw nsw i64 %x.06, 1
>>>>   -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0)
Exits: (zext i32 %n
>>>> to i64) LoopDispositions: { %for.body: Computable }
>>>> Determining loop execution counts for: @func
>>>> Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to
i64))<nsw>
>>>> Loop %for.body: max backedge-taken count is -2
>>>> Loop %for.body: Predicated backedge-taken count is (-2 + (zext
i32 %n
>>>> to i64))<nsw>
>>>>  Predicates:
>>>> Loop %for.body: Trip multiple is 1
>>>
>>>
>>> Now, I was surprised by the max backedge-taken count being -2, and
I
>>> suspect it is due to the backedge-taken count being marked as
<nsw> instead
>>> of <nuw>.
>>>
>>> Is that on purpose, is that a bug, or is my analysis incorrect? I
am not
>>> sure where to fix that issue.
>>>
>>>
>>> The backedge-taken count isn't nuw because nsw/nuw markings
aren't
>>> flow-sensitive: there isn't any way to mark the trip count as
nuw without
>>> marking every computation of `(long)n-2` as nuw.
>>>
>>> There's some code in ScalarEvolution::howFarToZero to try to
refine the
>>> max backedge-taken count in some cases, but it isn't very
general.  See
>>> https://reviews.llvm.org/D28536 .
>>>
>>> -Eli
>>>
>>> --
>>> Employee of Qualcomm Innovation Center, Inc.
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
>>>
>>>
>>
>> --
>> *Alexandre Isoard*
>>
>>
>> --
>> Employee of Qualcomm Innovation Center, Inc.
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project
>>
>>
>
> --
> *Alexandre Isoard*
>
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project
>
>
-- 
*Alexandre Isoard*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180816/a14ee024/attachment-0001.html>

Seemingly Similar Threads

Search for more possibly parallel threads

llvm dev - Aug 2018 - [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

[llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

[llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

[llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Seemingly Similar Threads