thr3ads.net - llvm dev - [llvm-dev] Improving SCEV's behavior around IR level no-wrap flags [Sep 2016]

If this information is useful, please help other people find it:
Share via:

Sanjoy Das via llvm-dev

2016-Sep-23 09:09 UTC

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

Hi all,

This is about a project I've been prototyping on-and-off for a while
that has finally reached a point where I can claim it to be
"potentially viable".  I'd like to gather some input from the
community before moving too far ahead.


# The problem

There is a representation issue within SCEV that prevents it from
fully using information from nsw/nuw flags present in the IR.  This
isn't just a theoretical issue, e.g. today LLVM won't unroll this
loop:

void f(int x, long* arr) {
  for (int i = x + 1; i < x + 3; i++)
    arr[i] = 40;
}

since SCEV is unable to exploit the no-overflow on x+1 and x+3 to
prove that the loop only runs twice.

The fundamental problem here is that SCEV expressions are unique'd but
the nsw/nuw flags on SCEV expressions are not part of the key they're
unique'd by.  Instead, nsw/nuw flags on SCEV expressions are expressed
by mutating the SCEV expressions in place.

This means "add %x, 1" and "add nsw %x, 1" both map to the
_same_ SCEV
expression (that is, literally the same SCEV* object), and we can't
mutate the common SCEV expression to flag it as nsw since that will
incorrectly denote "add %x, 1" as nsw too.

In general, this means SCEV has to be very conservative about marking
SCEV expressions as no-wrap.  In some cases (e.g. the loop above),
this ends up being excessively conservative.

One path forward is to have SCEV try to prove that if a certain
operation produces poison, the program definitely has undefined
behavior.  This can let us mutate the corresponding SCEV objects to
pull the "nsw"-ness into SCEV.  For instance, if we have

  %x = load ...
  %t = add i32 nsw %x, 1
  %addr = gep(%array, %t)
  store i32 0, %addr
  %t2 = add i32 %x, 1

then transferring NSW to getSCEV(%t) is okay, since even though %t2
(which will be mapped to the same SCEV expression as %t) does not have
"nsw" on the instruction, we know adding 1 to %x cannot overflow since
the program would have UB otherwise.

Bjarke Roune has implemented some of this. However, this is difficult
to do for cases like the x+1 .. x+3 loop above without running a
control flow analysis over the entire function.  And this approach
does not work in the presence of function calls or general control
flow, like

  %x = load ...
  %t = add i32 nsw %x, 1
  call void @f()
  %addr = gep(%array, %t)
  store i32 0, %addr

or

  %x = load ...
  %t = add i32 nsw %x, 1
  if (<condition>)
    return;
  %addr = gep(%array, %t)
  store i32 0, %addr

since unless the side-effecting use of %t (the store) "strongly"[1]
post dominates the def of %x, there is no guaranteed undefined
behavior on a poisonous %t.  Things are even more complex if %x is not
a load, but an expression SCEV an look through, like an add or a shift
by a constant.

*I think the current representation of nsw/nuw in SCEV expressions is
not congruent with LLVM's specification of poison values, and that is
blocking us from exploiting poison values as intended by LLVM's
design.*



# The proposed solution

Since poison values are, well, _values_, I propose we model them as
data within SCEV.  We treat nsw/nuw flags as "operands" since they
contribute to the result of an SCEV expression just like normal inputs
to the expression.

This means we'd treat "add %x, %y" as a different SCEV expression
than
"add nsw %x, %y", since the latter sometimes produces poison while the
former doesn't.  The latter would be known to not overflow, and SCEV
would use that fact in the usual ways.

With this change SCEV expressions will be pointer equal less often,
and while relying on pointer equality for value equality will be
correct, it will be somewhat pessimistic; and we'll have to implement
and use some form of structural equality.

In other words, some places that do

  SCEV *X = ...
  SCEV *Y = ...
  if (X == Y)
    ...

will have to be changed to do

  SCEV *X = ...
  SCEV *Y = ...
  if (X->equals(Y))
    ...

This has potential for compile-time regressions.  Hopefully they'll
all be addressable.

There are cases in which SCEV (via trip count analysis, say) can
_prove_ that a certain expression does not overflow.  In those cases
we will mutate the SCEV expression to indicate no-wrap; since the
no-wrap flag is just a "cache" of a proof based on the structure of
the SCEV expression, and _does_ apply to all SCEV expressions with the
same shapes.

Concretely, we'll endow relevant SCEV expression types with two sets
distinct of flags:

 - AxiomaticFlags: These flags follow from nsw/nuw annotations in the
   IR.  These will be part of the key the SCEV expression is unique'd
   on.
 - ComputedFlags: These flags are derived from the structure of the
   SCEV expression, and they're *not* a part of the key the SCEV
   expression is unique'd on.

For the purposes of consumption, there will be no difference between
AxiomaticFlags and ComputedFlags.  Consumers will get a union of the
two when they ask for the set of flags that apply to a specific SCEV
expression.

ComputedFlags will, in general, depend on AxiomaticFlags.  For
instance if AxiomaticFlags is "nsw" for, say, {1,+,1}, we can add
"nuw" to its ComputedFlags.  There is no need to further distinguish
"{1,+,1}-axiomatic<nsw>" on the computed<nuw> dimension
since
"{1,+,1}-axiomatic<nsw>" will always be computed<nuw>.




What do you think?  Does the overall picture here make sense?

Alternate solutions are also more than welcome (especially if they're
easier to implement!).

Thanks,
-- Sanjoy

[1]: That is, it the store is guaranteed to execute once the load has
  been issued.

Andrew Trick via llvm-dev

2016-Sep-23 10:08 UTC

head link

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

> On Sep 23, 2016, at 2:09 AM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
> 
> Hi all,
> 
> This is about a project I've been prototyping on-and-off for a while
> that has finally reached a point where I can claim it to be
> "potentially viable".  I'd like to gather some input from the
> community before moving too far ahead.
> 
> 
> # The problem
> 
> There is a representation issue within SCEV that prevents it from
> fully using information from nsw/nuw flags present in the IR.  This
> isn't just a theoretical issue, e.g. today LLVM won't unroll this
> loop:
> 
> void f(int x, long* arr) {
>  for (int i = x + 1; i < x + 3; i++)
>    arr[i] = 40;
> }
> 
> since SCEV is unable to exploit the no-overflow on x+1 and x+3 to
> prove that the loop only runs twice.
> 
> The fundamental problem here is that SCEV expressions are unique'd but
> the nsw/nuw flags on SCEV expressions are not part of the key they're
> unique'd by.  Instead, nsw/nuw flags on SCEV expressions are expressed
> by mutating the SCEV expressions in place.
> 
> This means "add %x, 1" and "add nsw %x, 1" both map to
the _same_ SCEV
> expression (that is, literally the same SCEV* object), and we can't
> mutate the common SCEV expression to flag it as nsw since that will
> incorrectly denote "add %x, 1" as nsw too.
> 
> In general, this means SCEV has to be very conservative about marking
> SCEV expressions as no-wrap.  In some cases (e.g. the loop above),
> this ends up being excessively conservative.
> 
> One path forward is to have SCEV try to prove that if a certain
> operation produces poison, the program definitely has undefined
> behavior.  This can let us mutate the corresponding SCEV objects to
> pull the "nsw"-ness into SCEV.  For instance, if we have
> 
>  %x = load ...
>  %t = add i32 nsw %x, 1
>  %addr = gep(%array, %t)
>  store i32 0, %addr
>  %t2 = add i32 %x, 1
> 
> then transferring NSW to getSCEV(%t) is okay, since even though %t2
> (which will be mapped to the same SCEV expression as %t) does not have
> "nsw" on the instruction, we know adding 1 to %x cannot overflow
since
> the program would have UB otherwise.
> 
> Bjarke Roune has implemented some of this. However, this is difficult
> to do for cases like the x+1 .. x+3 loop above without running a
> control flow analysis over the entire function.  And this approach
> does not work in the presence of function calls or general control
> flow, like
> 
>  %x = load ...
>  %t = add i32 nsw %x, 1
>  call void @f()
>  %addr = gep(%array, %t)
>  store i32 0, %addr
> 
> or
> 
>  %x = load ...
>  %t = add i32 nsw %x, 1
>  if (<condition>)
>    return;
>  %addr = gep(%array, %t)
>  store i32 0, %addr
> 
> since unless the side-effecting use of %t (the store)
"strongly"[1]
> post dominates the def of %x, there is no guaranteed undefined
> behavior on a poisonous %t.  Things are even more complex if %x is not
> a load, but an expression SCEV an look through, like an add or a shift
> by a constant.
Great explanation.
> *I think the current representation of nsw/nuw in SCEV expressions is
> not congruent with LLVM's specification of poison values, and that is
> blocking us from exploiting poison values as intended by LLVM's
> design.*
I’m not 100% sure what you mean, but since SCEV expressions don’t represent IR
values, mapping poison semantics, which are specific to IR values, onto SCEV
expressions never made sense to me.
> # The proposed solution
> 
> Since poison values are, well, _values_, I propose we model them as
> data within SCEV.  We treat nsw/nuw flags as "operands" since
they
> contribute to the result of an SCEV expression just like normal inputs
> to the expression.
> 
> This means we'd treat "add %x, %y" as a different SCEV
expression than
> "add nsw %x, %y", since the latter sometimes produces poison
while the
> former doesn't.  The latter would be known to not overflow, and SCEV
> would use that fact in the usual ways.
Ok. What happens when you canonicalize the SCEV expressions? Reassociation is
just one of the ways the nsw flags will be lost.
> With this change SCEV expressions will be pointer equal less often,
> and while relying on pointer equality for value equality will be
> correct, it will be somewhat pessimistic; and we'll have to implement
> and use some form of structural equality.
> 
> In other words, some places that do
> 
>  SCEV *X = ...
>  SCEV *Y = ...
>  if (X == Y)
>    ...
> 
> will have to be changed to do
> 
>  SCEV *X = ...
>  SCEV *Y = ...
>  if (X->equals(Y))
>    ...
> 
> This has potential for compile-time regressions.  Hopefully they'll
> all be addressable.
That’s the main reason we never did this. It wasn’t clear to me what impact it
might have on reducing SCEV expressions in general. Are you saying that this
will not affect the outcome of SCEV reduction/canonicalization as long as we pay
the cost of slow comparisons? What about nested expressions? Won’t this require
scanning the whole SCEV tree for each comparison?
> There are cases in which SCEV (via trip count analysis, say) can
> _prove_ that a certain expression does not overflow.  In those cases
> we will mutate the SCEV expression to indicate no-wrap; since the
> no-wrap flag is just a "cache" of a proof based on the structure
of
> the SCEV expression, and _does_ apply to all SCEV expressions with the
> same shapes.
This is getting confusing. I think you’re saying that add %x, %y, and add nsw
%x, %y should be different SCEV opcodes. In addition, you will keep the nsw
flags independent of the opcode and only as a cache of the analysis. So add nsw
%x, %y will always have the flag set, and add %x, %y may or may not have the
flag.

Let’s remember that SCEV expressions are immutable and, currently, nsw/nuw
really is just a cache of if side info hanging off the SCEV. So that doesn’t
really change.

SCEV should really be a stable analysis. You should get the same expressions for
two values regardless of which you compute first. Today that’s not the case, and
it’s crazy. I don’t think you’re solving that problem.
> Concretely, we'll endow relevant SCEV expression types with two sets
> distinct of flags:
> 
> - AxiomaticFlags: These flags follow from nsw/nuw annotations in the
>   IR.  These will be part of the key the SCEV expression is unique'd
>   on.
> - ComputedFlags: These flags are derived from the structure of the
>   SCEV expression, and they're *not* a part of the key the SCEV
>   expression is unique'd on.
> 
> For the purposes of consumption, there will be no difference between
> AxiomaticFlags and ComputedFlags.  Consumers will get a union of the
> two when they ask for the set of flags that apply to a specific SCEV
> expression.
> 
> ComputedFlags will, in general, depend on AxiomaticFlags.  For
> instance if AxiomaticFlags is "nsw" for, say, {1,+,1}, we can add
> "nuw" to its ComputedFlags.  There is no need to further
distinguish
> "{1,+,1}-axiomatic<nsw>" on the computed<nuw>
dimension since
> "{1,+,1}-axiomatic<nsw>" will always be
computed<nuw>.
Great. That answers my question above.
> What do you think?  Does the overall picture here make sense?
Makes sense except for the issues I raised above.
> Alternate solutions are also more than welcome (especially if they're
> easier to implement!).
The alternative is to represent proofs about IR values independent from SCEV. I
don’t know how much net value we get out of embedding these facts within SCEV
(aside from the convenience given the current implementation). SCEV is about
canonicalizing and algebraic reduction so that simple relations between IR
values can be discovered. Of course, computing trip count needs this
information, but is could always look elsewhere for those facts. The question is
what SCEV reductions themselves need the nsw/nuw info, vs. simply propagating
the flags. Can those reductions in turn be pulled out of as some high level
analysis? I think sign-extend elimination would need some serious thought. But
the current situation with reducing truncations/extensions in SCEV is not good
anyway. It defeats SCEVs canonical form—you can get different answers depending
on the order of reductions.

-Andy
> Thanks,
> -- Sanjoy
> 
> [1]: That is, it the store is guaranteed to execute once the load has
>  been issued.

Sebastian Pop via llvm-dev

2016-Sep-23 14:28 UTC

head link

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

On Fri, Sep 23, 2016 at 6:08 AM, Andrew Trick <atrick at apple.com>
wrote:> The alternative is to represent proofs about IR values independent from
SCEV.
I like this for two reasons:
- it simplifies the SCEV analysis and representation by moving other
analyses results on the side,
- and it will make the analyses more context sensitive: proofs
attached to an IR location, and not to the SCEV expressions.

Christof Douma via llvm-dev

2016-Sep-23 14:50 UTC

head link

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

On 23 Sep 2016, at 11:08, Andrew Trick via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>> 
>> On Sep 23, 2016, at 2:09 AM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
>> 
>> Hi all,
>> 
>> This is about a project I've been prototyping on-and-off for a
while
>> that has finally reached a point where I can claim it to be
>> "potentially viable".  I'd like to gather some input from
the
>> community before moving too far ahead.
>> 
>> 
>> # The problem
>> 
>> There is a representation issue within SCEV that prevents it from
>> fully using information from nsw/nuw flags present in the IR.  This
>> isn't just a theoretical issue, e.g. today LLVM won't unroll
this
>> loop:
>> 
>> void f(int x, long* arr) {
>> for (int i = x + 1; i < x + 3; i++)
>>   arr[i] = 40;
>> }
>> 
>> since SCEV is unable to exploit the no-overflow on x+1 and x+3 to
>> prove that the loop only runs twice.
>> 
>> The fundamental problem here is that SCEV expressions are unique'd
but
>> the nsw/nuw flags on SCEV expressions are not part of the key
they're
>> unique'd by.  Instead, nsw/nuw flags on SCEV expressions are
expressed
>> by mutating the SCEV expressions in place.
>> 
>> This means "add %x, 1" and "add nsw %x, 1" both map
to the _same_ SCEV
>> expression (that is, literally the same SCEV* object), and we can't
>> mutate the common SCEV expression to flag it as nsw since that will
>> incorrectly denote "add %x, 1" as nsw too.
>> 
>> In general, this means SCEV has to be very conservative about marking
>> SCEV expressions as no-wrap.  In some cases (e.g. the loop above),
>> this ends up being excessively conservative.
>> 
>> One path forward is to have SCEV try to prove that if a certain
>> operation produces poison, the program definitely has undefined
>> behavior.  This can let us mutate the corresponding SCEV objects to
>> pull the "nsw"-ness into SCEV.  For instance, if we have
>> 
>> %x = load ...
>> %t = add i32 nsw %x, 1
>> %addr = gep(%array, %t)
>> store i32 0, %addr
>> %t2 = add i32 %x, 1
>> 
>> then transferring NSW to getSCEV(%t) is okay, since even though %t2
>> (which will be mapped to the same SCEV expression as %t) does not have
>> "nsw" on the instruction, we know adding 1 to %x cannot
overflow since
>> the program would have UB otherwise.
>> 
>> Bjarke Roune has implemented some of this. However, this is difficult
>> to do for cases like the x+1 .. x+3 loop above without running a
>> control flow analysis over the entire function.  And this approach
>> does not work in the presence of function calls or general control
>> flow, like
>> 
>> %x = load ...
>> %t = add i32 nsw %x, 1
>> call void @f()
>> %addr = gep(%array, %t)
>> store i32 0, %addr
>> 
>> or
>> 
>> %x = load ...
>> %t = add i32 nsw %x, 1
>> if (<condition>)
>>   return;
>> %addr = gep(%array, %t)
>> store i32 0, %addr
>> 
>> since unless the side-effecting use of %t (the store)
"strongly"[1]
>> post dominates the def of %x, there is no guaranteed undefined
>> behavior on a poisonous %t.  Things are even more complex if %x is not
>> a load, but an expression SCEV an look through, like an add or a shift
>> by a constant.
> 
> Great explanation.
> 
>> *I think the current representation of nsw/nuw in SCEV expressions is
>> not congruent with LLVM's specification of poison values, and that
is
>> blocking us from exploiting poison values as intended by LLVM's
>> design.*
> 
> I’m not 100% sure what you mean, but since SCEV expressions don’t represent
IR values, mapping poison semantics, which are specific to IR values, onto SCEV
expressions never made sense to me.
As far as I understand this, the reasoning is that in SCEV the flags must be
interpreted as UB/not-allowed to happen because of SCEV expressions can
represent instructions in different contexts (e.g. in different execution
paths). In the IR these flags mean poison. This is a semantical mismatch that
gives a big proof burden on SCEV.

To give an example (test/Analysis/ScalarEvolution/nsw.ll - addnsw):

01: entry:
02:  %tmp = add i32 %a, %b
03:  %cmp = icmp sgt i32 %tmp, 0
04:  br i1 %cmp, label %greater, label %exit
05: greater:
06:   %tmp2 = add nsw i32 %a, %b
07:  br label %exit
08: exit:
09:  %result = phi i32 [ %a, %entry ], [ %tmp2, %greater ]
10:  ret i32 %result

In the following example line 2 and 6 are mapped to a single SCEV expressions:
"(%a + %b)”. The SCEV version of ‘NSW' does not hold because at line 2
the instruction might actually wrap around.

Up till now I had not realised that sharing of SCEV expressions was the only
reason for SCEV to have this different semantics.
> 
>> # The proposed solution
>> 
>> Since poison values are, well, _values_, I propose we model them as
>> data within SCEV.  We treat nsw/nuw flags as "operands" since
they
>> contribute to the result of an SCEV expression just like normal inputs
>> to the expression.
>> 
>> This means we'd treat "add %x, %y" as a different SCEV
expression than
>> "add nsw %x, %y", since the latter sometimes produces poison
while the
>> former doesn't.  The latter would be known to not overflow, and
SCEV
>> would use that fact in the usual ways.
> 
> Ok. What happens when you canonicalize the SCEV expressions? Reassociation
is just one of the ways the nsw flags will be lost.
> 
>> With this change SCEV expressions will be pointer equal less often,
>> and while relying on pointer equality for value equality will be
>> correct, it will be somewhat pessimistic; and we'll have to
implement
>> and use some form of structural equality.
>> 
>> In other words, some places that do
>> 
>> SCEV *X = ...
>> SCEV *Y = ...
>> if (X == Y)
>>   ...
>> 
>> will have to be changed to do
>> 
>> SCEV *X = ...
>> SCEV *Y = ...
>> if (X->equals(Y))
>>   ...
>> 
>> This has potential for compile-time regressions.  Hopefully they'll
>> all be addressable.
> 
> That’s the main reason we never did this. It wasn’t clear to me what impact
it might have on reducing SCEV expressions in general. Are you saying that this
will not affect the outcome of SCEV reduction/canonicalization as long as we pay
the cost of slow comparisons? What about nested expressions? Won’t this require
scanning the whole SCEV tree for each comparison?
There is a chance that due to the folding, the nsw and non-nsw nodes will look
quite different. Currently SCEV does represent all these nodes identically. Part
of this might be recovered by moving the current code in scalar evolution that
performs the strong post-dominator analysis to a new pass that propagate nsw
flags in the IR, but that will likely be only a small part now that SCEV is
going to add NSW to many more expressions (which is the whole point).
>> There are cases in which SCEV (via trip count analysis, say) can
>> _prove_ that a certain expression does not overflow.  In those cases
>> we will mutate the SCEV expression to indicate no-wrap; since the
>> no-wrap flag is just a "cache" of a proof based on the
structure of
>> the SCEV expression, and _does_ apply to all SCEV expressions with the
>> same shapes.
> 
> This is getting confusing. I think you’re saying that add %x, %y, and add
nsw %x, %y should be different SCEV opcodes. In addition, you will keep the nsw
flags independent of the opcode and only as a cache of the analysis. So add nsw
%x, %y will always have the flag set, and add %x, %y may or may not have the
flag.
> 
> Let’s remember that SCEV expressions are immutable and, currently, nsw/nuw
really is just a cache of if side info hanging off the SCEV. So that doesn’t
really change.
> 
> SCEV should really be a stable analysis. You should get the same
expressions for two values regardless of which you compute first. Today that’s
not the case, and it’s crazy. I don’t think you’re solving that problem.

I hope that I understood Sanjoy its attempt correctly. I belief that a non-wrap
and a wrap instruction result into two different SCEV expressions.
> 
>> Concretely, we'll endow relevant SCEV expression types with two
sets
>> distinct of flags:
>> 
>> - AxiomaticFlags: These flags follow from nsw/nuw annotations in the
>>  IR.  These will be part of the key the SCEV expression is unique'd
>>  on.
>> - ComputedFlags: These flags are derived from the structure of the
>>  SCEV expression, and they're *not* a part of the key the SCEV
>>  expression is unique'd on.
>> 
>> For the purposes of consumption, there will be no difference between
>> AxiomaticFlags and ComputedFlags.  Consumers will get a union of the
>> two when they ask for the set of flags that apply to a specific SCEV
>> expression.
>> 
>> ComputedFlags will, in general, depend on AxiomaticFlags.  For
>> instance if AxiomaticFlags is "nsw" for, say, {1,+,1}, we can
add
>> "nuw" to its ComputedFlags.  There is no need to further
distinguish
>> "{1,+,1}-axiomatic<nsw>" on the computed<nuw>
dimension since
>> "{1,+,1}-axiomatic<nsw>" will always be
computed<nuw>.
> 
> Great. That answers my question above.
> 
>> What do you think?  Does the overall picture here make sense?
> 
> Makes sense except for the issues I raised above.
It is definitely something that improves SCEV its capabilities. And I love to
see SCEV get the iteration count on that loop correct so it can be unrolled. I
just hope the issue Andrew raised is not too bad.
> 
>> Alternate solutions are also more than welcome (especially if
they're
>> easier to implement!).
> 
> The alternative is to represent proofs about IR values independent from
SCEV. I don’t know how much net value we get out of embedding these facts within
SCEV (aside from the convenience given the current implementation). SCEV is
about canonicalizing and algebraic reduction so that simple relations between IR
values can be discovered. Of course, computing trip count needs this
information, but is could always look elsewhere for those facts. The question is
what SCEV reductions themselves need the nsw/nuw info, vs. simply propagating
the flags. Can those reductions in turn be pulled out of as some high level
analysis? I think sign-extend elimination would need some serious thought. But
the current situation with reducing truncations/extensions in SCEV is not good
anyway. It defeats SCEVs canonical form—you can get different answers depending
on the order of reductions.
> 
> -Andy
I can’t help to ask. Why not define a wrapping nsw instruction as UB, instead of
“delayed UB” aka poison? I believe we have the notion of poison solely to ease
the movement of instructions. In my example above line 6 can be moved to line 2
due to poison. If the wrapping was UB such a move is still possible but would
require dropping the nsw flag, nothing more. If this information loss is
considered a problem, the use of llvm.assume can counter that.

Has somebody ever looked at not having poison at all? What are the downsides of
that?

> 
>> Thanks,
>> -- Sanjoy
>> 
>> [1]: That is, it the store is guaranteed to execute once the load has
>> been issued.
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Hal Finkel via llvm-dev

2016-Sep-23 17:20 UTC

head link

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

----- Original Message -----> From: "Sanjoy Das" <sanjoy at playingwithpointers.com>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>, "Andrew
Trick" <atrick at apple.com>, "Dan Gohman" <dan433584 at
gmail.com>, "Hal
> Finkel" <hfinkel at anl.gov>, "Chandler Carruth"
<chandlerc at gmail.com>, "David Majnemer" <david.majnemer at
gmail.com>,
> "Sebastian Pop" <sebpop at gmail.com>
> Sent: Friday, September 23, 2016 4:09:19 AM
> Subject: Improving SCEV's behavior around IR level no-wrap flags
> 
> Hi all,
> 
> This is about a project I've been prototyping on-and-off for a while
> that has finally reached a point where I can claim it to be
> "potentially viable".  I'd like to gather some input from the
> community before moving too far ahead.
> 
> 
> # The problem
> 
> There is a representation issue within SCEV that prevents it from
> fully using information from nsw/nuw flags present in the IR.  This
> isn't just a theoretical issue, e.g. today LLVM won't unroll this
> loop:
> 
> void f(int x, long* arr) {
>   for (int i = x + 1; i < x + 3; i++)
>     arr[i] = 40;
> }
> 
> since SCEV is unable to exploit the no-overflow on x+1 and x+3 to
> prove that the loop only runs twice.
> 
> The fundamental problem here is that SCEV expressions are unique'd
> but
> the nsw/nuw flags on SCEV expressions are not part of the key they're
> unique'd by.  Instead, nsw/nuw flags on SCEV expressions are
> expressed
> by mutating the SCEV expressions in place.
My understanding is that there were two problems; this is one of them. The
second problem is that the SCEV expressions were meant to be context insensitive
so that the expressions could be compared, and expanded, without worrying about
potential control dependencies on the wrapping flags.

An attempt at an example...

for (int i = 0; i < n; ++i)
for (int j = 0; j < m + q; ++j) {
  array[j][i] = i + j;
}

The fact that (m+q) does not wrap may have a control dependency on n > 0. If
we want, for example, to perform loop interchange, we need to drop the nsw on
the (m+q) when we materialize the trip-count expression outside of the
input's outer loop. Maybe we can just handle this issue in the expander, but
I suspect we need to be careful in other contexts as well, for example, if we
have:

for (int i = 0; i < n + q; ++i)
for (int j = 0; j < m + q; ++j) {
  array[j][i] = i + j;
}

and I want to construct a comparison between (n+q) < (m+q); I need to be
careful about just using the nsw on (n+q) and (m+q) to simplify this to be n
< m because the nsw on (m+q) is really only dependable if n + q > 0.

 -Hal
> 
> This means "add %x, 1" and "add nsw %x, 1" both map to
the _same_
> SCEV
> expression (that is, literally the same SCEV* object), and we can't
> mutate the common SCEV expression to flag it as nsw since that will
> incorrectly denote "add %x, 1" as nsw too.
> 
> In general, this means SCEV has to be very conservative about marking
> SCEV expressions as no-wrap.  In some cases (e.g. the loop above),
> this ends up being excessively conservative.
> 
> One path forward is to have SCEV try to prove that if a certain
> operation produces poison, the program definitely has undefined
> behavior.  This can let us mutate the corresponding SCEV objects to
> pull the "nsw"-ness into SCEV.  For instance, if we have
> 
>   %x = load ...
>   %t = add i32 nsw %x, 1
>   %addr = gep(%array, %t)
>   store i32 0, %addr
>   %t2 = add i32 %x, 1
> 
> then transferring NSW to getSCEV(%t) is okay, since even though %t2
> (which will be mapped to the same SCEV expression as %t) does not
> have
> "nsw" on the instruction, we know adding 1 to %x cannot overflow
> since
> the program would have UB otherwise.
> 
> Bjarke Roune has implemented some of this. However, this is difficult
> to do for cases like the x+1 .. x+3 loop above without running a
> control flow analysis over the entire function.  And this approach
> does not work in the presence of function calls or general control
> flow, like
> 
>   %x = load ...
>   %t = add i32 nsw %x, 1
>   call void @f()
>   %addr = gep(%array, %t)
>   store i32 0, %addr
> 
> or
> 
>   %x = load ...
>   %t = add i32 nsw %x, 1
>   if (<condition>)
>     return;
>   %addr = gep(%array, %t)
>   store i32 0, %addr
> 
> since unless the side-effecting use of %t (the store)
"strongly"[1]
> post dominates the def of %x, there is no guaranteed undefined
> behavior on a poisonous %t.  Things are even more complex if %x is
> not
> a load, but an expression SCEV an look through, like an add or a
> shift
> by a constant.
> 
> *I think the current representation of nsw/nuw in SCEV expressions is
> not congruent with LLVM's specification of poison values, and that is
> blocking us from exploiting poison values as intended by LLVM's
> design.*
> 
> 
> 
> # The proposed solution
> 
> Since poison values are, well, _values_, I propose we model them as
> data within SCEV.  We treat nsw/nuw flags as "operands" since
they
> contribute to the result of an SCEV expression just like normal
> inputs
> to the expression.
> 
> This means we'd treat "add %x, %y" as a different SCEV
expression
> than
> "add nsw %x, %y", since the latter sometimes produces poison
while
> the
> former doesn't.  The latter would be known to not overflow, and SCEV
> would use that fact in the usual ways.
> 
> With this change SCEV expressions will be pointer equal less often,
> and while relying on pointer equality for value equality will be
> correct, it will be somewhat pessimistic; and we'll have to implement
> and use some form of structural equality.
> 
> In other words, some places that do
> 
>   SCEV *X = ...
>   SCEV *Y = ...
>   if (X == Y)
>     ...
> 
> will have to be changed to do
> 
>   SCEV *X = ...
>   SCEV *Y = ...
>   if (X->equals(Y))
>     ...
> 
> This has potential for compile-time regressions.  Hopefully they'll
> all be addressable.
> 
> There are cases in which SCEV (via trip count analysis, say) can
> _prove_ that a certain expression does not overflow.  In those cases
> we will mutate the SCEV expression to indicate no-wrap; since the
> no-wrap flag is just a "cache" of a proof based on the structure
of
> the SCEV expression, and _does_ apply to all SCEV expressions with
> the
> same shapes.
> 
> Concretely, we'll endow relevant SCEV expression types with two sets
> distinct of flags:
> 
>  - AxiomaticFlags: These flags follow from nsw/nuw annotations in the
>    IR.  These will be part of the key the SCEV expression is unique'd
>    on.
>  - ComputedFlags: These flags are derived from the structure of the
>    SCEV expression, and they're *not* a part of the key the SCEV
>    expression is unique'd on.
> 
> For the purposes of consumption, there will be no difference between
> AxiomaticFlags and ComputedFlags.  Consumers will get a union of the
> two when they ask for the set of flags that apply to a specific SCEV
> expression.
> 
> ComputedFlags will, in general, depend on AxiomaticFlags.  For
> instance if AxiomaticFlags is "nsw" for, say, {1,+,1}, we can add
> "nuw" to its ComputedFlags.  There is no need to further
distinguish
> "{1,+,1}-axiomatic<nsw>" on the computed<nuw>
dimension since
> "{1,+,1}-axiomatic<nsw>" will always be
computed<nuw>.
> 
> 
> 
> 
> What do you think?  Does the overall picture here make sense?
> 
> Alternate solutions are also more than welcome (especially if they're
> easier to implement!).
> 
> Thanks,
> -- Sanjoy
> 
> [1]: That is, it the store is guaranteed to execute once the load has
>   been issued.
> 
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Sanjoy Das via llvm-dev

2016-Sep-24 02:25 UTC

head link

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

Hi Andy,

Andrew Trick wrote:

[snip]

 >> *I think the current representation of nsw/nuw in SCEV expressions is
 >> not congruent with LLVM's specification of poison values, and that
is
 >> blocking us from exploiting poison values as intended by LLVM's
 >> design.*
 >
 > I’m not 100% sure what you mean, but since SCEV expressions don’t
 > represent IR values, mapping poison semantics, which are specific to
 > IR values, onto SCEV expressions never made sense to me.

I agree that making SCEV a pure 2's complement is cleaner (and
something I find preferable personally, tbqh); but in practice users
tend to prefer SCEV do at least some reasoning based on no-wrap flags
present in the IR.  If we decide SCEV is not the right tool for the
job, we at least need to discuss what is the right tool.

[snip]

 >> This means we'd treat "add %x, %y" as a different SCEV
expression than
 >> "add nsw %x, %y", since the latter sometimes produces poison
while the
 >> former doesn't.  The latter would be known to not overflow, and
SCEV
 >> would use that fact in the usual ways.
 >
 > Ok. What happens when you canonicalize the SCEV expressions?
 > Reassociation is just one of the ways the nsw flags will be lost.

We'll just have to make the no-wrap-ness part of the algebra, like
everything else.  So

((a +nsw b) +nsw c) ==> (+nsw a b c)

((a +nsw b) + c) ==> (+ a b c)

and so on.

I suspect a very conservative approach of dropping the no-wrap flags
for anything beyond the simplest cases will be better than what we
have today.

[snip]

 >> will have to be changed to do
 >>
 >>   SCEV *X = ...
 >>   SCEV *Y = ...
 >>   if (X->equals(Y))
 >>     ...
 >>
 >> This has potential for compile-time regressions.  Hopefully
they'll
 >> all be addressable.
 >
 > That’s the main reason we never did this. It wasn’t clear to me what
 > impact it might have on reducing SCEV expressions in general. Are you
 > saying that this will not affect the outcome of SCEV
 > reduction/canonicalization as long as we pay the cost of slow
 > comparisons? What about nested expressions? Won’t this require
 > scanning the whole SCEV tree for each comparison?

Yes, in some cases getting satisfactory results will require scanning
and matching whole expression trees.  There are things we can do to
make that hurt less (e.g. maintain a hash with every expression that
ignores the no-wrap flags) or try to recover the difference elsewhere,
but I will admit that this is the weakest part of the proposal.

[snip]

 >> ComputedFlags will, in general, depend on AxiomaticFlags.  For
 >> instance if AxiomaticFlags is "nsw" for, say, {1,+,1}, we
can add
 >> "nuw" to its ComputedFlags.  There is no need to further
distinguish
 >> "{1,+,1}-axiomatic<nsw>" on the computed<nuw> 
dimension since
 >> "{1,+,1}-axiomatic<nsw>" will always be
computed<nuw>.
 >
 > Great. That answers my question above.

Okay, I didn't bother re-answering then. :)

[snip]

 > The alternative is to represent proofs about IR values independent
 > from SCEV. I don’t know how much net value we get out of embedding
 > these facts within SCEV (aside from the convenience given the current
 > implementation). SCEV is about canonicalizing and algebraic reduction
 > so that simple relations between IR values can be discovered. Of
 > course, computing trip count needs this information, but is could
 > always look elsewhere for those facts. The question is what SCEV
 > reductions themselves need the nsw/nuw info, vs. simply propagating
 > the flags. Can those reductions in turn be pulled out of as some high
 > level analysis? I think sign-extend elimination would need some
 > serious thought. But the current situation with reducing
 > truncations/extensions in SCEV is not good anyway. It defeats SCEVs
 > canonical form—you can get different answers depending on the order of
 > reductions.

I may have misunderstood your proposal, but I think by the point we
have a SCEV expression it is already too late to do a precise
poison-value analysis.

For instance, say we have:

   %t0 = add     %x, 1
   %t1 = add nsw %x, 1
   ...
   br i1 %cond, label %exit_loop, label %take_backedge

Now lets say SCEV says the trip count of the loop is "(smax (x + 1)
x)".  Given only that information, I don't have a way of knowing if
the (x + 1) in the trip count is %t0 (which can be assumed to
overflow) or %t1 (which is no-wrap).  It could also have come from
some completely different reduction altogether (say "(x + y) - (y -
1)"), and unless we have the provenance of all the terms it will be
difficult to exploit language specific no-wrap behavior.

Did that make sense?

-- Sanjoy

Sanjoy Das via llvm-dev

2016-Sep-24 05:27 UTC

head link

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

Hi Hal,

Hal Finkel wrote:
 > My understanding is that there were two problems; this is one of
 > them. The second problem is that the SCEV expressions were meant to be
 > context insensitive so that the expressions could be compared, and
 > expanded, without worrying about potential control dependencies on the
 > wrapping flags.

These things are somewhat confusing to precisely reason about because
poison only has informal semantics today, but I think these problems
are related and the new proposal addresses the issue you raised.

 > An attempt at an example...
 >
 > for (int i = 0; i<  n; ++i)
 >   for (int j = 0; j<  m + q; ++j) {
 >     array[j][i] = i + j;
 >   }

Explicitly, your example is

for (int i = 0; i < n; ++i) {
   int tmp = m +nsw q;
   for (int j = 0; j < tmp; ++j) {
     array[j][i] = i + j;
   }
}

(Just to make things 100% clear, in the new scheme getSCEV(tmp) will
be nsw, but getAddExpr(getSCEV(m), getSCEV(q)) won't be.)

Firstly, I don't see any problems with materializing "add nsw m,
q"
outside the loop per-se, that's not any different from LICM'ing out
the nsw addition from the loop.  However, we're in trouble if the
materialized expression (outside the loop, via SCEVExpnader) generates
poison in a way that causes UB when there wouldn't have been any
before.

Assuming SCEVExpander is used in "reasonable" ways, I don't think
we
will introduce undefined behavior.  For instance, rewriting loop exit
values:

   int last = 0;
   for (int i = 0; i < n; ++i) {
     int tmp = m +nsw q;
     for (int j = 0; j < tmp; ++j) {
       last = j;
     }
   }
   print(last);

==>

   int last = 0 < n ? (m +nsw q) : 0;
   print(last);

is fine with a sign-overflowing (m +nsw q), since we're using poison
in a side effect only if 0 < n, in which case the original program
would have had undefined behavior also.

The same is true for loop-interchange type optimizations:

   for (int i = 0; i < n; ++i) {
     int tmp = m +nsw q;
     for (int j = 0; j < tmp; ++j) {
       array[j][i] = i + j;
     }
   }

==>

   int tmp = m +nsw q;
   for (int j = 0; j < tmp; ++j) {
     for (int i = 0; i < n; ++i) {
       array[j][i] = i + j;
     }
   }

seems fine on a sign-overflowing m +nsw q since in both the original
and final program, there is a side effect dependent on poison iff
"0 < n".

 > The fact that (m+q) does not wrap may have a control dependency on
 > n> 0. If we want, for example, to perform loop interchange, we need to
 > drop the nsw on the (m+q) when we materialize the trip-count
 > expression outside of the input's outer loop. Maybe we can just handle
 > this issue in the expander, but I suspect we need to be careful in
 > other contexts as well, for example, if we have:
 >
 > for (int i = 0; i<  n + q; ++i)
 >   for (int j = 0; j<  m + q; ++j) {
 >     array[j][i] = i + j;
 >   }
 >
 > and I want to construct a comparison between (n+q)< (m+q); I need to
 > be careful about just using the nsw on (n+q) and (m+q) to simplify
 > this to be n< m because the nsw on (m+q) is really only dependable if
 > n + q> 0.

Written explicitly, your example looks like (with a minor addition):

   int tmp0 = n +nsw q;
   for (int i = 0; i < tmp0; ++i) {
     int tmp1 = m +nsw q;
     for (int j = 0; j < tmp1; ++j) {
       array[j][i] = i + j;
     }
     array2[int(tmp0 < tmp1)] = 40;
   }

Now simplifying this to

   int tmp0 = n +nsw q;
   for (int i = 0; i < tmp0; ++i) {
     int tmp1 = m +nsw q;
     for (int j = 0; j < tmp1; ++j) {
       array[j][i] = i + j;
     }
     array2[int(n < m)] = 40;
   }

is fine, since replacing tmp0 < tmp1 with n < m changes observable
behavior only if 0 < tmp0 and either tmp0 or tmp1 sign-overflowed; and
in those cases the original program was undefined.

However, if you instead do:

   isKnownPredicate(ICMP_SLT,
                    getAdd(getSCEV(n), getSCEV(q)),
                    getAdd(getSCEV(m), getSCEV(q)))

then you'll get what you said -- (n + q) s< (m + q), not (n +nsw q) s<
(m +nsw q).

-- Sanjoy

llvm dev - Sep 2016 - Improving SCEV's behavior around IR level no-wrap flags

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags