Maxim Kazantsev via llvm-dev
2018-Jan-25  06:03 UTC
[llvm-dev] Late setting of SCEV NoWrap flags does bad with cache
Hello Everyone,
I want to raise a discussion about reasonability of late setting of nsw/nuw/nw
flags to SCEV AddRecs through setNoWrapFlags method. A discussion about this
have already happened in August last year, there was a concern about different
no-wrap flags that come from different sequences of SCEV flags invocations. It
was mentioned there that late setting of flags is actually a hack to save some
compile time.
Recently I've stumbled over a different problem related to this late flag
setting. The problem is that current SCEV implementation caches a lot of data
about SCEVs. In particular, information about ranges and loop backedge taken
counts. A side effect of that is that if we had an AddRec like {1,+,1} and
cached its range as [MIN_INT, MAX_INT+1), and then later found out that this
recurrency provably has <nsw>, its cached range doesn't get updated.
As well as ranges of all other SCEVs that depend on that. As result, we have a
very weird behavior of SCEV that is unable to prove that
sext({1,+,1}<nsw>) is a positive value just because it has outdated cached
ranges.
In fact, I don't see any point in having <nsw>/<nuw> for AddRecs
if they are not used to prove useful facts about these recs (such as
non-negativeness or positiveness). We will only be able to do it if we
accidentally call forgetMemoizedResults for all SCEV that depend on that AddRec
(which is very unlikely to ever happen in practice). I tried to clean up cache
selectively, but in fact it takes full traversal through all cached SCEVs at
least to identify the ones that depend on the updated AddRec, and this
completely ruins the idea of saving compile time.
You may find the example of such behavior if you re-enable the assertion from
patch https://reviews.llvm.org/rL323309 . The test that was added with this
patch fails on this assert with exactly this problem.
My question is: do we still want to have this hack with late setting of nsw/nuw
given that this flag does not give us any benefits in many real situations? What
would you think?
Best regards,
Max
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180125/9ec84d57/attachment.html>
Sanjoy Das via llvm-dev
2018-Jan-26  05:04 UTC
[llvm-dev] Late setting of SCEV NoWrap flags does bad with cache
Hi Max, On Wed, Jan 24, 2018 at 10:03 PM, Maxim Kazantsev via llvm-dev <llvm-dev at lists.llvm.org> wrote:> I want to raise a discussion about reasonability of late setting of > nsw/nuw/nw flags to SCEV AddRecs through setNoWrapFlags method. A discussion > about this have already happened in August last year, there was a concern > about different no-wrap flags that come from different sequences of SCEV > flags invocations. It was mentioned there that late setting of flags is > actually a hack to save some compile time. > > Recently I've stumbled over a different problem related to this late flag > setting. The problem is that current SCEV implementation caches a lot of > data about SCEVs. In particular, information about ranges and loop backedge > taken counts. A side effect of that is that if we had an AddRec like {1,+,1} > and cached its range as [MIN_INT, MAX_INT+1), and then later found out that > this recurrency provably has <nsw>, its cached range doesn't get updated. As > well as ranges of all other SCEVs that depend on that. As result, we have a > very weird behavior of SCEV that is unable to prove that sext({1,+,1}<nsw>) > is a positive value just because it has outdated cached ranges.Yes, this problem has come up several times and it is counter-intuitive. Part of the problem is (I'm guessing, I wasn't there when SCEV was written) that SCEV is tuned for C/C++ and for C/C++ add recurrences are almost always nsw on construction so you don't have these, let's say "phase ordering", issues. However, for Java, I was under the impression that despite this caching issue we are fairly effective at proving ranges. Of course "fairly effective" does not help if the one thing we don't optimize is 90% of an Important Benchmark(TM). I see a couple of ways out of this: 1. Eagerly compute nsw/nuw on AddRec construction. I'd be happy with this if you can show that we don't regress compile time. Other than the test-suite, you can probably ask for interesting compile time benchmarks on this list. 2. Don't cache ranges so much. Especially for things like sext({1,+,1}), just recomputing the ranges may not be significantly worse than the densemap lookup, assuming the backedge taken count is already computed. Of course for complex SCEVs we should still cache ranges, but for those perhaps we're okay with "locking in" a more conservative result. We could even return conservatives result for recursion deeper than a certain depth, like we do for Values in ComputeKnownBits. I like this philosophically since it reduces caching (a lot of the complexity in SCEV is due to caching), but again, you'll have to show that we don't regress compile time. 3. Precise invalidation via use-lists -- add use lists to SCEV and use that to precisely kick values out of caches as you infer NSW/NUW etc. We can optimize the representation of use lists by e.g. avoiding keeping use lists for SCEVConstants and exploiting the fact that we don't delete SCEVs so use lists don't need fast deletion. Again, you'll need to show compile time memory usage does not go up by very much.> You may find the example of such behavior if you re-enable the assertion > from patch https://reviews.llvm.org/rL323309 . The test that was added withFor that case specifically: I don't think such asserts are a good idea, unless you've already checked SE.isKnownNonNegative(X) (i.e. not only do you know that X is "supposed to be" non-negative, but you also know that SCEV can prove it). For instance see https://bugs.llvm.org/show_bug.cgi?id=25170 where SCEV can prove "A-B" is a constant but can't prove that "B-A" is a constant. I think this behavior is fine btw, SCEV is allowed to be arbitrarily conservative.> My question is: do we still want to have this hack with late setting of > nsw/nuw given that this flag does not give us any benefits in many real > situations? What would you think?We can't just remove it (because you'll regress the cases where it does help), but we can replace it with something that is more effective. -- Sanjoy
Maxim Kazantsev via llvm-dev
2018-Jan-26  05:55 UTC
[llvm-dev] Late setting of SCEV NoWrap flags does bad with cache
Thanks for your insides Sanjoy!
I don't really believe that option 2 may work just because even if we
recalculate the range for this add recurrency, there are already its derivatives
with cached ranges (the most obvious example is sext and expressions where this
sext is involved). We can speculate about what is "simple enough" to
not cache the ranges, but I believe that there is always a counter-example to
any definition. It's just maybe hard to construct.
Just a bit of speculation about option 3 (not sure how often this case is): we
can have dependency between SCEVs which is not represented in use-list. For
example:
  if ({100,+,1} > -1) {
    x = a / {1,+,1}
  }
If we prove nsw for the first addrec, it will also imply nsw for the second
addrec (because the number of iterations in this loop is provably less than
MAX_INT - 99). But there is no clear use-chain dependency between them. So I
would rather belive in option 1 as in the most reliable.
I'll try to experiment with options 1 and 3 and see where it gets us.
Best regards,
Max
-----Original Message-----
From: Sanjoy Das [mailto:sanjoy at playingwithpointers.com] 
Sent: Friday, January 26, 2018 12:04 PM
To: Maxim Kazantsev <max.kazantsev at azul.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Late setting of SCEV NoWrap flags does bad with cache
Hi Max,
On Wed, Jan 24, 2018 at 10:03 PM, Maxim Kazantsev via llvm-dev <llvm-dev at
lists.llvm.org> wrote:> I want to raise a discussion about reasonability of late setting of 
> nsw/nuw/nw flags to SCEV AddRecs through setNoWrapFlags method. A 
> discussion about this have already happened in August last year, there 
> was a concern about different no-wrap flags that come from different 
> sequences of SCEV flags invocations. It was mentioned there that late 
> setting of flags is actually a hack to save some compile time.
>
> Recently I've stumbled over a different problem related to this late 
> flag setting. The problem is that current SCEV implementation caches a 
> lot of data about SCEVs. In particular, information about ranges and 
> loop backedge taken counts. A side effect of that is that if we had an 
> AddRec like {1,+,1} and cached its range as [MIN_INT, MAX_INT+1), and 
> then later found out that this recurrency provably has <nsw>, its 
> cached range doesn't get updated. As well as ranges of all other SCEVs 
> that depend on that. As result, we have a very weird behavior of SCEV 
> that is unable to prove that sext({1,+,1}<nsw>) is a positive value
just because it has outdated cached ranges.
Yes, this problem has come up several times and it is counter-intuitive.
Part of the problem is (I'm guessing, I wasn't there when SCEV was
written) that SCEV is tuned for C/C++ and for C/C++ add recurrences are almost
always nsw on construction so you don't have these, let's say
"phase ordering", issues.
However, for Java, I was under the impression that despite this caching issue we
are fairly effective at proving ranges.  Of course "fairly effective"
does not help if the one thing we don't optimize is 90% of an Important
Benchmark(TM).
I see a couple of ways out of this:
  1. Eagerly compute nsw/nuw on AddRec construction.  I'd be happy with this
if you can show that we don't regress compile time.  Other than the
test-suite, you can probably ask for interesting compile time benchmarks on this
list.
  2. Don't cache ranges so much.  Especially for things like sext({1,+,1}),
just recomputing the ranges may not be significantly worse than the densemap
lookup, assuming the backedge taken count is already computed.  Of course for
complex SCEVs we should still cache ranges, but for those perhaps we're okay
with "locking in" a more conservative result.  We could even return
conservatives result for recursion deeper than a certain depth, like we do for
Values in ComputeKnownBits. I like this philosophically since it reduces caching
(a lot of the complexity in SCEV is due to caching), but again, you'll have
to show that we don't regress compile time.
  3. Precise invalidation via use-lists -- add use lists to SCEV and use that to
precisely kick values out of caches as you infer NSW/NUW etc.  We can optimize
the representation of use lists by e.g. avoiding keeping use lists for
SCEVConstants and exploiting the fact that we don't delete SCEVs so use
lists don't need fast deletion.  Again, you'll need to show compile time
memory usage does not go up by very much.
> You may find the example of such behavior if you re-enable the 
> assertion from patch https://reviews.llvm.org/rL323309 . The test that 
> was added with
For that case specifically:  I don't think such asserts are a good idea,
unless you've already checked SE.isKnownNonNegative(X) (i.e. not only do you
know that X is "supposed to be" non-negative, but you also know that
SCEV can prove it).  For instance see
https://bugs.llvm.org/show_bug.cgi?id=25170 where SCEV can prove "A-B"
is a constant but can't prove that "B-A" is a constant.  I think
this behavior is fine btw, SCEV is allowed to be arbitrarily conservative.
> My question is: do we still want to have this hack with late setting 
> of nsw/nuw given that this flag does not give us any benefits in many 
> real situations? What would you think?
We can't just remove it (because you'll regress the cases where it does
help), but we can replace it with something that is more effective.
-- Sanjoy
Maybe Matching Threads
- Late setting of SCEV NoWrap flags does bad with cache
- Late setting of SCEV NoWrap flags does bad with cache
- Late setting of SCEV NoWrap flags does bad with cache
- Late setting of SCEV NoWrap flags does bad with cache
- [SCEV] Inconsistent SCEV formation for zext