Matti Niemenmaa via llvm-dev
2016-Sep-20 19:05 UTC
[llvm-dev] Inferring nsw/nuw flags for increment/decrement based on relational comparisons
Hi everyone, I posted some questions related to implementing inference of nsw/nuw flags based on known icmp results to Bug 30428 ( https://llvm.org/bugs/show_bug.cgi?id=30428 ) and it was recommended that I engage a wider audience by coming here. The minimal context is the following, please see the bug report for more detail: > 1. If (X s< Y), then both X + 1 and Y - 1 are nsw. > 2. If (X u< Y), then both X + 1 and Y - 1 are nuw. This is basically what I might be interested in implementing, or at least mapping out an implementation approach for, for the sake of eliminating some bounds checks in Rust that the Go compiler is apparently able to eliminate: https://github.com/rust-lang/rust/issues/35981 I'll quote my questions about the possible implementation below and try to add some more context. > DominatorTree and AssumptionCache do seem to be sufficient and already > present. computeOverflowForSignedAdd and computeOverflowForUnsignedAdd > look like the kind of places to extend with this analysis based on > that info. ValueTracking has access to DominatorTree and AssumptionCache information, and provides the named two functions. InstCombine calls them and sets nsw/nuw flags based on the result. > Two questions about that approach though: > > 1. Going back to the predsimplify problems: performance. Walking up > the CFG (even only up to the MaxDepth = 6 defined in > ValueTracking.cpp) and doing dominance queries on certain icmps seems > like a fairly expensive operation to perform on every add/sub > instruction. At the minimum I'd imagine the icmps that we know to be > true should be cached for the basic block. ...and this is more or less > equivalent to the suggestion about llvm.assume generation that was > shot down in the discussion you linked? The link in question is the following: http://lists.llvm.org/pipermail/llvm-dev/2015-January/080666.html Perhaps these operations are cheaper than I think and such caching is not needed? Alternatively they could be put behind -O3 i.e. the ExpensiveCombines variable Sanjay pointed out. > 2. InstCombiner::visitAdd only calls into ValueTracking for the > unsigned case, i.e. computeOverflowForUnsignedAdd. There are no > computeOverflowFor*Sub functions that InstCombiner::visitSub even > could make use of. Instead, InstCombiner has its own > WillNotOverflow{S,UnS}igned{Add,Sub} functions. The relationship > between these and the computeOverflow* functions is unclear to me. > They look like they overlap to an extent. Is some refactoring warranted here or does this mean that the InstCombiner functions and not the ValueTracking ones are the ones to edit? It seems a bit strange that InstCombine has extra logic here since ValueTracking is used from other locations as well and seems like the right place to provide the ultimate truth. InstCombiner::visitAdd has a nearby TODO implying performance concerns, is that what it's about? — Matti
Philip Reames via llvm-dev
2016-Sep-26 23:28 UTC
[llvm-dev] Inferring nsw/nuw flags for increment/decrement based on relational comparisons
On 09/20/2016 12:05 PM, Matti Niemenmaa via llvm-dev wrote:> Hi everyone, > > I posted some questions related to implementing inference of nsw/nuw > flags based on known icmp results to Bug 30428 ( > https://llvm.org/bugs/show_bug.cgi?id=30428 ) and it was recommended > that I engage a wider audience by coming here. The minimal context is > the following, please see the bug report for more detail: > > > 1. If (X s< Y), then both X + 1 and Y - 1 are nsw. > > 2. If (X u< Y), then both X + 1 and Y - 1 are nuw.If this is the only case you want to support, this sounds like a fairly straight forward extension to the LazyValueInfo analysis. In particular, take a look at getValueFromICmpCondition. I'd be happy to help review a patch here once you've got something working. The basic idea would be that (X s<Y ) implies X s< INT_MAX since Y must be INT_MAX or smaller and X is less than that. We can tell this without needing to know anything about Y. Fair warning, we're actively working through issues related to nsw/nuw inference causing overall regressions. I think we've got the key one identified and a patch is under review, but I suspect you'll stumble across the same thing.> > This is basically what I might be interested in implementing, or at > least mapping out an implementation approach for, for the sake of > eliminating some bounds checks in Rust that the Go compiler is > apparently able to eliminate: > https://github.com/rust-lang/rust/issues/35981 > > I'll quote my questions about the possible implementation below and try > to add some more context. > > > DominatorTree and AssumptionCache do seem to be sufficient and already > > present. computeOverflowForSignedAdd and computeOverflowForUnsignedAdd > > look like the kind of places to extend with this analysis based on > > that info. > > ValueTracking has access to DominatorTree and AssumptionCache > information, and provides the named two functions. InstCombine calls > them and sets nsw/nuw flags based on the result. > > > Two questions about that approach though: > > > > 1. Going back to the predsimplify problems: performance. Walking up > > the CFG (even only up to the MaxDepth = 6 defined in > > ValueTracking.cpp) and doing dominance queries on certain icmps seems > > like a fairly expensive operation to perform on every add/sub > > instruction. At the minimum I'd imagine the icmps that we know to be > > true should be cached for the basic block. ...and this is more or less > > equivalent to the suggestion about llvm.assume generation that was > > shot down in the discussion you linked? > > The link in question is the following: > http://lists.llvm.org/pipermail/llvm-dev/2015-January/080666.html > > Perhaps these operations are cheaper than I think and such caching is > not needed? Alternatively they could be put behind -O3 i.e. the > ExpensiveCombines variable Sanjay pointed out. > > > 2. InstCombiner::visitAdd only calls into ValueTracking for the > > unsigned case, i.e. computeOverflowForUnsignedAdd. There are no > > computeOverflowFor*Sub functions that InstCombiner::visitSub even > > could make use of. Instead, InstCombiner has its own > > WillNotOverflow{S,UnS}igned{Add,Sub} functions. The relationship > > between these and the computeOverflow* functions is unclear to me. > > They look like they overlap to an extent. > > Is some refactoring warranted here or does this mean that the > InstCombiner functions and not the ValueTracking ones are the ones to > edit? It seems a bit strange that InstCombine has extra logic here since > ValueTracking is used from other locations as well and seems like the > right place to provide the ultimate truth. InstCombiner::visitAdd has a > nearby TODO implying performance concerns, is that what it's about? > > — Matti > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Matti Niemenmaa via llvm-dev
2016-Sep-27 16:55 UTC
[llvm-dev] Inferring nsw/nuw flags for increment/decrement based on relational comparisons
On 2016-09-27 02:28, Philip Reames wrote:> On 09/20/2016 12:05 PM, Matti Niemenmaa via llvm-dev wrote: >> I posted some questions related to implementing inference of nsw/nuw >> flags based on known icmp results to Bug 30428 ( >> https://llvm.org/bugs/show_bug.cgi?id=30428 ) and it was recommended >> that I engage a wider audience by coming here. The minimal context is >> the following, please see the bug report for more detail: >> >> > 1. If (X s< Y), then both X + 1 and Y - 1 are nsw. >> > 2. If (X u< Y), then both X + 1 and Y - 1 are nuw. > If this is the only case you want to support, this sounds like a fairly > straight forward extension to the LazyValueInfo analysis. In > particular, take a look at getValueFromICmpCondition. I'd be happy to > help review a patch here once you've got something working. > > The basic idea would be that (X s<Y ) implies X s< INT_MAX since Y must > be INT_MAX or smaller and X is less than that. We can tell this without > needing to know anything about Y.Looks like a good idea, but I'm not sure how LazyValueInfo's interface would support this case. Did you mean synthesizing the INT_MAX constant and then checking specifically for "X s< INT_MAX" using LazyValueInfo::getPredicateAt? At a glance that seems like it would work, but it feels like an odd way of doing it. Initially I was looking at LVI::getConstantRange but its "at the end of the specified block" interface seems too restrictive. The block containing the comparison may end in a conditional br and so surely LVI can't prove anything there. And the block containing the increment/decrement instruction may contain some later information that LVI can prove at the end of the block, but is not true at the instruction? CorrelatedValuePropagation and JumpThreading appear to be the only transformation passes making use of LVI at the moment, and that's probably something we don't want to change. This kind of nsw/nuw flag inference doesn't really fit in either, but CVP is definitely the closer match and it should be possible to shoehorn it in there.> Fair warning, we're actively working through issues related to nsw/nuw > inference causing overall regressions. I think we've got the key one > identified and a patch is under review, but I suspect you'll stumble > across the same thing.Interesting, so adding nsw/nuw flags is pessimizing the generated code? Can you provide any links?