Davide Italiano via llvm-dev
2016-Dec-31 12:19 UTC
[llvm-dev] SCCP is not always correct in presence of undef (+ proposed fix)
On Sat, Dec 31, 2016 at 12:15 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> On Fri, Dec 30, 2016 at 11:55 PM, Sanjoy Das
> <sanjoy at playingwithpointers.com> wrote:
>>
>> Hi Daniel,
>>
>> On Fri, Dec 30, 2016 at 10:55 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>> >> Right, but we are talking about "when, in the intermediate state, can I
>> >> transform an undef to a different value".
>> >>
>> >> Remember you can only go down the lattice. So you can't make undef
>> >> constant, and then discover it's wrong, and go back up :)
>> >> IE you can't change phi undef, undef to constant value 50, then discover
>> >> 50 doesn't work, and say, well actually, let's make it undef again for
>> >> an iteration :)
>>
>> If the kind of simplification (I think) you've mentioned is allowed,
>> then I think even Davide's lattice is problematic. Say we have:
>>
>>   loop:
>>     X = phi(entry: undef, loop: T)
>>     ...
>>     T = undef
>>     if C then br loop else br outside
>>
>> When we first visit X then we'll compute its state as
>> (Undef MEET Unknown) = Undef.
>
> Yes
>
>> My guess is that you're implying that the Undef lattice state (or some
>> other computation hanging off of it) can be folded immediately to, say,
>> 50, but once we mark the backedge as executable we'll set it to undef
>> (or mark it as 49)?
>
> Yes
>
>> And both of these are bad because you're making a non-monotonic
>> transition?
>
> Yes
>
>> Given this, I'm not sure what prevents the bad transform from happening
>> even if we have separate Unknown and Undef states.
>
> You can always get *out* of the bad transform by dropping to overdefined
> if you discover it doesn't work.
> The question is more one of "what is the optimal time to try to figure
> out a constant value that works".
> If you drop too early, you have to go overdefined when you may have been
> able to figure out a constant to use to replace the undef.
>> If the solution is to not fold X into undef (which, by assumption, can
>> "spontaneously decay" into any value) until you're sure every incoming
>> value is also undef, then I suppose we'll need some special handling
>> for undef values to prevent breaking cases like (as shown earlier):
>>
>>   loop:
>>     X = phi(entry: 10, loop: T)
>>     if X == 10 then br outside else br loop
>>
>>   =>
>>
>>   loop:
>>     X = phi(entry: 10, loop: T)
>>     br outside
>>
>> (i.e. to keep the algorithm properly optimistic)?
>
> Right. This is precisely the unknown part, or at least, that was my
> thinking.
>
>> And if we do have a way of keeping undefs as undefs (i.e. of preventing
>> X from decaying into 50 when we first visit the first loop), then why
>> can't we re-use the same mechanism to avoid spuriously decaying "phi
>> undef undef-but-actually-unknown" to "50"?
>
> That mechanism is the "all executable paths leading to our operands
> visited" bit, which is... the unknown vs undef state.
>
> IE we only mark the operand undef when we visit it from a reachable edge.
>
>> Another way of stating my suggestion is: if you agree that this is a
>> correct lattice (different from Davide's proposal), and pretend the
>> "spontaneous undef decay" problem does not exist:
>>
>>   digraph G {
>>     Unknown -> Undef
>>     Undef -> Constant1
>>     Undef -> Constant2
>>     Undef -> Constant3
>>     Constant1 -> Bottom
>>     Constant2 -> Bottom
>>     Constant3 -> Bottom
>>   }
>>
>> then it should be legal / correct to first drop every lattice element
>> from "Unknown" to "Undef" before running the algorithm. The only case
>> where this would give us a conservative result is one where we'd have
>> otherwise gotten "Unknown" for a used value, and the algorithm should
>> never have terminated in such a state.
>
> It is definitely *legal* to do this, or the algorithm is broken otherwise
> (you should just end up in overdefined more).
>
> I'm not sure I believe the last part though.
> I have to think about it.

Thanks for the comments Sanjoy, I'll have to think about it further.
My problem is that once you put undef into the equation, Zadeck's
original algorithm stops being entirely obvious. I originally thought
it was only my problem, but we're discussing it here (and have already
discussed it in the past), so it seems it's not.

> FWIW: My initial proposal to all of this was "stop caring about trying to
> optimize undef here at all". Remove it from the lattice, and treat undef
> as overdefined if we see it. If we want to optimize undef, then before
> SCCP solving, build SCCs of the parts of the SSA graph that contain
> undef, choose correct values for those subgraphs, and mark them constant
> *before* the solver sees them.

Although originally I wasn't, I'm increasingly convinced this is the
right path forward (at least in the beginning), i.e. strip undef
handling entirely. I tried to integrate undef into the solver to see
how that worked, and it seems the proposed lattice still has some
issues. We have a hard time reasoning about simple things, like what
the lattice structure is and what invariants we should maintain and
enforce at the end of each run. (Something that's very nice about the
original algorithm is that you know that at the end of the run nothing
should have value TOP, otherwise you forgot to lower something or have
a bug; we can't currently assert this because of `undef`.)
I'm also becoming increasingly convinced that this problem is
something we (LLVM) created. If optimizing `undef` (which is sometimes,
if not often, a symptom of undefined behaviour/bugs in your program)
comes at the expense of correctness or of complicated logic and
potentially unsound algorithms, I don't think we should go that way.
I think what Danny proposed is very appealing, but I'm taking a more
extreme position here, saying that we may consider not doing even that
in the beginning.
Today I ran SCCP over a bunch of codebases (completely removing the
undef optimization) and I found that it never actually resulted in a
runtime performance hit. Of course this is not representative at all,
but it's still a data point. I think we should take this opportunity
to see if we can make things simpler/easier to consider correct,
instead of adding cases for problems as we discover them.

> This is pre-solve instead of post-solve or iterative solve. We already
> do this for arguments by solving them to overdefined. IPCP also works
> much the same way, by pre-solving arguments to constants.
>
> This is precisely because this stuff is non-trivial to reason about, and
> it's unclear to me that it's really worth it. If it *is* worth it, we
> can do better than we do now (and contain the mess *entirely* to the
> presolver), and if it's not, we shouldn't have complicated reasoning to
> try to make it work well.
>
> Right now we have the worst of all worlds. We have a mess in both places
> (the resolver and the solver both step on each other and differ in
> behavior), it's hard to reason about (though provably not correct), and
> it doesn't even give you the best answers it can *anyway*.
>
> (Generally, at the point where we have a bunch of very smart people
> racking their heads over whether something is really going to work
> algorithmically or not, I take it as a sign that maybe we have the
> wrong path. Obviously, there are some really complex problems, but...
> this should not be one of them :P)

+1 on everything.

--
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare
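The lattice behaviour debated above can be sketched in a few lines. This is a hypothetical illustration in Python, not LLVM's actual implementation: the value encoding and the `meet` helper are invented for the example. The point is that `meet` only ever moves values *down* the lattice (Unknown > Undef > Constant > Overdefined), which is what makes the optimistic fixed-point iteration monotone, and why "un-deciding" a constant back to undef is not allowed.

```python
# Hypothetical sketch of the Unknown > Undef > Constant > Overdefined
# lattice discussed in this thread (not LLVM code).
UNKNOWN, UNDEF, CONST, OVERDEFINED = "unknown", "undef", "const", "overdefined"

def meet(a, b):
    """Combine two lattice values; the result is always <= both inputs."""
    if a == (UNKNOWN,):
        return b                # Unknown is top: anything refines it
    if b == (UNKNOWN,):
        return a
    if a == (UNDEF,):
        return b                # undef may refine to any one constant...
    if b == (UNDEF,):
        return a
    if a == b:
        return a                # same constant (or both overdefined)
    return (OVERDEFINED,)       # conflicting constants: drop to bottom

# phi(undef, 50) may be resolved to 50...
assert meet((UNDEF,), (CONST, 50)) == (CONST, 50)
# ...but phi(50, 49) must drop to overdefined; it can never go back up
# to undef, which is exactly the non-monotonic move Daniel rules out.
assert meet((CONST, 50), (CONST, 49)) == (OVERDEFINED,)
```

Under this encoding, the "first drop every Unknown to Undef" pre-pass Sanjoy suggests is just replacing `(UNKNOWN,)` with `(UNDEF,)` before iterating, which can only make results more conservative.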
Sanjoy Das via llvm-dev
2017-Jan-02 22:24 UTC
[llvm-dev] SCCP is not always correct in presence of undef (+ proposed fix)
Hi Davide,

On Sat, Dec 31, 2016 at 4:19 AM, Davide Italiano <davide at freebsd.org> wrote:
> Although originally I wasn't, I'm increasingly convinced this is the
> right path forward (at least in the beginning), i.e. strip undef
> handling entirely. I tried to integrate undef in the solver to see how
> that worked and it seems the proposed lattice has still some issues.

By "integrate undef" do you mean you implemented the lattice you
mentioned in the original post? If so, can you mention what issues
you ran into?

> We have hard time reasoning about simple things like what's the
> lattice structure and what should be the invariants we
> maintain/enforce at the end of each run (something that's very nice
> about the original algorithm is that you know at the end of the
> algorithm nothing should have value TOP otherwise you forgot to lower
> something/have a bug, but we can't currently assert this because of
> `undef`).

That is a good point. Can we get the same kind of checking by keeping
track of whether we've visited every non-dead instruction in the
function at least once?

> I'm also becoming increasingly convinced that this problem is
> something we (LLVM) created.
> If optimizing `undef` (which is sometimes if not often symptom of
> undefined behaviour/bugs in your program) should come at the expense
> of correctness or complicated logic/potentially unsound algorithms I
> don't think we should go that way.

I agree that undef has soundness issues, but I don't think those are
relevant here.

And with respect to `undef` and its implications for the correctness
of the source program, there are two factors here:

Using `undef` in a side effect (e.g. printf(undef)) is usually a sign
of a buggy source program, especially if the source language is C or
C++.
The *presence* of `undef` is fine, and does not imply that the source
program is incorrect.

> What Danny proposed I think it's very appealing but I'm taking a more
> extreme position here saying that we may consider not doing that in
> the beginning. Today I ran SCCP over a bunch of codebases (completely
> removing the undef optimization) and I found it never actually
> resulted in a runtime performance hit. Of course this is not
> representative at all, but still a datapoint.

What is the contribution of SCCP itself to the (I presume internal)
benchmarks? If you disable SCCP completely, how much performance do
you lose?

> I think we should take this opportunity as a way to see if we can make
> things simpler/easier to consider correct instead of adding cases for
> problems we discover.

In theory I'm completely fine with not handling undef at all
(i.e. your proposal) as a starting point. If we see a need for it, we
can always add it back later.

In practice, this is the kind of thing that tends to get added back
poorly (because someone's benchmark will be faster by 1% with a "small
fix" to SCCP). Given that we're taking a step back and looking at the
bigger picture now anyway, this could be an opportune moment to fix
the underlying issue with regard to undef.

Having said that, since you're doing the work, I'm more than happy to
let you make the call on this one. I have the armchair, but you have
the keyboard. :)

-- Sanjoy

PS: I still don't have an answer to the very first question I asked:

```
>> Looking at the original bug, it seems like a straightforward
>> undef-propagation bug to me -- SCCP was folding "or undef, constant"
>> to "undef", which is wrong. Why is changing that not the fix? That
>> is, some variant of
>
> You would still need to fix the iteration order of the resolver, or it
> will make more wrong decisions.
> As Davide discovered, there are bugs open with the same cause.

Davide -- can you point me to those?
```
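The folding bug mentioned in the PS can be illustrated at the bit level. This is a hypothetical Python sketch (not LLVM code, and the 4-bit width is an arbitrary choice for the example): whatever concrete value the undef operand later takes, every bit set in the constant stays set in the result, so the result is *not* free to decay to an arbitrary value the way a true undef is.

```python
# Why "or undef, C" must not fold to "undef": enumerate every concrete
# value the undef could take and observe what results are possible.
C = 0b1010
results = {x | C for x in range(16)}   # all 4-bit choices for the undef

# The result can never be 0, so replacing it with undef (which is free
# to later become 0) is unsound:
assert 0 not in results

# Every observable result keeps C's bits set:
assert all(r & C == C for r in results)

# A legal fold picks one value consistent with *some* choice of the
# undef, e.g. folding "or undef, C" to all-ones by choosing x = ~C:
assert (0b0101 | C) == 0b1111
```

In other words, folding to a single constant like `C` or `-1` is fine; folding to `undef` claims more freedom for the result than the `or` actually allows.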
Davide Italiano via llvm-dev
2017-Jan-02 22:48 UTC
[llvm-dev] SCCP is not always correct in presence of undef (+ proposed fix)
On Mon, Jan 2, 2017 at 2:24 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> Hi Davide,
>
> On Sat, Dec 31, 2016 at 4:19 AM, Davide Italiano <davide at freebsd.org> wrote:
>> Although originally I wasn't, I'm increasingly convinced this is the
>> right path forward (at least in the beginning), i.e. strip undef
>> handling entirely. I tried to integrate undef in the solver to see how
>> that worked and it seems the proposed lattice has still some issues.
>
> By "integrate undef" do you mean you implemented the lattice you
> mentioned in the original post? If so, can you mention what issues
> you ran into?

Nothing in the testcases I used, but you still pointed out a case where
the lattice I proposed in the original mail wasn't necessarily correct.

>> We have hard time reasoning about simple things like what's the
>> lattice structure and what should be the invariants we
>> maintain/enforce at the end of each run (something that's very nice
>> about the original algorithm is that you know at the end of the
>> algorithm nothing should have value TOP otherwise you forgot to lower
>> something/have a bug, but we can't currently assert this because of
>> `undef`).
>
> That is a good point. Can we get the same kind of checking by keeping
> track of whether we've visited every non-dead instruction in the
> function at least once?

I think that could work.

>> I'm also becoming increasingly convinced that this problem is
>> something we (LLVM) created.
>> If optimizing `undef` (which is sometimes if not often symptom of
>> undefined behaviour/bugs in your program) should come at the expense
>> of correctness or complicated logic/potentially unsound algorithms I
>> don't think we should go that way.
>
> I agree that undef has soundness issues, but I don't think those are
> relevant here.
>
> And with respect to `undef` and its implications for the correctness
> of the source program, there are two factors here:
>
> Using `undef` in a side effect (e.g. printf(undef)) is usually a sign
> of a buggy source program, especially if the source language is C or
> C++.
>
> The *presence* of `undef` is fine, and does not imply that the source
> program is incorrect.

Thanks for the clarification. That's why I said "sometimes, if not
often" (not "always"); anyway, I get your point =)

>> What Danny proposed I think it's very appealing but I'm taking a more
>> extreme position here saying that we may consider not doing that in
>> the beginning. Today I ran SCCP over a bunch of codebases (completely
>> removing the undef optimization) and I found it never actually
>> resulted in a runtime performance hit. Of course this is not
>> representative at all, but still a datapoint.
>
> What is the contribution of SCCP itself to the (I presume internal)
> benchmarks? If you disable SCCP completely, how much performance do
> you lose?

~1.5% runtime on a game scene (disabling both SCCP and IPSCCP; this is
LTO, FWIW). Note: I wish I could share numbers, but I'm not allowed.

>> I think we should take this opportunity as a way to see if we can make
>> things simpler/easier to consider correct instead of adding cases for
>> problems we discover.
>
> In theory I'm completely fine with not handling undef at all
> (i.e. your proposal) as a starting point. If we see a need for it, we
> can always add it back later.

I'm personally in no rush to get this code in, so, while we're here,
we can just bite the bullet and fix this entirely.
Side note: I originally wanted to fix this for 4.0 so that I could
avoid maintaining a patch downstream. It turns out that the burden of
keeping a patch downstream is not terribly high, so this can probably
wait (and get more careful thought).
Side note 2: I think that even if we want to disable undef handling,
that needs to be backed by a set of benchmarks showing we don't lose
too much.
My original testing was, of course, non-representative of the entire
world (I just reported it as a data point).

> In practice, this is the kind of thing that tends to get added back
> poorly (because someone's benchmark will be faster by 1% with a "small
> fix" to SCCP). Given that we're taking a step back and looking at the
> bigger picture now anyway, this could be an opportune moment to fix
> the underlying issue with regard to undef.
>
> Having said that, since you're doing the work, I'm more than happy to
> let you make the call on this one. I have the armchair, but you have
> the keyboard. :)

Your comments are of course very welcome (that goes without saying).
Are you happy with me experimenting with something similar to what
Danny proposed (a pre-solver computing the SCCs of the SSA graph)?
At this point this is my favourite solution, because we can stick with
the default algorithm (which will keep me happier, as somebody else
did the hard work of proving it correct on my behalf).

> -- Sanjoy
>
> PS: I still don't have an answer to the very first question I asked:
>
> ```
>>> Looking at the original bug, it seems like a straightforward
>>> undef-propagation bug to me -- SCCP was folding "or undef, constant"
>>> to "undef", which is wrong. Why is changing that not the fix? That
>>> is, some variant of
>>
>> You would still need to fix the iteration order of the resolver, or it
>> will make more wrong decisions.
>> As Davide discovered, there are bugs open with the same cause.
>
> Davide -- can you point me to those?
> ```

Sorry, I missed this one :( I have another case (not reduced) where
this falls apart. I locally patched it with something very similar to
what you proposed in your original mail, so I'm tempted to claim it's
the same issue (or a slight variation of it). That said, sure, I think
we can probably get the patch you proposed originally in and call it a
night.
But I'm still very nervous that we're not sure of the correctness of
the algorithm, and I'm very afraid it may fall apart again in the
future.

--
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare
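The pre-solver Davide proposes to experiment with, per Danny's suggestion, starts by computing the strongly connected components of the SSA def-use graph so that undef-containing components can be resolved to constants before the solver runs. A rough sketch of the SCC step (hypothetical Python, not LLVM code; the graph encoding and names are invented for illustration):

```python
# Sketch of the SCC computation a pre-solver could use: Tarjan's
# algorithm over an SSA def-use graph given as {node: [successors]}.
def tarjan_sccs(graph):
    """Return the strongly connected components of `graph` as lists."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def strongconnect(v, counter=[0]):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:       # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

# Toy def-use graph: x = phi(undef, x) forms a self-cycle on "x".
graph = {"undef0": ["x"], "x": ["x", "use"], "use": []}
components = tarjan_sccs(graph)
assert ["x"] in components
```

Once the components are known, the pre-solve step would pick one concrete constant per undef-containing component and mark it constant before the solver sees it, so the solver itself never reasons about undef. How to choose those constants correctly is exactly the part that still needs careful thought.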