Xinliang David Li via llvm-dev
2016-Jun-08 23:20 UTC
[llvm-dev] Intended behavior of CGSCC pass manager.
On Wed, Jun 8, 2016 at 12:36 PM, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Hi,

> Does it make sense to change RefSCC to hold a list of RefSCC-DAG roots that were split out of it because of edge deletion? Then one way to phrase the inliner/function pass iteration would be (assuming I understand the issues):

>   Stack.push(RefSCC_Leaves);
>   while (!Stack.empty()) {
>     RefSCC = Stack.pop();
>     InlineCallSites(RefSCC);
>     if (!RefSCC.splitOutSCCs.empty())
>       goto repush;
>     for each func in RefSCC:
>       FPM.run(func);
>     if (!RefSCC.splitOutSCCs.empty())
>       goto repush;
>     continue;
>   repush:
>     for (refscc_dag_root in RefSCC.splitOutSCCs)
>       // Here we don't want to push every leaf, but only leaves that have
>       // functions that haven't had the FPM run on them (maybe we can do
>       // this by maintaining a set?).  If we don't push a leaf, we push its
>       // parent (which we want to push even if we've run the FPM on it,
>       // since we'd like to re-run the inliner on it).
>       refscc_dag_root.push_leaves_to(Stack);
>   }

> (I know this isn't ideal, since now RefSCC is no longer "just a data structure", but actually carries incidental information.)

Is it in the category of 'invalidating the iterator while iterating'? That feels very wrong to me. We should avoid going there and find better ways to solve the motivating problems (perhaps defining them more clearly first?).

thanks,

David

> On Wed, Jun 8, 2016 at 12:07 PM, Finkel, Hal J. via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> > Sent from my Verizon Wireless 4G LTE DROID

> > On Jun 8, 2016 1:58 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:

> >>> On Jun 8, 2016, at 9:32 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> >>>> From: "Sean Silva via llvm-dev" <llvm-dev at lists.llvm.org>
> >>>> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> >>>> Sent: Wednesday, June 8, 2016 6:19:03 AM
> >>>> Subject: [llvm-dev] Intended behavior of CGSCC pass manager.

> >>>> Hi Chandler, Philip, Mehdi, (and llvm-dev,)

> >>>> (This is partially a summary of some discussions that happened at the last LLVM Bay Area social, and partially a discussion about the direction of the CGSCC pass manager.)

> >>>> At the last LLVM social we discussed the progress on the CGSCC pass manager. It seems like Chandler has a CGSCC pass manager working, but it is still unresolved exactly which semantics we want (more about this below) that are reasonably implementable.

> >>>> AFAICT, there has been no public discussion about what exact semantics we ultimately want to have. We should figure that out.

> >>>> The main difficulty which Chandler described is the apparently quite complex logic surrounding needing to run function passes nested within an SCC pass manager, while providing some guarantees about exactly what order the function passes are run. The existing CGSCC pass manager just punts on some of the problems that arise (look in CGPassManager::runOnModule, CGPassManager::RunAllPassesOnSCC, and CGPassManager::RunPassOnSCC in llvm/lib/Analysis/CallGraphSCCPass.cpp), and these are the problems that Chandler has been trying to solve.

> >>>> (
> >>>> Why is this "function passes inside CGSCC passes" stuff interesting? Because LLVM can do inlining on an SCC (often just a single function) and then run function passes to simplify the function(s) in the SCC before it tries to inline into a parent SCC. (The SCC visitation order is post-order.)
> >>>> For example, we may inline a bunch of code, but after inlining we can tremendously simplify the function, and we want to do so before considering this function for inlining into its callers so that we get an accurate evaluation of the inline cost.
> >>>> Based on what Chandler said, it seems that LLVM is fairly unique in this regard and other compilers don't do this (which is why we can't just look at how other compilers solve this problem; they don't have this problem (maybe they should? or maybe we shouldn't?)). For example, he described that GCC uses different inlining "phases"; e.g. it does early inlining on the entire module, then does simplifications on the entire module, then does late inlining on the entire module; so it is not able to incrementally simplify as it inlines like LLVM does.

> >>> This incremental simplification is an important feature of our inliner, and one we should endeavor to keep. We might also want different phases at some point (e.g. a top-down and a bottom-up phase), but that's another story.

> >>>> )

> >>>> As background for what is below, the LazyCallGraph tracks two graphs: the "call graph" and the "ref graph".
> >>>> Conceptually, the call graph is the graph of direct calls, where indirect calls and calls to external functions do not appear (or are connected to dummy nodes). The ref graph is basically the graph of all functions transitively accessible based on the globals/constants/etc. referenced by a function (e.g. if a function `foo` references a vtable that is defined in the module, there is an edge in the ref graph from `foo` to every function in the vtable).
> >>>> The call graph is a strict subset of the ref graph.

> >>>> Chandler described that he had a major breakthrough in that the CGSCC pass manager only has to deal with 3 classes of modifications that can occur:
> >>>> - a pass may e.g. propagate a load of a function pointer into an indirect call, turning it into a direct call. This requires adding an edge in the CG but not in the ref graph.
> >>>> - a pass may take a direct call and turn it into an indirect call. This requires removing an edge from the CG, but not from the ref graph.
> >>>> - a pass may delete a direct call. This removes an edge in the CG and also in the ref graph.

> >>>> From the perspective of the CGSCC pass manager, these operations can affect the SCC structure. Adding an edge might merge SCCs and deleting an edge might split SCCs. Chandler mentioned that apparently the issues of splitting and merging SCCs within the current infrastructure are actually quite challenging and lead to e.g. iterator invalidation issues, and that is what he is working on.

> >>>> (
> >>>> The ref graph is important to guide the overall SCC visitation order because it basically represents "the largest graph that the CG may turn into due to our static analysis of this module". I.e. no transformation we can statically make in the CGSCC passes can ever cause us to need to merge SCCs in the ref graph.
> >>>> )

> >>>> I have a couple of overall questions/concerns:

> >>>> 1. The ref graph can easily go quadratic. E.g.

> >>>>   typedef void (*fp)();
> >>>>   fp funcs[] = {
> >>>>     &foo1,
> >>>>     &foo2,
> >>>>     ...
> >>>>     &fooN
> >>>>   };
> >>>>   void foo1() { funcs[something](); }
> >>>>   void foo2() { funcs[something](); }
> >>>>   ...
> >>>>   void fooN() { funcs[something](); }

> >>>> One real-world case where this might come about is in the presence of vtables.

> >>>> The existing CGSCC pass manager does not have this issue AFAIK because it does not consider the ref graph.

> >>>> Does anybody have any info/experience about how densely connected the ref graph can get in programs that might reasonably be fed to the compiler? I just did a quick sanity check with LLD/ELF using `--lto-newpm-passes=cgscc(no-op-cgscc)` and it at least seemed to terminate / not run out of memory. Based on some rough calculations looking at the profile, it seems like the entire run of the inliner in the old LTO pipeline (which is about 5% of total LTO time on this particular example I looked at) is only 2-3x as expensive as just `--lto-newpm-passes=cgscc(no-op-cgscc)`, so the LazyCallGraph construction is certainly not free.

> >>>> 2. What is the intended behavior of CGSCC passes when SCCs are split or merged? E.g. a CGSCC pass runs on an SCC (e.g. the inliner). Now we run some function passes nested inside the CGSCC pass manager (e.g. to simplify things after inlining). Consider:

> >>> This is not how I thought the current scheme worked ;) -- I was under the impression that we had a call graph with conservatively-connected dummy nodes for external/indirect functions.

> >> The fact that we have separate nodes for calling into an external function and for "being called" from an external function means these don't form an SCC. So it means we can end up merging SCCs, IIUC.

> > Yes, although I thought there was only one dummy node for those things.

> > -Hal

> >> --
> >> Mehdi

> >>> As a result, there is no semantics-preserving optimization that will merge SCCs, only split them. In that case, I'd expect that once an SCC is split, we re-run the CGSCC passes over the newly-separated SCCs. But this corresponds to running over the "ref graph", as you describe it. I don't understand why we want the non-conservative graph.

> >>>> a) These function passes are e.g. now able to devirtualize a call, adding an edge to the CG, forming a larger CG SCC. Do you re-run the CGSCC pass (say, the inliner) on this larger SCC?

> >>>> b) These function passes are e.g. able to DCE a call, removing an edge from the CG. This converts, say, a CG SCC which is a cycle graph (like a->b->c->a) into a path graph (a->b->c, with no edge back to a). The inliner had already visited a, b, and c as a single SCC. Now does it have to re-visit c, then b, then a, as single-node SCCs?

> >>>> btw:

> >>>> One way that I have found it useful to think about this is in terms of the visitation during Tarjan's SCC algorithm. I'll reference the pseudocode in https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm.
> >>>> Inside the "strongconnect" routine, when we have identified an SCC (the true branch of the `if (v.lowlink = v.index)` test) we can visit stack[v.index:stack.size()] as an SCC. This may or may not invalidate some things on the stack (the variable `S` in the pseudocode) and we may need to fix it up (e.g. inlining deleted a function, so we can't have an entry on the stack). Then, we can run function passes as we pop individual functions off the stack, but it is easier to think about IMO than merging of SCC data structures: if we add edges to the CG then we have to do more DFS on the new edges, and if we delete edges then the DFS order of the stack gives us certain guarantees.
> >>>> Personally I find this much easier to reason about than the description in terms of splitting and merging SCCs in the CG and ref graph (which the LazyCallGraph API makes one think about, since it hides the underlying Tarjan's algorithm).
> >>>> The LazyCallGraph API makes the current loop in http://reviews.llvm.org/diffusion/L/browse/llvm/trunk/include/llvm/Analysis/CGSCCPassManager.h;272124$100 very clean, but at least for my thinking about the problem, it seems like the wrong abstraction (and most of the LazyCallGraph API seems to be unused, so it seems like it may be overly heavyweight).
> >>>> E.g. I think that maybe the easiest thing to do is to turn the current approach inside out: instead of having the pass manager logic be the "normal code" and forcing the Tarjan algorithm to become a state machine of iterators, use an open-coded Tarjan algorithm with some callbacks and make the pass management logic be the state machine.
> >>>> This will also open the door to avoiding the potentially quadratic size of the ref graph, since e.g. in the example I gave above, we can mark the `funcs` array itself as already having been visited during the walk. In the current LazyCallGraph, this would require adding some sort of notion of hyperedge.

> >>> FWIW, I see no purpose in abstracting one algorithm, especially if that makes things algorithmically harder. Also, the LazyCallGraph abstraction and the iterator abstraction seem like separate issues. Iterator abstractions are often useful because you can use them in generic algorithms, etc.

> >>>> Since this is such a high priority (due to blocking PGO inlining), I will probably try my hand at implementing the CGSCC pass manager sometime soon unless somebody beats me to it. (I'll probably try the "open-coded SCC visit" approach.)

> >>>> Another possibility is implementing the new CGSCC pass manager with the same visitation semantics as the one in the old PM, and then refactoring that as needed. In fact, that may be the best approach, so that porting to the new PM is as NFC as possible and we can isolate the functional (i.e., need benchmarks, measurements ...) changes in separate commits.

> >>> I'm in favor of this approach for exactly the reason you mention. Being able to bisect regressions to the algorithmic change, separate from the infrastructure change, will likely make things easier in the long run (and will avoid the problem, to the extent possible, of performance regressions blocking the pass-manager work).

> >>>> Sorry for the wall of text.

> >>> No problem; much appreciated.

> >>> -Hal

> >>>> -- Sean Silva

> >>> --
> >>> Hal Finkel
> >>> Assistant Computational Scientist
> >>> Leadership Computing Facility
> >>> Argonne National Laboratory

> --
> Sanjoy Das
> http://playingwithpointers.com
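To make the "open-coded SCC visit" idea quoted above concrete, here is a minimal sketch of Tarjan's algorithm written as plain code with a callback invoked per SCC. The node type, field names, and callback are assumptions for illustration only; this is not LLVM's LazyCallGraph API. The structural point is the one Sean makes: the walk stays ordinary code, and the CGSCC pass-management logic (inlining, function passes, reacting to added or deleted edges) would live in the callback rather than in an iterator state machine.

#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical call-graph node, heavily simplified for illustration.
struct Node {
  std::vector<Node *> Callees;   // outgoing call edges
  int Index = -1, LowLink = -1;  // Tarjan bookkeeping
  bool OnStack = false;
};

using SCCCallback = std::function<void(const std::vector<Node *> &)>;

static void strongConnect(Node *V, int &NextIndex, std::vector<Node *> &Stack,
                          const SCCCallback &OnSCC) {
  V->Index = V->LowLink = NextIndex++;
  Stack.push_back(V);
  V->OnStack = true;

  for (Node *W : V->Callees) {
    if (W->Index == -1) {
      strongConnect(W, NextIndex, Stack, OnSCC);
      V->LowLink = std::min(V->LowLink, W->LowLink);
    } else if (W->OnStack) {
      V->LowLink = std::min(V->LowLink, W->Index);
    }
  }

  // V is the root of an SCC: everything above it on the stack (inclusive)
  // forms one SCC.  Per the discussion above, this is where the inliner and
  // the per-function passes would run, and where edges added or deleted by
  // those passes would have to be folded back into the walk.
  if (V->LowLink == V->Index) {
    std::vector<Node *> SCC;
    Node *W;
    do {
      W = Stack.back();
      Stack.pop_back();
      W->OnStack = false;
      SCC.push_back(W);
    } while (W != V);
    OnSCC(SCC);
  }
}

// SCCs are reported callees-first, i.e. the bottom-up post-order the
// inliner wants.
static void visitSCCsBottomUp(std::vector<Node *> &Graph,
                              const SCCCallback &OnSCC) {
  int NextIndex = 0;
  std::vector<Node *> Stack;
  for (Node *N : Graph)
    if (N->Index == -1)
      strongConnect(N, NextIndex, Stack, OnSCC);
}

As the quoted text notes, the real complexity sits in the callback: if a pass adds call edges, the DFS has to be continued along those new edges before popping further, and if edges are deleted the stack contents may need fixing up.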
Sanjoy Das via llvm-dev
2016-Jun-09 00:54 UTC
[llvm-dev] Intended behavior of CGSCC pass manager.
Hi David,

On Wed, Jun 8, 2016 at 4:20 PM, Xinliang David Li <xinliangli at gmail.com> wrote:

> Is it in the category of 'invalidating the iterator while iterating'? That feels very wrong to me. We should avoid going there and find better ways to solve the motivating problems (perhaps defining them more clearly first?).

I'm not 100% sure what you meant by that, so I'll try to give a general answer and hope that it covers the points you wanted to raise. :)

In the scheme above I'm not trying to solve iterator invalidation -- I'm trying to solve the following problem: the CGSCC pass manager ran the inliner and a set of function passes on a function, and they did something to invalidate the RefSCC we were iterating over[0]. How do we _continue_ our iteration over this now non-existent RefSCC?

The solution I'm trying to propose is this: the only possibility for invalidation is that the RefSCC was broken up into a forest of RefSCC DAGs[1]. This means that if we had a way to get to the leaves of this forest of RefSCCs, we could restart our iteration from there (I've tacitly assumed we're interested in a bottom-up SCC order). This may be difficult in general, but my idea was to "cheat" and explicitly remember the RefSCC forest a RefSCC was broken down into when we do that invalidation. Then once a RefSCC is split, we can pick up the forest from the original RefSCC* (which is otherwise useless now), gather the leaves, and restart our iteration from those leaves.

This leaves the question of what to do with the SCC DAG nested inside the RefSCC. I'm not sure what Chandler has in mind for how much influence these should have over the iteration order, but if we wanted to iterate over the SCC DAG in bottom-up order as well (as we iterated over a single RefSCC), we could use the same scheme to handle SCC splits, and a similar scheme to handle SCC merges (when you merge an SCC, the SCC that gets cleaned out gets a pointer to the SCC where all the functions went, and if the SCC you were iterating over gets such a pointer after running the inliner/FPM, you chase that pointer (possibly multiple times, if more than one SCC was merged) and restart iteration over that SCC).

By "incidental information" I meant that with these additions the RefSCC or SCC is no longer a "pure" function of the structure of the module, but has state that is a function of what the pass manager did, which is not ideal. That is, in theory this isn't significantly cleaner than the passes reaching out into and changing the CGSCC pass manager's state, but perhaps we are okay with this kind of design for practicality's sake?

[0]: One question I don't know the answer to -- how will we detect that something has removed a call or ref edge? Will we rescan functions to verify that edges that we thought existed still exist? Or will we have a ValueHandle-like scheme?

[1]: As Sean mentioned, by design nothing in the function pass manager pipeline could have invalidated the RefSCC by merging it with other RefSCCs.

-- Sanjoy
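As a rough illustration of the bookkeeping this proposal describes -- with assumed, heavily simplified types, not LLVM's actual LazyCallGraph classes -- the "split-out forest" on a RefSCC and the forwarding pointer on a merged-away SCC might look something like this:

#include <set>
#include <vector>

struct Function;

// Hypothetical stand-in for a RefSCC.
struct RefSCC {
  std::vector<Function *> Funcs;

  // Filled in by the graph-update code if a pass split this RefSCC apart:
  // the roots of the RefSCC-DAG forest it was broken into, plus the
  // "references" edges between the split-out pieces.
  std::vector<RefSCC *> SplitOutRoots;
  std::vector<RefSCC *> Children;

  bool wasSplit() const { return !SplitOutRoots.empty(); }

  // Gather the leaves of the split-out forest; a bottom-up walk restarts
  // from these (the driver may skip leaves whose functions have already had
  // the function-pass pipeline run over them, e.g. via a "done" set).
  void gatherSplitOutLeaves(std::vector<RefSCC *> &Leaves) const {
    std::set<const RefSCC *> Visited;
    for (RefSCC *Root : SplitOutRoots)
      gatherLeaves(Root, Leaves, Visited);
  }

  static void gatherLeaves(RefSCC *R, std::vector<RefSCC *> &Leaves,
                           std::set<const RefSCC *> &Visited) {
    if (!Visited.insert(R).second)
      return;
    if (R->Children.empty()) {
      Leaves.push_back(R);
      return;
    }
    for (RefSCC *C : R->Children)
      gatherLeaves(C, Leaves, Visited);
  }
};

// The analogous trick for SCC merges (e.g. after devirtualization adds a
// call edge): the SCC that gets emptied out keeps a forwarding pointer to
// the SCC its functions moved into, and the driver chases it (possibly
// several times) before continuing iteration.
struct SCC {
  std::vector<Function *> Funcs;
  SCC *MergedInto = nullptr;

  SCC *resolve() {
    SCC *S = this;
    while (S->MergedInto)
      S = S->MergedInto;
    return S;
  }
};

This is exactly the "incidental information" Sanjoy mentions: SplitOutRoots and MergedInto record what the pass manager did to the graph, not a property of the module itself.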
Xinliang David Li via llvm-dev
2016-Jun-09 03:59 UTC
[llvm-dev] Intended behavior of CGSCC pass manager.
On Wed, Jun 8, 2016 at 5:54 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:

> Hi David,

> On Wed, Jun 8, 2016 at 4:20 PM, Xinliang David Li <xinliangli at gmail.com> wrote:

> > Is it in the category of 'invalidating the iterator while iterating'? That feels very wrong to me. We should avoid going there and find better ways to solve the motivating problems (perhaps defining them more clearly first?).

> I'm not 100% sure what you meant by that, so I'll try to give a general answer and hope that it covers the points you wanted to raise. :)

> In the scheme above I'm not trying to solve iterator invalidation -- I'm trying to solve the following problem: the CGSCC pass manager ran the inliner and a set of function passes on a function, and they did something to invalidate the RefSCC we were iterating over[0]. How do we _continue_ our iteration over this now non-existent RefSCC?

What you described is: "the iterator pointing to the current SCC gets invalidated because the pointee disappears, so we need to find a way to let it continue working" -- so it does sound like it is trying to solve the iterator invalidation problem? What I suggested is that we should try to avoid getting into that situation in the first place, not try to find a solution for a problem we introduce in the design.

> The solution I'm trying to propose is this: the only possibility for invalidation is that the RefSCC was broken up into a forest of RefSCC DAGs[1]. This means that if we had a way to get to the leaves of this forest of RefSCCs, we could restart our iteration from there (I've tacitly assumed we're interested in a bottom-up SCC order). This may be difficult in general, but my idea was to "cheat" and explicitly remember the RefSCC forest a RefSCC was broken down into when we do that invalidation. Then once a RefSCC is split, we can pick up the forest from the original RefSCC* (which is otherwise useless now), gather the leaves, and restart our iteration from those leaves.

> This leaves the question of what to do with the SCC DAG nested inside the RefSCC. I'm not sure what Chandler has in mind for how much influence these should have over the iteration order, but if we wanted to iterate over the SCC DAG in bottom-up order as well (as we iterated over a single RefSCC), we could use the same scheme to handle SCC splits, and a similar scheme to handle SCC merges (when you merge an SCC, the SCC that gets cleaned out gets a pointer to the SCC where all the functions went, and if the SCC you were iterating over gets such a pointer after running the inliner/FPM, you chase that pointer (possibly multiple times, if more than one SCC was merged) and restart iteration over that SCC).

> By "incidental information" I meant that with these additions the RefSCC or SCC is no longer a "pure" function of the structure of the module, but has state that is a function of what the pass manager did, which is not ideal. That is, in theory this isn't significantly cleaner than the passes reaching out into and changing the CGSCC pass manager's state, but perhaps we are okay with this kind of design for practicality's sake?

The complexity you described above is my main source of concern. Complicated algorithms are nice, but simplicity is better. :)

thanks,

David

> [0]: One question I don't know the answer to -- how will we detect that something has removed a call or ref edge? Will we rescan functions to verify that edges that we thought existed still exist? Or will we have a ValueHandle-like scheme?

> [1]: As Sean mentioned, by design nothing in the function pass manager pipeline could have invalidated the RefSCC by merging it with other RefSCCs.

> -- Sanjoy
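For footnote [0], the first option Sanjoy mentions (rescanning) could be sketched roughly as follows. The types and helpers here are hypothetical and only illustrate the rescan-and-diff idea; they are not an actual LLVM interface, and the thread does not settle on this approach.

#include <set>
#include <vector>

// Hypothetical, heavily simplified stand-in: a "function" here just exposes
// its direct callees and the functions reachable through constants it
// references.
struct Function {
  std::vector<Function *> DirectCallees;
  std::vector<Function *> ReferencedFunctions;
};

struct EdgeSets {
  std::set<Function *> Calls;
  std::set<Function *> Refs;
  bool operator==(const EdgeSets &O) const {
    return Calls == O.Calls && Refs == O.Refs;
  }
};

// Rescan F to recompute its outgoing edges.  In a real compiler this would
// walk F's instructions and the constants they reference.
static EdgeSets computeEdges(const Function &F) {
  EdgeSets E;
  E.Calls.insert(F.DirectCallees.begin(), F.DirectCallees.end());
  E.Refs.insert(F.ReferencedFunctions.begin(), F.ReferencedFunctions.end());
  return E;
}

// Compare against the edges the graph had cached for F before the function
// passes ran.  Any difference means the (Ref)SCC structure may need
// updating: removed edges can split it, added call edges can merge
// call-graph SCCs.
static bool refreshEdgesAfterFunctionPasses(const Function &F,
                                            EdgeSets &Cached) {
  EdgeSets Fresh = computeEdges(F);
  bool Changed = !(Fresh == Cached);
  Cached = Fresh;
  return Changed;
}

The alternative raised in the footnote, a ValueHandle-like scheme, would instead have the graph notified eagerly as calls are deleted or rewritten, trading the per-function rescan for callback bookkeeping.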