Sean Silva via llvm-dev
2016-Jun-08 21:54 UTC
[llvm-dev] Intended behavior of CGSCC pass manager.
On Wed, Jun 8, 2016 at 9:39 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> On Wed, Jun 8, 2016 at 4:19 AM, Sean Silva <chisophugis at gmail.com> wrote:
>
>> Hi Chandler, Philip, Mehdi, (and llvm-dev,)
>>
>> (This is partially a summary of some discussions that happened at the
>> last LLVM bay area social, and partially a discussion about the
>> direction of the CGSCC pass manager.)
>>
>> At the last LLVM social we discussed the progress on the CGSCC pass
>> manager. It seems like Chandler has a CGSCC pass manager working, but
>> it is still unresolved exactly which semantics we want (more about this
>> below) that are reasonably implementable.
>>
>> AFAICT, there has been no public discussion about what exact semantics
>> we ultimately want to have. We should figure that out.
>>
>> The main difficulty which Chandler described is the apparently quite
>> complex logic surrounding the need to run function passes nested within
>> an SCC pass manager, while providing some guarantees about exactly what
>> order the function passes are run in. The existing CGSCC pass manager
>> just punts on some of the problems that arise (look in
>> CGPassManager::runOnModule, CGPassManager::RunAllPassesOnSCC, and
>> CGPassManager::RunPassOnSCC in llvm/lib/Analysis/CallGraphSCCPass.cpp),
>> and these are the problems that Chandler has been trying to solve.
>>
>> (
>> Why is this "function passes inside CGSCC passes" stuff interesting?
>> Because LLVM can do inlining on an SCC (often just a single function)
>> and then run function passes to simplify the function(s) in the SCC
>> before it tries to inline into a parent SCC. (The SCC visitation order
>> is post-order.) For example, we may inline a bunch of code, but after
>> inlining we can tremendously simplify the function, and we want to do
>> so before considering this function for inlining into its callers so
>> that we get an accurate evaluation of the inline cost.
>> Based on what Chandler said, it seems that LLVM is fairly unique in
>> this regard and other compilers don't do this (which is why we can't
>> just look at how other compilers solve this problem; they don't have
>> this problem (maybe they should? or maybe we shouldn't?)). For example,
>> he described that GCC uses different inlining "phases"; e.g. it does
>> early inlining on the entire module, then does simplifications on the
>> entire module, then does late inlining on the entire module; so it is
>> not able to incrementally simplify as it inlines like LLVM does.
>> )
>>
>> As background for what is below, the LazyCallGraph tracks two graphs:
>> the "call graph" and the "ref graph".
>> Conceptually, the call graph is the graph of direct calls, where
>> indirect calls and calls to external functions do not appear (or are
>> connected to dummy nodes). The ref graph is basically the graph of all
>> functions transitively accessible based on the globals/constants/etc.
>> referenced by a function (e.g. if a function `foo` references a vtable
>> that is defined in the module, there is an edge in the ref graph from
>> `foo` to every function in the vtable).
>> The call graph is a strict subset of the ref graph.
>>
>> Chandler described that he had a major breakthrough in that the CGSCC
>> pass manager only has to deal with 3 classes of modifications that can
>> occur:
>> - a pass may e.g. propagate a load of a function pointer into an
>> indirect call, turning it into a direct call. This requires adding an
>> edge in the CG but not in the ref graph.
>> - a pass may take a direct call and turn it into an indirect call. This
>> requires removing an edge from the CG, but not from the ref graph.
>> - a pass may delete a direct call. This removes an edge in the CG and
>> also in the ref graph.
>>
>> From the perspective of the CGSCC pass manager, these operations can
>> affect the SCC structure. Adding an edge might merge SCCs and deleting
>> an edge might split SCCs.
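The relationship between the two graphs and the three edge updates above can be sketched concretely. This is an illustrative toy model, not LLVM's actual data structures: a small Tarjan SCC routine plus a two-function module where `a` references `b` through a function pointer and `b` directly calls `a`. Promoting the indirect call to a direct call (the first class of modification) merges the two call-graph SCCs, but the ref graph is unchanged, since the ref edge was already there.

```python
def sccs(nodes, edges):
    """Tarjan's algorithm; returns a list of SCCs (sets of nodes)."""
    index, low = {}, {}
    stack, on_stack, out = [], set(), []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            out.append(comp)

    for v in nodes:
        if v not in index:
            visit(v)
    return out

# Toy module: `a` holds a function pointer to `b`; `b` directly calls `a`.
nodes = ["a", "b"]
ref_edges = {"a": ["b"], "b": ["a"]}   # `a` references `b` (e.g. &b)
call_edges = {"a": [], "b": ["a"]}     # only the direct call b->a

# Before promotion: two separate call-graph SCCs, one ref-graph SCC.
assert len(sccs(nodes, call_edges)) == 2
assert len(sccs(nodes, ref_edges)) == 1

# Class 1: the indirect call in `a` is promoted to a direct call to `b`.
# This adds a call edge (merging the two call-graph SCCs) but adds nothing
# to the ref graph -- the ref edge was already present.
call_edges["a"].append("b")
assert len(sccs(nodes, call_edges)) == 1
assert len(sccs(nodes, ref_edges)) == 1
```

This also shows the invariant mentioned below: since every call edge a CGSCC pass can introduce already exists as a ref edge, no transformation can force a merge of ref-graph SCCs.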
>> Chandler mentioned that apparently the issues of splitting and merging
>> SCCs within the current infrastructure are actually quite challenging
>> and lead to e.g. iterator invalidation issues, and that is what he is
>> working on.
>>
>> (
>> The ref graph is important to guide the overall SCC visitation order
>> because it basically represents "the largest graph that the CG may turn
>> into due to our static analysis of this module". I.e. no transformation
>> we can statically make in the CGSCC passes can ever cause us to need to
>> merge SCCs in the ref graph.
>> )
>>
>> I have a couple of overall questions/concerns:
>>
>> 1. The ref graph can easily go quadratic. E.g.
>>
>> typedef void (*fp)();
>> fp funcs[] = {
>>   &foo1,
>>   &foo2,
>>   ...
>>   &fooN
>> };
>> void foo1() { funcs[something](); }
>> void foo2() { funcs[something](); }
>> ...
>> void fooN() { funcs[something](); }
>>
>> One real-world case where this might come about is in the presence of
>> vtables.
>>
>> The existing CGSCC pass manager does not have this issue AFAIK because
>> it does not consider the ref graph.
>>
>> Does anybody have any info/experience about how densely connected the
>> ref graph can get in programs that might reasonably be fed to the
>> compiler?
>
> I can state that almost all call graphs of compilers include edges for
> indirect calls and external functions, so they are already quadratic in
> this sense.
>
> If what you state is correct, and we don't have a conservatively correct
> call graph, that would be ... interesting.

Mehdi clarified the situation downthread (at least for me). Now that I
look more closely at the comments in
https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Analysis/LazyCallGraph.h
it intentionally does not model a call graph in this sense. I.e.
- In a traditional CG, an edge means something like "at runtime there may
be a call from A->B".
- In the LazyCallGraph, an edge (a "ref edge" as it calls it) represents
something like "during optimization of this module, we may discover the
existence of a direct call from A->B". There is also a distinguished
subgraph of the ref graph (which I think LazyCallGraph calls just the
"call graph") which represents the actual direct calls that are present
in the module currently.

The comments in LazyCallGraph.h are quite good, but the existing
CallGraph.h doesn't touch on this up front in its comments in quite the
same way. It does at least say that it models indirect and external calls
with two external nodes, like Mehdi mentioned:
https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Analysis/CallGraph.h#L30
So its edges don't represent "at runtime there may be a call from A->B".
But since it doesn't maintain a "ref graph" I'm not sure what the edges
exactly represent.

-- Sean Silva

> The solution to most issues (large SCCs, etc.) that exist here that most
> other compilers take is to try to make the call graph more precise, not
> to avoid indirect/external calls in the call graph.
>
> In turn, this means the solution often taken is to not have two graphs
> at all.
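The quadratic blow-up in the `funcs[]` example above is easy to count. A minimal sketch (the names are stand-ins for the C example, not real compiler output): each of the N functions references the `funcs` array, and the array references all N functions, so every fooI gets a ref edge to every fooJ.

```python
N = 100
table = [f"foo{i}" for i in range(N)]  # stands in for the `funcs` array

# Ref edges: fooI -> fooJ for every I and J, via the array reference.
ref_edges = {f: set(table) for f in table}
total = sum(len(targets) for targets in ref_edges.values())
assert total == N * N  # 10,000 ref edges for only 100 functions

# The call graph, by contrast, has no direct-call edges at all here:
# every call goes through the function pointer table.
call_edges = {f: set() for f in table}
assert sum(len(targets) for targets in call_edges.values()) == 0
```

So the ref graph holds N² edges while the call graph holds none, which is exactly the asymmetry the question is probing: the ref graph's density is bounded only by what the module *references*, not by what it directly calls.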
Mehdi Amini via llvm-dev
2016-Jun-08 22:10 UTC
[llvm-dev] Intended behavior of CGSCC pass manager.
> On Jun 8, 2016, at 2:54 PM, Sean Silva <chisophugis at gmail.com> wrote:
> [...]
> So its edges don't represent "at runtime there may be a call from A->B".
> But since it doesn't maintain a "ref graph" I'm not sure what the edges
> exactly represent.

I thought of it as: the edges are "there is a direct call from A -> B",
which is a subset of "at runtime there may be a call from A->B".

I think that with all this discussion, it is important to distinguish
that (I think) there is no "correctness" issue at stake (we won't
miscompile anything), but there may be missed optimizations in some
cases. I think the current scheme catches most cases, and when it does
not we are just missing potential inlining. The question may be how many
(more) cases we really need to catch with the new pass manager. And could
a first implementation not catch everything and be improved
incrementally?
This comes back somehow to what Hal was mentioning (reproducing the
current behavior before improving it).

--
Mehdi
Sean Silva via llvm-dev
2016-Jun-08 23:48 UTC
[llvm-dev] Intended behavior of CGSCC pass manager.
On Wed, Jun 8, 2016 at 3:10 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> [...]
> I thought of it as the edges are "there is a direct call from A -> B",
> which is a subset of "at runtime there may be a call from A->B".
>
> I think that with all this discussion, it is important to distinguish
> that (I think) there is no "correctness" issue at stake (we won't
> miscompile anything)

This is a good point worth explicitly noting (and hopefully there are in
fact no passes relying on it to mean "at runtime there may be a call from
A->B").

> , but there may be missed optimizations in some cases. I think the
> current scheme catches most cases, and when it does not we are just
> missing potential inlining.
> The question may be how many (more) cases we really need to catch with
> the new pass manager?

I think this only affects inlining of mutually recursive functions. Most
functions are not mutually recursive [citation needed] so I'm not too
worried. (The main performance-critical case that I can think of using
mutual recursion would be parsers.)

-- Sean Silva

> And could a first implementation not catch everything and be improved
> incrementally? This comes back somehow to what Hal was mentioning
> (reproducing the current behavior before improving it).
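For readers less familiar with the pattern: the parser case mentioned above is the classic source of multi-function SCCs. A small hypothetical recursive-descent evaluator illustrates it: `parse_expr` and `parse_term` call each other, so they form a single cycle in the call graph, which is exactly the situation where SCC-level handling (rather than a simple bottom-up per-function order) matters.

```python
def parse_expr(tokens, i):
    """expr := term ('+' term)* -- returns (value, next index)."""
    value, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] == "+":
        rhs, i = parse_term(tokens, i + 1)
        value += rhs
    return value, i

def parse_term(tokens, i):
    """term := NUMBER | '(' expr ')' -- calls back into parse_expr,
    closing the cycle that puts both functions in one SCC."""
    if tokens[i] == "(":
        value, i = parse_expr(tokens, i + 1)
        return value, i + 1  # skip ')'
    return int(tokens[i]), i + 1

value, _ = parse_expr(["(", "1", "+", "2", ")", "+", "3"], 0)
assert value == 6
```

Neither function can be visited strictly "after all of its callees", so an inliner walking SCCs in post-order has to treat the pair as one unit instead of fully simplifying one before the other.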