Rafael Avila de Espindola via llvm-dev
2017-Jul-31 21:20 UTC
[llvm-dev] [RFC] Profile guided section layout
Michael Spencer via llvm-dev <llvm-dev at lists.llvm.org> writes:

> I've recently implemented profile guided section layout in llvm + lld using
> the Call-Chain Clustering (C³) heuristic from
> https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf
> . In the programs I've tested it on I've gotten from 0% to 5% performance
> improvement over standard PGO with zero cases of slowdowns and up to 15%
> reduction in ITLB misses.
>
> There are three parts to this implementation.
>
> The first is a new llvm pass which uses branch frequency info to get counts
> for each call instruction and then adds a module flags metadata table of
> function -> function edges along with their counts.
>
> The second takes the module flags metadata and writes it into a
> .note.llvm.callgraph section in the object file. This currently just dumps
> it as text, but could save space by reusing the string table.
>
> The last part is in lld. It reads the .note.llvm.callgraph data from each
> object file and merges them into a single table. It then builds a call
> graph based on the profile data then iteratively merges the hottest call
> edges using the C³ heuristic as long as it would not create a cluster
> larger than the page size. All clusters are then sorted by a density metric
> to further improve locality.

Since the branch frequency info is in an llvm-specific format, it makes
sense for llvm to read it instead of expecting lld to do it again. Since
.o files are how the compiler talks to the linker, it also makes sense
for llvm to record the required information there.

In the same way, since the linker is the first place with global
knowledge, it makes sense for it to be the one that implements a section
ordering heuristic instead of just being told by some other tool, which
would complicate the build.

However, do we need to start with instrumentation? The original paper
uses sampling with good results and current Intel CPUs can record every
branch in a program.
I would propose starting with just an lld patch that reads the call
graph from a file. The format would be very similar to what you propose,
just weight,caller,callee.

In another patch we can then look at instrumentation: why it is more
convenient for some uses and what performance advantage it might have.

I have written a small tool that uses intel_bts and 'perf script' to
construct the callgraph. I am giving it a try with your lld patch and
will hopefully post results today.

Cheers,
Rafael
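To make the proposed weight,caller,callee file concrete, here is a minimal Python sketch of reading such records and merging repeated edges into one table, the way the RFC describes merging per-object data. Everything beyond the three comma-separated fields is an assumption for illustration: the function name parse_call_graph is hypothetical, weights are assumed to be decimal integers, blank lines are skipped, and duplicate edges are summed. It is not the format or code of the actual lld patch.

```python
from collections import defaultdict


def parse_call_graph(lines):
    """Merge 'weight,caller,callee' records into a single edge table.

    Hypothetical helper: returns {(caller, callee): total_weight}.
    """
    edges = defaultdict(int)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        # Split on the first two commas only; the rest is the callee name.
        weight, caller, callee = line.split(",", 2)
        # Assumption: the same edge may appear several times (for example
        # once per object file), and its weights simply add up.
        edges[(caller, callee)] += int(weight)
    return dict(edges)


sample = [
    "100,main,parse",
    "40,parse,lex",
    "25,main,parse",
]
table = parse_call_graph(sample)
```

With this sample, the two main -> parse records collapse into a single edge whose weight is their sum.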
Michael Spencer via llvm-dev
2017-Aug-01 21:25 UTC
[llvm-dev] [RFC] Profile guided section layout
On Tue, Aug 1, 2017 at 1:57 PM, Justin Bogner <mail at justinbogner.com> wrote:

> Rafael Avila de Espindola via llvm-dev <llvm-dev at lists.llvm.org> writes:
> > Michael Spencer via llvm-dev <llvm-dev at lists.llvm.org> writes:
> >
> >> I've recently implemented profile guided section layout in llvm + lld using
> >> the Call-Chain Clustering (C³) heuristic from
> >> https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf
> >> . In the programs I've tested it on I've gotten from 0% to 5% performance
> >> improvement over standard PGO with zero cases of slowdowns and up to 15%
> >> reduction in ITLB misses.
> >>
> >> There are three parts to this implementation.
> >>
> >> The first is a new llvm pass which uses branch frequency info to get counts
> >> for each call instruction and then adds a module flags metadata table of
> >> function -> function edges along with their counts.
> >>
> >> The second takes the module flags metadata and writes it into a
> >> .note.llvm.callgraph section in the object file. This currently just dumps
> >> it as text, but could save space by reusing the string table.
> >>
> >> The last part is in lld. It reads the .note.llvm.callgraph data from each
> >> object file and merges them into a single table. It then builds a call
> >> graph based on the profile data then iteratively merges the hottest call
> >> edges using the C³ heuristic as long as it would not create a cluster
> >> larger than the page size. All clusters are then sorted by a density metric
> >> to further improve locality.
> >
> > Since the branch frequency info is in an llvm-specific format, it makes
> > sense for llvm to read it instead of expecting lld to do it again. Since
> > .o files are how the compiler talks to the linker, it also makes sense
> > for llvm to record the required information there.
> >
> > In the same way, since the linker is the first place with global
> > knowledge, it makes sense for it to be the one that implements a section
> > ordering heuristic instead of just being told by some other tool, which
> > would complicate the build.
> >
> > However, do we need to start with instrumentation? The original paper
> > uses sampling with good results and current Intel CPUs can record every
> > branch in a program.
>
> This already works without instrumentation. You can probably try it out
> as is with profiles generated with Linux perf using the create_llvm_prof
> tool from the autofdo work: https://github.com/google/autofdo

I'm pretty sure by "start with instrumentation" he means start with the
restrictions it imposes on having to traffic the data through the object
file.

- Michael Spencer

> > I would propose starting with just an lld patch that reads the call
> > graph from a file. The format would be very similar to what you propose,
> > just weight,caller,callee.
> >
> > In another patch we can then look at instrumentation: why it is more
> > convenient for some uses and what performance advantage it might have.
> >
> > I have written a small tool that uses intel_bts and 'perf script' to
> > construct the callgraph. I am giving it a try with your lld patch and
> > will hopefully post results today.
> >
> > Cheers,
> > Rafael
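For readers following the thread, the clustering step described in the original RFC (merge the hottest call edges unless the result would exceed the page size, then sort clusters by density) can be sketched roughly as below. This is a simplified illustration in Python, not the lld patch and not the exact algorithm from the paper. The names cluster_sections and PAGE_SIZE, the 4096-byte page size, the positive per-function sizes, and the density definition (incoming call weight divided by bytes) are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def cluster_sections(edges, sizes, page_size=PAGE_SIZE):
    """Greedy clustering in the spirit of the RFC's description.

    edges: {(caller, callee): weight}
    sizes: {function: size in bytes}, assumed positive and covering
           every function named in edges
    Returns a list of clusters (lists of function names), densest first.
    """
    cluster_of = {f: [f] for f in sizes}  # function -> its cluster
    weight_of = {f: 0 for f in sizes}     # incoming call weight
    for (_, callee), w in edges.items():
        weight_of[callee] += w

    # Visit call edges from hottest to coldest.
    for (caller, callee), _ in sorted(edges.items(), key=lambda kv: -kv[1]):
        a, b = cluster_of[caller], cluster_of[callee]
        if a is b:
            continue  # already placed together
        merged_size = sum(sizes[f] for f in a) + sum(sizes[f] for f in b)
        if merged_size > page_size:
            continue  # merging would create a cluster larger than a page
        a.extend(b)   # place the callee's cluster right after the caller's
        for f in b:
            cluster_of[f] = a

    # Collect each distinct cluster once, keeping first-seen order.
    seen, clusters = set(), []
    for c in cluster_of.values():
        if id(c) not in seen:
            seen.add(id(c))
            clusters.append(c)

    def density(c):
        # Assumed metric: incoming call weight per byte of code.
        return sum(weight_of[f] for f in c) / sum(sizes[f] for f in c)

    return sorted(clusters, key=density, reverse=True)


sizes = {"main": 1000, "parse": 2000, "lex": 1500, "cold": 500}
edges = {("main", "parse"): 125, ("parse", "lex"): 40, ("main", "cold"): 1}
layout = cluster_sections(edges, sizes)
```

In this example main and parse are merged first. lex is then left on its own because adding it would push the cluster past 4096 bytes, while the small cold function still fits and joins the main cluster.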