RFC: Inlining Report

Motivation

Making good inlining choices while optimizing an application is often key to achieving optimal performance. While the compiler's default inlining heuristics sometimes provide great out-of-box results, optimal performance is sometimes achieved only after varying the settings of certain compiler options related to inlining or adding "always_inline" or "noinline" attributes to certain functions.

Before we can determine how we need to change the compiler's inlining choices to get better performance for an application, we need to have a clear picture of the compiler's inlining choices and what motivated them. Many compilers, such as LLVM and GCC, provide informational notes when a function is inlined, but these notes provide only a "blow by blow" description of what the compiler did, rather than a high level illustration of the result. This high level picture can be provided by an inlining report.

Over the years, I've worked with several compilers that provide inlining reports, and I can attest that the customers using those compilers have found them to be an invaluable tool in investigating and improving their applications' performance. In addition, the inlining report can be used by compiler developers to visualize and improve the compiler's default heuristics and option values.

For these reasons, I'd like to contribute code to LLVM to generate an inlining report as part of the inliner.

Description

The inlining report I am proposing contains the following information:

(1) The values of the principal threshold options, which affect how much inlining is done under various circumstances.
(2) Whether each function is compiled or has been eliminated by dead static function elimination.
(3) For each function, the call sites that were and were not inlined. Since inlining a call site can expose other call sites for inlining, the inlining report also reports whether these exposed call sites have been inlined. This information is presented in a hierarchical manner.
(4) For each call site, the principal reason the call site was or was not inlined, together with any cost vs. threshold computation that was done.

High Level Design

The inline report is created if the option -inline-report=X is passed on the command line with a positive integer value of X. If X is 0, or this option is not specified, the Inliner does not create or perform any operations on the inline report, and there is no compile time overhead.

Three main classes are used to implement the inline report:

class InlineReportCallSite

This class contains the inlining report information specific to a particular CallSite CS, including:
(1) A bool indicating whether the CallSite was inlined
(2) An inlining reason indicating why the CallSite was or was not inlined
(3) The inlining cost, outer inlining cost, and threshold values used in calculating the profitability of inlining
(4) A vector of InlineReportCallSite*, each of which points to an InlineReportCallSite for a CallSite exposed when CS was inlined

class InlineReportFunction

This class contains the inlining report information specific to a particular Function F in the call graph, including:
(1) A bool indicating whether the function has been eliminated by dead static function elimination
(2) A vector of InlineReportCallSite*, each of which points to an InlineReportCallSite for a CallSite that appeared in F before any inlining was applied

class InlineReport

The main class, which summarizes the high level information in the inline report, including:
(1) The values of the inlining threshold options
(2) The "level" of the inlining report, which is a bit vector of feature options: for example, whether to print external functions and intrinsics, whether to print the inlining reasons, etc.
(3) A map MF from each Function* to InlineReportFunction*
(4) A map MCS from each CallSite* to InlineReportCallSite*

In addition, the class InlineCost (from InlineCost.h) is augmented to include the primary reason a call site was or was not inlined.

The class Inliner has been augmented with an InlineReport, which is created when an Inliner is constructed. The InlineReport is updated using calls to the member functions of these three classes in Inliner::runOnSCC() and the functions it calls.

Before any inlining is done in a particular call to runOnSCC(), the map MF is updated so that each Function (caller or callee) that will be examined for inlining has a corresponding InlineReportFunction in the map. (The map MCS is updated in a similar way, but only when a Function is actually inlined.)

The Inliner determines whether a CallSite should be inlined by first calling Inliner::ShouldInline(). This calls getInlineCost(), which returns an InlineCost that now includes the reason the call site should or should not be inlined. This reason, as well as the costs and threshold from the InlineCost, are stored in the InlineReportCallSite for the CallSite.

Then the Inliner calls the static function InlineCallIfPossible(). If the inlining was not performed, the reason for not inlining is recorded in the InlineReportCallSite corresponding to the CallSite. If the inlining was performed, the corresponding InlineReportCallSite is marked as inlined, and it is populated with the InlineReportCallSites corresponding to the newly exposed CallSites that were created during the inlining.

The InlineReport is printed during the call to Inliner::doFinalization().

Since the compiler can run any number of optimizations between two successive calls to runOnSCC(), the Instructions corresponding to CallSites can be deleted by those optimizations. Callbacks are used to mark the corresponding InlineReportCallSites as deleted when this happens.
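To make the relationships between the three classes concrete, here is a minimal standalone C++ sketch of the bookkeeping described above. It is not the proposed patch: LLVM types are replaced by opaque pointer keys, and the member functions (addFunction, addCallSite, recordDecision, markInlined, markDeleted, markDead) and the InlineReason values are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical reason codes: the RFC adds a reason to InlineCost but does
// not list its values; these mirror the messages in the sample report.
enum class InlineReason { Unknown, Profitable, NotProfitable, SingleBasicBlock,
  SingleCallsiteLocalLinkage, CalleeIsIntrinsic, CalleeIsNoReturn, CalleeIsNeverInline };

struct InlineReportCallSite {
  std::string CalleeName;
  bool IsInlined = false;
  bool IsDeleted = false; // call instruction erased by a later pass
  InlineReason Reason = InlineReason::Unknown;
  int Cost = 0, OuterCost = 0, Threshold = 0;
  std::vector<InlineReportCallSite *> Children; // exposed when this was inlined
};

struct InlineReportFunction {
  std::string Name;
  bool IsDead = false; // removed by dead static function elimination
  std::vector<InlineReportCallSite *> CallSites; // call sites before any inlining
};

// Opaque stand-ins (assumption) for llvm::Function* and the call
// instruction, so the sketch needs no LLVM headers.
using FuncKey = const void *;
using CallKey = const void *;

class InlineReport {
public:
  explicit InlineReport(unsigned Level) : Level(Level) {}
  unsigned getLevel() const { return Level; }

  // Called for each caller/callee before runOnSCC() inlines anything.
  InlineReportFunction *addFunction(FuncKey F, const std::string &Name) {
    InlineReportFunction *&Slot = MF[F];
    if (!Slot) {
      FuncStore.push_back(std::make_unique<InlineReportFunction>());
      Slot = FuncStore.back().get();
      Slot->Name = Name;
    }
    return Slot;
  }

  // Parent is null for a call written directly in Caller; otherwise it is
  // the inlined call site whose body exposed this one.
  InlineReportCallSite *addCallSite(CallKey CS, const std::string &Callee,
                                    InlineReportFunction *Caller,
                                    InlineReportCallSite *Parent = nullptr) {
    assert(!MCS.count(CS) && "call site registered twice");
    SiteStore.push_back(std::make_unique<InlineReportCallSite>());
    InlineReportCallSite *IRCS = SiteStore.back().get();
    IRCS->CalleeName = Callee;
    (Parent ? Parent->Children : Caller->CallSites).push_back(IRCS);
    MCS[CS] = IRCS;
    return IRCS;
  }

  // Store the verdict, costs and threshold returned by the cost query.
  void recordDecision(CallKey CS, InlineReason R, int Cost, int OuterCost,
                      int Threshold) {
    InlineReportCallSite *IRCS = MCS.at(CS);
    IRCS->Reason = R;
    IRCS->Cost = Cost;
    IRCS->OuterCost = OuterCost;
    IRCS->Threshold = Threshold;
  }

  // The call instruction is gone after inlining, so drop its key; the
  // returned node becomes the Parent of the newly exposed call sites.
  InlineReportCallSite *markInlined(CallKey CS) {
    InlineReportCallSite *IRCS = MCS.at(CS);
    IRCS->IsInlined = true;
    MCS.erase(CS);
    return IRCS;
  }

  // Target of the deletion callback when another pass erases a call.
  void markDeleted(CallKey CS) {
    auto It = MCS.find(CS);
    if (It == MCS.end())
      return;
    It->second->IsDeleted = true;
    MCS.erase(It);
  }

  void markDead(FuncKey F) { MF.at(F)->IsDead = true; }

private:
  unsigned Level; // bit vector of report features
  std::map<FuncKey, InlineReportFunction *> MF;
  std::map<CallKey, InlineReportCallSite *> MCS;
  std::vector<std::unique_ptr<InlineReportFunction>> FuncStore;
  std::vector<std::unique_ptr<InlineReportCallSite>> SiteStore;
};
```

In this sketch the report owns every node through unique_ptr stores, while the trees themselves hold the raw non-owning pointers the RFC describes, so erasing a key from MCS never invalidates a node that is already part of the printed hierarchy.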
Example

Here is an example of an abbreviated inlining report generated by my locally modified copy of the LLVM sources. I generated it by compiling the file bzip2.c from the SPEC CPU2006 benchmark 401.bzip2. (For the sake of brevity, I didn't include all of the report. Omitted parts are indicated by .... in the report.)

---- Begin Inlining Report ----
Option Values:
   inline-threshold: 225
   inlinehint-threshold: 325
   inlinecold-threshold: 225
   inlineoptsize-threshold: 15

COMPILE FUNC: fopen_output_safely
   -> EXTERN: open
   -> EXTERN: fdopen
   -> EXTERN: close

DEAD STATIC FUNC: setExit

DEAD STATIC FUNC: copyFileName

DEAD STATIC FUNC: showFileNames

DEAD STATIC FUNC: stat

....

COMPILE FUNC: cleanUpAndFail
   -> llvm.lifetime.start
      [[Callee is intrinsic]]
   -> INLINE: stat (35<=487)
      <<Callee is single basic block>>
      -> EXTERN: __xstat
   -> EXTERN: fprintf
   -> EXTERN: fclose
   -> EXTERN: remove
   -> EXTERN: fprintf
   -> EXTERN: fprintf
   -> EXTERN: fprintf
   -> EXTERN: fprintf
   -> EXTERN: fprintf
   -> EXTERN: fprintf
   -> INLINE: setExit (15<=225)
      <<Inlining is profitable>>
   -> EXTERN: exit

....

COMPILE FUNC: outOfMemory
   -> EXTERN: fprintf
   -> INLINE: showFileNames (70<=225)
      <<Inlining is profitable>>
      -> EXTERN: fprintf
   -> cleanUpAndFail
      [[Callee is noreturn]]

....

COMPILE FUNC: snocString
   -> INLINE: mkCell (-14920<=225)
      <<Callee has single callsite and local linkage>>
      -> INLINE: myMalloc (70<=225)
         <<Inlining is profitable>>
         -> EXTERN: malloc
         -> outOfMemory
            [[Callee is noreturn]]
   -> EXTERN: strlen
   -> INLINE: myMalloc (-14925<=225)
      <<Callee has single callsite and local linkage>>
      -> EXTERN: malloc
      -> outOfMemory
         [[Callee is noreturn]]
   -> EXTERN: strcpy
   -> snocString
      [[Callee is never inline]]

....

---- End Inlining Report ------

Here is an explanation of some of the features:

(1) Option values

Option Values:
   inline-threshold: 225
   inlinehint-threshold: 325
   inlinecold-threshold: 225
   inlineoptsize-threshold: 15

The report begins with a list of the option values most relevant to inlining.

(2) Compiled and dead functions

COMPILE FUNC: fopen_output_safely
   -> EXTERN: open
   -> EXTERN: fdopen
   -> EXTERN: close

DEAD STATIC FUNC: setExit

Functions in the file are identified as either compiled or eliminated by dead static function elimination.

(3) External function calls

COMPILE FUNC: fopen_output_safely
   -> EXTERN: open
   -> EXTERN: fdopen
   -> EXTERN: close

Calls to externally defined functions are indicated by the word EXTERN. These lines can optionally be omitted.

(4) Inlining and nesting

COMPILE FUNC: snocString
   -> INLINE: mkCell (-14920<=225)
      <<Callee has single callsite and local linkage>>
      -> INLINE: myMalloc (70<=225)
         <<Inlining is profitable>>
         -> EXTERN: malloc

Inlined functions are marked INLINE. The inlining of a function within other inlined functions is shown in the report using indentation.

(5) Reasons functions were and were not inlined

COMPILE FUNC: cleanUpAndFail
   -> llvm.lifetime.start
      [[Callee is intrinsic]]
   -> INLINE: stat (35<=487)
      <<Callee is single basic block>>
      -> EXTERN: __xstat
   -> EXTERN: fprintf
   -> EXTERN: fclose
   -> EXTERN: remove
   -> EXTERN: fprintf
   -> EXTERN: fprintf
   -> EXTERN: fprintf
   -> EXTERN: fprintf
   -> EXTERN: fprintf
   -> EXTERN: fprintf
   -> INLINE: setExit (15<=225)
      <<Inlining is profitable>>
   -> EXTERN: exit

....

COMPILE FUNC: outOfMemory
   -> EXTERN: fprintf
   -> INLINE: showFileNames (70<=225)
      <<Inlining is profitable>>
      -> EXTERN: fprintf
   -> cleanUpAndFail
      [[Callee is noreturn]]

The principal reason a function was or was not inlined can optionally be displayed in the report. The reason a function was inlined is indicated in double angle brackets << >>. The reason a function was not inlined is indicated in double square brackets [[ ]]. When a comparison of the cost and threshold was used to determine whether the function should be inlined, the comparison is shown. (Since intrinsics are never inlined, information about them can be suppressed in the report.) The reasons for inlining or not inlining can optionally be displayed on the same line as the function considered for inlining, for easy analysis using grep, awk, etc.

(6) Line and column info

COMPILE FUNC: outOfMemory
   -> EXTERN: fprintf bzip2.c(1016,4)
   -> showFileNames bzip2.c(1019,4) [[Callee is never inline]]
   -> cleanUpAndFail bzip2.c(1020,4) [[Callee is never inline]]

Optionally, file, line, and column info can be provided for call sites if source position information is present (using -g or -gline-tables-only).

I would appreciate any comments you have on whether you support the inclusion of an inline report in LLVM, the form and features I have outlined above, and your thoughts on the high level design.

Thank you in advance for your comments,

Robert Cox
robert.cox at intel.com
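The output layout described in features (1) through (6) can be produced by a short recursive printer. The sketch below is standalone C++ with hypothetical types (Site, Func) and hypothetical level bits (IRShowReasons, IRShowExterns, IRShowIntrinsics, IRSameLine); it is not the printer from the proposed patch, and it omits line and column info. It shows how indentation follows nesting depth, how reasons are bracketed, and how the level bits select optional output.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical feature bits for the report "level" (-inline-report=X).
enum : unsigned {
  IRShowReasons    = 1u << 0, // print <<why inlined>> / [[why not]]
  IRShowExterns    = 1u << 1, // print calls to external functions
  IRShowIntrinsics = 1u << 2, // print calls to intrinsics
  IRSameLine       = 1u << 3, // put the reason on the call's own line
};

enum class Kind { Inlined, NotInlined, Extern, Intrinsic };

struct Site {
  std::string Callee;
  Kind K = Kind::NotInlined;
  std::string Reason;         // empty if no reason was recorded
  bool HasCost = false;       // was a cost vs. threshold comparison made?
  int Cost = 0, Threshold = 0;
  std::vector<Site> Children; // call sites exposed by inlining this one
};

struct Func {
  std::string Name;
  bool Dead = false;          // removed by dead static function elimination
  std::vector<Site> Sites;
};

static void printSite(std::ostream &OS, const Site &S, unsigned Level,
                      unsigned Depth) {
  assert((S.K == Kind::Inlined || S.Children.empty()) &&
         "only inlined call sites can expose new call sites");
  if (S.K == Kind::Extern && !(Level & IRShowExterns)) return;
  if (S.K == Kind::Intrinsic && !(Level & IRShowIntrinsics)) return;
  const std::string Indent(3 * (Depth + 1), ' '); // one step per nesting level
  OS << Indent << "-> ";
  if (S.K == Kind::Inlined) OS << "INLINE: ";
  if (S.K == Kind::Extern) OS << "EXTERN: ";
  OS << S.Callee;
  if (S.HasCost)
    OS << " (" << S.Cost << (S.Cost <= S.Threshold ? "<=" : ">") << S.Threshold << ")";
  if ((Level & IRShowReasons) && !S.Reason.empty()) {
    const bool Yes = S.K == Kind::Inlined;
    const std::string Text = (Yes ? "<<" : "[[") + S.Reason + (Yes ? ">>" : "]]");
    if (Level & IRSameLine) OS << " " << Text;
    else OS << "\n" << Indent << "   " << Text;
  }
  OS << "\n";
  for (const Site &C : S.Children) printSite(OS, C, Level, Depth + 1);
}

std::string printReport(const std::vector<Func> &Funcs, unsigned Level) {
  std::ostringstream OS;
  OS << "---- Begin Inlining Report ----\n";
  for (const Func &F : Funcs) {
    if (F.Dead) {
      OS << "DEAD STATIC FUNC: " << F.Name << "\n";
      continue;
    }
    OS << "COMPILE FUNC: " << F.Name << "\n";
    for (const Site &S : F.Sites) printSite(OS, S, Level, 0);
  }
  OS << "---- End Inlining Report ------\n";
  return OS.str();
}
```

With IRSameLine set, every call site and its verdict occupy one line, which is what makes the grep/awk style of analysis mentioned in feature (5) convenient; without it, the reason is printed on the following line as in the sample report.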
----- Original Message -----
> From: "Robert Cox via llvm-dev" <llvm-dev at lists.llvm.org>
> To: llvm-dev at lists.llvm.org
> Sent: Thursday, October 22, 2015 1:25:05 PM
> Subject: [llvm-dev] RFC: Inlining report
>
> RFC: Inlining Report
>
> [...]
>
> In addition, the inlining report can be used by compiler developers
> to visualize and improve the compiler's default heuristics and
> option values.

I agree, these can be extremely useful. Generically speaking, I would very much like to see Clang/LLVM grow the ability to provide optimization reports (including those where source lines are annotated with information on what was vectorized, eliminated, etc.).

A few comments:

1. Inlining is iterative. Thus, I assume that your report might include information from multiple inlining passes. Is that correct?

2. Inlining costs are target specific (because they use TTI costs), so it would be useful for the report to include the target architecture (as well as information on the LLVM version, the name of the input file, etc.).

3. And this is the big one: Where should the infrastructure for this live? One of my goals when defining the 'informational note' infrastructure in LLVM was to construct it such that the information was not just presentable to humans, but could also be consumed programmatically. This is why we designed it with a class hierarchy: so that the "messages" could be more than just messages. The rationale was that some of the information necessary for presenting useful feedback to humans is available only to the frontend. For C++ code, for example, you need to do symbol demangling, and the frontend is probably the best place to do that. The frontend also knows the proper place to write output files. In addition, specifically for inlining information, the frontend knows where functions are defined without the need for debug information.

My preference, therefore, is to make sure that the inliner generates sufficiently detailed messages, using a proper class hierarchy and carrying sufficient information. Then, in Clang, we can collect those messages, demangle function names, add source-location information, and produce a report.

Thoughts?

Thanks again,
Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
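Hal's third point amounts to separating data from presentation: the inliner emits structured remark objects, and a consumer such as the frontend demangles names and renders them. The following is a minimal standalone C++ sketch of that split. InlineRemark, InlinedRemark, MissedRemark, Demangler and renderRemark are hypothetical names, not LLVM's actual diagnostic classes, and the demangler is injected as a callback because a real frontend would plug in its own.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical remark hierarchy: the inliner fills in fields and never
// formats text, so any consumer can decide how to present the data.
class InlineRemark {
public:
  enum class Kind { Inlined, Missed };
  virtual ~InlineRemark() = default;
  Kind getKind() const { return K; }

  std::string CallerMangled, CalleeMangled; // raw symbol names
  std::string Reason;
  unsigned Line = 0, Column = 0;            // 0 means no location known

protected:
  explicit InlineRemark(Kind K) : K(K) {}

private:
  Kind K;
};

class InlinedRemark : public InlineRemark {
public:
  InlinedRemark() : InlineRemark(Kind::Inlined) {}
  int Cost = 0, Threshold = 0;
};

class MissedRemark : public InlineRemark {
public:
  MissedRemark() : InlineRemark(Kind::Missed) {}
};

using Demangler = std::function<std::string(const std::string &)>;

// Consumer side (e.g. in the frontend): demangle and format one remark.
std::string renderRemark(const InlineRemark &R, const Demangler &Demangle) {
  assert(!R.CallerMangled.empty() && "remark without a caller");
  std::string Out = Demangle(R.CallerMangled) + " -> ";
  if (R.getKind() == InlineRemark::Kind::Inlined) {
    const auto &I = static_cast<const InlinedRemark &>(R);
    Out += "INLINE: " + Demangle(R.CalleeMangled) + " (" +
           std::to_string(I.Cost) + "<=" + std::to_string(I.Threshold) +
           ") <<" + R.Reason + ">>";
  } else {
    Out += Demangle(R.CalleeMangled) + " [[" + R.Reason + "]]";
  }
  if (R.Line != 0)
    Out += " (" + std::to_string(R.Line) + "," + std::to_string(R.Column) + ")";
  return Out;
}
```

Because the inliner only fills in typed fields, the same remarks could feed a text report, a source annotator, or another tool, which is the property Hal describes for the informational note infrastructure.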
Robert, thanks for working on this. The feature is very useful. A couple of high level comments 1) The report should trim/prune calls to library functions by default 2) Each callsite should be annotated with the chain of inlining (context) that leads to the call. The order of callsites should also reflect the order they are exposed/handled. For instance, a -> b -> c -> d. If the inliner order (possible with iterative bottom up inlining) a -> b a -> c a -> d The report should look like: Caller : a --> b @line1 [Inlined, reason ...] --> c @line2:b at line1>> [Inlined, reason ...] --> d @line3:c at line2:b at line1 [...] @line specifies the callsite line number. Looking forward to your patch. thanks, David On Thu, Oct 22, 2015 at 11:25 AM, Cox, Robert via llvm-dev < llvm-dev at lists.llvm.org> wrote:> *RFC: Inlining Report * > > > > *Motivation * > > > > Making good inlining choices while optimizing an application is often key > to achieving optimal performance. While the compiler’s default inlining > heuristics sometimes provide great out-of-box results, optimal performance > is sometimes achieved only after varying the settings of certain compiler > options related to inlining or adding “always_inline” or “noinline” > attributes to certain functions. > > > > Before we can determine how we need change the compiler’s inlining choices > to get better performance for an application, we need to have a clear > picture of the compiler’s inlining choices and what motivated them. Many > compilers like LLVM and GCC provide *informational notes *when a function > is inlined, but these notes provide only a “blow by blow” description of > what the compiler did, rather than a high level illustration of the result. > This high level picture can be provided by an *inlining report. 
* > > > > Over the years, I’ve worked with several compilers that provide inlining > reports, and I can attest that the customers using those compilers have > found them to be invaluable tool in investigating and improving their > applications’ performance. In addition, the inlining report can be used > by compiler developers to visualize and improve the compiler’s default > heuristics and option values. > > > > For these reasons, I’d like to contribute code to LLVM to generate an > inlining report as part of the inliner. > > > > *Description * > > > > The inlining report I am proposing contains the following information: > > > > (1) The values of the principle threshold options which affect how > much inlining is done under various circumstances > > (2) Whether each function is compiled or has been eliminated by dead > static function elimination. > > (3) For each function, the call sites that were and were not inlined. > Since inlining a call site can expose other call sites for inlining, the > inlining report also reports on whether these exposed call sites have been > inlined or not. This information is presented in hierarchical manner. > > (4) For each call site, we include the principle reason the call site > was or was not inlined, together with any cost vs . threshold computation > that was done. > > > > *High Level Design * > > > > The inline report is created if the option –inline-report=X is passed on > command line with a positive integer value of X. If X is 0, or this option > is not specified, the Inliner does not create or perform any operations on > the inline report, and there is no compile time overhead. 
> > > > Three main classes are used to implement the inline report: > > > > *class InlineReportCallSite * > > > > This class contains the inlining report information specific to a > particular CallSite CS, including: > > (1) A bool indicating whether or not the CallSite was or was not > inlined > > (2) An inlining reason indicating why the CallSite was or was not > inlined > > (3) The inlining cost, outer inlining cost, and threshold values used > in calculating the profitability of inlining > > (4) A vector of InlineReportCallSite*, each of which points to an > InlineReportCallSite for a CallSite exposed when CS was inlined. > > > > *class InlineReportFunction * > > > > This class contains the inlining report information specific to a > particular Function F in the call graph, including: > > (1) A bool indicating whether the function has been dead static > eliminated. > > (2) A vector of call InlineReportCallSite*, each of which points to an > InlineReportCallSite for a CallSite that appeared in F before any inlining > was applied. > > > > *class InlineReport * > > > > The main class which summarizes the high level information in the inline > report, including: > > (1) The values of the inlining threshold options > > (2) The “level” of the inlining report, which is a bit vector of > feature options. For example, whether to print external functions and > intrinsics, whether to print the inlining reasons, etc. > > (3) A map MF from each Function* to InlineReportFunction* > > (4) A map MCS from each CallSite* to InlineReportCallSite* > > > > In addition, the class InlineCost (from InlineCost.h) is augmented to > include the primary reason a call site was inlined. > > > > The class Inliner has been augmented with an InlineReport, which is > created when an Inliner is constructed. The InlineReport is updated using > calls to the member functions of these three classes in Inliner::runOnSCC() > and the functions called by it. 
> > > > Before any inlining is done in a particular call to runOnSCC(), the map MF > is updated so that each Function (caller or callee) that will be examined > for inlining has a corresponding InlineReportFunction in the map. (The map > MCS is also updated in a similar way, but only when a Function is actually > inlined.) > > > > The Inliner determines if a CallSite should be inlined by first calling > Inliner::ShouldInline(). This calls getInlineCost() which returns an > InlineCost, which now includes the reason the call site should or should > not be inlined. This reason, as well and costs and threshold from the > InlineCost are stored in the InlineReportCallSite for the CallSite. > > > > Then Inliner calls the static function InlineCallPossible(). If the > inlining was not performed, the reason for not inlining is recorded in the > InlineReportCallSite corresponding to the CallSite. If the inlining was > performed, the corresponding InlineReportCallSite is marked as inlined, and > it is populated with the InlineReportCallSites corresponding to the newly > exposed CallSites that were created during the inlining. > > > > The InlineReport is printed during the call to Inliner::doFinalization(). > > > > Since the compiler can run any number of optimizations between two > successive calls to runOnSCC(), the Instructions corresponding to CallSites > can be deleted by the optimizations. Callbacks are used to mark the > corresponding InlineReportCallSites as deleted when this happens. > > > > *Example * > > > > Here is an example of abbreviated inlining report that is generated in my > locally modified copy of the LLVM sources. I generated this by compiling > the file bzip2.c from the spec 2006 benchmark 401.bzip. (For the sake of > brevity, I didn’t include all of the report. Omitted parts are indicated > by …. in the report.) 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151022/83c674d3/attachment.html>
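The hierarchy the RFC describes, in which each InlineReportCallSite owns the records for the call sites exposed when it was inlined, is essentially a tree printed depth-first with one extra indentation step per level of inlining. The sketch below is a standalone approximation, not the patch's code. CallSiteNode, printCallSite and printFunction are hypothetical names. The indentation width of three spaces per level is a guess, because the archive collapsed the original whitespace. Reasons are printed on the same line, which is the RFC's grep-friendly option. The ">" form for a failed cost test is also an assumption, since the example report only shows "<=" comparisons.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical analogue of the RFC's InlineReportCallSite: one node per call
// site, owning the nodes for the call sites exposed when it was inlined.
struct CallSiteNode {
  std::string Callee;
  bool IsInlined = false;
  bool IsExtern = false;  // callee is only declared in this module
  bool HasCost = false;   // a cost-vs-threshold comparison was made
  int Cost = 0;
  int Threshold = 0;
  std::string Reason;     // principal reason; empty if not recorded
  std::vector<std::unique_ptr<CallSiteNode>> Exposed;
};

// Print one call site, then recurse into the call sites it exposed,
// indenting three spaces per nesting level (assumed width).
void printCallSite(std::ostream &OS, const CallSiteNode &N, int Depth) {
  OS << std::string(3 * Depth, ' ') << "-> ";
  if (N.IsInlined)
    OS << "INLINE: ";
  else if (N.IsExtern)
    OS << "EXTERN: ";
  OS << N.Callee;
  if (N.HasCost)
    OS << " (" << N.Cost << (N.Cost <= N.Threshold ? "<=" : ">")
       << N.Threshold << ")";
  // Reasons for inlining use << >>, reasons for not inlining use [[ ]].
  if (!N.Reason.empty())
    OS << (N.IsInlined ? " <<" : " [[") << N.Reason
       << (N.IsInlined ? ">>" : "]]");
  OS << '\n';
  for (const auto &Child : N.Exposed)
    printCallSite(OS, *Child, Depth + 1);
}

// Print a compiled function and the call sites that appeared in it before
// any inlining (the RFC's InlineReportFunction call-site vector).
std::string printFunction(const std::string &Name,
                          const std::vector<std::unique_ptr<CallSiteNode>> &Sites) {
  std::ostringstream OS;
  OS << "COMPILE FUNC: " << Name << '\n';
  for (const auto &Site : Sites)
    printCallSite(OS, *Site, 1);
  return OS.str();
}
```

Given a hand-built tree for snocString in which myMalloc is inlined with malloc exposed beneath it, printFunction emits the COMPILE FUNC line, then myMalloc, then malloc indented one level further. A call site that was not inlined, such as outOfMemory, appears at the outer level with its reason in [[ ]].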
This would be really nice to have. There will be lots of details to work out (Hal already raised one of the more important ones), but I'd be very happy to see LLVM grow in this direction.

Thanks for working on this.

Philip

On 10/22/2015 11:25 AM, Cox, Robert via llvm-dev wrote:
> RFC: Inlining Report
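The RFC notes that optimizations run between calls to runOnSCC() can delete the instructions behind recorded call sites, and that callbacks mark the matching InlineReportCallSites as deleted. Below is a minimal standalone sketch of that bookkeeping. It assumes the MCS map is keyed by instruction address. ReportCallSite, CallSiteMap and FakeCall are hypothetical names. Inside LLVM, a CallbackVH, whose virtual deleted() method runs when the tracked Value is destroyed, is one plausible source of the notification. Here FakeCall's destructor stands in for it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Minimal stand-in for the RFC's InlineReportCallSite (only the fields
// needed here).
struct ReportCallSite {
  std::string Callee;
  bool Deleted = false;
};

// Hypothetical analogue of the RFC's MCS map, keyed by the address of the
// call instruction.
class CallSiteMap {
  std::map<const void *, ReportCallSite *> MCS;

public:
  void track(const void *Inst, ReportCallSite *R) { MCS[Inst] = R; }

  ReportCallSite *lookup(const void *Inst) const {
    auto It = MCS.find(Inst);
    return It == MCS.end() ? nullptr : It->second;
  }

  // Deletion hook: keep the record (it is still part of the report tree)
  // but mark it deleted, and drop the key for the dead instruction.
  void onInstructionDeleted(const void *Inst) {
    auto It = MCS.find(Inst);
    if (It == MCS.end())
      return;
    It->second->Deleted = true;
    MCS.erase(It);
  }

  std::size_t size() const { return MCS.size(); }
};

// Hypothetical stand-in for a call instruction that notifies a listener
// when it is destroyed (the role a CallbackVH's deleted() could play).
struct FakeCall {
  std::function<void(const void *)> OnDelete;
  ~FakeCall() {
    if (OnDelete)
      OnDelete(this);
  }
};
```

Erasing the key, rather than only setting the flag, matters because the allocator may place a new call instruction at the same address. A stale entry would then silently attach that new call to the old record.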
On Thu, Oct 22, 2015 at 12:32 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> ------------------------------
> From: "Robert via llvm-dev Cox" <llvm-dev at lists.llvm.org>
> To: llvm-dev at lists.llvm.org
> Sent: Thursday, October 22, 2015 1:25:05 PM
> Subject: [llvm-dev] RFC: Inlining report
>
> > Over the years, I've worked with several compilers that provide inlining reports, and I can attest that the customers using those compilers have found them to be an invaluable tool in investigating and improving their applications' performance. In addition, the inlining report can be used by compiler developers to visualize and improve the compiler's default heuristics and option values.
>
> I agree, these can be extremely useful. Generically speaking, I would very much like to see Clang/LLVM grow the ability to provide optimization reports (including those where source lines are annotated with information on what was vectorized, eliminated, etc.).
>
> A few comments:
>
> 1. Inlining is iterative. Thus, I assume that your report might include information from multiple inlining passes. Is that correct?
>
> 2. Inlining costs are target specific (because it uses TTI costs), so it would be useful for the report to include the target architecture (as well as information on the LLVM version, name of the input file, etc.).
>
> 3. And this is the big one: Where should the infrastructure for this live?
>
> One of my goals when defining the 'informational note' infrastructure in LLVM was to construct it such that the information was not just presentable to humans, but also so that it could be programmatically consumed. This is why we designed it with a class hierarchy: so that the "messages" could be more than just messages. The rationale was that there is information, necessary for presenting useful feedback to humans, that only the frontend has. For C++ code, for example, you need to do symbol demangling. The frontend is probably the best place to do that. The frontend also knows the proper place to write output files. In addition, specifically for inlining information, the frontend knows where functions are defined without the need for debug information.

It would be nice to be able to produce the report without debug information, but I'm not sure how important that requirement is: the optimized build is usually done with some level of debug info. Debug info is also enabled with the -Rpass option, so the inline report (or, more generally, optimization report) option can do the same here.

thanks,

David

> My preference, therefore, is to make sure that the inliner generates sufficiently detailed messages using a proper class hierarchy and sufficient information. Then, in Clang, we can collect those messages, demangle function names and add source-location information, and produce a report.
>
> Thoughts?
>
> Thanks again,
> Hal
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
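Hal's suggestion can be sketched as follows. The optimizer emits a note object that carries the raw decision (callee, cost, threshold, reason), and the consumer, such as the frontend, decides how to present it, including demangling. This is a hypothetical, dependency-free illustration. OptNote, InlineDecisionNote, NoteCollector and the Demangler callback are invented names and are not LLVM's DiagnosticInfo classes. The line format mirrors the RFC's same-line report style. The ">" form for a failed cost test is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Consumer-supplied name formatter (e.g. a demangler in the frontend).
using Demangler = std::function<std::string(const std::string &)>;

// Hypothetical base class for a structured optimization note: it carries
// data and lets the consumer render it, instead of being a finished string.
class OptNote {
public:
  virtual ~OptNote() = default;
  virtual std::string render(const Demangler &Demangle) const = 0;
};

// Hypothetical note for one inlining decision. The callee name stays
// mangled; only the consumer decides how to present it.
class InlineDecisionNote : public OptNote {
public:
  std::string Callee;
  bool Inlined = false;
  bool HasCost = false;  // a cost-vs-threshold comparison was made
  int Cost = 0;
  int Threshold = 0;
  std::string Reason;

  std::string render(const Demangler &Demangle) const override {
    std::string Line = Inlined ? "-> INLINE: " : "-> ";
    Line += Demangle(Callee);
    if (HasCost)
      Line += " (" + std::to_string(Cost) + (Cost <= Threshold ? "<=" : ">") +
              std::to_string(Threshold) + ")";
    if (!Reason.empty())
      Line += Inlined ? " <<" + Reason + ">>" : " [[" + Reason + "]]";
    return Line;
  }
};

// Consumer side: collect notes as they arrive, format them all at the end.
class NoteCollector {
  std::vector<std::unique_ptr<OptNote>> Notes;

public:
  void handle(std::unique_ptr<OptNote> N) { Notes.push_back(std::move(N)); }

  std::string report(const Demangler &Demangle) const {
    std::string Out;
    for (const auto &N : Notes)
      Out += N->render(Demangle) + "\n";
    return Out;
  }
};
```

With a toy demangler that maps _Z7setExiti to setExit(int), a collected note for an inlined setExit renders as "-> INLINE: setExit(int) (15<=225) <<Inlining is profitable>>". Because the note keeps the mangled name, another consumer could print it unchanged or serialize the fields for a tool.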
I've worked on something like this in the past, and I found it very useful. The user-facing aspect is nice, but I found the real value was in creating a human-editable, machine-readable report. I then updated the inliner so it could read the report back in as a "replay script". This enabled a bunch of new capabilities:

* By editing the script via tools, we could bisect bugs based on inlines, which is super useful for whittling down the huge inline trees produced by aggressive inlining.
* By collating across failure instances, we could produce failure-reason histograms, useful for prioritizing work on removing limitations.
* By cross-referencing decisions against runtime data (say, dynamic call frequency), we could get various views of the effectiveness of inlining.
* By editing the script by hand or with a tool, we could run easy what-if experiments with different inlining strategies.
* By hacking other compilers to emit scripts, I could see what would happen if my compiler emulated the other compiler's inlining strategy.

There are tricky aspects to the replay, but in my experience something like this is very worthwhile. FWIW, I am considering implementing something similar for LLILC.

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Cox, Robert via llvm-dev
Sent: Thursday, October 22, 2015 11:25 AM
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] RFC: Inlining report
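A minimal sketch of the replay-script idea from the reply above follows. The script format is entirely hypothetical: each line reads "<caller> <callee> <line:col> INLINE|NOINLINE", and blank lines and '#' comments are allowed so the script stays human-editable. An inliner would consult ReplayOracle::decide() before its own heuristics. keepFirstInlines() shows how the bisection use could work: binary-search on K until the first K inlines reproduce a bug and the first K-1 do not. All names are illustrative, not an existing LLVM interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical replay-script entry: one decision for one call site.
struct ReplayEntry {
  std::string Caller, Callee, Loc;  // Loc is "line:col", e.g. "1019:4"
  bool Inline = false;
};

// Parse "<caller> <callee> <line:col> INLINE|NOINLINE" lines. Blank lines
// and '#' comments are ignored; malformed lines are skipped (a real tool
// should warn about them).
std::vector<ReplayEntry> parseReplay(const std::string &Text) {
  std::vector<ReplayEntry> Entries;
  std::istringstream In(Text);
  std::string Line;
  while (std::getline(In, Line)) {
    if (Line.empty() || Line[0] == '#')
      continue;
    std::istringstream Fields(Line);
    ReplayEntry E;
    std::string Verdict;
    if (!(Fields >> E.Caller >> E.Callee >> E.Loc >> Verdict))
      continue;
    if (Verdict != "INLINE" && Verdict != "NOINLINE")
      continue;
    E.Inline = (Verdict == "INLINE");
    Entries.push_back(E);
  }
  return Entries;
}

// Lookup for a (hypothetical) inliner hook: a forced decision if the script
// mentions the call site, std::nullopt to fall back to the heuristics.
class ReplayOracle {
  std::map<std::tuple<std::string, std::string, std::string>, bool> Decisions;

public:
  explicit ReplayOracle(const std::vector<ReplayEntry> &Entries) {
    for (const auto &E : Entries)
      Decisions[std::make_tuple(E.Caller, E.Callee, E.Loc)] = E.Inline;
  }

  std::optional<bool> decide(const std::string &Caller,
                             const std::string &Callee,
                             const std::string &Loc) const {
    auto It = Decisions.find(std::make_tuple(Caller, Callee, Loc));
    if (It == Decisions.end())
      return std::nullopt;
    return It->second;
  }
};

// Bisection helper: keep the first K INLINE decisions (in script order) and
// turn every later INLINE into NOINLINE.
std::vector<ReplayEntry> keepFirstInlines(std::vector<ReplayEntry> Entries,
                                          std::size_t K) {
  std::size_t Seen = 0;  // INLINE entries encountered so far
  for (auto &E : Entries)
    if (E.Inline && Seen++ >= K)
      E.Inline = false;
  return Entries;
}
```

Keying on caller, callee, and source position rather than on IR pointers is what lets a script survive across compiler runs. It assumes line and column info is available (the RFC mentions -g or -gline-tables-only) and that a position identifies a single call site. Inlined copies of the same call can break that assumption, which is one of the tricky aspects of replay.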
Many compilers like LLVM and GCC provide informational notes when a function is inlined, but these notes provide only a "blow by blow" description of what the compiler did, rather than a high level illustration of the result. This high level picture can be provided by an inlining report. Over the years, I've worked with several compilers that provide inlining reports, and I can attest that the customers using those compilers have found them to be invaluable tool in investigating and improving their applications' performance. In addition, the inlining report can be used by compiler developers to visualize and improve the compiler's default heuristics and option values. For these reasons, I'd like to contribute code to LLVM to generate an inlining report as part of the inliner. Description The inlining report I am proposing contains the following information: (1) The values of the principle threshold options which affect how much inlining is done under various circumstances (2) Whether each function is compiled or has been eliminated by dead static function elimination. (3) For each function, the call sites that were and were not inlined. Since inlining a call site can expose other call sites for inlining, the inlining report also reports on whether these exposed call sites have been inlined or not. This information is presented in hierarchical manner. (4) For each call site, we include the principle reason the call site was or was not inlined, together with any cost vs . threshold computation that was done. High Level Design The inline report is created if the option -inline-report=X is passed on command line with a positive integer value of X. If X is 0, or this option is not specified, the Inliner does not create or perform any operations on the inline report, and there is no compile time overhead. 
Three main classes are used to implement the inline report: class InlineReportCallSite This class contains the inlining report information specific to a particular CallSite CS, including: (1) A bool indicating whether or not the CallSite was or was not inlined (2) An inlining reason indicating why the CallSite was or was not inlined (3) The inlining cost, outer inlining cost, and threshold values used in calculating the profitability of inlining (4) A vector of InlineReportCallSite*, each of which points to an InlineReportCallSite for a CallSite exposed when CS was inlined. class InlineReportFunction This class contains the inlining report information specific to a particular Function F in the call graph, including: (1) A bool indicating whether the function has been dead static eliminated. (2) A vector of call InlineReportCallSite*, each of which points to an InlineReportCallSite for a CallSite that appeared in F before any inlining was applied. class InlineReport The main class which summarizes the high level information in the inline report, including: (1) The values of the inlining threshold options (2) The "level" of the inlining report, which is a bit vector of feature options. For example, whether to print external functions and intrinsics, whether to print the inlining reasons, etc. (3) A map MF from each Function* to InlineReportFunction* (4) A map MCS from each CallSite* to InlineReportCallSite* In addition, the class InlineCost (from InlineCost.h) is augmented to include the primary reason a call site was inlined. The class Inliner has been augmented with an InlineReport, which is created when an Inliner is constructed. The InlineReport is updated using calls to the member functions of these three classes in Inliner::runOnSCC() and the functions called by it. 
Before any inlining is done in a particular call to runOnSCC(), the map MF is updated so that each Function (caller or callee) that will be examined for inlining has a corresponding InlineReportFunction in the map. (The map MCS is also updated in a similar way, but only when a Function is actually inlined.) The Inliner determines if a CallSite should be inlined by first calling Inliner::ShouldInline(). This calls getInlineCost() which returns an InlineCost, which now includes the reason the call site should or should not be inlined. This reason, as well and costs and threshold from the InlineCost are stored in the InlineReportCallSite for the CallSite. Then Inliner calls the static function InlineCallPossible(). If the inlining was not performed, the reason for not inlining is recorded in the InlineReportCallSite corresponding to the CallSite. If the inlining was performed, the corresponding InlineReportCallSite is marked as inlined, and it is populated with the InlineReportCallSites corresponding to the newly exposed CallSites that were created during the inlining. The InlineReport is printed during the call to Inliner::doFinalization(). Since the compiler can run any number of optimizations between two successive calls to runOnSCC(), the Instructions corresponding to CallSites can be deleted by the optimizations. Callbacks are used to mark the corresponding InlineReportCallSites as deleted when this happens. Example Here is an example of abbreviated inlining report that is generated in my locally modified copy of the LLVM sources. I generated this by compiling the file bzip2.c from the spec 2006 benchmark 401.bzip. (For the sake of brevity, I didn't include all of the report. Omitted parts are indicated by .... in the report.) 
---- Begin Inlining Report ---- Option Values: inline-threshold: 225 inlinehint-threshold: 325 inlinecold-threshold: 225 inlineoptsize-threshold: 15 COMPILE FUNC: fopen_output_safely -> EXTERN: open -> EXTERN: fdopen -> EXTERN: close DEAD STATIC FUNC: setExit DEAD STATIC FUNC: copyFileName DEAD STATIC FUNC: showFileNames DEAD STATIC FUNC: stat .... COMPILE FUNC: cleanUpAndFail -> llvm.lifetime.start [[Callee is intrinsic]] -> INLINE: stat (35<=487) <<Callee is single basic block>> -> EXTERN: __xstat -> EXTERN: fprintf -> EXTERN: fclose -> EXTERN: remove -> EXTERN: fprintf -> EXTERN: fprintf -> EXTERN: fprintf -> EXTERN: fprintf -> EXTERN: fprintf -> EXTERN: fprintf -> INLINE: setExit (15<=225) <<Inlining is profitable>> -> EXTERN: exit .... COMPILE FUNC: outOfMemory -> EXTERN: fprintf -> INLINE: showFileNames (70<=225) <<Inlining is profitable>> -> EXTERN: fprintf -> cleanUpAndFail [[Callee is noreturn]] .... COMPILE FUNC: snocString -> INLINE: mkCell (-14920<=225) <<Callee has single callsite and local linkage>> -> INLINE: myMalloc (70<=225) <<Inlining is profitable>> -> EXTERN: malloc -> outOfMemory [[Callee is noreturn]] -> EXTERN: strlen -> INLINE: myMalloc (-14925<=225) <<Callee has single callsite and local linkage>> -> EXTERN: malloc -> outOfMemory [[Callee is noreturn]] -> EXTERN: strcpy -> snocString [[Callee is never inline]] ..... ---- End Inlining Report ------ Here is an explanation of some of the features: (1) Option values Option Values: inline-threshold: 225 inlinehint-threshold: 325 inlinecold-threshold: 225 inlineoptsize-threshold: 15 The report begins with a list of the most relevant option values to inlining. (2) Compiled and dead functions COMPILE FUNC: fopen_output_safely -> EXTERN: open -> EXTERN: fdopen -> EXTERN: close DEAD STATIC FUNC: setExit Functions in the file are identified as either being compiled or eliminated by dead static function elimination. 
(3) External function calls COMPILE FUNC: fopen_output_safely -> EXTERN: open -> EXTERN: fdopen -> EXTERN: close Calls to externally defined functions are indicated by the word EXTERN. These lines can optionally be omitted. (4) Inlining and nesting COMPILE FUNC: snocString -> INLINE: mkCell (-14920<=225) <<Callee has single callsite and local linkage>> -> INLINE: myMalloc (70<=225) <<Inlining is profitable>> -> EXTERN: malloc Inlined functions are marked INLINE. The inlining of a function within other inlined functions is shown clearly in the report using indentation. (5) Reasons functions were and were not inlined COMPILE FUNC: cleanUpAndFail -> llvm.lifetime.start [[Callee is intrinsic]] -> INLINE: stat (35<=487) <<Callee is single basic block>> -> EXTERN: __xstat -> EXTERN: fprintf -> EXTERN: fclose -> EXTERN: remove -> EXTERN: fprintf -> EXTERN: fprintf -> EXTERN: fprintf -> EXTERN: fprintf -> EXTERN: fprintf -> EXTERN: fprintf -> INLINE: setExit (15<=225) <<Inlining is profitable>> -> EXTERN: exit .... COMPILE FUNC: outOfMemory -> EXTERN: fprintf -> INLINE: showFileNames (70<=225) <<Inlining is profitable>> -> EXTERN: fprintf -> cleanUpAndFail [[Callee is noreturn]] The principal reason a function was or was not inlined can be optionally displayed in the report. The reason a function was inlined is indicated in double angle brackets << >>. The reason a function was not inlined is indicated in double square brackets [[ ]]. When a comparison of the cost and threshold was used to determine if the function should be inlined, the comparison done is given. (Since intrinsics are never inlined, information about them can be suppressed in the report.) The reasons for or for not inlining can optionally be displayed on the same line as the function considered for inlining for easy analysis using grep, awk, etc. 
(6) Line and column info COMPILE FUNC: outOfMemory -> EXTERN: fprintf bzip2.c(1016,4) -> showFileNames bzip2.c(1019,4) [[Callee is never inline]] -> cleanUpAndFail bzip2.c(1020,4) [[Callee is never inline]] Optionally, file, line, and column info can be provided for call sites if source position information is present (using -g or -gline-tables-only). I would appreciate any comments you have on whether you support the inclusion of an inline report in LLVM, the form and features I have outlined above, and your thoughts on the high level design. Thank you in advance for your comments, Robert Cox robert.cox at intel.com<mailto:robert.cox at intel.com> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151026/e96049d7/attachment.html>
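The lines in the RFC's example report already follow a regular shape. The forms are `-> INLINE: name (cost<=threshold) <<reason>>`, `-> name [[reason]]` and `-> EXTERN: name`, with an optional `file(line,col)` location. That regularity makes the report usable as machine-readable data, which is what the failure-reason histograms mentioned above need.

Below is a minimal Python sketch, assuming exactly that line shape and assuming that nesting is expressed by leading indentation, as the RFC states. The regular expression, the `CallSiteRecord` type, the helper names, and the `(cost>threshold)` form for rejected call sites are all hypothetical. None of them is part of the proposed patch.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical pattern for one call-site line, modeled on the RFC's example.
# The "(cost>threshold)" form is an assumption; the example only shows "<=".
LINE_RE = re.compile(
    r"^(?P<indent>\s*)->\s+"
    r"(?:(?P<kind>INLINE|EXTERN):\s+)?"
    r"(?P<callee>\S+)"
    r"(?:\s+(?P<loc>\S+\(\d+,\d+\)))?"
    r"(?:\s+\((?P<cost>-?\d+)(?:<=|>)(?P<threshold>-?\d+)\))?"
    r"(?:\s+<<(?P<yes>[^>]*)>>|\s+\[\[(?P<no>[^\]]*)\]\])?"
)


@dataclass
class CallSiteRecord:
    depth: int               # width of the leading indentation (nesting level)
    callee: str
    inlined: bool
    external: bool
    location: Optional[str]  # e.g. "bzip2.c(1019,4)" when debug info is present
    cost: Optional[int]
    threshold: Optional[int]
    reason: Optional[str]


def parse_report(lines: Iterable[str]) -> List[CallSiteRecord]:
    """Turn call-site lines into records; all other lines are skipped."""
    records = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # option values, COMPILE FUNC / DEAD STATIC FUNC lines, etc.
        cost, threshold = m.group("cost"), m.group("threshold")
        records.append(CallSiteRecord(
            depth=len(m.group("indent")),
            callee=m.group("callee"),
            inlined=m.group("kind") == "INLINE",
            external=m.group("kind") == "EXTERN",
            location=m.group("loc"),
            cost=int(cost) if cost is not None else None,
            threshold=int(threshold) if threshold is not None else None,
            reason=m.group("yes") or m.group("no"),
        ))
    return records


def not_inlined_histogram(records: Iterable[CallSiteRecord]) -> Counter:
    """Count the reasons given for call sites that were considered but not inlined."""
    return Counter(r.reason for r in records
                   if not r.inlined and not r.external and r.reason)


# Abbreviated excerpt of the RFC's example. The indentation is reconstructed
# for illustration, since the archived report lost its original layout.
SAMPLE_REPORT = """\
COMPILE FUNC: outOfMemory
   -> EXTERN: fprintf
   -> INLINE: showFileNames (70<=225) <<Inlining is profitable>>
      -> EXTERN: fprintf
   -> cleanUpAndFail [[Callee is noreturn]]
COMPILE FUNC: snocString
   -> INLINE: mkCell (-14920<=225) <<Callee has single callsite and local linkage>>
      -> INLINE: myMalloc (70<=225) <<Inlining is profitable>>
         -> EXTERN: malloc
         -> outOfMemory [[Callee is noreturn]]
   -> snocString [[Callee is never inline]]
""".splitlines()

if __name__ == "__main__":
    for reason, count in not_inlined_histogram(parse_report(SAMPLE_REPORT)).most_common():
        print(f"{count:4d}  {reason}")
```

Collating histograms like this across many compilations is one way to see which limitations block inlining most often. Whether the real report's format would stay stable enough to parse this way is an open design question.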
This is certainly very useful! My comments are related to determining the inline reason.

> -> INLINE: stat (35<=487) <<Callee is single basic block>>

In the above example, the callee would have been inlined even if the single BB bonus had not been applied to the threshold (since 35 <= 225). The fact that it is a single BB didn't tip the scales. If the line for stat had read, say, (300<=487), then the fact that it is a single BB would be crucial. Perhaps there could be a way to specify a primary reason and several secondary reasons?

> -> INLINE: setExit (15<=225) <<Inlining is profitable>>

I think it is more useful to know why it is profitable. At a minimum, it would be useful to differentiate between two cases:

* a callee that is so small that it will get inlined at any call site;
* a callee that is profitable to inline at this call site due to instruction simplification.

Thanks,
Easwaran
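The primary/secondary distinction suggested above can be made precise. A threshold bonus is the primary reason for inlining only if removing that bonus alone would flip the decision.

Below is a Python sketch under several stated assumptions, none of which describes how LLVM's `InlineCost` analysis is actually structured:

* The effective threshold is assumed to be a base threshold plus independent, additive, named bonuses.
* The 262-point "single basic block" bonus is hypothetical. It is chosen only so that 225 + 262 = 487 matches the quoted `stat` line.
* `standalone_cost`, used to separate "small everywhere" from "profitable at this call site", is a hypothetical input.

```python
from typing import Dict, List, Tuple


def classify_reasons(cost: int, base_threshold: int,
                     bonuses: Dict[str, int]) -> Tuple[str, List[str], List[str]]:
    """Return (primary_reason, decisive_bonuses, secondary_bonuses).

    Assumes (hypothetically) that the effective threshold is the base
    threshold plus independent, additive named bonuses, and that a call
    site is inlined when cost <= effective threshold.
    """
    total = base_threshold + sum(bonuses.values())
    if cost > total:
        return ("cost exceeds threshold", [], list(bonuses))
    if cost <= base_threshold:
        # Inlined even without any bonus, so no bonus tipped the scales.
        return ("cost within base threshold", [], list(bonuses))
    # A bonus is decisive if dropping it alone would flip the decision.
    decisive = [name for name, bonus in bonuses.items() if cost > total - bonus]
    secondary = [name for name in bonuses if name not in decisive]
    # If no single bonus was necessary, only their combination was.
    primary = decisive[0] if decisive else "combined threshold bonuses"
    return (primary, decisive, secondary)


def profitability_kind(standalone_cost: int, callsite_cost: int, threshold: int) -> str:
    """Distinguish 'small everywhere' from 'profitable at this call site'.

    standalone_cost: hypothetical callee cost with no call-site-specific
    simplification. callsite_cost: cost after simplifications enabled by
    this call site (e.g. constant arguments folding branches away).
    """
    if standalone_cost <= threshold:
        return "callee small enough to inline at any call site"
    if callsite_cost <= threshold:
        return "profitable due to call-site simplification"
    return "not profitable"


if __name__ == "__main__":
    sbb = {"single basic block": 262}  # hypothetical: 225 + 262 = 487
    print(classify_reasons(35, 225, sbb))   # the stat line as reported
    print(classify_reasons(300, 225, sbb))  # the (300<=487) variant above
```

With `cost=35`, the sketch reports the bonus as merely secondary. With `cost=300`, it reports the bonus as decisive, matching the reasoning in the message above.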
What is the current status of the proposal? I haven’t seen any further discussion/changes. Are there any plans to move forward?

Artur

On 22 Oct 2015, at 21:25, Cox, Robert via llvm-dev <llvm-dev at lists.llvm.org> wrote:
Hi Artur,

Sorry for the delay. I was off working on other projects until a few weeks ago. I met with Chandler Carruth, Phil Reames, and Hal Finkel to discuss this last week. I had prepared a patch, but there was a strong preference that I break the patch up into smaller pieces and then resubmit it to the list via Phabricator. I am in the process of doing that now.

Robert Cox

From: Artur Pilipenko [mailto:apilipenko at azulsystems.com]
Sent: Wednesday, April 13, 2016 8:39 AM
To: Cox, Robert
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] RFC: Inlining report