Teresa Johnson via llvm-dev
2016-Apr-14 14:08 UTC
[llvm-dev] [ThinLTO] RFC: ThinLTO distributed backend interface
Hi all,

Below is a proposal for refining the way we communicate between the ThinLTO link step (the combined indexing step) and the backend processes that do the actual importing and other summary-based optimizations in a distributed backend process.

Mehdi, let me know if this addresses your concerns.

Peter, PTAL from the standpoint of any summary extensions needed for CFI and make sure they can fit into this model.

Thanks,
Teresa

Background
----------------

Recent patch D18945/r266125 ([ThinLTO] Only compute imports for current module in FunctionImport pass) triggered a discussion (mostly over IRC) on how best to determine import/export decisions in distributed back end compiles.

Import and export decisions are made by traversing the combined index. The actual importing happens in the FunctionImporter class, which is passed the set of values to import. The importer class is either invoked directly on each backend compile, which happens in the threads launched in the libLTO path, or via the FunctionImportPass.

The pass is currently used by the opt tool, by the gold-plugin when it launches ThinLTO threads for single machine parallelism, and via clang when invoked with a bitcode input file and the -fthinlto-index= option. The latter was added in r254927 to enable launching a ThinLTO backend compile in a separate distributed build process.

Before r266125, the FunctionImportPass was walking the entire index, but ignoring the import results for all but the current module, and not using the exports list. The reason to do the full index walk is that eventually we can minimize the required static promotions in the current module (based on whether its defined values are imported elsewhere). However, this was costing a lot of compile time in each backend thread. On the other hand, Mehdi would like to use the pass for testing via the opt tool, and planned to eventually add the support for using the computed export lists to guide promotion.
Therefore, the other invocations (in the gold-plugin and from clang for the distributed back ends) will need to either invoke the FunctionImporter directly (as in libLTO), passing in the import/export information, or use a new pass interface that consumes the necessary info to compute this information.

Eventually the import/export decisions should be made a single time in the thin link step (as is currently done for libLTO which doesn’t use the pass), along with any other global summary-based decisions. The advantage is that each backend isn’t doing redundant computation, and I believe it is safer to make global decisions affecting correctness (e.g. promotion) a single time. For the gold-plugin launched threads, it should be straightforward to use the libLTO approach of computing these decisions and passing the relevant information to each backend thread via a direct invocation of the FunctionImporter, instead of using the FunctionImportPass.

However, for distributed build backends, if the decisions are to be made a single time in the thin link step, summary based decisions need to be serialized out in order to be used by the FunctionImporter in each backend process (which could be invoked directly from clang, or via a possibly modified FunctionImportPass interface). My original plan was to mark linkage changes determined globally (such as promotion decisions) in the combined index itself for consumption in each back end. But an advantage of serializing out just the necessary info for each module is that the entire combined index wouldn’t need to be staged to each distributed build node.

Individual Module Index Files
---------------------------------------

Rather than define a new format for serializing out the globally determined information from the thin link step, we can continue to use the combined index file format. However, we can create an individual “combined” index file for each module.
This better enables passing along any summary information useful for backend compilations beyond just import and export lists, which can include other linkage optimizations, and information for transformations such as CFI. It also enables leveraging much of the existing combined index bitcode interfaces and data structures.

An overview of what is included in an individual “primary” module’s index file:
1) Module symbol table only includes modules imported into the primary module.
2) Summary section only includes summaries for value definitions that should be imported, as well as for definitions in the primary module.
3) Any desired linkage changes for both the primary module and imported defs are recorded in the summary entry linkage fields.

Note that 1 and 2 ensure that nothing can be imported beyond those values marked promoted during the global thin link (important since that possibly requires promotion in the exporting module). Any value that is imported as a declaration (because it did not have a summary entry as per 2 above), and that has local linkage, should automatically be promoted when importing (its primary module’s index would include a summary with the promoted linkage recorded).

Linkage Changes
-----------------------

As described above, the linkage changes determined by the global index walk in the thin link step will be marked in the summary entries (in all individual index files containing that symbol). The back end will compare the linkage types in the index to those in the materialized bitcode (both in the primary module and in any definitions being imported) and make the necessary adjustments.

Some possibilities include:

A. Promotion: Index will indicate external linkage, so local value will be promoted and renamed. For imported declarations, any that are local will be promoted.

B. Avoiding promotion by forced import: Used when the thin link step determines it is better to force an import of a static definition and leave it static.
The index will indicate local linkage, so linkage type in IR will not be changed when it is imported (or when compiling the exporting module).

C. Internalization by forced import: If an external symbol has 1 or only a very small number of external references, and all referring modules decide to import that definition, the thin link analysis could decide that it is better to leave all copies local. The index would indicate local linkage, and the linkage type in the IR would then be changed to local when it is imported (and when compiling the exporting module).

D. LinkOnce -> Weak/AvailableExternally: This is a compile time optimization to avoid unnecessarily keeping multiple copies of a LinkOnce value. Linkage is marked in index, and again adjusted in the backends since it will be different than the initial linkage after parsing.

Note that pcc has made a proposal to do some of the ThinLTO promotion and renaming up front in the compile step, so that some functions can be eagerly compiled into text (see http://lists.llvm.org/pipermail/llvm-dev/2016-April/098081.html). However, that will only apply to locals referenced by functions that are deemed unlikely to import or be exported. The remaining locals can still be promoted lazily.

Importing Strategy
------------------------

Strategy 1: Import exactly those defs for which we have summaries

Could use simplified/reduced summaries that strip the ref/call edges, since they won’t be used by the backends.

Strategy 2: Allow the importer some flexibility to modify import decisions

In case we find situations where it is better to let the importer adjust decisions based on full information (not yet known whether we need this flexibility, but I don’t want to remove this possibility until after more performance tuning is done on large apps). The modified decisions must be legal based on the linkage changes decided on during the thin link step (described in A-D in prior section):
A. Promotion - Since we can only import at most the values for which we were given summaries, which were known to be exported at link time, we can safely ratchet down the amount of importing without rendering those promotion decisions incorrect (some promotions may have been unnecessary if we decide not to import something, but they are not wrong from a correctness standpoint).
B&C. Avoiding promotion or internalization decisions - these rely on forced import of the local or (to be) internalized values. Simply force import anything with a summary that is marked as having local linkage in the summary.
D. LinkOnce -> Weak/AvailableExternally - these are not based on importing and are unaffected by the importer’s decisions.

Incremental Builds
-------------------------

A backend compilation needs to be rebuilt when its individual “combined” index changes (it includes the module hashes of all relevant modules, including the importing module, as well as all linkage decisions).

--
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
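As an illustration of the "Individual Module Index Files" section in the message above, here is a minimal Python sketch of how a thin link step might carve a per-module index out of the combined index. It is not LLVM code: the `Summary` record, the `build_individual_index` function, and the dict-based index layout are hypothetical stand-ins, and symbols are keyed by plain strings for simplicity rather than by whatever identifier the real index uses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Summary:
    """Hypothetical stand-in for one summary entry in the combined index."""
    module: str   # path of the module that defines the value
    linkage: str  # linkage as seen by the thin link, e.g. "internal"


def build_individual_index(combined, primary, imports, linkage_changes):
    """Filter the combined index down to what one backend compile needs.

    combined:        dict mapping symbol id -> Summary for the whole link
    primary:         path of the module this backend will compile
    imports:         set of symbol ids the thin link chose to import into primary
    linkage_changes: dict mapping symbol id -> linkage chosen by the thin link
    """
    summaries = {}
    for sym, summary in combined.items():
        # (2) keep only definitions in the primary module or selected for import
        if summary.module == primary or sym in imports:
            # (3) record any linkage change in the summary entry itself
            new_linkage = linkage_changes.get(sym, summary.linkage)
            summaries[sym] = replace(summary, linkage=new_linkage)
    # (1) the module table lists only modules we actually import from
    modules = sorted({s.module for s in summaries.values() if s.module != primary})
    return {"primary": primary, "modules": modules, "summaries": summaries}


combined = {
    "main": Summary("a.o", "external"),
    "helper": Summary("b.o", "internal"),
    "unused": Summary("c.o", "external"),
}
index_for_a = build_individual_index(
    combined, primary="a.o", imports={"helper"},
    linkage_changes={"helper": "external"},  # promotion decided at thin link
)
```

In this toy run the index for `a.o` lists only `b.o` in its module table, carries summaries for `main` and `helper`, and records `helper` with its promoted linkage, while the combined index itself is left unchanged.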
Mehdi Amini via llvm-dev
2016-Apr-15 08:43 UTC
[llvm-dev] [ThinLTO] RFC: ThinLTO distributed backend interface
Hi Teresa,

Thanks for summarizing and formalizing our discussion on IRC.

> On Apr 14, 2016, at 7:08 AM, Teresa Johnson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> [...]
>
> But an advantage of serializing out just the necessary info for each module is that the entire combined index wouldn’t need to be staged to each distributed build node.

staged... and parsed!

> [...]
>
> Note that 1 and 2 ensure that nothing can be imported beyond those values marked promoted during the global thin link (important since that possibly requires promotion in the exporting module). Any value that is imported as a declaration (because it did not have a summary entry as per 2 above), and that has local linkage, should automatically be promoted when importing (its primary module’s index would include a summary with the promoted linkage recorded).

Missing: "export list", i.e. which symbols need to be preserved in this module (all the others can be turned into internal).

> [...]
>
> Incremental Builds
> -------------------------
>
> A backend compilation needs to be rebuilt when its individual “combined” index changes (it includes the module hashes of all relevant modules, including the importing module, as well as all linkage decisions).

... or when the compiler itself changes :)

--
Mehdi
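The rebuild rule discussed above (rebuild when the individual index changes, or when the compiler changes) can be sketched as a cache key. This is only an illustration under stated assumptions: it assumes the serialized individual index file already captures the module hashes and linkage decisions, as the proposal describes, and the function names, the version string, and the choice of SHA-256 are all hypothetical rather than anything ThinLTO specifies.

```python
import hashlib


def backend_cache_key(index_bytes: bytes, compiler_version: str) -> str:
    """Digest of everything assumed to affect one backend compile."""
    h = hashlib.sha256()
    # Length-prefix the version so the two inputs cannot run together.
    version = compiler_version.encode("utf-8")
    h.update(len(version).to_bytes(8, "little"))
    h.update(version)
    # The individual index is assumed to embed module hashes and linkage
    # decisions, so hashing its bytes covers both.
    h.update(index_bytes)
    return h.hexdigest()


def needs_rebuild(previous_key, index_bytes, compiler_version):
    """True when the stored key is missing or no longer matches."""
    return previous_key != backend_cache_key(index_bytes, compiler_version)
```

A build system could store the key next to each backend output and skip the compile when `needs_rebuild` returns false for the freshly written index.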
Teresa Johnson via llvm-dev
2016-Apr-15 12:57 UTC
[llvm-dev] [ThinLTO] RFC: ThinLTO distributed backend interface
On Fri, Apr 15, 2016 at 1:43 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:

> [...]
>
> But an advantage of serializing out just the necessary info for each module is that the entire combined index wouldn’t need to be staged to each distributed build node.
>
> staged... and parsed!

Yes, that is another benefit of the smaller index.

> [...]
>
> Missing: "export list", i.e. which symbols needs to be preserved in this module (all the others can be turned into internal).

I don't think we need an explicit export list to do this, just note it in the linkage type. I had only covered parts of that below - noting which locals need to be promoted by marking them as having external linkage (A), and which could be internalized when they have only a couple external accesses through importing and internalization (C), but didn't cover the case where there were no external accesses. Added that below.

> [...]
>
> D. LinkOnce -> Weak/AvailableExternally: This is a compile time optimization to avoid unnecessarily keeping multiple copies of a LinkOnce value. Linkage is marked in index, and again adjusted in the backends since it will be different than the initial linkage after parsing.

E. Internalization when there is no external access: If an external symbol has no external references, it can be internalized. The index will indicate local linkage, and the linkage type in the IR would then be changed to local when compiling the exporting module.

(This should go up between B and C actually, I will reorder them in the master doc I'll post to https://sites.google.com/site/llvmthinlto/ later today)

> [...]
>
> A backend compilation needs to be rebuilt when its individual “combined” index changes (it includes the module hashes of all relevant modules, including the importing module, as well as all linkage decisions).
>
> ... or when the compiler itself changes :)

Sure, didn't mention that here since it is not affected by ThinLTO. I'll add it for completeness though.

Thanks,
Teresa

--
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
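To make cases A through E concrete, here is a small Python sketch of the comparison a backend might perform between the linkage found in the materialized IR and the linkage recorded in the index. It is a sketch under assumptions, not the FunctionImporter's logic: `linkage_action` and the action strings are hypothetical, and the linkage names are simply LLVM IR linkage keywords used as plain strings.

```python
# Linkage keywords treated as "local" for the purposes of this sketch.
LOCAL_LINKAGES = {"internal", "private"}


def linkage_action(ir_linkage: str, index_linkage: str) -> str:
    """Decide what to do with one value, given IR and index linkage."""
    ir_local = ir_linkage in LOCAL_LINKAGES
    index_local = index_linkage in LOCAL_LINKAGES
    if ir_local and not index_local:
        # Case A: the thin link wants this local visible to other modules.
        return "promote-and-rename"
    if ir_local and index_local:
        # Case B: forced import lets the value stay static.
        return "keep-local"
    if not ir_local and index_local:
        # Cases C and E look the same from here: IR is external, index is local.
        return "internalize"
    if ir_linkage != index_linkage:
        # Case D: e.g. linkonce_odr adjusted to weak_odr or available_externally.
        return "set-linkage:" + index_linkage
    return "unchanged"
```

For example, `linkage_action("internal", "external")` selects promotion, while `linkage_action("linkonce_odr", "available_externally")` only swaps the linkage. Note that in this simplified model the backend cannot tell C from E by linkage alone; it does not need to, since both end with the value made local.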
Peter Collingbourne via llvm-dev
2016-May-05 22:51 UTC
[llvm-dev] [ThinLTO] RFC: ThinLTO distributed backend interface
On Thu, Apr 14, 2016 at 7:08 AM, Teresa Johnson <tejohnson at google.com> wrote:> Hi all, > > Below is a proposal for refining the way we communicate between the > ThinLTO link step (the combined indexing step) and the backend processes > that do the actual importing and other summary-based optimizations in a > distributed backend process. > > Mehdi, let me know if this addresses your concerns. > > Peter, PTAL from the standpoint of any summary extensions needed for CFI > and make sure they can fit into this model. > > Thanks, > Teresa > > > Background > ---------------- > > Recent patch D18945/r266125 ([ThinLTO] Only compute imports for current > module in FunctionImport pass) triggered a discussion (mostly over IRC) on > how best to determine import/export decisions in distributed back end > compiles. > > Import and export decisions are made by traversing the combined index. The > actual importing happens in the FunctionImporter class, which is passed the > set of values to import. The importer class is either invoked directly on > each backend compile, which happens in the threads launched in the libLTO > path, or via the FunctionImportPass. > > The pass is currently used by the opt tool, by the gold-plugin when it > launches ThinLTO threads for single machine parallelism, and via clang when > invoked with a bitcode input file and the -fthinlto-index= option. The > latter was added in r254927 to enable launching a ThinLTO backend compile > in a separate distributed build process. > > Before r266125, the FunctionImportPass was walking the entire index, but > ignoring the import results for all but the current module, and not using > the exports list. The reason to do the full index walk is that eventually > we can minimize the required static promotions in the current module (based > on whether its defined values are imported elsewhere). However, this was > costing a lot of compile time in each backend thread. 
On the other hand, > Mehdi would like to use the pass for testing via the opt tool, and planned > to eventually add the support for using the computed export lists to guide > promotion. Therefore, the other invocations (in the gold-plugin and from > clang for the distributed back ends) will need to either invoke the > FunctionImporter directly (as in libLTO), passing in the import/export > information, or use a new pass interface that consumes the necessary info > to compute this information. > > Eventually the import/export decisions should be made a single time in the > thin link step (as is currently done for libLTO which doesn’t use the > pass), along with any other global summary-based decisions. The advantage > is that each backend isn’t doing redundant computation, and I believe it is > safer to make global decisions affecting correctness (e.g. promotion) a > single time. For the gold-plugin launched threads, it should be > straightforward to use the libLTO approach of computing these decisions and > passing the relevant information to each backend thread via a direct > invocation of the FunctionImporter, instead of using the FunctionImportPass. > > However, for distributed build backends, if the decisions are to be made a > single time in the thin link step, summary based decisions need to be > serialized out in order to be used by the FunctionImporter in each backend > process (which could be invoked directly from clang, or via a possibly > modified FunctionImportPass interface). My original plan was to mark > linkage changes determined globally (such as promotion decisions) in the > combined index itself for consumption in each back end. But an advantage of > serializing out just the necessary info for each module is that the entire > combined index wouldn’t need to be staged to each distributed build node. 
>
> Individual Module Index Files
> ---------------------------------------
>
> Rather than define a new format for serializing out the globally
> determined information from the thin link step, we can continue to use the
> combined index file format. However, we can create an individual “combined”
> index file for each module. This better enables passing along any summary
> information useful for backend compilations beyond just import and export
> lists, which can include other linkage optimizations, and information for
> transformations such as CFI. It also enables leverage of much of the
> existing combined index bitcode interfaces and data structures.
>
> An overview of what is included in an individual “primary” module’s index
> file:
> 1) Module symbol table only includes modules imported into the primary
> module.
> 2) Summary section only includes summaries for value definitions that
> should be imported, as well as for definitions in the primary module.
> 3) Any desired linkage changes for both the primary module and imported
> defs are recorded in the summary entry linkage fields.
>
> Note that 1 and 2 ensure that nothing can be imported beyond those values
> marked promoted during the global thin link (important since that possibly
> requires promotion in the exporting module). Any value that is imported as
> a declaration (because it did not have a summary entry as per 2 above), and
> that has local linkage, should automatically be promoted when importing
> (its primary module’s index would include a summary with the promoted
> linkage recorded).
>

I like the idea of individual module indices, especially how you propose to
use them for incremental builds.

The summary information for CFI and vtable opt would be a little different
from what is currently used for importing, as it would not be based on
symbol names. I don't see this as being a problem though. In the case of
CFI, it would be a set of the bitset names mentioned in @llvm.bitset.test
calls in that module, and in the case of vtable opt, it would be a set of
(name, offset) pairs (i.e. "vtable slots"). The combined module summary
would hold a mapping from keys of those types to the lowering information
for that key. We can build the individual module indices by filtering the
combined module summary against the sets in the individual summaries.

Peter

> Linkage Changes
> -----------------------
>
> As described above, the linkage changes determined by the global index
> walk in the thin link step will be marked in the summary entries (in all
> individual index files containing that symbol). The back end will compare
> the linkage types in the index to those in the materialized bitcode (both
> in the primary module and in any definitions being imported) and make the
> necessary adjustments.
>
> Some possibilities include:
>
> A. Promotion: Index will indicate external linkage, so local value will be
> promoted and renamed. For imported declarations, any that are local will be
> promoted.
>
> B. Avoiding promotion by forced import: Used when the thin link step
> determines it is better to force an import of a static definition and leave
> it static. The index will indicate local linkage, so linkage type in IR
> will not be changed when it is imported (or when compiling the exporting
> module).
>
> C. Internalization by forced import: If an external symbol has 1 or only a
> very small number of external references, and all referring modules decide
> to import that definition, the thin link analysis could decide that it is
> better to leave all copies local. The index would indicate local linkage,
> and the linkage type in the IR would then be changed to local when it is
> imported (and when compiling the exporting module).
>
> D. LinkOnce -> Weak/AvailableExternally: This is a compile time
> optimization to avoid unnecessarily keeping multiple copies of a LinkOnce
> value. Linkage is marked in index, and again adjusted in the backends since
> it will be different than the initial linkage after parsing.
>
> Note that pcc has made a proposal to do some of the ThinLTO promotion and
> renaming up front in the compile step, so that some functions can be
> eagerly compiled into text (see
> http://lists.llvm.org/pipermail/llvm-dev/2016-April/098081.html).
> However, that will only apply to locals referenced by functions that are
> deemed unlikely to import or be exported. The remaining locals can still be
> promoted lazily.
>
> Importing Strategy
> ------------------------
>
> Strategy 1: Import exactly those defs for which we have summaries
>
> Could use simplified/reduced summaries that strip the ref/call edges,
> since they won’t be used by the backends.
>
> Strategy 2: Allow the importer some flexibility to modify import decisions
>
> In case we find situations where it is better to let the importer
> adjust decisions based on full information (not yet known whether we need
> this flexibility, but I don’t want to remove this possibility until after
> more performance tuning is done on large apps). The modified decisions must
> be legal based on the linkage changes decided on during the thin link step
> (described in A-D in prior section):
> A. Promotion - Since we can only import at most the values for which we
> were given summaries, which were known to be exported at link time, we can
> safely ratchet down the amount of importing without rendering those
> promotion decisions incorrect (some promotions may have been unnecessary if
> we decide not to import something, but they are not wrong from a
> correctness standpoint).
> B&C. Avoiding promotion or internalization decisions - these rely on
> forced import of the local or (to be) internalized values. Simply force
> import anything with a summary that is marked as having local linkage in
> the summary.
> D. LinkOnce -> Weak/AvailableExternally - these are not based on
> importing and are unaffected by the importer’s decisions.
>
> Incremental Builds
> -------------------------
>
> A backend compilation needs to be rebuilt when its individual “combined”
> index changes (it includes the module hashes of all relevant modules,
> including the importing module, as well as all linkage decisions).
>
> --
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413

--
--
Peter
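To make the filtering step in Peter's reply concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the dictionaries, the key strings, and the "lowering" strings are hypothetical placeholders, not LLVM data structures or real lowering records, and the sketch assumes each key is unique across the whole link.

```python
from typing import Dict, Hashable, Set, TypeVar

K = TypeVar("K", bound=Hashable)


def filter_combined_summary(combined: Dict[K, str], module_keys: Set[K]) -> Dict[K, str]:
    """Keep only the combined-summary entries whose key the module references."""
    return {key: info for key, info in combined.items() if key in module_keys}


# Hypothetical combined summaries produced by the thin link.
# CFI: keyed by bitset name. Vtable opt: keyed by (name, offset) "vtable slots".
combined_cfi = {
    "bitset.A": "lowering for bitset.A",
    "bitset.B": "lowering for bitset.B",
}
combined_vtable = {
    ("vtable.A", 16): "lowering for slot (vtable.A, 16)",
    ("vtable.B", 24): "lowering for slot (vtable.B, 24)",
}

# Hypothetical per-module sets: the keys one module mentions.
module_cfi_keys = {"bitset.A"}
module_vtable_keys = {("vtable.B", 24)}

# The individual module index carries only the entries that module needs.
individual_cfi = filter_combined_summary(combined_cfi, module_cfi_keys)
individual_vtable = filter_combined_summary(combined_vtable, module_vtable_keys)
```

Because the filter is a plain lookup by key, the same helper serves both the bitset-name keys and the (name, offset) keys. If keys could collide across input files, the key type would need an extra disambiguating component.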
Teresa Johnson via llvm-dev
2016-May-06 20:24 UTC
[llvm-dev] [ThinLTO] RFC: ThinLTO distributed backend interface
On Thu, May 5, 2016 at 3:51 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:

> On Thu, Apr 14, 2016 at 7:08 AM, Teresa Johnson <tejohnson at google.com>
> wrote:
>>
>> Individual Module Index Files
>> ---------------------------------------
>>
>> Rather than define a new format for serializing out the globally
>> determined information from the thin link step, we can continue to use the
>> combined index file format. However, we can create an individual “combined”
>> index file for each module. This better enables passing along any summary
>> information useful for backend compilations beyond just import and export
>> lists, which can include other linkage optimizations, and information for
>> transformations such as CFI. It also enables leverage of much of the
>> existing combined index bitcode interfaces and data structures.
>>
>> An overview of what is included in an individual “primary” module’s index
>> file:
>> 1) Module symbol table only includes modules imported into the primary
>> module.
>> 2) Summary section only includes summaries for value definitions that
>> should be imported, as well as for definitions in the primary module.
>> 3) Any desired linkage changes for both the primary module and imported
>> defs are recorded in the summary entry linkage fields.
>>
>> Note that 1 and 2 ensure that nothing can be imported beyond those values
>> marked promoted during the global thin link (important since that possibly
>> requires promotion in the exporting module). Any value that is imported as
>> a declaration (because it did not have a summary entry as per 2 above), and
>> that has local linkage, should automatically be promoted when importing
>> (its primary module’s index would include a summary with the promoted
>> linkage recorded).
>>
>
> I like the idea of individual module indices, especially how you propose
> to use them for incremental builds.
>
> The summary information for CFI and vtable opt would be a little different
> from what is currently used for importing, as it would not be based on
> symbol names. I don't see this as being a problem though. In the case of
> CFI, it would be a set of the bitset names mentioned in @llvm.bitset.test
> calls in that module, and in the case of vtable opt, it would be a set of
> (name, offset) pairs (i.e. "vtable slots"). The combined module summary
> would hold a mapping from keys of those types to the lowering information
> for that key. We can build the individual module indices by filtering the
> combined module summary against the sets in the individual summaries.
>

By "the sets in the individual summaries" do you mean the summaries that are
going to go into that individual index file (the summaries for things
originally defined in that module + every definition it will import)?

Are these keys unique across the linked bitcode? I.e. will we ever see the
same bitset name in multiple input bitcode files that need to be
disambiguated in the combined index?

Teresa

> Peter
>
>> Linkage Changes
>> -----------------------
>>
>> As described above, the linkage changes determined by the global index
>> walk in the thin link step will be marked in the summary entries (in all
>> individual index files containing that symbol). The back end will compare
>> the linkage types in the index to those in the materialized bitcode (both
>> in the primary module and in any definitions being imported) and make the
>> necessary adjustments.
>>
>> Some possibilities include:
>>
>> A. Promotion: Index will indicate external linkage, so local value will
>> be promoted and renamed. For imported declarations, any that are local will
>> be promoted.
>>
>> B. Avoiding promotion by forced import: Used when the thin link step
>> determines it is better to force an import of a static definition and leave
>> it static. The index will indicate local linkage, so linkage type in IR
>> will not be changed when it is imported (or when compiling the exporting
>> module).
>>
>> C. Internalization by forced import: If an external symbol has 1 or only
>> a very small number of external references, and all referring modules
>> decide to import that definition, the thin link analysis could decide that
>> it is better to leave all copies local. The index would indicate local
>> linkage, and the linkage type in the IR would then be changed to local when
>> it is imported (and when compiling the exporting module).
>>
>> D. LinkOnce -> Weak/AvailableExternally: This is a compile time
>> optimization to avoid unnecessarily keeping multiple copies of a LinkOnce
>> value. Linkage is marked in index, and again adjusted in the backends since
>> it will be different than the initial linkage after parsing.
>>
>> Note that pcc has made a proposal to do some of the ThinLTO promotion and
>> renaming up front in the compile step, so that some functions can be
>> eagerly compiled into text (see
>> http://lists.llvm.org/pipermail/llvm-dev/2016-April/098081.html).
>> However, that will only apply to locals referenced by functions that are
>> deemed unlikely to import or be exported. The remaining locals can still be
>> promoted lazily.
>>
>> Importing Strategy
>> ------------------------
>>
>> Strategy 1: Import exactly those defs for which we have summaries
>>
>> Could use simplified/reduced summaries that strip the ref/call edges,
>> since they won’t be used by the backends.
>>
>> Strategy 2: Allow the importer some flexibility to modify import decisions
>>
>> In case we find situations where it is better to let the importer
>> adjust decisions based on full information (not yet known whether we need
>> this flexibility, but I don’t want to remove this possibility until after
>> more performance tuning is done on large apps). The modified decisions must
>> be legal based on the linkage changes decided on during the thin link step
>> (described in A-D in prior section):
>> A. Promotion - Since we can only import at most the values for which
>> we were given summaries, which were known to be exported at link time, we
>> can safely ratchet down the amount of importing without rendering those
>> promotion decisions incorrect (some promotions may have been unnecessary if
>> we decide not to import something, but they are not wrong from a
>> correctness standpoint).
>> B&C. Avoiding promotion or internalization decisions - these rely on
>> forced import of the local or (to be) internalized values. Simply force
>> import anything with a summary that is marked as having local linkage in
>> the summary.
>> D. LinkOnce -> Weak/AvailableExternally - these are not based on
>> importing and are unaffected by the importer’s decisions.
>>
>> Incremental Builds
>> -------------------------
>>
>> A backend compilation needs to be rebuilt when its individual “combined”
>> index changes (it includes the module hashes of all relevant modules,
>> including the importing module, as well as all linkage decisions).
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>
> --
> --
> Peter

--
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
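For readers following the quoted proposal, the per-module index contents (points 1 to 3) and the backend's linkage comparison can be sketched roughly as follows. This is a toy Python model under loud assumptions: `Summary`, `build_individual_index`, `resolved_linkage`, the module names, and the two-value linkage strings are all hypothetical and do not correspond to LLVM classes or to the bitcode format. Renaming of promoted locals is left out.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Summary:
    module: str   # module that defines the value
    linkage: str  # linkage chosen by the thin link: "external" or "internal"


def build_individual_index(
    combined: Dict[str, Summary], primary: str, imports: Set[str]
) -> Tuple[Set[str], Dict[str, Summary]]:
    """Build the per-module view of the combined index for one primary module."""
    # Point 2: summaries only for the primary module's defs and for imported defs.
    summaries = {
        name: s for name, s in combined.items()
        if s.module == primary or name in imports
    }
    # Point 1: the module table lists only modules imported into the primary module.
    modules = {s.module for s in summaries.values() if s.module != primary}
    return modules, summaries


def resolved_linkage(name: str, ir_linkage: str, summaries: Dict[str, Summary]) -> str:
    """Linkage a backend would give `name`, given its linkage in the parsed IR."""
    if name in summaries:
        # Point 3: the linkage recorded in the index wins (cases A to D).
        return summaries[name].linkage
    # No summary: the value arrives as a declaration, and locals are promoted.
    return "external" if ir_linkage == "internal" else ir_linkage


# Hypothetical combined index from the thin link.
combined = {
    "main": Summary("main.bc", "external"),
    "foo": Summary("lib.bc", "external"),
    "static_helper": Summary("lib.bc", "external"),  # case A: promoted
    "tiny_static": Summary("lib.bc", "internal"),    # case B: forced import, stays static
    "unused": Summary("other.bc", "external"),
}

modules, summaries = build_individual_index(combined, "main.bc", {"foo", "tiny_static"})
lib_modules, lib_summaries = build_individual_index(combined, "lib.bc", set())
```

In this model, `static_helper` has no entry in the index for `main.bc`, so it can only arrive there as a promoted declaration, while the index for `lib.bc` records the promoted linkage for its definition. That mirrors the note in the proposal that nothing can be imported beyond what the thin link decided.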