On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg <alexr at leftfield.org> wrote:
> "ELF-wrapped bitcode" seems potentially controversial to me.
>
> What about ar, nm, and various ld implementations adds this requirement?
> What about the LLVM implementations of these tools is lacking?

Sorry I can not parse your questions properly. Can you make it clearer?

David

> Alex
>
> > On May 13, 2015, at 7:44 PM, Teresa Johnson <tejohnson at google.com> wrote:
> >
> > I've included below an RFC for implementing ThinLTO in LLVM, looking
> > forward to feedback and questions.
> > Thanks!
> > Teresa
> >
> >
> > RFC to discuss plans for implementing ThinLTO upstream. Background can
> > be found in slides from EuroLLVM 2015:
> > https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
> > As described in the talk, we have a prototype implementation, and
> > would like to start staging patches upstream. This RFC describes a
> > breakdown of the major pieces. We would like to commit upstream
> > gradually in several stages, with all functionality off by default.
> > The core ThinLTO importing support and tuning will require frequent
> > change and iteration during testing and tuning, and for that part we
> > would like to commit rapidly (off by default). See the proposed staged
> > implementation described in the Implementation Plan section.
> >
> >
> > ThinLTO Overview
> > ================
> >
> > See the talk slides linked above for more details. The following is a
> > high-level overview of the motivation.
> >
> > Cross Module Optimization (CMO) is an effective means for improving
> > runtime performance, by extending the scope of optimizations across
> > source module boundaries. Without CMO, the compiler is limited to
> > optimizing within the scope of single source modules. Two solutions
> > for enabling CMO are Link-Time Optimization (LTO), which is currently
> > supported in LLVM and GCC, and Lightweight-Interprocedural
> > Optimization (LIPO). However, each of these solutions has limitations
> > that prevent it from being enabled by default. ThinLTO is a new
> > approach that attempts to address these limitations, with a goal of
> > being enabled more broadly. ThinLTO is designed with many of the same
> > principles as LIPO, and therefore its advantages, without any of its
> > inherent weaknesses. Unlike in LIPO where the module group decision is
> > made at profile training runtime, ThinLTO makes the decision at
> > compile time, but in a lazy mode that facilitates large scale
> > parallelism. The serial linker plugin phase is designed to be razor
> > thin and blazingly fast. By default this step only does minimal
> > preparation work to enable the parallel lazy importing performed
> > later. ThinLTO aims to be scalable like a regular O2 build, enabling
> > CMO on machines without large memory configurations, while also
> > integrating well with distributed build systems. Results from early
> > prototyping on SPEC cpu2006 C++ benchmarks are in line with
> > expectations that ThinLTO can scale like O2 while enabling much of the
> > CMO performed during a full LTO build.
> >
> >
> > A ThinLTO build is divided into 3 phases, which are referred to in the
> > following implementation plan:
> >
> > phase-1: IR and Function Summary Generation (-c compile)
> > phase-2: Thin Linker Plugin Layer (thin archive linker step)
> > phase-3: Parallel Backend with Demand-Driven Importing
> >
> >
> > Implementation Plan
> > ===================
> >
> > This section gives a high-level breakdown of the ThinLTO support that
> > will be added, in roughly the order that the patches would be staged.
> > The patches are divided into three stages. The first stage contains a
> > minimal amount of preparation work that is not ThinLTO-specific. The
> > second stage contains most of the infrastructure for ThinLTO, which
> > will be off by default. The third stage includes
> > enhancements/improvements/tunings that can be performed after the main
> > ThinLTO infrastructure is in.
> >
> > The second and third implementation stages will initially be very
> > volatile, requiring a lot of iterations and tuning with large apps to
> > get stabilized. Therefore it will be important to do fast commits for
> > these implementation stages.
> >
> >
> > 1. Stage 1: Preparation
> > -----------------------
> >
> > The first planned sets of patches are enablers for ThinLTO work:
> >
> >
> > a. LTO directory structure:
> >
> > Restructure the LTO directory to remove a circular dependence when the
> > ThinLTO pass is added. Because ThinLTO is being implemented as an SCC
> > pass within Transforms/IPO, and leverages the LTOModule class for
> > linking in functions from modules, IPO then requires the LTO library.
> > This creates a circular dependence between LTO and IPO. To break that,
> > we need to split the lib/LTO directory/library into lib/LTO/CodeGen and
> > lib/LTO/Module, containing LTOCodeGenerator and LTOModule,
> > respectively. Only LTOCodeGenerator has a dependence on IPO, removing
> > the circular dependence.
> >
> >
> > b. ELF wrapper generation support:
> >
> > Implement an ELF-wrapped bitcode writer. In order to more easily
> > interact with tools such as $AR, $NM, and “$LD -r” we plan to emit the
> > phase-1 bitcode wrapped in ELF via the .llvmbc section, along with a
> > symbol table. The goal is both to interact with these tools without
> > requiring a plugin, and also to avoid doing partial LTO/ThinLTO across
> > files linked with “$LD -r” (i.e. the resulting object file should still
> > contain ELF-wrapped bitcode to enable ThinLTO at the full link step).
> > I will send a separate design document for these changes, but the
> > following is a high-level overview.
> >
> > Support was added to LLVM for reading ELF-wrapped bitcode
> > (http://reviews.llvm.org/rL218078), but there does not yet exist
> > support in LLVM/Clang for emitting bitcode wrapped in ELF. I plan to
> > add support for optionally generating bitcode in an ELF file
> > containing a single .llvmbc section holding the bitcode. Specifically,
> > the patch would add new options “emit-llvm-bc-elf” (object file) and
> > corresponding “emit-llvm-elf” (textual assembly code equivalent).
> > Eventually these would be automatically triggered under “-fthinlto -c”
> > and “-fthinlto -S”, respectively.
> >
> > Additionally, a symbol table will be generated in the ELF file,
> > holding the function symbols within the bitcode. This facilitates
> > handling archives of the ELF-wrapped bitcode created with $AR, since
> > the archive will have a symbol table as well. The archive symbol table
> > enables gold to extract and pass to the plugin the constituent
> > ELF-wrapped bitcode files. To support the concatenated .llvmbc section
> > generated by “$LD -r”, some handling needs to be added to gold and to
> > the backend driver to process each original module’s bitcode.
> >
> > The function index/summary will later be added as a special ELF
> > section alongside the .llvmbc sections.
> >
> >
> > 2. Stage 2: ThinLTO Infrastructure
> > ----------------------------------
> >
> > The next set of patches adds the base implementation of the ThinLTO
> > infrastructure, specifically those required to make ThinLTO functional
> > and generate correct but not necessarily high-performing binaries. It
> > also does not include support to make debug support under -g efficient
> > with ThinLTO.
> >
> >
> > a. Clang/LLVM/gold linker options:
> >
> > An early set of clang/llvm patches is needed to provide options to
> > enable ThinLTO (off by default), so that the rest of the
> > implementation can be disabled by default as it is added.
> > Specifically, the clang option -fthinlto (used instead of -flto) will
> > cause clang to invoke the phase-1 emission of LLVM bitcode and
> > function summary/index on a compile step, and pass the appropriate
> > option to the gold plugin on a link step. The -thinlto option will be
> > added to the gold plugin and llvm-lto tool to launch the phase-2 thin
> > archive step. The -thinlto option will also be added to the ‘opt’ tool
> > to invoke it as a phase-3 parallel backend instance.
> >
> >
> > b. Thin-archive linking support in Gold plugin and llvm-lto:
> >
> > Under the new plugin option (see above), the plugin needs to perform
> > the phase-2 (thin archive) link which simply emits a combined function
> > map from the linked modules, without actually performing the normal
> > link. Corresponding support should be added to the standalone llvm-lto
> > tool to enable testing/debugging without involving the linker and
> > plugin.
> >
> >
> > c. ThinLTO backend support:
> >
> > Support for invoking a phase-3 backend invocation (including
> > importing) on a module should be added to the ‘opt’ tool under the new
> > option. The main change under the option is to instantiate a Linker
> > object used to manage the process of linking imported functions into
> > the module, to read the combined function map efficiently, and to
> > enable the ThinLTO import pass.
> >
> >
> > d. Function index/summary support:
> >
> > This includes infrastructure for writing and reading the function
> > index/summary section. As noted earlier this will be encoded in a
> > special ELF section within the module, alongside the .llvmbc section
> > containing the bitcode. The thin archive generated by phase-2 of
> > ThinLTO simply contains all of the function index/summary sections
> > across the linked modules, organized for efficient function lookup.
> >
> > Each function available for importing from the module contains an
> > entry in the module’s function index/summary section and in the
> > resulting combined function map. Each function entry contains that
> > function’s offset within the bitcode file, used to efficiently locate
> > and quickly import just that function. The entry also contains summary
> > information (e.g. basic information determined during parsing such as
> > the number of instructions in the function), that will be used to help
> > guide later import decisions. Because the contents of this section
> > will change frequently during ThinLTO tuning, it should also be marked
> > with a version id for backwards compatibility or version checking.
> >
> >
> > e. ThinLTO importing support:
> >
> > Support for the mechanics of importing functions from other modules,
> > which can go in gradually as a set of patches since it will be off by
> > default. Separate patches can include:
> >
> > - BitcodeReader changes to use function index to import/deserialize
> > single function of interest (small changes, leverages existing lazy
> > streamer support).
> >
> > - Minor LTOModule changes to pass the ThinLTO function to import and
> > its index into bitcode reader.
> >
> > - Marking of imported functions (for use in ThinLTO-specific symbol
> > linking and global DCE, for example). This can be in-memory initially,
> > but IR support may be required in order to support streaming bitcode
> > out and back in again after importing.
> >
> > - ModuleLinker changes to do ThinLTO-specific symbol linking and
> > static promotion when necessary. The linkage type of imported
> > functions changes to AvailableExternallyLinkage, for example. Statics
> > must be promoted in certain cases, and renamed in consistent ways.
> >
> > - GlobalDCE changes to support removing imported functions that were
> > not inlined (very small changes to existing pass logic).
> >
> >
> > f. ThinLTO Import Driver SCC pass:
> >
> > Adds Transforms/IPO/ThinLTO.cpp with framework for doing ThinLTO via
> > an SCC pass, enabled only under -fthinlto options. The pass includes
> > utilizing the thin archive (global function index/summary), import
> > decision heuristics, invocation of LTOModule/ModuleLinker routines
> > that perform the import, and any necessary callgraph updates and
> > verification.
> >
> >
> > g. Backend Driver:
> >
> > For a single node build, the gold plugin can simply write a makefile
> > and fork the parallel backend instances directly via parallel make.
> >
> >
> > 3. Stage 3: ThinLTO Tuning and Enhancements
> > -------------------------------------------
> >
> > This refers to the patches that are not required for ThinLTO to work,
> > but rather to improve compile time, memory, run-time performance and
> > usability.
> >
> >
> > a. Lazy Debug Metadata Linking:
> >
> > The prototype implementation included lazy importing of module-level
> > metadata during the ThinLTO pass finalization (i.e. after all function
> > importing is complete). This actually applies to all module-level
> > metadata, not just debug, although debug metadata is the largest. This
> > can be added as a separate set of patches. Changes are needed to
> > BitcodeReader, ValueMapper, and ModuleLinker.
> >
> >
> > b. Import Tuning:
> >
> > Tuning the import strategy will be an iterative process that will
> > continue to be refined over time. It involves several different types
> > of changes: adding support for recording additional metrics in the
> > function summary, such as profile data and optional heavier-weight IPA
> > analyses, and tuning the import heuristics based on the summary and
> > callsite context.
> >
> >
> > c. Combined Function Map Pruning:
> >
> > The combined function map can be pruned of functions that are unlikely
> > to benefit from being imported. For example, during the phase-2 thin
> > archive plugin step we can safely omit large and (with profile data)
> > cold functions, which are unlikely to benefit from being inlined.
> > Additionally, all but one copy of comdat functions can be suppressed.
> >
> >
> > d. Distributed Build System Integration:
> >
> > For a distributed build system, the gold plugin should write the
> > parallel backend invocations into a makefile, including the mapping
> > from the IR file to the real object file path, and exit. Additional
> > work needs to be done in the distributed build system itself to
> > distribute and dispatch the parallel backend jobs to the build
> > cluster.
> >
> >
> > e. Dependence Tracking and Incremental Compiles:
> >
> > In order to support build systems that stage from local disks or
> > network storage, the plugin will optionally support computation of
> > dependent sets of IR files that each module may import from. This can
> > be computed from profile data, if it exists, or from the symbol table
> > and heuristics if not. These dependence sets also enable support for
> > incremental backend compiles.
> >
> >
> > --
> > Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
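To make the function index/summary (Stage 2d) and combined function map pruning (Stage 3c) in the quoted RFC more concrete, here is a rough C++ sketch. It is not taken from the prototype: every type and function name below is hypothetical, the version constant is made up, and pruning is reduced to a plain instruction-count threshold (the RFC also mentions using profile data to drop cold functions, which is not modeled here). Keeping only the first entry seen for a name stands in for suppressing all but one copy of a comdat function.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical version id; the RFC only says the section should carry one.
constexpr std::uint32_t kSummaryVersion = 1;

// One entry per importable function in a module's index/summary section.
struct FunctionSummaryEntry {
  std::string name;
  std::uint64_t bitcodeOffset;  // offset of the function within the bitcode
  std::uint32_t instCount;      // basic summary info gathered during parsing
};

// Per-module index/summary section, as produced in phase-1.
struct ModuleFunctionIndex {
  std::uint32_t version;
  std::string modulePath;
  std::vector<FunctionSummaryEntry> entries;
};

// Where a function can be found, as recorded in the combined map (phase-2).
struct CombinedEntry {
  std::string modulePath;
  std::uint64_t bitcodeOffset;
  std::uint32_t instCount;
};

using CombinedFunctionMap = std::unordered_map<std::string, CombinedEntry>;

// Merge the per-module indexes into one lookup table, skipping functions
// larger than maxInstCount and keeping only the first copy of each name.
CombinedFunctionMap buildCombinedMap(
    const std::vector<ModuleFunctionIndex>& modules,
    std::uint32_t maxInstCount) {
  CombinedFunctionMap combined;
  for (const auto& module : modules) {
    assert(module.version == kSummaryVersion && "summary version mismatch");
    for (const auto& entry : module.entries) {
      if (entry.instCount > maxInstCount)
        continue;  // pruned: too large to be a likely import candidate
      // emplace does not overwrite, so the first copy of a name wins.
      combined.emplace(entry.name,
                       CombinedEntry{module.modulePath, entry.bitcodeOffset,
                                     entry.instCount});
    }
  }
  return combined;
}
```

A phase-3 backend would then look a callee up by name in the combined map and use the module path and offset to deserialize just that function.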
On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li <xinliangli at gmail.com> wrote:
> On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg <alexr at leftfield.org> wrote:
>>
>> "ELF-wrapped bitcode" seems potentially controversial to me.
>>
>> What about ar, nm, and various ld implementations adds this requirement?
>> What about the LLVM implementations of these tools is lacking?
>
> Sorry I can not parse your questions properly. Can you make it clearer?

Alex is asking what the issue is with ar, nm, ld -r and regular
bitcode that makes using ELF-wrapped bitcode easier.

The issue is that generally you need to provide a plugin to these
tools in order for them to understand and handle bitcode files. We'd
like standard tools to work without requiring a plugin as much as
possible. And in some cases we want them to be handled differently than
the way bitcode files are handled with the plugin.

nm: Without a plugin, normal bitcode files are inscrutable. When
provided the gold plugin it can emit the symbols.

ar: Without a plugin, it will create an archive of bitcode files, but
without an index, so it can't be handled by the linker even with a
plugin on an -flto link. When ar is provided the gold plugin it does
create an index, so the linker + gold plugin handle it appropriately
on an -flto link.

ld -r: Without a plugin, fails when provided bitcode inputs. When
provided the gold plugin, it handles them but compiles them all the
way through to ELF executable instructions via a partial LTO link.
This is where we would like to differ in behavior (while also not
requiring a plugin) with ELF-wrapped bitcode: we would like the ld -r
output file to still contain ELF-wrapped bitcode, delaying the LTO
until the full link step.

Let me know if that helps address your concerns.
Thanks,
Teresa
So, what Alex is saying is that we have these tools as well and they
understand bitcode just fine, as well as every object format - not just
ELF. :)

-eric

On Thu, May 14, 2015, 6:55 AM Teresa Johnson <tejohnson at google.com> wrote:

> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li
> <xinliangli at gmail.com> wrote:
> >
> >
> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg <alexr at leftfield.org>
> > wrote:
> >>
> >> "ELF-wrapped bitcode" seems potentially controversial to me.
> >>
> >> What about ar, nm, and various ld implementations adds this requirement?
> >> What about the LLVM implementations of these tools is lacking?
> >
> >
> > Sorry I can not parse your questions properly. Can you make it clearer?
>
> Alex is asking what the issue is with ar, nm, ld -r and regular
> bitcode that makes using elf-wrapped bitcode easier.
>
> The issue is that generally you need to provide a plugin to these
> tools in order for them to understand and handle bitcode files. We'd
> like standard tools to work without requiring a plugin as much as
> possible. And in some cases we want them to be handled different than
> the way bitcode files are handled with the plugin.
>
> nm: Without a plugin, normal bitcode files are inscrutable. When
> provided the gold plugin it can emit the symbols.
>
> ar: Without a plugin, it will create an archive of bitcode files, but
> without an index, so it can't be handled by the linker even with a
> plugin on an -flto link. When ar is provided the gold plugin it does
> create an index, so the linker + gold plugin handle it appropriately
> on an -flto link.
>
> ld -r: Without a plugin, fails when provided bitcode inputs. When
> provided the gold plugin, it handles them but compiles them all the
> way through to ELF executable instructions via a partial LTO link.
> This is where we would like to differ in behavior (while also not
> requiring a plugin) with ELF-wrapped bitcode: we would like the ld -r
> output file to still contain ELF-wrapped bitcode, delaying the LTO
> until the full link step.
>
> Let me know if that helps address your concerns.
>
> Thanks,
> Teresa
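
The difference between raw bitcode and an ELF container that underlies the nm/ar/ld -r discussion can be seen in the first few bytes of a file. The following is only a rough sketch: the magic values are the standard ELF identification bytes and the LLVM bitcode magic numbers, but `classify_header` and its return labels are made up for illustration, and no real tool decides on this little information. In particular it does not look for a .llvmbc section, so an ELF-wrapped bitcode file would simply be reported as "elf", which is the property the RFC relies on.

```python
ELF_MAGIC = b"\x7fELF"                      # e_ident[0..3] of any ELF file
RAW_BITCODE_MAGIC = b"BC\xc0\xde"           # 'B' 'C' 0xC0 0xDE
# LLVM's own bitcode wrapper header (0x0B17C0DE, little-endian). This is
# a different wrapper from the ELF wrapping discussed in this thread.
BITCODE_WRAPPER_MAGIC = b"\xde\xc0\x17\x0b"


def classify_header(header: bytes) -> str:
    """Classify a file by its leading bytes (hypothetical helper)."""
    if header.startswith(ELF_MAGIC):
        return "elf"
    if header.startswith((RAW_BITCODE_MAGIC, BITCODE_WRAPPER_MAGIC)):
        return "bitcode"
    return "unknown"
```

A tool that only knows how to parse ELF would give up on the "bitcode" case unless a plugin (or native bitcode support, as in the LLVM versions of these tools) handles it.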