There is no need to emit the full symtab. I checked the overhead with a huge internal C++ source: the symtab plus string table add about 3% compared with the bitcode built with debug info. More importantly, the symtab can also be used for index/summary purposes, which makes the space usage completely 'unwasted'. The details will follow when the patches are in.

David

On Fri, May 15, 2015 at 5:11 AM, Dave Bozier <seifsta at gmail.com> wrote:

> > Are you sure about the additional I/O? With a native symtab, existing tools just need to read it, while the plugin-based approach needs to read the bitcode section to feed symbols back to the tool.
>
> The additional I/O will be quite big if you are going to emit the full symbol table. Looking at some of our real-world links, the symbol tables and string tables of all the inputs seen by the linker add up to about 50-100 MB.
>
> On Thu, May 14, 2015 at 10:28 PM, Xinliang David Li <xinliangli at gmail.com> wrote:
>> On Thu, May 14, 2015 at 2:09 PM, Eric Christopher <echristo at gmail.com> wrote:
>>> On Thu, May 14, 2015 at 1:35 PM Teresa Johnson <tejohnson at google.com> wrote:
>>>> On Thu, May 14, 2015 at 1:18 PM, Eric Christopher <echristo at gmail.com> wrote:
>>>> > On Thu, May 14, 2015 at 1:11 PM David Blaikie <dblaikie at gmail.com> wrote:
>>>> >> On Thu, May 14, 2015 at 12:53 PM, Eric Christopher <echristo at gmail.com> wrote:
>>>> >>> On Thu, May 14, 2015 at 11:34 AM Daniel Berlin <dberlin at dberlin.org> wrote:
>>>> >>>> On Thu, May 14, 2015 at 11:14 AM, Eric Christopher <echristo at gmail.com> wrote:
>>>> >>>> > I'm not sure this is a particularly great assumption to make.
>>>> >>>>
>>>> >>>> Which part?
>>>> >>> The binutils part :)

>>>> >>>> > We have to support a lot of different build systems and tools, and concentrating on something that just binutils uses isn't particularly friendly here.

>>>> >>>> I think you may have misunderstood. His point was exactly that they want to be transparent to *all of* these tools. You are saying "we should be friendly to everyone". He is saying the same thing. We should be friendly to everyone. The friendly way to do this is to not require all of these tools to build plugins to handle bitcode.
>>>> >>>>
>>>> >>>> Hence, elf-wrapped bitcode.

>>>> >>> Oh, I understood. I just don't know that I agree. To do anything with the tools will require some knowledge of bitcode anyhow or need the plugin. I'm saying that as a baseline start we should look at how to do this using the tools we've got rather than wrapping things for no real gain.

>>>> >> That doesn't seem strictly true - the ar situation (which I'm led to believe is in use in our build system & others, one would assume). With the symbol table included as proposed, ar can be used without any knowledge of the bitcode or need for a plugin.

>>>> > For some bits, sure. Optimizing for ar seems a bit silly, why not 'ld -r'?

>>>> But as mentioned, ld -r can work on native object wrapped bitcode without a plugin as well.

>>> How? It's not like any partial linking is going to go on inside the bitcode if the linker doesn't understand bitcode.

>> Why do we need the plugin to do anything here? We just need the linker to concatenate the bitcode sections and produce a combined bitcode file.

>>>> > Agreed.
>>>> > The ar situation is interesting because one thing we discussed after you wandered off was just adding a ToC section to bitcode as it is and then having the tools handle that. Would seem to accomplish at least the goals as I've seen them up to this point without worrying too much.

>>>> The ToC section is a way we can encode the function index/summary into bitcode, but won't help integrate with existing tools. The main issue we are trying to solve is integrating transparently with existing binutils tools in use in our build system and probably elsewhere.

>>> Right. I'm not entirely sure what use we're going to see in the existing tools that we want to encompass here. There's some of it for convenience (e.g. nm etc. for developers), but they can use a tool that understands bitcode and we can make the existing llvm tools suffice for these needs.
>>>
>>> I think the way of looking at this is that we can:
>>>
>>> a) go with wrapping things in native object formats, this means
>>>   - some tools continue to work at the cost of additional I/O and space at compile/link time

>> Are you sure about the additional I/O? With a native symtab, existing tools just need to read it, while the plugin-based approach needs to read the bitcode section to feed symbols back to the tool.

>>>   - we still have to update some tools to work at all

>> If any, it will be minimal.

>>> b) we extend those tools/our own tools and have them be drop-in replacements for the existing tools. They'll understand the bitcode format natively, they'll be smaller, and we'll be able to push the state of the art in tooling/analysis a bit more in the future without having to rework thin lto.
>>>
>>> It's basically a set of trade-offs and for llvm we've historically gone the b direction.

>> I am fine making llvm tools work with it, but we should not require/force users to use them.
>> I think this is an orthogonal feature.
>>
>> David

>>>> > At any rate, I think this aspect of the proposal needs a bit of discussion and some mapping out of the pros and cons here.

>>>> Sure, we can continue to discuss and I will try to lay out the pros/cons.

>>> Excellent.
>>>
>>> -eric

>>>> Teresa

>>>> > -eric

>>>> >>> I've talked to Teresa a bit offline and we're going to talk more later (and discuss on the list), but there are some discussions about how to make this work either with just bitcode/llvm tools and so not requiring integration on all platforms. The latter is what I consider as particularly friendly :)
>>>> >>>
>>>> >>> -eric

>>>> >>>> > I also can't imagine how it's necessary for any of the lto aspects as currently written in the proposal.
>>>> >>>> >
>>>> >>>> > -eric
>>>> >>>> >
>>>> >>>> > On Thu, May 14, 2015 at 9:26 AM Xinliang David Li <xinliangli at gmail.com> wrote:
>>>> >>>> >> The design objective is to make thinLTO mostly transparent to binutils tools to enable easy integration with any build system in the wild. 'Pass-through' mode with 'ld -r' instead of the partial LTO mode is another reason.
>>>> >>>> >>
>>>> >>>> >> David
>>>> >>>> >>
>>>> >>>> >> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson <tejohnson at google.com> wrote:
>>>> >>>> >>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher <echristo at gmail.com> wrote:
>>>> >>>> >>> > So, what Alex is saying is that we have these tools as well and they understand bitcode just fine, as well as every object format - not just ELF.
>>>> >>>> >>> > :)

>>>> >>>> >>> Right, there are also LLVM-specific versions (llvm-ar, llvm-nm) that handle bitcode similarly to the way the standard tool + plugin does. But the goal we are trying to achieve is to allow the standard system versions of the tools to handle these files without requiring a plugin. I know the LLVM tool handles other object formats, but I'm not sure how that helps here? We're not planning to replace those tools, just allow the standard system versions to handle the intermediate objects produced by ThinLTO.
>>>> >>>> >>>
>>>> >>>> >>> Thanks,
>>>> >>>> >>> Teresa

>>>> >>>> >>> > -eric
>>>> >>>> >>> >
>>>> >>>> >>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson <tejohnson at google.com> wrote:
>>>> >>>> >>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li <xinliangli at gmail.com> wrote:
>>>> >>>> >>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg <alexr at leftfield.org> wrote:
>>>> >>>> >>> >> >> "ELF-wrapped bitcode" seems potentially controversial to me.
>>>> >>>> >>> >> >>
>>>> >>>> >>> >> >> What about ar, nm, and various ld implementations adds this requirement? What about the LLVM implementations of these tools is lacking?
>>>> >>>> >>> >> >
>>>> >>>> >>> >> > Sorry, I cannot parse your questions properly. Can you make them clearer?
>>>> >>>> >>> >>
>>>> >>>> >>> >> Alex is asking what the issue is with ar, nm, ld -r and regular bitcode that makes using elf-wrapped bitcode easier.
>>>> >>>> >>> >>
>>>> >>>> >>> >> The issue is that generally you need to provide a plugin to these tools in order for them to understand and handle bitcode files. We'd like standard tools to work without requiring a plugin as much as possible. And in some cases we want them to be handled differently from the way bitcode files are handled with the plugin.
>>>> >>>> >>> >>
>>>> >>>> >>> >> nm: Without a plugin, normal bitcode files are inscrutable. When provided with the gold plugin, it can emit the symbols.
>>>> >>>> >>> >>
>>>> >>>> >>> >> ar: Without a plugin, it will create an archive of bitcode files, but without an index, so it can't be handled by the linker even with a plugin on an -flto link. When ar is provided with the gold plugin, it does create an index, so the linker + gold plugin handle it appropriately on an -flto link.
>>>> >>>> >>> >>
>>>> >>>> >>> >> ld -r: Without a plugin, it fails when provided bitcode inputs. When provided with the gold plugin, it handles them but compiles them all the way through to ELF executable instructions via a partial LTO link. This is where we would like to differ in behavior (while also not requiring a plugin) with ELF-wrapped bitcode: we would like the ld -r output file to still contain ELF-wrapped bitcode, delaying the LTO until the full link step.
>>>> >>>> >>> >>
>>>> >>>> >>> >> Let me know if that helps address your concerns.
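Teresa's ar point comes down to the archive symbol index: without one, the linker cannot tell which members to pull in. The Python sketch below models what such an index enables once each ELF-wrapped member carries an ordinary symbol table. The names `Member`, `build_index` and `select_members` are hypothetical, and this is a simplified model of classic archive resolution, not the actual code in ar or gold: a member is extracted only when it defines a symbol that is still undefined, and then that member's own references are chased.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Member:
    """One archive member, as described by its own (e.g. ELF) symbol table."""
    name: str
    defines: FrozenSet[str]
    references: FrozenSet[str]


def build_index(members: List[Member]) -> Dict[str, str]:
    """Archive symbol index: each defined symbol -> first member defining it."""
    index: Dict[str, str] = {}
    for member in members:
        for sym in member.defines:
            index.setdefault(sym, member.name)
    return index


def select_members(undefined: Set[str], members: List[Member]) -> List[str]:
    """Pull in a member only when it defines a still-undefined symbol, then
    chase that member's own references (simplified archive resolution)."""
    by_name = {m.name: m for m in members}
    index = build_index(members)
    defined: Set[str] = set()
    pending = sorted(undefined)
    extracted: List[str] = []
    while pending:
        sym = pending.pop(0)
        if sym in defined or sym not in index:
            continue  # already satisfied, or must come from another input
        member = by_name[index[sym]]
        extracted.append(member.name)
        defined |= member.defines
        pending.extend(sorted(member.references - defined))
    return extracted
```

For example, with members `a.o` (defines `foo`, references `bar`), `b.o` (defines `bar`) and `c.o` (defines `baz`), resolving an undefined `foo` extracts `a.o` and then `b.o`, and leaves `c.o` out of the link.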
>>>> >>>> >>> >>
>>>> >>>> >>> >> Thanks,
>>>> >>>> >>> >> Teresa
>>>> >>>> >>> >>
>>>> >>>> >>> >> > David
>>>> >>>> >>> >> >
>>>> >>>> >>> >> >> Alex
>>>> >>>> >>> >> >>
>>>> >>>> >>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson <tejohnson at google.com> wrote:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > I've included below an RFC for implementing ThinLTO in LLVM, looking forward to feedback and questions.
>>>> >>>> >>> >> >> > Thanks!
>>>> >>>> >>> >> >> > Teresa
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > RFC to discuss plans for implementing ThinLTO upstream. Background can be found in slides from EuroLLVM 2015: https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > As described in the talk, we have a prototype implementation, and would like to start staging patches upstream. This RFC describes a breakdown of the major pieces. We would like to commit upstream gradually in several stages, with all functionality off by default. The core ThinLTO importing support and tuning will require frequent change and iteration during testing and tuning, and for that part we would like to commit rapidly (off by default).
>>>> >>>> >>> >> >> > See the proposed staged implementation described in the Implementation Plan section.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > ThinLTO Overview
>>>> >>>> >>> >> >> > ================
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > See the talk slides linked above for more details. The following is a high-level overview of the motivation.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Cross Module Optimization (CMO) is an effective means for improving runtime performance by extending the scope of optimizations across source module boundaries. Without CMO, the compiler is limited to optimizing within the scope of single source modules. Two solutions for enabling CMO are Link-Time Optimization (LTO), which is currently supported in LLVM and GCC, and Lightweight-Interprocedural Optimization (LIPO). However, each of these solutions has limitations that prevent it from being enabled by default. ThinLTO is a new approach that attempts to address these limitations, with a goal of being enabled more broadly. ThinLTO is designed with many of the same principles as LIPO, and therefore its advantages, without any of its inherent weaknesses.
>>>> >>>> >>> >> >> > Unlike in LIPO, where the module group decision is made at profile training runtime, ThinLTO makes the decision at compile time, but in a lazy mode that facilitates large-scale parallelism. The serial linker plugin phase is designed to be razor thin and blazingly fast. By default this step only does minimal preparation work to enable the parallel lazy importing performed later. ThinLTO aims to be scalable like a regular O2 build, enabling CMO on machines without large memory configurations, while also integrating well with distributed build systems. Results from early prototyping on SPEC cpu2006 C++ benchmarks are in line with expectations that ThinLTO can scale like O2 while enabling much of the CMO performed during a full LTO build.
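The overview's key architectural claim is that only the middle step is serial, and that it merely merges small summaries. A toy Python model of that compile / thin-link / parallel-backend split might look like the sketch below. It rests on obvious simplifying assumptions: a "module" is a dictionary of instruction counts and callees, threads stand in for separate compile and backend jobs, and `IMPORT_SIZE_LIMIT` is a made-up import heuristic, not anything from the RFC or LLVM.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Toy stand-in for a module's IR: function name -> (instruction count, callees).
Module = Dict[str, Tuple[int, List[str]]]
# Summary entry: function name -> (defining module, instruction count).
Summary = Dict[str, Tuple[str, int]]

IMPORT_SIZE_LIMIT = 20  # hypothetical import heuristic threshold


def phase1_compile(name: str, module: Module) -> Summary:
    """Per-module -c step: emit a small summary next to the IR."""
    return {fn: (name, size) for fn, (size, _callees) in module.items()}


def phase2_thin_link(summaries: List[Summary]) -> Summary:
    """Serial step: merge summaries only; no IR is loaded or optimized."""
    combined: Summary = {}
    for summary in summaries:
        for fn, entry in summary.items():
            combined.setdefault(fn, entry)
    return combined


def phase3_backend(name: str, module: Module, combined: Summary) -> List[str]:
    """Parallel backend: pick small external callees to import lazily."""
    imports = set()
    for _size, callees in module.values():
        for callee in callees:
            entry = combined.get(callee)
            if entry is None:
                continue  # defined outside the ThinLTO inputs (e.g. libc)
            owner, size = entry
            if owner != name and size <= IMPORT_SIZE_LIMIT:
                imports.add(callee)
    return sorted(imports)


def thin_lto(modules: Dict[str, Module]) -> Dict[str, List[str]]:
    names = list(modules)
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(lambda n: phase1_compile(n, modules[n]), names))
        combined = phase2_thin_link(summaries)
        imports = list(pool.map(lambda n: phase3_backend(n, modules[n], combined), names))
    return dict(zip(names, imports))
```

In this toy, `phase2_thin_link` touches only the summaries, so its cost grows with the number of summary entries rather than with the size of the IR, while all the expensive per-module work stays in the two parallel phases.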
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > A ThinLTO build is divided into 3 phases, which are referred to in the following implementation plan:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > phase-1: IR and Function Summary Generation (-c compile)
>>>> >>>> >>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive linker step)
>>>> >>>> >>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Implementation Plan
>>>> >>>> >>> >> >> > ===================
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > This section gives a high-level breakdown of the ThinLTO support that will be added, in roughly the order that the patches would be staged. The patches are divided into three stages. The first stage contains a minimal amount of preparation work that is not ThinLTO-specific. The second stage contains most of the infrastructure for ThinLTO, which will be off by default. The third stage includes enhancements/improvements/tunings that can be performed after the main ThinLTO infrastructure is in.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > The second and third implementation stages will initially be very volatile, requiring a lot of iterations and tuning with large apps to get stabilized.
>>>> >>>> >>> >> >> > Therefore it will be important to do fast commits for these implementation stages.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > 1. Stage 1: Preparation
>>>> >>>> >>> >> >> > -------------------------------
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > The first planned sets of patches are enablers for ThinLTO work:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > a. LTO directory structure:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Restructure the LTO directory to remove a circular dependence when the ThinLTO pass is added. Because ThinLTO is being implemented as an SCC pass within Transforms/IPO, and leverages the LTOModule class for linking in functions from modules, IPO then requires the LTO library. This creates a circular dependence between LTO and IPO. To break that, we need to split the lib/LTO directory/library into lib/LTO/CodeGen and lib/LTO/Module, containing LTOCodeGenerator and LTOModule, respectively. Only LTOCodeGenerator has a dependence on IPO, removing the circular dependence.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > b. ELF wrapper generation support:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Implement an ELF-wrapped bitcode writer.
>>>> >>>> >>> >> >> > In order to more easily interact with tools such as $AR, $NM, and “$LD -r” we plan to emit the phase-1 bitcode wrapped in ELF via the .llvmbc section, along with a symbol table. The goal is both to interact with these tools without requiring a plugin, and also to avoid doing partial LTO/ThinLTO across files linked with “$LD -r” (i.e. the resulting object file should still contain ELF-wrapped bitcode to enable ThinLTO at the full link step). I will send a separate design document for these changes, but the following is a high-level overview.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Support was added to LLVM for reading ELF-wrapped bitcode (http://reviews.llvm.org/rL218078), but there does not yet exist support in LLVM/Clang for emitting bitcode wrapped in ELF. I plan to add support for optionally generating bitcode in an ELF file containing a single .llvmbc section holding the bitcode. Specifically, the patch would add new options “emit-llvm-bc-elf” (object file) and corresponding “emit-llvm-elf” (textual assembly code equivalent).
>>>> >>>> >>> >> >> > Eventually these would be automatically triggered under “-fthinlto -c” and “-fthinlto -S”, respectively.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Additionally, a symbol table will be generated in the ELF file, holding the function symbols within the bitcode. This facilitates handling archives of the ELF-wrapped bitcode created with $AR, since the archive will have a symbol table as well. The archive symbol table enables gold to extract the constituent ELF-wrapped bitcode files and pass them to the plugin. To support the concatenated .llvmbc section generated by “$LD -r”, some handling needs to be added to gold and to the backend driver to process each original module’s bitcode.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > The function index/summary will later be added as a special ELF section alongside the .llvmbc sections.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > 2. Stage 2: ThinLTO Infrastructure
>>>> >>>> >>> >> >> > ----------------------------------------------
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > The next set of patches adds the base implementation of the ThinLTO infrastructure, specifically those required to make ThinLTO functional and generate correct but not necessarily high-performing binaries.
>>>> >>>> >>> >> >> > It also does not include the changes needed to make debug support under -g efficient with ThinLTO.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > a. Clang/LLVM/gold linker options:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > An early set of clang/llvm patches is needed to provide options to enable ThinLTO (off by default), so that the rest of the implementation can be disabled by default as it is added. Specifically, the clang option -fthinlto (used instead of -flto) will cause clang to invoke the phase-1 emission of LLVM bitcode and function summary/index on a compile step, and pass the appropriate option to the gold plugin on a link step. The -thinlto option will be added to the gold plugin and llvm-lto tool to launch the phase-2 thin archive step. The -thinlto option will also be added to the ‘opt’ tool to invoke it as a phase-3 parallel backend instance.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > b. Thin-archive linking support in Gold plugin and llvm-lto:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Under the new plugin option (see above), the plugin needs to perform the phase-2 (thin archive) link, which simply emits a combined function map from the linked modules without actually performing the normal link.
>>>> >>>> >>> >> >> > Corresponding support should be added to the standalone llvm-lto tool to enable testing/debugging without involving the linker and plugin.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > c. ThinLTO backend support:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Support for a phase-3 backend invocation (including importing) on a module should be added to the ‘opt’ tool under the new option. The main changes under the option are to instantiate a Linker object used to manage the process of linking imported functions into the module, to efficiently read the combined function map, and to enable the ThinLTO import pass.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > d. Function index/summary support:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > This includes infrastructure for writing and reading the function index/summary section. As noted earlier, this will be encoded in a special ELF section within the module, alongside the .llvmbc section containing the bitcode. The thin archive generated by phase-2 of ThinLTO simply contains all of the function index/summary sections across the linked modules, organized for efficient function lookup.
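Because this section is written at compile time and then read by both the thin link and the backends, a concrete byte layout helps make the idea tangible. The Python sketch below is only an illustration. The layout (a version id and entry count, then for each function a length-prefixed name, the function's bitcode offset and its instruction count) is an assumption chosen to mirror the per-entry contents the RFC lists; it is not the format LLVM uses. `SUMMARY_VERSION`, `FunctionEntry`, `write_summary`, `read_summary` and `combine` are hypothetical names, and `combine` models the phase-2 merge into a map keyed by function name.

```python
import struct
from typing import Dict, List, NamedTuple, Tuple

SUMMARY_VERSION = 1             # hypothetical version id; bump when the layout changes
HEADER = struct.Struct("<II")   # version, entry count
FIXED = struct.Struct("<QI")    # bitcode offset, instruction count


class FunctionEntry(NamedTuple):
    name: str
    bitcode_offset: int  # where the function's body starts in the module bitcode
    instr_count: int     # example summary field for import heuristics


def write_summary(entries: List[FunctionEntry]) -> bytes:
    """Serialize one module's function index/summary section."""
    out = bytearray(HEADER.pack(SUMMARY_VERSION, len(entries)))
    for e in entries:
        raw = e.name.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw
        out += FIXED.pack(e.bitcode_offset, e.instr_count)
    return bytes(out)


def read_summary(data: bytes) -> List[FunctionEntry]:
    """Parse a section produced by write_summary, rejecting other versions."""
    version, count = HEADER.unpack_from(data, 0)
    if version != SUMMARY_VERSION:
        raise ValueError(f"unsupported summary version {version}")
    pos = HEADER.size
    entries = []
    for _ in range(count):
        (name_len,) = struct.unpack_from("<H", data, pos)
        pos += 2
        name = data[pos:pos + name_len].decode("utf-8")
        pos += name_len
        offset, instrs = FIXED.unpack_from(data, pos)
        pos += FIXED.size
        entries.append(FunctionEntry(name, offset, instrs))
    return entries


def combine(sections: Dict[str, bytes]) -> Dict[str, Tuple[str, FunctionEntry]]:
    """Phase-2 style merge into a name-keyed map for constant-time lookup."""
    combined: Dict[str, Tuple[str, FunctionEntry]] = {}
    for module_path, data in sections.items():
        for entry in read_summary(data):
            # Keep the first definition seen, e.g. one copy of a comdat function.
            combined.setdefault(entry.name, (module_path, entry))
    return combined
```

Rejecting unknown versions outright is the simplest form of the version checking described in the RFC. A real format could instead skip unknown trailing fields so that older readers stay compatible.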
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Each function available for importing from the module has an entry in the module’s function index/summary section and in the resulting combined function map. Each function entry contains that function’s offset within the bitcode file, used to efficiently locate and quickly import just that function. The entry also contains summary information (e.g. basic information determined during parsing, such as the number of instructions in the function), which will be used to help guide later import decisions. Because the contents of this section will change frequently during ThinLTO tuning, it should also be marked with a version id for backwards compatibility or version checking.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > e. ThinLTO importing support:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Support for the mechanics of importing functions from other modules, which can go in gradually as a set of patches since it will be off by default.
>>>> >>>> >>> >> >> > Separate patches can include:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > - BitcodeReader changes to use the function index to import/deserialize a single function of interest (small changes, leverages existing lazy streamer support).
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > - Minor LTOModule changes to pass the ThinLTO function to import and its index into the bitcode reader.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > - Marking of imported functions (for use in ThinLTO-specific symbol linking and global DCE, for example). This can be in-memory initially, but IR support may be required in order to support streaming bitcode out and back in again after importing.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol linking and static promotion when necessary. The linkage type of imported functions changes to AvailableExternallyLinkage, for example. Statics must be promoted in certain cases, and renamed in consistent ways.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > - GlobalDCE changes to support removing imported functions that were not inlined (very small changes to existing pass logic).
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > f. ThinLTO Import Driver SCC pass:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with a framework for doing ThinLTO via an SCC pass, enabled only under -fthinlto options. The pass includes utilizing the thin archive (global function index/summary), import decision heuristics, invocation of LTOModule/ModuleLinker routines that perform the import, and any necessary callgraph updates and verification.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > g. Backend Driver:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > For a single node build, the gold plugin can simply write a makefile and fork the parallel backend instances directly via parallel make.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements
>>>> >>>> >>> >> >> > ----------------------------------------------------------------
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > This refers to the patches that are not required for ThinLTO to work, but rather improve compile time, memory, run-time performance and usability.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > a. Lazy Debug Metadata Linking:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > The prototype implementation included lazy importing of module-level metadata during the ThinLTO pass finalization (i.e. after all function importing is complete). This actually applies to all module-level metadata, not just debug metadata, although debug metadata is the largest. This can be added as a separate set of patches, with changes to BitcodeReader, ValueMapper, and ModuleLinker.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > b. Import Tuning:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Tuning the import strategy will be an iterative process that will continue to be refined over time. It involves several different types of changes: adding support for recording additional metrics in the function summary, such as profile data and optional heavier-weight IPA analyses, and tuning the import heuristics based on the summary and callsite context.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > c. Combined Function Map Pruning:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > The combined function map can be pruned of functions that are unlikely to benefit from being imported. For example, during the phase-2 thin archive plugin step we can safely omit large and (with profile data) cold functions, which are unlikely to benefit from being inlined.
>>>> >>>> >>> >> >> > Additionally, all but one copy of comdat functions can >>>> be >>>> >>>> >>> >> >> > suppressed. >>>> >>>> >>> >> >> > >>>> >>>> >>> >> >> > >>>> >>>> >>> >> >> > d. Distributed Build System Integration: >>>> >>>> >>> >> >> > >>>> >>>> >>> >> >> > For a distributed build system, the gold plugin should >>>> write >>>> >>>> >>> >> >> > the >>>> >>>> >>> >> >> > parallel backend invocations into a makefile, >>>> including the >>>> >>>> >>> >> >> > mapping >>>> >>>> >>> >> >> > from the IR file to the real object file path, and >>>> exit. >>>> >>>> >>> >> >> > Additional >>>> >>>> >>> >> >> > work needs to be done in the distributed build system >>>> itself >>>> >>>> >>> >> >> > to >>>> >>>> >>> >> >> > distribute and dispatch the parallel backend jobs to >>>> the >>>> >>>> >>> >> >> > build >>>> >>>> >>> >> >> > cluster. >>>> >>>> >>> >> >> > >>>> >>>> >>> >> >> > >>>> >>>> >>> >> >> > e. Dependence Tracking and Incremental Compiles: >>>> >>>> >>> >> >> > >>>> >>>> >>> >> >> > In order to support build systems that stage from local >>>> >>>> >>> >> >> > disks or >>>> >>>> >>> >> >> > network storage, the plugin will optionally support >>>> >>>> >>> >> >> > computation >>>> >>>> >>> >> >> > of >>>> >>>> >>> >> >> > dependent sets of IR files that each module may import >>>> from. >>>> >>>> >>> >> >> > This >>>> >>>> >>> >> >> > can >>>> >>>> >>> >> >> > be computed from profile data, if it exists, or from >>>> the >>>> >>>> >>> >> >> > symbol >>>> >>>> >>> >> >> > table >>>> >>>> >>> >> >> > and heuristics if not. These dependence sets also >>>> enable >>>> >>>> >>> >> >> > support >>>> >>>> >>> >> >> > for >>>> >>>> >>> >> >> > incremental backend compiles. 
>
> --
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150515/1842c0bc/attachment.html>
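The ModuleLinker item in Stage 2 (e) of the RFC quoted above says imported functions become AvailableExternallyLinkage and that statics must be promoted and renamed "in consistent ways". The Python sketch below models only that bookkeeping, under stated assumptions: the `GlobalFn` record, the `external`/`internal`/`available_externally` linkage strings and the `<name>.llvm.<hash>` naming scheme are hypothetical illustrations, not LLVM's API. What it shows is that the promoted name must be a pure function of the defining module, so the exporting module and every importing module agree on it.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GlobalFn:
    """Hypothetical stand-in for a function symbol in a module."""
    name: str
    linkage: str  # "external", "internal" or "available_externally"
    module: str   # path/ID of the module that defines the function


def promoted_name(fn: GlobalFn) -> str:
    # Assumed scheme: suffix the local name with a short hash of the
    # defining module's path, so two modules' statics named "helper"
    # cannot collide once both are promoted.
    digest = hashlib.sha1(fn.module.encode("utf-8")).hexdigest()[:8]
    return f"{fn.name}.llvm.{digest}"


def promote_local(fn: GlobalFn) -> GlobalFn:
    """Promote an internal (static) symbol so other modules can reference it."""
    if fn.linkage != "internal":
        return fn
    return replace(fn, name=promoted_name(fn), linkage="external")


def import_copy(fn: GlobalFn) -> GlobalFn:
    """Linkage of the copy placed in the importing module.

    available_externally lets the importer inline the body, while the
    real definition is still emitted only by the defining module.
    """
    return replace(promote_local(fn), linkage="available_externally")
```

For example, if `a.cc` imports a function from `b.cc` that calls a static `helper` in `b.cc`, then `b.cc` emits `promote_local(helper)` and `a.cc` refers to the same promoted name, because both derive it from `b.cc` alone.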
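Stage 2 (f) and Stage 3 (b) of the quoted RFC mention import decision heuristics driven by the function summary and callsite context. A minimal sketch of one such heuristic follows. The `Summary`/`CallSite` records, the size metric and the thresholds (`size_limit`, `hot_factor`, `hot_count`) are illustrative assumptions, not the heuristics of the RFC's prototype. It imports a callee defined in another module when its size fits a budget, and relaxes the budget for callees that profile data marks as hot.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Summary:
    """Hypothetical per-function entry from the combined index."""
    module: str        # module that defines the function
    size: int          # assumed size metric, e.g. instruction count
    hot: bool = False  # set only when profile data is available


@dataclass(frozen=True)
class CallSite:
    caller_module: str
    callee: str
    count: Optional[int] = None  # profile count for this call, if known


def choose_imports(calls: Iterable[CallSite],
                   index: Dict[str, Summary],
                   size_limit: int = 100,
                   hot_factor: int = 3,
                   hot_count: int = 1000) -> Dict[str, str]:
    """Return {callee name: defining module} for functions worth importing."""
    imports: Dict[str, str] = {}
    for cs in calls:
        summary = index.get(cs.callee)
        # Skip callees with no summary (e.g. library calls) and callees
        # that are already defined in the caller's own module.
        if summary is None or summary.module == cs.caller_module:
            continue
        budget = size_limit
        if summary.hot or (cs.count is not None and cs.count >= hot_count):
            budget *= hot_factor  # hot callees may be larger
        if summary.size <= budget:
            imports[cs.callee] = summary.module
    return imports
```

Keeping the thresholds as parameters reflects the RFC's point that import tuning is expected to be iterative.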
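Stage 2 (g) and Stage 3 (d) of the quoted RFC have the gold plugin write the parallel backend invocations into a makefile that maps each IR file to its real object file path. Below is a small Python sketch of generating such a makefile. The backend command is a placeholder supplied by the caller (the actual phase-3 invocation is not specified here), the optional `extra_deps` hook is an assumption that anticipates the dependence sets of Stage 3 (e), and paths containing whitespace are rejected rather than escaped.

```python
from typing import Dict, List, Optional, Tuple


def backend_makefile(jobs: List[Tuple[str, str]],
                     backend: str,
                     extra_deps: Optional[Dict[str, List[str]]] = None) -> str:
    """Render a makefile with one rule per (IR file, object file) pair.

    `backend` is a placeholder command; make's automatic variables pass
    the IR input ($<) and the object output ($@) to it.
    """
    extra_deps = extra_deps or {}
    paths = [p for job in jobs for p in job]
    paths += [d for deps in extra_deps.values() for d in deps]
    if any(ch.isspace() for p in paths for ch in p):
        raise ValueError("paths containing whitespace are not supported")

    lines = [".PHONY: all", "all: " + " ".join(obj for _, obj in jobs), ""]
    for ir, obj in jobs:
        # $< is the first prerequisite, so the module's own IR file goes
        # first; extra prerequisites (IR files it may import from) only
        # trigger rebuilds when they change.
        prereqs = [ir] + sorted(d for d in extra_deps.get(ir, []) if d != ir)
        lines.append(f"{obj}: {' '.join(prereqs)}")
        lines.append(f"\t{backend} $< -o $@")
        lines.append("")
    return "\n".join(lines)
```

On a single node the result could be run with `make -j8 -f <file>`; for a distributed build, the build system would instead read the rules and dispatch each one to the cluster, as the RFC describes.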
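Stage 3 (c) of the quoted RFC prunes the combined function map during the phase-2 step: large functions, cold functions (only when profile data exists), and all but one copy of each comdat function are dropped. The sketch below applies those three filters. The `FnEntry` record and the `max_size`/`cold_count` thresholds are assumptions chosen for illustration, and keeping the first surviving copy of each comdat group is one arbitrary but deterministic choice.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class FnEntry:
    """Hypothetical entry in the combined function map."""
    name: str
    module: str
    size: int
    count: Optional[int] = None   # profile entry count; None without profile data
    comdat: Optional[str] = None  # comdat group key, if the function is in one


def prune_function_map(entries: Iterable[FnEntry],
                       max_size: int = 500,
                       cold_count: int = 10) -> List[FnEntry]:
    """Drop entries that are unlikely to benefit from being imported."""
    kept: List[FnEntry] = []
    seen_comdats: Set[str] = set()
    for e in entries:
        if e.size > max_size:
            continue  # too large to be a good inlining candidate
        if e.count is not None and e.count < cold_count:
            continue  # cold according to profile data
        if e.comdat is not None:
            if e.comdat in seen_comdats:
                continue  # another copy of this comdat is already kept
            seen_comdats.add(e.comdat)
        kept.append(e)
    return kept
```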
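Stage 3 (e) of the quoted RFC computes, for each module, the set of IR files it may import from (using profile data, or the symbol table plus heuristics) and reuses those sets for incremental backend compiles. A sketch of the symbol-table variant follows, with its assumptions flagged: `refs` is taken to list each module's referenced symbols, `defined_in` maps each symbol to its defining IR file, and the sets are closed transitively as a conservative over-approximation, since imported code may itself reference functions in further modules. A module's backend is rerun when its own IR or any IR in its set changes.

```python
from typing import Dict, Set


def direct_deps(refs: Dict[str, Set[str]],
                defined_in: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each module to the other IR files defining symbols it references."""
    return {
        mod: {defined_in[s] for s in syms
              if s in defined_in and defined_in[s] != mod}
        for mod, syms in refs.items()
    }


def transitive_deps(deps: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Conservative closure: code imported from B may pull in code from C."""
    closed: Dict[str, Set[str]] = {}
    for mod in deps:
        seen: Set[str] = set()
        stack = list(deps[mod])
        while stack:
            dep = stack.pop()
            if dep == mod or dep in seen:
                continue
            seen.add(dep)
            stack.extend(deps.get(dep, ()))
        closed[mod] = seen
    return closed


def backends_to_rerun(deps: Dict[str, Set[str]], changed: Set[str]) -> Set[str]:
    """Modules whose own IR, or any IR they may import from, changed."""
    return {mod for mod, ds in deps.items() if mod in changed or ds & changed}
```

The same sets could also serve the staging use case in the RFC: a build system would copy only a module's IR plus its dependence set to the machine running that backend.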
On Fri, May 15, 2015 at 8:26 AM, Xinliang David Li <xinliangli at gmail.com> wrote:

> There is no need for emitting the full symtab. I checked the overhead with
> a huge internal C++ source. The overhead of symtab + str table compared
> with byte code with debug is about 3%.
>
> More importantly, it is also possible to use the symtab also for
> index/summary purpose, which makes the space usage completely 'unwasted'.
> That gets into the details which will follow when patches are in.

That direction leans even more heavily on this model, though. Keeping all the LLVM stuff (including summary info) in the IR means that on platforms with bitcode-aware tools (like, by the sounds of it, OS X, where ld is bitcode-aware) we can support a nice bitcode-only solution. Wrapping that in native object files for backwards compatibility with a few tools seems OK, but the more features we build on top of that foundation, the harder it is to get out of that business when/where the backwards compatibility isn't needed.

Also, keeping the wrapping as a separate backwards-compatibility layer would, I imagine, ease testing by making more parts testable without the added complexity of the wrapping.

It'd be useful to see the sorts of build system scenarios that use these native object tools, so we can look at what we can and can't reasonably support.

(I assume tools don't generally expect a symtab whose symbols aren't actually in the .text section. I don't know whether, or how, some of these tools might do the wrong thing when presented with such info. This is all outside my depth/area, so don't worry about explaining it to me, but it seems other people care about what we're supporting here, at least.)
> There is no need for emitting the full symtab. I checked the overhead
> with a huge internal C++ source. The overhead of symtab + str table
> compared with byte code with debug is about 3%.

It's still sizable, and it could be noticeable if ThinLTO can deliver compile times close to those of builds without LTO, as your results suggest.

> More importantly, it is also possible to use the symtab also for
> index/summary purpose, which makes the space usage completely 'unwasted'.
> That gets into the details which will follow when patches are in.

Wouldn't there then be symbol information in both the native object symbol table and the bitcode file? Isn't that a waste?

I understand the reasons for using the native object wrapper (compatibility with other tools) and I'm happy with that. But I'd also like to see an option for the function index/summary data to be produced without the wrapper, so that bitcode-aware tools do not need to use the wrapped format. If the native object wrapper's symbol information is mixed with the function index/summary data, that would become impossible.

Also, won't keeping the native object data together with the function index/summary add a testing cost for every supported native object format?
>>>>> >>>> >>> >> >>>>> >>>> >>> >> Thanks, >>>>> >>>> >>> >> Teresa >>>>> >>>> >>> >> >>>>> >>>> >>> >> > >>>>> >>>> >>> >> > David >>>>> >>>> >>> >> > >>>>> >>>> >>> >> >> >>>>> >>>> >>> >> >> >>>>> >>>> >>> >> >> Alex >>>>> >>>> >>> >> >> >>>>> >>>> >>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson >>>>> >>>> >>> >> >> > <tejohnson at google.com> >>>>> >>>> >>> >> >> > wrote: >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > I've included below an RFC for implementing ThinLTO >>>>> in LLVM, >>>>> >>>> >>> >> >> > looking >>>>> >>>> >>> >> >> > forward to feedback and questions. >>>>> >>>> >>> >> >> > Thanks! >>>>> >>>> >>> >> >> > Teresa >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > RFC to discuss plans for implementing ThinLTO >>>>> upstream. >>>>> >>>> >>> >> >> > Background >>>>> >>>> >>> >> >> > can >>>>> >>>> >>> >> >> > be found in slides from EuroLLVM 2015: >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0 >>>>> ) >>>>> >>>> >>> >> >> > As described in the talk, we have a prototype >>>>> >>>> >>> >> >> > implementation, and >>>>> >>>> >>> >> >> > would like to start staging patches upstream. This RFC >>>>> >>>> >>> >> >> > describes >>>>> >>>> >>> >> >> > a >>>>> >>>> >>> >> >> > breakdown of the major pieces. We would like to commit >>>>> >>>> >>> >> >> > upstream >>>>> >>>> >>> >> >> > gradually in several stages, with all functionality >>>>> off by >>>>> >>>> >>> >> >> > default. >>>>> >>>> >>> >> >> > The core ThinLTO importing support and tuning will >>>>> require >>>>> >>>> >>> >> >> > frequent >>>>> >>>> >>> >> >> > change and iteration during testing and tuning, and >>>>> for that >>>>> >>>> >>> >> >> > part >>>>> >>>> >>> >> >> > we >>>>> >>>> >>> >> >> > would like to commit rapidly (off by default). 
See the >>>>> >>>> >>> >> >> > proposed >>>>> >>>> >>> >> >> > staged >>>>> >>>> >>> >> >> > implementation described in the Implementation Plan >>>>> section. >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > ThinLTO Overview >>>>> >>>> >>> >> >> > =============>>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > See the talk slides linked above for more details. The >>>>> >>>> >>> >> >> > following >>>>> >>>> >>> >> >> > is a >>>>> >>>> >>> >> >> > high-level overview of the motivation. >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > Cross Module Optimization (CMO) is an effective means >>>>> for >>>>> >>>> >>> >> >> > improving >>>>> >>>> >>> >> >> > runtime performance, by extending the scope of >>>>> optimizations >>>>> >>>> >>> >> >> > across >>>>> >>>> >>> >> >> > source module boundaries. Without CMO, the compiler is >>>>> >>>> >>> >> >> > limited to >>>>> >>>> >>> >> >> > optimizing within the scope of single source modules. >>>>> Two >>>>> >>>> >>> >> >> > solutions >>>>> >>>> >>> >> >> > for enabling CMO are Link-Time Optimization (LTO), >>>>> which is >>>>> >>>> >>> >> >> > currently >>>>> >>>> >>> >> >> > supported in LLVM and GCC, and >>>>> Lightweight-Interprocedural >>>>> >>>> >>> >> >> > Optimization (LIPO). However, each of these solutions >>>>> has >>>>> >>>> >>> >> >> > limitations >>>>> >>>> >>> >> >> > that prevent it from being enabled by default. >>>>> ThinLTO is a >>>>> >>>> >>> >> >> > new >>>>> >>>> >>> >> >> > approach that attempts to address these limitations, >>>>> with a >>>>> >>>> >>> >> >> > goal >>>>> >>>> >>> >> >> > of >>>>> >>>> >>> >> >> > being enabled more broadly. ThinLTO is designed with >>>>> many of >>>>> >>>> >>> >> >> > the >>>>> >>>> >>> >> >> > same >>>>> >>>> >>> >> >> > principals as LIPO, and therefore its advantages, >>>>> without >>>>> >>>> >>> >> >> > any of >>>>> >>>> >>> >> >> > its >>>>> >>>> >>> >> >> > inherent weakness. 
Unlike in LIPO where the module >>>>> group >>>>> >>>> >>> >> >> > decision >>>>> >>>> >>> >> >> > is >>>>> >>>> >>> >> >> > made at profile training runtime, ThinLTO makes the >>>>> decision >>>>> >>>> >>> >> >> > at >>>>> >>>> >>> >> >> > compile time, but in a lazy mode that facilitates >>>>> large >>>>> >>>> >>> >> >> > scale >>>>> >>>> >>> >> >> > parallelism. The serial linker plugin phase is >>>>> designed to >>>>> >>>> >>> >> >> > be >>>>> >>>> >>> >> >> > razor >>>>> >>>> >>> >> >> > thin and blazingly fast. By default this step only >>>>> does >>>>> >>>> >>> >> >> > minimal >>>>> >>>> >>> >> >> > preparation work to enable the parallel lazy importing >>>>> >>>> >>> >> >> > performed >>>>> >>>> >>> >> >> > later. ThinLTO aims to be scalable like a regular O2 >>>>> build, >>>>> >>>> >>> >> >> > enabling >>>>> >>>> >>> >> >> > CMO on machines without large memory configurations, >>>>> while >>>>> >>>> >>> >> >> > also >>>>> >>>> >>> >> >> > integrating well with distributed build systems. >>>>> Results >>>>> >>>> >>> >> >> > from >>>>> >>>> >>> >> >> > early >>>>> >>>> >>> >> >> > prototyping on SPEC cpu2006 C++ benchmarks are in >>>>> line with >>>>> >>>> >>> >> >> > expectations that ThinLTO can scale like O2 while >>>>> enabling >>>>> >>>> >>> >> >> > much >>>>> >>>> >>> >> >> > of >>>>> >>>> >>> >> >> > the >>>>> >>>> >>> >> >> > CMO performed during a full LTO build. 
>>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > A ThinLTO build is divided into 3 phases, which are >>>>> referred >>>>> >>>> >>> >> >> > to >>>>> >>>> >>> >> >> > in >>>>> >>>> >>> >> >> > the >>>>> >>>> >>> >> >> > following implementation plan: >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > phase-1: IR and Function Summary Generation (-c >>>>> compile) >>>>> >>>> >>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive >>>>> linker step) >>>>> >>>> >>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > Implementation Plan >>>>> >>>> >>> >> >> > ===============>>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > This section gives a high-level breakdown of the >>>>> ThinLTO >>>>> >>>> >>> >> >> > support >>>>> >>>> >>> >> >> > that >>>>> >>>> >>> >> >> > will be added, in roughly the order that the patches >>>>> would >>>>> >>>> >>> >> >> > be >>>>> >>>> >>> >> >> > staged. >>>>> >>>> >>> >> >> > The patches are divided into three stages. The first >>>>> stage >>>>> >>>> >>> >> >> > contains a >>>>> >>>> >>> >> >> > minimal amount of preparation work that is not >>>>> >>>> >>> >> >> > ThinLTO-specific. >>>>> >>>> >>> >> >> > The >>>>> >>>> >>> >> >> > second stage contains most of the infrastructure for >>>>> >>>> >>> >> >> > ThinLTO, >>>>> >>>> >>> >> >> > which >>>>> >>>> >>> >> >> > will be off by default. The third stage includes >>>>> >>>> >>> >> >> > enhancements/improvements/tunings that can be >>>>> performed >>>>> >>>> >>> >> >> > after the >>>>> >>>> >>> >> >> > main >>>>> >>>> >>> >> >> > ThinLTO infrastructure is in. >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > The second and third implementation stages will >>>>> initially be >>>>> >>>> >>> >> >> > very >>>>> >>>> >>> >> >> > volatile, requiring a lot of iterations and tuning >>>>> with >>>>> >>>> >>> >> >> > large >>>>> >>>> >>> >> >> > apps to >>>>> >>>> >>> >> >> > get stabilized. 
Therefore it will be important to do >>>>> fast >>>>> >>>> >>> >> >> > commits >>>>> >>>> >>> >> >> > for >>>>> >>>> >>> >> >> > these implementation stages. >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > 1. Stage 1: Preparation >>>>> >>>> >>> >> >> > ------------------------------- >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > The first planned sets of patches are enablers for >>>>> ThinLTO >>>>> >>>> >>> >> >> > work: >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > a. LTO directory structure: >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > Restructure the LTO directory to remove circular >>>>> dependence >>>>> >>>> >>> >> >> > when >>>>> >>>> >>> >> >> > ThinLTO pass added. Because ThinLTO is being >>>>> implemented as >>>>> >>>> >>> >> >> > a SCC >>>>> >>>> >>> >> >> > pass >>>>> >>>> >>> >> >> > within Transforms/IPO, and leverages the LTOModule >>>>> class for >>>>> >>>> >>> >> >> > linking >>>>> >>>> >>> >> >> > in functions from modules, IPO then requires the LTO >>>>> >>>> >>> >> >> > library. >>>>> >>>> >>> >> >> > This >>>>> >>>> >>> >> >> > creates a circular dependence between LTO and IPO. To >>>>> break >>>>> >>>> >>> >> >> > that, >>>>> >>>> >>> >> >> > we >>>>> >>>> >>> >> >> > need to split the lib/LTO directory/library into >>>>> >>>> >>> >> >> > lib/LTO/CodeGen >>>>> >>>> >>> >> >> > and >>>>> >>>> >>> >> >> > lib/LTO/Module, containing LTOCodeGenerator and >>>>> LTOModule, >>>>> >>>> >>> >> >> > respectively. Only LTOCodeGenerator has a dependence >>>>> on IPO, >>>>> >>>> >>> >> >> > removing >>>>> >>>> >>> >> >> > the circular dependence. >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > b. ELF wrapper generation support: >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > Implement ELF wrapped bitcode writer. 
In order to more >>>>> >>>> >>> >> >> > easily >>>>> >>>> >>> >> >> > interact >>>>> >>>> >>> >> >> > with tools such as $AR, $NM, and “$LD -r” we plan to >>>>> emit >>>>> >>>> >>> >> >> > the >>>>> >>>> >>> >> >> > phase-1 >>>>> >>>> >>> >> >> > bitcode wrapped in ELF via the .llvmbc section, along >>>>> with a >>>>> >>>> >>> >> >> > symbol >>>>> >>>> >>> >> >> > table. The goal is both to interact with these tools >>>>> without >>>>> >>>> >>> >> >> > requiring >>>>> >>>> >>> >> >> > a plugin, and also to avoid doing partial LTO/ThinLTO >>>>> across >>>>> >>>> >>> >> >> > files >>>>> >>>> >>> >> >> > linked with “$LD -r” (i.e. the resulting object file >>>>> should >>>>> >>>> >>> >> >> > still >>>>> >>>> >>> >> >> > contain ELF-wrapped bitcode to enable ThinLTO at the >>>>> full >>>>> >>>> >>> >> >> > link >>>>> >>>> >>> >> >> > step). >>>>> >>>> >>> >> >> > I will send a separate design document for these >>>>> changes, >>>>> >>>> >>> >> >> > but the >>>>> >>>> >>> >> >> > following is a high-level overview. >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > Support was added to LLVM for reading ELF-wrapped >>>>> bitcode >>>>> >>>> >>> >> >> > (http://reviews.llvm.org/rL218078), but there does >>>>> not yet >>>>> >>>> >>> >> >> > exist >>>>> >>>> >>> >> >> > support in LLVM/Clang for emitting bitcode wrapped in >>>>> ELF. I >>>>> >>>> >>> >> >> > plan >>>>> >>>> >>> >> >> > to >>>>> >>>> >>> >> >> > add support for optionally generating bitcode in an >>>>> ELF file >>>>> >>>> >>> >> >> > containing a single .llvmbc section holding the >>>>> bitcode. >>>>> >>>> >>> >> >> > Specifically, >>>>> >>>> >>> >> >> > the patch would add new options “emit-llvm-bc-elf” >>>>> (object >>>>> >>>> >>> >> >> > file) >>>>> >>>> >>> >> >> > and >>>>> >>>> >>> >> >> > corresponding “emit-llvm-elf” (textual assembly code >>>>> >>>> >>> >> >> > equivalent). 
> Eventually these would be automatically triggered under “-fthinlto -c” and “-fthinlto -S”, respectively.
>
> Additionally, a symbol table will be generated in the ELF file, holding the function symbols within the bitcode. This facilitates handling archives of the ELF-wrapped bitcode created with $AR, since the archive will have a symbol table as well. The archive symbol table enables gold to extract the constituent ELF-wrapped bitcode files and pass them to the plugin. To support the concatenated .llvmbc section generated by “$LD -r”, some handling needs to be added to gold and to the backend driver to process each original module’s bitcode.
>
> The function index/summary will later be added as a special ELF section alongside the .llvmbc sections.
>
> 2. Stage 2: ThinLTO Infrastructure
> ----------------------------------------------
>
> The next set of patches adds the base implementation of the ThinLTO infrastructure, specifically the pieces required to make ThinLTO functional and generate correct but not necessarily high-performing binaries. It also does not include the support needed to make debug information (-g) handling efficient with ThinLTO.
>
> a. Clang/LLVM/gold linker options:
>
> An early set of clang/llvm patches is needed to provide options to enable ThinLTO (off by default), so that the rest of the implementation can be disabled by default as it is added. Specifically, the clang option -fthinlto (used instead of -flto) will cause clang to invoke the phase-1 emission of LLVM bitcode and function summary/index on a compile step, and pass the appropriate option to the gold plugin on a link step. The -thinlto option will be added to the gold plugin and llvm-lto tool to launch the phase-2 thin archive step. The -thinlto option will also be added to the ‘opt’ tool to invoke it as a phase-3 parallel backend instance.
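As a toy model of the three phases named above, the sketch below assumes a per-module function index whose entries carry the fields described in item d (the function's offset in the bitcode file plus a small summary such as its instruction count). Phase 2 only merges these indexes into a combined function map without touching any IR; a phase-3 backend then consults that map to pick functions to import. All type and function names here are hypothetical, and the "import if small enough" rule is a placeholder heuristic, not the one the RFC proposes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical entry in a phase-1 function index/summary (field names are
// illustrative only, not an actual LLVM on-disk format).
struct FunctionEntry {
  std::string module;    // IR file that defines the function
  std::uint64_t offset;  // offset of the function within that bitcode file
  unsigned numInsts;     // summary info used to guide import decisions
};

// Function name -> entry. Phase 1 produces one of these per module.
using FunctionIndex = std::map<std::string, FunctionEntry>;

// Phase 2 (thin archive step): merge the per-module indexes into a combined
// function map. No IR is parsed or linked here. If a name is defined in
// several modules (e.g. comdat copies), the first copy seen is kept.
FunctionIndex combineIndexes(const std::vector<FunctionIndex>& perModule) {
  FunctionIndex combined;
  for (const FunctionIndex& index : perModule)
    for (const auto& kv : index)
      combined.emplace(kv.first, kv.second);  // no-op if the name exists
  return combined;
}

// Phase 3 (one parallel backend per module): choose which callees to import
// into `module`, using only the combined map. Placeholder heuristic: import
// any function defined elsewhere whose instruction count is <= maxInsts.
std::vector<std::string> selectImports(const std::string& module,
                                       const std::vector<std::string>& callees,
                                       const FunctionIndex& combined,
                                       unsigned maxInsts) {
  assert(!module.empty() && "caller must name the importing module");
  std::vector<std::string> imports;
  for (const std::string& callee : callees) {
    auto it = combined.find(callee);
    if (it == combined.end()) continue;    // not indexed (e.g. a libc call)
    const FunctionEntry& entry = it->second;
    if (entry.module == module) continue;  // already defined locally
    if (entry.numInsts <= maxInsts) imports.push_back(callee);
  }
  return imports;
}
```

In a real backend the chosen entry's offset would let the bitcode reader deserialize just that function; the sketch stops at the import decision.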
>
> b. Thin-archive linking support in Gold plugin and llvm-lto:
>
> Under the new plugin option (see above), the plugin needs to perform the phase-2 (thin archive) link, which simply emits a combined function map from the linked modules without actually performing the normal link. Corresponding support should be added to the standalone llvm-lto tool to enable testing/debugging without involving the linker and plugin.
>
> c. ThinLTO backend support:
>
> Support for a phase-3 backend invocation (including importing) on a module should be added to the ‘opt’ tool under the new option. The main changes under the option are to instantiate a Linker object used to manage the process of linking imported functions into the module, to read the combined function map efficiently, and to enable the ThinLTO import pass.
>
> d. Function index/summary support:
>
> This includes infrastructure for writing and reading the function index/summary section.
> As noted earlier, this will be encoded in a special ELF section within the module, alongside the .llvmbc section containing the bitcode. The thin archive generated by phase-2 of ThinLTO simply contains all of the function index/summary sections across the linked modules, organized for efficient function lookup.
>
> Each function available for importing from the module has an entry in the module’s function index/summary section and in the resulting combined function map. Each function entry contains that function’s offset within the bitcode file, used to efficiently locate and quickly import just that function. The entry also contains summary information (e.g. basic information determined during parsing, such as the number of instructions in the function), which will be used to help guide later import decisions. Because the contents of this section will change frequently during ThinLTO tuning, it should also be marked with a version id for backwards compatibility or version checking.
>
> e. ThinLTO importing support:
>
> Support for the mechanics of importing functions from other modules can go in gradually as a set of patches, since it will be off by default. Separate patches can include:
>
> - BitcodeReader changes to use the function index to import/deserialize a single function of interest (small changes, leverages existing lazy streamer support).
>
> - Minor LTOModule changes to pass the ThinLTO function to import and its index into the bitcode reader.
>
> - Marking of imported functions (for use in ThinLTO-specific symbol linking and global DCE, for example). This can be in-memory initially, but IR support may be required in order to support streaming bitcode out and back in again after importing.
>
> - ModuleLinker changes to do ThinLTO-specific symbol linking and static promotion when necessary. The linkage type of imported functions changes to AvailableExternallyLinkage, for example. Statics must be promoted in certain cases, and renamed in consistent ways.
>
> - GlobalDCE changes to support removing imported functions that were not inlined (very small changes to existing pass logic).
>
> f. ThinLTO Import Driver SCC pass:
>
> Adds Transforms/IPO/ThinLTO.cpp with a framework for doing ThinLTO via an SCC pass, enabled only under -fthinlto options. The pass covers use of the thin archive (global function index/summary), import decision heuristics, invocation of the LTOModule/ModuleLinker routines that perform the import, and any necessary callgraph updates and verification.
>
> g. Backend Driver:
>
> For a single node build, the gold plugin can simply write a makefile and fork the parallel backend instances directly via parallel make.
>
> 3. Stage 3: ThinLTO Tuning and Enhancements
> ----------------------------------------------------------------
>
> This refers to the patches that are not required for ThinLTO to work, but rather improve compile time, memory, run-time performance and usability.
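Returning briefly to item g of Stage 2: the single-node backend driver (write a makefile, then run the backends via parallel make) can be sketched as below. The same makefile text, with its mapping from each IR file to the real object path, is roughly what item d of Stage 3 would hand to a distributed build system. The `BackendJob` struct, `writeBackendMakefile`, and the `backendCmd` string are hypothetical; the sketch assumes the backend command takes its input positionally and its output via `-o`, as `opt` and `llc` do, and leaves passing the combined map to whatever `backendCmd` contains.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one phase-3 backend job.
struct BackendJob {
  std::string irFile;      // phase-1 output for one module
  std::string objectFile;  // native object path the final link expects
};

// Writes a makefile with one rule per backend job so that `make -jN` can run
// the backends in parallel. `backendCmd` is a placeholder command assumed to
// take its input positionally and its output via -o. In each recipe, $< is
// the first prerequisite (the IR file) and $@ is the target (the object).
std::string writeBackendMakefile(const std::vector<BackendJob>& jobs,
                                 const std::string& combinedMap,
                                 const std::string& backendCmd) {
  assert(!backendCmd.empty() && "a backend command is required");
  std::ostringstream mk;
  mk << ".PHONY: all\n";
  mk << "all:";
  for (const BackendJob& job : jobs) mk << ' ' << job.objectFile;
  mk << "\n\n";
  for (const BackendJob& job : jobs) {
    // Each object depends on its own IR file and on the combined function
    // map, since imports are resolved through the map.
    mk << job.objectFile << ": " << job.irFile << ' ' << combinedMap << '\n'
       << '\t' << backendCmd << " $< -o $@\n\n";
  }
  return mk.str();
}
```

Making the combined map a prerequisite of every object is a conservative choice for the sketch: any change to the map rebuilds all backends. The per-module dependence sets described under item e of Stage 3 would be one way to narrow those prerequisites.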
>
> a. Lazy Debug Metadata Linking:
>
> The prototype implementation included lazy importing of module-level metadata during the ThinLTO pass finalization (i.e. after all function importing is complete). This actually applies to all module-level metadata, not just debug metadata, although debug metadata is the largest. This can be added as a separate set of patches, with changes to BitcodeReader, ValueMapper, and ModuleLinker.
>
> b. Import Tuning:
>
> Tuning the import strategy will be an iterative process that will continue to be refined over time. It involves several different types of changes: adding support for recording additional metrics in the function summary, such as profile data and optional heavier-weight IPA analyses, and tuning the import heuristics based on the summary and callsite context.
>
> c. Combined Function Map Pruning:
>
> The combined function map can be pruned of functions that are unlikely to benefit from being imported.
For example, during >>>>> the >>>>> >>>> >>> >> >> > phase-2 >>>>> >>>> >>> >> >> > thin >>>>> >>>> >>> >> >> > archive plug step we can safely omit large and (with >>>>> profile >>>>> >>>> >>> >> >> > data) >>>>> >>>> >>> >> >> > cold functions, which are unlikely to benefit from >>>>> being >>>>> >>>> >>> >> >> > inlined. >>>>> >>>> >>> >> >> > Additionally, all but one copy of comdat functions >>>>> can be >>>>> >>>> >>> >> >> > suppressed. >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > d. Distributed Build System Integration: >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > For a distributed build system, the gold plugin >>>>> should write >>>>> >>>> >>> >> >> > the >>>>> >>>> >>> >> >> > parallel backend invocations into a makefile, >>>>> including the >>>>> >>>> >>> >> >> > mapping >>>>> >>>> >>> >> >> > from the IR file to the real object file path, and >>>>> exit. >>>>> >>>> >>> >> >> > Additional >>>>> >>>> >>> >> >> > work needs to be done in the distributed build system >>>>> itself >>>>> >>>> >>> >> >> > to >>>>> >>>> >>> >> >> > distribute and dispatch the parallel backend jobs to >>>>> the >>>>> >>>> >>> >> >> > build >>>>> >>>> >>> >> >> > cluster. >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > e. Dependence Tracking and Incremental Compiles: >>>>> >>>> >>> >> >> > >>>>> >>>> >>> >> >> > In order to support build systems that stage from >>>>> local >>>>> >>>> >>> >> >> > disks or >>>>> >>>> >>> >> >> > network storage, the plugin will optionally support >>>>> >>>> >>> >> >> > computation >>>>> >>>> >>> >> >> > of >>>>> >>>> >>> >> >> > dependent sets of IR files that each module may >>>>> import from. >>>>> >>>> >>> >> >> > This >>>>> >>>> >>> >> >> > can >>>>> >>>> >>> >> >> > be computed from profile data, if it exists, or from >>>>> the >>>>> >>>> >>> >> >> > symbol >>>>> >>>> >>> >> >> > table >>>>> >>>> >>> >> >> > and heuristics if not. 
> These dependence sets also enable support for incremental backend compiles.
>
> --
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Fri, May 15, 2015 at 10:07 AM, Dave Bozier <seifsta at gmail.com> wrote:

> > There is no need for emitting the full symtab. I checked the overhead with a huge internal C++ source. The overhead of symtab + str table compared with byte code with debug is about 3%.
>
> It's still sizable, and could be noticeable if ThinLTO can deliver compile times closer to those of builds without LTO, as your results suggest.

If the cost is part of the index/summary, then it is avoidable.

> > More importantly, it is also possible to use the symtab for index/summary purposes, which makes the space usage completely 'unwasted'. That gets into the details which will follow when patches are in.
>
> There is symbol information in both the native object symbol table and the bitcode file? Isn't that waste? I understand the reasons for using the native object wrapper (compatibility with other tools) and am happy with that. But I'd also like to see the option for function index/summary data to be produced without the wrapper, so that bitcode-aware tools do not need to use this wrapped format.

I agree.

> If you mix the native object wrapper symbol information with the function index/summary data then that would end up being impossible.

It is possible. The summary data is still in its own section. Under the bitcode-only option, the symtab will be replaced with a bitcode form of the index, while the summary remains the same.

> Also won't having the native object data with the function index/summary have a cost on testing for all of the supported native object formats?

Yes.

thanks,

David

> On Fri, May 15, 2015 at 4:26 PM, Xinliang David Li <xinliangli at gmail.com> wrote:
>
>> There is no need for emitting the full symtab. I checked the overhead with a huge internal C++ source. The overhead of symtab + str table compared with byte code with debug is about 3%.
>> >> More importantly, it is also possible to use the symtab also for >> index/summary purpose, which makes the space usage completely 'unwasted'. >> That gets into the details which will follow when patches are in. >> >> David >> >> On Fri, May 15, 2015 at 5:11 AM, Dave Bozier <seifsta at gmail.com> wrote: >> >>> > Are you sure about the additional I/O? With native symtab, existing >>> tools just need to read those, while plugin based approach needs to read >>> bit code section to feedback symbols to the tool. >>> >>> The additional I/O will be quite big if you are going to emit the full >>> symbol table. Looking at some of our real world links the symbol table and >>> string tables of all the inputs seen by the linker add up to about 50 - >>> 100mb. >>> >>> On Thu, May 14, 2015 at 10:28 PM, Xinliang David Li < >>> xinliangli at gmail.com> wrote: >>> >>>> >>>> >>>> On Thu, May 14, 2015 at 2:09 PM, Eric Christopher <echristo at gmail.com> >>>> wrote: >>>> >>>>> >>>>> >>>>> On Thu, May 14, 2015 at 1:35 PM Teresa Johnson <tejohnson at google.com> >>>>> wrote: >>>>> >>>>>> On Thu, May 14, 2015 at 1:18 PM, Eric Christopher <echristo at gmail.com> >>>>>> wrote: >>>>>> > >>>>>> > >>>>>> > On Thu, May 14, 2015 at 1:11 PM David Blaikie <dblaikie at gmail.com> >>>>>> wrote: >>>>>> >> >>>>>> >> On Thu, May 14, 2015 at 12:53 PM, Eric Christopher < >>>>>> echristo at gmail.com> >>>>>> >> wrote: >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> On Thu, May 14, 2015 at 11:34 AM Daniel Berlin < >>>>>> dberlin at dberlin.org> >>>>>> >>> wrote: >>>>>> >>>> >>>>>> >>>> On Thu, May 14, 2015 at 11:14 AM, Eric Christopher < >>>>>> echristo at gmail.com> >>>>>> >>>> wrote: >>>>>> >>>> > I'm not sure this is a particularly great assumption to make. >>>>>> >>>> >>>>>> >>>> Which part? 
>>>>>> >>> >>>>>> >>> >>>>>> >>> The binutils part :) >>>>>> >>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> > We have to >>>>>> >>>> > support a lot of different build systems and tools and >>>>>> concentrating >>>>>> >>>> > on >>>>>> >>>> > something that just binutils uses isn't particularly friendly >>>>>> here. >>>>>> >>>> I think you may have misunderstood >>>>>> >>>> His point was exactly that they want to be transparent to *all >>>>>> of* these >>>>>> >>>> tools. >>>>>> >>>> You are saying "we should be friendly to everyone". He is saying >>>>>> the >>>>>> >>>> same thing. >>>>>> >>>> We should be friendly to everyone. The friendly way to do this >>>>>> is to >>>>>> >>>> not require all of these tools build plugins to handle bitcode. >>>>>> >>>> >>>>>> >>>> Hence, elf-wrapped bitcode. >>>>>> >>> >>>>>> >>> >>>>>> >>> Oh, I understood. I just don't know that I agree. To do anything >>>>>> with the >>>>>> >>> tools will require some knowledge of bitcode anyhow or need the >>>>>> plugin. I'm >>>>>> >>> saying that as a baseline start we should look at how to do this >>>>>> using the >>>>>> >>> tools we've got rather than wrapping things for no real gain. >>>>>> >> >>>>>> >> >>>>>> >> That doesn't seem strictly true - the ar situation (which I'm lead >>>>>> to >>>>>> >> believe is in use in our build system & others, one would assume). >>>>>> With the >>>>>> >> symbol table included as proposed, ar can be used without any >>>>>> knowledge of >>>>>> >> the bitcode or need for a plugin. >>>>>> >> >>>>>> > >>>>>> > For some bits, sure. Optimizing for ar seems a bit silly, why not >>>>>> 'ld -r'? >>>>>> >>>>>> But as mentioned, ld -r can work on native object wrapped bitcode >>>>>> without a plugin as well. >>>>>> >>>>>> >>>>> How? It's not like any partial linking is going to go on inside the >>>>> bitcode if the linker doesn't understand bitcode. >>>>> >>>> >>>> What do we want plugin to do anything here? 
We just need the linker to >>>> concatenate the bitcode sections and produce a combined bitcode file. >>>> >>>> >>>>> >>>>> >>>>>> > Agreed. The ar situation is interesting because one thing we >>>>>> discussed after >>>>>> > you wandered off was just adding a ToC section to bitcode as it is >>>>>> and then >>>>>> > having the tools handle that. Would seem to accomplish at least the >>>>>> goals as >>>>>> > I've seen them up to this point without worrying too much. >>>>>> >>>>>> The ToC section is a way we can encode the function index/summary into >>>>>> bitcode, but won't help integrate with existing tools. The main issue >>>>>> we are trying to solve is integrating transparently with existing >>>>>> binutils tools in use in our build system and probably elsewhere. >>>>>> >>>>>> >>>>> Right. I'm not entirely sure what use we're going to see in the >>>>> existing tools that we want to encompass here. There's some of it for >>>>> convenience (i.e. nm etc for developers), but they can use a tool that >>>>> understands bitcode and we can make the existing llvm tools suffice for >>>>> these needs. >>>>> >>>>> I think the way of looking at this is that we can: >>>>> >>>>> a) go with wrapping things in native object formats, this means >>>>> - some tools continue to work at the cost of additional I/O and space >>>>> at compile/link time >>>>> >>>> >>>> Are you sure about the additional I/O? With native symtab, existing >>>> tools just need to read those, while plugin based approach needs to read >>>> bit code section to feedback symbols to the tool. >>>> >>>> >>>>> - we still have to update some tools to work at all >>>>> >>>> >>>> If any, it will be minimal. >>>> >>>> >>>>> >>>>> b) we extend those tools/our own tools and have them be drop in >>>>> replacements to the existing tools. 
They'll understand the bitcode format >>>>> natively, they'll be smaller, and we'll be able to push the state of the >>>>> art in tooling/analysis a bit more in the future without having to rework >>>>> thin lto. >>>>> >>>>> It's basically a set of trade-offs and for llvm we've historically >>>>> gone the b direction. >>>>> >>>>> >>>> I am fine making llvm tools work with it, but we should not >>>> require/force user using them. I think this is an orthogonal feature. >>>> >>>> David >>>> >>>> >>>> >>>> >>>>> > >>>>>> > At any rate, I think this aspect of the proposal needs a bit of >>>>>> discussion >>>>>> > and some mapping out of the pros and cons here. >>>>>> >>>>>> Sure, we can continue to discuss and I will try to lay out the >>>>>> pros/cons. >>>>>> >>>>> >>>>> Excellent. >>>>> >>>>> -eric >>>>> >>>>> >>>>>> >>>>>> Teresa >>>>>> >>>>>> > >>>>>> > -eric >>>>>> > >>>>>> >>> >>>>>> >>> I've talked to Teresa a bit offline and we're going to talk more >>>>>> later >>>>>> >>> (and discuss on the list), but there are some discussions about >>>>>> how to make >>>>>> >>> this work either with just bitcode/llvm tools and so not requiring >>>>>> >>> integration on all platforms. The latter is what I consider as >>>>>> particularly >>>>>> >>> friendly :) >>>>>> >>> >>>>>> >>> -eric >>>>>> >>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> > I also >>>>>> >>>> > can't imagine how it's necessary for any of the lto aspects as >>>>>> >>>> > currently >>>>>> >>>> > written in the proposal. >>>>>> >>>> > >>>>>> >>>> > -eric >>>>>> >>>> > >>>>>> >>>> > On Thu, May 14, 2015 at 9:26 AM Xinliang David Li >>>>>> >>>> > <xinliangli at gmail.com> >>>>>> >>>> > wrote: >>>>>> >>>> >> >>>>>> >>>> >> The design objective is to make thinLTO mostly transparent to >>>>>> binutil >>>>>> >>>> >> tools to enable easy integration with any build system in the >>>>>> wild. 
>>>>>> >>>> >> 'Pass-through' mode with 'ld -r' instead of the partial LTO >>>>>> mode is >>>>>> >>>> >> another >>>>>> >>>> >> reason. >>>>>> >>>> >> >>>>>> >>>> >> David >>>>>> >>>> >> >>>>>> >>>> >> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson >>>>>> >>>> >> <tejohnson at google.com> >>>>>> >>>> >> wrote: >>>>>> >>>> >>> >>>>>> >>>> >>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher >>>>>> >>>> >>> <echristo at gmail.com> >>>>>> >>>> >>> wrote: >>>>>> >>>> >>> > So, what Alex is saying is that we have these tools as >>>>>> well and >>>>>> >>>> >>> > they >>>>>> >>>> >>> > understand bitcode just fine, as well as every object >>>>>> format - not >>>>>> >>>> >>> > just >>>>>> >>>> >>> > ELF. >>>>>> >>>> >>> > :) >>>>>> >>>> >>> >>>>>> >>>> >>> Right, there are also LLVM specific versions (llvm-ar, >>>>>> llvm-nm) that >>>>>> >>>> >>> handle bitcode similarly to the way the standard tool + >>>>>> plugin does. >>>>>> >>>> >>> But the goal we are trying to achieve is to allow the >>>>>> standard >>>>>> >>>> >>> system >>>>>> >>>> >>> versions of the tools to handle these files without >>>>>> requiring a >>>>>> >>>> >>> plugin. I know the LLVM tool handles other object formats, >>>>>> but I'm >>>>>> >>>> >>> not >>>>>> >>>> >>> sure how that helps here? We're not planning to replace >>>>>> those tools, >>>>>> >>>> >>> just allow the standard system versions to handle the >>>>>> intermediate >>>>>> >>>> >>> objects produced by ThinLTO. 
>>>>>> >>>> >>> >>>>>> >>>> >>> Thanks, >>>>>> >>>> >>> Teresa >>>>>> >>>> >>> >>>>>> >>>> >>> > >>>>>> >>>> >>> > -eric >>>>>> >>>> >>> > >>>>>> >>>> >>> > >>>>>> >>>> >>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson >>>>>> >>>> >>> > <tejohnson at google.com> >>>>>> >>>> >>> > wrote: >>>>>> >>>> >>> >> >>>>>> >>>> >>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li >>>>>> >>>> >>> >> <xinliangli at gmail.com> wrote: >>>>>> >>>> >>> >> > >>>>>> >>>> >>> >> > >>>>>> >>>> >>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg >>>>>> >>>> >>> >> > <alexr at leftfield.org> >>>>>> >>>> >>> >> > wrote: >>>>>> >>>> >>> >> >> >>>>>> >>>> >>> >> >> "ELF-wrapped bitcode" seems potentially controversial >>>>>> to me. >>>>>> >>>> >>> >> >> >>>>>> >>>> >>> >> >> What about ar, nm, and various ld implementations adds >>>>>> this >>>>>> >>>> >>> >> >> requirement? >>>>>> >>>> >>> >> >> What about the LLVM implementations of these tools is >>>>>> lacking? >>>>>> >>>> >>> >> > >>>>>> >>>> >>> >> > >>>>>> >>>> >>> >> > Sorry I can not parse your questions properly. Can you >>>>>> make it >>>>>> >>>> >>> >> > clearer? >>>>>> >>>> >>> >> >>>>>> >>>> >>> >> Alex is asking what the issue is with ar, nm, ld -r and >>>>>> regular >>>>>> >>>> >>> >> bitcode that makes using elf-wrapped bitcode easier. >>>>>> >>>> >>> >> >>>>>> >>>> >>> >> The issue is that generally you need to provide a plugin >>>>>> to these >>>>>> >>>> >>> >> tools in order for them to understand and handle bitcode >>>>>> files. >>>>>> >>>> >>> >> We'd >>>>>> >>>> >>> >> like standard tools to work without requiring a plugin as >>>>>> much as >>>>>> >>>> >>> >> possible. And in some cases we want them to be handled >>>>>> different >>>>>> >>>> >>> >> than >>>>>> >>>> >>> >> the way bitcode files are handled with the plugin. >>>>>> >>>> >>> >> >>>>>> >>>> >>> >> nm: Without a plugin, normal bitcode files are >>>>>> inscrutable. When >>>>>> >>>> >>> >> provided the gold plugin it can emit the symbols. 
>>>>>> >>>> >>> >> >>>>>> >>>> >>> >> ar: Without a plugin, it will create an archive of >>>>>> bitcode files, >>>>>> >>>> >>> >> but >>>>>> >>>> >>> >> without an index, so it can't be handled by the linker >>>>>> even with >>>>>> >>>> >>> >> a >>>>>> >>>> >>> >> plugin on an -flto link. When ar is provided the gold >>>>>> plugin it >>>>>> >>>> >>> >> does >>>>>> >>>> >>> >> create an index, so the linker + gold plugin handle it >>>>>> >>>> >>> >> appropriately >>>>>> >>>> >>> >> on an -flto link. >>>>>> >>>> >>> >> >>>>>> >>>> >>> >> ld -r: Without a plugin, fails when provided bitcode >>>>>> inputs. When >>>>>> >>>> >>> >> provided the gold plugin, it handles them but compiles >>>>>> them all >>>>>> >>>> >>> >> the >>>>>> >>>> >>> >> way through to ELF executable instructions via a partial >>>>>> LTO >>>>>> >>>> >>> >> link. >>>>>> >>>> >>> >> This is where we would like to differ in behavior (while >>>>>> also not >>>>>> >>>> >>> >> requiring a plugin) with ELF-wrapped bitcode: we would >>>>>> like the >>>>>> >>>> >>> >> ld -r >>>>>> >>>> >>> >> output file to still contain ELF-wrapped bitcode, >>>>>> delaying the >>>>>> >>>> >>> >> LTO >>>>>> >>>> >>> >> until the full link step. >>>>>> >>>> >>> >> >>>>>> >>>> >>> >> Let me know if that helps address your concerns. >>>>>> >>>> >>> >> >>>>>> >>>> >>> >> Thanks, >>>>>> >>>> >>> >> Teresa >>>>>> >>>> >>> >> >>>>>> >>>> >>> >> > >>>>>> >>>> >>> >> > David >>>>>> >>>> >>> >> > >>>>>> >>>> >>> >> >> >>>>>> >>>> >>> >> >> >>>>>> >>>> >>> >> >> Alex >>>>>> >>>> >>> >> >> >>>>>> >>>> >>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson >>>>>> >>>> >>> >> >> > <tejohnson at google.com> >>>>>> >>>> >>> >> >> > wrote: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > I've included below an RFC for implementing ThinLTO >>>>>> in LLVM, >>>>>> >>>> >>> >> >> > looking >>>>>> >>>> >>> >> >> > forward to feedback and questions. >>>>>> >>>> >>> >> >> > Thanks! 
>>>>>> >>>> >>> >> >> > Teresa >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > RFC to discuss plans for implementing ThinLTO >>>>>> upstream. >>>>>> >>>> >>> >> >> > Background >>>>>> >>>> >>> >> >> > can >>>>>> >>>> >>> >> >> > be found in slides from EuroLLVM 2015: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0 >>>>>> >>>> >>> >> >> > As described in the talk, we have a prototype >>>>>> >>>> >>> >> >> > implementation, and >>>>>> >>>> >>> >> >> > would like to start staging patches upstream. This >>>>>> RFC >>>>>> >>>> >>> >> >> > describes >>>>>> >>>> >>> >> >> > a >>>>>> >>>> >>> >> >> > breakdown of the major pieces. We would like to >>>>>> commit >>>>>> >>>> >>> >> >> > upstream >>>>>> >>>> >>> >> >> > gradually in several stages, with all functionality >>>>>> off by >>>>>> >>>> >>> >> >> > default. >>>>>> >>>> >>> >> >> > The core ThinLTO importing support and tuning will >>>>>> require >>>>>> >>>> >>> >> >> > frequent >>>>>> >>>> >>> >> >> > change and iteration during testing and tuning, and >>>>>> for that >>>>>> >>>> >>> >> >> > part >>>>>> >>>> >>> >> >> > we >>>>>> >>>> >>> >> >> > would like to commit rapidly (off by default). See >>>>>> the >>>>>> >>>> >>> >> >> > proposed >>>>>> >>>> >>> >> >> > staged >>>>>> >>>> >>> >> >> > implementation described in the Implementation Plan >>>>>> section. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > ThinLTO Overview >>>>>> >>>> >>> >> >> > ================ >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > See the talk slides linked above for more details. >>>>>> The >>>>>> >>>> >>> >> >> > following >>>>>> >>>> >>> >> >> > is a >>>>>> >>>> >>> >> >> > high-level overview of the motivation.
>>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > Cross Module Optimization (CMO) is an effective >>>>>> means for >>>>>> >>>> >>> >> >> > improving >>>>>> >>>> >>> >> >> > runtime performance by extending the scope of >>>>>> optimizations >>>>>> >>>> >>> >> >> > across >>>>>> >>>> >>> >> >> > source module boundaries. Without CMO, the compiler >>>>>> is >>>>>> >>>> >>> >> >> > limited to >>>>>> >>>> >>> >> >> > optimizing within the scope of individual source >>>>>> modules. Two >>>>>> >>>> >>> >> >> > solutions >>>>>> >>>> >>> >> >> > for enabling CMO are Link-Time Optimization (LTO), >>>>>> which is >>>>>> >>>> >>> >> >> > currently >>>>>> >>>> >>> >> >> > supported in LLVM and GCC, and >>>>>> Lightweight Interprocedural >>>>>> >>>> >>> >> >> > Optimization (LIPO). However, each of these >>>>>> solutions has >>>>>> >>>> >>> >> >> > limitations >>>>>> >>>> >>> >> >> > that prevent it from being enabled by default. >>>>>> ThinLTO is a >>>>>> >>>> >>> >> >> > new >>>>>> >>>> >>> >> >> > approach that attempts to address these limitations, >>>>>> with a >>>>>> >>>> >>> >> >> > goal >>>>>> >>>> >>> >> >> > of >>>>>> >>>> >>> >> >> > being enabled more broadly. ThinLTO is designed with >>>>>> many of >>>>>> >>>> >>> >> >> > the >>>>>> >>>> >>> >> >> > same >>>>>> >>>> >>> >> >> > principles as LIPO, and therefore shares its advantages, >>>>>> without >>>>>> >>>> >>> >> >> > any of >>>>>> >>>> >>> >> >> > its >>>>>> >>>> >>> >> >> > inherent weaknesses. Unlike LIPO, where the module >>>>>> group >>>>>> >>>> >>> >> >> > decision >>>>>> >>>> >>> >> >> > is >>>>>> >>>> >>> >> >> > made at profile training runtime, ThinLTO makes the >>>>>> decision >>>>>> >>>> >>> >> >> > at >>>>>> >>>> >>> >> >> > compile time, but in a lazy mode that facilitates >>>>>> large-scale >>>>>> >>>> >>> >> >> > parallelism. The serial linker plugin phase is >>>>>> designed to >>>>>> >>>> >>> >> >> > be >>>>>> >>>> >>> >> >> > razor >>>>>> >>>> >>> >> >> > thin and blazingly fast.
By default, this step only >>>>>> does >>>>>> >>>> >>> >> >> > minimal >>>>>> >>>> >>> >> >> > preparation work to enable the parallel lazy >>>>>> importing >>>>>> >>>> >>> >> >> > performed >>>>>> >>>> >>> >> >> > later. ThinLTO aims to be scalable like a regular O2 >>>>>> build, >>>>>> >>>> >>> >> >> > enabling >>>>>> >>>> >>> >> >> > CMO on machines without large memory configurations, >>>>>> while >>>>>> >>>> >>> >> >> > also >>>>>> >>>> >>> >> >> > integrating well with distributed build systems. >>>>>> Results >>>>>> >>>> >>> >> >> > from >>>>>> >>>> >>> >> >> > early >>>>>> >>>> >>> >> >> > prototyping on SPEC CPU2006 C++ benchmarks are in >>>>>> line with >>>>>> >>>> >>> >> >> > expectations that ThinLTO can scale like O2 while >>>>>> enabling >>>>>> >>>> >>> >> >> > much >>>>>> >>>> >>> >> >> > of >>>>>> >>>> >>> >> >> > the >>>>>> >>>> >>> >> >> > CMO performed during a full LTO build. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > A ThinLTO build is divided into 3 phases, which are >>>>>> referred >>>>>> >>>> >>> >> >> > to >>>>>> >>>> >>> >> >> > in >>>>>> >>>> >>> >> >> > the >>>>>> >>>> >>> >> >> > following implementation plan: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > phase-1: IR and Function Summary Generation (-c >>>>>> compile) >>>>>> >>>> >>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive >>>>>> linker step) >>>>>> >>>> >>> >> >> > phase-3: Parallel Backend with Demand-Driven >>>>>> Importing >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > Implementation Plan >>>>>> >>>> >>> >> >> > =================== >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > This section gives a high-level breakdown of the >>>>>> ThinLTO >>>>>> >>>> >>> >> >> > support >>>>>> >>>> >>> >> >> > that >>>>>> >>>> >>> >> >> > will be added, in roughly the order that the patches >>>>>> would >>>>>> >>>> >>> >> >> > be >>>>>> >>>> >>> >> >> > staged. >>>>>> >>>> >>> >> >> > The patches are divided into three stages.
>>>>>> >>>> >>> >> >> > The first >>>>>> stage >>>>>> >>>> >>> >> >> > contains a >>>>>> >>>> >>> >> >> > minimal amount of preparation work that is not >>>>>> >>>> >>> >> >> > ThinLTO-specific. >>>>>> >>>> >>> >> >> > The >>>>>> >>>> >>> >> >> > second stage contains most of the infrastructure for >>>>>> >>>> >>> >> >> > ThinLTO, >>>>>> >>>> >>> >> >> > which >>>>>> >>>> >>> >> >> > will be off by default. The third stage includes >>>>>> >>>> >>> >> >> > enhancements/improvements/tunings that can be >>>>>> performed >>>>>> >>>> >>> >> >> > after the >>>>>> >>>> >>> >> >> > main >>>>>> >>>> >>> >> >> > ThinLTO infrastructure is in. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > The second and third implementation stages will >>>>>> initially be >>>>>> >>>> >>> >> >> > very >>>>>> >>>> >>> >> >> > volatile, requiring many iterations and much tuning >>>>>> with >>>>>> >>>> >>> >> >> > large >>>>>> >>>> >>> >> >> > apps to >>>>>> >>>> >>> >> >> > stabilize. Therefore, it will be important to do >>>>>> fast >>>>>> >>>> >>> >> >> > commits >>>>>> >>>> >>> >> >> > for >>>>>> >>>> >>> >> >> > these implementation stages. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > 1. Stage 1: Preparation >>>>>> >>>> >>> >> >> > ------------------------------- >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > The first planned sets of patches are enablers for >>>>>> ThinLTO >>>>>> >>>> >>> >> >> > work: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > a. LTO directory structure: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > Restructure the LTO directory to remove a circular >>>>>> dependence >>>>>> >>>> >>> >> >> > when >>>>>> >>>> >>> >> >> > the ThinLTO pass is added.
Because ThinLTO is being >>>>>> implemented as >>>>>> >>>> >>> >> >> > an SCC >>>>>> >>>> >>> >> >> > pass >>>>>> >>>> >>> >> >> > within Transforms/IPO, and leverages the LTOModule >>>>>> class for >>>>>> >>>> >>> >> >> > linking >>>>>> >>>> >>> >> >> > in functions from modules, IPO then requires the LTO >>>>>> >>>> >>> >> >> > library. >>>>>> >>>> >>> >> >> > This >>>>>> >>>> >>> >> >> > creates a circular dependence between LTO and IPO. >>>>>> To break >>>>>> >>>> >>> >> >> > that, >>>>>> >>>> >>> >> >> > we >>>>>> >>>> >>> >> >> > need to split the lib/LTO directory/library into >>>>>> >>>> >>> >> >> > lib/LTO/CodeGen >>>>>> >>>> >>> >> >> > and >>>>>> >>>> >>> >> >> > lib/LTO/Module, containing LTOCodeGenerator and >>>>>> LTOModule, >>>>>> >>>> >>> >> >> > respectively. Only LTOCodeGenerator has a dependence >>>>>> on IPO, >>>>>> >>>> >>> >> >> > removing >>>>>> >>>> >>> >> >> > the circular dependence. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > b. ELF wrapper generation support: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > Implement an ELF-wrapped bitcode writer. In order to >>>>>> more >>>>>> >>>> >>> >> >> > easily >>>>>> >>>> >>> >> >> > interact >>>>>> >>>> >>> >> >> > with tools such as $AR, $NM, and “$LD -r”, we plan to >>>>>> emit >>>>>> >>>> >>> >> >> > the >>>>>> >>>> >>> >> >> > phase-1 >>>>>> >>>> >>> >> >> > bitcode wrapped in ELF via the .llvmbc section, >>>>>> along with a >>>>>> >>>> >>> >> >> > symbol >>>>>> >>>> >>> >> >> > table. The goal is both to interact with these tools >>>>>> without >>>>>> >>>> >>> >> >> > requiring >>>>>> >>>> >>> >> >> > a plugin, and also to avoid doing partial >>>>>> LTO/ThinLTO across >>>>>> >>>> >>> >> >> > files >>>>>> >>>> >>> >> >> > linked with “$LD -r” (i.e., the resulting object file >>>>>> should >>>>>> >>>> >>> >> >> > still >>>>>> >>>> >>> >> >> > contain ELF-wrapped bitcode to enable ThinLTO at the >>>>>> full >>>>>> >>>> >>> >> >> > link >>>>>> >>>> >>> >> >> > step).
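To make the “$LD -r” goal above concrete, here is a small simulation in Python, not LLVM or gold code. A partial link concatenates same-named input sections, so the .llvmbc payloads of several objects end up back to back in one section, and the full link step has to split them apart again. The RFC only says gold and the backend driver will need handling for this; the sketch assumes, as one possible approach, that each module's bitcode is prefixed by LLVM's 20-byte bitcode wrapper header (magic 0x0B17C0DE, version, offset, size, CPU type, each a little-endian 32-bit word), which records where that module's bitcode starts and how long it is. The helpers `wrap_module`, `partial_link`, and `split_llvmbc` are hypothetical.

```python
import struct

# Assumption: each module's bitcode carries LLVM's 20-byte bitcode wrapper
# header (five little-endian uint32 fields). The RFC does not specify this.
WRAPPER_MAGIC = 0x0B17C0DE
HEADER = struct.Struct("<5I")  # magic, version, offset, size, cputype


def wrap_module(bitcode: bytes, cputype: int = 0) -> bytes:
    """Prefix raw bitcode with a wrapper header recording its offset and size."""
    return HEADER.pack(WRAPPER_MAGIC, 0, HEADER.size, len(bitcode), cputype) + bitcode


def partial_link(sections: list[bytes], align: int = 4) -> bytes:
    """Mimic what 'ld -r' does with same-named input sections:
    concatenate them, padding each start to the section alignment."""
    out = bytearray()
    for sec in sections:
        out += b"\0" * (-len(out) % align)
        out += sec
    return bytes(out)


def split_llvmbc(section: bytes, align: int = 4) -> list[bytes]:
    """Recover each original module's bitcode from a concatenated .llvmbc section."""
    modules, pos = [], 0
    while pos < len(section):
        pos += -pos % align  # skip padding inserted between input sections
        if pos + HEADER.size > len(section):
            break  # too few bytes left for another header
        magic, _version, offset, size, _cputype = HEADER.unpack_from(section, pos)
        if magic != WRAPPER_MAGIC:
            raise ValueError(f"no bitcode wrapper header at byte {pos}")
        start = pos + offset
        modules.append(section[start:start + size])
        pos = start + size
    return modules
```

Real linkers pad according to each input section's alignment; the 4-byte value here is only a placeholder, and the header's explicit size field is what makes the split independent of any padding.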
>>>>>> >>>> >>> >> >> > I will send a separate design document for these >>>>>> changes, >>>>>> >>>> >>> >> >> > but the >>>>>> >>>> >>> >> >> > following is a high-level overview. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > Support was added to LLVM for reading ELF-wrapped >>>>>> bitcode >>>>>> >>>> >>> >> >> > (http://reviews.llvm.org/rL218078), but there does >>>>>> not yet >>>>>> >>>> >>> >> >> > exist >>>>>> >>>> >>> >> >> > support in LLVM/Clang for emitting bitcode wrapped >>>>>> in ELF. I >>>>>> >>>> >>> >> >> > plan >>>>>> >>>> >>> >> >> > to >>>>>> >>>> >>> >> >> > add support for optionally generating bitcode in an >>>>>> ELF file >>>>>> >>>> >>> >> >> > containing a single .llvmbc section holding the >>>>>> bitcode. >>>>>> >>>> >>> >> >> > Specifically, >>>>>> >>>> >>> >> >> > the patch would add new options “emit-llvm-bc-elf” >>>>>> (object >>>>>> >>>> >>> >> >> > file) >>>>>> >>>> >>> >> >> > and >>>>>> >>>> >>> >> >> > corresponding “emit-llvm-elf” (textual assembly code >>>>>> >>>> >>> >> >> > equivalent). >>>>>> >>>> >>> >> >> > Eventually these would be automatically triggered >>>>>> under >>>>>> >>>> >>> >> >> > “-fthinlto >>>>>> >>>> >>> >> >> > -c” >>>>>> >>>> >>> >> >> > and “-fthinlto -S”, respectively. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > Additionally, a symbol table will be generated in >>>>>> the ELF >>>>>> >>>> >>> >> >> > file, >>>>>> >>>> >>> >> >> > holding the function symbols within the bitcode. This >>>>>> >>>> >>> >> >> > facilitates >>>>>> >>>> >>> >> >> > handling archives of the ELF-wrapped bitcode created >>>>>> with >>>>>> >>>> >>> >> >> > $AR, >>>>>> >>>> >>> >> >> > since >>>>>> >>>> >>> >> >> > the archive will have a symbol table as well. The >>>>>> archive >>>>>> >>>> >>> >> >> > symbol >>>>>> >>>> >>> >> >> > table >>>>>> >>>> >>> >> >> > enables gold to extract and pass to the plugin the >>>>>> >>>> >>> >> >> > constituent >>>>>> >>>> >>> >> >> > ELF-wrapped bitcode files. 
To support the >>>>>> concatenated >>>>>> >>>> >>> >> >> > .llvmbc >>>>>> >>>> >>> >> >> > section >>>>>> >>>> >>> >> >> > generated by “$LD -r”, some handling needs to be >>>>>> added to >>>>>> >>>> >>> >> >> > gold >>>>>> >>>> >>> >> >> > and to >>>>>> >>>> >>> >> >> > the backend driver to process each original module’s >>>>>> >>>> >>> >> >> > bitcode. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > The function index/summary will later be added as a >>>>>> special >>>>>> >>>> >>> >> >> > ELF >>>>>> >>>> >>> >> >> > section alongside the .llvmbc sections. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > 2. Stage 2: ThinLTO Infrastructure >>>>>> >>>> >>> >> >> > ---------------------------------------------- >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > The next set of patches adds the base implementation >>>>>> of the >>>>>> >>>> >>> >> >> > ThinLTO >>>>>> >>>> >>> >> >> > infrastructure, specifically those required to make >>>>>> ThinLTO >>>>>> >>>> >>> >> >> > functional >>>>>> >>>> >>> >> >> > and generate correct but not necessarily >>>>>> high-performing >>>>>> >>>> >>> >> >> > binaries. It >>>>>> >>>> >>> >> >> > also does not include support for making debug information >>>>>> under -g >>>>>> >>>> >>> >> >> > efficient >>>>>> >>>> >>> >> >> > with ThinLTO. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > a. Clang/LLVM/gold linker options: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > An early set of clang/llvm patches is needed to >>>>>> provide >>>>>> >>>> >>> >> >> > options >>>>>> >>>> >>> >> >> > to >>>>>> >>>> >>> >> >> > enable ThinLTO (off by default), so that the rest of >>>>>> the >>>>>> >>>> >>> >> >> > implementation can be disabled by default as it is >>>>>> added.
>>>>>> >>>> >>> >> >> > Specifically, the clang option -fthinlto (used instead >>>>>> of >>>>>> >>>> >>> >> >> > -flto) >>>>>> >>>> >>> >> >> > will >>>>>> >>>> >>> >> >> > cause clang to invoke the phase-1 emission of LLVM >>>>>> bitcode >>>>>> >>>> >>> >> >> > and >>>>>> >>>> >>> >> >> > function summary/index on a compile step, and pass >>>>>> the >>>>>> >>>> >>> >> >> > appropriate >>>>>> >>>> >>> >> >> > option to the gold plugin on a link step. The >>>>>> -thinlto >>>>>> >>>> >>> >> >> > option >>>>>> >>>> >>> >> >> > will be >>>>>> >>>> >>> >> >> > added to the gold plugin and the llvm-lto tool to launch >>>>>> the >>>>>> >>>> >>> >> >> > phase-2 >>>>>> >>>> >>> >> >> > thin >>>>>> >>>> >>> >> >> > archive step. The -thinlto option will also be added >>>>>> to the >>>>>> >>>> >>> >> >> > ‘opt’ >>>>>> >>>> >>> >> >> > tool >>>>>> >>>> >>> >> >> > to invoke it as a phase-3 parallel backend instance. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > b. Thin-archive linking support in the gold plugin and >>>>>> llvm-lto: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > Under the new plugin option (see above), the plugin >>>>>> needs to >>>>>> >>>> >>> >> >> > perform >>>>>> >>>> >>> >> >> > the phase-2 (thin archive) link, which simply emits a >>>>>> >>>> >>> >> >> > combined >>>>>> >>>> >>> >> >> > function >>>>>> >>>> >>> >> >> > map from the linked modules, without actually >>>>>> performing the >>>>>> >>>> >>> >> >> > normal >>>>>> >>>> >>> >> >> > link. Corresponding support should be added to the >>>>>> >>>> >>> >> >> > standalone >>>>>> >>>> >>> >> >> > llvm-lto >>>>>> >>>> >>> >> >> > tool to enable testing/debugging without involving >>>>>> the >>>>>> >>>> >>> >> >> > linker and >>>>>> >>>> >>> >> >> > plugin. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > c.
ThinLTO backend support: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > Support for a phase-3 backend invocation >>>>>> (including >>>>>> >>>> >>> >> >> > importing) on a module should be added to the ‘opt’ >>>>>> tool >>>>>> >>>> >>> >> >> > under >>>>>> >>>> >>> >> >> > the >>>>>> >>>> >>> >> >> > new >>>>>> >>>> >>> >> >> > option. The main changes under the option are to >>>>>> instantiate a >>>>>> >>>> >>> >> >> > Linker >>>>>> >>>> >>> >> >> > object used to manage the process of linking imported >>>>>> >>>> >>> >> >> > functions >>>>>> >>>> >>> >> >> > into >>>>>> >>>> >>> >> >> > the module, efficiently read the combined function >>>>>> map, and >>>>>> >>>> >>> >> >> > enable >>>>>> >>>> >>> >> >> > the ThinLTO import pass. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > d. Function index/summary support: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > This includes infrastructure for writing and reading >>>>>> the >>>>>> >>>> >>> >> >> > function >>>>>> >>>> >>> >> >> > index/summary section. As noted earlier, this will be >>>>>> encoded >>>>>> >>>> >>> >> >> > in a >>>>>> >>>> >>> >> >> > special ELF section within the module, alongside the >>>>>> .llvmbc >>>>>> >>>> >>> >> >> > section >>>>>> >>>> >>> >> >> > containing the bitcode. The thin archive generated by >>>>>> >>>> >>> >> >> > phase-2 of >>>>>> >>>> >>> >> >> > ThinLTO simply contains all of the function >>>>>> index/summary >>>>>> >>>> >>> >> >> > sections >>>>>> >>>> >>> >> >> > across the linked modules, organized for efficient >>>>>> function >>>>>> >>>> >>> >> >> > lookup. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > Each function available for importing from the module >>>>>> >>>> >>> >> >> > has an >>>>>> >>>> >>> >> >> > entry in the module’s function index/summary section >>>>>> and in >>>>>> >>>> >>> >> >> > the >>>>>> >>>> >>> >> >> > resulting combined function map.
Each function entry >>>>>> >>>> >>> >> >> > contains >>>>>> >>>> >>> >> >> > that >>>>>> >>>> >>> >> >> > function’s offset within the bitcode file, used to >>>>>> >>>> >>> >> >> > efficiently >>>>>> >>>> >>> >> >> > locate >>>>>> >>>> >>> >> >> > and quickly import just that function. The entry also >>>>>> >>>> >>> >> >> > contains >>>>>> >>>> >>> >> >> > summary >>>>>> >>>> >>> >> >> > information (e.g., basic information determined during >>>>>> >>>> >>> >> >> > parsing >>>>>> >>>> >>> >> >> > such as >>>>>> >>>> >>> >> >> > the number of instructions in the function) that >>>>>> will be >>>>>> >>>> >>> >> >> > used to >>>>>> >>>> >>> >> >> > help >>>>>> >>>> >>> >> >> > guide later import decisions. Because the contents >>>>>> of this >>>>>> >>>> >>> >> >> > section >>>>>> >>>> >>> >> >> > will change frequently during ThinLTO tuning, it >>>>>> should also >>>>>> >>>> >>> >> >> > be >>>>>> >>>> >>> >> >> > marked >>>>>> >>>> >>> >> >> > with a version ID for backwards compatibility or >>>>>> version >>>>>> >>>> >>> >> >> > checking. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > e. ThinLTO importing support: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > Support for the mechanics of importing functions >>>>>> from other >>>>>> >>>> >>> >> >> > modules, >>>>>> >>>> >>> >> >> > which can go in gradually as a set of patches, since >>>>>> it will >>>>>> >>>> >>> >> >> > be >>>>>> >>>> >>> >> >> > off by >>>>>> >>>> >>> >> >> > default. Separate patches can include: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > - BitcodeReader changes to use the function index to >>>>>> >>>> >>> >> >> > import/deserialize >>>>>> >>>> >>> >> >> > a single function of interest (small changes, leverages >>>>>> >>>> >>> >> >> > existing >>>>>> >>>> >>> >> >> > lazy >>>>>> >>>> >>> >> >> > streamer support).
>>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > - Minor LTOModule changes to pass the ThinLTO >>>>>> function to >>>>>> >>>> >>> >> >> > import >>>>>> >>>> >>> >> >> > and >>>>>> >>>> >>> >> >> > its index into the bitcode reader. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > - Marking of imported functions (for use in >>>>>> ThinLTO-specific >>>>>> >>>> >>> >> >> > symbol >>>>>> >>>> >>> >> >> > linking and global DCE, for example). This can be >>>>>> in-memory >>>>>> >>>> >>> >> >> > initially, >>>>>> >>>> >>> >> >> > but IR support may be required in order to support >>>>>> streaming >>>>>> >>>> >>> >> >> > bitcode >>>>>> >>>> >>> >> >> > out and back in again after importing. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol >>>>>> linking >>>>>> >>>> >>> >> >> > and >>>>>> >>>> >>> >> >> > static promotion when necessary. The linkage type of >>>>>> >>>> >>> >> >> > imported >>>>>> >>>> >>> >> >> > functions changes to AvailableExternallyLinkage, for >>>>>> >>>> >>> >> >> > example. >>>>>> >>>> >>> >> >> > Statics >>>>>> >>>> >>> >> >> > must be promoted in certain cases, and renamed in >>>>>> consistent >>>>>> >>>> >>> >> >> > ways. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > - GlobalDCE changes to support removing imported >>>>>> functions >>>>>> >>>> >>> >> >> > that >>>>>> >>>> >>> >> >> > were >>>>>> >>>> >>> >> >> > not inlined (very small changes to existing pass >>>>>> logic). >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > f. ThinLTO Import Driver SCC pass: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with a framework for >>>>>> doing >>>>>> >>>> >>> >> >> > ThinLTO >>>>>> >>>> >>> >> >> > via >>>>>> >>>> >>> >> >> > an SCC pass, enabled only under -fthinlto options.
>>>>>> The pass >>>>>> >>>> >>> >> >> > includes >>>>>> >>>> >>> >> >> > utilizing the thin archive (global function >>>>>> index/summary), >>>>>> >>>> >>> >> >> > import >>>>>> >>>> >>> >> >> > decision heuristics, invocation of >>>>>> LTOModule/ModuleLinker >>>>>> >>>> >>> >> >> > routines >>>>>> >>>> >>> >> >> > that perform the import, and any necessary callgraph >>>>>> updates >>>>>> >>>> >>> >> >> > and >>>>>> >>>> >>> >> >> > verification. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > g. Backend Driver: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > For a single node build, the gold plugin can simply >>>>>> write a >>>>>> >>>> >>> >> >> > makefile >>>>>> >>>> >>> >> >> > and fork the parallel backend instances directly via >>>>>> >>>> >>> >> >> > parallel >>>>>> >>>> >>> >> >> > make. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> ---------------------------------------------------------------- >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > This refers to the patches that are not required for >>>>>> ThinLTO >>>>>> >>>> >>> >> >> > to >>>>>> >>>> >>> >> >> > work, >>>>>> >>>> >>> >> >> > but rather to improve compile time, memory, run-time >>>>>> >>>> >>> >> >> > performance >>>>>> >>>> >>> >> >> > and >>>>>> >>>> >>> >> >> > usability. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > a. Lazy Debug Metadata Linking: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > The prototype implementation included lazy importing >>>>>> of >>>>>> >>>> >>> >> >> > module-level >>>>>> >>>> >>> >> >> > metadata during the ThinLTO pass finalization (i.e. >>>>>> after >>>>>> >>>> >>> >> >> > all >>>>>> >>>> >>> >> >> > function >>>>>> >>>> >>> >> >> > importing is complete). 
This actually applies to all >>>>>> >>>> >>> >> >> > module-level >>>>>> >>>> >>> >> >> > metadata, not just debug metadata, although debug metadata is the >>>>>> largest. This >>>>>> >>>> >>> >> >> > can be >>>>>> >>>> >>> >> >> > added as a separate set of patches, with changes to >>>>>> >>>> >>> >> >> > BitcodeReader, >>>>>> >>>> >>> >> >> > ValueMapper, and ModuleLinker. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > b. Import Tuning: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > Tuning the import strategy will be an iterative >>>>>> process that >>>>>> >>>> >>> >> >> > will >>>>>> >>>> >>> >> >> > continue to be refined over time. It involves several >>>>>> >>>> >>> >> >> > different >>>>>> >>>> >>> >> >> > types >>>>>> >>>> >>> >> >> > of changes: adding support for recording additional >>>>>> metrics >>>>>> >>>> >>> >> >> > in >>>>>> >>>> >>> >> >> > the >>>>>> >>>> >>> >> >> > function summary, such as profile data and optional >>>>>> >>>> >>> >> >> > heavier-weight >>>>>> >>>> >>> >> >> > IPA >>>>>> >>>> >>> >> >> > analyses, and tuning the import heuristics based on >>>>>> the >>>>>> >>>> >>> >> >> > summary >>>>>> >>>> >>> >> >> > and >>>>>> >>>> >>> >> >> > callsite context. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > c. Combined Function Map Pruning: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > The combined function map can be pruned of functions >>>>>> that >>>>>> >>>> >>> >> >> > are >>>>>> >>>> >>> >> >> > unlikely >>>>>> >>>> >>> >> >> > to benefit from being imported. For example, during >>>>>> the >>>>>> >>>> >>> >> >> > phase-2 >>>>>> >>>> >>> >> >> > thin >>>>>> >>>> >>> >> >> > archive plugin step, we can safely omit large and (with >>>>>> profile >>>>>> >>>> >>> >> >> > data) >>>>>> >>>> >>> >> >> > cold functions, which are unlikely to benefit from >>>>>> being >>>>>> >>>> >>> >> >> > inlined. >>>>>> >>>> >>> >> >> > Additionally, all but one copy of comdat functions >>>>>> can be >>>>>> >>>> >>> >> >> > suppressed.
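The pruning rules in item c. can be sketched as a simple filter over the combined function map. This is a minimal Python illustration, not the planned implementation: it assumes a hypothetical in-memory record per function carrying the fields item d. describes (bitcode offset, instruction count) plus an optional profile count and comdat key, and the thresholds `max_insts=500` and `cold_count=1` are arbitrary placeholders rather than values from the RFC.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FunctionSummary:
    """Hypothetical combined-function-map entry (field names are illustrative)."""
    name: str
    module: str                        # IR file that defines the function
    bitcode_offset: int                # where the body starts in that module's bitcode
    inst_count: int                    # size recorded during phase-1 parsing
    entry_count: Optional[int] = None  # profile count, only when profile data exists
    comdat: Optional[str] = None       # comdat group key, if any


def prune_function_map(summaries, max_insts=500, cold_count=1):
    """Keep only entries that are plausible import candidates."""
    kept, seen_comdats = [], set()
    for s in summaries:
        if s.inst_count > max_insts:
            continue  # large: unlikely to be inlined after import
        if s.entry_count is not None and s.entry_count < cold_count:
            continue  # cold according to profile data
        if s.comdat is not None:
            if s.comdat in seen_comdats:
                continue  # another copy of this comdat is already kept
            seen_comdats.add(s.comdat)
        kept.append(s)
    return kept
```

Coldness is only judged when a profile count is present, matching the "(with profile data)" qualifier; without profiles, only the size and comdat rules apply. The result is a list rather than a name-keyed map because promoted statics from different modules could otherwise collide on name.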
>>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > d. Distributed Build System Integration: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > For a distributed build system, the gold plugin >>>>>> should write >>>>>> >>>> >>> >> >> > the >>>>>> >>>> >>> >> >> > parallel backend invocations into a makefile, >>>>>> including the >>>>>> >>>> >>> >> >> > mapping >>>>>> >>>> >>> >> >> > from the IR file to the real object file path, and >>>>>> exit. >>>>>> >>>> >>> >> >> > Additional >>>>>> >>>> >>> >> >> > work needs to be done in the distributed build >>>>>> system itself >>>>>> >>>> >>> >> >> > to >>>>>> >>>> >>> >> >> > distribute and dispatch the parallel backend jobs to >>>>>> the >>>>>> >>>> >>> >> >> > build >>>>>> >>>> >>> >> >> > cluster. >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > e. Dependence Tracking and Incremental Compiles: >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > In order to support build systems that stage from >>>>>> local >>>>>> >>>> >>> >> >> > disks or >>>>>> >>>> >>> >> >> > network storage, the plugin will optionally support >>>>>> >>>> >>> >> >> > computation >>>>>> >>>> >>> >> >> > of >>>>>> >>>> >>> >> >> > dependent sets of IR files that each module may >>>>>> import from. >>>>>> >>>> >>> >> >> > This >>>>>> >>>> >>> >> >> > can >>>>>> >>>> >>> >> >> > be computed from profile data, if it exists, or from >>>>>> the >>>>>> >>>> >>> >> >> > symbol >>>>>> >>>> >>> >> >> > table >>>>>> >>>> >>> >> >> > and heuristics if not. These dependence sets also >>>>>> enable >>>>>> >>>> >>> >> >> > support >>>>>> >>>> >>> >> >> > for >>>>>> >>>> >>> >> >> > incremental backend compiles. 
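The symbol-table path of item e. (computing, without profile data, which IR files each module may import from) can be sketched as below, together with the incremental-compile check those dependence sets enable. This is a minimal Python illustration under stated assumptions: each module's defined symbols and undefined references have already been read from its symbol table; profile data and import heuristics are ignored; and only direct references are followed (a real implementation might need the transitive closure if an imported function can trigger further imports). The function names are hypothetical.

```python
def dependence_sets(defined, referenced):
    """Map each module to the other modules it may import from.

    defined/referenced: module name -> set of symbol names, as read from each
    module's symbol table (defined symbols vs. undefined references).
    """
    definers = {}
    for mod, syms in defined.items():
        for sym in syms:
            definers.setdefault(sym, set()).add(mod)
    deps = {}
    for mod in defined.keys() | referenced.keys():
        refs = referenced.get(mod, set())
        # Only direct references are followed; symbols with no IR definer
        # (e.g. libc functions) contribute nothing.
        deps[mod] = {d for sym in refs for d in definers.get(sym, ()) if d != mod}
    return deps


def backends_to_rerun(deps, changed):
    """A module's backend is stale if it, or any module it may import from, changed."""
    return {mod for mod, ds in deps.items() if mod in changed or ds & changed}
```

For example, if a.o calls a function defined in b.o, then a change to b.o marks both backends stale, while a change to a.o alone leaves b.o's backend result reusable.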
>>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > -- >>>>>> >>>> >>> >> >> > Teresa Johnson | Software Engineer | >>>>>> tejohnson at google.com | >>>>>> >>>> >>> >> >> > 408-460-2413 >>>>>> >>>> >>> >> >> > >>>>>> >>>> >>> >> >> > _______________________________________________ >>>>>> >>>> >>> >> >> > LLVM Developers mailing list >>>>>> >>>> >>> >> >> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>> >>>> >>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Fri, May 15, 2015 at 10:07 AM, Dave Bozier <seifsta at gmail.com> wrote:
> > There is no need for emitting the full symtab. I checked the overhead
> > with a huge internal C++ source. The overhead of symtab + str table
> > compared with byte code with debug is about 3%.
>
> It's still sizable and could be noticeable if ThinLTO can deliver compile
> times closer to those of builds without LTO, as your results suggest.

If the cost is part of the index/summary, then it is avoidable.

> > More importantly, it is also possible to use the symtab for
> > index/summary purposes as well, which makes the space usage completely
> > 'unwasted'. That gets into the details which will follow when patches
> > are in.
>
> There is symbol information in both the native object symbol table and the
> bitcode file? Isn't that wasteful? I understand the reasons for using the
> native object wrapper (compatibility with other tools) and am happy with
> that. But I'd also like to see the option for function index/summary data
> to be produced without the wrapper, so that bitcode-aware tools do not need
> to use this wrapped format.

I agree.

> If you mix the native object wrapper symbol information with the
> function index/summary data, then that would end up being impossible.

It is possible. The summary data is still in its own section. Under the bitcode-only option, the symtab will be replaced with a bitcode form of the index, while the summary remains the same.

> Also won't having the native object data with the function index/summary
> have a cost on testing for all of the supported native object formats?

Yes.

Thanks,
David