thr3ads.net - llvm dev - [LLVMdev] RFC: ThinLTO Implementation Plan [May 2015]


David Blaikie

2015-May-14 20:11 UTC

[LLVMdev] RFC: ThinLTO Implementation Plan

On Thu, May 14, 2015 at 12:53 PM, Eric Christopher <echristo at gmail.com> wrote:
>
>
> On Thu, May 14, 2015 at 11:34 AM Daniel Berlin <dberlin at dberlin.org> wrote:
>
>> On Thu, May 14, 2015 at 11:14 AM, Eric Christopher <echristo at gmail.com> wrote:
>> > I'm not sure this is a particularly great assumption to make.
>>
>> Which part?
>>
>
> The binutils part :)
>
>
>>
>> > We have to support a lot of different build systems and tools, and
>> > concentrating on something that just binutils uses isn't particularly
>> > friendly here.
>> I think you may have misunderstood.
>> His point was exactly that they want to be transparent to *all of* these
>> tools.
>> You are saying "we should be friendly to everyone". He is saying the same
>> thing.
>> We should be friendly to everyone. The friendly way to do this is to
>> not require all of these tools to build plugins to handle bitcode.
>>
>> Hence, elf-wrapped bitcode.
>>
>
> Oh, I understood. I just don't know that I agree. Doing anything with the
> tools will require some knowledge of bitcode anyhow, or need the plugin. I'm
> saying that as a baseline start we should look at how to do this using the
> tools we've got rather than wrapping things for no real gain.
>
That doesn't seem strictly true, at least for the ar situation (which I'm led
to believe is in use in our build system, and in others, one would assume).
With the symbol table included as proposed, ar can be used without any
knowledge of the bitcode or need for a plugin.

It'd be helpful to have the scenarios we're trying to support with these
tools & then weigh up the alternatives.

> I've talked to Teresa a bit offline and we're going to talk more later
> (and discuss on the list), but there are some discussions about how to make
> this work with just bitcode/LLVM tools, and so not require integration on
> all platforms. The latter is what I consider particularly friendly :)
>
> -eric
>
>
>>
>>
>> > I also can't imagine how it's necessary for any of the LTO aspects as
>> > currently written in the proposal.
>> >
>> > -eric
>> >
>> > On Thu, May 14, 2015 at 9:26 AM Xinliang David Li <xinliangli at gmail.com> wrote:
>> >>
>> >> The design objective is to make ThinLTO mostly transparent to binutils
>> >> tools, to enable easy integration with any build system in the wild.
>> >> A 'pass-through' mode for 'ld -r', instead of the partial LTO mode, is
>> >> another reason.
>> >>
>> >> David
>> >>
>> >> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson <tejohnson at google.com> wrote:
>> >>>
>> >>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher <echristo at gmail.com> wrote:
>> >>> > So, what Alex is saying is that we have these tools as well, and they
>> >>> > understand bitcode just fine, as well as every object format, not just
>> >>> > ELF. :)
>> >>>
>> >>> Right, there are also LLVM-specific versions (llvm-ar, llvm-nm) that
>> >>> handle bitcode similarly to the way the standard tool + plugin does.
>> >>> But the goal we are trying to achieve is to allow the standard system
>> >>> versions of the tools to handle these files without requiring a
>> >>> plugin. I know the LLVM tools handle other object formats, but I'm not
>> >>> sure how that helps here. We're not planning to replace those tools,
>> >>> just allow the standard system versions to handle the intermediate
>> >>> objects produced by ThinLTO.
>> >>>
>> >>> Thanks,
>> >>> Teresa
>> >>>
>> >>> >
>> >>> > -eric
>> >>> >
>> >>> >
>> >>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson <tejohnson at google.com> wrote:
>> >>> >>
>> >>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li <xinliangli at gmail.com> wrote:
>> >>> >> >
>> >>> >> >
>> >>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg <alexr at leftfield.org> wrote:
>> >>> >> >>
>> >>> >> >> "ELF-wrapped bitcode" seems potentially controversial to me.
>> >>> >> >>
>> >>> >> >> What about ar, nm, and various ld implementations adds this
>> >>> >> >> requirement?
>> >>> >> >> What about the LLVM implementations of these tools is lacking?
>> >>> >> >
>> >>> >> >
>> >>> >> > Sorry, I cannot parse your questions properly. Can you make them
>> >>> >> > clearer?
>> >>> >>
>> >>> >> Alex is asking what the issue is with ar, nm, ld -r and regular
>> >>> >> bitcode that makes using ELF-wrapped bitcode easier.
>> >>> >>
>> >>> >> The issue is that generally you need to provide a plugin to these
>> >>> >> tools in order for them to understand and handle bitcode files. We'd
>> >>> >> like standard tools to work without requiring a plugin as much as
>> >>> >> possible. And in some cases we want these files to be handled
>> >>> >> differently from the way bitcode files are handled with the plugin.
>> >>> >>
>> >>> >> nm: Without a plugin, normal bitcode files are inscrutable. When
>> >>> >> provided the gold plugin, it can emit the symbols.
>> >>> >>
>> >>> >> ar: Without a plugin, it will create an archive of bitcode files, but
>> >>> >> without an index, so it can't be handled by the linker even with a
>> >>> >> plugin on an -flto link. When ar is provided the gold plugin, it does
>> >>> >> create an index, so the linker + gold plugin handle it appropriately
>> >>> >> on an -flto link.
>> >>> >>
>> >>> >> ld -r: Without a plugin, it fails when provided bitcode inputs. When
>> >>> >> provided the gold plugin, it handles them but compiles them all the
>> >>> >> way through to ELF executable instructions via a partial LTO link.
>> >>> >> This is where we would like to differ in behavior (while also not
>> >>> >> requiring a plugin) with ELF-wrapped bitcode: we would like the ld -r
>> >>> >> output file to still contain ELF-wrapped bitcode, delaying the LTO
>> >>> >> until the full link step.
>> >>> >>
>> >>> >> Let me know if that helps address your concerns.
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Teresa
>> >>> >>
>> >>> >> >
>> >>> >> > David
>> >>> >> >
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> Alex
>> >>> >> >>
>> >>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson <tejohnson at google.com> wrote:
>> >>> >> >> >
>> >>> >> >> > I've included below an RFC for implementing ThinLTO in LLVM;
>> >>> >> >> > looking forward to feedback and questions.
>> >>> >> >> > Thanks!
>> >>> >> >> > Teresa
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > RFC to discuss plans for implementing ThinLTO upstream. Background
>> >>> >> >> > can be found in slides from EuroLLVM 2015:
>> >>> >> >> > https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
>> >>> >> >> > As described in the talk, we have a prototype implementation, and
>> >>> >> >> > would like to start staging patches upstream. This RFC describes a
>> >>> >> >> > breakdown of the major pieces. We would like to commit upstream
>> >>> >> >> > gradually in several stages, with all functionality off by default.
>> >>> >> >> > The core ThinLTO importing support and tuning will require frequent
>> >>> >> >> > change and iteration during testing and tuning, and for that part we
>> >>> >> >> > would like to commit rapidly (off by default). See the proposed
>> >>> >> >> > staged implementation described in the Implementation Plan section.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > ThinLTO Overview
>> >>> >> >> > ================
>> >>> >> >> >
>> >>> >> >> > See the talk slides linked above for more details. The following is
>> >>> >> >> > a high-level overview of the motivation.
>> >>> >> >> >
>> >>> >> >> > Cross Module Optimization (CMO) is an effective means for improving
>> >>> >> >> > runtime performance, by extending the scope of optimizations across
>> >>> >> >> > source module boundaries. Without CMO, the compiler is limited to
>> >>> >> >> > optimizing within the scope of single source modules. Two solutions
>> >>> >> >> > for enabling CMO are Link-Time Optimization (LTO), which is currently
>> >>> >> >> > supported in LLVM and GCC, and Lightweight-Interprocedural
>> >>> >> >> > Optimization (LIPO). However, each of these solutions has limitations
>> >>> >> >> > that prevent it from being enabled by default. ThinLTO is a new
>> >>> >> >> > approach that attempts to address these limitations, with a goal of
>> >>> >> >> > being enabled more broadly. ThinLTO is designed with many of the same
>> >>> >> >> > principles as LIPO, and therefore shares its advantages, without any
>> >>> >> >> > of its inherent weaknesses. Unlike LIPO, where the module group
>> >>> >> >> > decision is made at profile training runtime, ThinLTO makes the
>> >>> >> >> > decision at compile time, but in a lazy mode that facilitates
>> >>> >> >> > large-scale parallelism. The serial linker plugin phase is designed
>> >>> >> >> > to be razor thin and blazingly fast. By default this step only does
>> >>> >> >> > minimal preparation work to enable the parallel lazy importing
>> >>> >> >> > performed later. ThinLTO aims to be scalable like a regular O2 build,
>> >>> >> >> > enabling CMO on machines without large memory configurations, while
>> >>> >> >> > also integrating well with distributed build systems. Results from
>> >>> >> >> > early prototyping on SPEC cpu2006 C++ benchmarks are in line with
>> >>> >> >> > expectations that ThinLTO can scale like O2 while enabling much of
>> >>> >> >> > the CMO performed during a full LTO build.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > A ThinLTO build is divided into 3 phases, which are referred to in
>> >>> >> >> > the following implementation plan:
>> >>> >> >> >
>> >>> >> >> > phase-1: IR and Function Summary Generation (-c compile)
>> >>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive linker step)
>> >>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing
>> >>> >> >> >
>> >>> >> >> > Implementation Plan
>> >>> >> >> > ===================
>> >>> >> >> >
>> >>> >> >> > This section gives a high-level breakdown of the ThinLTO support
>> >>> >> >> > that will be added, in roughly the order that the patches would be
>> >>> >> >> > staged. The patches are divided into three stages. The first stage
>> >>> >> >> > contains a minimal amount of preparation work that is not
>> >>> >> >> > ThinLTO-specific. The second stage contains most of the
>> >>> >> >> > infrastructure for ThinLTO, which will be off by default. The third
>> >>> >> >> > stage includes enhancements/improvements/tunings that can be
>> >>> >> >> > performed after the main ThinLTO infrastructure is in.
>> >>> >> >> >
>> >>> >> >> > The second and third implementation stages will initially be very
>> >>> >> >> > volatile, requiring a lot of iterations and tuning with large apps
>> >>> >> >> > to get stabilized. Therefore it will be important to do fast commits
>> >>> >> >> > for these implementation stages.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > 1. Stage 1: Preparation
>> >>> >> >> > -------------------------------
>> >>> >> >> >
>> >>> >> >> > The first planned sets of patches are enablers for ThinLTO work:
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > a. LTO directory structure:
>> >>> >> >> >
>> >>> >> >> > Restructure the LTO directory to
remove circular dependence
>> when
>> >>> >> >> > ThinLTO pass added. Because ThinLTO
is being implemented as a
>> SCC
>> >>> >> >> > pass
>> >>> >> >> > within Transforms/IPO, and
leverages the LTOModule class for
>> >>> >> >> > linking
>> >>> >> >> > in functions from modules, IPO then
requires the LTO library.
>> >>> >> >> > This
>> >>> >> >> > creates a circular dependence
between LTO and IPO. To break
>> that,
>> >>> >> >> > we
>> >>> >> >> > need to split the lib/LTO
directory/library into
>> lib/LTO/CodeGen
>> >>> >> >> > and
>> >>> >> >> > lib/LTO/Module, containing
LTOCodeGenerator and LTOModule,
>> >>> >> >> > respectively. Only LTOCodeGenerator
has a dependence on IPO,
>> >>> >> >> > removing
>> >>> >> >> > the circular dependence.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > b. ELF wrapper generation support:
>> >>> >> >> >
>> >>> >> >> > Implement an ELF-wrapped bitcode writer. In order to more easily
>> >>> >> >> > interact with tools such as $AR, $NM, and “$LD -r”, we plan to emit
>> >>> >> >> > the phase-1 bitcode wrapped in ELF via the .llvmbc section, along
>> >>> >> >> > with a symbol table. The goal is both to interact with these tools
>> >>> >> >> > without requiring a plugin, and also to avoid doing partial
>> >>> >> >> > LTO/ThinLTO across files linked with “$LD -r” (i.e. the resulting
>> >>> >> >> > object file should still contain ELF-wrapped bitcode to enable
>> >>> >> >> > ThinLTO at the full link step). I will send a separate design
>> >>> >> >> > document for these changes, but the following is a high-level
>> >>> >> >> > overview.
>> >>> >> >> >
>> >>> >> >> > Support was added to LLVM for reading ELF-wrapped bitcode
>> >>> >> >> > (http://reviews.llvm.org/rL218078), but there does not yet exist
>> >>> >> >> > support in LLVM/Clang for emitting bitcode wrapped in ELF. I plan to
>> >>> >> >> > add support for optionally generating bitcode in an ELF file
>> >>> >> >> > containing a single .llvmbc section holding the bitcode.
>> >>> >> >> > Specifically, the patch would add new options “emit-llvm-bc-elf”
>> >>> >> >> > (object file) and corresponding “emit-llvm-elf” (textual assembly
>> >>> >> >> > code equivalent). Eventually these would be automatically triggered
>> >>> >> >> > under “-fthinlto -c” and “-fthinlto -S”, respectively.
>> >>> >> >> >
>> >>> >> >> > Additionally, a symbol table will be generated in the ELF file,
>> >>> >> >> > holding the function symbols within the bitcode. This facilitates
>> >>> >> >> > handling archives of the ELF-wrapped bitcode created with $AR, since
>> >>> >> >> > the archive will have a symbol table as well. The archive symbol
>> >>> >> >> > table enables gold to extract and pass to the plugin the constituent
>> >>> >> >> > ELF-wrapped bitcode files. To support the concatenated .llvmbc
>> >>> >> >> > section generated by “$LD -r”, some handling needs to be added to
>> >>> >> >> > gold and to the backend driver to process each original module’s
>> >>> >> >> > bitcode.
>> >>> >> >> >
>> >>> >> >> > The function index/summary will later be added as a special ELF
>> >>> >> >> > section alongside the .llvmbc sections.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > 2. Stage 2: ThinLTO Infrastructure
>> >>> >> >> > ----------------------------------
>> >>> >> >> >
>> >>> >> >> > The next set of patches adds the base implementation of the ThinLTO
>> >>> >> >> > infrastructure, specifically those required to make ThinLTO
>> >>> >> >> > functional and generate correct but not necessarily high-performing
>> >>> >> >> > binaries. It also does not include support for making debug
>> >>> >> >> > information under -g efficient with ThinLTO.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > a. Clang/LLVM/gold linker options:
>> >>> >> >> >
>> >>> >> >> > An early set of clang/llvm patches is needed to provide options to
>> >>> >> >> > enable ThinLTO (off by default), so that the rest of the
>> >>> >> >> > implementation can be disabled by default as it is added.
>> >>> >> >> > Specifically, the clang option -fthinlto (used instead of -flto)
>> >>> >> >> > will cause clang to invoke the phase-1 emission of LLVM bitcode and
>> >>> >> >> > function summary/index on a compile step, and pass the appropriate
>> >>> >> >> > option to the gold plugin on a link step. The -thinlto option will
>> >>> >> >> > be added to the gold plugin and llvm-lto tool to launch the phase-2
>> >>> >> >> > thin archive step. The -thinlto option will also be added to the
>> >>> >> >> > ‘opt’ tool to invoke it as a phase-3 parallel backend instance.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > b. Thin-archive linking support in Gold plugin and llvm-lto:
>> >>> >> >> >
>> >>> >> >> > Under the new plugin option (see above), the plugin needs to perform
>> >>> >> >> > the phase-2 (thin archive) link, which simply emits a combined
>> >>> >> >> > function map from the linked modules, without actually performing
>> >>> >> >> > the normal link. Corresponding support should be added to the
>> >>> >> >> > standalone llvm-lto tool to enable testing/debugging without
>> >>> >> >> > involving the linker and plugin.
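A toy model of the phase-2 step may help: it only merges per-module summaries into one map keyed by function name and never touches function bodies, which is why it can be serial and still fast. The record layout (`module`, `offset`, `num_insts`) and the first-definition-wins rule for duplicates are assumptions for illustration, not the prototype's format.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FunctionEntry:
    module: str     # path of the IR file that defines the function
    offset: int     # offset of the function body within that module's bitcode
    num_insts: int  # a simple summary metric recorded in phase 1


def thin_link(summaries: Dict[str, Dict[str, Tuple[int, int]]]) -> Dict[str, FunctionEntry]:
    """Combine per-module {name: (offset, num_insts)} tables into one map."""
    combined: Dict[str, FunctionEntry] = {}
    for module in sorted(summaries):  # deterministic order across runs
        for name, (offset, num_insts) in summaries[module].items():
            # Keep the first copy seen, e.g. of a function defined in several modules.
            combined.setdefault(name, FunctionEntry(module, offset, num_insts))
    return combined


combined = thin_link({
    "b.bc": {"g": (64, 12), "h": (200, 3)},
    "a.bc": {"f": (32, 40), "h": (96, 3)},
})
print(combined["h"])  # FunctionEntry(module='a.bc', offset=96, num_insts=3)
```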
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > c. ThinLTO backend support:
>> >>> >> >> >
>> >>> >> >> > Support for invoking a phase-3 backend (including importing) on a
>> >>> >> >> > module should be added to the ‘opt’ tool under the new option. The
>> >>> >> >> > main changes under the option are to instantiate a Linker object
>> >>> >> >> > used to manage the process of linking imported functions into the
>> >>> >> >> > module, to read the combined function map efficiently, and to enable
>> >>> >> >> > the ThinLTO import pass.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > d. Function index/summary support:
>> >>> >> >> >
>> >>> >> >> > This includes infrastructure for writing and reading the function
>> >>> >> >> > index/summary section. As noted earlier, this will be encoded in a
>> >>> >> >> > special ELF section within the module, alongside the .llvmbc section
>> >>> >> >> > containing the bitcode. The thin archive generated by phase-2 of
>> >>> >> >> > ThinLTO simply contains all of the function index/summary sections
>> >>> >> >> > across the linked modules, organized for efficient function lookup.
>> >>> >> >> >
>> >>> >> >> > Each function available for importing from the module contains an
>> >>> >> >> > entry in the module’s function index/summary section and in the
>> >>> >> >> > resulting combined function map. Each function entry contains that
>> >>> >> >> > function’s offset within the bitcode file, used to efficiently
>> >>> >> >> > locate and quickly import just that function. The entry also
>> >>> >> >> > contains summary information (e.g. basic information determined
>> >>> >> >> > during parsing, such as the number of instructions in the function)
>> >>> >> >> > that will be used to help guide later import decisions. Because the
>> >>> >> >> > contents of this section will change frequently during ThinLTO
>> >>> >> >> > tuning, it should also be marked with a version id for backwards
>> >>> >> >> > compatibility or version checking.
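To make the entry contents and the version id concrete, here is a hypothetical binary encoding in Python using the standard `struct` module: a small header (magic tag, version, entry count) followed by fixed-size records holding a name offset and length into a trailing string blob, the function's bitcode offset, and its instruction count. Every field choice, including the `TLTO` tag, is an assumption; the RFC leaves the actual format to the implementation.

```python
import struct
from typing import Dict, Tuple

SUMMARY_MAGIC = b"TLTO"  # hypothetical tag
SUMMARY_VERSION = 1
HEADER = struct.Struct("<4sII")  # magic, version, entry count
ENTRY = struct.Struct("<IIQI")   # name offset, name length, bitcode offset, num insts


def write_summary(funcs: Dict[str, Tuple[int, int]]) -> bytes:
    """Serialize {name: (bitcode_offset, num_insts)} into a summary section."""
    names = b""
    records = []
    for name in sorted(funcs):
        encoded = name.encode()
        offset, num_insts = funcs[name]
        records.append(ENTRY.pack(len(names), len(encoded), offset, num_insts))
        names += encoded
    header = HEADER.pack(SUMMARY_MAGIC, SUMMARY_VERSION, len(records))
    return header + b"".join(records) + names


def read_summary(data: bytes) -> Dict[str, Tuple[int, int]]:
    """Parse a summary section, rejecting unknown versions."""
    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != SUMMARY_MAGIC:
        raise ValueError("not a summary section")
    if version != SUMMARY_VERSION:
        # A real reader might instead fall back to importing nothing.
        raise ValueError(f"unsupported summary version {version}")
    names_base = HEADER.size + count * ENTRY.size
    result: Dict[str, Tuple[int, int]] = {}
    for i in range(count):
        name_off, name_len, offset, num_insts = ENTRY.unpack_from(
            data, HEADER.size + i * ENTRY.size)
        start = names_base + name_off
        result[data[start:start + name_len].decode()] = (offset, num_insts)
    return result
```

Checking the version before decoding anything lets an older backend refuse a newer section cleanly while the format is still changing during tuning.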
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > e. ThinLTO importing support:
>> >>> >> >> >
>> >>> >> >> > Support for the mechanics of importing functions from other modules,
>> >>> >> >> > which can go in gradually as a set of patches since it will be off
>> >>> >> >> > by default. Separate patches can include:
>> >>> >> >> >
>> >>> >> >> > - BitcodeReader changes to use the function index to
>> >>> >> >> > import/deserialize a single function of interest (small changes,
>> >>> >> >> > leverages existing lazy streamer support).
>> >>> >> >> >
>> >>> >> >> > - Minor LTOModule changes to pass the ThinLTO function to import and
>> >>> >> >> > its index into the bitcode reader.
>> >>> >> >> >
>> >>> >> >> > - Marking of imported functions (for use in ThinLTO-specific symbol
>> >>> >> >> > linking and global DCE, for example). This can be in-memory
>> >>> >> >> > initially, but IR support may be required in order to support
>> >>> >> >> > streaming bitcode out and back in again after importing.
>> >>> >> >> >
>> >>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol linking and
>> >>> >> >> > static promotion when necessary. The linkage type of imported
>> >>> >> >> > functions changes to AvailableExternallyLinkage, for example.
>> >>> >> >> > Statics must be promoted in certain cases, and renamed in consistent
>> >>> >> >> > ways.
>> >>> >> >> >
>> >>> >> >> > - GlobalDCE changes to support removing imported functions that were
>> >>> >> >> > not inlined (very small changes to existing pass logic).
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > f. ThinLTO Import Driver SCC pass:
>> >>> >> >> >
>> >>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with a framework for doing ThinLTO
>> >>> >> >> > via an SCC pass, enabled only under -fthinlto options. The pass
>> >>> >> >> > includes utilizing the thin archive (global function index/summary),
>> >>> >> >> > import decision heuristics, invocation of LTOModule/ModuleLinker
>> >>> >> >> > routines that perform the import, and any necessary callgraph
>> >>> >> >> > updates and verification.
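A minimal sketch of what "import decision heuristics" over the global index could look like, assuming the only summary field is an instruction count and using a made-up threshold. Both the threshold value and the rule are illustrative; transitive imports, profile data, and callsite context are left out, and the RFC defers real tuning to Stage 3.

```python
from typing import Dict, List, Set


def choose_imports(callees: List[str], defined_locally: Set[str],
                   combined: Dict[str, int], max_insts: int = 100) -> List[str]:
    """Pick external callees small enough that inlining them is plausible.

    `combined` maps function name -> instruction count from the thin archive
    (a hypothetical shape); `max_insts` is an arbitrary illustrative threshold.
    """
    chosen: List[str] = []
    for callee in callees:
        if callee in defined_locally or callee in chosen:
            continue  # nothing to import, or already selected
        size = combined.get(callee)
        if size is not None and size <= max_insts:
            chosen.append(callee)
    return chosen


combined = {"small": 10, "big": 5000, "mid": 100}
print(choose_imports(["small", "big", "mid", "local", "unknown"], {"local"}, combined))
# ['small', 'mid']
```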
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > g. Backend Driver:
>> >>> >> >> >
>> >>> >> >> > For a single node build, the gold plugin can simply write a makefile
>> >>> >> >> > and fork the parallel backend instances directly via parallel make.
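For the single-node case, the makefile the plugin writes might resemble the output of the Python sketch below: one independent target per IR module, so `make -jN` can run the phase-3 backends in parallel. The command line is an assumption: `-thinlto` is the `opt` option proposed in this RFC rather than an existing flag, and passing the combined map positionally and naming outputs `<module>.o` are hypothetical placeholders.

```python
import os
import shlex
from typing import List

# Hypothetical phase-3 command line; `-thinlto` is the option proposed in this
# RFC, and passing the combined map positionally is an assumption.
DEFAULT_TEMPLATE = "opt -thinlto {index} {src} -o {out}"


def write_backend_makefile(modules: List[str], index: str,
                           template: str = DEFAULT_TEMPLATE) -> str:
    """Return makefile text with one independent backend job per IR module."""
    outputs = [os.path.splitext(m)[0] + ".o" for m in modules]
    lines = [".PHONY: all", "all: " + " ".join(outputs), ""]
    for src, out in zip(modules, outputs):
        # Each job depends only on its own module and the combined map, so
        # make is free to run the jobs concurrently.
        lines.append(f"{out}: {src} {index}")
        cmd = template.format(index=shlex.quote(index), src=shlex.quote(src),
                              out=shlex.quote(out))
        lines.append("\t" + cmd)  # make recipes must start with a tab
        lines.append("")
    return "\n".join(lines)


print(write_backend_makefile(["a.bc", "dir/b.bc"], "combined.thinlto"))
```

The plugin would then run something like `make -j8 -f <file>` and hand the resulting objects to the final native link; keeping the template configurable is what would let the same writer serve the distributed case in Stage 3.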
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements
>> >>> >> >> > -------------------------------------------
>> >>> >> >> >
>> >>> >> >> > This refers to the patches that are not required for ThinLTO to
>> >>> >> >> > work, but rather improve compile time, memory, run-time performance
>> >>> >> >> > and usability.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > a. Lazy Debug Metadata Linking:
>> >>> >> >> >
>> >>> >> >> > The prototype implementation included lazy importing of module-level
>> >>> >> >> > metadata during the ThinLTO pass finalization (i.e. after all
>> >>> >> >> > function importing is complete). This actually applies to all
>> >>> >> >> > module-level metadata, not just debug metadata, although debug
>> >>> >> >> > metadata is the largest. This can be added as a separate set of
>> >>> >> >> > patches, with changes to BitcodeReader, ValueMapper, and
>> >>> >> >> > ModuleLinker.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > b. Import Tuning:
>> >>> >> >> >
>> >>> >> >> > Tuning the import strategy will be an iterative process that will
>> >>> >> >> > continue to be refined over time. It involves several different
>> >>> >> >> > types of changes: adding support for recording additional metrics in
>> >>> >> >> > the function summary, such as profile data and optional
>> >>> >> >> > heavier-weight IPA analyses, and tuning the import heuristics based
>> >>> >> >> > on the summary and callsite context.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > c. Combined Function Map Pruning:
>> >>> >> >> >
>> >>> >> >> > The combined function map can be pruned of functions that are
>> >>> >> >> > unlikely to benefit from being imported. For example, during the
>> >>> >> >> > phase-2 thin archive plugin step we can safely omit large and (with
>> >>> >> >> > profile data) cold functions, which are unlikely to benefit from
>> >>> >> >> > being inlined. Additionally, all but one copy of comdat functions
>> >>> >> >> > can be suppressed.
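The pruning rules can be sketched as a filter over the combined map's entries. In this Python sketch the size limit, the `cold` flag (which would come from profile data), and the `comdat` marker are assumptions about what the summary records; duplicate comdat copies are modeled as repeated names coming from different modules.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Candidate:
    module: str
    num_insts: int
    cold: bool = False    # only meaningful when profile data exists
    comdat: bool = False  # function lives in a comdat (may appear in many modules)


def prune(entries: List[Tuple[str, Candidate]],
          max_insts: int = 500) -> List[Tuple[str, Candidate]]:
    """Drop entries unlikely to be worth importing, plus duplicate comdat copies."""
    kept: List[Tuple[str, Candidate]] = []
    seen_comdat: Set[str] = set()
    for name, entry in entries:
        if entry.num_insts > max_insts or entry.cold:
            continue  # large or cold: unlikely to be inlined profitably
        if entry.comdat:
            if name in seen_comdat:
                continue  # keep only the first copy of each comdat function
            seen_comdat.add(name)
        kept.append((name, entry))
    return kept
```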
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > d. Distributed Build System Integration:
>> >>> >> >> >
>> >>> >> >> > For a distributed build system, the gold plugin should write the
>> >>> >> >> > parallel backend invocations into a makefile, including the mapping
>> >>> >> >> > from the IR file to the real object file path, and exit. Additional
>> >>> >> >> > work needs to be done in the distributed build system itself to
>> >>> >> >> > distribute and dispatch the parallel backend jobs to the build
>> >>> >> >> > cluster.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > e. Dependence Tracking and Incremental Compiles:
>> >>> >> >> >
>> >>> >> >> > In order to support build systems that stage from local disks or
>> >>> >> >> > network storage, the plugin will optionally support computation of
>> >>> >> >> > dependent sets of IR files that each module may import from. This
>> >>> >> >> > can be computed from profile data, if it exists, or from the symbol
>> >>> >> >> > table and heuristics if not. These dependence sets also enable
>> >>> >> >> > support for incremental backend compiles.
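The symbol-table flavour of the dependence computation might look like the Python sketch below: for each module, the set of other IR files that define a symbol it references, plus the rebuild check those sets enable. It assumes each module is summarized by defined/referenced symbol sets (a hypothetical shape), ignores the profile-based variant and any heuristics, and tracks direct references only; if the backend imports transitively, the sets would need a transitive closure.

```python
from typing import Dict, Set


def import_dependences(defs: Dict[str, Set[str]],
                       refs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each module to the other IR files defining symbols it references."""
    owner: Dict[str, str] = {}
    for module in sorted(defs):
        for sym in defs[module]:
            owner.setdefault(sym, module)
    deps: Dict[str, Set[str]] = {}
    for module, used in refs.items():
        # Symbols with no IR definition (e.g. libc) add no dependence.
        deps[module] = {owner[s] for s in used if s in owner and owner[s] != module}
    return deps


def needs_rebuild(module: str, changed: Set[str],
                  deps: Dict[str, Set[str]]) -> bool:
    """A backend job must rerun if its module or any possible import source changed."""
    return module in changed or bool(deps.get(module, set()) & changed)


defs = {"a.bc": {"main"}, "b.bc": {"foo"}, "c.bc": {"bar"}}
refs = {"a.bc": {"foo", "printf"}, "b.bc": {"bar"}, "c.bc": set()}
deps = import_dependences(defs, refs)
print(needs_rebuild("a.bc", {"b.bc"}, deps))  # True
print(needs_rebuild("a.bc", {"c.bc"}, deps))  # False (direct references only)
```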
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > --
>> >>> >> >> > Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>> >>> >> >> >
>> >>> >> >> > _______________________________________________
>> >>> >> >> > LLVM Developers mailing list
>> >>> >> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Eric Christopher

2015-May-14 20:18 UTC


[LLVMdev] RFC: ThinLTO Implementation Plan

On Thu, May 14, 2015 at 1:11 PM David Blaikie <dblaikie at gmail.com> wrote:
> On Thu, May 14, 2015 at 12:53 PM, Eric Christopher <echristo at gmail.com> wrote:
>
>>
>>
>> On Thu, May 14, 2015 at 11:34 AM Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>>> On Thu, May 14, 2015 at 11:14 AM, Eric Christopher <echristo at gmail.com> wrote:
>>> > I'm not sure this is a particularly great assumption to make.
>>>
>>> Which part?
>>>
>>
>> The binutils part :)
>>
>>
>>>
>>> > We have to support a lot of different build systems and tools, and
>>> > concentrating on something that just binutils uses isn't particularly
>>> > friendly here.
>>> I think you may have misunderstood.
>>> His point was exactly that they want to be transparent to *all of* these
>>> tools.
>>> You are saying "we should be friendly to everyone". He is saying the
>>> same thing.
>>> We should be friendly to everyone. The friendly way to do this is to
>>> not require all of these tools to build plugins to handle bitcode.
>>>
>>> Hence, elf-wrapped bitcode.
>>>
>>
>> Oh, I understood. I just don't know that I agree. Doing anything with the
>> tools will require some knowledge of bitcode anyhow, or need the plugin.
>> I'm saying that as a baseline start we should look at how to do this using
>> the tools we've got rather than wrapping things for no real gain.
>>
>
> That doesn't seem strictly true, at least for the ar situation (which I'm
> led to believe is in use in our build system, and in others, one would
> assume). With the symbol table included as proposed, ar can be used without
> any knowledge of the bitcode or need for a plugin.
>

For some bits, sure. Optimizing for ar seems a bit silly; why not 'ld -r'? ;)

> It'd be helpful to have the scenarios we're trying to support with these
> tools & then weigh up the alternatives.
>
>
Agreed. The ar situation is interesting because one thing we discussed
after you wandered off was just adding a ToC section to bitcode as it is
and then having the tools handle that. Would seem to accomplish at least
the goals as I've seen them up to this point without worrying too much.

At any rate, I think this aspect of the proposal needs a bit of discussion
and some mapping out of the pros and cons here.

-eric

> I've talked to Teresa a bit offline and we're going to talk more
later
>> (and discuss on the list), but there are some discussions about how to
make
>> this work either with just bitcode/llvm tools and so not requiring
>> integration on all platforms. The latter is what I consider as
particularly
>> friendly :)
>>
>> -eric
>>
>>
>>>
>>>
>>> > I also
>>> > can't imagine how it's necessary for any of the lto
aspects as
>>> currently
>>> > written in the proposal.
>>> >
>>> > -eric
>>> >
>>> > On Thu, May 14, 2015 at 9:26 AM Xinliang David Li <
>>> xinliangli at gmail.com>
>>> > wrote:
>>> >>
>>> >> The design objective is to make thinLTO mostly transparent
to binutil
>>> >> tools to enable easy integration with any build system in
the wild.
>>> >> 'Pass-through' mode with 'ld -r' instead
of the partial LTO mode is
>>> another
>>> >> reason.
>>> >>
>>> >> David
>>> >>
>>> >> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson
<tejohnson at google.com
>>> >
>>> >> wrote:
>>> >>>
>>> >>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher <
>>> echristo at gmail.com>
>>> >>> wrote:
>>> >>> > So, what Alex is saying is that we have these
tools as well and
>>> they
>>> >>> > understand bitcode just fine, as well as every
object format - not
>>> just
>>> >>> > ELF.
>>> >>> > :)
>>> >>>
>>> >>> Right, there are also LLVM specific versions (llvm-ar,
llvm-nm) that
>>> >>> handle bitcode similarly to the way the standard tool
+ plugin does.
>>> >>> But the goal we are trying to achieve is to allow the
standard system
>>> >>> versions of the tools to handle these files without
requiring a
>>> >>> plugin. I know the LLVM tool handles other object
formats, but I'm
>>> not
>>> >>> sure how that helps here? We're not planning to
replace those tools,
>>> >>> just allow the standard system versions to handle the
intermediate
>>> >>> objects produced by ThinLTO.
>>> >>>
>>> >>> Thanks,
>>> >>> Teresa
>>> >>>
>>> >>> >
>>> >>> > -eric
>>> >>> >
>>> >>> >
>>> >>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson
<tejohnson at google.com
>>> >
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li
>>> >>> >> <xinliangli at gmail.com> wrote:
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg
>>> >>> >> > <alexr at leftfield.org> wrote:
>>> >>> >> >>
>>> >>> >> >> "ELF-wrapped bitcode" seems potentially controversial to me.
>>> >>> >> >>
>>> >>> >> >> What about ar, nm, and various ld implementations adds this
>>> >>> >> >> requirement?
>>> >>> >> >> What about the LLVM implementations of these tools is lacking?
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > Sorry, I cannot parse your questions properly. Can you make them
>>> >>> >> > clearer?
>>> >>> >>
>>> >>> >> Alex is asking what the issue is with ar, nm, ld -r and regular
>>> >>> >> bitcode that makes using ELF-wrapped bitcode easier.
>>> >>> >>
>>> >>> >> The issue is that generally you need to provide a plugin to these
>>> >>> >> tools in order for them to understand and handle bitcode files. We'd
>>> >>> >> like standard tools to work without requiring a plugin as much as
>>> >>> >> possible. And in some cases we want them to be handled differently
>>> >>> >> from the way bitcode files are handled with the plugin.
>>> >>> >>
>>> >>> >> nm: Without a plugin, normal bitcode files are inscrutable. When
>>> >>> >> provided the gold plugin, it can emit the symbols.
>>> >>> >>
>>> >>> >> ar: Without a plugin, it will create an archive of bitcode files, but
>>> >>> >> without an index, so it can't be handled by the linker even with a
>>> >>> >> plugin on an -flto link. When ar is provided the gold plugin, it does
>>> >>> >> create an index, so the linker + gold plugin handle it appropriately
>>> >>> >> on an -flto link.
>>> >>> >>
>>> >>> >> ld -r: Without a plugin, it fails when provided bitcode inputs. When
>>> >>> >> provided the gold plugin, it handles them but compiles them all the
>>> >>> >> way through to ELF executable instructions via a partial LTO link.
>>> >>> >> This is where we would like to differ in behavior (while also not
>>> >>> >> requiring a plugin) with ELF-wrapped bitcode: we would like the ld -r
>>> >>> >> output file to still contain ELF-wrapped bitcode, delaying the LTO
>>> >>> >> until the full link step.
>>> >>> >>
>>> >>> >> Let me know if that helps address your concerns.
>>> >>> >>
>>> >>> >> Thanks,
>>> >>> >> Teresa
>>> >>> >>
>>> >>> >> >
>>> >>> >> > David
>>> >>> >> >
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> Alex
>>> >>> >> >>
>>> >>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson
>>> >>> >> >> > <tejohnson at google.com> wrote:
>>> >>> >> >> >
>>> >>> >> >> > I've included below an RFC for implementing ThinLTO in LLVM,
>>> >>> >> >> > looking forward to feedback and questions.
>>> >>> >> >> > Thanks!
>>> >>> >> >> > Teresa
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > RFC to discuss plans for implementing ThinLTO upstream. Background
>>> >>> >> >> > can be found in slides from EuroLLVM 2015:
>>> >>> >> >> >
>>> >>> >> >> > https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
>>> >>> >> >> > As described in the talk, we have a prototype implementation, and
>>> >>> >> >> > would like to start staging patches upstream. This RFC describes a
>>> >>> >> >> > breakdown of the major pieces. We would like to commit upstream
>>> >>> >> >> > gradually in several stages, with all functionality off by default.
>>> >>> >> >> > The core ThinLTO importing support and tuning will require frequent
>>> >>> >> >> > change and iteration during testing and tuning, and for that part we
>>> >>> >> >> > would like to commit rapidly (off by default). See the proposed
>>> >>> >> >> > staged implementation described in the Implementation Plan section.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > ThinLTO Overview
>>> >>> >> >> > ================
>>> >>> >> >> >
>>> >>> >> >> > See the talk slides linked above for more details. The following
>>> >>> >> >> > is a high-level overview of the motivation.
>>> >>> >> >> >
>>> >>> >> >> > Cross Module Optimization (CMO) is an effective means for improving
>>> >>> >> >> > runtime performance, by extending the scope of optimizations across
>>> >>> >> >> > source module boundaries. Without CMO, the compiler is limited to
>>> >>> >> >> > optimizing within the scope of single source modules. Two solutions
>>> >>> >> >> > for enabling CMO are Link-Time Optimization (LTO), which is currently
>>> >>> >> >> > supported in LLVM and GCC, and Lightweight-Interprocedural
>>> >>> >> >> > Optimization (LIPO). However, each of these solutions has limitations
>>> >>> >> >> > that prevent it from being enabled by default. ThinLTO is a new
>>> >>> >> >> > approach that attempts to address these limitations, with a goal of
>>> >>> >> >> > being enabled more broadly. ThinLTO is designed with many of the same
>>> >>> >> >> > principles as LIPO, and therefore shares its advantages, without any
>>> >>> >> >> > of its inherent weaknesses. Unlike LIPO, where the module group
>>> >>> >> >> > decision is made at profile training runtime, ThinLTO makes the
>>> >>> >> >> > decision at compile time, but in a lazy mode that facilitates
>>> >>> >> >> > large-scale parallelism. The serial linker plugin phase is designed
>>> >>> >> >> > to be razor thin and blazingly fast. By default this step only does
>>> >>> >> >> > minimal preparation work to enable the parallel lazy importing
>>> >>> >> >> > performed later. ThinLTO aims to be scalable like a regular O2 build,
>>> >>> >> >> > enabling CMO on machines without large memory configurations, while
>>> >>> >> >> > also integrating well with distributed build systems. Results from
>>> >>> >> >> > early prototyping on SPEC cpu2006 C++ benchmarks are in line with
>>> >>> >> >> > expectations that ThinLTO can scale like O2 while enabling much of
>>> >>> >> >> > the CMO performed during a full LTO build.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > A ThinLTO build is divided into 3 phases, which are referred to in
>>> >>> >> >> > the following implementation plan:
>>> >>> >> >> >
>>> >>> >> >> > phase-1: IR and Function Summary Generation (-c compile)
>>> >>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive linker step)
>>> >>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > Implementation Plan
>>> >>> >> >> > ===================
>>> >>> >> >> >
>>> >>> >> >> > This section gives a high-level breakdown of the ThinLTO support
>>> >>> >> >> > that will be added, in roughly the order that the patches would be
>>> >>> >> >> > staged. The patches are divided into three stages. The first stage
>>> >>> >> >> > contains a minimal amount of preparation work that is not
>>> >>> >> >> > ThinLTO-specific. The second stage contains most of the
>>> >>> >> >> > infrastructure for ThinLTO, which will be off by default. The third
>>> >>> >> >> > stage includes enhancements/improvements/tunings that can be
>>> >>> >> >> > performed after the main ThinLTO infrastructure is in.
>>> >>> >> >> >
>>> >>> >> >> > The second and third implementation stages will initially be very
>>> >>> >> >> > volatile, requiring a lot of iterations and tuning with large apps
>>> >>> >> >> > to get stabilized. Therefore it will be important to do fast
>>> >>> >> >> > commits for these implementation stages.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > 1. Stage 1: Preparation
>>> >>> >> >> > -------------------------------
>>> >>> >> >> >
>>> >>> >> >> > The first planned sets of patches are enablers for ThinLTO work:
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > a. LTO directory structure:
>>> >>> >> >> >
>>> >>> >> >> > Restructure the LTO directory to remove the circular dependence
>>> >>> >> >> > that arises when the ThinLTO pass is added. Because ThinLTO is
>>> >>> >> >> > being implemented as an SCC pass within Transforms/IPO, and
>>> >>> >> >> > leverages the LTOModule class for linking in functions from
>>> >>> >> >> > modules, IPO then requires the LTO library. This creates a circular
>>> >>> >> >> > dependence between LTO and IPO. To break that, we need to split the
>>> >>> >> >> > lib/LTO directory/library into lib/LTO/CodeGen and lib/LTO/Module,
>>> >>> >> >> > containing LTOCodeGenerator and LTOModule, respectively. Only
>>> >>> >> >> > LTOCodeGenerator has a dependence on IPO, removing the circular
>>> >>> >> >> > dependence.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > b. ELF wrapper generation support:
>>> >>> >> >> >
>>> >>> >> >> > Implement an ELF-wrapped bitcode writer. In order to more easily
>>> >>> >> >> > interact with tools such as $AR, $NM, and “$LD -r”, we plan to emit
>>> >>> >> >> > the phase-1 bitcode wrapped in ELF via the .llvmbc section, along
>>> >>> >> >> > with a symbol table. The goal is both to interact with these tools
>>> >>> >> >> > without requiring a plugin, and also to avoid doing partial
>>> >>> >> >> > LTO/ThinLTO across files linked with “$LD -r” (i.e. the resulting
>>> >>> >> >> > object file should still contain ELF-wrapped bitcode to enable
>>> >>> >> >> > ThinLTO at the full link step). I will send a separate design
>>> >>> >> >> > document for these changes, but the following is a high-level
>>> >>> >> >> > overview.
>>> >>> >> >> >
>>> >>> >> >> > Support was added to LLVM for reading ELF-wrapped bitcode
>>> >>> >> >> > (http://reviews.llvm.org/rL218078), but there does not yet exist
>>> >>> >> >> > support in LLVM/Clang for emitting bitcode wrapped in ELF. I plan to
>>> >>> >> >> > add support for optionally generating bitcode in an ELF file
>>> >>> >> >> > containing a single .llvmbc section holding the bitcode.
>>> >>> >> >> > Specifically, the patch would add new options “emit-llvm-bc-elf”
>>> >>> >> >> > (object file) and corresponding “emit-llvm-elf” (textual assembly
>>> >>> >> >> > code equivalent). Eventually these would be automatically triggered
>>> >>> >> >> > under “-fthinlto -c” and “-fthinlto -S”, respectively.
>>> >>> >> >> >
>>> >>> >> >> > Additionally, a symbol table will be generated in the ELF file,
>>> >>> >> >> > holding the function symbols within the bitcode. This facilitates
>>> >>> >> >> > handling archives of the ELF-wrapped bitcode created with $AR,
>>> >>> >> >> > since the archive will have a symbol table as well. The archive
>>> >>> >> >> > symbol table enables gold to extract and pass to the plugin the
>>> >>> >> >> > constituent ELF-wrapped bitcode files. To support the concatenated
>>> >>> >> >> > .llvmbc section generated by “$LD -r”, some handling needs to be
>>> >>> >> >> > added to gold and to the backend driver to process each original
>>> >>> >> >> > module’s bitcode.
>>> >>> >> >> >
>>> >>> >> >> > The function index/summary will later be added as a special ELF
>>> >>> >> >> > section alongside the .llvmbc sections.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > 2. Stage 2: ThinLTO Infrastructure
>>> >>> >> >> > ----------------------------------------------
>>> >>> >> >> >
>>> >>> >> >> > The next set of patches adds the base implementation of the
>>> >>> >> >> > ThinLTO infrastructure, specifically those patches required to make
>>> >>> >> >> > ThinLTO functional and generate correct but not necessarily
>>> >>> >> >> > high-performing binaries. It also does not include the support
>>> >>> >> >> > needed to make debug information (-g) efficient with ThinLTO.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > a. Clang/LLVM/gold linker options:
>>> >>> >> >> >
>>> >>> >> >> > An early set of clang/llvm patches is needed to provide options to
>>> >>> >> >> > enable ThinLTO (off by default), so that the rest of the
>>> >>> >> >> > implementation can be disabled by default as it is added.
>>> >>> >> >> > Specifically, the clang option -fthinlto (used instead of -flto)
>>> >>> >> >> > will cause clang to invoke the phase-1 emission of LLVM bitcode and
>>> >>> >> >> > function summary/index on a compile step, and pass the appropriate
>>> >>> >> >> > option to the gold plugin on a link step. The -thinlto option will
>>> >>> >> >> > be added to the gold plugin and llvm-lto tool to launch the phase-2
>>> >>> >> >> > thin archive step. The -thinlto option will also be added to the
>>> >>> >> >> > ‘opt’ tool to invoke it as a phase-3 parallel backend instance.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > b. Thin-archive linking support in Gold plugin and llvm-lto:
>>> >>> >> >> >
>>> >>> >> >> > Under the new plugin option (see above), the plugin needs to
>>> >>> >> >> > perform the phase-2 (thin archive) link, which simply emits a
>>> >>> >> >> > combined function map from the linked modules, without actually
>>> >>> >> >> > performing the normal link. Corresponding support should be added
>>> >>> >> >> > to the standalone llvm-lto tool to enable testing/debugging without
>>> >>> >> >> > involving the linker and plugin.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > c. ThinLTO backend support:
>>> >>> >> >> >
>>> >>> >> >> > Support for a phase-3 backend invocation (including importing) on a
>>> >>> >> >> > module should be added to the ‘opt’ tool under the new option. The
>>> >>> >> >> > main change under the option is to instantiate a Linker object used
>>> >>> >> >> > to manage the process of linking imported functions into the
>>> >>> >> >> > module, efficiently read the combined function map, and enable the
>>> >>> >> >> > ThinLTO import pass.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > d. Function index/summary support:
>>> >>> >> >> >
>>> >>> >> >> > This includes infrastructure for writing and reading the function
>>> >>> >> >> > index/summary section. As noted earlier, this will be encoded in a
>>> >>> >> >> > special ELF section within the module, alongside the .llvmbc
>>> >>> >> >> > section containing the bitcode. The thin archive generated by
>>> >>> >> >> > phase-2 of ThinLTO simply contains all of the function
>>> >>> >> >> > index/summary sections across the linked modules, organized for
>>> >>> >> >> > efficient function lookup.
>>> >>> >> >> >
>>> >>> >> >> > Each function available for importing from the module has an entry
>>> >>> >> >> > in the module’s function index/summary section and in the resulting
>>> >>> >> >> > combined function map. Each function entry contains that function’s
>>> >>> >> >> > offset within the bitcode file, used to efficiently locate and
>>> >>> >> >> > import just that function. The entry also contains summary
>>> >>> >> >> > information (e.g. basic information determined during parsing,
>>> >>> >> >> > such as the number of instructions in the function) that will be
>>> >>> >> >> > used to help guide later import decisions. Because the contents of
>>> >>> >> >> > this section will change frequently during ThinLTO tuning, it
>>> >>> >> >> > should also be marked with a version id for backwards
>>> >>> >> >> > compatibility or version checking.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > e. ThinLTO importing support:
>>> >>> >> >> >
>>> >>> >> >> > Support for the mechanics of importing functions from other
>>> >>> >> >> > modules can go in gradually as a set of patches, since it will be
>>> >>> >> >> > off by default. Separate patches can include:
>>> >>> >> >> >
>>> >>> >> >> > - BitcodeReader changes to use the function index to
>>> >>> >> >> > import/deserialize a single function of interest (small changes,
>>> >>> >> >> > leverages existing lazy streamer support).
>>> >>> >> >> >
>>> >>> >> >> > - Minor LTOModule changes to pass the ThinLTO function to import
>>> >>> >> >> > and its index into the bitcode reader.
>>> >>> >> >> >
>>> >>> >> >> > - Marking of imported functions (for use in ThinLTO-specific symbol
>>> >>> >> >> > linking and global DCE, for example). This can be in-memory
>>> >>> >> >> > initially, but IR support may be required in order to support
>>> >>> >> >> > streaming bitcode out and back in again after importing.
>>> >>> >> >> >
>>> >>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol linking and
>>> >>> >> >> > static promotion when necessary. The linkage type of imported
>>> >>> >> >> > functions changes to AvailableExternallyLinkage, for example.
>>> >>> >> >> > Statics must be promoted in certain cases, and renamed in
>>> >>> >> >> > consistent ways.
>>> >>> >> >> >
>>> >>> >> >> > - GlobalDCE changes to support removing imported functions that
>>> >>> >> >> > were not inlined (very small changes to existing pass logic).
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > f. ThinLTO Import Driver SCC pass:
>>> >>> >> >> >
>>> >>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with a framework for doing ThinLTO
>>> >>> >> >> > via an SCC pass, enabled only under -fthinlto options. The pass
>>> >>> >> >> > covers use of the thin archive (global function index/summary),
>>> >>> >> >> > import decision heuristics, invocation of the LTOModule/ModuleLinker
>>> >>> >> >> > routines that perform the import, and any necessary callgraph
>>> >>> >> >> > updates and verification.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > g. Backend Driver:
>>> >>> >> >> >
>>> >>> >> >> > For a single node build, the gold plugin can simply write a
>>> >>> >> >> > makefile and fork the parallel backend instances directly via
>>> >>> >> >> > parallel make.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements
>>> >>> >> >> > ----------------------------------------------------------------
>>> >>> >> >> >
>>> >>> >> >> > This refers to the patches that are not required for ThinLTO to
>>> >>> >> >> > work, but rather improve compile time, memory, run-time
>>> >>> >> >> > performance and usability.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > a. Lazy Debug Metadata Linking:
>>> >>> >> >> >
>>> >>> >> >> > The prototype implementation included lazy importing of
>>> >>> >> >> > module-level metadata during the ThinLTO pass finalization (i.e.
>>> >>> >> >> > after all function importing is complete). This actually applies to
>>> >>> >> >> > all module-level metadata, not just debug metadata, although debug
>>> >>> >> >> > metadata is the largest. This can be added as a separate set of
>>> >>> >> >> > patches, with changes to BitcodeReader, ValueMapper, and
>>> >>> >> >> > ModuleLinker.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > b. Import Tuning:
>>> >>> >> >> >
>>> >>> >> >> > Tuning the import strategy will be an iterative process that will
>>> >>> >> >> > continue to be refined over time. It involves several different
>>> >>> >> >> > types of changes: adding support for recording additional metrics
>>> >>> >> >> > in the function summary, such as profile data and optional
>>> >>> >> >> > heavier-weight IPA analyses, and tuning the import heuristics based
>>> >>> >> >> > on the summary and callsite context.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > c. Combined Function Map Pruning:
>>> >>> >> >> >
>>> >>> >> >> > The combined function map can be pruned of functions that are
>>> >>> >> >> > unlikely to benefit from being imported. For example, during the
>>> >>> >> >> > phase-2 thin archive plugin step we can safely omit large and (with
>>> >>> >> >> > profile data) cold functions, which are unlikely to benefit from
>>> >>> >> >> > being inlined. Additionally, all but one copy of comdat functions
>>> >>> >> >> > can be suppressed.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > d. Distributed Build System Integration:
>>> >>> >> >> >
>>> >>> >> >> > For a distributed build system, the gold plugin should write the
>>> >>> >> >> > parallel backend invocations into a makefile, including the mapping
>>> >>> >> >> > from the IR file to the real object file path, and exit. Additional
>>> >>> >> >> > work needs to be done in the distributed build system itself to
>>> >>> >> >> > distribute and dispatch the parallel backend jobs to the build
>>> >>> >> >> > cluster.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > e. Dependence Tracking and Incremental Compiles:
>>> >>> >> >> >
>>> >>> >> >> > In order to support build systems that stage from local disks or
>>> >>> >> >> > network storage, the plugin will optionally support computation of
>>> >>> >> >> > dependent sets of IR files that each module may import from. This
>>> >>> >> >> > can be computed from profile data, if it exists, or from the symbol
>>> >>> >> >> > table and heuristics if not. These dependence sets also enable
>>> >>> >> >> > support for incremental backend compiles.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > --
>>> >>> >> >> > Teresa Johnson | Software Engineer | tejohnson at google.com |
>>> >>> >> >> > 408-460-2413
>>> >>> >> >> >
>>> >>> >> >> > _______________________________________________
>>> >>> >> >> > LLVM Developers mailing list
>>> >>> >> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> >>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
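The per-module function index/summary and the phase-2 combined function map described in the RFC above (Stage 2, item d, plus the pruning in Stage 3, item c) can be pictured with a small data-structure sketch. This is not LLVM code: it is a minimal Python model, and the names `FunctionEntry`, `ModuleIndex`, `build_combined_map`, `lookup`, the `INDEX_VERSION` value, and the default size threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical format version; the RFC only says the section should carry
# a version id for compatibility or version checking.
INDEX_VERSION = 1


@dataclass(frozen=True)
class FunctionEntry:
    """One entry of a module's function index/summary (illustrative fields)."""
    name: str
    bitcode_offset: int      # where the function starts in the module's bitcode
    num_instructions: int    # basic summary info gathered while parsing
    is_comdat: bool = False  # e.g. an inline function emitted in several modules
    is_cold: bool = False    # only known when profile data is available


@dataclass(frozen=True)
class ModuleIndex:
    path: str
    version: int
    entries: List[FunctionEntry]


def build_combined_map(
    modules: List[ModuleIndex],
    prune: bool = False,
    max_instructions: int = 100,
) -> Dict[str, Tuple[str, FunctionEntry]]:
    """Phase-2 sketch: merge per-module indexes into one name -> entry map."""
    combined: Dict[str, Tuple[str, FunctionEntry]] = {}
    for module in modules:
        if module.version != INDEX_VERSION:
            raise ValueError(f"{module.path}: unsupported index version {module.version}")
        for entry in module.entries:
            # Optional pruning: large or cold functions are unlikely to be
            # inlined, so importing them is unlikely to pay off.
            if prune and (entry.num_instructions > max_instructions or entry.is_cold):
                continue
            if entry.name in combined:
                if entry.is_comdat:
                    continue  # keep only one copy of a comdat function
                # Assumes statics were already given unique names, so a
                # repeated name here means two strong definitions.
                raise ValueError(f"duplicate definition of {entry.name}")
            combined[entry.name] = (module.path, entry)
    return combined


def lookup(
    combined: Dict[str, Tuple[str, FunctionEntry]], name: str
) -> Optional[Tuple[str, int]]:
    """Phase-3 view: which file and offset to read, or None if not importable."""
    hit = combined.get(name)
    return None if hit is None else (hit[0], hit[1].bitcode_offset)
```

For example, if `a.o` defines a small `foo` and a comdat `inl`, and `b.o` defines a 500-instruction `bar` plus its own copy of `inl`, then `build_combined_map([a, b], prune=True)` keeps only `foo` and the first copy of `inl`, so `lookup` returns `("a.o", 200)` for an `inl` stored at offset 200 and `None` for `bar`. Keying the map by name is what makes the phase-3 lookup cheap; a real encoding would also need to be compact on disk.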
>>> >>> >> >>
>>> >>> >> >
>>> >>> >> >
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >
>>> >
>>>
>>
>>
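Stage 2, item e of the RFC says statics must be promoted in certain cases and renamed in consistent ways. One way to picture that requirement: when a function that another module imports refers to a static in its home module, the static has to become externally visible, and every backend job (the home module's and each importer's) must derive the same new name for it. The Python below is a minimal sketch under those assumptions; the `.llvm.<id>` suffix, the SHA-1 prefix of the module path used as the id, and the names `module_id`, `promoted_name`, and `locals_to_promote` are all hypothetical, not necessarily what LLVM does.

```python
import hashlib
from typing import Dict, Set


def module_id(module_path: str) -> str:
    """Stable short id for a module (assumes every backend job sees the same path)."""
    return hashlib.sha1(module_path.encode("utf-8")).hexdigest()[:8]


def promoted_name(local_name: str, module_path: str) -> str:
    """Hypothetical naming scheme for a static promoted to global scope.

    The name depends only on the symbol and its home module, so the
    exporting backend and every importing backend compute the same string.
    """
    return f"{local_name}.llvm.{module_id(module_path)}"


def locals_to_promote(
    module_path: str,
    local_symbols: Set[str],
    refs: Dict[str, Set[str]],
    exported_functions: Set[str],
) -> Dict[str, str]:
    """Map each static referenced by an exported function to its promoted name.

    refs[f] is the set of symbols function f references; exported_functions
    are the functions of this module that other modules may import.
    """
    needed: Set[str] = set()
    for fn in exported_functions:
        # An imported copy of fn lives in another module, so any static it
        # touches must become externally visible under a unique name.
        needed |= refs.get(fn, set()) & local_symbols
    return {name: promoted_name(name, module_path) for name in sorted(needed)}
```

For example, if `foo` is the only exported function of `src/a.o` and it references the static `helper`, only `helper` is renamed; a static used solely by non-exported functions keeps its internal linkage, so promotion stays limited to what importing actually requires.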

Teresa Johnson

2015-May-14 20:35 UTC

[LLVMdev] RFC: ThinLTO Impementation Plan

On Thu, May 14, 2015 at 1:18 PM, Eric Christopher <echristo at gmail.com>
wrote:
>
>
> On Thu, May 14, 2015 at 1:11 PM David Blaikie <dblaikie at gmail.com>
> wrote:
>>
>> On Thu, May 14, 2015 at 12:53 PM, Eric Christopher <echristo at gmail.com>
>> wrote:
>>>
>>>
>>>
>>> On Thu, May 14, 2015 at 11:34 AM Daniel Berlin <dberlin at dberlin.org>
>>> wrote:
>>>>
>>>> On Thu, May 14, 2015 at 11:14 AM, Eric Christopher <echristo at gmail.com>
>>>> wrote:
>>>> > I'm not sure this is a particularly great assumption to make.
>>>>
>>>> Which part?
>>>
>>>
>>> The binutils part :)
>>>
>>>>
>>>>
>>>> > We have to support a lot of different build systems and tools, and
>>>> > concentrating on something that just binutils uses isn't particularly
>>>> > friendly here.
>>>> I think you may have misunderstood.
>>>> His point was exactly that they want to be transparent to *all of* these
>>>> tools.
>>>> You are saying "we should be friendly to everyone". He is saying the
>>>> same thing.
>>>> We should be friendly to everyone. The friendly way to do this is to
>>>> not require all of these tools to build plugins to handle bitcode.
>>>>
>>>> Hence, ELF-wrapped bitcode.
>>>
>>>
>>> Oh, I understood. I just don't know that I agree. Doing anything with the
>>> tools will require some knowledge of bitcode anyhow, or will need the
>>> plugin. I'm saying that as a baseline start we should look at how to do
>>> this using the tools we've got rather than wrapping things for no real
>>> gain.
>>
>>
>> That doesn't seem strictly true: consider the ar situation (which I'm led
>> to believe is in use in our build system & others, one would assume). With
>> the symbol table included as proposed, ar can be used without any
>> knowledge of the bitcode or need for a plugin.
>>
>
> For some bits, sure. Optimizing for ar seems a bit silly, why not 'ld -r'?
> ;)
But as mentioned, ld -r can work on native-object-wrapped bitcode without a
plugin as well.
>
>>
>> It'd be helpful to have the scenarios we're trying to support with these
>> tools & then weigh up the alternatives.
>>
>
>
> Agreed. The ar situation is interesting because one thing we discussed after
> you wandered off was just adding a ToC section to bitcode as it is and then
> having the tools handle that. Would seem to accomplish at least the goals as
> I've seen them up to this point without worrying too much.
The ToC section is a way we can encode the function index/summary into
bitcode, but it won't help us integrate with existing tools. The main issue
we are trying to solve is integrating transparently with the existing
binutils tools in use in our build system and probably elsewhere.
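The ar point in this exchange comes down to where an archiver gets the symbols for its index. As a rough model (no real ELF or ar parsing, and hypothetical names `Member` and `build_archive_index`): a plugin-less tool only understands native object files, so a raw bitcode member contributes nothing to the index, while ELF-wrapped bitcode exposes its function symbols through an ordinary symbol table. The two magic numbers are the real leading bytes of ELF files and raw LLVM bitcode files; everything else is an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ELF_MAGIC = b"\x7fELF"
BITCODE_MAGIC = b"BC\xc0\xde"  # first bytes of a raw LLVM bitcode file


@dataclass
class Member:
    """An archive member: its name, leading bytes, and the symbols it defines."""
    name: str
    header: bytes
    symbols: List[str] = field(default_factory=list)


def build_archive_index(members: List[Member], have_plugin: bool = False) -> Dict[str, str]:
    """Map each visible defined symbol to the first member that provides it."""
    index: Dict[str, str] = {}
    for member in members:
        if member.header.startswith(ELF_MAGIC):
            # ELF-wrapped bitcode carries an ordinary symbol table,
            # so no knowledge of bitcode is needed to read it.
            visible = member.symbols
        elif member.header.startswith(BITCODE_MAGIC) and have_plugin:
            # Only a bitcode-aware plugin can recover symbols from raw bitcode.
            visible = member.symbols
        else:
            visible = []  # opaque to this tool: contributes nothing to the index
        for symbol in visible:
            index.setdefault(symbol, member.name)
    return index
```

Without a plugin, an archive made only of raw bitcode files ends up with an empty index, which matches the earlier description in this thread of why such an archive can't be used on an -flto link; the ELF-wrapped members are indexed either way.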
>
> At any rate, I think this aspect of the proposal needs a bit of discussion
> and some mapping out of the pros and cons here.
Sure, we can continue to discuss and I will try to lay out the pros/cons.

Teresa
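Stage 3, item e of the RFC proposes computing, for each module, the set of IR files it may import from, using profile data when it exists and otherwise the symbol table plus heuristics. The sketch below covers only the symbol-table case with the simplest possible heuristic (a module depends on every other module that defines a function it references), and it follows direct references only; a real implementation might need a transitive closure if imported functions can trigger further imports. `dependence_sets` and `needs_backend_rebuild` are hypothetical names.

```python
from typing import Dict, Set


def dependence_sets(
    defined: Dict[str, Set[str]],
    referenced: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Map each module to the other modules it may import functions from.

    defined[m] / referenced[m] are the function symbols module m defines /
    references, as they might be read from each file's symbol table.
    """
    # Invert "module defines symbol" into "symbol -> defining modules".
    definers: Dict[str, Set[str]] = {}
    for module, symbols in defined.items():
        for symbol in symbols:
            definers.setdefault(symbol, set()).add(module)

    deps: Dict[str, Set[str]] = {}
    for module, refs in referenced.items():
        deps[module] = {
            other
            for symbol in refs
            for other in definers.get(symbol, set())
            if other != module
        }
    return deps


def needs_backend_rebuild(module: str, changed: Set[str], deps: Dict[str, Set[str]]) -> bool:
    """Incremental builds: rerun a module's backend if it or a dependency changed."""
    return module in changed or bool(deps.get(module, set()) & changed)
```

With `a.o` calling `foo` from `b.o`, and `b.o` calling `bar` from `c.o`, the sets are `{b.o}` for `a.o` and `{c.o}` for `b.o`. Because only direct references are followed, a change to `c.o` does not mark `a.o` stale in that chain, which is exactly the case where a transitive closure would matter.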
>
> -eric
>
>>>
>>> I've talked to Teresa a bit offline and we're going to talk more later
>>> (and discuss on the list), but there are some discussions about how to make
>>> this work either with just bitcode/llvm tools and so not requiring
>>> integration on all platforms. The latter is what I consider as particularly
>>> friendly :)
>>>
>>> -eric
>>>
>>>>
>>>>
>>>>
>>>> > I also can't imagine how it's necessary for any of the LTO aspects as
>>>> > currently written in the proposal.
>>>> >
>>>> > -eric
>>>> >
>>>> > On Thu, May 14, 2015 at 9:26 AM Xinliang David Li
>>>> > <xinliangli at gmail.com> wrote:
>>>> >>
>>>> >> The design objective is to make ThinLTO mostly transparent to binutils
>>>> >> tools, to enable easy integration with any build system in the wild.
>>>> >> A 'pass-through' mode for 'ld -r', instead of the partial LTO mode, is
>>>> >> another reason.
>>>> >>
>>>> >> David
>>>> >>
>>>> >> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson
>>>> >> <tejohnson at google.com> wrote:
>>>> >>>
>>>> >>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher
>>>> >>> <echristo at gmail.com> wrote:
>>>> >>> > So, what Alex is saying is that we have these tools as well and they
>>>> >>> > understand bitcode just fine, as well as every object format - not
>>>> >>> > just ELF.
>>>> >>> > :)
>>>> >>>
>>>> >>> Right, there are also LLVM-specific versions (llvm-ar, llvm-nm) that
>>>> >>> handle bitcode similarly to the way the standard tool + plugin does.
>>>> >>> But the goal we are trying to achieve is to allow the standard system
>>>> >>> versions of the tools to handle these files without requiring a
>>>> >>> plugin. I know the LLVM tools handle other object formats, but I'm
>>>> >>> not sure how that helps here. We're not planning to replace those
>>>> >>> tools, just to allow the standard system versions to handle the
>>>> >>> intermediate objects produced by ThinLTO.
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Teresa
>>>> >>>
>>>> >>> >
>>>> >>> > -eric
>>>> >>> >
>>>> >>> >
>>>> >>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson
>>>> >>> > <tejohnson at google.com> wrote:
>>>> >>> >>
>>>> >>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li
>>>> >>> >> <xinliangli at gmail.com> wrote:
>>>> >>> >> >
>>>> >>> >> >
>>>> >>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg
>>>> >>> >> > <alexr at leftfield.org> wrote:
>>>> >>> >> >>
>>>> >>> >> >> "ELF-wrapped bitcode" seems potentially controversial to me.
>>>> >>> >> >>
>>>> >>> >> >> What about ar, nm, and various ld implementations adds this
>>>> >>> >> >> requirement?
>>>> >>> >> >> What about the LLVM implementations of these tools is lacking?
>>>> >>> >> >
>>>> >>> >> >
>>>> >>> >> > Sorry, I cannot parse your questions properly. Can you make them
>>>> >>> >> > clearer?
>>>> >>> >>
>>>> >>> >> Alex is asking what the issue is with ar, nm, ld -r and regular
>>>> >>> >> bitcode that makes using ELF-wrapped bitcode easier.
>>>> >>> >>
>>>> >>> >> The issue is that generally you need to provide a plugin to these
>>>> >>> >> tools in order for them to understand and handle bitcode files.
>>>> >>> >> We'd like standard tools to work without requiring a plugin as much
>>>> >>> >> as possible. And in some cases we want them to be handled
>>>> >>> >> differently from the way bitcode files are handled with the plugin.
>>>> >>> >>
>>>> >>> >> nm: Without a plugin, normal bitcode files are inscrutable. When
>>>> >>> >> provided the gold plugin, it can emit the symbols.
>>>> >>> >>
>>>> >>> >> ar: Without a plugin, it will create an archive of bitcode files,
>>>> >>> >> but without an index, so it can't be handled by the linker even
>>>> >>> >> with a plugin on an -flto link. When ar is provided the gold plugin,
>>>> >>> >> it does create an index, so the linker + gold plugin handle it
>>>> >>> >> appropriately on an -flto link.
>>>> >>> >>
>>>> >>> >> ld -r: Without a plugin, it fails when provided bitcode inputs. When
>>>> >>> >> provided the gold plugin, it handles them but compiles them all the
>>>> >>> >> way through to ELF executable instructions via a partial LTO link.
>>>> >>> >> This is where we would like to differ in behavior (while also not
>>>> >>> >> requiring a plugin) with ELF-wrapped bitcode: we would like the
>>>> >>> >> ld -r output file to still contain ELF-wrapped bitcode, delaying the
>>>> >>> >> LTO until the full link step.
>>>> >>> >>
>>>> >>> >> Let me know if that helps address your concerns.
>>>> >>> >>
>>>> >>> >> Thanks,
>>>> >>> >> Teresa
>>>> >>> >>
>>>> >>> >> >
>>>> >>> >> > David
>>>> >>> >> >
>>>> >>> >> >>
>>>> >>> >> >>
>>>> >>> >> >> Alex
>>>> >>> >> >>
>>>> >>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson
>>>> >>> >> >> > <tejohnson at google.com> wrote:
>>>> >>> >> >> >
>>>> >>> >> >> > I've included below an RFC for implementing ThinLTO in LLVM,
>>>> >>> >> >> > looking forward to feedback and questions.
>>>> >>> >> >> > Thanks!
>>>> >>> >> >> > Teresa
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > RFC to discuss plans for implementing ThinLTO upstream.
>>>> >>> >> >> > Background can be found in slides from EuroLLVM 2015:
>>>> >>> >> >> > https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
>>>> >>> >> >> > As described in the talk, we have a prototype implementation,
>>>> >>> >> >> > and would like to start staging patches upstream. This RFC
>>>> >>> >> >> > describes a breakdown of the major pieces. We would like to
>>>> >>> >> >> > commit upstream gradually in several stages, with all
>>>> >>> >> >> > functionality off by default. The core ThinLTO importing
>>>> >>> >> >> > support and tuning will require frequent change and iteration
>>>> >>> >> >> > during testing and tuning, and for that part we would like to
>>>> >>> >> >> > commit rapidly (off by default). See the proposed staged
>>>> >>> >> >> > implementation described in the Implementation Plan section.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > ThinLTO Overview
>>>> >>> >> >> > ================
>>>> >>> >> >> >
>>>> >>> >> >> > See the talk slides linked above for more details. The
>>>> >>> >> >> > following is a high-level overview of the motivation.
>>>> >>> >> >> >
>>>> >>> >> >> > Cross Module Optimization (CMO) is an effective means for
>>>> >>> >> >> > improving runtime performance, by extending the scope of
>>>> >>> >> >> > optimizations across source module boundaries. Without CMO,
>>>> >>> >> >> > the compiler is limited to optimizing within the scope of
>>>> >>> >> >> > single source modules. Two solutions for enabling CMO are
>>>> >>> >> >> > Link-Time Optimization (LTO), which is currently supported in
>>>> >>> >> >> > LLVM and GCC, and Lightweight-Interprocedural Optimization
>>>> >>> >> >> > (LIPO). However, each of these solutions has limitations that
>>>> >>> >> >> > prevent it from being enabled by default. ThinLTO is a new
>>>> >>> >> >> > approach that attempts to address these limitations, with a
>>>> >>> >> >> > goal of being enabled more broadly. ThinLTO is designed with
>>>> >>> >> >> > many of the same principles as LIPO, and therefore shares its
>>>> >>> >> >> > advantages, without any of its inherent weaknesses. Unlike in
>>>> >>> >> >> > LIPO, where the module group decision is made at profile
>>>> >>> >> >> > training runtime, ThinLTO makes the decision at compile time,
>>>> >>> >> >> > but in a lazy mode that facilitates large scale parallelism.
>>>> >>> >> >> > The serial linker plugin phase is designed to be razor thin
>>>> >>> >> >> > and blazingly fast. By default this step only does minimal
>>>> >>> >> >> > preparation work to enable the parallel lazy importing
>>>> >>> >> >> > performed later. ThinLTO aims to be scalable like a regular O2
>>>> >>> >> >> > build, enabling CMO on machines without large memory
>>>> >>> >> >> > configurations, while also integrating well with distributed
>>>> >>> >> >> > build systems. Results from early prototyping on SPEC cpu2006
>>>> >>> >> >> > C++ benchmarks are in line with expectations that ThinLTO can
>>>> >>> >> >> > scale like O2 while enabling much of the CMO performed during
>>>> >>> >> >> > a full LTO build.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > A ThinLTO build is divided into 3 phases, which are referred
>>>> >>> >> >> > to in the following implementation plan:
>>>> >>> >> >> >
>>>> >>> >> >> > phase-1: IR and Function Summary Generation (-c compile)
>>>> >>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive linker step)
>>>> >>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing
>>>> >>> >> >> >
>>>> >>> >> >> >
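A toy model of how the three phases fit together, in Python. Modules are plain dictionaries, the "summary" is just an instruction count, and importing is reduced to a single size threshold; none of these names or the threshold come from the RFC. The point is only the shape: per-module summaries, one cheap serial merge, then independent per-module backends that can run in parallel. (In a real build phase-1 is also one parallel job per module; it runs serially here for brevity.)

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Toy module: function name -> (instruction count, names of functions it calls).
Module = Dict[str, Tuple[int, List[str]]]
CombinedMap = Dict[str, Tuple[str, int]]  # function -> (defining module, size)


def phase1_summary(module: Module) -> Dict[str, int]:
    """Phase-1 (per-module -c compile): emit a tiny per-function summary."""
    return {fn: insts for fn, (insts, _calls) in module.items()}


def phase2_thin_link(summaries: Dict[str, Dict[str, int]]) -> CombinedMap:
    """Phase-2 (serial, thin): merge the summaries into one combined map."""
    combined: CombinedMap = {}
    for mod_name, summary in summaries.items():
        for fn, insts in summary.items():
            combined.setdefault(fn, (mod_name, insts))
    return combined


def phase3_backend(mod_name: str, module: Module, combined: CombinedMap,
                   limit: int = 10) -> List[str]:
    """Phase-3 (one job per module): pick small external callees to import."""
    wanted = set()
    for _insts, calls in module.values():
        for callee in calls:
            entry = combined.get(callee)
            if entry is not None and entry[0] != mod_name and entry[1] <= limit:
                wanted.add(callee)
    return sorted(wanted)


def thinlto_build(modules: Dict[str, Module]) -> Dict[str, List[str]]:
    summaries = {name: phase1_summary(m) for name, m in modules.items()}
    combined = phase2_thin_link(summaries)
    # Backends only share the read-only combined map, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        jobs = {name: pool.submit(phase3_backend, name, m, combined)
                for name, m in modules.items()}
        return {name: job.result() for name, job in jobs.items()}
```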
>>>> >>> >> >> > Implementation Plan
>>>> >>> >> >> > ===================
>>>> >>> >> >> >
>>>> >>> >> >> > This section gives a high-level breakdown of the ThinLTO
>>>> >>> >> >> > support that will be added, in roughly the order that the
>>>> >>> >> >> > patches would be staged. The patches are divided into three
>>>> >>> >> >> > stages. The first stage contains a minimal amount of
>>>> >>> >> >> > preparation work that is not ThinLTO-specific. The second
>>>> >>> >> >> > stage contains most of the infrastructure for ThinLTO, which
>>>> >>> >> >> > will be off by default. The third stage includes
>>>> >>> >> >> > enhancements/improvements/tunings that can be performed after
>>>> >>> >> >> > the main ThinLTO infrastructure is in.
>>>> >>> >> >> >
>>>> >>> >> >> > The second and third implementation stages will initially be
>>>> >>> >> >> > very volatile, requiring many iterations and tuning with large
>>>> >>> >> >> > apps to stabilize. Therefore it will be important to commit
>>>> >>> >> >> > quickly during these implementation stages.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > 1. Stage 1: Preparation
>>>> >>> >> >> > -------------------------------
>>>> >>> >> >> >
>>>> >>> >> >> > The first planned sets of patches are enablers for ThinLTO
>>>> >>> >> >> > work:
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > a. LTO directory structure:
>>>> >>> >> >> >
>>>> >>> >> >> > Restructure the LTO directory to remove a circular dependence
>>>> >>> >> >> > once the ThinLTO pass is added. Because ThinLTO is being
>>>> >>> >> >> > implemented as an SCC pass within Transforms/IPO, and leverages
>>>> >>> >> >> > the LTOModule class for linking in functions from modules, IPO
>>>> >>> >> >> > then requires the LTO library. This creates a circular
>>>> >>> >> >> > dependence between LTO and IPO. To break it, we need to split
>>>> >>> >> >> > the lib/LTO directory/library into lib/LTO/CodeGen and
>>>> >>> >> >> > lib/LTO/Module, containing LTOCodeGenerator and LTOModule,
>>>> >>> >> >> > respectively. Only LTOCodeGenerator has a dependence on IPO,
>>>> >>> >> >> > removing the circular dependence.
>>>> >>> >> >> >
>>>> >>> >> >> >
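The cycle, and why the split removes it, can be checked mechanically. Below is a small Python sketch that runs a depth-first search and reports a back edge if one exists. The two graphs are hand-written and heavily simplified assumptions (the real library dependencies are much larger); they only encode the edges described above.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of library names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, finished
    color = {lib: WHITE for lib in deps}
    stack: List[str] = []

    def visit(lib: str) -> Optional[List[str]]:
        color[lib] = GREY
        stack.append(lib)
        for dep in deps.get(lib, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # Back edge: the cycle is the stack suffix starting at `dep`.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[lib] = BLACK
        return None

    for lib in deps:
        if color[lib] == WHITE:
            found = visit(lib)
            if found:
                return found
    return None
```

Before the split, `{"IPO": ["LTO"], "LTO": ["IPO"]}` yields the cycle IPO -> LTO -> IPO; after it, IPO only needs LTO/Module and only LTO/CodeGen needs IPO, so no cycle remains.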
>>>> >>> >> >> > b. ELF wrapper generation support:
>>>> >>> >> >> >
>>>> >>> >> >> > Implement an ELF-wrapped bitcode writer. In order to more
>>>> >>> >> >> > easily interact with tools such as $AR, $NM, and “$LD -r” we
>>>> >>> >> >> > plan to emit the phase-1 bitcode wrapped in ELF via the
>>>> >>> >> >> > .llvmbc section, along with a symbol table. The goal is both
>>>> >>> >> >> > to interact with these tools without requiring a plugin, and
>>>> >>> >> >> > also to avoid doing partial LTO/ThinLTO across files linked
>>>> >>> >> >> > with “$LD -r” (i.e. the resulting object file should still
>>>> >>> >> >> > contain ELF-wrapped bitcode to enable ThinLTO at the full link
>>>> >>> >> >> > step). I will send a separate design document for these
>>>> >>> >> >> > changes, but the following is a high-level overview.
>>>> >>> >> >> >
>>>> >>> >> >> > Support was added to LLVM for reading ELF-wrapped bitcode
>>>> >>> >> >> > (http://reviews.llvm.org/rL218078), but there does not yet
>>>> >>> >> >> > exist support in LLVM/Clang for emitting bitcode wrapped in
>>>> >>> >> >> > ELF. I plan to add support for optionally generating bitcode
>>>> >>> >> >> > in an ELF file containing a single .llvmbc section holding the
>>>> >>> >> >> > bitcode. Specifically, the patch would add new options
>>>> >>> >> >> > “emit-llvm-bc-elf” (object file) and corresponding
>>>> >>> >> >> > “emit-llvm-elf” (textual assembly code equivalent).
>>>> >>> >> >> > Eventually these would be automatically triggered under
>>>> >>> >> >> > “-fthinlto -c” and “-fthinlto -S”, respectively.
>>>> >>> >> >> >
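For readers less familiar with ELF, here is a self-contained Python sketch of the basic idea of an ELF file whose only content section is .llvmbc. One function builds a minimal little-endian ELF64 relocatable object around a bitcode payload; the other locates a named section by walking the section header table. It is illustrative only: it omits the symbol table the RFC calls for, hard-codes x86-64 as the machine type, and is not the layout LLVM or Clang would actually emit.

```python
import struct
from typing import Optional

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr (64 bytes)
SHDR = struct.Struct("<IIQQQQIIQQ")        # Elf64_Shdr (64 bytes)
SHT_PROGBITS, SHT_STRTAB = 1, 3
ET_REL, EM_X86_64 = 1, 62


def wrap_in_elf(payload: bytes, section: str = ".llvmbc") -> bytes:
    """Build a minimal relocatable ELF64 (little-endian) holding `payload`."""
    name = section.encode()
    # Section-name string table; offset 0 must be the empty name.
    shstrtab = b"\0" + name + b"\0.shstrtab\0"
    payload_off = EHDR.size
    strtab_off = payload_off + len(payload)
    shoff = strtab_off + len(shstrtab)
    pad = (-shoff) % 8                    # align the section header table
    shoff += pad
    # e_ident: magic, ELFCLASS64, ELFDATA2LSB, EV_CURRENT, zero padding.
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    ehdr = EHDR.pack(ident, ET_REL, EM_X86_64, 1, 0, 0, shoff, 0,
                     EHDR.size, 0, 0, SHDR.size, 3, 2)
    shdrs = [
        SHDR.pack(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),  # mandatory null section
        SHDR.pack(1, SHT_PROGBITS, 0, 0, payload_off, len(payload), 0, 0, 1, 0),
        SHDR.pack(2 + len(name), SHT_STRTAB, 0, 0, strtab_off,
                  len(shstrtab), 0, 0, 1, 0),
    ]
    return ehdr + payload + shstrtab + bytes(pad) + b"".join(shdrs)


def find_section(elf: bytes, wanted: str) -> Optional[bytes]:
    """Return the contents of the section named `wanted`, or None."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("this sketch only reads little-endian ELF64")
    eh = EHDR.unpack_from(elf, 0)
    shoff, shentsize, shnum, shstrndx = eh[6], eh[11], eh[12], eh[13]

    def shdr(i: int) -> tuple:
        return SHDR.unpack_from(elf, shoff + i * shentsize)

    names_sh = shdr(shstrndx)
    names = elf[names_sh[4]:names_sh[4] + names_sh[5]]
    for i in range(shnum):
        name_off, _type, _flags, _addr, off, size = shdr(i)[:6]
        name = names[name_off:names.index(b"\0", name_off)].decode()
        if name == wanted:
            return elf[off:off + size]
    return None
```

Because the output is an ordinary ELF object, generic tools can open it without knowing anything about bitcode, which is the property the RFC is after.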
>>>> >>> >> >> > Additionally, a symbol table will be generated in the ELF
>>>> >>> >> >> > file, holding the function symbols within the bitcode. This
>>>> >>> >> >> > facilitates handling archives of the ELF-wrapped bitcode
>>>> >>> >> >> > created with $AR, since the archive will have a symbol table
>>>> >>> >> >> > as well. The archive symbol table enables gold to extract and
>>>> >>> >> >> > pass to the plugin the constituent ELF-wrapped bitcode files.
>>>> >>> >> >> > To support the concatenated .llvmbc section generated by
>>>> >>> >> >> > “$LD -r”, some handling needs to be added to gold and to the
>>>> >>> >> >> > backend driver to process each original module’s bitcode.
>>>> >>> >> >> >
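One possible way to recover each original module from a concatenated .llvmbc section is sketched below in Python, under two assumptions. First, every module's bitcode is prefixed by LLVM's bitcode wrapper header (magic 0x0B17C0DE, then version, offset, size and CPU type as little-endian 32-bit words), so each blob records its own length. Second, the only bytes between inputs are zero alignment padding. The RFC does not say how gold or the backend driver will do this, and the helper names are hypothetical.

```python
import struct
from typing import List

WRAPPER = struct.Struct("<IIIII")  # magic, version, offset, size, cputype
WRAPPER_MAGIC = 0x0B17C0DE
RAW_BC_MAGIC = b"BC\xc0\xde"


def wrap_bitcode(bc: bytes, cputype: int = 0) -> bytes:
    """Prefix raw bitcode with a wrapper header recording where it is and how long."""
    return WRAPPER.pack(WRAPPER_MAGIC, 0, WRAPPER.size, len(bc), cputype) + bc


def split_llvmbc(section: bytes) -> List[bytes]:
    """Split a concatenation of wrapped modules back into raw bitcode blobs."""
    modules: List[bytes] = []
    pos = 0
    while pos < len(section):
        if section[pos] == 0:            # skip alignment padding between inputs
            pos += 1
            continue
        magic, _version, offset, size, _cpu = WRAPPER.unpack_from(section, pos)
        if magic != WRAPPER_MAGIC:
            raise ValueError(f"no bitcode wrapper header at offset {pos}")
        start = pos + offset             # offset is relative to the wrapper start
        blob = section[start:start + size]
        if blob[:4] != RAW_BC_MAGIC:
            raise ValueError(f"wrapped payload at offset {start} is not bitcode")
        modules.append(blob)
        pos = start + size
    return modules
```

Each recovered blob could then be handed to the backend as if it had come from its own input file.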
>>>> >>> >> >> > The function index/summary will later be added as a special
>>>> >>> >> >> > ELF section alongside the .llvmbc sections.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > 2. Stage 2: ThinLTO Infrastructure
>>>> >>> >> >> > ----------------------------------------------
>>>> >>> >> >> >
>>>> >>> >> >> > The next set of patches adds the base implementation of the
>>>> >>> >> >> > ThinLTO infrastructure, specifically the patches required to
>>>> >>> >> >> > make ThinLTO functional and generate correct but not
>>>> >>> >> >> > necessarily high-performing binaries. It also does not
>>>> >>> >> >> > include the support needed to make debug builds (-g)
>>>> >>> >> >> > efficient with ThinLTO.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > a. Clang/LLVM/gold linker options:
>>>> >>> >> >> >
>>>> >>> >> >> > An early set of clang/llvm patches is needed to provide
>>>> >>> >> >> > options to enable ThinLTO (off by default), so that the rest
>>>> >>> >> >> > of the implementation can be disabled by default as it is
>>>> >>> >> >> > added. Specifically, the clang option -fthinlto (used instead
>>>> >>> >> >> > of -flto) will cause clang to invoke the phase-1 emission of
>>>> >>> >> >> > LLVM bitcode and function summary/index on a compile step,
>>>> >>> >> >> > and to pass the appropriate option to the gold plugin on a
>>>> >>> >> >> > link step. The -thinlto option will be added to the gold
>>>> >>> >> >> > plugin and the llvm-lto tool to launch the phase-2 thin
>>>> >>> >> >> > archive step. The -thinlto option will also be added to the
>>>> >>> >> >> > ‘opt’ tool to invoke it as a phase-3 parallel backend
>>>> >>> >> >> > instance.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > b. Thin-archive linking support in Gold plugin and llvm-lto:
>>>> >>> >> >> >
>>>> >>> >> >> > Under the new plugin option (see above), the plugin needs to
>>>> >>> >> >> > perform the phase-2 (thin archive) link, which simply emits a
>>>> >>> >> >> > combined function map from the linked modules, without
>>>> >>> >> >> > actually performing the normal link. Corresponding support
>>>> >>> >> >> > should be added to the standalone llvm-lto tool to enable
>>>> >>> >> >> > testing/debugging without involving the linker and plugin.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > c. ThinLTO backend support:
>>>> >>> >> >> >
>>>> >>> >> >> > Support for running a phase-3 backend (including importing)
>>>> >>> >> >> > on a module should be added to the ‘opt’ tool under the new
>>>> >>> >> >> > option. The main changes under the option are to instantiate
>>>> >>> >> >> > a Linker object used to manage the process of linking
>>>> >>> >> >> > imported functions into the module, to efficiently read the
>>>> >>> >> >> > combined function map, and to enable the ThinLTO import pass.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > d. Function index/summary support:
>>>> >>> >> >> >
>>>> >>> >> >> > This includes infrastructure for writing and reading the
>>>> >>> >> >> > function index/summary section. As noted earlier, this will
>>>> >>> >> >> > be encoded in a special ELF section within the module,
>>>> >>> >> >> > alongside the .llvmbc section containing the bitcode. The
>>>> >>> >> >> > thin archive generated by phase-2 of ThinLTO simply contains
>>>> >>> >> >> > all of the function index/summary sections across the linked
>>>> >>> >> >> > modules, organized for efficient function lookup.
>>>> >>> >> >> >
>>>> >>> >> >> > Each function available for importing from the module has an
>>>> >>> >> >> > entry in the module’s function index/summary section and in
>>>> >>> >> >> > the resulting combined function map. Each function entry
>>>> >>> >> >> > contains that function’s offset within the bitcode file, used
>>>> >>> >> >> > to efficiently locate and quickly import just that function.
>>>> >>> >> >> > The entry also contains summary information (e.g. basic
>>>> >>> >> >> > information determined during parsing, such as the number of
>>>> >>> >> >> > instructions in the function) that will be used to help guide
>>>> >>> >> >> > later import decisions. Because the contents of this section
>>>> >>> >> >> > will change frequently during ThinLTO tuning, it should also
>>>> >>> >> >> > be marked with a version id for backwards compatibility or
>>>> >>> >> >> > version checking.
>>>> >>> >> >> >
>>>> >>> >> >> >
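A Python sketch of what a versioned per-module index section and an offset-based lookup could look like. The binary layout, the field set and all names are assumptions for illustration. In particular, the byte size field is not mentioned in the RFC; it is included only so the sketch can read a body without a bitcode parser. The parts that mirror the text above are the version check on read and the ability to seek directly to one function's body instead of reading the whole module.

```python
import struct
from dataclasses import dataclass
from typing import BinaryIO, Dict

INDEX_VERSION = 1                # bump whenever the entry layout changes
HEADER = struct.Struct("<II")    # version, entry count
ENTRY = struct.Struct("<QII")    # bitcode offset, body size, instruction count


@dataclass(frozen=True)
class FunctionEntry:
    offset: int        # where the function body starts in the bitcode file
    size: int          # sketch-only: length of the body in bytes
    inst_count: int    # summary data for later import heuristics


def write_index(entries: Dict[str, FunctionEntry]) -> bytes:
    out = [HEADER.pack(INDEX_VERSION, len(entries))]
    for name, e in entries.items():
        raw = name.encode()
        out.append(struct.pack("<H", len(raw)) + raw)
        out.append(ENTRY.pack(e.offset, e.size, e.inst_count))
    return b"".join(out)


def read_index(data: bytes) -> Dict[str, FunctionEntry]:
    version, count = HEADER.unpack_from(data, 0)
    if version != INDEX_VERSION:
        raise ValueError(f"index version {version}, expected {INDEX_VERSION}")
    entries: Dict[str, FunctionEntry] = {}
    pos = HEADER.size
    for _ in range(count):
        (name_len,) = struct.unpack_from("<H", data, pos)
        pos += 2
        name = data[pos:pos + name_len].decode()
        pos += name_len
        entries[name] = FunctionEntry(*ENTRY.unpack_from(data, pos))
        pos += ENTRY.size
    return entries


def load_function(bitcode: BinaryIO, entry: FunctionEntry) -> bytes:
    """Seek straight to one function instead of reading the whole module."""
    bitcode.seek(entry.offset)
    return bitcode.read(entry.size)
```

Rejecting a mismatched version outright is the simplest policy; a reader could instead accept a range of older versions once the format settles.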
>>>> >>> >> >> > e. ThinLTO importing support:
>>>> >>> >> >> >
>>>> >>> >> >> > Support for the mechanics of importing functions from other
>>>> >>> >> >> > modules, which can go in gradually as a set of patches since
>>>> >>> >> >> > it will be off by default. Separate patches can include:
>>>> >>> >> >> >
>>>> >>> >> >> > - BitcodeReader changes to use the function index to
>>>> >>> >> >> > import/deserialize a single function of interest (small
>>>> >>> >> >> > changes, leveraging the existing lazy streamer support).
>>>> >>> >> >> >
>>>> >>> >> >> > - Minor LTOModule changes to pass the ThinLTO function to
>>>> >>> >> >> > import and its index into the bitcode reader.
>>>> >>> >> >> >
>>>> >>> >> >> > - Marking of imported functions (for use in ThinLTO-specific
>>>> >>> >> >> > symbol linking and global DCE, for example). This can be
>>>> >>> >> >> > in-memory initially, but IR support may be required in order
>>>> >>> >> >> > to support streaming bitcode out and back in again after
>>>> >>> >> >> > importing.
>>>> >>> >> >> >
>>>> >>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol linking
>>>> >>> >> >> > and static promotion when necessary. The linkage type of
>>>> >>> >> >> > imported functions changes to AvailableExternallyLinkage, for
>>>> >>> >> >> > example. Statics must be promoted in certain cases, and
>>>> >>> >> >> > renamed in consistent ways.
>>>> >>> >> >> >
>>>> >>> >> >> > - GlobalDCE changes to support removing imported functions
>>>> >>> >> >> > that were not inlined (very small changes to existing pass
>>>> >>> >> >> > logic).
>>>> >>> >> >> >
>>>> >>> >> >> >
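The "renamed in consistent ways" requirement is the subtle part of the ModuleLinker item: the module that defines a static and every module that imports a reference to it must arrive at the same promoted name. Below is a Python sketch of that idea. The `.llvm.<module id>` suffix is a hypothetical naming scheme chosen for illustration, linkage kinds are plain strings rather than LLVM's enum, and the sketch promotes every static even though the RFC only requires it "in certain cases".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Symbol:
    name: str
    linkage: str             # "external", "internal" (a C static), ...
    is_function: bool = True


def promoted_name(sym: Symbol, module_id: str) -> str:
    """Deterministic name for a promoted static; every module must agree on it."""
    if sym.linkage == "internal":
        return f"{sym.name}.llvm.{module_id}"
    return sym.name


def as_exported(sym: Symbol, module_id: str) -> Symbol:
    """The defining module's view once its statics may be referenced elsewhere."""
    if sym.linkage == "internal":
        return replace(sym, name=promoted_name(sym, module_id), linkage="external")
    return sym


def as_imported(sym: Symbol, module_id: str) -> Symbol:
    """The importing module's view of a definition copied in from `module_id`."""
    new_name = promoted_name(sym, module_id)
    if sym.is_function:
        # The body is available for inlining; the real definition lives elsewhere.
        return replace(sym, name=new_name, linkage="available_externally")
    return replace(sym, name=new_name, linkage="external")
```

Because both sides derive the name from the defining module's id, a call from an imported body still resolves to the promoted definition at final link time, and any imported copy that is not inlined can simply be dropped (the GlobalDCE item above).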
>>>> >>> >> >> > f. ThinLTO Import Driver SCC pass:
>>>> >>> >> >> >
>>>> >>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with a framework for doing
>>>> >>> >> >> > ThinLTO via an SCC pass, enabled only under -fthinlto
>>>> >>> >> >> > options. The pass includes utilizing the thin archive (global
>>>> >>> >> >> > function index/summary), import decision heuristics,
>>>> >>> >> >> > invocation of LTOModule/ModuleLinker routines that perform the
>>>> >>> >> >> > import, and any necessary callgraph updates and verification.
>>>> >>> >> >> >
>>>> >>> >> >> >
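The import decision itself can be pictured as a worklist over call sites. The following is a minimal Python sketch that assumes the only summary datum is an instruction count and that the budget shrinks geometrically for calls discovered inside bodies that were already imported. The threshold, the decay factor and the function name are invented for illustration and are not the prototype's heuristics; `callees_of` stands in for the calls that become visible once a body has been imported.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def choose_imports(local_calls: List[str],
                   summary: Dict[str, Tuple[str, int]],
                   callees_of: Dict[str, List[str]],
                   module: str,
                   threshold: int = 100,
                   decay: float = 0.7) -> Set[str]:
    """Pick external functions to import into `module`.

    summary maps function -> (defining module, instruction count).
    """
    chosen: Set[str] = set()
    work = deque((callee, float(threshold)) for callee in local_calls)
    while work:
        fn, limit = work.popleft()
        entry = summary.get(fn)
        if entry is None or fn in chosen:
            continue                      # unknown function or already imported
        owner, insts = entry
        if owner == module or insts > limit:
            continue                      # defined locally, or too big for the budget
        chosen.add(fn)
        # Calls inside an imported body are candidates too, under a tighter budget.
        for callee in callees_of.get(fn, []):
            work.append((callee, limit * decay))
    return chosen
```

The decaying budget is one simple way to keep transitive importing from pulling in large parts of the program.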
>>>> >>> >> >> > g. Backend Driver:
>>>> >>> >> >> >
>>>> >>> >> >> > For a single node build, the gold plugin can simply write a
>>>> >>> >> >> > makefile and fork the parallel backend instances directly via
>>>> >>> >> >> > parallel make.
>>>> >>> >> >> >
>>>> >>> >> >> >
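For the single-node case, the plugin-generated makefile only needs one independent rule per module so that parallel make can schedule the backends. Below is a Python sketch of such a generator. The file names are hypothetical, BACKEND is left as a make variable because the actual backend command line is not specified here, and passing the output path with -o is an assumption. The same input-to-object mapping is what the distributed case in Stage 3 would hand to the build system.

```python
from typing import Dict


def backend_makefile(outputs: Dict[str, str]) -> str:
    """outputs maps each IR input to the object file its backend should write."""
    lines = ["all: " + " ".join(outputs.values()), ""]
    for ir_file, obj in outputs.items():
        lines += [
            f"{obj}: {ir_file}",
            # Recipe lines must start with a tab; $< is the IR input, $@ the object.
            "\t$(BACKEND) $< -o $@",
            "",
        ]
    return "\n".join(lines)
```

Running `make -j8 BACKEND=<backend command>` on the generated file would start up to eight backends at once, since no rule depends on another.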
>>>> >>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements
>>>> >>> >> >> > ----------------------------------------------------------------
>>>> >>> >> >> >
>>>> >>> >> >> > This refers to the patches that are not required for ThinLTO
>>>> >>> >> >> > to work, but that instead improve compile time, memory,
>>>> >>> >> >> > run-time performance and usability.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > a. Lazy Debug Metadata Linking:
>>>> >>> >> >> >
>>>> >>> >> >> > The prototype implementation included lazy importing of
>>>> >>> >> >> > module-level metadata during the ThinLTO pass finalization
>>>> >>> >> >> > (i.e. after all function importing is complete). This
>>>> >>> >> >> > actually applies to all module-level metadata, not just debug
>>>> >>> >> >> > metadata, although debug metadata is the largest. This can be
>>>> >>> >> >> > added as a separate set of patches, with changes to
>>>> >>> >> >> > BitcodeReader, ValueMapper and ModuleLinker.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > b. Import Tuning:
>>>> >>> >> >> >
>>>> >>> >> >> > Tuning the import strategy will be an iterative process that
>>>> >>> >> >> > will continue to be refined over time. It involves several
>>>> >>> >> >> > different types of changes: adding support for recording
>>>> >>> >> >> > additional metrics in the function summary, such as profile
>>>> >>> >> >> > data and optional heavier-weight IPA analyses, and tuning the
>>>> >>> >> >> > import heuristics based on the summary and callsite context.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > c. Combined Function Map Pruning:
>>>> >>> >> >> >
>>>> >>> >> >> > The combined function map can be pruned of functions that are
>>>> >>> >> >> > unlikely to benefit from being imported. For example, during
>>>> >>> >> >> > the phase-2 thin archive plugin step we can safely omit large
>>>> >>> >> >> > and (with profile data) cold functions, which are unlikely to
>>>> >>> >> >> > benefit from being inlined. Additionally, all but one copy of
>>>> >>> >> >> > comdat functions can be suppressed.
>>>> >>> >> >> >
>>>> >>> >> >> >
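Here is a Python sketch of the pruning step. It assumes each combined-map entry carries an instruction count, an optional profile count and an optional comdat group name, and it keeps the entries as a list because comdat copies of the same function from different modules share a name. The thresholds are placeholders, and "cold" is simplified to a profile count below a cut-off.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class MapEntry:
    name: str
    module: str
    inst_count: int
    hotness: Optional[float] = None   # profile count, when profile data exists
    comdat: Optional[str] = None      # comdat group name, if any


def prune(entries: List[MapEntry], max_insts: int = 200,
          min_hotness: float = 1.0) -> Dict[str, MapEntry]:
    """Drop entries unlikely to be worth importing; keep one copy per comdat."""
    kept: Dict[str, MapEntry] = {}
    seen_comdats: Set[str] = set()
    for e in entries:
        if e.inst_count > max_insts:
            continue                      # too large to be worth inlining
        if e.hotness is not None and e.hotness < min_hotness:
            continue                      # known cold from profile data
        if e.comdat is not None:
            if e.comdat in seen_comdats:
                continue                  # another copy of this comdat is kept
            seen_comdats.add(e.comdat)
        kept[e.name] = e
    return kept
```

Keeping the first comdat copy is arbitrary; any copy works, since the copies are required to be equivalent.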
>>>> >>> >> >> > d. Distributed Build System Integration:
>>>> >>> >> >> >
>>>> >>> >> >> > For a distributed build system, the gold plugin should write
>>>> >>> >> >> > the parallel backend invocations into a makefile, including
>>>> >>> >> >> > the mapping from the IR file to the real object file path, and
>>>> >>> >> >> > exit. Additional work needs to be done in the distributed
>>>> >>> >> >> > build system itself to distribute and dispatch the parallel
>>>> >>> >> >> > backend jobs to the build cluster.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > e. Dependence Tracking and Incremental Compiles:
>>>> >>> >> >> >
>>>> >>> >> >> > In order to support build systems that stage from local disks
>>>> >>> >> >> > or network storage, the plugin will optionally support
>>>> >>> >> >> > computation of dependent sets of IR files that each module may
>>>> >>> >> >> > import from. This can be computed from profile data, if it
>>>> >>> >> >> > exists, or from the symbol table and heuristics if not. These
>>>> >>> >> >> > dependence sets also enable support for incremental backend
>>>> >>> >> >> > compiles.
>>>> >>> >> >> >
>>>> >>> >> >> >
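Once the plugin knows (from profile data, or from the symbol table and heuristics) which functions each module may import, the dependence sets and the incremental-rebuild check reduce to set operations. Below is a Python sketch; the input dictionaries and function names are hypothetical.

```python
from typing import Dict, Set


def dependence_sets(imports: Dict[str, Set[str]],
                    defined_in: Dict[str, str]) -> Dict[str, Set[str]]:
    """For each module, the set of other IR files it may import from."""
    return {
        module: {defined_in[f] for f in fns if defined_in[f] != module}
        for module, fns in imports.items()
    }


def needs_rebuild(module: str, deps: Dict[str, Set[str]], changed: Set[str]) -> bool:
    """A backend must rerun if its own IR or any IR it imports from changed."""
    return module in changed or bool(deps.get(module, set()) & changed)
```

The same sets tell a staging build system which IR files to copy next to each backend job.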
>>>> >>> >> >> >
>>>> >>> >> >> > --
>>>> >>> >> >> > Teresa Johnson | Software Engineer | tejohnson at google.com |
>>>> >>> >> >> > 408-460-2413
>>>> >>> >> >> >
>>>> >>> >> >> > _______________________________________________
>>>> >>> >> >> > LLVM Developers mailing list
>>>> >>> >> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> >>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>> >>> >> >>
>>>> >>> >> >
>>>> >>> >> >
>>>> >>> >>
>>>> >>> >>
>>>> >>> >>
>>>> >>> >> --
>>>> >>> >> Teresa Johnson | Software Engineer | tejohnson at google.com |
>>>> >>> >> 408-460-2413
>>>> >>> >>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>>>> >>> 408-460-2413
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>
>>>
>>>
>
>


-- 
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
