thr3ads.net - llvm dev - [LLVMdev] RFC: ThinLTO Impementation Plan [May 2015]

Daniel Berlin

2015-May-14 18:34 UTC

[LLVMdev] RFC: ThinLTO Impementation Plan

On Thu, May 14, 2015 at 11:14 AM, Eric Christopher <echristo at gmail.com>
wrote:
> I'm not sure this is a particularly great assumption to make.

Which part?

> We have to
> support a lot of different build systems and tools and concentrating on
> something that just binutils uses isn't particularly friendly here.

I think you may have misunderstood.
His point was exactly that they want to be transparent to *all of* these tools.
You are saying "we should be friendly to everyone". He is saying the
same thing.
We should be friendly to everyone. The friendly way to do this is to
not require all of these tools to build plugins to handle bitcode.

Hence, elf-wrapped bitcode.

> I also
> can't imagine how it's necessary for any of the lto aspects as currently
> written in the proposal.
>
> -eric
>
> On Thu, May 14, 2015 at 9:26 AM Xinliang David Li <xinliangli at gmail.com>
> wrote:
>>
>> The design objective is to make thinLTO mostly transparent to binutil
>> tools to enable easy integration with any build system in the wild.
>> 'Pass-through' mode with 'ld -r' instead of the partial LTO mode is another
>> reason.
>>
>> David
>>
>> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson <tejohnson at google.com>
>> wrote:
>>>
>>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher <echristo at gmail.com>
>>> wrote:
>>> > So, what Alex is saying is that we have these tools as well and they
>>> > understand bitcode just fine, as well as every object format - not just
>>> > ELF.
>>> > :)
>>>
>>> Right, there are also LLVM specific versions (llvm-ar, llvm-nm) that
>>> handle bitcode similarly to the way the standard tool + plugin does.
>>> But the goal we are trying to achieve is to allow the standard system
>>> versions of the tools to handle these files without requiring a
>>> plugin. I know the LLVM tool handles other object formats, but I'm not
>>> sure how that helps here? We're not planning to replace those tools,
>>> just allow the standard system versions to handle the intermediate
>>> objects produced by ThinLTO.
>>>
>>> Thanks,
>>> Teresa
>>>
>>> >
>>> > -eric
>>> >
>>> >
>>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson <tejohnson at google.com>
>>> > wrote:
>>> >>
>>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li
>>> >> <xinliangli at gmail.com> wrote:
>>> >> >
>>> >> >
>>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg
>>> >> > <alexr at leftfield.org>
>>> >> > wrote:
>>> >> >>
>>> >> >> "ELF-wrapped bitcode" seems potentially controversial to me.
>>> >> >>
>>> >> >> What about ar, nm, and various ld implementations adds this
>>> >> >> requirement?
>>> >> >> What about the LLVM implementations of these tools is lacking?
>>> >> >
>>> >> >
>>> >> > Sorry I can not parse your questions properly. Can you make it
>>> >> > clearer?
>>> >>
>>> >> Alex is asking what the issue is with ar, nm, ld -r and regular
>>> >> bitcode that makes using elf-wrapped bitcode easier.
>>> >>
>>> >> The issue is that generally you need to provide a plugin to these
>>> >> tools in order for them to understand and handle bitcode files. We'd
>>> >> like standard tools to work without requiring a plugin as much as
>>> >> possible. And in some cases we want them to be handled differently than
>>> >> the way bitcode files are handled with the plugin.
>>> >>
>>> >> nm: Without a plugin, normal bitcode files are inscrutable. When
>>> >> provided the gold plugin it can emit the symbols.
>>> >>
>>> >> ar: Without a plugin, it will create an archive of bitcode files, but
>>> >> without an index, so it can't be handled by the linker even with a
>>> >> plugin on an -flto link. When ar is provided the gold plugin it does
>>> >> create an index, so the linker + gold plugin handle it appropriately
>>> >> on an -flto link.
>>> >>
>>> >> ld -r: Without a plugin, fails when provided bitcode inputs. When
>>> >> provided the gold plugin, it handles them but compiles them all the
>>> >> way through to ELF executable instructions via a partial LTO link.
>>> >> This is where we would like to differ in behavior (while also not
>>> >> requiring a plugin) with ELF-wrapped bitcode: we would like the ld -r
>>> >> output file to still contain ELF-wrapped bitcode, delaying the LTO
>>> >> until the full link step.
>>> >>
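To make the plugin dependence described above concrete, here is a minimal Python sketch (an illustration, not code from the thread). It only inspects the leading magic bytes of a file: raw LLVM bitcode starts with 'BC' 0xC0DE, while an ELF object starts with 0x7F 'ELF'. The names `classify` and `readable_without_plugin` are hypothetical, and the rule that a stock tool only recognizes the ELF container is a simplification of the nm/ar/ld -r behaviour listed above.

```python
ELF_MAGIC = b"\x7fELF"          # first four bytes of any ELF file
BITCODE_MAGIC = b"BC\xc0\xde"   # first four bytes of raw LLVM bitcode


def classify(header: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper)."""
    if header.startswith(ELF_MAGIC):
        return "elf"
    if header.startswith(BITCODE_MAGIC):
        return "bitcode"
    return "unknown"


def readable_without_plugin(header: bytes) -> bool:
    """Simplifying assumption: a stock binutils tool only parses ELF."""
    return classify(header) == "elf"
```

Under this simplification an ELF-wrapped bitcode file passes the check because its container is ELF, which is the property the proposal relies on.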
>>> >> Let me know if that helps address your concerns.
>>> >>
>>> >> Thanks,
>>> >> Teresa
>>> >>
>>> >> >
>>> >> > David
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> Alex
>>> >> >>
>>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson <tejohnson at google.com>
>>> >> >> > wrote:
>>> >> >> >
>>> >> >> > I've included below an RFC for implementing ThinLTO in LLVM, looking
>>> >> >> > forward to feedback and questions.
>>> >> >> > Thanks!
>>> >> >> > Teresa
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > RFC to discuss plans for implementing ThinLTO upstream. Background can
>>> >> >> > be found in slides from EuroLLVM 2015:
>>> >> >> >
>>> >> >> > https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
>>> >> >> >
>>> >> >> > As described in the talk, we have a prototype implementation, and
>>> >> >> > would like to start staging patches upstream. This RFC describes a
>>> >> >> > breakdown of the major pieces. We would like to commit upstream
>>> >> >> > gradually in several stages, with all functionality off by default.
>>> >> >> > The core ThinLTO importing support and tuning will require frequent
>>> >> >> > change and iteration during testing and tuning, and for that part we
>>> >> >> > would like to commit rapidly (off by default). See the proposed staged
>>> >> >> > implementation described in the Implementation Plan section.
>>> >> >> >
>>> >> >> >
>>> >> >> > ThinLTO Overview
>>> >> >> > ================
>>> >> >> >
>>> >> >> > See the talk slides linked above for more details. The following is a
>>> >> >> > high-level overview of the motivation.
>>> >> >> >
>>> >> >> > Cross Module Optimization (CMO) is an effective means for improving
>>> >> >> > runtime performance, by extending the scope of optimizations across
>>> >> >> > source module boundaries. Without CMO, the compiler is limited to
>>> >> >> > optimizing within the scope of single source modules. Two solutions
>>> >> >> > for enabling CMO are Link-Time Optimization (LTO), which is currently
>>> >> >> > supported in LLVM and GCC, and Lightweight-Interprocedural
>>> >> >> > Optimization (LIPO). However, each of these solutions has limitations
>>> >> >> > that prevent it from being enabled by default. ThinLTO is a new
>>> >> >> > approach that attempts to address these limitations, with a goal of
>>> >> >> > being enabled more broadly. ThinLTO is designed with many of the same
>>> >> >> > principles as LIPO, and therefore its advantages, without any of its
>>> >> >> > inherent weaknesses. Unlike in LIPO where the module group decision is
>>> >> >> > made at profile training runtime, ThinLTO makes the decision at
>>> >> >> > compile time, but in a lazy mode that facilitates large scale
>>> >> >> > parallelism. The serial linker plugin phase is designed to be razor
>>> >> >> > thin and blazingly fast. By default this step only does minimal
>>> >> >> > preparation work to enable the parallel lazy importing performed
>>> >> >> > later. ThinLTO aims to be scalable like a regular O2 build, enabling
>>> >> >> > CMO on machines without large memory configurations, while also
>>> >> >> > integrating well with distributed build systems. Results from early
>>> >> >> > prototyping on SPEC cpu2006 C++ benchmarks are in line with
>>> >> >> > expectations that ThinLTO can scale like O2 while enabling much of the
>>> >> >> > CMO performed during a full LTO build.
>>> >> >> >
>>> >> >> >
>>> >> >> > A ThinLTO build is divided into 3 phases, which are referred to in the
>>> >> >> > following implementation plan:
>>> >> >> >
>>> >> >> > phase-1: IR and Function Summary Generation (-c compile)
>>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive linker step)
>>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing
>>> >> >> >
>>> >> >> >
>>> >> >> > Implementation Plan
>>> >> >> > ===================
>>> >> >> >
>>> >> >> > This section gives a high-level breakdown of the ThinLTO support that
>>> >> >> > will be added, in roughly the order that the patches would be staged.
>>> >> >> > The patches are divided into three stages. The first stage contains a
>>> >> >> > minimal amount of preparation work that is not ThinLTO-specific. The
>>> >> >> > second stage contains most of the infrastructure for ThinLTO, which
>>> >> >> > will be off by default. The third stage includes
>>> >> >> > enhancements/improvements/tunings that can be performed after the main
>>> >> >> > ThinLTO infrastructure is in.
>>> >> >> >
>>> >> >> > The second and third implementation stages will initially be very
>>> >> >> > volatile, requiring a lot of iterations and tuning with large apps to
>>> >> >> > get stabilized. Therefore it will be important to do fast commits for
>>> >> >> > these implementation stages.
>>> >> >> >
>>> >> >> >
>>> >> >> > 1. Stage 1: Preparation
>>> >> >> > -------------------------------
>>> >> >> >
>>> >> >> > The first planned sets of patches are enablers for ThinLTO work:
>>> >> >> >
>>> >> >> >
>>> >> >> > a. LTO directory structure:
>>> >> >> >
>>> >> >> > Restructure the LTO directory to remove the circular dependence that
>>> >> >> > arises when the ThinLTO pass is added. Because ThinLTO is being
>>> >> >> > implemented as an SCC pass within Transforms/IPO, and leverages the
>>> >> >> > LTOModule class for linking in functions from modules, IPO then
>>> >> >> > requires the LTO library. This creates a circular dependence between
>>> >> >> > LTO and IPO. To break that, we need to split the lib/LTO
>>> >> >> > directory/library into lib/LTO/CodeGen and lib/LTO/Module, containing
>>> >> >> > LTOCodeGenerator and LTOModule, respectively. Only LTOCodeGenerator
>>> >> >> > has a dependence on IPO, removing the circular dependence.
>>> >> >> >
>>> >> >> >
>>> >> >> > b. ELF wrapper generation support:
>>> >> >> >
>>> >> >> > Implement ELF wrapped bitcode writer. In order to more easily interact
>>> >> >> > with tools such as $AR, $NM, and “$LD -r” we plan to emit the phase-1
>>> >> >> > bitcode wrapped in ELF via the .llvmbc section, along with a symbol
>>> >> >> > table. The goal is both to interact with these tools without requiring
>>> >> >> > a plugin, and also to avoid doing partial LTO/ThinLTO across files
>>> >> >> > linked with “$LD -r” (i.e. the resulting object file should still
>>> >> >> > contain ELF-wrapped bitcode to enable ThinLTO at the full link step).
>>> >> >> > I will send a separate design document for these changes, but the
>>> >> >> > following is a high-level overview.
>>> >> >> >
>>> >> >> > Support was added to LLVM for reading ELF-wrapped bitcode
>>> >> >> > (http://reviews.llvm.org/rL218078), but there does not yet exist
>>> >> >> > support in LLVM/Clang for emitting bitcode wrapped in ELF. I plan to
>>> >> >> > add support for optionally generating bitcode in an ELF file
>>> >> >> > containing a single .llvmbc section holding the bitcode. Specifically,
>>> >> >> > the patch would add new options “emit-llvm-bc-elf” (object file) and
>>> >> >> > corresponding “emit-llvm-elf” (textual assembly code equivalent).
>>> >> >> > Eventually these would be automatically triggered under “-fthinlto -c”
>>> >> >> > and “-fthinlto -S”, respectively.
>>> >> >> >
>>> >> >> > Additionally, a symbol table will be generated in the ELF file,
>>> >> >> > holding the function symbols within the bitcode. This facilitates
>>> >> >> > handling archives of the ELF-wrapped bitcode created with $AR, since
>>> >> >> > the archive will have a symbol table as well. The archive symbol table
>>> >> >> > enables gold to extract and pass to the plugin the constituent
>>> >> >> > ELF-wrapped bitcode files. To support the concatenated llvmbc section
>>> >> >> > generated by “$LD -r”, some handling needs to be added to gold and to
>>> >> >> > the backend driver to process each original module’s bitcode.
>>> >> >> >
>>> >> >> > The function index/summary will later be added as a special ELF
>>> >> >> > section alongside the .llvmbc sections.
>>> >> >> >
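As a rough illustration of what consuming an ELF-wrapped bitcode file could look like, the Python sketch below pulls the raw contents of a named section out of a little-endian ELF64 image, using the standard ELF header and section-header layouts. The helper name `find_section` is hypothetical, the sketch ignores 32-bit and big-endian files, and it does not try to split a concatenated section produced by `ld -r`; none of it is taken from the proposed patches.

```python
import struct
from typing import Optional

ELF64_HEADER_REST = "<HHIQQQIHHHHHH"   # fields following the 16-byte e_ident
ELF64_SECTION_HEADER = "<IIQQQQIIQQ"   # one 64-byte section header


def find_section(elf: bytes, wanted: str) -> Optional[bytes]:
    """Return the bytes of the section named `wanted`, or None if absent."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("this sketch handles little-endian ELF64 only")
    header = struct.unpack_from(ELF64_HEADER_REST, elf, 16)
    shoff, shentsize, shnum, shstrndx = header[5], header[10], header[11], header[12]

    def section_header(index: int):
        return struct.unpack_from(ELF64_SECTION_HEADER, elf, shoff + index * shentsize)

    # The section-name string table is itself a section.
    strtab = section_header(shstrndx)
    names = elf[strtab[4]:strtab[4] + strtab[5]]

    for index in range(shnum):
        sh = section_header(index)
        name_start = sh[0]                       # sh_name: offset into names
        name_end = names.index(b"\0", name_start)
        if names[name_start:name_end].decode("ascii") == wanted:
            return elf[sh[4]:sh[4] + sh[5]]      # sh_offset, sh_size
    return None
```

A caller would pass the whole file contents and the name `.llvmbc`; the returned bytes would then be handed to a bitcode reader.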
>>> >> >> >
>>> >> >> > 2. Stage 2: ThinLTO Infrastructure
>>> >> >> > ----------------------------------------------
>>> >> >> >
>>> >> >> > The next set of patches adds the base implementation of the ThinLTO
>>> >> >> > infrastructure, specifically those required to make ThinLTO functional
>>> >> >> > and generate correct but not necessarily high-performing binaries. It
>>> >> >> > also does not include support to make debug support under -g efficient
>>> >> >> > with ThinLTO.
>>> >> >> >
>>> >> >> >
>>> >> >> > a. Clang/LLVM/gold linker options:
>>> >> >> >
>>> >> >> > An early set of clang/llvm patches is needed to provide options to
>>> >> >> > enable ThinLTO (off by default), so that the rest of the
>>> >> >> > implementation can be disabled by default as it is added.
>>> >> >> > Specifically, the clang option -fthinlto (used instead of -flto) will
>>> >> >> > cause clang to invoke the phase-1 emission of LLVM bitcode and
>>> >> >> > function summary/index on a compile step, and pass the appropriate
>>> >> >> > option to the gold plugin on a link step. The -thinlto option will be
>>> >> >> > added to the gold plugin and llvm-lto tool to launch the phase-2 thin
>>> >> >> > archive step. The -thinlto option will also be added to the ‘opt’ tool
>>> >> >> > to invoke it as a phase-3 parallel backend instance.
>>> >> >> >
>>> >> >> >
>>> >> >> > b. Thin-archive linking support in Gold plugin and llvm-lto:
>>> >> >> >
>>> >> >> > Under the new plugin option (see above), the plugin needs to perform
>>> >> >> > the phase-2 (thin archive) link which simply emits a combined function
>>> >> >> > map from the linked modules, without actually performing the normal
>>> >> >> > link. Corresponding support should be added to the standalone llvm-lto
>>> >> >> > tool to enable testing/debugging without involving the linker and
>>> >> >> > plugin.
>>> >> >> >
>>> >> >> >
>>> >> >> > c. ThinLTO backend support:
>>> >> >> >
>>> >> >> > Support for a phase-3 backend invocation (including importing) on a
>>> >> >> > module should be added to the ‘opt’ tool under the new option. The
>>> >> >> > main change under the option is to instantiate a Linker object used to
>>> >> >> > manage the process of linking imported functions into the module,
>>> >> >> > efficiently read the combined function map, and enable the ThinLTO
>>> >> >> > import pass.
>>> >> >> >
>>> >> >> >
>>> >> >> > d. Function index/summary support:
>>> >> >> >
>>> >> >> > This includes infrastructure for writing and reading the function
>>> >> >> > index/summary section. As noted earlier this will be encoded in a
>>> >> >> > special ELF section within the module, alongside the .llvmbc section
>>> >> >> > containing the bitcode. The thin archive generated by phase-2 of
>>> >> >> > ThinLTO simply contains all of the function index/summary sections
>>> >> >> > across the linked modules, organized for efficient function lookup.
>>> >> >> >
>>> >> >> > Each function available for importing from the module contains an
>>> >> >> > entry in the module’s function index/summary section and in the
>>> >> >> > resulting combined function map. Each function entry contains that
>>> >> >> > function’s offset within the bitcode file, used to efficiently locate
>>> >> >> > and quickly import just that function. The entry also contains summary
>>> >> >> > information (e.g. basic information determined during parsing such as
>>> >> >> > the number of instructions in the function), that will be used to help
>>> >> >> > guide later import decisions. Because the contents of this section
>>> >> >> > will change frequently during ThinLTO tuning, it should also be marked
>>> >> >> > with a version id for backwards compatibility or version checking.
>>> >> >> >
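The description of the function index/summary can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the class names `FunctionEntry` and `ModuleIndex`, the constant `INDEX_VERSION`, the `combine` helper, and the choice to keep the first definition seen for a name are all hypothetical. The RFC specifies only that an entry carries a bitcode offset plus summary data such as an instruction count, and that the section carries a version id.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

INDEX_VERSION = 1  # hypothetical version id stamped on every index section


@dataclass(frozen=True)
class FunctionEntry:
    name: str
    bitcode_offset: int      # where the function body starts in the bitcode
    instruction_count: int   # example of summary data used by import heuristics


@dataclass
class ModuleIndex:
    module_path: str
    version: int
    entries: List[FunctionEntry]


def combine(indexes: Iterable[ModuleIndex]) -> Dict[str, Tuple[str, FunctionEntry]]:
    """Build a combined function map: name -> (defining module, entry)."""
    combined: Dict[str, Tuple[str, FunctionEntry]] = {}
    for index in indexes:
        if index.version != INDEX_VERSION:
            raise ValueError(f"{index.module_path}: unsupported index version {index.version}")
        for entry in index.entries:
            # Assumption: the first definition seen wins.
            combined.setdefault(entry.name, (index.module_path, entry))
    return combined
```

A backend that wants to import `bar` would look it up in the combined map, open the named module, and seek to `bitcode_offset` instead of parsing the whole file.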
>>> >> >> >
>>> >> >> > e. ThinLTO importing support:
>>> >> >> >
>>> >> >> > Support for the mechanics of importing functions from other modules,
>>> >> >> > which can go in gradually as a set of patches since it will be off by
>>> >> >> > default. Separate patches can include:
>>> >> >> >
>>> >> >> > - BitcodeReader changes to use function index to import/deserialize
>>> >> >> > single function of interest (small changes, leverages existing lazy
>>> >> >> > streamer support).
>>> >> >> >
>>> >> >> > - Minor LTOModule changes to pass the ThinLTO function to import and
>>> >> >> > its index into bitcode reader.
>>> >> >> >
>>> >> >> > - Marking of imported functions (for use in ThinLTO-specific symbol
>>> >> >> > linking and global DCE, for example). This can be in-memory initially,
>>> >> >> > but IR support may be required in order to support streaming bitcode
>>> >> >> > out and back in again after importing.
>>> >> >> >
>>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol linking and
>>> >> >> > static promotion when necessary. The linkage type of imported
>>> >> >> > functions changes to AvailableExternallyLinkage, for example. Statics
>>> >> >> > must be promoted in certain cases, and renamed in consistent ways.
>>> >> >> >
>>> >> >> > - GlobalDCE changes to support removing imported functions that were
>>> >> >> > not inlined (very small changes to existing pass logic).
>>> >> >> >
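One way to read "renamed in consistent ways" is that every backend process must derive the same global name for a promoted static without coordinating with the others. The Python sketch below does that by hashing the defining module's path. The `.thinlto.` suffix scheme and the helper name `promoted_name` are assumptions made for illustration, not the naming the proposal commits to.

```python
import hashlib


def promoted_name(local_name: str, defining_module: str) -> str:
    """Derive a deterministic global name for a promoted static.

    The result depends only on the symbol and the module that defines it,
    so the defining module and every importing module compute the same name.
    """
    digest = hashlib.sha256(defining_module.encode("utf-8")).hexdigest()[:8]
    return f"{local_name}.thinlto.{digest}"
```

Because the suffix is derived from the defining module, two statics with the same source-level name in different modules end up with different promoted names.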
>>> >> >> >
>>> >> >> > f. ThinLTO Import Driver SCC pass:
>>> >> >> >
>>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with framework for doing ThinLTO via
>>> >> >> > an SCC pass, enabled only under -fthinlto options. The pass includes
>>> >> >> > utilizing the thin archive (global function index/summary), import
>>> >> >> > decision heuristics, invocation of LTOModule/ModuleLinker routines
>>> >> >> > that perform the import, and any necessary callgraph updates and
>>> >> >> > verification.
>>> >> >> >
>>> >> >> >
>>> >> >> > g. Backend Driver:
>>> >> >> >
>>> >> >> > For a single node build, the gold plugin can simply write a makefile
>>> >> >> > and fork the parallel backend instances directly via parallel make.
>>> >> >> >
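The makefile-based backend driver can be sketched as a small generator that emits one rule per module, so that a parallel `make -j` runs the phase-3 backends concurrently. Everything concrete here is a placeholder: `thinlto-backend` is not a real tool, `combined.map` is an invented file name for the combined function map, and `backend_makefile` is a hypothetical helper. The RFC only says that the plugin writes a makefile mapping IR files to object files.

```python
from typing import Dict


def backend_makefile(objects: Dict[str, str],
                     combined_map: str = "combined.map",
                     backend_cmd: str = "thinlto-backend") -> str:
    """Return makefile text with one backend rule per IR file.

    `objects` maps each IR file to the real object file it should produce.
    """
    lines = ["all: " + " ".join(objects.values()), ""]
    for ir_file, obj_file in objects.items():
        # Each object depends on its own IR and on the combined function map.
        lines.append(f"{obj_file}: {ir_file} {combined_map}")
        # Recipe lines in a makefile must start with a tab character.
        lines.append(f"\t{backend_cmd} {ir_file} {combined_map} -o {obj_file}")
        lines.append("")
    return "\n".join(lines)
```

On a single node the plugin would write this text to a file and run `make -j` on it. For a distributed build the same text is the hand-off to the build system.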
>>> >> >> >
>>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements
>>> >> >> > ----------------------------------------------------------------
>>> >> >> >
>>> >> >> > This refers to the patches that are not required for ThinLTO to work,
>>> >> >> > but rather to improve compile time, memory, run-time performance and
>>> >> >> > usability.
>>> >> >> >
>>> >> >> >
>>> >> >> > a. Lazy Debug Metadata Linking:
>>> >> >> >
>>> >> >> > The prototype implementation included lazy importing of module-level
>>> >> >> > metadata during the ThinLTO pass finalization (i.e. after all function
>>> >> >> > importing is complete). This actually applies to all module-level
>>> >> >> > metadata, not just debug, although it is the largest. This can be
>>> >> >> > added as a separate set of patches. It requires changes to
>>> >> >> > BitcodeReader, ValueMapper, and ModuleLinker.
>>> >> >> >
>>> >> >> >
>>> >> >> > b. Import Tuning:
>>> >> >> >
>>> >> >> > Tuning the import strategy will be an iterative process that will
>>> >> >> > continue to be refined over time. It involves several different types
>>> >> >> > of changes: adding support for recording additional metrics in the
>>> >> >> > function summary, such as profile data and optional heavier-weight IPA
>>> >> >> > analyses, and tuning the import heuristics based on the summary and
>>> >> >> > callsite context.
>>> >> >> >
>>> >> >> >
>>> >> >> > c. Combined Function Map Pruning:
>>> >> >> >
>>> >> >> > The combined function map can be pruned of functions that are unlikely
>>> >> >> > to benefit from being imported. For example, during the phase-2 thin
>>> >> >> > archive plugin step we can safely omit large and (with profile data)
>>> >> >> > cold functions, which are unlikely to benefit from being inlined.
>>> >> >> > Additionally, all but one copy of comdat functions can be suppressed.
>>> >> >> >
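The pruning rules above can be expressed as a simple filter. This Python sketch rests on several assumptions: "large" is taken to mean more than `max_instructions` instructions, "cold" to mean fewer than `min_calls` profiled calls, and a function with no profile data is never treated as cold. The thresholds, the `Candidate` fields and the `prune` helper are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Candidate:
    name: str
    module: str
    instruction_count: int
    is_comdat: bool = False
    call_count: Optional[int] = None   # None means no profile data available


def prune(candidates: Iterable[Candidate],
          max_instructions: int = 100,
          min_calls: int = 1) -> List[Candidate]:
    """Drop entries that are unlikely to benefit from importing."""
    kept: List[Candidate] = []
    seen_comdats: Set[str] = set()
    for cand in candidates:
        if cand.instruction_count > max_instructions:
            continue                      # too large to be worth inlining
        if cand.call_count is not None and cand.call_count < min_calls:
            continue                      # profile data says it is cold
        if cand.is_comdat:
            if cand.name in seen_comdats:
                continue                  # keep only one copy of a comdat
            seen_comdats.add(cand.name)
        kept.append(cand)
    return kept
```

Which comdat copy survives here is simply the first one encountered. A real implementation would need a rule that matches the copy the linker keeps.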
>>> >> >> >
>>> >> >> > d. Distributed Build System Integration:
>>> >> >> >
>>> >> >> > For a distributed build system, the gold plugin should write the
>>> >> >> > parallel backend invocations into a makefile, including the mapping
>>> >> >> > from the IR file to the real object file path, and exit. Additional
>>> >> >> > work needs to be done in the distributed build system itself to
>>> >> >> > distribute and dispatch the parallel backend jobs to the build
>>> >> >> > cluster.
>>> >> >> >
>>> >> >> >
>>> >> >> > e. Dependence Tracking and Incremental Compiles:
>>> >> >> >
>>> >> >> > In order to support build systems that stage from local disks or
>>> >> >> > network storage, the plugin will optionally support computation of
>>> >> >> > dependent sets of IR files that each module may import from. This can
>>> >> >> > be computed from profile data, if it exists, or from the symbol table
>>> >> >> > and heuristics if not. These dependence sets also enable support for
>>> >> >> > incremental backend compiles.
>>> >> >> >
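A symbol-table-only version of the dependence computation can be sketched as follows. It assumes each module's defined and referenced function symbols are already available as sets, considers direct references only, and skips the profile-based and heuristic refinements the RFC mentions. The helper names `dependence_sets` and `backends_to_rerun` are hypothetical.

```python
from typing import Dict, Set


def dependence_sets(defines: Dict[str, Set[str]],
                    references: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each module, the set of other modules it may import from."""
    owner = {sym: mod for mod, syms in defines.items() for sym in syms}
    deps: Dict[str, Set[str]] = {}
    for mod, refs in references.items():
        deps[mod] = {owner[sym] for sym in refs
                     if sym in owner and owner[sym] != mod}
    return deps


def backends_to_rerun(changed: Set[str],
                      deps: Dict[str, Set[str]]) -> Set[str]:
    """Modules whose backend must rerun after `changed` modules were rebuilt."""
    return {mod for mod, imported_from in deps.items()
            if mod in changed or imported_from & changed}
```

The same sets serve both purposes named in the RFC: they tell a staging build system which IR files to ship alongside a backend job, and they bound which backends an incremental build has to repeat.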
>>> >> >> >
>>> >> >> >
>>> >> >> > --
>>> >> >> > Teresa Johnson | Software Engineer | tejohnson at google.com |
>>> >> >> > 408-460-2413
>>> >> >> >
>>> >> >> > _______________________________________________
>>> >> >> > LLVM Developers mailing list
>>> >> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>> >> >>
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>>
>>>
>>>
>>
>>
>
>

Xinliang David Li

2015-May-14 18:43 UTC

[LLVMdev] RFC: ThinLTO Impementation Plan

that is exactly the point.

thanks,

David


On Thu, May 14, 2015 at 11:34 AM, Daniel Berlin <dberlin at dberlin.org>
wrote:
> On Thu, May 14, 2015 at 11:14 AM, Eric Christopher <echristo at gmail.com>
> wrote:
> > I'm not sure this is a particularly great assumption to make.
>
> Which part?
>
> > We have to
> > support a lot of different build systems and tools and concentrating on
> > something that just binutils uses isn't particularly friendly here.
>
> I think you may have misunderstood.
> His point was exactly that they want to be transparent to *all of* these
> tools.
> You are saying "we should be friendly to everyone". He is saying the same
> thing.
> We should be friendly to everyone. The friendly way to do this is to
> not require all of these tools to build plugins to handle bitcode.
>
> Hence, elf-wrapped bitcode.
>
>
> > I also
> > can't imagine how it's necessary for any of the lto aspects as currently
> > written in the proposal.
> >
> > -eric
> >
> > On Thu, May 14, 2015 at 9:26 AM Xinliang David Li <xinliangli at gmail.com>
> > wrote:
> >>
> >> The design objective is to make thinLTO mostly transparent to binutil
> >> tools to enable easy integration with any build system in the wild.
> >> 'Pass-through' mode with 'ld -r' instead of the partial LTO mode is another
> >> reason.
> >>
> >> David
> >>
> >> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson <tejohnson at google.com>
> >> wrote:
> >>>
> >>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher <echristo at gmail.com>
> >>> wrote:
> >>> > So, what Alex is saying is that we have these tools as well and they
> >>> > understand bitcode just fine, as well as every object format - not just
> >>> > ELF.
> >>> > :)
> >>>
> >>> Right, there are also LLVM specific versions (llvm-ar, llvm-nm) that
> >>> handle bitcode similarly to the way the standard tool + plugin does.
> >>> But the goal we are trying to achieve is to allow the standard system
> >>> versions of the tools to handle these files without requiring a
> >>> plugin. I know the LLVM tool handles other object formats, but I'm not
> >>> sure how that helps here? We're not planning to replace those tools,
> >>> just allow the standard system versions to handle the intermediate
> >>> objects produced by ThinLTO.
> >>>
> >>> Thanks,
> >>> Teresa
> >>>
> >>> >
> >>> > -eric
> >>> >
> >>> >
> >>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson
<tejohnson at google.com>
> >>> > wrote:
> >>> >>
> >>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li
> >>> >> <xinliangli at gmail.com> wrote:
> >>> >> >
> >>> >> >
> >>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg
> >>> >> > <alexr at leftfield.org>
> >>> >> > wrote:
> >>> >> >>
> >>> >> >> "ELF-wrapped bitcode" seems
potentially controversial to me.
> >>> >> >>
> >>> >> >> What about ar, nm, and various ld
implementations adds this
> >>> >> >> requirement?
> >>> >> >> What about the LLVM implementations of these
tools is lacking?
> >>> >> >
> >>> >> >
> >>> >> > Sorry I can not parse your questions properly.
Can you make it
> >>> >> > clearer?
> >>> >>
> >>> >> Alex is asking what the issue is with ar, nm, ld -r
and regular
> >>> >> bitcode that makes using elf-wrapped bitcode easier.
> >>> >>
> >>> >> The issue is that generally you need to provide a
plugin to these
> >>> >> tools in order for them to understand and handle
bitcode files. We'd
> >>> >> like standard tools to work without requiring a
plugin as much as
> >>> >> possible. And in some cases we want them to be
handled different
> than
> >>> >> the way bitcode files are handled with the plugin.
> >>> >>
> >>> >> nm: Without a plugin, normal bitcode files are
inscrutable. When
> >>> >> provided the gold plugin it can emit the symbols.
> >>> >>
> >>> >> ar: Without a plugin, it will create an archive of
bitcode files,
> but
> >>> >> without an index, so it can't be handled by the
linker even with a
> >>> >> plugin on an -flto link. When ar is provided the gold
plugin it does
> >>> >> create an index, so the linker + gold plugin handle
it appropriately
> >>> >> on an -flto link.
> >>> >>
> >>> >> ld -r: Without a plugin, fails when provided bitcode
inputs. When
> >>> >> provided the gold plugin, it handles them but
compiles them all the
> >>> >> way through to ELF executable instructions via a
partial LTO link.
> >>> >> This is where we would like to differ in behavior
(while also not
> >>> >> requiring a plugin) with ELF-wrapped bitcode: we
would like the ld
> -r
> >>> >> output file to still contain ELF-wrapped bitcode,
delaying the LTO
> >>> >> until the full link step.
> >>> >>
> >>> >> Let me know if that helps address your concerns.
> >>> >>
> >>> >> Thanks,
> >>> >> Teresa
> >>> >>
> >>> >> >
> >>> >> > David
> >>> >> >
> >>> >> >>
> >>> >> >>
> >>> >> >> Alex
> >>> >> >>
> >>> >> >> > On May 13, 2015, at 7:44 PM, Teresa
Johnson
> >>> >> >> > <tejohnson at google.com>
> >>> >> >> > wrote:
> >>> >> >> >
> >>> >> >> > I've included below an RFC for
implementing ThinLTO in LLVM,
> >>> >> >> > looking
> >>> >> >> > forward to feedback and questions.
> >>> >> >> > Thanks!
> >>> >> >> > Teresa
> >>> >> >> >
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > RFC to discuss plans for implementing
ThinLTO upstream.
> >>> >> >> > Background
> >>> >> >> > can
> >>> >> >> > be found in slides from EuroLLVM 2015:
> >>> >> >> >
> >>> >> >> >
> >>> >> >> >
> >>> >> >> >
>
https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0)
> >>> >> >> > As described in the talk, we have a
prototype implementation,
> and
> >>> >> >> > would like to start staging patches
upstream. This RFC
> describes
> >>> >> >> > a
> >>> >> >> > breakdown of the major pieces. We would
like to commit upstream
> >>> >> >> > gradually in several stages, with all
functionality off by
> >>> >> >> > default.
> >>> >> >> > The core ThinLTO importing support and
tuning will require
> >>> >> >> > frequent
> >>> >> >> > change and iteration during testing and
tuning, and for that
> part
> >>> >> >> > we
> >>> >> >> > would like to commit rapidly (off by
default). See the proposed
> >>> >> >> > staged
> >>> >> >> > implementation described in the
Implementation Plan section.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > ThinLTO Overview
> >>> >> >> > =============> >>> >>
>> >
> >>> >> >> > See the talk slides linked above for
more details. The
> following
> >>> >> >> > is a
> >>> >> >> > high-level overview of the motivation.
> >>> >> >> >
> >>> >> >> > Cross Module Optimization (CMO) is an effective means for improving
> >>> >> >> > runtime performance, by extending the scope of optimizations across
> >>> >> >> > source module boundaries. Without CMO, the compiler is limited to
> >>> >> >> > optimizing within the scope of single source modules. Two solutions
> >>> >> >> > for enabling CMO are Link-Time Optimization (LTO), which is currently
> >>> >> >> > supported in LLVM and GCC, and Lightweight-Interprocedural
> >>> >> >> > Optimization (LIPO). However, each of these solutions has limitations
> >>> >> >> > that prevent it from being enabled by default. ThinLTO is a new
> >>> >> >> > approach that attempts to address these limitations, with a goal of
> >>> >> >> > being enabled more broadly. ThinLTO is designed with many of the same
> >>> >> >> > principles as LIPO, and therefore its advantages, without any of its
> >>> >> >> > inherent weaknesses. Unlike in LIPO where the module group decision is
> >>> >> >> > made at profile training runtime, ThinLTO makes the decision at
> >>> >> >> > compile time, but in a lazy mode that facilitates large scale
> >>> >> >> > parallelism. The serial linker plugin phase is designed to be razor
> >>> >> >> > thin and blazingly fast. By default this step only does minimal
> >>> >> >> > preparation work to enable the parallel lazy importing performed
> >>> >> >> > later. ThinLTO aims to be scalable like a regular O2 build, enabling
> >>> >> >> > CMO on machines without large memory configurations, while also
> >>> >> >> > integrating well with distributed build systems. Results from early
> >>> >> >> > prototyping on SPEC cpu2006 C++ benchmarks are in line with
> >>> >> >> > expectations that ThinLTO can scale like O2 while enabling much of
> >>> >> >> > the CMO performed during a full LTO build.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > A ThinLTO build is divided into 3 phases, which are referred to
> >>> >> >> > in the following implementation plan:
> >>> >> >> >
> >>> >> >> > phase-1: IR and Function Summary Generation (-c compile)
> >>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive linker step)
> >>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > Implementation Plan
> >>> >> >> > ===================
> >>> >> >> >
> >>> >> >> > This section gives a high-level breakdown of the ThinLTO support
> >>> >> >> > that will be added, in roughly the order that the patches would be
> >>> >> >> > staged. The patches are divided into three stages. The first stage
> >>> >> >> > contains a minimal amount of preparation work that is not
> >>> >> >> > ThinLTO-specific. The second stage contains most of the
> >>> >> >> > infrastructure for ThinLTO, which will be off by default. The third
> >>> >> >> > stage includes enhancements/improvements/tunings that can be
> >>> >> >> > performed after the main ThinLTO infrastructure is in.
> >>> >> >> >
> >>> >> >> > The second and third implementation stages will initially be very
> >>> >> >> > volatile, requiring a lot of iterations and tuning with large apps
> >>> >> >> > to get stabilized. Therefore it will be important to do fast commits
> >>> >> >> > for these implementation stages.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > 1. Stage 1: Preparation
> >>> >> >> > -------------------------------
> >>> >> >> >
> >>> >> >> > The first planned sets of patches are enablers for ThinLTO work:
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > a. LTO directory structure:
> >>> >> >> >
> >>> >> >> > Restructure the LTO directory to remove the circular dependence that
> >>> >> >> > arises when the ThinLTO pass is added. Because ThinLTO is being
> >>> >> >> > implemented as an SCC pass within Transforms/IPO, and leverages the
> >>> >> >> > LTOModule class for linking in functions from modules, IPO then
> >>> >> >> > requires the LTO library. This creates a circular dependence between
> >>> >> >> > LTO and IPO. To break that, we need to split the lib/LTO
> >>> >> >> > directory/library into lib/LTO/CodeGen and lib/LTO/Module, containing
> >>> >> >> > LTOCodeGenerator and LTOModule, respectively. Only LTOCodeGenerator
> >>> >> >> > has a dependence on IPO, removing the circular dependence.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > b. ELF wrapper generation support:
> >>> >> >> >
> >>> >> >> > Implement ELF wrapped bitcode writer. In order to more easily
> >>> >> >> > interact with tools such as $AR, $NM, and “$LD -r” we plan to emit
> >>> >> >> > the phase-1 bitcode wrapped in ELF via the .llvmbc section, along
> >>> >> >> > with a symbol table. The goal is both to interact with these tools
> >>> >> >> > without requiring a plugin, and also to avoid doing partial
> >>> >> >> > LTO/ThinLTO across files linked with “$LD -r” (i.e. the resulting
> >>> >> >> > object file should still contain ELF-wrapped bitcode to enable
> >>> >> >> > ThinLTO at the full link step). I will send a separate design
> >>> >> >> > document for these changes, but the following is a high-level
> >>> >> >> > overview.
> >>> >> >> >
> >>> >> >> > Support was added to LLVM for reading ELF-wrapped bitcode
> >>> >> >> > (http://reviews.llvm.org/rL218078), but there does not yet exist
> >>> >> >> > support in LLVM/Clang for emitting bitcode wrapped in ELF. I plan to
> >>> >> >> > add support for optionally generating bitcode in an ELF file
> >>> >> >> > containing a single .llvmbc section holding the bitcode.
> >>> >> >> > Specifically, the patch would add new options “emit-llvm-bc-elf”
> >>> >> >> > (object file) and corresponding “emit-llvm-elf” (textual assembly
> >>> >> >> > code equivalent). Eventually these would be automatically triggered
> >>> >> >> > under “-fthinlto -c” and “-fthinlto -S”, respectively.
> >>> >> >> >
> >>> >> >> > Additionally, a symbol table will be generated in the ELF file,
> >>> >> >> > holding the function symbols within the bitcode. This facilitates
> >>> >> >> > handling archives of the ELF-wrapped bitcode created with $AR, since
> >>> >> >> > the archive will have a symbol table as well. The archive symbol
> >>> >> >> > table enables gold to extract and pass to the plugin the constituent
> >>> >> >> > ELF-wrapped bitcode files. To support the concatenated llvmbc
> >>> >> >> > section generated by “$LD -r”, some handling needs to be added to
> >>> >> >> > gold and to the backend driver to process each original module’s
> >>> >> >> > bitcode.
> >>> >> >> >
> >>> >> >> > The function index/summary will later be added as a special ELF
> >>> >> >> > section alongside the .llvmbc sections.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > 2. Stage 2: ThinLTO Infrastructure
> >>> >> >> > ----------------------------------
> >>> >> >> >
> >>> >> >> > The next set of patches adds the base implementation of the ThinLTO
> >>> >> >> > infrastructure, specifically those required to make ThinLTO
> >>> >> >> > functional and generate correct but not necessarily high-performing
> >>> >> >> > binaries. It also does not include support to make debug support
> >>> >> >> > under -g efficient with ThinLTO.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > a. Clang/LLVM/gold linker options:
> >>> >> >> >
> >>> >> >> > An early set of clang/llvm patches is needed to provide options to
> >>> >> >> > enable ThinLTO (off by default), so that the rest of the
> >>> >> >> > implementation can be disabled by default as it is added.
> >>> >> >> > Specifically, the clang option -fthinlto (used instead of -flto)
> >>> >> >> > will cause clang to invoke the phase-1 emission of LLVM bitcode and
> >>> >> >> > function summary/index on a compile step, and pass the appropriate
> >>> >> >> > option to the gold plugin on a link step. The -thinlto option will
> >>> >> >> > be added to the gold plugin and llvm-lto tool to launch the phase-2
> >>> >> >> > thin archive step. The -thinlto option will also be added to the
> >>> >> >> > ‘opt’ tool to invoke it as a phase-3 parallel backend instance.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > b. Thin-archive linking support in Gold plugin and llvm-lto:
> >>> >> >> >
> >>> >> >> > Under the new plugin option (see above), the plugin needs to perform
> >>> >> >> > the phase-2 (thin archive) link which simply emits a combined
> >>> >> >> > function map from the linked modules, without actually performing
> >>> >> >> > the normal link. Corresponding support should be added to the
> >>> >> >> > standalone llvm-lto tool to enable testing/debugging without
> >>> >> >> > involving the linker and plugin.
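
A minimal sketch of the phase-2 idea described above: per-module function indexes are merged into one combined map, and no code is linked. The names `FunctionEntry` and `build_combined_map`, the dictionary layout, and the first-definition-wins policy are assumptions made for illustration; the RFC does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionEntry:
    module_path: str      # module that defines the function
    bitcode_offset: int   # where the function body starts in that module's bitcode
    inst_count: int       # example summary field: number of instructions


def build_combined_map(module_indexes):
    """Merge per-module indexes into one map keyed by function name.

    module_indexes: {module_path: {function_name: (bitcode_offset, inst_count)}}
    """
    combined = {}
    for module_path, index in module_indexes.items():
        for name, (offset, inst_count) in index.items():
            # Assumption: the first definition seen wins.
            combined.setdefault(name, FunctionEntry(module_path, offset, inst_count))
    return combined


# Hypothetical inputs: two modules, with "foo" defined in both.
module_indexes = {
    "a.o": {"foo": (128, 12), "bar": (512, 40)},
    "b.o": {"baz": (64, 7), "foo": (256, 12)},
}
combined = build_combined_map(module_indexes)
```

A backend that wants to import `foo` would look it up in `combined` and read only the recorded offset range from `a.o`, rather than parsing every module.
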
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > c. ThinLTO backend support:
> >>> >> >> >
> >>> >> >> > Support for a phase-3 backend invocation (including importing) on a
> >>> >> >> > module should be added to the ‘opt’ tool under the new option. The
> >>> >> >> > main changes under the option are to instantiate a Linker object
> >>> >> >> > used to manage the process of linking imported functions into the
> >>> >> >> > module, to read the combined function map efficiently, and to enable
> >>> >> >> > the ThinLTO import pass.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > d. Function index/summary support:
> >>> >> >> >
> >>> >> >> > This includes infrastructure for writing and reading the function
> >>> >> >> > index/summary section. As noted earlier this will be encoded in a
> >>> >> >> > special ELF section within the module, alongside the .llvmbc section
> >>> >> >> > containing the bitcode. The thin archive generated by phase-2 of
> >>> >> >> > ThinLTO simply contains all of the function index/summary sections
> >>> >> >> > across the linked modules, organized for efficient function lookup.
> >>> >> >> >
> >>> >> >> > Each function available for importing from the module has an entry
> >>> >> >> > in the module’s function index/summary section and in the resulting
> >>> >> >> > combined function map. Each function entry contains that function’s
> >>> >> >> > offset within the bitcode file, used to efficiently locate and
> >>> >> >> > quickly import just that function. The entry also contains summary
> >>> >> >> > information (e.g. basic information determined during parsing such
> >>> >> >> > as the number of instructions in the function), that will be used to
> >>> >> >> > help guide later import decisions. Because the contents of this
> >>> >> >> > section will change frequently during ThinLTO tuning, it should also
> >>> >> >> > be marked with a version id for backwards compatibility or version
> >>> >> >> > checking.
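
As a rough illustration of a versioned index section, the sketch below serializes entries holding a function name, a bitcode offset and an instruction count, and refuses to read a section written with a different version id. The byte layout, the field widths and the names `INDEX_VERSION`, `write_index` and `read_index` are all hypothetical; the RFC leaves the encoding to a later design document.

```python
import struct

INDEX_VERSION = 1  # hypothetical version id for this layout


def write_index(entries):
    """Serialize a list of (name, bitcode_offset, inst_count) tuples."""
    out = [struct.pack("<II", INDEX_VERSION, len(entries))]
    for name, offset, inst_count in entries:
        raw = name.encode("utf-8")
        out.append(struct.pack("<H", len(raw)) + raw)       # length-prefixed name
        out.append(struct.pack("<QI", offset, inst_count))  # offset + summary
    return b"".join(out)


def read_index(blob):
    """Parse a section produced by write_index into {name: (offset, inst_count)}."""
    version, count = struct.unpack_from("<II", blob, 0)
    if version != INDEX_VERSION:
        # Version checking: reject sections written with another layout.
        raise ValueError(f"unsupported function index version {version}")
    pos = 8
    entries = {}
    for _ in range(count):
        (name_len,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        name = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        offset, inst_count = struct.unpack_from("<QI", blob, pos)
        pos += 12
        entries[name] = (offset, inst_count)
    return entries
```

Putting the version first means a reader can bail out before it interprets any entry, which matters if the summary fields change often during tuning.
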
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > e. ThinLTO importing support:
> >>> >> >> >
> >>> >> >> > Support for the mechanics of importing functions from other modules,
> >>> >> >> > which can go in gradually as a set of patches since it will be off
> >>> >> >> > by default. Separate patches can include:
> >>> >> >> >
> >>> >> >> > - BitcodeReader changes to use function index to import/deserialize
> >>> >> >> >   single function of interest (small changes, leverages existing
> >>> >> >> >   lazy streamer support).
> >>> >> >> >
> >>> >> >> > - Minor LTOModule changes to pass the ThinLTO function to import and
> >>> >> >> >   its index into bitcode reader.
> >>> >> >> >
> >>> >> >> > - Marking of imported functions (for use in ThinLTO-specific symbol
> >>> >> >> >   linking and global DCE, for example). This can be in-memory
> >>> >> >> >   initially, but IR support may be required in order to support
> >>> >> >> >   streaming bitcode out and back in again after importing.
> >>> >> >> >
> >>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol linking and
> >>> >> >> >   static promotion when necessary. The linkage type of imported
> >>> >> >> >   functions changes to AvailableExternallyLinkage, for example.
> >>> >> >> >   Statics must be promoted in certain cases, and renamed in
> >>> >> >> >   consistent ways.
> >>> >> >> >
> >>> >> >> > - GlobalDCE changes to support removing imported functions that were
> >>> >> >> >   not inlined (very small changes to existing pass logic).
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > f. ThinLTO Import Driver SCC pass:
> >>> >> >> >
> >>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with framework for doing ThinLTO via
> >>> >> >> > an SCC pass, enabled only under -fthinlto options. The pass includes
> >>> >> >> > utilizing the thin archive (global function index/summary), import
> >>> >> >> > decision heuristics, invocation of LTOModule/ModuleLinker routines
> >>> >> >> > that perform the import, and any necessary callgraph updates and
> >>> >> >> > verification.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > g. Backend Driver:
> >>> >> >> >
> >>> >> >> > For a single node build, the gold plugin can simply write a makefile
> >>> >> >> > and fork the parallel backend instances directly via parallel make.
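
A small sketch of what "write a makefile" could look like: one rule per module, so that `make -j` runs the backend instances in parallel. The function name `backend_makefile`, the file names, the `-o` argument and the `$(THINLTO_BACKEND)` variable are placeholders invented for this example; the actual backend command line is not defined in the RFC.

```python
def backend_makefile(modules, backend_cmd="$(THINLTO_BACKEND)"):
    """Return makefile text with one independent rule per module.

    modules: mapping of IR input path -> native object output path.
    backend_cmd: placeholder for the phase-3 backend command (an assumption).
    """
    outputs = " ".join(modules.values())
    lines = [f"all: {outputs}", ""]
    for ir_path, obj_path in modules.items():
        lines.append(f"{obj_path}: {ir_path}")
        # Recipe lines in a makefile must start with a tab.
        lines.append(f"\t{backend_cmd} {ir_path} -o {obj_path}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical file names for two modules.
makefile_text = backend_makefile({"a.o": "a.native.o", "b.o": "b.native.o"})
```

Because the rules do not depend on each other, the degree of parallelism is left entirely to make's job count.
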
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements
> >>> >> >> > -------------------------------------------
> >>> >> >> >
> >>> >> >> > This refers to the patches that are not required for ThinLTO to
> >>> >> >> > work, but rather to improve compile time, memory, run-time
> >>> >> >> > performance and usability.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > a. Lazy Debug Metadata Linking:
> >>> >> >> >
> >>> >> >> > The prototype implementation included lazy importing of module-level
> >>> >> >> > metadata during the ThinLTO pass finalization (i.e. after all
> >>> >> >> > function importing is complete). This actually applies to all
> >>> >> >> > module-level metadata, not just debug, although it is the largest.
> >>> >> >> > This can be added as a separate set of patches. It requires changes
> >>> >> >> > to BitcodeReader, ValueMapper, and ModuleLinker.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > b. Import Tuning:
> >>> >> >> >
> >>> >> >> > Tuning the import strategy will be an iterative process that will
> >>> >> >> > continue to be refined over time. It involves several different
> >>> >> >> > types of changes: adding support for recording additional metrics in
> >>> >> >> > the function summary, such as profile data and optional
> >>> >> >> > heavier-weight IPA analyses, and tuning the import heuristics based
> >>> >> >> > on the summary and callsite context.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > c. Combined Function Map Pruning:
> >>> >> >> >
> >>> >> >> > The combined function map can be pruned of functions that are
> >>> >> >> > unlikely to benefit from being imported. For example, during the
> >>> >> >> > phase-2 thin archive plugin step we can safely omit large and (with
> >>> >> >> > profile data) cold functions, which are unlikely to benefit from
> >>> >> >> > being inlined. Additionally, all but one copy of comdat functions
> >>> >> >> > can be suppressed.
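
The pruning rules above can be sketched as a simple filter. The thresholds, the entry fields and the reading that a function is dropped if it is large or, when profile data exists, cold are assumptions for illustration, not values or rules taken from the prototype.

```python
def prune_combined_map(entries, max_insts=200, min_calls=1):
    """Return the entries worth keeping in the combined function map.

    entries: list of dicts with keys name, module, inst_count, comdat (bool)
    and call_count (int, or None when no profile data is available).
    """
    kept = []
    seen_comdats = set()
    for entry in entries:
        if entry["inst_count"] > max_insts:
            continue  # large: unlikely to be inlined
        if entry["call_count"] is not None and entry["call_count"] < min_calls:
            continue  # cold according to the profile
        if entry["comdat"]:
            if entry["name"] in seen_comdats:
                continue  # keep only one copy of a comdat function
            seen_comdats.add(entry["name"])
        kept.append(entry)
    return kept
```

Functions without profile data (`call_count` of `None`) are judged on size alone, which matches the "(with profile data)" qualifier in the text.
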
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > d. Distributed Build System Integration:
> >>> >> >> >
> >>> >> >> > For a distributed build system, the gold plugin should write the
> >>> >> >> > parallel backend invocations into a makefile, including the mapping
> >>> >> >> > from the IR file to the real object file path, and exit. Additional
> >>> >> >> > work needs to be done in the distributed build system itself to
> >>> >> >> > distribute and dispatch the parallel backend jobs to the build
> >>> >> >> > cluster.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > e. Dependence Tracking and Incremental Compiles:
> >>> >> >> >
> >>> >> >> > In order to support build systems that stage from local disks or
> >>> >> >> > network storage, the plugin will optionally support computation of
> >>> >> >> > dependent sets of IR files that each module may import from. This
> >>> >> >> > can be computed from profile data, if it exists, or from the symbol
> >>> >> >> > table and heuristics if not. These dependence sets also enable
> >>> >> >> > support for incremental backend compiles.
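
A rough sketch of the symbol-table variant of this idea: each module's dependence set is the set of other modules that define a function it references, and a change to one module triggers a backend rerun for every module whose set contains it. The function names and the input shape are hypothetical, and the sketch ignores both profile data and transitive imports.

```python
def import_dependences(defs, refs):
    """Compute, per module, the set of modules it may import from.

    defs: {module: set of function names it defines}
    refs: {module: set of external function names it references}
    """
    owner = {name: mod for mod, names in defs.items() for name in names}
    return {
        mod: {owner[n] for n in names if n in owner and owner[n] != mod}
        for mod, names in refs.items()
    }


def backends_to_rerun(changed, deps):
    """Modules whose backend compile is stale after `changed` is modified."""
    return {mod for mod, sources in deps.items() if mod == changed or changed in sources}
```

The same dependence sets tell a staging build system which IR files must be copied next to each backend job.
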
> >>> >> >> >
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > --
> >>> >> >> > Teresa Johnson | Software Engineer | tejohnson at google.com |
> >>> >> >> > 408-460-2413
> >>> >> >> >
> >>> >> >> > _______________________________________________
> >>> >> >> > LLVM Developers mailing list
> >>> >> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >>> >> >>
> >>> >> >>
> >>> >> >
> >>> >> >
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Teresa Johnson | Software Engineer | tejohnson at google.com |
> >>> >> 408-460-2413
> >>> >>
> >>>
> >>>
> >>>
> >>> --
> >>> Teresa Johnson | Software Engineer | tejohnson at google.com |
> >>> 408-460-2413
> >>
> >>
> >
> >

Eric Christopher

2015-May-14 19:53 UTC


[LLVMdev] RFC: ThinLTO Impementation Plan

On Thu, May 14, 2015 at 11:34 AM Daniel Berlin <dberlin at dberlin.org>
wrote:
> On Thu, May 14, 2015 at 11:14 AM, Eric Christopher <echristo at gmail.com>
> wrote:
> > I'm not sure this is a particularly great assumption to make.
>
> Which part?
>
The binutils part :)

>
> > We have to support a lot of different build systems and tools and
> > concentrating on something that just binutils uses isn't particularly
> > friendly here.
>
> I think you may have misunderstood.
> His point was exactly that they want to be transparent to *all of* these
> tools.
> You are saying "we should be friendly to everyone". He is saying the same
> thing.
> We should be friendly to everyone. The friendly way to do this is to
> not require all of these tools to build plugins to handle bitcode.
>
> Hence, elf-wrapped bitcode.
>
Oh, I understood. I just don't know that I agree. To do anything with the
tools will require some knowledge of bitcode anyhow or need the plugin. I'm
saying that as a baseline start we should look at how to do this using the
tools we've got rather than wrapping things for no real gain.

I've talked to Teresa a bit offline and we're going to talk more later (and
discuss on the list), but there are some discussions about how to make this
work with just bitcode/llvm tools, and so without requiring integration
on all platforms. The latter is what I consider as particularly friendly :)

-eric

>
>
> > I also can't imagine how it's necessary for any of the lto aspects as
> > currently written in the proposal.
> >
> > -eric
> >
> > On Thu, May 14, 2015 at 9:26 AM Xinliang David Li <xinliangli at gmail.com>
> > wrote:
> >>
> >> The design objective is to make thinLTO mostly transparent to binutil
> >> tools to enable easy integration with any build system in the wild.
> >> 'Pass-through' mode with 'ld -r' instead of the partial LTO mode is
> >> another reason.
> >>
> >> David
> >>
> >> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson <tejohnson at google.com>
> >> wrote:
> >>>
> >>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher <echristo at gmail.com>
> >>> wrote:
> >>> > So, what Alex is saying is that we have these tools as well and they
> >>> > understand bitcode just fine, as well as every object format - not
> >>> > just ELF.
> >>> > :)
> >>>
> >>> Right, there are also LLVM specific versions (llvm-ar, llvm-nm) that
> >>> handle bitcode similarly to the way the standard tool + plugin does.
> >>> But the goal we are trying to achieve is to allow the standard system
> >>> versions of the tools to handle these files without requiring a
> >>> plugin. I know the LLVM tool handles other object formats, but I'm not
> >>> sure how that helps here? We're not planning to replace those tools,
> >>> just allow the standard system versions to handle the intermediate
> >>> objects produced by ThinLTO.
> >>>
> >>> Thanks,
> >>> Teresa
> >>>
> >>> >
> >>> > -eric
> >>> >
> >>> >
> >>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson <tejohnson at google.com>
> >>> > wrote:
> >>> >>
> >>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li
> >>> >> <xinliangli at gmail.com> wrote:
> >>> >> >
> >>> >> >
> >>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg
> >>> >> > <alexr at leftfield.org>
> >>> >> > wrote:
> >>> >> >>
> >>> >> >> "ELF-wrapped bitcode" seems potentially controversial to me.
> >>> >> >>
> >>> >> >> What about ar, nm, and various ld implementations adds this
> >>> >> >> requirement?
> >>> >> >> What about the LLVM implementations of these tools is lacking?
> >>> >> >
> >>> >> >
> >>> >> > Sorry I can not parse your questions properly. Can you make it
> >>> >> > clearer?
> >>> >>
> >>> >> Alex is asking what the issue is with ar, nm, ld -r and regular
> >>> >> bitcode that makes using elf-wrapped bitcode easier.
> >>> >>
> >>> >> The issue is that generally you need to provide a plugin to these
> >>> >> tools in order for them to understand and handle bitcode files. We'd
> >>> >> like standard tools to work without requiring a plugin as much as
> >>> >> possible. And in some cases we want them to be handled differently
> >>> >> than the way bitcode files are handled with the plugin.
> >>> >>
> >>> >> nm: Without a plugin, normal bitcode files are inscrutable. When
> >>> >> provided the gold plugin it can emit the symbols.
> >>> >>
> >>> >> ar: Without a plugin, it will create an archive of bitcode files, but
> >>> >> without an index, so it can't be handled by the linker even with a
> >>> >> plugin on an -flto link. When ar is provided the gold plugin it does
> >>> >> create an index, so the linker + gold plugin handle it appropriately
> >>> >> on an -flto link.
> >>> >>
> >>> >> ld -r: Without a plugin, fails when provided bitcode inputs. When
> >>> >> provided the gold plugin, it handles them but compiles them all the
> >>> >> way through to ELF executable instructions via a partial LTO link.
> >>> >> This is where we would like to differ in behavior (while also not
> >>> >> requiring a plugin) with ELF-wrapped bitcode: we would like the ld -r
> >>> >> output file to still contain ELF-wrapped bitcode, delaying the LTO
> >>> >> until the full link step.
> >>> >>
> >>> >> Let me know if that helps address your concerns.
> >>> >>
> >>> >> Thanks,
> >>> >> Teresa
> >>> >>
> >>> >> >
> >>> >> > David
> >>> >> >
> >>> >> >>
> >>> >> >>
> >>> >> >> Alex
> >>> >> >>
> >>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson
> >>> >> >> > <tejohnson at google.com> wrote:
> >>> >> >> >
> >>> >> >> > I've included below an RFC for implementing ThinLTO in LLVM,
> >>> >> >> > looking forward to feedback and questions.
> >>> >> >> > Thanks!
> >>> >> >> > Teresa
> >>> >> >> >
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > RFC to discuss plans for implementing ThinLTO upstream. Background
> >>> >> >> > can be found in slides from EuroLLVM 2015:
> >>> >> >> >
> >>> >> >> > https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0)
> >>> >> >> > As described in the talk, we have a prototype implementation, and
> >>> >> >> > would like to start staging patches upstream. This RFC describes a
> >>> >> >> > breakdown of the major pieces. We would like to commit upstream
> >>> >> >> > gradually in several stages, with all functionality off by default.
> >>> >> >> > The core ThinLTO importing support and tuning will require frequent
> >>> >> >> > change and iteration during testing and tuning, and for that part we
> >>> >> >> > would like to commit rapidly (off by default). See the proposed
> >>> >> >> > staged implementation described in the Implementation Plan section.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > ThinLTO Overview
> >>> >> >> > ================
> >>> >> >> >
> >>> >> >> > See the talk slides linked above for more details. The following
> >>> >> >> > is a high-level overview of the motivation.
> >>> >> >> >
> >>> >> >> > Cross Module Optimization (CMO) is an effective means for improving
> >>> >> >> > runtime performance, by extending the scope of optimizations across
> >>> >> >> > source module boundaries. Without CMO, the compiler is limited to
> >>> >> >> > optimizing within the scope of single source modules. Two solutions
> >>> >> >> > for enabling CMO are Link-Time Optimization (LTO), which is currently
> >>> >> >> > supported in LLVM and GCC, and Lightweight-Interprocedural
> >>> >> >> > Optimization (LIPO). However, each of these solutions has limitations
> >>> >> >> > that prevent it from being enabled by default. ThinLTO is a new
> >>> >> >> > approach that attempts to address these limitations, with a goal of
> >>> >> >> > being enabled more broadly. ThinLTO is designed with many of the same
> >>> >> >> > principles as LIPO, and therefore its advantages, without any of its
> >>> >> >> > inherent weaknesses. Unlike in LIPO where the module group decision is
> >>> >> >> > made at profile training runtime, ThinLTO makes the decision at
> >>> >> >> > compile time, but in a lazy mode that facilitates large scale
> >>> >> >> > parallelism. The serial linker plugin phase is designed to be razor
> >>> >> >> > thin and blazingly fast. By default this step only does minimal
> >>> >> >> > preparation work to enable the parallel lazy importing performed
> >>> >> >> > later. ThinLTO aims to be scalable like a regular O2 build, enabling
> >>> >> >> > CMO on machines without large memory configurations, while also
> >>> >> >> > integrating well with distributed build systems. Results from early
> >>> >> >> > prototyping on SPEC cpu2006 C++ benchmarks are in line with
> >>> >> >> > expectations that ThinLTO can scale like O2 while enabling much of
> >>> >> >> > the CMO performed during a full LTO build.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > A ThinLTO build is divided into 3 phases, which are referred to
> >>> >> >> > in the following implementation plan:
> >>> >> >> >
> >>> >> >> > phase-1: IR and Function Summary Generation (-c compile)
> >>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive linker step)
> >>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > Implementation Plan
> >>> >> >> > ===================
> >>> >> >> >
> >>> >> >> > This section gives a high-level breakdown of the ThinLTO support
> >>> >> >> > that will be added, in roughly the order that the patches would be
> >>> >> >> > staged. The patches are divided into three stages. The first stage
> >>> >> >> > contains a minimal amount of preparation work that is not
> >>> >> >> > ThinLTO-specific. The second stage contains most of the
> >>> >> >> > infrastructure for ThinLTO, which will be off by default. The third
> >>> >> >> > stage includes enhancements/improvements/tunings that can be
> >>> >> >> > performed after the main ThinLTO infrastructure is in.
> >>> >> >> >
> >>> >> >> > The second and third implementation stages will initially be very
> >>> >> >> > volatile, requiring a lot of iterations and tuning with large apps
> >>> >> >> > to get stabilized. Therefore it will be important to do fast commits
> >>> >> >> > for these implementation stages.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > 1. Stage 1: Preparation
> >>> >> >> > -------------------------------
> >>> >> >> >
> >>> >> >> > The first planned sets of patches are enablers for ThinLTO work:
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > a. LTO directory structure:
> >>> >> >> >
> >>> >> >> > Restructure the LTO directory to remove the circular dependence that
> >>> >> >> > arises when the ThinLTO pass is added. Because ThinLTO is being
> >>> >> >> > implemented as an SCC pass within Transforms/IPO, and leverages the
> >>> >> >> > LTOModule class for linking in functions from modules, IPO then
> >>> >> >> > requires the LTO library. This creates a circular dependence between
> >>> >> >> > LTO and IPO. To break that, we need to split the lib/LTO
> >>> >> >> > directory/library into lib/LTO/CodeGen and lib/LTO/Module, containing
> >>> >> >> > LTOCodeGenerator and LTOModule, respectively. Only LTOCodeGenerator
> >>> >> >> > has a dependence on IPO, removing the circular dependence.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > b. ELF wrapper generation support:
> >>> >> >> >
> >>> >> >> > Implement ELF wrapped bitcode writer. In order to more easily
> >>> >> >> > interact with tools such as $AR, $NM, and “$LD -r” we plan to emit
> >>> >> >> > the phase-1 bitcode wrapped in ELF via the .llvmbc section, along
> >>> >> >> > with a symbol table. The goal is both to interact with these tools
> >>> >> >> > without requiring a plugin, and also to avoid doing partial
> >>> >> >> > LTO/ThinLTO across files linked with “$LD -r” (i.e. the resulting
> >>> >> >> > object file should still contain ELF-wrapped bitcode to enable
> >>> >> >> > ThinLTO at the full link step). I will send a separate design
> >>> >> >> > document for these changes, but the following is a high-level
> >>> >> >> > overview.
> >>> >> >> >
> >>> >> >> > Support was added to LLVM for reading ELF-wrapped bitcode
> >>> >> >> > (http://reviews.llvm.org/rL218078), but there does not yet exist
> >>> >> >> > support in LLVM/Clang for emitting bitcode wrapped in ELF. I plan to
> >>> >> >> > add support for optionally generating bitcode in an ELF file
> >>> >> >> > containing a single .llvmbc section holding the bitcode.
> >>> >> >> > Specifically, the patch would add new options “emit-llvm-bc-elf”
> >>> >> >> > (object file) and corresponding “emit-llvm-elf” (textual assembly
> >>> >> >> > code equivalent). Eventually these would be automatically triggered
> >>> >> >> > under “-fthinlto -c” and “-fthinlto -S”, respectively.
> >>> >> >> >
> >>> >> >> > Additionally, a symbol table will be generated in the ELF file,
> >>> >> >> > holding the function symbols within the bitcode. This facilitates
> >>> >> >> > handling archives of the ELF-wrapped bitcode created with $AR, since
> >>> >> >> > the archive will have a symbol table as well. The archive symbol
> >>> >> >> > table enables gold to extract and pass to the plugin the constituent
> >>> >> >> > ELF-wrapped bitcode files. To support the concatenated llvmbc
> >>> >> >> > section generated by “$LD -r”, some handling needs to be added to
> >>> >> >> > gold and to the backend driver to process each original module’s
> >>> >> >> > bitcode.
> >>> >> >> >
> >>> >> >> > The function index/summary will later be added as a special ELF
> >>> >> >> > section alongside the .llvmbc sections.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > 2. Stage 2: ThinLTO Infrastructure
> >>> >> >> > ----------------------------------
> >>> >> >> >
> >>> >> >> > The next set of patches adds the base implementation of the ThinLTO
> >>> >> >> > infrastructure, specifically those required to make ThinLTO
> >>> >> >> > functional and generate correct but not necessarily high-performing
> >>> >> >> > binaries. It also does not include support to make debug support
> >>> >> >> > under -g efficient with ThinLTO.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > a. Clang/LLVM/gold linker options:
> >>> >> >> >
> >>> >> >> > An early set of clang/llvm patches is needed to provide options to
> >>> >> >> > enable ThinLTO (off by default), so that the rest of the
> >>> >> >> > implementation can be disabled by default as it is added.
> >>> >> >> > Specifically, clang options -fthinlto (used instead of -flto) will
> >>> >> >> > cause clang to invoke the phase-1 emission of LLVM bitcode and
> >>> >> >> > function summary/index on a compile step, and pass the appropriate
> >>> >> >> > option to the gold plugin on a link step. The -thinlto option will be
> >>> >> >> > added to the gold plugin and llvm-lto tool to launch the phase-2 thin
> >>> >> >> > archive step. The -thinlto option will also be added to the ‘opt’ tool
> >>> >> >> > to invoke it as a phase-3 parallel backend instance.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > b. Thin-archive linking support in Gold plugin and llvm-lto:
> >>> >> >> >
> >>> >> >> > Under the new plugin option (see above), the plugin needs to perform
> >>> >> >> > the phase-2 (thin archive) link which simply emits a combined function
> >>> >> >> > map from the linked modules, without actually performing the normal
> >>> >> >> > link. Corresponding support should be added to the standalone llvm-lto
> >>> >> >> > tool to enable testing/debugging without involving the linker and
> >>> >> >> > plugin.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > c. ThinLTO backend support:
> >>> >> >> >
> >>> >> >> > Support for invoking a phase-3 backend invocation (including
> >>> >> >> > importing) on a module should be added to the ‘opt’ tool under the new
> >>> >> >> > option. The main change under the option is to instantiate a Linker
> >>> >> >> > object used to manage the process of linking imported functions into
> >>> >> >> > the module, efficient read of the combined function map, and enable
> >>> >> >> > the ThinLTO import pass.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > d. Function index/summary support:
> >>> >> >> >
> >>> >> >> > This includes infrastructure for writing and reading the function
> >>> >> >> > index/summary section. As noted earlier this will be encoded in a
> >>> >> >> > special ELF section within the module, alongside the .llvmbc section
> >>> >> >> > containing the bitcode. The thin archive generated by phase-2 of
> >>> >> >> > ThinLTO simply contains all of the function index/summary sections
> >>> >> >> > across the linked modules, organized for efficient function lookup.
> >>> >> >> >
> >>> >> >> > Each function available for importing from the module contains an
> >>> >> >> > entry in the module’s function index/summary section and in the
> >>> >> >> > resulting combined function map. Each function entry contains that
> >>> >> >> > function’s offset within the bitcode file, used to efficiently locate
> >>> >> >> > and quickly import just that function. The entry also contains summary
> >>> >> >> > information (e.g. basic information determined during parsing such as
> >>> >> >> > the number of instructions in the function), that will be used to help
> >>> >> >> > guide later import decisions. Because the contents of this section
> >>> >> >> > will change frequently during ThinLTO tuning, it should also be marked
> >>> >> >> > with a version id for backwards compatibility or version checking.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > e. ThinLTO importing support:
> >>> >> >> >
> >>> >> >> > Support for the mechanics of importing functions from other modules,
> >>> >> >> > which can go in gradually as a set of patches since it will be off by
> >>> >> >> > default. Separate patches can include:
> >>> >> >> >
> >>> >> >> > - BitcodeReader changes to use function index to import/deserialize
> >>> >> >> > single function of interest (small changes, leverages existing lazy
> >>> >> >> > streamer support).
> >>> >> >> >
> >>> >> >> > - Minor LTOModule changes to pass the ThinLTO function to import and
> >>> >> >> > its index into bitcode reader.
> >>> >> >> >
> >>> >> >> > - Marking of imported functions (for use in ThinLTO-specific symbol
> >>> >> >> > linking and global DCE, for example). This can be in-memory initially,
> >>> >> >> > but IR support may be required in order to support streaming bitcode
> >>> >> >> > out and back in again after importing.
> >>> >> >> >
> >>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol linking and
> >>> >> >> > static promotion when necessary. The linkage type of imported
> >>> >> >> > functions changes to AvailableExternallyLinkage, for example. Statics
> >>> >> >> > must be promoted in certain cases, and renamed in consistent ways.
> >>> >> >> >
> >>> >> >> > - GlobalDCE changes to support removing imported functions that were
> >>> >> >> > not inlined (very small changes to existing pass logic).
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > f. ThinLTO Import Driver SCC pass:
> >>> >> >> >
> >>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with framework for doing ThinLTO via
> >>> >> >> > an SCC pass, enabled only under -fthinlto options. The pass includes
> >>> >> >> > utilizing the thin archive (global function index/summary), import
> >>> >> >> > decision heuristics, invocation of LTOModule/ModuleLinker routines
> >>> >> >> > that perform the import, and any necessary callgraph updates and
> >>> >> >> > verification.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > g. Backend Driver:
> >>> >> >> >
> >>> >> >> > For a single node build, the gold plugin can simply write a makefile
> >>> >> >> > and fork the parallel backend instances directly via parallel make.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements
> >>> >> >> > -------------------------------------------
> >>> >> >> >
> >>> >> >> > This refers to the patches that are not required for ThinLTO to work,
> >>> >> >> > but rather to improve compile time, memory, run-time performance and
> >>> >> >> > usability.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > a. Lazy Debug Metadata Linking:
> >>> >> >> >
> >>> >> >> > The prototype implementation included lazy importing of module-level
> >>> >> >> > metadata during the ThinLTO pass finalization (i.e. after all function
> >>> >> >> > importing is complete). This actually applies to all module-level
> >>> >> >> > metadata, not just debug, although it is the largest. This can be
> >>> >> >> > added as a separate set of patches. It involves changes to
> >>> >> >> > BitcodeReader, ValueMapper, and ModuleLinker.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > b. Import Tuning:
> >>> >> >> >
> >>> >> >> > Tuning the import strategy will be an iterative process that will
> >>> >> >> > continue to be refined over time. It involves several different types
> >>> >> >> > of changes: adding support for recording additional metrics in the
> >>> >> >> > function summary, such as profile data and optional heavier-weight IPA
> >>> >> >> > analyses, and tuning the import heuristics based on the summary and
> >>> >> >> > callsite context.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > c. Combined Function Map Pruning:
> >>> >> >> >
> >>> >> >> > The combined function map can be pruned of functions that are unlikely
> >>> >> >> > to benefit from being imported. For example, during the phase-2 thin
> >>> >> >> > archive plugin step we can safely omit large and (with profile data)
> >>> >> >> > cold functions, which are unlikely to benefit from being inlined.
> >>> >> >> > Additionally, all but one copy of comdat functions can be suppressed.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > d. Distributed Build System Integration:
> >>> >> >> >
> >>> >> >> > For a distributed build system, the gold plugin should write the
> >>> >> >> > parallel backend invocations into a makefile, including the mapping
> >>> >> >> > from the IR file to the real object file path, and exit. Additional
> >>> >> >> > work needs to be done in the distributed build system itself to
> >>> >> >> > distribute and dispatch the parallel backend jobs to the build
> >>> >> >> > cluster.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > e. Dependence Tracking and Incremental Compiles:
> >>> >> >> >
> >>> >> >> > In order to support build systems that stage from local disks or
> >>> >> >> > network storage, the plugin will optionally support computation of
> >>> >> >> > dependent sets of IR files that each module may import from. This can
> >>> >> >> > be computed from profile data, if it exists, or from the symbol table
> >>> >> >> > and heuristics if not. These dependence sets also enable support for
> >>> >> >> > incremental backend compiles.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > --
> >>> >> >> > Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
> >>> >> >> >

David Blaikie

2015-May-14 20:11 UTC

[LLVMdev] RFC: ThinLTO Impementation Plan

On Thu, May 14, 2015 at 12:53 PM, Eric Christopher <echristo at gmail.com>
wrote:
>
>
> On Thu, May 14, 2015 at 11:34 AM Daniel Berlin <dberlin at dberlin.org>
> wrote:
>
>> On Thu, May 14, 2015 at 11:14 AM, Eric Christopher <echristo at gmail.com>
>> wrote:
>> > I'm not sure this is a particularly great assumption to make.
>>
>> Which part?
>>
>
> The binutils part :)
>
>
>>
>> >  We have to
>> > support a lot of different build systems and tools and concentrating on
>> > something that just binutils uses isn't particularly friendly here.
>>
>> I think you may have misunderstood.
>> His point was exactly that they want to be transparent to *all of* these
>> tools.
>> You are saying "we should be friendly to everyone". He is saying the same
>> thing.
>> We should be friendly to everyone. The friendly way to do this is to
>> not require all of these tools to build plugins to handle bitcode.
>>
>> Hence, elf-wrapped bitcode.
>>
>
> Oh, I understood. I just don't know that I agree. To do anything with the
> tools will require some knowledge of bitcode anyhow or need the plugin. I'm
> saying that as a baseline start we should look at how to do this using the
> tools we've got rather than wrapping things for no real gain.
>
That doesn't seem strictly true. Consider the ar situation (which I'm led to
believe is in use in our build system, and in others, one would assume): with
the symbol table included as proposed, ar can be used without any knowledge of
the bitcode or need for a plugin.
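
To make the ar point concrete, here is a minimal Python sketch of how an archive index can be consumed without touching member contents. It assumes the GNU/System V archive layout (a "/" member holding a big-endian count, member-header offsets, and NUL-terminated symbol names). The helper names build_archive and read_symbol_index, the member names, and the placeholder payloads are all hypothetical and exist only for illustration; long member names and 64-bit symbol tables are not handled.

```python
import struct

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # name[16] date[12] uid[6] gid[6] mode[8] size[10] fmag[2]


def _member(name: bytes, data: bytes) -> bytes:
    """Encode one archive member: 60-byte header, data, pad to even length."""
    header = b"%-16s%-12s%-6s%-6s%-8s%-10d`\n" % (
        name, b"0", b"0", b"0", b"644", len(data))
    assert len(header) == HEADER_SIZE
    return header + data + (b"\n" if len(data) % 2 else b"")


def build_archive(members):
    """Build a toy archive. members: list of (file name, payload, [symbols])."""
    names = [sym.encode() + b"\0" for _, _, syms in members for sym in syms]
    symtab_len = 4 + 4 * len(names) + sum(len(n) for n in names)
    # Offset of the first real member: magic + symtab header + padded symtab.
    base = len(AR_MAGIC) + HEADER_SIZE + symtab_len + symtab_len % 2
    offsets, body = [], b""
    for file_name, payload, syms in members:
        offsets += [base + len(body)] * len(syms)
        body += _member(file_name.encode() + b"/", payload)
    symtab = (struct.pack(">I", len(names))
              + b"".join(struct.pack(">I", off) for off in offsets)
              + b"".join(names))
    return AR_MAGIC + _member(b"/", symtab) + body


def read_symbol_index(archive: bytes) -> dict:
    """Map each symbol to its member name using only the archive index."""
    if not archive.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    start = len(AR_MAGIC)
    header = archive[start:start + HEADER_SIZE]
    if header[:16].rstrip() != b"/":
        raise ValueError("archive has no symbol table member")
    size = int(header[48:58].decode())
    data = archive[start + HEADER_SIZE:start + HEADER_SIZE + size]
    (count,) = struct.unpack_from(">I", data)
    offsets = struct.unpack_from(">%dI" % count, data, 4)
    names = data[4 + 4 * count:].split(b"\0")[:count]
    index = {}
    for name, off in zip(names, offsets):
        # The member payload is never parsed, only its header name is read.
        member = archive[off:off + 16].rstrip().rstrip(b"/")
        index[name.decode()] = member.decode()
    return index


archive = build_archive([
    ("a.o", b"<placeholder payload a>", ["foo", "bar"]),
    ("b.o", b"<placeholder payload b>", ["baz"]),
])
print(read_symbol_index(archive))
```

The reader never inspects the payloads, which is the property at stake: once the archiver has been able to build the index from an ordinary ELF symbol table, a consumer can locate the right member by symbol name alone.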

It'd be helpful to list the scenarios we're trying to support with these
tools and then weigh up the alternatives.

> I've talked to Teresa a bit offline and we're going to talk more later
> (and discuss on the list), but there are some discussions about how to make
> this work either with just bitcode/llvm tools and so not requiring
> integration on all platforms. The latter is what I consider as particularly
> friendly :)
>
> -eric
>
>
>>
>>
>> > I also
>> > can't imagine how it's necessary for any of the lto aspects as currently
>> > written in the proposal.
>> >
>> > -eric
>> >
>> > On Thu, May 14, 2015 at 9:26 AM Xinliang David Li <xinliangli at gmail.com>
>> > wrote:
>> >>
>> >> The design objective is to make thinLTO mostly transparent to binutil
>> >> tools to enable easy integration with any build system in the wild.
>> >> 'Pass-through' mode with 'ld -r' instead of the partial LTO mode is
>> >> another reason.
>> >>
>> >> David
>> >>
>> >> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson <tejohnson at google.com>
>> >> wrote:
>> >>>
>> >>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher <echristo at gmail.com>
>> >>> wrote:
>> >>> > So, what Alex is saying is that we have these tools as well and they
>> >>> > understand bitcode just fine, as well as every object format - not
>> >>> > just ELF.
>> >>> > :)
>> >>>
>> >>> Right, there are also LLVM specific versions (llvm-ar, llvm-nm) that
>> >>> handle bitcode similarly to the way the standard tool + plugin does.
>> >>> But the goal we are trying to achieve is to allow the standard system
>> >>> versions of the tools to handle these files without requiring a
>> >>> plugin. I know the LLVM tool handles other object formats, but I'm not
>> >>> sure how that helps here? We're not planning to replace those tools,
>> >>> just allow the standard system versions to handle the intermediate
>> >>> objects produced by ThinLTO.
>> >>>
>> >>> Thanks,
>> >>> Teresa
>> >>>
>> >>> >
>> >>> > -eric
>> >>> >
>> >>> >
>> >>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson <tejohnson at google.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li
>> >>> >> <xinliangli at gmail.com> wrote:
>> >>> >> >
>> >>> >> >
>> >>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg
>> >>> >> > <alexr at leftfield.org>
>> >>> >> > wrote:
>> >>> >> >>
>> >>> >> >> "ELF-wrapped bitcode" seems potentially controversial to me.
>> >>> >> >>
>> >>> >> >> What about ar, nm, and various ld implementations adds this
>> >>> >> >> requirement?
>> >>> >> >> What about the LLVM implementations of these tools is lacking?
>> >>> >> >
>> >>> >> >
>> >>> >> > Sorry I can not parse your questions properly. Can you make it
>> >>> >> > clearer?
>> >>> >>
>> >>> >> Alex is asking what the issue is with ar, nm, ld -r and regular
>> >>> >> bitcode that makes using elf-wrapped bitcode easier.
>> >>> >>
>> >>> >> The issue is that generally you need to provide a plugin to these
>> >>> >> tools in order for them to understand and handle bitcode files. We'd
>> >>> >> like standard tools to work without requiring a plugin as much as
>> >>> >> possible. And in some cases we want them to be handled different than
>> >>> >> the way bitcode files are handled with the plugin.
>> >>> >>
>> >>> >> nm: Without a plugin, normal bitcode files are inscrutable. When
>> >>> >> provided the gold plugin it can emit the symbols.
>> >>> >>
>> >>> >> ar: Without a plugin, it will create an archive of bitcode files, but
>> >>> >> without an index, so it can't be handled by the linker even with a
>> >>> >> plugin on an -flto link. When ar is provided the gold plugin it does
>> >>> >> create an index, so the linker + gold plugin handle it appropriately
>> >>> >> on an -flto link.
>> >>> >>
>> >>> >> ld -r: Without a plugin, fails when provided bitcode inputs. When
>> >>> >> provided the gold plugin, it handles them but compiles them all the
>> >>> >> way through to ELF executable instructions via a partial LTO link.
>> >>> >> This is where we would like to differ in behavior (while also not
>> >>> >> requiring a plugin) with ELF-wrapped bitcode: we would like the ld -r
>> >>> >> output file to still contain ELF-wrapped bitcode, delaying the LTO
>> >>> >> until the full link step.
>> >>> >>
>> >>> >> Let me know if that helps address your concerns.
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Teresa
>> >>> >>
>> >>> >> >
>> >>> >> > David
>> >>> >> >
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> Alex
>> >>> >> >>
>> >>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson
>> >>> >> >> > <tejohnson at google.com> wrote:
>> >>> >> >> >
>> >>> >> >> > I've included below an RFC for implementing ThinLTO in LLVM,
>> >>> >> >> > looking forward to feedback and questions.
>> >>> >> >> > Thanks!
>> >>> >> >> > Teresa
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > RFC to discuss plans for implementing ThinLTO upstream. Background
>> >>> >> >> > can be found in slides from EuroLLVM 2015:
>> >>> >> >> >
>> >>> >> >> > https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
>> >>> >> >> >
>> >>> >> >> > As described in the talk, we have a prototype implementation, and
>> >>> >> >> > would like to start staging patches upstream. This RFC describes a
>> >>> >> >> > breakdown of the major pieces. We would like to commit upstream
>> >>> >> >> > gradually in several stages, with all functionality off by default.
>> >>> >> >> > The core ThinLTO importing support and tuning will require frequent
>> >>> >> >> > change and iteration during testing and tuning, and for that part we
>> >>> >> >> > would like to commit rapidly (off by default). See the proposed staged
>> >>> >> >> > implementation described in the Implementation Plan section.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > ThinLTO Overview
>> >>> >> >> > ================
>> >>> >> >> >
>> >>> >> >> > See the talk slides linked above for more details. The following is a
>> >>> >> >> > high-level overview of the motivation.
>> >>> >> >> >
>> >>> >> >> > Cross Module Optimization (CMO) is an effective means for improving
>> >>> >> >> > runtime performance, by extending the scope of optimizations across
>> >>> >> >> > source module boundaries. Without CMO, the compiler is limited to
>> >>> >> >> > optimizing within the scope of single source modules. Two solutions
>> >>> >> >> > for enabling CMO are Link-Time Optimization (LTO), which is currently
>> >>> >> >> > supported in LLVM and GCC, and Lightweight-Interprocedural
>> >>> >> >> > Optimization (LIPO). However, each of these solutions has limitations
>> >>> >> >> > that prevent it from being enabled by default. ThinLTO is a new
>> >>> >> >> > approach that attempts to address these limitations, with a goal of
>> >>> >> >> > being enabled more broadly. ThinLTO is designed with many of the same
>> >>> >> >> > principles as LIPO, and therefore its advantages, without any of its
>> >>> >> >> > inherent weaknesses. Unlike in LIPO where the module group decision is
>> >>> >> >> > made at profile training runtime, ThinLTO makes the decision at
>> >>> >> >> > compile time, but in a lazy mode that facilitates large scale
>> >>> >> >> > parallelism. The serial linker plugin phase is designed to be razor
>> >>> >> >> > thin and blazingly fast. By default this step only does minimal
>> >>> >> >> > preparation work to enable the parallel lazy importing performed
>> >>> >> >> > later. ThinLTO aims to be scalable like a regular O2 build, enabling
>> >>> >> >> > CMO on machines without large memory configurations, while also
>> >>> >> >> > integrating well with distributed build systems. Results from early
>> >>> >> >> > prototyping on SPEC cpu2006 C++ benchmarks are in line with
>> >>> >> >> > expectations that ThinLTO can scale like O2 while enabling much of the
>> >>> >> >> > CMO performed during a full LTO build.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > A ThinLTO build is divided into 3 phases, which are referred to in the
>> >>> >> >> > following implementation plan:
>> >>> >> >> >
>> >>> >> >> > phase-1: IR and Function Summary Generation (-c compile)
>> >>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive linker step)
>> >>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > Implementation Plan
>> >>> >> >> > ===================
>> >>> >> >> >
>> >>> >> >> > This section gives a high-level breakdown of the ThinLTO support that
>> >>> >> >> > will be added, in roughly the order that the patches would be staged.
>> >>> >> >> > The patches are divided into three stages. The first stage contains a
>> >>> >> >> > minimal amount of preparation work that is not ThinLTO-specific. The
>> >>> >> >> > second stage contains most of the infrastructure for ThinLTO, which
>> >>> >> >> > will be off by default. The third stage includes
>> >>> >> >> > enhancements/improvements/tunings that can be performed after the main
>> >>> >> >> > ThinLTO infrastructure is in.
>> >>> >> >> >
>> >>> >> >> > The second and third implementation stages will initially be very
>> >>> >> >> > volatile, requiring a lot of iterations and tuning with large apps to
>> >>> >> >> > get stabilized. Therefore it will be important to do fast commits for
>> >>> >> >> > these implementation stages.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > 1. Stage 1: Preparation
>> >>> >> >> > -------------------------------
>> >>> >> >> >
>> >>> >> >> > The first planned sets of patches are enablers for ThinLTO work:
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > a. LTO directory structure:
>> >>> >> >> >
>> >>> >> >> > Restructure the LTO directory to remove circular dependence when the
>> >>> >> >> > ThinLTO pass is added. Because ThinLTO is being implemented as an SCC
>> >>> >> >> > pass within Transforms/IPO, and leverages the LTOModule class for
>> >>> >> >> > linking in functions from modules, IPO then requires the LTO library.
>> >>> >> >> > This creates a circular dependence between LTO and IPO. To break that,
>> >>> >> >> > we need to split the lib/LTO directory/library into lib/LTO/CodeGen
>> >>> >> >> > and lib/LTO/Module, containing LTOCodeGenerator and LTOModule,
>> >>> >> >> > respectively. Only LTOCodeGenerator has a dependence on IPO, removing
>> >>> >> >> > the circular dependence.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > b. ELF wrapper generation support:
>> >>> >> >> >
>> >>> >> >> > Implement ELF wrapped bitcode writer. In order to more easily interact
>> >>> >> >> > with tools such as $AR, $NM, and “$LD -r” we plan to emit the phase-1
>> >>> >> >> > bitcode wrapped in ELF via the .llvmbc section, along with a symbol
>> >>> >> >> > table. The goal is both to interact with these tools without requiring
>> >>> >> >> > a plugin, and also to avoid doing partial LTO/ThinLTO across files
>> >>> >> >> > linked with “$LD -r” (i.e. the resulting object file should still
>> >>> >> >> > contain ELF-wrapped bitcode to enable ThinLTO at the full link step).
>> >>> >> >> > I will send a separate design document for these changes, but the
>> >>> >> >> > following is a high-level overview.
>> >>> >> >> >
>> >>> >> >> > Support was added to LLVM for reading ELF-wrapped bitcode
>> >>> >> >> > (http://reviews.llvm.org/rL218078), but there does not yet exist
>> >>> >> >> > support in LLVM/Clang for emitting bitcode wrapped in ELF. I plan to
>> >>> >> >> > add support for optionally generating bitcode in an ELF file
>> >>> >> >> > containing a single .llvmbc section holding the bitcode. Specifically,
>> >>> >> >> > the patch would add new options “emit-llvm-bc-elf” (object file) and
>> >>> >> >> > corresponding “emit-llvm-elf” (textual assembly code equivalent).
>> >>> >> >> > Eventually these would be automatically triggered under “-fthinlto -c”
>> >>> >> >> > and “-fthinlto -S”, respectively.
>> >>> >> >> >
>> >>> >> >> > Additionally, a symbol table will be generated in the ELF file,
>> >>> >> >> > holding the function symbols within the bitcode. This facilitates
>> >>> >> >> > handling archives of the ELF-wrapped bitcode created with $AR, since
>> >>> >> >> > the archive will have a symbol table as well. The archive symbol table
>> >>> >> >> > enables gold to extract and pass to the plugin the constituent
>> >>> >> >> > ELF-wrapped bitcode files. To support the concatenated llvmbc section
>> >>> >> >> > generated by “$LD -r”, some handling needs to be added to gold and to
>> >>> >> >> > the backend driver to process each original module’s bitcode.
>> >>> >> >> >
>> >>> >> >> > The function index/summary will later be added as a special ELF
>> >>> >> >> > section alongside the .llvmbc sections.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > 2. Stage 2: ThinLTO Infrastructure
>> >>> >> >> > ----------------------------------
>> >>> >> >> >
>> >>> >> >> > The next set of patches adds the base implementation of the ThinLTO
>> >>> >> >> > infrastructure, specifically those required to make ThinLTO functional
>> >>> >> >> > and generate correct but not necessarily high-performing binaries. It
>> >>> >> >> > also does not include support to make debug support under -g efficient
>> >>> >> >> > with ThinLTO.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > a. Clang/LLVM/gold linker options:
>> >>> >> >> >
>> >>> >> >> > An early set of clang/llvm patches is needed to provide options to
>> >>> >> >> > enable ThinLTO (off by default), so that the rest of the
>> >>> >> >> > implementation can be disabled by default as it is added.
>> >>> >> >> > Specifically, clang options -fthinlto (used instead of -flto) will
>> >>> >> >> > cause clang to invoke the phase-1 emission of LLVM bitcode and
>> >>> >> >> > function summary/index on a compile step, and pass the appropriate
>> >>> >> >> > option to the gold plugin on a link step. The -thinlto option will be
>> >>> >> >> > added to the gold plugin and llvm-lto tool to launch the phase-2 thin
>> >>> >> >> > archive step. The -thinlto option will also be added to the ‘opt’ tool
>> >>> >> >> > to invoke it as a phase-3 parallel backend instance.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > b. Thin-archive linking support in Gold plugin and llvm-lto:
>> >>> >> >> >
>> >>> >> >> > Under the new plugin option (see above), the plugin needs to perform
>> >>> >> >> > the phase-2 (thin archive) link which simply emits a combined function
>> >>> >> >> > map from the linked modules, without actually performing the normal
>> >>> >> >> > link. Corresponding support should be added to the standalone llvm-lto
>> >>> >> >> > tool to enable testing/debugging without involving the linker and
>> >>> >> >> > plugin.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > c. ThinLTO backend support:
>> >>> >> >> >
>> >>> >> >> > Support for invoking a phase-3 backend invocation (including
>> >>> >> >> > importing) on a module should be added to the ‘opt’ tool under the new
>> >>> >> >> > option. The main change under the option is to instantiate a Linker
>> >>> >> >> > object used to manage the process of linking imported functions into
>> >>> >> >> > the module, efficient read of the combined function map, and enable
>> >>> >> >> > the ThinLTO import pass.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > d. Function index/summary support:
>> >>> >> >> >
>> >>> >> >> > This includes infrastructure for writing and reading the function
>> >>> >> >> > index/summary section. As noted earlier this will be encoded in a
>> >>> >> >> > special ELF section within the module, alongside the .llvmbc section
>> >>> >> >> > containing the bitcode. The thin archive generated by phase-2 of
>> >>> >> >> > ThinLTO simply contains all of the function index/summary sections
>> >>> >> >> > across the linked modules, organized for efficient function lookup.
>> >>> >> >> >
>> >>> >> >> > Each function available for importing from the module contains an
>> >>> >> >> > entry in the module’s function index/summary section and in the
>> >>> >> >> > resulting combined function map. Each function entry contains that
>> >>> >> >> > function’s offset within the bitcode file, used to efficiently locate
>> >>> >> >> > and quickly import just that function. The entry also contains summary
>> >>> >> >> > information (e.g. basic information determined during parsing such as
>> >>> >> >> > the number of instructions in the function), that will be used to help
>> >>> >> >> > guide later import decisions. Because the contents of this section
>> >>> >> >> > will change frequently during ThinLTO tuning, it should also be marked
>> >>> >> >> > with a version id for backwards compatibility or version checking.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > e. ThinLTO importing support:
>> >>> >> >> >
>> >>> >> >> > Support for the mechanics of importing functions from other modules,
>> >>> >> >> > which can go in gradually as a set of patches since it will be off by
>> >>> >> >> > default. Separate patches can include:
>> >>> >> >> >
>> >>> >> >> > - BitcodeReader changes to use function index to import/deserialize
>> >>> >> >> > single function of interest (small changes, leverages existing lazy
>> >>> >> >> > streamer support).
>> >>> >> >> >
>> >>> >> >> > - Minor LTOModule changes to pass the ThinLTO function to import and
>> >>> >> >> > its index into bitcode reader.
>> >>> >> >> >
>> >>> >> >> > - Marking of imported functions (for use in ThinLTO-specific symbol
>> >>> >> >> > linking and global DCE, for example). This can be in-memory initially,
>> >>> >> >> > but IR support may be required in order to support streaming bitcode
>> >>> >> >> > out and back in again after importing.
>> >>> >> >> >
>> >>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol linking and
>> >>> >> >> > static promotion when necessary. The linkage type of imported
>> >>> >> >> > functions changes to AvailableExternallyLinkage, for example. Statics
>> >>> >> >> > must be promoted in certain cases, and renamed in consistent ways.
>> >>> >> >> >
>> >>> >> >> > - GlobalDCE changes to support removing imported functions that were
>> >>> >> >> > not inlined (very small changes to existing pass logic).
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > f. ThinLTO Import Driver SCC pass:
>> >>> >> >> >
>> >>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with framework for doing ThinLTO via
>> >>> >> >> > an SCC pass, enabled only under -fthinlto options. The pass includes
>> >>> >> >> > utilizing the thin archive (global function index/summary), import
>> >>> >> >> > decision heuristics, invocation of LTOModule/ModuleLinker routines
>> >>> >> >> > that perform the import, and any necessary callgraph updates and
>> >>> >> >> > verification.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > g. Backend Driver:
>> >>> >> >> >
>> >>> >> >> > For a single node build, the gold plugin can simply write a makefile
>> >>> >> >> > and fork the parallel backend instances directly via parallel make.
>> >>> >> >> >
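As an illustration of the "write a makefile, then run parallel make" idea, the sketch below generates one rule per module in Python. It is not what the gold plugin emits: the function name `write_backend_makefile` is hypothetical, and `backend-compile` in the usage note is a placeholder command, not a real tool or flag set.

```python
from typing import List, Tuple


def write_backend_makefile(modules: List[Tuple[str, str]], backend_cmd: str) -> str:
    """Return makefile text with one independent rule per module.

    modules:     (ir_path, object_path) pairs, i.e. the mapping from each
                 IR file to the real object file it should produce.
    backend_cmd: command template containing {ir} and {obj}; the caller
                 supplies it, since the real backend invocation is not
                 specified here.
    """
    objs = [obj for _, obj in modules]
    lines = ["all: " + " ".join(objs), ""]
    for ir, obj in modules:
        lines.append(f"{obj}: {ir}")
        # Recipe lines in a makefile must start with a tab.
        lines.append("\t" + backend_cmd.format(ir=ir, obj=obj))
        lines.append("")
    return "\n".join(lines)
```

For example, `write_backend_makefile([("a.bc", "a.o"), ("b.bc", "b.o")], "backend-compile {ir} -o {obj}")` yields two rules with no dependencies between them, so `make -j` is free to run the backends concurrently.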
>> >>> >> >> >
>> >>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements
>> >>> >> >> > ----------------------------------------------------------------
>> >>> >> >> >
>> >>> >> >> > This refers to the patches that are not required for ThinLTO to work,
>> >>> >> >> > but rather to improve compile time, memory, run-time performance and
>> >>> >> >> > usability.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > a. Lazy Debug Metadata Linking:
>> >>> >> >> >
>> >>> >> >> > The prototype implementation included lazy importing of module-level
>> >>> >> >> > metadata during the ThinLTO pass finalization (i.e. after all function
>> >>> >> >> > importing is complete). This actually applies to all module-level
>> >>> >> >> > metadata, not just debug, although it is the largest. This can be
>> >>> >> >> > added as a separate set of patches. Changes to BitcodeReader,
>> >>> >> >> > ValueMapper, ModuleLinker
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > b. Import Tuning:
>> >>> >> >> >
>> >>> >> >> > Tuning the import strategy will be an iterative process that will
>> >>> >> >> > continue to be refined over time. It involves several different types
>> >>> >> >> > of changes: adding support for recording additional metrics in the
>> >>> >> >> > function summary, such as profile data and optional heavier-weight IPA
>> >>> >> >> > analyses, and tuning the import heuristics based on the summary and
>> >>> >> >> > callsite context.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > c. Combined Function Map Pruning:
>> >>> >> >> >
>> >>> >> >> > The combined function map can be pruned of functions that are unlikely
>> >>> >> >> > to benefit from being imported. For example, during the phase-2 thin
>> >>> >> >> > archive plug step we can safely omit large and (with profile data)
>> >>> >> >> > cold functions, which are unlikely to benefit from being inlined.
>> >>> >> >> > Additionally, all but one copy of comdat functions can be suppressed.
>> >>> >> >> >
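The pruning rule described above (drop large functions, drop cold ones when profile data exists, keep one copy per comdat) can be sketched roughly as follows in Python. The `Summary` record, the `prune` function and the `max_insts` threshold are assumptions made for illustration; the RFC does not define a size cutoff or a data layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Summary:
    name: str
    module: str
    inst_count: int
    is_cold: bool = False         # only set when profile data is available
    comdat: Optional[str] = None  # comdat group key, if the function has one


def prune(summaries: List[Summary], max_insts: int = 100) -> List[Summary]:
    """Keep only entries that plausibly benefit from being imported.

    max_insts is an arbitrary illustrative threshold, not a tuned value.
    """
    kept: List[Summary] = []
    seen_comdats = set()
    for s in summaries:
        # Large or cold functions are unlikely to be inlined after import.
        if s.inst_count > max_insts or s.is_cold:
            continue
        # All but the first surviving copy of a comdat function is suppressed.
        if s.comdat is not None:
            if s.comdat in seen_comdats:
                continue
            seen_comdats.add(s.comdat)
        kept.append(s)
    return kept
```

Keeping the first copy seen is just the simplest policy; any single copy of a comdat function would do, since the copies are interchangeable.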
>> >>> >> >> >
>> >>> >> >> > d. Distributed Build System Integration:
>> >>> >> >> >
>> >>> >> >> > For a distributed build system, the gold plugin should write the
>> >>> >> >> > parallel backend invocations into a makefile, including the mapping
>> >>> >> >> > from the IR file to the real object file path, and exit. Additional
>> >>> >> >> > work needs to be done in the distributed build system itself to
>> >>> >> >> > distribute and dispatch the parallel backend jobs to the build
>> >>> >> >> > cluster.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > e. Dependence Tracking and Incremental Compiles:
>> >>> >> >> >
>> >>> >> >> > In order to support build systems that stage from local disks or
>> >>> >> >> > network storage, the plugin will optionally support computation of
>> >>> >> >> > dependent sets of IR files that each module may import from. This can
>> >>> >> >> > be computed from profile data, if it exists, or from the symbol table
>> >>> >> >> > and heuristics if not. These dependence sets also enable support for
>> >>> >> >> > incremental backend compiles.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > --
>> >>> >> >> > Teresa Johnson | Software Engineer | tejohnson at google.com |
>> >>> >> >> > 408-460-2413
>> >>> >> >> >
>> >>> >> >> > _______________________________________________
>> >>> >> >> > LLVM Developers mailing list
>> >>> >> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Daniel Berlin

2015-May-14 20:39 UTC


[LLVMdev] RFC: ThinLTO Impementation Plan

On Thu, May 14, 2015 at 12:53 PM, Eric Christopher <echristo at gmail.com>
wrote:
>
> On Thu, May 14, 2015 at 11:34 AM Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>> On Thu, May 14, 2015 at 11:14 AM, Eric Christopher <echristo at gmail.com>
>> wrote:
>> > I'm not sure this is a particularly great assumption to make.
>>
>> Which part?
>
> The binutils part :)

I took it as the more general: "we want to simply work with native
toolchains", not as something specific to binutils.

>> >  We have to
>> > support a lot of different build systems and tools and concentrating on
>> > something that just binutils uses isn't particularly friendly here.
>>
>> I think you may have misunderstood
>> His point was exactly that they want to be transparent to *all of* these
>> tools.
>> You are saying "we should be friendly to everyone". He is saying the same
>> thing.
>> We should be friendly to everyone. The friendly way to do this is to
>> not require all of these tools build plugins to handle bitcode.
>>
>> Hence, elf-wrapped bitcode.
>
> Oh, I understood. I just don't know that I agree.

Fair enough. I just wanted to make sure there wasn't a misunderstanding here :)

> To do anything with the
> tools will require some knowledge of bitcode anyhow or need the plugin.

This is certainly true, but that's part of the point - the ability to
pass through native tools without them breaking, or worrying about
the bitcode there.

> I'm
> saying that as a baseline start we should look at how to do this using the
> tools we've got rather than wrapping things for no real gain.

The gain is precisely: "People on different platforms do not have to
use all-llvm tools to have this build mode work".

> I've talked to Teresa a bit offline and we're going to talk more later (and
> discuss on the list), but there are some discussions about how to make this
> work either with just bitcode/llvm tools and so not requiring integration on
> all platforms. The latter is what I consider as particularly friendly :)

Sure, if you have a way to make this work that doesn't require
everyone in the world replace ar with llvm-ar and ld with llvm-ld,
sounds awesome :)

(I actually have no real dog in this fight, just trying to make sure
everyone is on the same page ;P)
