thr3ads.net - llvm dev - [LLVMdev] Updated RFC: ThinLTO Implementation Plan [Jun 2015]

Sean Silva

2015-Jun-03 20:29 UTC

[LLVMdev] Updated RFC: ThinLTO Implementation Plan

On Mon, Jun 1, 2015 at 6:34 AM, Teresa Johnson <tejohnson at google.com> wrote:
> On Fri, May 29, 2015 at 6:15 PM, Sean Silva <chisophugis at gmail.com> wrote:
> >
> >
> > On Fri, May 29, 2015 at 8:01 AM, Teresa Johnson <tejohnson at google.com>
> > wrote:
> >>
> >> On Fri, May 29, 2015 at 6:56 AM, Alex Rosenberg <alexr at leftfield.org>
> >> wrote:
> >> > My earlier statement about wrapping things in a native object file
> >> > held in that it is controversial. It appears to be still central to
> >> > your design.
> >> >
> >> > It may help to look at the problem from a different viewpoint: LLVM is
> >> > not a compiler. It is a framework that can be used to make
> >> > compiler-like tools.
> >> >
> >> > From that view, it no longer makes sense to discuss "the plugin," or
> >> > gold, or $AR, because there isn't just one of any of those things. ld64
> >> > isn't the only outlier linker to consider. We have our own linker at
> >> > Sony, for example. From this perspective, then it makes more sense to
> >> > consider replacing the binary utilities with ones that support bitcode,
> >> > because from a user-perspective, all of the linkers already
> >> > transparently support bitcode directly today, as do ar, nm, etc. This
> >> > has been necessary for the regular LTO process.
> >>
> >> Hi Alex,
> >>
> >> It's true that the LLVM versions of these tools support bitcode
> >> transparently, but not all build systems use LLVM versions of these
> >> tools, particularly build systems that support a variety of compilers,
> >> or legacy build systems.
> >
> >
> > If a build system can do
> > CC=clang
> > why wouldn't it be able to do
> > AR=llvm-ar
> > ?
>
> That assumes that the LLVM tools are all deployed in the build system,
> and adds a requirement for using clang in this mode that wasn't there
> before when using clang for -O2. We are trying to make the transition
> from clang -O2 to clang -O2 + ThinLTO as seamless as possible.
>
I'd just like to point out that downthread serious suggestions are being
fielded to use a nonstandard ELF header or nonstandard bits marking the
header. This "adds a requirement".

At this point at least 1 (the only?) concrete deployment use case (besides
yours) that has been brought up in the ThinLTO RFC threads is
inconvenienced by this design decision. This suggests that native object
wrapping doesn't offer as much seamlessness as it seems.

In general, your proposal contains lots of "we will start out with" type
constructs regarding the tooling and practical deployment, and it is not
clear that you have done any feasibility research into whether "we will
start out with" will turn into "practically speaking, this will only ever
be implemented in" due to failing to take a sufficiently diverse set of use
cases into account.

-- Sean Silva

>
> Teresa
>
> >
> > -- Sean Silva
> >
> >>
> >> And not all build systems have the plugin or
> >> currently pass it to the native tools that can take a plugin for
> >> handling bitcode. In those cases the bitcode support is not
> >> transparently available, and our aim is to reduce the friction as much
> >> as possible. And not all use LTO currently (I know we don't due to the
> >> scalability issues we're trying to address with this design), and in
> >> those cases the migration to bitcode-aware tools and plugins was not
> >> previously required.
> >>
> >> For Sony's linker, are you using the gold plugin or libLTO interfaces?
> >> If the latter, I suppose some ThinLTO handling would have to be added
> >> to your linker (e.g. to invoke the LLVM hooks to write the stage-2
> >> combined function map and either launch the backend processes in
> >> parallel or write out a make or other build file). The current support
> >> for reading native object wrapped bitcode is baked into IRObjectFile
> >> so presumably the Sony linker can handle these native object wrapped
> >> bitcode files if it uses libLTO. We would similarly embed the handling
> >> of the function index/summary behind an API that can handle either so
> >> it is similarly transparent to the linkers. Let me know if there would
> >> be additional issues that make wrapped bitcode more difficult in your
> >> case, or how we could make ThinLTO usage simpler for you in general.
> >>
> >> >
> >> > The only tool in the list of tools you mentioned that does not support
> >> > bitcode directly is objcopy, and that's because nobody has yet written
> >> > an LLVM-project implementation of it. Personally, I'd much rather you
> >> > focus on making ThinLTO work by extending bitcode as needed, and we
> >> > work as a community toward replacing objcopy with an LLVM-native one.
> >> > It's a big missing piece of the LLVM project today and could be so much
> >> > better if we could use it to replace Apple's lipo and possibly other
> >> > extant object file modification tools. (Has anyone surveyed this area?)
> >> >
> >> > That older toolchains have tried to slip non-object file data through
> >> > the binary utilities isn't really proof that this is a good choice. It
> >> > might simply reflect the realities of those engineering teams. I wasn't
> >> > at Sun for this, but DTrace needed a linker feature that apparently the
> >> > Sun linker team was unwilling or unable to provide, so dtrace(1) gained
> >> > the ability to modify ELF files directly as needed. That doesn't prove
> >> > that DTrace's USDT feature shouldn't have been implemented in the
> >> > linker (as ld64 does directly for Apple), does it?
> >>
> >> I'd argue that the realities being addressed by using native object
> >> format in those cases still exist.
> >>
> >> >
> >> > If in the end using native object-wrapped bitcode is the best solution,
> >> > so be it. However, I think it is largely orthogonal to ThinLTO's needs
> >> > for transporting symtab data alongside the existing bitcode format.
> >>
> >> That's certainly true, ThinLTO can be implemented using either format,
> >> and bitcode only support can certainly be implemented. It is a matter
> >> of prioritizing which format to implement first. I had added some
> >> description to the updated RFC on how the function index/summary can
> >> be represented, etc in bitcode. Prioritizing the native object format
> >> doesn't make it easier to implement ThinLTO, but should make it easier
> >> to deploy.
> >>
> >> Thanks!
> >> Teresa
> >>
> >> >
> >> > Alex
> >> >
> >> >> On May 28, 2015, at 2:10 PM, Teresa Johnson <tejohnson at google.com>
> >> >> wrote:
> >> >>
> >> >> As promised, here is a new version of the ThinLTO RFC, updated based
> >> >> on some of the comments, questions and feedback from the first RFC.
> >> >> Hopefully we have addressed many of these, and as noted below, will
> >> >> fork some of the detailed discussion on particular aspects into
> >> >> separate design doc threads. Please send any additional feedback and
> >> >> questions on the overall design.
> >> >> Thanks!
> >> >> Teresa
> >> >>
> >> >>
> >> >> Updated RFC to discuss plans for implementing ThinLTO upstream,
> >> >> reflecting feedback and discussion from initial RFC
> >> >> (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085557.html). As
> >> >> discussed in the earlier thread and below, more detailed design
> >> >> documents for several pieces (native object format, linkage type
> >> >> changes and static promotions, etc) are in progress and will be sent
> >> >> separately. This RFC covers the overall design and the breakdown of
> >> >> work at a higher level.
> >> >>
> >> >>
> >> >> Background on ThinLTO can be found in slides from EuroLLVM 2015:
> >> >>
> >> >> https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
> >> >> As described in the talk, we have a prototype implementation, and
> >> >> would like to start staging patches upstream. This RFC describes a
> >> >> breakdown of the major pieces. We would like to commit upstream
> >> >> gradually in several stages, with all functionality off by default.
> >> >> The core ThinLTO importing support and tuning will require frequent
> >> >> change and iteration during testing and tuning, and for that part we
> >> >> would like to commit rapidly (off by default). See the proposed staged
> >> >> implementation described in the Implementation Plan section.
> >> >>
> >> >> ThinLTO Overview
> >> >> ================
> >> >>
> >> >>
> >> >> See the talk slides linked above for more details. The
following is a
> >> >> high-level overview of the motivation.
> >> >>
> >> >>
> >> >> Cross Module Optimization (CMO) is an effective means for improving
> >> >> runtime performance, by extending the scope of optimizations across
> >> >> source module boundaries. Without CMO, the compiler is limited to
> >> >> optimizing within the scope of single source modules. Two solutions
> >> >> for enabling CMO are Link-Time Optimization (LTO), which is currently
> >> >> supported in LLVM and GCC, and Lightweight-Interprocedural
> >> >> Optimization (LIPO). However, each of these solutions has limitations
> >> >> that prevent it from being enabled by default. ThinLTO is a new
> >> >> approach that attempts to address these limitations, with a goal of
> >> >> being enabled more broadly. ThinLTO is designed with many of the same
> >> >> principles as LIPO, and therefore its advantages, without any of its
> >> >> inherent weaknesses. Unlike in LIPO where the module group decision is
> >> >> made at profile training runtime, ThinLTO makes the decision at
> >> >> compile time, but in a lazy mode that facilitates large scale
> >> >> parallelism. LTO implementations all contain a serial IPA/IPO step
> >> >> that is both memory intensive and slow, limiting usability on both
> >> >> smaller workstations and huge applications. In contrast, the ThinLTO
> >> >> serial linker plugin phase is designed to be razor thin and blazingly
> >> >> fast. By default this step only does minimal preparation work to
> >> >> enable the parallel lazy importing performed later. ThinLTO aims to be
> >> >> scalable like a regular O2 build, enabling CMO on machines without
> >> >> large memory configurations, while also integrating well with
> >> >> distributed build systems. Results from early prototyping on SPEC
> >> >> cpu2006 C++ benchmarks are in line with expectations that ThinLTO can
> >> >> scale like O2 while enabling much of the CMO performed during a full
> >> >> LTO build.
> >> >>
> >> >>
> >> >> A ThinLTO build is divided into 3 phases, which are referred to in the
> >> >> following implementation plan:
> >> >> 1. phase-1: IR and Function Summary Generation (-c compile)
> >> >> 2. phase-2: Thin Linker Plugin Layer (thin archive linker step)
> >> >> 3. phase-3: Parallel Backend with Demand-Driven Importing
> >> >>
> >> >>
> >> >> Implementation Plan
> >> >> ===================
> >> >>
> >> >>
> >> >> This section gives a high-level breakdown of the ThinLTO support that
> >> >> will be added, in roughly the order that the patches would be staged.
> >> >> The patches are divided into three stages. The first stage contains a
> >> >> minimal amount of preparation work that is not ThinLTO-specific. The
> >> >> second stage contains most of the infrastructure for ThinLTO, which
> >> >> will be off by default. The third stage includes
> >> >> enhancements/improvements/tunings that can be performed after the main
> >> >> ThinLTO infrastructure is in.
> >> >>
> >> >>
> >> >> The second and third implementation stages will initially be very
> >> >> volatile, requiring a lot of iterations and tuning with large apps to
> >> >> get stabilized. Therefore it will be important to do fast commits for
> >> >> these implementation stages.
> >> >>
> >> >>
> >> >> 1. Stage 1: Preparation
> >> >> ------------------------------------
> >> >>
> >> >>
> >> >> The first planned sets of patches are enablers for ThinLTO work:
> >> >>
> >> >>
> >> >> a. LTO directory structure
> >> >>
> >> >>
> >> >> Restructure the LTO directory to remove a circular dependence when
> >> >> the ThinLTO pass is added. Because ThinLTO is being implemented as an
> >> >> SCC pass within Transforms/IPO, and leverages the LTOModule class for
> >> >> linking in functions from modules, IPO then requires the LTO library.
> >> >> This creates a circular dependence between LTO and IPO. To break that,
> >> >> we need to split the lib/LTO directory/library into lib/LTO/CodeGen
> >> >> and lib/LTO/Module, containing LTOCodeGenerator and LTOModule,
> >> >> respectively. Only LTOCodeGenerator has a dependence on IPO, removing
> >> >> the circular dependence.
> >> >>
> >> >>
> >> >> Note that libLTO and llvm-lto use LTOModule/LTOCodeGenerator, whereas
> >> >> the gold plugin uses lib/Object/IRObject and lib/Linker directly. The
> >> >> use of LTOModule in the ThinLTO pass is a convenience, but could be
> >> >> avoided by using the IRObject/Linker methods directly if that is
> >> >> preferred.
> >> >>
> >> >>
> >> >> b. Native object wrapper generation support
> >> >>
> >> >>
> >> >> Implement native-object wrapped bitcode writer. The main goal is to
> >> >> more easily interact with existing native tools such as $AR, $NM, “$LD
> >> >> -r”, $OBJCOPY, and $RANLIB, without requiring the build system to find
> >> >> and pass the plugin as an option. We plan to emit the phase-1 bitcode
> >> >> wrapped in native object format via the .llvmbc section, along with a
> >> >> symbol table. We will implement ELF first, but subsequently extend
> >> >> support to COFF and Mach-O. Additionally, we also want to avoid doing
> >> >> partial LTO/ThinLTO across files linked with “$LD -r” (i.e. the
> >> >> resulting object file should still contain native object-wrapped
> >> >> bitcode to enable ThinLTO at the full link step). I will send a
> >> >> separate design document for these changes, including the format of
> >> >> the symtab and function index/summary section, but the following is a
> >> >> high-level motivation and overview.
> >> >>
> >> >>
> >> >> Note that support for ThinLTO using bitcode can be added as a
> >> >> follow-on under an option, so that bitcode-aware tools do not need to
> >> >> use the wrapper. Under the bitcode-only option, the symbol table will
> >> >> be replaced by the bitcode form of the function index and summary
> >> >> section, which can be encoded as a new bitcode block type. Changes
> >> >> should be made to the gold plugin to avoid partial link of bitcode
> >> >> files under “$LD -r” (emitting bitcode rather than compiling all the
> >> >> way down to native code, which is how ld64 behaves on Darwin as per
> >> >> dexonsmith).
> >> >>
> >> >>
> >> >> Advantages of using native object format:
> >> >> * Out of the box interoperability with existing native build tools
> >> >> ($AR, $NM, “$LD -r”, $OBJCOPY, and $RANLIB) which may not currently
> >> >> know how to locate/pass the appropriate plugin.
> >> >> * There is precedent for using this format: other compilers also wrap
> >> >> intermediate LTO files (probably related to the above advantage)[1].
> >> >> * Tools that modify symbol linkage and visibility (e.g. $OBJCOPY and
> >> >> “$LD -r”) can mark the change in the symbol table without needing to
> >> >> parse/change/encode bitcode. The change can be propagated to bitcode
> >> >> by the ThinLTO backend.
> >> >> * Some tools only need to read/write the symtab and can avoid
> >> >> parsing/encoding bitcode (e.g. $NM, $OBJCOPY).
> >> >> * The second phase of ThinLTO does not need to parse the bitcode when
> >> >> creating the combined function index.
> >> >>
> >> >>
> >> >> Disadvantages of using native object format:
> >> >> * Unnecessary when using plugins with plugin-aware native tools, or
> >> >> LLVM’s custom tools.
> >> >> * Slightly increases disk storage and I/O from symtab. However, with
> >> >> our design the symtab is leveraged to hold function indexing info
> >> >> required for ThinLTO. The I/O for some build tools and build steps can
> >> >> actually be reduced as there is no need to read the bitcode, as
> >> >> described above.
> >> >>
> >> >>
> >> >> Support was added to LLVM for reading native object-wrapped bitcode
> >> >> (http://reviews.llvm.org/rL218078), but there does not yet exist
> >> >> support in LLVM/Clang for emitting bitcode wrapped in native object
> >> >> format. I plan to add support for optionally generating bitcode in a
> >> >> native object file containing a single .llvmbc section holding the
> >> >> bitcode. Specifically, the patch would add new options
> >> >> “emit-llvm-native-object” (object file) and corresponding
> >> >> “emit-llvm-native-assembly” (textual assembly code equivalent).
> >> >> Eventually these would be automatically triggered under “-fthinlto -c”
> >> >> and “-fthinlto -S”, respectively.
> >> >>
> >> >>
> >> >> Additionally, a symbol table will be generated in the native object
> >> >> file, holding the function symbols within the bitcode. This
> >> >> facilitates handling archives of the native object-wrapped bitcode
> >> >> created with $AR, since the archive will have a symbol table as well.
> >> >> The archive symbol table enables gold to extract and pass to the
> >> >> plugin the constituent native object-wrapped bitcode files. To support
> >> >> the concatenated llvmbc section generated by “$LD -r”, some handling
> >> >> needs to be added to gold and to the backend driver to process each
> >> >> original module’s bitcode.
> >> >>
> >> >>
> >> >> The function index/summary will later be added as a special native
> >> >> object section alongside the .llvmbc sections. The offset and size of
> >> >> the corresponding function summary can be placed in the associated
> >> >> symtab entry. As noted above, a separate design document will be sent
> >> >> for the native object format changes.
> >> >>
> >> >>
> >> >> 2. Stage 2: ThinLTO Infrastructure
> >> >> ------------------------------------------------------
> >> >>
> >> >>
> >> >> The next set of patches adds the base implementation of the ThinLTO
> >> >> infrastructure, specifically those required to make ThinLTO functional
> >> >> and generate correct but not necessarily high-performing binaries.
> >> >>
> >> >>
> >> >> a. Clang/LLVM/gold linker options
> >> >>
> >> >>
> >> >> An early set of clang/llvm patches is needed to provide options to
> >> >> enable ThinLTO (off by default), so that the rest of the
> >> >> implementation can be disabled by default as it is added.
> >> >> Specifically, clang options -fthinlto (used instead of -flto) will
> >> >> cause clang to invoke the phase-1 emission of LLVM bitcode and
> >> >> function summary/index on a compile step, and pass the appropriate
> >> >> option to the gold plugin on a link step. The -thinlto option will be
> >> >> added to the gold plugin and llvm-lto tool to launch the phase-2 thin
> >> >> archive step. The -thinlto-be option will also be added to clang to
> >> >> invoke it as a phase-3 parallel backend instance with a bitcode file
> >> >> as input.
> >> >>
> >> >>
> >> >> b. Thin-archive linking support in Gold plugin and llvm-lto
> >> >>
> >> >>
> >> >> Under the new plugin option (see above), the plugin needs to perform
> >> >> the phase-2 (thin archive) link which simply emits a combined function
> >> >> index from the linked modules, without actually performing the normal
> >> >> link. Corresponding support should be added to the standalone llvm-lto
> >> >> tool to enable testing/debugging without involving the linker and
> >> >> plugin.
> >> >>
> >> >>
> >> >> c. ThinLTO backend support
> >> >>
> >> >>
> >> >> Support for invoking a phase-3 backend invocation (including
> >> >> importing) on a module should be added to the clang driver under the
> >> >> new option. The main change under the option is to instantiate a
> >> >> Linker object used to manage the process of linking imported functions
> >> >> into the module, efficient read of the combined function index, and
> >> >> enable the ThinLTO import pass.
> >> >>
> >> >>
> >> >> d. Function index/summary support
> >> >>
> >> >>
> >> >> This includes infrastructure for writing and reading the function
> >> >> index/summary section. As noted earlier this will be encoded in a
> >> >> special section within the native object file for the module,
> >> >> alongside the .llvmbc section containing the bitcode. The thin archive
> >> >> (combined function index) generated by phase-2 of ThinLTO simply
> >> >> contains all of the function index/summary sections across the linked
> >> >> modules, organized for efficient function lookup. As mentioned earlier
> >> >> when discussing the native object wrapper format, a separate design
> >> >> document will be sent for this format.
> >> >>
> >> >>
> >> >> Each function available for importing from the module contains an
> >> >> entry in the module’s function index/summary section and in the
> >> >> resulting combined function index. Each function entry contains that
> >> >> function’s offset within the bitcode file, used to efficiently locate
> >> >> and quickly import just that function (see below in 2e for more
> >> >> details on the importing mechanics). The entry also contains summary
> >> >> information (e.g. basic information determined during parsing such as
> >> >> the number of instructions in the function), that will be used to help
> >> >> guide later import decisions. Because the contents of this section
> >> >> will change frequently during ThinLTO tuning, it should also be marked
> >> >> with a version id for backwards compatibility or version checking.
> >> >>
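To make the entry layout described above concrete, here is a minimal Python sketch. It is only an illustration, not the proposed on-disk format (that is deferred to the separate design document). The names FunctionEntry, ModuleIndex, INDEX_VERSION and combine are hypothetical, and keeping the first copy of a duplicated name is an assumption made for the sketch.

```python
from dataclasses import dataclass, field

INDEX_VERSION = 1  # hypothetical version id for the summary section


@dataclass(frozen=True)
class FunctionEntry:
    module: str            # file holding the function's bitcode
    bitcode_offset: int    # where the function body starts in that bitcode
    num_instructions: int  # example of summary info gathered during parsing


@dataclass
class ModuleIndex:
    module: str
    version: int = INDEX_VERSION
    entries: dict = field(default_factory=dict)

    def add(self, name, bitcode_offset, num_instructions):
        self.entries[name] = FunctionEntry(
            self.module, bitcode_offset, num_instructions)


def combine(indexes):
    """Merge per-module indexes into one name -> entry map (phase-2)."""
    combined = {}
    for index in indexes:
        if index.version != INDEX_VERSION:
            raise ValueError("unsupported index version in " + index.module)
        for name, entry in index.entries.items():
            # Assumption: the first definition seen wins.
            combined.setdefault(name, entry)
    return combined
```

A lookup by function name in the combined map then yields the defining module and the bitcode offset needed to import just that function.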
> >> >>
> >> >> e. ThinLTO importing support
> >> >>
> >> >>
> >> >> Support for the mechanics of importing functions from other modules,
> >> >> which can go in gradually as a set of patches since it will be off by
> >> >> default (the ThinLTO pass itself discussed below in 2f).
> >> >>
> >> >>
> >> >> Note that ThinLTO function importing is iterative, and we may import
> >> >> from a number of modules in an interleaved fashion. For example,
> >> >> assume we have hot call chains a()->b1()->c() and a()->b2()->d(),
> >> >> where functions a(), b1()/b2(), c() and d() are from modules A, B, C
> >> >> and D, respectively. When performing ThinLTO backend compilation of
> >> >> module A, we may decide to import in the following order (based on
> >> >> callsite and function summary info):
> >> >> 1. B::b1()  # exposes call to c()
> >> >> 2. C::c()
> >> >> 3. B::b2()  # exposes call to d()
> >> >> 4. D::d()
> >> >> For this reason, ThinLTO importing is different than regular LTO
> >> >> bitcode reading and linking, which reads and links in a module in its
> >> >> entirety on a single pass through each module (notice in the above
> >> >> example the imports of the two module B functions have an intervening
> >> >> import from module C). As a result, for example, the existing support
> >> >> for lazy metadata parsing that delays it until the first function is
> >> >> materialized can’t be leveraged (metadata handling is discussed more
> >> >> below in 2h). Therefore, the ThinLTO importing pass instantiates a new
> >> >> BitcodeReader and LTOModule object for each function we decide to
> >> >> import, parsing only what is needed and linking in just that function.
> >> >> This is fast and efficient as found in the prototype results shown in
> >> >> the linked EuroLLVM slides.
> >> >>
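The interleaved order in the example above can be reproduced with a small Python sketch. This is only an illustration under simplifying assumptions: every reachable callee is imported (the real decision is driven by callsite and summary heuristics), traversal is depth-first, and the CALLS and HOME tables and the import_order function are hypothetical names that just encode the a()/b1()/b2()/c()/d() example.

```python
# Call graph and defining module for each function in the example.
CALLS = {"a": ["b1", "b2"], "b1": ["c"], "b2": ["d"], "c": [], "d": []}
HOME = {"a": "A", "b1": "B", "b2": "B", "c": "C", "d": "D"}


def import_order(root):
    """Return (module, function) pairs in the order they would be imported
    while compiling the module that defines `root`."""
    order = []
    seen = {root}

    def visit(fn):
        for callee in CALLS[fn]:
            if callee in seen:
                continue
            seen.add(callee)
            if HOME[callee] != HOME[root]:
                # Importing the callee exposes its own calls, so follow
                # them before moving on to the caller's next callsite.
                order.append((HOME[callee], callee))
            visit(callee)

    visit(root)
    return order


if __name__ == "__main__":
    for module, fn in import_order("a"):
        print(module + "::" + fn + "()")
```

Module B appears twice in the result with module C in between, which is why a single pass over each source module is not enough.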
> >> >>
> >> >> Separate patches can include:
> >> >>
> >> >>
> >> >> * BitcodeReader changes to use function index to import/deserialize
> >> >> single function of interest (small changes, leverages existing lazy
> >> >> function streamer support). The declarations and other symbol table
> >> >> info in the bitcode must be reloaded, but the bitcode parsing can stop
> >> >> once the first function body is hit. We simply set up an entry in the
> >> >> lazy streamer’s DeferredFunctionInfo function index map from the
> >> >> bitcode index that was saved in the ThinLTO function summary (and
> >> >> therefore don’t need to build up this function index structure through
> >> >> repeated calls to RememberAndSkipFunctionBody via
> >> >> FindFunctionInStream).
> >> >> * Minor LTOModule changes to pass the ThinLTO function to import and
> >> >> its index into bitcode reader (see 1a for discussion on LTOModule
> >> >> use).
> >> >> * Marking of imported functions. Most handling for ThinLTO imported
> >> >> functions will simply rely on applying the appropriate linkage type.
> >> >> But it is useful to know which functions were imported, both for
> >> >> compiler debugging and verification, and possibly to modify some
> >> >> optimization heuristics along with the summary information. This can
> >> >> be in-memory initially, but IR support may be required in order to
> >> >> support streaming bitcode out and back in again after importing.
> >> >> * ModuleLinker changes to do ThinLTO-specific symbol linking and
> >> >> static promotion when necessary. The linkage type of imported
> >> >> non-local functions and variables changes to
> >> >> AvailableExternallyLinkage, for example. Statics must be promoted in
> >> >> certain cases, and accordingly renamed in consistent ways. Read-write
> >> >> or address-taken static variables must always be promoted. Other
> >> >> discardable functions, i.e. link-once such as comdats, will be force
> >> >> imported on reference by another imported function. We are working on
> >> >> a separate design document describing these changes in more detail
> >> >> with examples, as a more detailed discussion of these changes is
> >> >> beyond the scope of this RFC.
> >> >> * GlobalDCE changes to support removing imported non-local functions
> >> >> that were not inlined and imported non-local variables, which are
> >> >> marked AvailableExternallyLinkage (very small changes to existing pass
> >> >> logic). As discussed in the original RFC threads, currently GlobalDCE
> >> >> does not remove referenced AvailableExternallyLinkage functions.
> >> >> Instead, these are suppressed later during code generation. It isn’t
> >> >> clear that these functions are useful past the first call to
> >> >> GlobalDCE, which is after inlining, GlobalOpt and IPSCCP (so
> >> >> presumably after inter procedural constant prop, etc). Patch with
> >> >> these changes in testing as discussed in this thread:
> >> >> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085807.html.
> >> >>
> >> >>
> >> >> f. ThinLTO Import Driver SCC pass
> >> >>
> >> >>
> >> >> Adds Transforms/IPO/ThinLTO.cpp with framework for doing ThinLTO via
> >> >> an SCC pass, enabled only under the -fthinlto-be option. The pass
> >> >> includes utilizing the thin archive[2] (combined global function
> >> >> index/summary), import decision heuristics, invocation of
> >> >> LTOModule/ModuleLinker routines that perform the import, and any
> >> >> necessary callgraph updates and verification.
> >> >>
> >> >>
> >> >> g. Backend Driver
> >> >>
> >> >>
> >> >> For a single node build, the gold plugin will initially exec the
> >> >> backend processes directly, with the amount of parallelism controlled
> >> >> via an option and/or env variable. It is also possible to leverage
> >> >> existing single node build system task dispatching mechanisms such as
> >> >> Unix Makefiles, Ninja, etc., where the plugin can simply write a build
> >> >> file and fork the parallel backend instances directly under an
> >> >> appropriate option. We will also initially add support for our
> >> >> distributed build system as described below under 3c.
> >> >>
> >> >>
> >> >> h. Lazy Debug Metadata Linking
> >> >>
> >> >>
> >> >> The prototype implementation included lazy importing of module-level
> >> >> metadata during the ThinLTO pass finalization (i.e. after all function
> >> >> importing is complete). This actually applies to all module-level
> >> >> metadata, not just debug, although it is the largest. This can be
> >> >> added as a separate set of patches, and the detailed design will be
> >> >> sent with those. Includes changes to BitcodeReader, ValueMapper, and
> >> >> the ModuleLinker classes. As described in 2e, due to the
> >> >> iterative/interleaved nature of ThinLTO importing, the bitcode parsing
> >> >> is structured differently than LTO where a single pass over each
> >> >> module can be performed to parse and materialize all functions and
> >> >> metadata. Therefore, the lazy metadata parsing support in
> >> >> BitcodeReader, which parses all the metadata once the first function
> >> >> is materialized, is not applicable. We may instantiate a
> >> >> BitcodeReader multiple times for a module, if multiple functions are
> >> >> eventually imported, and we need a way to suture up the metadata to
> >> >> the functions imported by an earlier BitcodeReader instantiation. The
> >> >> high level summary is that during the initial import we leave the
> >> >> temporary metadata on the instructions that were imported, but save
> >> >> the index used by the bitcode reader to correlate with the
> >> >> metadata when it is ready (i.e. the MDValuePtrs index), and skip the
> >> >> metadata parsing. During the ThinLTO pass finalization we parse just
> >> >> the metadata, and suture it up during metadata value mapping using the
> >> >> saved index. As mentioned earlier, this will be described in more
> >> >> detail when the patches are ready.
> >> >>
> >> >>
> >> >> 3. Stage 3: ThinLTO Tuning and Enhancements
> >> >> -------------------------------------------------------------------------
> >> >>
> >> >>
> >> >> This refers to the patches that are not required for ThinLTO to work,
> >> >> but rather to improve compile time, memory, run-time performance and
> >> >> usability.
> >> >>
> >> >>
> >> >> a. Import Tuning
> >> >>
> >> >>
> >> >> Tuning the import strategy will be an iterative process that will
> >> >> continue to be refined over time. It involves several different types
> >> >> of changes: adding support for recording additional metrics in the
> >> >> function summary, such as profile data and optional heavier-weight IPA
> >> >> analyses, and tuning the import heuristics based on the summary and
> >> >> callsite context.
> >> >>
> >> >>
> >> >> b. Combined Function Index Pruning
> >> >>
> >> >>
> >> >> The combined function index can be pruned of functions that are
> >> >> unlikely to benefit from being imported. For example, during the
> >> >> phase-2 thin archive plugin step we can safely omit large and (with
> >> >> profile data) cold functions, which are unlikely to benefit from being
> >> >> inlined. Additionally, all but one copy of comdat functions can be
> >> >> suppressed.
> >> >>
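A rough Python sketch of the pruning rules just described, for illustration only. The Candidate record, the prune function, and both thresholds (max_instructions, min_call_count) are hypothetical; the text above does not specify how "large" or "cold" would be measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    module: str
    num_instructions: int
    call_count: Optional[int] = None  # None when there is no profile data
    is_comdat: bool = False


def prune(candidates, max_instructions=100, min_call_count=10):
    """Drop entries unlikely to benefit from importing."""
    kept = []
    seen_comdats = set()
    for cand in candidates:
        if cand.num_instructions > max_instructions:
            continue  # too large to be a likely inline candidate
        if cand.call_count is not None and cand.call_count < min_call_count:
            continue  # profile data says the function is cold
        if cand.is_comdat:
            if cand.name in seen_comdats:
                continue  # keep only one copy of each comdat function
            seen_comdats.add(cand.name)
        kept.append(cand)
    return kept
```

Functions without profile data are never dropped for coldness in this sketch, matching the "(with profile data)" qualifier above.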
> >> >>
> >> >> c. Distributed Build System Integration
> >> >>
> >> >>
> >> >> For a distributed build system such as Bazel (http://bazel.io/), the
> >> >> gold plugin should write the parallel backend invocations into a build
> >> >> file, including the mapping from the IR file to the real object file
> >> >> path, and exit. Additional work needs to be done in the distributed
> >> >> build system itself to distribute and dispatch the parallel backend
> >> >> jobs to the build cluster.
> >> >>
> >> >>
> >> >> d. Dependence Tracking and Incremental Compiles
> >> >>
> >> >>
> >> >> In order to support build systems that stage from local disks or
> >> >> network storage, the plugin will optionally support computation of
> >> >> dependent sets of IR files that each module may import from. This can
> >> >> be computed from profile data, if it exists, or from the symbol table
> >> >> and heuristics if not. These dependence sets also enable support for
> >> >> incremental backend compiles.
> >> >>
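One way to picture the symbol-table-based computation is the Python sketch below. It is an assumption-laden illustration, not the plugin's algorithm: it takes the transitive closure of "references a symbol defined in" (since importing is iterative, a module may end up pulling from modules it does not reference directly), applies no heuristics, and the dependence_sets function and its two input maps are hypothetical.

```python
def dependence_sets(defines, references):
    """defines: module -> set of symbols it defines.
    references: module -> set of symbols it references.
    Returns module -> set of modules it may import from."""
    owner = {sym: mod for mod, syms in defines.items() for sym in syms}

    # Direct dependences: modules defining a symbol this module references.
    direct = {}
    for mod in defines:
        refs = references.get(mod, set())
        direct[mod] = {owner[s] for s in refs
                       if s in owner and owner[s] != mod}

    # Transitive closure, since an imported function can expose new calls.
    result = {}
    for mod in defines:
        reach = set()
        stack = list(direct[mod])
        while stack:
            dep = stack.pop()
            if dep == mod or dep in reach:
                continue
            reach.add(dep)
            stack.extend(direct[dep])
        result[mod] = reach
    return result
```

With such sets, a backend compile of a module would only need to be redone when the module itself or a member of its set changes, which is the incremental-compile use mentioned above.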
> >> >>
> >> >> ________________
> >> >> [1] The following compilers currently wrap intermediate LTO files in
> >> >> native object format: GCC fat and non-fat objects (with a custom
> >> >> symtab), Intel icc non-fat (IR-only) objects (with a full native
> >> >> symtab), HP’s aCC non-fat objects (with full native symtab), IBM xlC
> >> >> both fat and non-fat objects (with full native symtab).
> >> >> [2] The “thin archive” here (also referred to as a combined function
> >> >> index) has some similarities to the AR tool thin archive format, but
> >> >> is not exactly the same. Both contain the symtab and not the code, but
> >> >> the ThinLTO combined function index contains the summary sections as
> >> >> well.
> >> >>
> >> >> --
> >> >> Teresa Johnson | Software Engineer | tejohnson at google.com |
> >> >> 408-460-2413
> >> >>
> >> >> _______________________________________________
> >> >> LLVM Developers mailing list
> >> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Teresa Johnson

2015-Jun-04 14:43 UTC

[LLVMdev] Updated RFC: ThinLTO Implementation Plan

On Wed, Jun 3, 2015 at 1:29 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
> On Mon, Jun 1, 2015 at 6:34 AM, Teresa Johnson <tejohnson at google.com> wrote:
>>
>> On Fri, May 29, 2015 at 6:15 PM, Sean Silva <chisophugis at gmail.com> wrote:
>> >
>> >
>> > On Fri, May 29, 2015 at 8:01 AM, Teresa Johnson <tejohnson at google.com>
>> > wrote:
>> >>
>> >> On Fri, May 29, 2015 at 6:56 AM, Alex Rosenberg <alexr at leftfield.org>
>> >> wrote:
>> >> > My earlier statement about wrapping things in a native object file
>> >> > held in that it is controversial. It appears to be still central to
>> >> > your design.
>> >> >
>> >> > It may help to look at the problem from a different viewpoint: LLVM
>> >> > is not a compiler. It is a framework that can be used to make
>> >> > compiler-like tools.
>> >> >
>> >> > From that view, it no longer makes sense to discuss "the plugin," or
>> >> > gold, or $AR, because there isn't just one of any of those things.
>> >> > ld64 isn't the only outlier linker to consider. We have our own
>> >> > linker at Sony, for example. From this perspective, then it makes
>> >> > more sense to consider replacing the binary utilities with ones that
>> >> > support bitcode, because from a user-perspective, all of the linkers
>> >> > already transparently support bitcode directly today, as do ar, nm,
>> >> > etc. This has been necessary for the regular LTO process.
>> >>
>> >> Hi Alex,
>> >>
>> >> It's true that the LLVM versions of these tools support bitcode
>> >> transparently, but not all build systems use LLVM versions of these
>> >> tools, particularly build systems that support a variety of compilers,
>> >> or legacy build systems.
>> >
>> >
>> > If a build system can do
>> > CC=clang
>> > why wouldn't it be able to do
>> > AR=llvm-ar
>> > ?
>>
>> That assumes that the LLVM tools are all deployed in the build system,
>> and adds a requirement for using clang in this mode that wasn't there
>> before when using clang for -O2. We are trying to make the transition
>> from clang -O2 to clang -O2 + ThinLTO as seamless as possible.
>
>
> I'd just like to point out that downthread serious suggestions are being
> fielded to use a nonstandard ELF header or nonstandard bits marking the
> header. This "adds a requirement".
>
> At this point at least 1 (the only?) concrete deployment use case (besides
> yours) that has been brought up in the ThinLTO RFC threads is inconvenienced
> by this design decision. This suggests that native object wrapping doesn't
> offer as much seamlessness as it seems.

One big goal is to make it as painless as possible to transition from
plain -O2 to -O2+thinlto. Users of clang who don't already use LTO
have not had to use/deploy llvm versions of all of these tools
(llvm-nm, llvm-objcopy, llvm-ar, llvm-ranlib, etc), or the plugins for
the native versions of these tools, because they weren't dealing with
bitcode files. That is why we are prioritizing the native wrapped
approach for the initial implementation. For users of the gold linker
(which uses the LTOModule interfaces) this should make it much easier
to enable ThinLTO (from my browsing of ld64 source, it looks like
native-wrapped bitcode should be handled already there too due to the
handling being hidden behind the lto_module API).
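
To make the distinction concrete, here is a minimal sketch of how a tool that only looks at leading magic bytes would see each kind of input. It is an illustration only, not code from LLVM, gold, or ld64. The magic values are the standard ELF, ar, thin-ar, and LLVM bitcode signatures; the function names and the category labels are assumptions made up for this example.

```python
# File signatures (standard values). Everything else in this sketch,
# including the function names and labels, is hypothetical.
ELF_MAGIC = b"\x7fELF"
RAW_BITCODE_MAGIC = b"BC\xc0\xde"            # 'B' 'C' 0xC0 0xDE
BITCODE_WRAPPER_MAGIC = b"\xde\xc0\x17\x0b"  # 0x0B17C0DE, little-endian
AR_MAGIC = b"!<arch>\n"
THIN_AR_MAGIC = b"!<thin>\n"


def classify_input(header: bytes) -> str:
    """Classify a file by its first bytes only."""
    if header.startswith(ELF_MAGIC):
        # A native-wrapped ThinLTO object would land here: from the outside
        # it is an ordinary ELF file with extra sections.
        return "native-object"
    if header.startswith((RAW_BITCODE_MAGIC, BITCODE_WRAPPER_MAGIC)):
        return "bitcode"
    if header.startswith((AR_MAGIC, THIN_AR_MAGIC)):
        return "archive"
    return "unknown"


def needs_bitcode_aware_tool(header: bytes) -> bool:
    """True when a stock binutils-style tool would need a plugin or an
    LLVM replacement just to recognize the file."""
    return classify_input(header) == "bitcode"


if __name__ == "__main__":
    for name, head in [
        ("wrapped.o", b"\x7fELF\x02\x01\x01\x00"),
        ("raw.bc", b"BC\xc0\xde\x35\x14"),
        ("libfoo.a", b"!<arch>\n"),
    ]:
        print(name, classify_input(head), needs_bitcode_aware_tool(head))
```

Under these assumptions, only the raw bitcode input forces a change of tools or plugins; the wrapped object looks like any other ELF file to the native utilities.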

Teresa
>
> In general, your proposal contains lots of "we will start out with" type
> constructs regarding the tooling and practical deployment, and it is not
> clear that you have done any feasibility research into whether "we will
> start out with" will turn into "practically speaking, this will only ever
> be implemented in" due to failing to take a sufficiently diverse set of use
> cases into account.
>
> -- Sean Silva
>
>>
>>
>> Teresa
>>
>> >
>> > -- Sean Silva
>> >
>> >>
>> >> And not all build systems have the plugin or
>> >> currently pass it to the native tools that can take a plugin for
>> >> handling bitcode. In those cases the bitcode support is not
>> >> transparently available, and our aim is to reduce the friction as much
>> >> as possible. And not all use LTO currently (I know we don't due to the
>> >> scalability issues we're trying to address with this design), and in
>> >> those cases the migration to bitcode-aware tools and plugins was not
>> >> previously required.
>> >>
>> >> For Sony's linker, are you using the gold plugin or libLTO interfaces?
>> >> If the latter, I suppose some ThinLTO handling would have to be added
>> >> to your linker (e.g. to invoke the LLVM hooks to write the stage-2
>> >> combined function map and either launch the backend processes in
>> >> parallel or write out a make or other build file). The current support
>> >> for reading native object wrapped bitcode is baked into IRObjectFile
>> >> so presumably the Sony linker can handle these native object wrapped
>> >> bitcode files if it uses libLTO. We would similarly embed the handling
>> >> of the function index/summary behind an API that can handle either so
>> >> it is similarly transparent to the linkers. Let me know if there would
>> >> be additional issues that make wrapped bitcode more difficult in your
>> >> case, or how we could make ThinLTO usage simpler for you in general.
>> >>
>> >> >
>> >> > The only tool in the list of tools you mentioned that do not support
>> >> > bitcode directly is objcopy, and that's because nobody has yet
>> >> > written an LLVM-project implementation of it. Personally, I'd much
>> >> > rather you focus on making ThinLTO work by extending bitcode as
>> >> > needed, and we work as a community toward replacing objcopy with an
>> >> > LLVM-native one. It's a big missing piece of the LLVM project today
>> >> > and could be so much better if we could use it to replace Apple's
>> >> > lipo and possibly other extant object file modification tools. (Has
>> >> > anyone surveyed this area?)
>> >> >
>> >> > That older toolchains have tried to slip non-object file data through
>> >> > the binary utilities isn't really proof that this is a good choice.
>> >> > It might simply reflect the realities of those engineering teams. I
>> >> > wasn't at Sun for this, but DTrace needed a linker feature that
>> >> > apparently the Sun linker team was unwilling or unable to provide, so
>> >> > dtrace(1) gained the ability to modify ELF files directly as needed.
>> >> > That doesn't prove that DTrace's USDT feature shouldn't have been
>> >> > implemented in the linker (as ld64 does directly for Apple), does it?
>> >>
>> >> I'd argue that the realities being addressed by using native object
>> >> format in those cases still exist.
>> >>
>> >> >
>> >> > If in the end using native object-wrapped bitcode is the best
>> >> > solution, so be it. However, I think it is largely orthogonal to
>> >> > ThinLTO's needs for transporting symtab data alongside the existing
>> >> > bitcode format.
>> >>
>> >> That's certainly true, ThinLTO can be implemented using either format,
>> >> and bitcode only support can certainly be implemented. It is a matter
>> >> of prioritizing which format to implement first. I had added some
>> >> description to the updated RFC on how the function index/summary can
>> >> be represented, etc in bitcode. Prioritizing the native object format
>> >> doesn't make it easier to implement ThinLTO, but should make it easier
>> >> to deploy.
>> >>
>> >> Thanks!
>> >> Teresa
>> >>
>> >> >
>> >> > Alex
>> >> >


-- 
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413

Philip Reames

2015-Jun-04 17:11 UTC

[LLVMdev] Updated RFC: ThinLTO Implementation Plan

On 06/04/2015 07:43 AM, Teresa Johnson wrote:
> On Wed, Jun 3, 2015 at 1:29 PM, Sean Silva <chisophugis at gmail.com> wrote:
>> I'd just like to point out that downthread serious suggestions are being
>> fielded to use a nonstandard ELF header or nonstandard bits marking the
>> header. This "adds a requirement".
>>
>> At this point at least 1 (the only?) concrete deployment use case (besides
>> yours) that has been brought up in the ThinLTO RFC threads is inconvenienced
>> by this design decision. This suggests that native object wrapping doesn't
>> offer as much seamlessness as it seems.
> One big goal is to make it as painless as possible to transition from
> plain -O2 to -O2+thinlto. Users of clang who don't already use LTO
> have not had to use/deploy llvm versions of all of these tools
> (llvm-nm, llvm-objcopy, llvm-ar, llvm-ranlib, etc), or the plugins for
> the native versions of these tools, because they weren't dealing with
> bitcode files. That is why we are prioritizing the native wrapped
> approach for the initial implementation. For users of the gold linker
> (which uses the LTOModule interfaces) this should make it much easier
> to enable ThinLTO (from my browsing of ld64 source, it looks like
> native-wrapped bitcode should be handled already there too due to the
> handling being hidden behind the lto_module API).
>
> Teresa

Quick question: Is the work required to support ThinLTO using llvm's
native tools orthogonal to that required to support non-llvm tools?
If not, would it make sense to start with a deployment of entirely LLVM
based tools - since there seems to be general interest in that - and
then come back to the non-llvm based tools separately?

Personally, I see both sides here.  I can understand why you want to 
minimize build configuration changes - they tend to be painful - but I 
also am reluctant to design a major enhancement to LLVM under the 
assumption that LLVM's own tools aren't adequate for the purpose. That 
seems like it would be majorly problematic from the perspective of the 
project as a whole.

(I realize that LLVM's tools could simply extract the bitcode out of the 
wrapper file, but that seems unnecessarily complex for an *initial* LLVM 
only solution.)

Philip
