thr3ads.net - llvm dev - [LLVMdev] RFC: ThinLTO Implementation Plan [May 2015]

Xinliang David Li

2015-May-14 21:28 UTC

[LLVMdev] RFC: ThinLTO Implementation Plan

On Thu, May 14, 2015 at 2:09 PM, Eric Christopher <echristo at gmail.com> wrote:
>
>
> On Thu, May 14, 2015 at 1:35 PM Teresa Johnson <tejohnson at google.com> wrote:
>
>> On Thu, May 14, 2015 at 1:18 PM, Eric Christopher <echristo at gmail.com> wrote:
>> >
>> >
>> > On Thu, May 14, 2015 at 1:11 PM David Blaikie <dblaikie at gmail.com> wrote:
>> >>
>> >> On Thu, May 14, 2015 at 12:53 PM, Eric Christopher <echristo at gmail.com> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Thu, May 14, 2015 at 11:34 AM Daniel Berlin <dberlin at dberlin.org> wrote:
>> >>>>
>> >>>> On Thu, May 14, 2015 at 11:14 AM, Eric Christopher <echristo at gmail.com> wrote:
>> >>>> > I'm not sure this is a particularly great assumption to make.
>> >>>>
>> >>>> Which part?
>> >>>
>> >>>
>> >>> The binutils part :)
>> >>>
>> >>>>
>> >>>>
>> >>>> > We have to support a lot of different build systems and tools, and concentrating on something that just binutils uses isn't particularly friendly here.
>> >>>> I think you may have misunderstood.
>> >>>> His point was exactly that they want to be transparent to *all of* these tools.
>> >>>> You are saying "we should be friendly to everyone". He is saying the same thing.
>> >>>> We should be friendly to everyone. The friendly way to do this is to not require all of these tools to build plugins to handle bitcode.
>> >>>>
>> >>>> Hence, elf-wrapped bitcode.
>> >>>
>> >>>
>> >>> Oh, I understood. I just don't know that I agree. Doing anything with the tools will require some knowledge of bitcode anyhow, or the plugin. I'm saying that as a baseline start we should look at how to do this using the tools we've got rather than wrapping things for no real gain.
>> >>
>> >>
>> >> That doesn't seem strictly true; consider the ar situation (which I'm led to believe is in use in our build system and, one would assume, others). With the symbol table included as proposed, ar can be used without any knowledge of the bitcode or need for a plugin.
>> >>
>> >
>> > For some bits, sure. Optimizing for ar seems a bit silly; why not 'ld -r'?
>>
>> But as mentioned, ld -r can work on native-object-wrapped bitcode without a plugin as well.
>>
>>
> How? It's not like any partial linking is going to go on inside the
> bitcode if the linker doesn't understand bitcode.
>
What do we need the plugin to do here? We just need the linker to concatenate the bitcode sections and produce a combined bitcode file.

>
>
>> > Agreed. The ar situation is interesting because one thing we discussed after you wandered off was just adding a ToC section to bitcode as it is and then having the tools handle that. That would seem to accomplish at least the goals as I've seen them up to this point without worrying too much.
>>
>> The ToC section is a way we can encode the function index/summary into bitcode, but it won't help integrate with existing tools. The main issue we are trying to solve is integrating transparently with the existing binutils tools in use in our build system and probably elsewhere.
>>
>>
> Right. I'm not entirely sure what use we're going to see in the existing tools that we want to encompass here. There's some of it for convenience (e.g. nm etc. for developers), but they can use a tool that understands bitcode, and we can make the existing llvm tools suffice for these needs.
>
> I think the way of looking at this is that we can:
>
> a) go with wrapping things in native object formats; this means
>  - some tools continue to work at the cost of additional I/O and space at compile/link time
>
Are you sure about the additional I/O? With a native symtab, existing tools just need to read that, while the plugin-based approach needs to read the bitcode section to feed symbols back to the tool.

>  - we still have to update some tools to work at all
>
If any, the changes will be minimal.

>
> b) we extend those tools/our own tools and have them be drop-in replacements for the existing tools. They'll understand the bitcode format natively, they'll be smaller, and we'll be able to push the state of the art in tooling/analysis a bit more in the future without having to rework ThinLTO.
>
> It's basically a set of trade-offs, and for llvm we've historically gone the b direction.
>
I am fine making llvm tools work with it, but we should not require/force users to use them. I think this is an orthogonal feature.

David



> >
>> > At any rate, I think this aspect of the proposal needs a bit of discussion and some mapping out of the pros and cons here.
>>
>> Sure, we can continue to discuss and I will try to lay out the pros/cons.
>>
>
> Excellent.
>
> -eric
>
>
>>
>> Teresa
>>
>> >
>> > -eric
>> >
>> >>>
>> >>> I've talked to Teresa a bit offline and we're going to talk more later (and discuss on the list), but there are some discussions about how to make this work with just bitcode/llvm tools, and so not require integration on all platforms. The latter is what I consider particularly friendly :)
>> >>>
>> >>> -eric
>> >>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> > I also can't imagine how it's necessary for any of the LTO aspects as currently written in the proposal.
>> >>>> >
>> >>>> > -eric
>> >>>> >
>> >>>> > On Thu, May 14, 2015 at 9:26 AM Xinliang David Li <xinliangli at gmail.com> wrote:
>> >>>> >>
>> >>>> >> The design objective is to make ThinLTO mostly transparent to binutils tools to enable easy integration with any build system in the wild. 'Pass-through' mode with 'ld -r' instead of the partial LTO mode is another reason.
>> >>>> >>
>> >>>> >> David
>> >>>> >>
>> >>>> >> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson <tejohnson at google.com> wrote:
>> >>>> >>>
>> >>>> >>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher <echristo at gmail.com> wrote:
>> >>>> >>> > So, what Alex is saying is that we have these tools as well and they understand bitcode just fine, as well as every object format, not just ELF. :)
>> >>>> >>>
>> >>>> >>> Right, there are also LLVM-specific versions (llvm-ar, llvm-nm) that handle bitcode similarly to the way the standard tool + plugin does. But the goal we are trying to achieve is to allow the standard system versions of the tools to handle these files without requiring a plugin. I know the LLVM tools handle other object formats, but I'm not sure how that helps here? We're not planning to replace those tools, just to allow the standard system versions to handle the intermediate objects produced by ThinLTO.
>> >>>> >>>
>> >>>> >>> Thanks,
>> >>>> >>> Teresa
>> >>>> >>>
>> >>>> >>> >
>> >>>> >>> > -eric
>> >>>> >>> >
>> >>>> >>> >
>> >>>> >>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson <tejohnson at google.com> wrote:
>> >>>> >>> >>
>> >>>> >>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li <xinliangli at gmail.com> wrote:
>> >>>> >>> >> >
>> >>>> >>> >> >
>> >>>> >>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg <alexr at leftfield.org> wrote:
>> >>>> >>> >> >>
>> >>>> >>> >> >> "ELF-wrapped bitcode" seems potentially controversial to me.
>> >>>> >>> >> >>
>> >>>> >>> >> >> What about ar, nm, and various ld implementations adds this requirement? What about the LLVM implementations of these tools is lacking?
>> >>>> >>> >> >
>> >>>> >>> >> >
>> >>>> >>> >> > Sorry, I cannot parse your questions properly. Can you make them clearer?
>> >>>> >>> >>
>> >>>> >>> >> Alex is asking what the issue is with ar, nm, ld -r and regular bitcode that makes using ELF-wrapped bitcode easier.
>> >>>> >>> >>
>> >>>> >>> >> The issue is that generally you need to provide a plugin to these tools in order for them to understand and handle bitcode files. We'd like standard tools to work without requiring a plugin as much as possible. And in some cases we want them to be handled differently than the way bitcode files are handled with the plugin.
>> >>>> >>> >>
>> >>>> >>> >> nm: Without a plugin, normal bitcode files are inscrutable. When provided the gold plugin, it can emit the symbols.
>> >>>> >>> >>
>> >>>> >>> >> ar: Without a plugin, it will create an archive of bitcode files, but without an index, so the linker can't handle it even with a plugin on an -flto link. When ar is provided the gold plugin, it does create an index, so the linker + gold plugin handle it appropriately on an -flto link.
>> >>>> >>> >>
>> >>>> >>> >> ld -r: Without a plugin, it fails when provided bitcode inputs. When provided the gold plugin, it handles them but compiles them all the way through to ELF executable instructions via a partial LTO link. This is where we would like to differ in behavior (while also not requiring a plugin) with ELF-wrapped bitcode: we would like the ld -r output file to still contain ELF-wrapped bitcode, delaying the LTO until the full link step.
>> >>>> >>> >> Let me know if that helps address your concerns.
>> >>>> >>> >>
>> >>>> >>> >> Thanks,
>> >>>> >>> >> Teresa
>> >>>> >>> >>
>> >>>> >>> >> >
>> >>>> >>> >> > David
>> >>>> >>> >> >
>> >>>> >>> >> >>
>> >>>> >>> >> >>
>> >>>> >>> >> >> Alex
>> >>>> >>> >> >>
>> >>>> >>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson <tejohnson at google.com> wrote:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > I've included below an RFC for implementing ThinLTO in LLVM, looking forward to feedback and questions.
>> >>>> >>> >> >> > Thanks!
>> >>>> >>> >> >> > Teresa
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > RFC to discuss plans for implementing ThinLTO upstream. Background can be found in slides from EuroLLVM 2015:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > As described in the talk, we have a prototype implementation, and would like to start staging patches upstream. This RFC describes a breakdown of the major pieces. We would like to commit upstream gradually in several stages, with all functionality off by default. The core ThinLTO importing support and tuning will require frequent change and iteration during testing and tuning, and for that part we would like to commit rapidly (off by default). See the proposed staged implementation described in the Implementation Plan section.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > ThinLTO Overview
>> >>>> >>> >> >> > ================
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > See the talk slides linked above for more details. The following is a high-level overview of the motivation.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Cross Module Optimization (CMO) is an effective means for improving runtime performance, by extending the scope of optimizations across source module boundaries. Without CMO, the compiler is limited to optimizing within the scope of single source modules. Two solutions for enabling CMO are Link-Time Optimization (LTO), which is currently supported in LLVM and GCC, and Lightweight-Interprocedural Optimization (LIPO). However, each of these solutions has limitations that prevent it from being enabled by default. ThinLTO is a new approach that attempts to address these limitations, with a goal of being enabled more broadly. ThinLTO is designed with many of the same principles as LIPO, and therefore shares its advantages, without any of its inherent weaknesses. Unlike in LIPO, where the module group decision is made at profile training runtime, ThinLTO makes the decision at compile time, but in a lazy mode that facilitates large-scale parallelism. The serial linker plugin phase is designed to be razor thin and blazingly fast. By default this step only does minimal preparation work to enable the parallel lazy importing performed later. ThinLTO aims to be scalable like a regular O2 build, enabling CMO on machines without large memory configurations, while also integrating well with distributed build systems. Results from early prototyping on SPEC cpu2006 C++ benchmarks are in line with expectations that ThinLTO can scale like O2 while enabling much of the CMO performed during a full LTO build.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > A ThinLTO build is divided into 3 phases, which are referred to in the following implementation plan:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > phase-1: IR and Function Summary Generation (-c compile)
>> >>>> >>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive linker step)
>> >>>> >>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Implementation Plan
>> >>>> >>> >> >> > ===================
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > This section gives a high-level breakdown of the ThinLTO support that will be added, in roughly the order that the patches would be staged. The patches are divided into three stages. The first stage contains a minimal amount of preparation work that is not ThinLTO-specific. The second stage contains most of the infrastructure for ThinLTO, which will be off by default. The third stage includes enhancements/improvements/tunings that can be performed after the main ThinLTO infrastructure is in.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > The second and third implementation stages will initially be very volatile, requiring a lot of iterations and tuning with large apps to get stabilized. Therefore it will be important to do fast commits for these implementation stages.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > 1. Stage 1: Preparation
>> >>>> >>> >> >> > -------------------------------
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > The first planned sets of patches are enablers for ThinLTO work:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > a. LTO directory structure:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Restructure the LTO directory to remove a circular dependence when the ThinLTO pass is added. Because ThinLTO is being implemented as an SCC pass within Transforms/IPO, and leverages the LTOModule class for linking in functions from modules, IPO then requires the LTO library. This creates a circular dependence between LTO and IPO. To break that, we need to split the lib/LTO directory/library into lib/LTO/CodeGen and lib/LTO/Module, containing LTOCodeGenerator and LTOModule, respectively. Only LTOCodeGenerator has a dependence on IPO, removing the circular dependence.
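To make the dependence problem concrete, here is a small Python sketch (an illustration, not LLVM's build tooling) that finds a cycle in a library dependency graph with a depth-first search. The edge from LTO/CodeGen to LTO/Module is an assumption about how the split libraries would relate; the point is only that after the split no cycle remains.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle (first node repeated at the end), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(lib: str) -> Optional[List[str]]:
        color[lib] = GRAY
        stack.append(lib)
        for dep in deps.get(lib, []):
            state = color.get(dep, WHITE)
            if state == GRAY:  # back edge: dep is still on the stack
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[lib] = BLACK
        return None

    for lib in deps:
        if color.get(lib, WHITE) == WHITE:
            cycle = visit(lib)
            if cycle:
                return cycle
    return None


# Before the split: the ThinLTO pass in IPO needs LTOModule, and LTO needs IPO.
before = {"LTO": ["IPO"], "IPO": ["LTO"]}
# After the split (the CodeGen -> Module edge is an assumption).
after = {"LTO/CodeGen": ["IPO", "LTO/Module"], "IPO": ["LTO/Module"], "LTO/Module": []}
print(find_cycle(before))  # ['LTO', 'IPO', 'LTO']
print(find_cycle(after))   # None
```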
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > b. ELF wrapper generation support:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Implement an ELF-wrapped bitcode writer. In order to more easily interact with tools such as $AR, $NM, and “$LD -r”, we plan to emit the phase-1 bitcode wrapped in ELF via the .llvmbc section, along with a symbol table. The goal is both to interact with these tools without requiring a plugin, and also to avoid doing partial LTO/ThinLTO across files linked with “$LD -r” (i.e. the resulting object file should still contain ELF-wrapped bitcode to enable ThinLTO at the full link step). I will send a separate design document for these changes, but the following is a high-level overview.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Support was added to LLVM for reading ELF-wrapped bitcode (http://reviews.llvm.org/rL218078), but there does not yet exist support in LLVM/Clang for emitting bitcode wrapped in ELF. I plan to add support for optionally generating bitcode in an ELF file containing a single .llvmbc section holding the bitcode. Specifically, the patch would add new options “emit-llvm-bc-elf” (object file) and corresponding “emit-llvm-elf” (textual assembly code equivalent). Eventually these would be automatically triggered under “-fthinlto -c” and “-fthinlto -S”, respectively.
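For readers unfamiliar with the container, the sketch below shows roughly what "an ELF file containing a single .llvmbc section" means at the byte level. It is a hypothetical Python illustration, not the proposed Clang option. It writes a minimal 64-bit little-endian relocatable ELF file (x86-64 is assumed as the machine type) holding the bitcode in .llvmbc plus the required section-name string table, then reads the section back. It deliberately omits the symbol table that the RFC also calls for.

```python
import struct

# Constants from the ELF specification (64-bit, little-endian layout).
ET_REL, EM_X86_64 = 1, 62
SHT_PROGBITS, SHT_STRTAB = 1, 3
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr: 64 bytes
SHDR = struct.Struct("<IIQQQQIIQQ")        # Elf64_Shdr: 64 bytes


def wrap_bitcode(bitcode: bytes) -> bytes:
    """Build a relocatable ELF file whose only payload is a .llvmbc section."""
    shstrtab = b"\0.llvmbc\0.shstrtab\0"
    llvmbc_name = shstrtab.index(b".llvmbc")
    shstrtab_name = shstrtab.index(b".shstrtab")
    bc_off = EHDR.size                           # bitcode right after the header
    str_off = bc_off + len(bitcode)              # then the section-name table
    sh_off = (str_off + len(shstrtab) + 7) & ~7  # section headers, 8-aligned
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # ELFCLASS64, LSB, version 1
    header = EHDR.pack(ident, ET_REL, EM_X86_64, 1, 0, 0, sh_off, 0,
                       EHDR.size, 0, 0, SHDR.size, 3, 2)
    sections = [
        SHDR.pack(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),  # mandatory null section
        SHDR.pack(llvmbc_name, SHT_PROGBITS, 0, 0, bc_off, len(bitcode), 0, 0, 1, 0),
        SHDR.pack(shstrtab_name, SHT_STRTAB, 0, 0, str_off, len(shstrtab), 0, 0, 1, 0),
    ]
    padding = bytes(sh_off - str_off - len(shstrtab))
    return header + bitcode + shstrtab + padding + b"".join(sections)


def read_section(elf: bytes, wanted: str) -> bytes:
    """Return the contents of a named section from a 64-bit little-endian ELF file."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    fields = EHDR.unpack_from(elf, 0)
    sh_off, sh_num, sh_strndx = fields[6], fields[12], fields[13]
    headers = [SHDR.unpack_from(elf, sh_off + i * SHDR.size) for i in range(sh_num)]
    names_off, names_size = headers[sh_strndx][4], headers[sh_strndx][5]
    names = elf[names_off:names_off + names_size]
    for name_idx, _type, _flags, _addr, offset, size, *_rest in headers:
        name = names[name_idx:names.index(b"\0", name_idx)].decode()
        if name == wanted:
            return elf[offset:offset + size]
    raise KeyError(wanted)


bitcode = b"BC\xc0\xde" + bytes(12)  # stand-in payload starting with the bitcode magic
obj = wrap_bitcode(bitcode)
print(read_section(obj, ".llvmbc") == bitcode)  # True
```

Written to disk, such a file should be listable with `readelf -S`; a real writer would additionally emit .symtab/.strtab entries for the module's function symbols so that $AR and $NM can see them.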
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Additionally, a symbol table will be generated in the ELF file, holding the function symbols within the bitcode. This facilitates handling archives of the ELF-wrapped bitcode created with $AR, since the archive will have a symbol table as well. The archive symbol table enables gold to extract and pass to the plugin the constituent ELF-wrapped bitcode files. To support the concatenated .llvmbc section generated by “$LD -r”, some handling needs to be added to gold and to the backend driver to process each original module’s bitcode.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > The function index/summary will later be added as a special ELF section alongside the .llvmbc sections.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > 2. Stage 2: ThinLTO Infrastructure
>> >>>> >>> >> >> > ----------------------------------------------
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > The next set of patches adds the base implementation of the ThinLTO infrastructure, specifically the pieces required to make ThinLTO functional and generate correct but not necessarily high-performing binaries. It also does not include the support needed to make debug information (-g) efficient with ThinLTO.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > a. Clang/LLVM/gold linker options:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > An early set of clang/llvm patches is needed to provide options to enable ThinLTO (off by default), so that the rest of the implementation can be disabled by default as it is added. Specifically, the clang option -fthinlto (used instead of -flto) will cause clang to invoke the phase-1 emission of LLVM bitcode and function summary/index on a compile step, and pass the appropriate option to the gold plugin on a link step. The -thinlto option will be added to the gold plugin and llvm-lto tool to launch the phase-2 thin archive step. The -thinlto option will also be added to the ‘opt’ tool to invoke it as a phase-3 parallel backend instance.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > b. Thin-archive linking support in Gold plugin and llvm-lto:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Under the new plugin option (see above), the plugin needs to perform the phase-2 (thin archive) link, which simply emits a combined function map from the linked modules, without actually performing the normal link. Corresponding support should be added to the standalone llvm-lto tool to enable testing/debugging without involving the linker and plugin.
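As a rough model of what the phase-2 step produces, this Python sketch merges per-module function entries into one combined map keyed by function name, without loading or linking any IR. The FunctionEntry fields mirror the offset and instruction-count summary described under item d below; the tuple input format and the rule that the first module in sorted order wins for duplicate definitions are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FunctionEntry:
    module: str      # object file whose bitcode defines the function
    offset: int      # offset of the function within that module's bitcode
    inst_count: int  # summary data for later import heuristics


def combine_indexes(per_module: Dict[str, List[Tuple[str, int, int]]]) -> Dict[str, FunctionEntry]:
    """Merge per-module (name, offset, inst_count) entries into one combined map.

    No IR is loaded or linked: only the small index entries are touched.
    """
    combined: Dict[str, FunctionEntry] = {}
    for module in sorted(per_module):  # deterministic order
        for name, offset, inst_count in per_module[module]:
            # Keep the first definition seen (e.g. one copy of a comdat function).
            combined.setdefault(name, FunctionEntry(module, offset, inst_count))
    return combined


indexes = {
    "b.o": [("g", 128, 40)],
    "a.o": [("f", 0, 12), ("g", 96, 40)],
}
combined = combine_indexes(indexes)
print(combined["g"])  # FunctionEntry(module='a.o', offset=96, inst_count=40)
```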
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > c. ThinLTO backend support:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Support for a phase-3 backend invocation (including importing) on a module should be added to the ‘opt’ tool under the new option. The main changes under the option are to instantiate a Linker object used to manage the process of linking imported functions into the module, to efficiently read the combined function map, and to enable the ThinLTO import pass.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > d. Function index/summary support:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > This includes infrastructure for writing and reading the function index/summary section. As noted earlier, this will be encoded in a special ELF section within the module, alongside the .llvmbc section containing the bitcode. The thin archive generated by phase-2 of ThinLTO simply contains all of the function index/summary sections across the linked modules, organized for efficient function lookup.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Each function available for importing from the module has an entry in the module’s function index/summary section and in the resulting combined function map. Each function entry contains that function’s offset within the bitcode file, used to efficiently locate and import just that function. The entry also contains summary information (e.g. basic information determined during parsing, such as the number of instructions in the function) that will be used to help guide later import decisions. Because the contents of this section will change frequently during ThinLTO tuning, it should also be marked with a version id for backwards compatibility or version checking.
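The version id requirement can be illustrated with a hypothetical on-disk layout, invented here because the RFC does not specify one: a version and an entry count, then, per function, a length-prefixed name, its bitcode offset, and its instruction count. A reader built for one layout rejects any other version instead of silently misparsing it.

```python
import struct
from typing import Dict, Tuple

INDEX_VERSION = 1                  # bump whenever the entry layout changes
HEADER = struct.Struct("<II")      # version, number of entries
ENTRY_TAIL = struct.Struct("<QI")  # bitcode offset, instruction count


def write_index(entries: Dict[str, Tuple[int, int]]) -> bytes:
    """Serialize {function name: (bitcode offset, instruction count)}."""
    out = [HEADER.pack(INDEX_VERSION, len(entries))]
    for name, (offset, inst_count) in sorted(entries.items()):
        raw = name.encode("utf-8")
        out.append(struct.pack("<H", len(raw)) + raw + ENTRY_TAIL.pack(offset, inst_count))
    return b"".join(out)


def read_index(data: bytes) -> Dict[str, Tuple[int, int]]:
    """Parse a section written by write_index, rejecting unknown versions."""
    version, count = HEADER.unpack_from(data, 0)
    if version != INDEX_VERSION:
        raise ValueError(f"unsupported function index version {version}")
    pos = HEADER.size
    entries: Dict[str, Tuple[int, int]] = {}
    for _ in range(count):
        (name_len,) = struct.unpack_from("<H", data, pos)
        pos += 2
        name = data[pos:pos + name_len].decode("utf-8")
        pos += name_len
        entries[name] = ENTRY_TAIL.unpack_from(data, pos)
        pos += ENTRY_TAIL.size
    return entries


blob = write_index({"foo": (0, 12), "bar": (4096, 250)})
print(read_index(blob))  # {'bar': (4096, 250), 'foo': (0, 12)}
```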
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > e. ThinLTO importing support:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Support for the mechanics of importing functions from other modules can go in gradually as a set of patches, since it will be off by default. Separate patches can include:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > - BitcodeReader changes to use the function index to import/deserialize a single function of interest (small changes, leverages existing lazy streamer support).
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > - Minor LTOModule changes to pass the ThinLTO function to import and its index into the bitcode reader.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > - Marking of imported functions (for use in ThinLTO-specific symbol linking and global DCE, for example). This can be in-memory initially, but IR support may be required in order to support streaming bitcode out and back in again after importing.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol linking and static promotion when necessary. For example, the linkage type of imported functions changes to AvailableExternallyLinkage. Statics must be promoted in certain cases, and renamed in consistent ways.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > - GlobalDCE changes to support removing imported functions that were not inlined (very small changes to existing pass logic).
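The promotion and renaming bullet can be sketched as follows. A static (internal) function that an imported body references must become visible outside its module, and the exporting and importing sides must agree on its new name without coordinating. This Python model assumes a naming scheme derived from a hash of the defining module's path; the `.promoted.` suffix and the string linkage labels are hypothetical stand-ins for LLVM's linkage types.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Symbol:
    name: str
    linkage: str  # hypothetical labels: "internal", "external", "available_externally"
    module: str   # path of the module that defines the symbol


def promoted_name(sym: Symbol) -> str:
    # Hypothetical scheme: the suffix depends only on the defining module's path,
    # so the exporter and every importer derive the same name independently.
    digest = hashlib.sha1(sym.module.encode("utf-8")).hexdigest()[:8]
    return f"{sym.name}.promoted.{digest}"


def promote_for_export(sym: Symbol) -> Symbol:
    """Exporting side: give a static a unique, externally visible name."""
    if sym.linkage != "internal":
        return sym
    return replace(sym, name=promoted_name(sym), linkage="external")


def import_copy(sym: Symbol) -> Symbol:
    """Importing side: keep the body available for inlining, but never emit it."""
    return replace(promote_for_export(sym), linkage="available_externally")


helper = Symbol("helper", "internal", "lib/a.cpp")
print(promote_for_export(helper).name == import_copy(helper).name)  # True
```

Because the new name is a pure function of the module path and the original name, two statics called `helper` in different modules still receive different promoted names.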
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > f. ThinLTO Import Driver SCC pass:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with a framework for doing ThinLTO via an SCC pass, enabled only under -fthinlto options. The pass includes utilizing the thin archive (global function index/summary), import decision heuristics, invocation of LTOModule/ModuleLinker routines that perform the import, and any necessary callgraph updates and verification.
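The sketch below gives one possible shape for the import decision heuristics, under heavily simplified assumptions. The only summary data is an instruction count, the threshold of 100 is arbitrary, and `calls_in_body` stands in for the call sites that only become visible once an imported body has been materialized. Following those calls is what makes the importing demand-driven. None of these names come from the RFC.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Entry:
    module: str      # module that defines the function
    offset: int      # bitcode offset used to import just this function
    inst_count: int  # summary data consulted by the heuristic


def plan_imports(local_defs: Set[str], local_calls: List[str],
                 combined: Dict[str, Entry],
                 calls_in_body: Dict[str, List[str]],
                 threshold: int = 100) -> Dict[str, Entry]:
    """Pick functions to import, following calls inside each imported body."""
    seen: Set[str] = set(local_defs)  # never import what the module already defines
    worklist = list(local_calls)
    chosen: Dict[str, Entry] = {}
    while worklist:
        callee = worklist.pop()
        if callee in seen:
            continue
        seen.add(callee)
        entry = combined.get(callee)
        if entry is None or entry.inst_count > threshold:
            continue  # not in the combined map, or too large to be worth inlining
        chosen[callee] = entry
        # The imported body may expose further candidates.
        worklist.extend(calls_in_body.get(callee, []))
    return chosen


combined = {
    "small": Entry("b.o", 0, 10),
    "big": Entry("b.o", 64, 5000),
    "leaf": Entry("c.o", 0, 3),
}
plan = plan_imports({"main"}, ["small", "main", "printf"], combined, {"small": ["leaf", "big"]})
print(sorted(plan))  # ['leaf', 'small']
```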
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > g. Backend Driver:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > For a single-node build, the gold plugin can simply write a makefile and fork the parallel backend instances directly via parallel make.
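A single-node backend driver along these lines could look like the following Python sketch. It only generates the makefile text. The input and output file names are hypothetical, and the recipe is a caller-supplied template because the exact phase-3 command line is not specified here: `$(THINLTO_BACKEND)` is a placeholder make variable, not a real tool.

```python
from typing import Dict


def backend_makefile(modules: Dict[str, str], recipe: str) -> str:
    """Return makefile text with one backend job per module.

    modules maps each phase-1 input to the native object it should produce;
    recipe is a command template with {input} and {output} placeholders.
    """
    lines = [".PHONY: all", "all: " + " ".join(sorted(modules.values())), ""]
    for src, obj in sorted(modules.items()):
        lines.append(f"{obj}: {src}")
        lines.append("\t" + recipe.format(input=src, output=obj))  # recipes need a tab
        lines.append("")
    return "\n".join(lines)


text = backend_makefile(
    {"a.o": "a.native.o", "b.o": "b.native.o"},
    "$(THINLTO_BACKEND) {input} -o {output}",  # hypothetical placeholder command
)
print(text)
```

The plugin would then write this text to a file and run something like `make -j8 -f thinlto.mk`, letting make schedule one backend job per module. Because each target depends only on its own input, make is free to run all of them in parallel.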
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements
>> >>>> >>> >> >> > ----------------------------------------------------------------
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > This refers to the patches that are not required for ThinLTO to work, but rather improve compile time, memory, run-time performance and usability.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > a. Lazy Debug Metadata Linking:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > The prototype implementation included lazy importing of module-level metadata during the ThinLTO pass finalization (i.e. after all function importing is complete). This actually applies to all module-level metadata, not just debug metadata, although debug metadata is the largest. This can be added as a separate set of patches, with changes to BitcodeReader, ValueMapper, and ModuleLinker.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > b. Import Tuning:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Tuning the import strategy will be an iterative process that will continue to be refined over time. It involves several different types of changes: adding support for recording additional metrics in the function summary, such as profile data and optional heavier-weight IPA analyses, and tuning the import heuristics based on the summary and callsite context.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > c. Combined Function Map Pruning:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > The combined function map can be pruned of functions that
>> >>>> >>> >> >> > are unlikely to benefit from being imported. For example,
>> >>>> >>> >> >> > during the phase-2 thin archive plugin step we can safely
>> >>>> >>> >> >> > omit large and (with profile data) cold functions, which
>> >>>> >>> >> >> > are unlikely to benefit from being inlined. Additionally,
>> >>>> >>> >> >> > all but one copy of comdat functions can be suppressed.
>> >>>> >>> >> >> >
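A minimal sketch of the pruning described above, assuming each map entry carries an instruction count, an optional profile entry count and an optional comdat group name (all hypothetical field names). The size cutoff is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Optional

MAX_IMPORT_SIZE = 500  # hypothetical cutoff for "large" functions


@dataclass(frozen=True)
class MapEntry:
    name: str
    module: str
    instruction_count: int
    entry_count: Optional[int] = None  # profile entry count, if profile data exists
    comdat: Optional[str] = None       # comdat group, if the function is in one


def prune_function_map(entries: list[MapEntry]) -> list[MapEntry]:
    kept: list[MapEntry] = []
    seen_comdats: set[str] = set()
    for entry in entries:
        if entry.instruction_count > MAX_IMPORT_SIZE:
            continue  # large: unlikely to be inlined
        if entry.entry_count == 0:
            continue  # cold according to profile data
        if entry.comdat is not None:
            if entry.comdat in seen_comdats:
                continue  # keep only one copy of each comdat function
            seen_comdats.add(entry.comdat)
        kept.append(entry)
    return kept
```

Without profile data `entry_count` stays `None`, so only the size and comdat rules apply.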
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > d. Distributed Build System Integration:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > For a distributed build system, the gold plugin should
>> >>>> >>> >> >> > write the parallel backend invocations into a makefile,
>> >>>> >>> >> >> > including the mapping from the IR file to the real object
>> >>>> >>> >> >> > file path, and exit. Additional work needs to be done in
>> >>>> >>> >> >> > the distributed build system itself to distribute and
>> >>>> >>> >> >> > dispatch the parallel backend jobs to the build cluster.
>> >>>> >>> >> >> >
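For illustration only, a sketch of what the emitted makefile could look like. The rule layout, the combined map argument and the `THINLTO_BACKEND` make variable are assumptions; the proposal does not specify the backend command line.

```python
def write_backend_makefile(ir_to_object: dict[str, str], combined_map: str) -> str:
    """Render one make rule per phase-3 backend job.

    THINLTO_BACKEND is a hypothetical make variable that the build system
    defines to whatever backend command it uses.
    """
    objects = list(ir_to_object.values())
    lines = [".PHONY: all", "all: " + " ".join(objects), ""]
    for ir_file, obj_file in ir_to_object.items():
        # Each object depends on its own IR file and on the combined function map.
        lines.append(f"{obj_file}: {ir_file} {combined_map}")
        # make requires recipe lines to start with a tab character.
        lines.append(f"\t$(THINLTO_BACKEND) {ir_file} {combined_map} -o {obj_file}")
        lines.append("")
    return "\n".join(lines)
```

Run locally, `make -j` would already execute these jobs in parallel; a distributed build system would instead read the rules and dispatch each one to a cluster machine.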
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > e. Dependence Tracking and Incremental Compiles:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > In order to support build systems that stage from local
>> >>>> >>> >> >> > disks or network storage, the plugin will optionally
>> >>>> >>> >> >> > support computation of dependent sets of IR files that
>> >>>> >>> >> >> > each module may import from. This can be computed from
>> >>>> >>> >> >> > profile data, if it exists, or from the symbol table and
>> >>>> >>> >> >> > heuristics if not. These dependence sets also enable
>> >>>> >>> >> >> > support for incremental backend compiles.
>> >>>> >>> >> >> >
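A minimal sketch of the symbol-table variant, assuming each IR file's defined and referenced symbols are known. It over-approximates: every file that defines a referenced symbol counts as a dependence, and the closure is taken in case importing is transitive. Profile-guided narrowing is not shown, and all names are hypothetical.

```python
def dependence_sets(defined: dict[str, set[str]],
                    referenced: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each IR file to the other IR files defining symbols it references."""
    definer: dict[str, str] = {}
    for module in sorted(defined):
        for symbol in defined[module]:
            definer.setdefault(symbol, module)  # first definition wins (e.g. comdat)
    return {
        module: {definer[s] for s in refs if s in definer and definer[s] != module}
        for module, refs in referenced.items()
    }


def import_closure(module: str, deps: dict[str, set[str]]) -> set[str]:
    """All IR files reachable from `module`, in case importing is transitive."""
    seen: set[str] = set()
    stack = [module]
    while stack:
        for dep in deps.get(stack.pop(), set()):
            if dep != module and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def needs_backend_rebuild(module: str, deps: dict[str, set[str]], changed: set[str]) -> bool:
    # Incremental builds: rerun a backend job only if its own IR file or any
    # IR file it may import from has changed.
    return module in changed or bool(import_closure(module, deps) & changed)
```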
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > --
>> >>>> >>> >> >> > Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > _______________________________________________
>> >>>> >>> >> >> > LLVM Developers mailing list
>> >>>> >>> >> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >>>> >>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Dave Bozier

2015-May-15 12:11 UTC


[LLVMdev] RFC: ThinLTO Impementation Plan

> Are you sure about the additional I/O? With native symtab, existing tools
> just need to read those, while the plugin-based approach needs to read the
> bitcode section to feed symbols back to the tool.

The additional I/O will be quite big if you are going to emit the full
symbol table. Looking at some of our real-world links, the symbol tables and
string tables of all the inputs seen by the linker add up to about
50-100 MB.

On Thu, May 14, 2015 at 10:28 PM, Xinliang David Li <xinliangli at gmail.com>
wrote:
>
>
> On Thu, May 14, 2015 at 2:09 PM, Eric Christopher <echristo at gmail.com>
> wrote:
>
>>
>>
>> On Thu, May 14, 2015 at 1:35 PM Teresa Johnson <tejohnson at google.com>
>> wrote:
>>
>>> On Thu, May 14, 2015 at 1:18 PM, Eric Christopher <echristo at gmail.com>
>>> wrote:
>>> >
>>> >
>>> > On Thu, May 14, 2015 at 1:11 PM David Blaikie <dblaikie at gmail.com>
>>> > wrote:
>>> >>
>>> >> On Thu, May 14, 2015 at 12:53 PM, Eric Christopher <echristo at gmail.com>
>>> >> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Thu, May 14, 2015 at 11:34 AM Daniel Berlin <dberlin at dberlin.org>
>>> >>> wrote:
>>> >>>>
>>> >>>> On Thu, May 14, 2015 at 11:14 AM, Eric Christopher <echristo at gmail.com>
>>> >>>> wrote:
>>> >>>> > I'm not sure this is a particularly great assumption to make.
>>> >>>>
>>> >>>> Which part?
>>> >>>
>>> >>>
>>> >>> The binutils part :)
>>> >>>
>>> >>>>
>>> >>>>
>>> >>>> > We have to support a lot of different build systems and tools
>>> >>>> > and concentrating on something that just binutils uses isn't
>>> >>>> > particularly friendly here.
>>> >>>> I think you may have misunderstood.
>>> >>>> His point was exactly that they want to be transparent to *all of*
>>> >>>> these tools.
>>> >>>> You are saying "we should be friendly to everyone". He is saying the
>>> >>>> same thing.
>>> >>>> We should be friendly to everyone. The friendly way to do this is to
>>> >>>> not require all of these tools to build plugins to handle bitcode.
>>> >>>>
>>> >>>> Hence, elf-wrapped bitcode.
>>> >>>
>>> >>>
>>> >>> Oh, I understood. I just don't know that I agree. Doing anything
>>> >>> with the tools will require some knowledge of bitcode anyhow, or the
>>> >>> plugin. I'm saying that as a baseline start we should look at how to
>>> >>> do this using the tools we've got rather than wrapping things for no
>>> >>> real gain.
>>> >>
>>> >>
>>> >> That doesn't seem strictly true - the ar situation (which I'm led to
>>> >> believe is in use in our build system & others, one would assume).
>>> >> With the symbol table included as proposed, ar can be used without
>>> >> any knowledge of the bitcode or need for a plugin.
>>> >>
>>> >
>>> > For some bits, sure. Optimizing for ar seems a bit silly, why not
>>> > 'ld -r'?
>>>
>>> But as mentioned, ld -r can work on native object wrapped bitcode
>>> without a plugin as well.
>>>
>>>
>> How? It's not like any partial linking is going to go on inside the
>> bitcode if the linker doesn't understand bitcode.
>>
>
> Why do we want the plugin to do anything here? We just need the linker to
> concatenate the bitcode sections and produce a combined bitcode file.
>
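One way a consumer could recover the individual modules from such a concatenated section, sketched under an assumption the thread does not make: that each module's payload carries LLVM's bitcode wrapper header (magic 0x0B17C0DE followed by version, offset, size and CPU type as little-endian 32-bit words), and that any padding the linker inserts between input sections is zero bytes. The `wrap` helper exists only to build test input.

```python
import struct

WRAPPER_MAGIC = 0x0B17C0DE      # LLVM bitcode wrapper header magic
RAW_MAGIC = b"BC\xc0\xde"       # magic at the start of a raw bitcode stream
WRAPPER = struct.Struct("<5I")  # magic, version, offset, size, cputype


def wrap(bitcode: bytes) -> bytes:
    """Demo helper: prefix a bitcode payload with a wrapper header."""
    return WRAPPER.pack(WRAPPER_MAGIC, 0, WRAPPER.size, len(bitcode), 0) + bitcode


def split_concatenated(section: bytes) -> list[bytes]:
    """Recover each module's bitcode from a concatenated section."""
    modules = []
    pos = 0
    while pos < len(section):
        if section[pos] == 0:
            pos += 1  # assumed zero padding inserted between input sections
            continue
        magic, _version, offset, size, _cputype = WRAPPER.unpack_from(section, pos)
        if magic != WRAPPER_MAGIC:
            raise ValueError(f"expected a wrapper header at byte {pos}")
        payload = section[pos + offset:pos + offset + size]
        if not payload.startswith(RAW_MAGIC):
            raise ValueError(f"wrapper at byte {pos} does not contain bitcode")
        modules.append(payload)
        pos += offset + size
    return modules
```

The explicit size field is what makes this simple; with bare bitcode streams a reader would have to walk the bitstream's top-level blocks to find where one module ends.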
>
>>
>>
>>> > Agreed. The ar situation is interesting because one thing we discussed
>>> > after you wandered off was just adding a ToC section to bitcode as it
>>> > is and then having the tools handle that. Would seem to accomplish at
>>> > least the goals as I've seen them up to this point without worrying
>>> > too much.
>>>
>>> The ToC section is a way we can encode the function index/summary into
>>> bitcode, but it won't help integrate with existing tools. The main issue
>>> we are trying to solve is integrating transparently with existing
>>> binutils tools in use in our build system and probably elsewhere.
>>>
>>>
>> Right. I'm not entirely sure what use we're going to see in the existing
>> tools that we want to encompass here. There's some of it for convenience
>> (e.g. nm etc. for developers), but they can use a tool that understands
>> bitcode and we can make the existing llvm tools suffice for these needs.
>>
>> I think the way of looking at this is that we can:
>>
>> a) go with wrapping things in native object formats; this means
>>  - some tools continue to work at the cost of additional I/O and space
>>    at compile/link time
>>
>
> Are you sure about the additional I/O? With native symtab, existing tools
> just need to read those, while the plugin-based approach needs to read the
> bitcode section to feed symbols back to the tool.
>
>
>>  - we still have to update some tools to work at all
>>
>
> If any, it will be minimal.
>
>
>>
>> b) we extend those tools/our own tools and have them be drop-in
>> replacements for the existing tools. They'll understand the bitcode
>> format natively, they'll be smaller, and we'll be able to push the state
>> of the art in tooling/analysis a bit more in the future without having
>> to rework ThinLTO.
>>
>> It's basically a set of trade-offs, and for llvm we've historically gone
>> the b direction.
>>
>>
> I am fine making llvm tools work with it, but we should not require/force
> users to use them. I think this is an orthogonal feature.
>
> David
>
>
>
>
>> >
>>> > At any rate, I think this aspect of the proposal needs a bit of
>>> > discussion and some mapping out of the pros and cons here.
>>>
>>> Sure, we can continue to discuss and I will try to lay out the
>>> pros/cons.
>>>
>>
>> Excellent.
>>
>> -eric
>>
>>
>>>
>>> Teresa
>>>
>>> >
>>> > -eric
>>> >
>>> >>>
>>> >>> I've talked to Teresa a bit offline and we're going to talk more
>>> >>> later (and discuss on the list), but there are some discussions
>>> >>> about how to make this work either with just bitcode/llvm tools and
>>> >>> so not requiring integration on all platforms. The latter is what I
>>> >>> consider particularly friendly :)
>>> >>>
>>> >>> -eric
>>> >>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> > I also can't imagine how it's necessary for any of the lto
>>> >>>> > aspects as currently written in the proposal.
>>> >>>> >
>>> >>>> > -eric
>>> >>>> >
>>> >>>> > On Thu, May 14, 2015 at 9:26 AM Xinliang David Li
>>> >>>> > <xinliangli at gmail.com> wrote:
>>> >>>> >>
>>> >>>> >> The design objective is to make ThinLTO mostly transparent to
>>> >>>> >> binutils tools to enable easy integration with any build system
>>> >>>> >> in the wild. 'Pass-through' mode with 'ld -r' instead of the
>>> >>>> >> partial LTO mode is another reason.
>>> >>>> >>
>>> >>>> >> David
>>> >>>> >>
>>> >>>> >> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson
>>> >>>> >> <tejohnson at google.com> wrote:
>>> >>>> >>>
>>> >>>> >>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher
>>> >>>> >>> <echristo at gmail.com> wrote:
>>> >>>> >>> > So, what Alex is saying is that we have these tools as well
>>> >>>> >>> > and they understand bitcode just fine, as well as every
>>> >>>> >>> > object format - not just ELF. :)
>>> >>>> >>>
>>> >>>> >>> Right, there are also LLVM-specific versions (llvm-ar, llvm-nm)
>>> >>>> >>> that handle bitcode similarly to the way the standard tool +
>>> >>>> >>> plugin does. But the goal we are trying to achieve is to allow
>>> >>>> >>> the standard system versions of the tools to handle these files
>>> >>>> >>> without requiring a plugin. I know the LLVM tools handle other
>>> >>>> >>> object formats, but I'm not sure how that helps here. We're not
>>> >>>> >>> planning to replace those tools, just to allow the standard
>>> >>>> >>> system versions to handle the intermediate objects produced by
>>> >>>> >>> ThinLTO.
>>> >>>> >>>
>>> >>>> >>> Thanks,
>>> >>>> >>> Teresa
>>> >>>> >>>
>>> >>>> >>> >
>>> >>>> >>> > -eric
>>> >>>> >>> >
>>> >>>> >>> >
>>> >>>> >>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson
>>> >>>> >>> > <tejohnson at google.com> wrote:
>>> >>>> >>> >>
>>> >>>> >>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li
>>> >>>> >>> >> <xinliangli at gmail.com> wrote:
>>> >>>> >>> >> >
>>> >>>> >>> >> >
>>> >>>> >>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg
>>> >>>> >>> >> > <alexr at leftfield.org> wrote:
>>> >>>> >>> >> >>
>>> >>>> >>> >> >> "ELF-wrapped bitcode" seems potentially controversial to me.
>>> >>>> >>> >> >>
>>> >>>> >>> >> >> What about ar, nm, and various ld implementations adds this
>>> >>>> >>> >> >> requirement? What about the LLVM implementations of these
>>> >>>> >>> >> >> tools is lacking?
>>> >>>> >>> >> >
>>> >>>> >>> >> >
>>> >>>> >>> >> > Sorry, I cannot parse your questions properly. Can you make
>>> >>>> >>> >> > them clearer?
>>> >>>> >>> >>
>>> >>>> >>> >> Alex is asking what the issue is with ar, nm, ld -r and regular
>>> >>>> >>> >> bitcode that makes using elf-wrapped bitcode easier.
>>> >>>> >>> >>
>>> >>>> >>> >> The issue is that generally you need to provide a plugin to
>>> >>>> >>> >> these tools in order for them to understand and handle bitcode
>>> >>>> >>> >> files. We'd like standard tools to work without requiring a
>>> >>>> >>> >> plugin as much as possible. And in some cases we want them to
>>> >>>> >>> >> be handled differently from the way bitcode files are handled
>>> >>>> >>> >> with the plugin.
>>> >>>> >>> >>
>>> >>>> >>> >> nm: Without a plugin, normal bitcode files are inscrutable.
>>> >>>> >>> >> When provided the gold plugin, it can emit the symbols.
>>> >>>> >>> >>
>>> >>>> >>> >> ar: Without a plugin, it will create an archive of bitcode
>>> >>>> >>> >> files, but without an index, so it can't be handled by the
>>> >>>> >>> >> linker even with a plugin on an -flto link. When ar is
>>> >>>> >>> >> provided the gold plugin, it does create an index, so the
>>> >>>> >>> >> linker + gold plugin handle it appropriately on an -flto link.
>>> >>>> >>> >>
>>> >>>> >>> >> ld -r: Without a plugin, it fails when provided bitcode inputs.
>>> >>>> >>> >> When provided the gold plugin, it handles them but compiles
>>> >>>> >>> >> them all the way through to ELF executable instructions via a
>>> >>>> >>> >> partial LTO link. This is where we would like to differ in
>>> >>>> >>> >> behavior (while also not requiring a plugin) with ELF-wrapped
>>> >>>> >>> >> bitcode: we would like the ld -r output file to still contain
>>> >>>> >>> >> ELF-wrapped bitcode, delaying the LTO until the full link step.
>>> >>>> >>> >>
>>> >>>> >>> >> Let me know if that helps address your concerns.
>>> >>>> >>> >>
>>> >>>> >>> >> Thanks,
>>> >>>> >>> >> Teresa
>>> >>>> >>> >>
>>> >>>> >>> >> >
>>> >>>> >>> >> > David
>>> >>>> >>> >> >
>>> >>>> >>> >> >>
>>> >>>> >>> >> >>
>>> >>>> >>> >> >> Alex
>>> >>>> >>> >> >>
>>> >>>> >>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson
>>> >>>> >>> >> >> > <tejohnson at google.com> wrote:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > I've included below an RFC for implementing ThinLTO in
>>> >>>> >>> >> >> > LLVM, looking forward to feedback and questions.
>>> >>>> >>> >> >> > Thanks!
>>> >>>> >>> >> >> > Teresa
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > RFC to discuss plans for implementing ThinLTO upstream.
>>> >>>> >>> >> >> > Background can be found in slides from EuroLLVM 2015:
>>> >>>> >>> >> >> > https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > As described in the talk, we have a prototype
>>> >>>> >>> >> >> > implementation, and would like to start staging patches
>>> >>>> >>> >> >> > upstream. This RFC describes a breakdown of the major
>>> >>>> >>> >> >> > pieces. We would like to commit upstream gradually in
>>> >>>> >>> >> >> > several stages, with all functionality off by default.
>>> >>>> >>> >> >> > The core ThinLTO importing support and tuning will
>>> >>>> >>> >> >> > require frequent change and iteration during testing and
>>> >>>> >>> >> >> > tuning, and for that part we would like to commit rapidly
>>> >>>> >>> >> >> > (off by default). See the proposed staged implementation
>>> >>>> >>> >> >> > described in the Implementation Plan section.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > ThinLTO Overview
>>> >>>> >>> >> >> > ================
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > See the talk slides linked above for more details. The
>>> >>>> >>> >> >> > following is a high-level overview of the motivation.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > Cross-Module Optimization (CMO) is an effective means for
>>> >>>> >>> >> >> > improving runtime performance, by extending the scope of
>>> >>>> >>> >> >> > optimizations across source module boundaries. Without
>>> >>>> >>> >> >> > CMO, the compiler is limited to optimizing within the
>>> >>>> >>> >> >> > scope of single source modules. Two solutions for enabling
>>> >>>> >>> >> >> > CMO are Link-Time Optimization (LTO), which is currently
>>> >>>> >>> >> >> > supported in LLVM and GCC, and Lightweight-Interprocedural
>>> >>>> >>> >> >> > Optimization (LIPO). However, each of these solutions has
>>> >>>> >>> >> >> > limitations that prevent it from being enabled by default.
>>> >>>> >>> >> >> > ThinLTO is a new approach that attempts to address these
>>> >>>> >>> >> >> > limitations, with a goal of being enabled more broadly.
>>> >>>> >>> >> >> > ThinLTO is designed with many of the same principles as
>>> >>>> >>> >> >> > LIPO, and therefore shares its advantages, without any of
>>> >>>> >>> >> >> > its inherent weaknesses. Unlike LIPO, where the module
>>> >>>> >>> >> >> > group decision is made at profile training runtime,
>>> >>>> >>> >> >> > ThinLTO makes the decision at compile time, but in a lazy
>>> >>>> >>> >> >> > mode that facilitates large-scale parallelism. The serial
>>> >>>> >>> >> >> > linker plugin phase is designed to be razor thin and
>>> >>>> >>> >> >> > blazingly fast. By default, this step only does minimal
>>> >>>> >>> >> >> > preparation work to enable the parallel lazy importing
>>> >>>> >>> >> >> > performed later. ThinLTO aims to be scalable like a
>>> >>>> >>> >> >> > regular O2 build, enabling CMO on machines without large
>>> >>>> >>> >> >> > memory configurations, while also integrating well with
>>> >>>> >>> >> >> > distributed build systems. Results from early prototyping
>>> >>>> >>> >> >> > on SPEC cpu2006 C++ benchmarks are in line with
>>> >>>> >>> >> >> > expectations that ThinLTO can scale like O2 while enabling
>>> >>>> >>> >> >> > much of the CMO performed during a full LTO build.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > A ThinLTO build is divided into 3 phases, which are
>>> >>>> >>> >> >> > referred to in the following implementation plan:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > phase-1: IR and Function Summary Generation (-c compile)
>>> >>>> >>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive linker step)
>>> >>>> >>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > Implementation Plan
>>> >>>> >>> >> >> > ===================
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > This section gives a high-level breakdown of the ThinLTO
>>> >>>> >>> >> >> > support that will be added, in roughly the order that the
>>> >>>> >>> >> >> > patches would be staged. The patches are divided into
>>> >>>> >>> >> >> > three stages. The first stage contains a minimal amount of
>>> >>>> >>> >> >> > preparation work that is not ThinLTO-specific. The second
>>> >>>> >>> >> >> > stage contains most of the infrastructure for ThinLTO,
>>> >>>> >>> >> >> > which will be off by default. The third stage includes
>>> >>>> >>> >> >> > enhancements/improvements/tunings that can be performed
>>> >>>> >>> >> >> > after the main ThinLTO infrastructure is in.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > The second and third implementation stages will initially
>>> >>>> >>> >> >> > be very volatile, requiring a lot of iterations and tuning
>>> >>>> >>> >> >> > with large apps to get stabilized. Therefore it will be
>>> >>>> >>> >> >> > important to do fast commits for these implementation
>>> >>>> >>> >> >> > stages.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > 1. Stage 1: Preparation
>>> >>>> >>> >> >> > -------------------------------
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > The first planned sets of patches are enablers for ThinLTO
>>> >>>> >>> >> >> > work:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > a. LTO directory structure:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > Restructure the LTO directory to remove the circular
>>> >>>> >>> >> >> > dependence that arises when the ThinLTO pass is added.
>>> >>>> >>> >> >> > Because ThinLTO is being implemented as an SCC pass within
>>> >>>> >>> >> >> > Transforms/IPO, and leverages the LTOModule class for
>>> >>>> >>> >> >> > linking in functions from modules, IPO then requires the
>>> >>>> >>> >> >> > LTO library. This creates a circular dependence between
>>> >>>> >>> >> >> > LTO and IPO. To break that, we need to split the lib/LTO
>>> >>>> >>> >> >> > directory/library into lib/LTO/CodeGen and lib/LTO/Module,
>>> >>>> >>> >> >> > containing LTOCodeGenerator and LTOModule, respectively.
>>> >>>> >>> >> >> > Only LTOCodeGenerator has a dependence on IPO, removing
>>> >>>> >>> >> >> > the circular dependence.
>>> >>>> >>> >> >> >
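The cycle, and how the split removes it, can be checked mechanically. The sketch below runs a depth-first search over a simplified model of the library dependences described above; the edge lists are an illustration, not LLVM's actual LLVMBuild metadata.

```python
from typing import Optional


def find_cycle(deps: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle (first node repeated at the end), or None."""
    state: dict[str, str] = {}  # "visiting" while on the DFS path, then "done"
    path: list[str] = []

    def visit(lib: str) -> Optional[list[str]]:
        state[lib] = "visiting"
        path.append(lib)
        for dep in deps.get(lib, []):
            if state.get(dep) == "visiting":
                return path[path.index(dep):] + [dep]  # back edge: cycle found
            if dep not in state:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[lib] = "done"
        return None

    for lib in deps:
        if lib not in state:
            cycle = visit(lib)
            if cycle:
                return cycle
    return None


# Simplified model of the library dependences described above.
before = {"IPO": ["LTO"], "LTO": ["IPO"]}
after = {"IPO": ["LTO/Module"], "LTO/CodeGen": ["IPO"], "LTO/Module": []}
```

With the split, IPO only reaches LTO/Module, which depends on neither IPO nor LTO/CodeGen, so no back edge exists.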
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > b. ELF wrapper generation support:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > Implement an ELF-wrapped bitcode writer. In order to more
>>> >>>> >>> >> >> > easily interact with tools such as $AR, $NM, and “$LD -r”
>>> >>>> >>> >> >> > we plan to emit the phase-1 bitcode wrapped in ELF via the
>>> >>>> >>> >> >> > .llvmbc section, along with a symbol table. The goal is
>>> >>>> >>> >> >> > both to interact with these tools without requiring a
>>> >>>> >>> >> >> > plugin, and also to avoid doing partial LTO/ThinLTO across
>>> >>>> >>> >> >> > files linked with “$LD -r” (i.e. the resulting object file
>>> >>>> >>> >> >> > should still contain ELF-wrapped bitcode to enable ThinLTO
>>> >>>> >>> >> >> > at the full link step). I will send a separate design
>>> >>>> >>> >> >> > document for these changes, but the following is a
>>> >>>> >>> >> >> > high-level overview.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > Support was added to LLVM for reading ELF-wrapped bitcode
>>> >>>> >>> >> >> > (http://reviews.llvm.org/rL218078), but LLVM/Clang does
>>> >>>> >>> >> >> > not yet support emitting bitcode wrapped in ELF. I plan to
>>> >>>> >>> >> >> > add support for optionally generating bitcode in an ELF
>>> >>>> >>> >> >> > file containing a single .llvmbc section holding the
>>> >>>> >>> >> >> > bitcode. Specifically, the patch would add new options
>>> >>>> >>> >> >> > “emit-llvm-bc-elf” (object file) and corresponding
>>> >>>> >>> >> >> > “emit-llvm-elf” (textual assembly code equivalent).
>>> >>>> >>> >> >> > Eventually these would be automatically triggered under
>>> >>>> >>> >> >> > “-fthinlto -c” and “-fthinlto -S”, respectively.
>>> >>>> >>> >> >> >
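To show what the container itself amounts to, here is a sketch that builds and reads back a minimal little-endian ELF64 relocatable file whose only payload is a .llvmbc section, using nothing but Python's `struct` module. It omits the symbol table the proposal also calls for, and the section type (SHT_PROGBITS), machine (EM_X86_64) and absence of section flags are assumptions, not necessarily what Clang would emit.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr (64 bytes)
SHDR = struct.Struct("<IIQQQQIIQQ")        # Elf64_Shdr (64 bytes)
ET_REL, EM_X86_64 = 1, 62
SHT_PROGBITS, SHT_STRTAB = 1, 3


def wrap_bitcode_in_elf(bitcode: bytes) -> bytes:
    """Build a minimal little-endian ELF64 relocatable file with one .llvmbc section."""
    shstrtab = b"\0.llvmbc\0.shstrtab\0"  # ".llvmbc" at offset 1, ".shstrtab" at 9
    data_off = EHDR.size
    strtab_off = data_off + len(bitcode)
    shoff = (strtab_off + len(shstrtab) + 7) & ~7  # 8-byte align the section headers
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # ELFCLASS64, little-endian, version 1
    ehdr = EHDR.pack(ident, ET_REL, EM_X86_64, 1, 0, 0, shoff, 0,
                     EHDR.size, 0, 0, SHDR.size, 3, 2)  # 3 sections, names in section 2
    headers = [
        SHDR.pack(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),                                # null
        SHDR.pack(1, SHT_PROGBITS, 0, 0, data_off, len(bitcode), 0, 0, 1, 0),   # .llvmbc
        SHDR.pack(9, SHT_STRTAB, 0, 0, strtab_off, len(shstrtab), 0, 0, 1, 0),  # .shstrtab
    ]
    body = ehdr + bitcode + shstrtab
    return body + bytes(shoff - len(body)) + b"".join(headers)


def read_llvmbc(obj: bytes) -> bytes:
    """Return the contents of the .llvmbc section of a little-endian ELF64 file."""
    if obj[:4] != b"\x7fELF" or obj[4] != 2 or obj[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    fields = EHDR.unpack_from(obj, 0)
    shoff, shentsize, shnum, shstrndx = fields[6], fields[11], fields[12], fields[13]
    headers = [SHDR.unpack_from(obj, shoff + i * shentsize) for i in range(shnum)]
    names_off, names_size = headers[shstrndx][4], headers[shstrndx][5]
    names = obj[names_off:names_off + names_size]
    for sh_name, _type, _flags, _addr, offset, size, *_rest in headers:
        if names[sh_name:names.index(b"\0", sh_name)] == b".llvmbc":
            return obj[offset:offset + size]
    raise KeyError(".llvmbc section not found")
```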
>>> >>>> >>> >> >> > Additionally, a symbol table will be generated in the ELF
>>> >>>> >>> >> >> > file, holding the function symbols within the bitcode.
>>> >>>> >>> >> >> > This facilitates handling archives of the ELF-wrapped
>>> >>>> >>> >> >> > bitcode created with $AR, since the archive will have a
>>> >>>> >>> >> >> > symbol table as well. The archive symbol table enables
>>> >>>> >>> >> >> > gold to extract the constituent ELF-wrapped bitcode files
>>> >>>> >>> >> >> > and pass them to the plugin. To support the concatenated
>>> >>>> >>> >> >> > .llvmbc section generated by “$LD -r”, some handling needs
>>> >>>> >>> >> >> > to be added to gold and to the backend driver to process
>>> >>>> >>> >> >> > each original module’s bitcode.
>>> >>>> >>> >> >> >
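A simplified model of why the symbol table matters for archives (not ar's or gold's actual logic): once each member's defined symbols are visible without a plugin, ar can build the archive index, and the linker can use it to extract only the members that resolve undefined references. Those extracted ELF-wrapped members are what would be handed to the plugin. `Member` and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """One archive member, as seen through its ELF symbol table."""
    defined: frozenset[str]
    undefined: frozenset[str]


def build_archive_index(members: dict[str, Member]) -> dict[str, str]:
    """What ar can build without a plugin: defined symbol -> member name."""
    index: dict[str, str] = {}
    for name, member in members.items():
        for symbol in sorted(member.defined):
            index.setdefault(symbol, name)
    return index


def extract_members(members: dict[str, Member], index: dict[str, str],
                    undefined: set[str]) -> list[str]:
    """Pull in members until no indexed symbol remains unresolved."""
    pending, defined, extracted = set(undefined), set(), []
    while True:
        wanted = sorted({index[s] for s in pending if s in index} - set(extracted))
        if not wanted:
            return extracted
        for name in wanted:
            extracted.append(name)
            defined |= members[name].defined
            pending |= members[name].undefined  # new references may need more members
        pending -= defined
```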
>>> >>>> >>> >> >> > The function index/summary will later be added as a
>>> >>>> >>> >> >> > special ELF section alongside the .llvmbc sections.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > 2. Stage 2: ThinLTO Infrastructure
>>> >>>> >>> >> >> > ----------------------------------------------
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > The next set of patches adds the base implementation of
>>> >>>> >>> >> >> > the ThinLTO infrastructure, specifically the pieces
>>> >>>> >>> >> >> > required to make ThinLTO functional and generate correct
>>> >>>> >>> >> >> > but not necessarily high-performing binaries. It also does
>>> >>>> >>> >> >> > not include support for making debug info under -g
>>> >>>> >>> >> >> > efficient with ThinLTO.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > a. Clang/LLVM/gold linker options:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > An early set of clang/llvm patches is needed to provide
>>> >>>> >>> >> >> > options to enable ThinLTO (off by default), so that the
>>> >>>> >>> >> >> > rest of the implementation can be disabled by default as
>>> >>>> >>> >> >> > it is added. Specifically, the clang option -fthinlto
>>> >>>> >>> >> >> > (used instead of -flto) will cause clang to invoke the
>>> >>>> >>> >> >> > phase-1 emission of LLVM bitcode and function
>>> >>>> >>> >> >> > summary/index on a compile step, and pass the appropriate
>>> >>>> >>> >> >> > option to the gold plugin on a link step. The -thinlto
>>> >>>> >>> >> >> > option will be added to the gold plugin and llvm-lto tool
>>> >>>> >>> >> >> > to launch the phase-2 thin archive step. The -thinlto
>>> >>>> >>> >> >> > option will also be added to the ‘opt’ tool to invoke it
>>> >>>> >>> >> >> > as a phase-3 parallel backend instance.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > b. Thin-archive linking support in Gold plugin and llvm-lto:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > Under the new plugin option (see above), the plugin needs
>>> >>>> >>> >> >> > to perform the phase-2 (thin archive) link, which simply
>>> >>>> >>> >> >> > emits a combined function map from the linked modules,
>>> >>>> >>> >> >> > without actually performing the normal link.
>>> >>>> >>> >> >> > Corresponding support should be added to the standalone
>>> >>>> >>> >> >> > llvm-lto tool to enable testing/debugging without
>>> >>>> >>> >> >> > involving the linker and plugin.
>>> >>>> >>> >> >> >
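A sketch of how little work the phase-2 step has to do, assuming each module's index maps function names to a bitcode offset plus summary fields (the `IndexEntry` type and its fields are hypothetical): the per-module tables are merged, and no IR is parsed or linked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    bitcode_offset: int     # where the function starts in its module's bitcode
    instruction_count: int  # example summary field


def combine_function_maps(
    module_indexes: dict[str, dict[str, IndexEntry]],
) -> dict[str, tuple[str, IndexEntry]]:
    """Merge per-module indexes into one map: function -> (module, entry).

    No IR is read or linked; only the per-module index tables are merged.
    """
    combined: dict[str, tuple[str, IndexEntry]] = {}
    for module in sorted(module_indexes):  # sorted for a deterministic result
        for function, entry in module_indexes[module].items():
            # Keep the first copy of a function defined in several modules (e.g. comdat).
            combined.setdefault(function, (module, entry))
    return combined
```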
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > c. ThinLTO backend support:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > Support for a phase-3 backend invocation (including
>>> >>>> >>> >> >> > importing) on a module should be added to the ‘opt’ tool
>>> >>>> >>> >> >> > under the new option. The main changes under the option
>>> >>>> >>> >> >> > are to instantiate a Linker object used to manage the
>>> >>>> >>> >> >> > process of linking imported functions into the module, to
>>> >>>> >>> >> >> > efficiently read the combined function map, and to enable
>>> >>>> >>> >> >> > the ThinLTO import pass.
>>> >>>> >>> >> >> >
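The phase-3 demand-driven importing can be pictured as a worklist: start from the callees of the module's own functions, look each one up in the combined function map, import it if it passes a size check, and then consider the callees of what was just imported. The sketch assumes the call graph of candidate functions is already known and uses a single hypothetical size threshold; a real pass would discover callees from imported bodies as they are materialized.

```python
from collections import deque


def plan_imports(module_functions: set[str],
                 calls: dict[str, list[str]],
                 combined: dict[str, tuple[str, int]],
                 max_size: int = 100) -> dict[str, str]:
    """Decide which functions one backend job imports, and from which module.

    calls:    caller -> callees (assumed known for imported bodies too)
    combined: function -> (defining module, instruction count)
    """
    imported: dict[str, str] = {}
    worklist = deque(c for f in sorted(module_functions) for c in calls.get(f, []))
    while worklist:
        callee = worklist.popleft()
        if callee in module_functions or callee in imported or callee not in combined:
            continue  # local, already imported, or external (e.g. libc)
        source, size = combined[callee]
        if size > max_size:
            continue  # too large to be worth importing
        imported[callee] = source
        # An imported body can expose further callees to consider.
        worklist.extend(calls.get(callee, []))
    return imported
```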
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > d. Function index/summary support:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > This includes infrastructure for writing and reading the
>>> >>>> >>> >> >> > function index/summary section. As noted earlier, this
>>> >>>> >>> >> >> > will be encoded in a special ELF section within the
>>> >>>> >>> >> >> > module, alongside the .llvmbc section containing the
>>> >>>> >>> >> >> > bitcode. The thin archive generated by phase-2 of ThinLTO
>>> >>>> >>> >> >> > simply contains all of the function index/summary
>>> >>>> >>> >> >> > sections across the linked modules, organized for
>>> >>>> >>> >> >> > efficient function lookup.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > Each function
available for importing from the module
>>> >>>> >>> >> >> > contains an
>>> >>>> >>> >> >> > entry in the
module’s function index/summary section
>>> and in
>>> >>>> >>> >> >> > the
>>> >>>> >>> >> >> > resulting
combined function map. Each function entry
>>> >>>> >>> >> >> > contains
>>> >>>> >>> >> >> > that
>>> >>>> >>> >> >> > function’s
offset within the bitcode file, used to
>>> >>>> >>> >> >> > efficiently
>>> >>>> >>> >> >> > locate
>>> >>>> >>> >> >> > and quickly
import just that function. The entry also
>>> >>>> >>> >> >> > contains
>>> >>>> >>> >> >> > summary
>>> >>>> >>> >> >> > information
(e.g. basic information determined during
>>> >>>> >>> >> >> > parsing
>>> >>>> >>> >> >> > such as
>>> >>>> >>> >> >> > the number of
instructions in the function), that will
>>> be
>>> >>>> >>> >> >> > used to
>>> >>>> >>> >> >> > help
>>> >>>> >>> >> >> > guide later
import decisions. Because the contents of
>>> this
>>> >>>> >>> >> >> > section
>>> >>>> >>> >> >> > will change
frequently during ThinLTO tuning, it should
>>> also
>>> >>>> >>> >> >> > be
>>> >>>> >>> >> >> > marked
>>> >>>> >>> >> >> > with a version
id for backwards compatibility or version
>>> >>>> >>> >> >> > checking.
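A minimal C++ sketch of the per-function entry and the combined map described in (d). It assumes an in-memory hash map keyed by function name. The struct layout, field names, and `kSummaryVersion` are hypothetical, because the RFC does not specify the encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical in-memory form of one function index/summary entry.
struct FunctionSummaryEntry {
  std::string ModulePath;  // IR file defining the function
  uint64_t BitcodeOffset;  // where the function body starts in that file
  uint32_t InstCount;      // summary info gathered during parsing
};

// Hypothetical format version; bumped whenever the entry layout changes.
constexpr uint32_t kSummaryVersion = 1;

// Combined function map produced by phase-2, keyed by function name.
class CombinedFunctionMap {
public:
  explicit CombinedFunctionMap(uint32_t Version) {
    // Refuse summaries written with an incompatible format version.
    if (Version != kSummaryVersion)
      throw std::runtime_error("unsupported function summary version");
  }

  // Keeps the first entry seen for a name (e.g. one copy of a comdat).
  void add(const std::string &Name, FunctionSummaryEntry Entry) {
    Map.emplace(Name, std::move(Entry));
  }

  // Returns the entry used to seek to, and import, just this function.
  std::optional<FunctionSummaryEntry> lookup(const std::string &Name) const {
    auto It = Map.find(Name);
    if (It == Map.end())
      return std::nullopt;
    return It->second;
  }

private:
  std::unordered_map<std::string, FunctionSummaryEntry> Map;
};
```

The importer would use `BitcodeOffset` to deserialize only the one function body it needs. `InstCount` stands in for whatever summary fields tuning eventually adds.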
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > e. ThinLTO importing support:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > Support for the mechanics of importing functions from other
>>> >>>> >>> >> >> > modules, which can go in gradually as a set of patches since it
>>> >>>> >>> >> >> > will be off by default. Separate patches can include:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > - BitcodeReader changes to use the function index to
>>> >>>> >>> >> >> > import/deserialize a single function of interest (small changes
>>> >>>> >>> >> >> > that leverage the existing lazy streamer support).
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > - Minor LTOModule changes to pass the ThinLTO function to import
>>> >>>> >>> >> >> > and its index into the bitcode reader.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > - Marking of imported functions (for use in ThinLTO-specific
>>> >>>> >>> >> >> > symbol linking and global DCE, for example). This can be
>>> >>>> >>> >> >> > in-memory initially, but IR support may be required in order to
>>> >>>> >>> >> >> > support streaming bitcode out and back in again after importing.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol linking and
>>> >>>> >>> >> >> > static promotion when necessary. For example, the linkage type
>>> >>>> >>> >> >> > of imported functions changes to AvailableExternallyLinkage.
>>> >>>> >>> >> >> > Statics must be promoted in certain cases, and renamed in
>>> >>>> >>> >> >> > consistent ways.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > - GlobalDCE changes to support removing imported functions that
>>> >>>> >>> >> >> > were not inlined (very small changes to existing pass logic).
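The ModuleLinker and GlobalDCE items above can be modeled roughly as follows. This is a toy sketch, not LLVM's API. The `Linkage` enum, the `Function` struct, and the ".llvm.<module id>" rename scheme are all hypothetical. They are chosen only to show why a promoted static needs a deterministic name, and why imported copies with no remaining references can be dropped.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only; these are NOT LLVM's GlobalValue/Function classes.
enum class Linkage { External, Internal, AvailableExternally };

struct Function {
  std::string Name;
  Linkage Link;
  bool Imported = false;  // the "marking of imported functions" bullet
};

// Hypothetical promotion scheme: append the defining module's id so the
// name is unique and deterministic. The defining module (which promotes
// its own copy) and every importer then agree on the same name.
std::string promotedName(const std::string &Name, const std::string &ModuleId) {
  return Name + ".llvm." + ModuleId;
}

// An imported copy keeps its body for inlining but gets
// available_externally linkage, so no code is emitted for it here.
Function importFunction(const Function &Src, const std::string &SrcModuleId) {
  Function Copy = Src;
  if (Src.Link == Linkage::Internal)
    Copy.Name = promotedName(Src.Name, SrcModuleId);
  Copy.Link = Linkage::AvailableExternally;
  Copy.Imported = true;
  return Copy;
}

// GlobalDCE-style cleanup: drop imported functions that have no remaining
// references (e.g. every call to them was inlined).
void removeDeadImports(std::vector<Function> &Fns,
                       const std::set<std::string> &StillReferenced) {
  std::vector<Function> Kept;
  for (const Function &F : Fns)
    if (!F.Imported || StillReferenced.count(F.Name))
      Kept.push_back(F);
  Fns = std::move(Kept);
}
```

The rename must not depend on the importing module. Otherwise two importers of the same static would disagree with the promoted definition.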
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > f. ThinLTO Import Driver SCC pass:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with a framework for doing
>>> >>>> >>> >> >> > ThinLTO via an SCC pass, enabled only under -fthinlto options.
>>> >>>> >>> >> >> > The pass covers use of the thin archive (global function
>>> >>>> >>> >> >> > index/summary), import decision heuristics, invocation of the
>>> >>>> >>> >> >> > LTOModule/ModuleLinker routines that perform the import, and
>>> >>>> >>> >> >> > any necessary callgraph updates and verification.
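One way the import-decision step of the driver pass could look, sketched in C++ under simplifying assumptions. The summary is reduced to an instruction count plus a callee list, and a single size threshold (a hypothetical tuning knob) stands in for the real heuristics. The transitive walk over newly imported bodies is also an assumption, not something the RFC specifies.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily reduced function summary.
struct Summary {
  std::string Module;                // defining module
  unsigned InstCount;                // size estimate from phase-1
  std::vector<std::string> Callees;  // calls made by this function
};

using CombinedIndex = std::map<std::string, Summary>;

// Returns the functions to import into ThisModule, starting from the
// external callees it references (the initial worklist).
std::set<std::string> selectImports(const CombinedIndex &Index,
                                    const std::string &ThisModule,
                                    std::vector<std::string> Worklist,
                                    unsigned Threshold) {
  std::set<std::string> Imports;
  while (!Worklist.empty()) {
    std::string Name = Worklist.back();
    Worklist.pop_back();
    auto It = Index.find(Name);
    if (It == Index.end() || It->second.Module == ThisModule)
      continue;  // no summary available, or already defined locally
    if (It->second.InstCount > Threshold || !Imports.insert(Name).second)
      continue;  // too large to be worth importing, or already chosen
    // A newly imported body may expose further import candidates.
    for (const std::string &Callee : It->second.Callees)
      Worklist.push_back(Callee);
  }
  return Imports;
}
```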
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > g. Backend Driver:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > For a single-node build, the gold plugin can simply write a
>>> >>>> >>> >> >> > makefile and fork the parallel backend instances directly via
>>> >>>> >>> >> >> > parallel make.
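A sketch of the backend driver emitting such a makefile. `$(BACKEND)` and the `-index=` flag are placeholders, because the RFC does not name the ‘opt’ option that runs a phase-3 backend. The ".bc"/".o" file names are likewise illustrative.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One backend job: compile one module's IR (importing as needed) into the
// object file the final link expects.
struct BackendJob {
  std::string IRFile;      // phase-1 output for one module
  std::string ObjectFile;  // real object file path for the final link
};

// Writes one independent target per module plus an "all" target. The
// $(BACKEND) variable and the "-index=" flag are placeholders, not real
// opt options.
std::string writeBackendMakefile(const std::vector<BackendJob> &Jobs,
                                 const std::string &ThinArchive) {
  std::ostringstream OS;
  OS << "all:";
  for (const BackendJob &J : Jobs)
    OS << ' ' << J.ObjectFile;
  OS << "\n\n";
  for (const BackendJob &J : Jobs) {
    // No target depends on another, so 'make -jN' can run them in parallel.
    OS << J.ObjectFile << ": " << J.IRFile << ' ' << ThinArchive << '\n'
       << "\t$(BACKEND) " << J.IRFile << " -index=" << ThinArchive << " -o "
       << J.ObjectFile << '\n';
  }
  return OS.str();
}
```

Running 'make -j' on the result builds all objects in parallel on one machine. Because the file records the IR-to-object mapping, it is also the kind of output item 3d proposes to hand to a distributed build system.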
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements
>>> >>>> >>> >> >> > ----------------------------------------------------------------
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > This refers to the patches that are not required for ThinLTO to
>>> >>>> >>> >> >> > work, but rather improve compile time, memory, run-time
>>> >>>> >>> >> >> > performance and usability.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > a. Lazy Debug Metadata Linking:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > The prototype implementation included lazy importing of
>>> >>>> >>> >> >> > module-level metadata during the ThinLTO pass finalization
>>> >>>> >>> >> >> > (i.e., after all function importing is complete). This actually
>>> >>>> >>> >> >> > applies to all module-level metadata, not just debug metadata,
>>> >>>> >>> >> >> > although debug metadata is the largest. This can be added as a
>>> >>>> >>> >> >> > separate set of patches, with changes to BitcodeReader,
>>> >>>> >>> >> >> > ValueMapper, and ModuleLinker.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > b. Import Tuning:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > Tuning the import strategy will be an iterative process that
>>> >>>> >>> >> >> > will continue to be refined over time. It involves several
>>> >>>> >>> >> >> > different types of changes: adding support for recording
>>> >>>> >>> >> >> > additional metrics in the function summary, such as profile
>>> >>>> >>> >> >> > data and optional heavier-weight IPA analyses, and tuning the
>>> >>>> >>> >> >> > import heuristics based on the summary and callsite context.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > c. Combined Function Map Pruning:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > The combined function map can be pruned of functions that are
>>> >>>> >>> >> >> > unlikely to benefit from being imported. For example, during the
>>> >>>> >>> >> >> > phase-2 thin archive plugin step, we can safely omit large and
>>> >>>> >>> >> >> > (with profile data) cold functions, which are unlikely to benefit
>>> >>>> >>> >> >> > from being inlined. Additionally, all but one copy of comdat
>>> >>>> >>> >> >> > functions can be suppressed.
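The pruning rules in (c) might be applied as follows. This sketch assumes the summary records an instruction count, a cold flag derived from profile data, and a comdat key. All three fields and the size threshold are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary fields relevant to pruning.
struct Candidate {
  std::string Name;
  unsigned InstCount;     // size estimate from the summary
  bool KnownCold;         // only set when profile data is available
  std::string ComdatKey;  // empty if the function is not in a comdat
};

std::vector<Candidate> pruneCombinedMap(const std::vector<Candidate> &All,
                                        unsigned MaxInsts) {
  std::vector<Candidate> Kept;
  std::set<std::string> SeenComdats;
  for (const Candidate &C : All) {
    if (C.InstCount > MaxInsts || C.KnownCold)
      continue;  // unlikely to be inlined, so not worth importing
    if (!C.ComdatKey.empty() && !SeenComdats.insert(C.ComdatKey).second)
      continue;  // another copy of this comdat function was already kept
    Kept.push_back(C);
  }
  return Kept;
}
```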
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > d. Distributed Build System Integration:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > For a distributed build system, the gold plugin should write the
>>> >>>> >>> >> >> > parallel backend invocations into a makefile, including the
>>> >>>> >>> >> >> > mapping from each IR file to the real object file path, and then
>>> >>>> >>> >> >> > exit. Additional work needs to be done in the distributed build
>>> >>>> >>> >> >> > system itself to distribute and dispatch the parallel backend
>>> >>>> >>> >> >> > jobs to the build cluster.
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > e. Dependence Tracking and Incremental Compiles:
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > In order to support build systems that stage from local disks or
>>> >>>> >>> >> >> > network storage, the plugin will optionally support computing
>>> >>>> >>> >> >> > the dependence set of IR files that each module may import from.
>>> >>>> >>> >> >> > These sets can be computed from profile data, if it exists, or
>>> >>>> >>> >> >> > from the symbol table and heuristics if not. They also enable
>>> >>>> >>> >> >> > support for incremental backend compiles.
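For the symbol-table fallback in (e), a dependence set can be approximated as every IR file that defines a symbol the module references but does not define itself. The minimal C++17 sketch below makes that assumption. Profile-based refinement and the RFC's "heuristics" are left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Per-module symbol information, as read from each file's symbol table.
struct ModuleSymbols {
  std::set<std::string> Defined;
  std::set<std::string> Referenced;
};

std::map<std::string, std::set<std::string>>
computeDependenceSets(const std::map<std::string, ModuleSymbols> &Modules) {
  std::map<std::string, std::string> DefinedIn;  // symbol -> IR file
  for (const auto &[File, Syms] : Modules)
    for (const std::string &S : Syms.Defined)
      DefinedIn.emplace(S, File);  // first definition wins (e.g. comdats)

  std::map<std::string, std::set<std::string>> Deps;
  for (const auto &[File, Syms] : Modules) {
    std::set<std::string> &D = Deps[File];  // ensures every module has an entry
    for (const std::string &S : Syms.Referenced) {
      auto It = DefinedIn.find(S);
      if (It != DefinedIn.end() && It->second != File)
        D.insert(It->second);  // undefined symbols (e.g. libc) are skipped
    }
  }
  return Deps;
}
```

These sets could support incremental compiles as follows: a module's backend job only needs re-running when the module itself, or a file in its dependence set, has changed.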
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > --
>>> >>>> >>> >> >> > Teresa Johnson | Software Engineer | tejohnson at google.com |
>>> >>>> >>> >> >> > 408-460-2413
>>> >>>> >>> >> >> >
>>> >>>> >>> >> >> > _______________________________________________
>>> >>>> >>> >> >> > LLVM Developers mailing list
>>> >>>> >>> >> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> >>>> >>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>> >>>> >>> >> >>
>>> >>>> >>> >> >>
>>> >>>> >>> >> >
>>> >>>> >>> >> >
>>> >>>> >>> >>
>>> >>>> >>> >>
>>> >>>> >>> >>
>>> >>>> >>> >> --
>>> >>>> >>> >> Teresa Johnson | Software Engineer | tejohnson at google.com |
>>> >>>> >>> >> 408-460-2413
>>> >>>> >>> >>
>>> >>>> >>> >>
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>> --
>>> >>>> >>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>>> >>>> >>> 408-460-2413
>>> >>>> >>
>>> >>>> >>
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>
>>> >>>
>>> >>>
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>>> 408-460-2413
>>>
>>
>>
>>
>
>

Xinliang David Li

2015-May-15 15:26 UTC

[LLVMdev] RFC: ThinLTO Implementation Plan

There is no need to emit the full symtab. I checked the overhead with a
huge internal C++ source: the overhead of the symtab + string table,
compared with bitcode with debug info, is about 3%.

More importantly, it is also possible to use the symtab for index/summary
purposes, which makes the space usage completely 'unwasted'. That gets into
the details, which will follow when the patches are in.

David

On Fri, May 15, 2015 at 5:11 AM, Dave Bozier <seifsta at gmail.com> wrote:
> > Are you sure about the additional I/O? With a native symtab, existing
> > tools just need to read it, while the plugin-based approach needs to read
> > the bitcode section to feed symbols back to the tool.
>
> The additional I/O will be quite big if you are going to emit the full
> symbol table. Looking at some of our real-world links, the symbol table and
> string tables of all the inputs seen by the linker add up to about
> 50-100 MB.
>
> On Thu, May 14, 2015 at 10:28 PM, Xinliang David Li <xinliangli at gmail.com>
> wrote:
>
>>
>>
>> On Thu, May 14, 2015 at 2:09 PM, Eric Christopher <echristo at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, May 14, 2015 at 1:35 PM Teresa Johnson <tejohnson at google.com>
>>> wrote:
>>>
>>>> On Thu, May 14, 2015 at 1:18 PM, Eric Christopher <echristo at gmail.com>
>>>> wrote:
>>>> >
>>>> >
>>>> > On Thu, May 14, 2015 at 1:11 PM David Blaikie <dblaikie at gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> On Thu, May 14, 2015 at 12:53 PM, Eric Christopher <echristo at gmail.com>
>>>> >> wrote:
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Thu, May 14, 2015 at 11:34 AM Daniel Berlin <dberlin at dberlin.org>
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> On Thu, May 14, 2015 at 11:14 AM, Eric Christopher <echristo at gmail.com>
>>>> >>>> wrote:
>>>> >>>> > I'm not sure this is a particularly great assumption to make.
>>>> >>>>
>>>> >>>> Which part?
>>>> >>>
>>>> >>>
>>>> >>> The binutils part :)
>>>> >>>
>>>> >>>>
>>>> >>>>
>>>> >>>> > We have to support a lot of different build systems and tools,
>>>> >>>> > and concentrating on something that just binutils uses isn't
>>>> >>>> > particularly friendly here.
>>>> >>>> I think you may have misunderstood.
>>>> >>>> His point was exactly that they want to be transparent to *all of*
>>>> >>>> these tools.
>>>> >>>> You are saying "we should be friendly to everyone". He is saying the
>>>> >>>> same thing.
>>>> >>>> We should be friendly to everyone. The friendly way to do this is to
>>>> >>>> not require all of these tools to build plugins to handle bitcode.
>>>> >>>>
>>>> >>>> Hence, elf-wrapped bitcode.
>>>> >>>
>>>> >>>
>>>> >>> Oh, I understood. I just don't know that I agree. To do anything with
>>>> >>> the tools will require some knowledge of bitcode anyhow or need the
>>>> >>> plugin. I'm saying that as a baseline start we should look at how to
>>>> >>> do this using the tools we've got rather than wrapping things for no
>>>> >>> real gain.
>>>> >>
>>>> >>
>>>> >> That doesn't seem strictly true: consider the ar situation (which I'm
>>>> >> led to believe is in use in our build system and, one would assume,
>>>> >> others). With the symbol table included as proposed, ar can be used
>>>> >> without any knowledge of the bitcode or need for a plugin.
>>>> >>
>>>> >
>>>> > For some bits, sure. Optimizing for ar seems a bit silly; why not
>>>> > 'ld -r'?
>>>>
>>>> But as mentioned, ld -r can work on native-object-wrapped bitcode
>>>> without a plugin as well.
>>>>
>>>>
>>> How? It's not like any partial linking is going to go on inside the
>>> bitcode if the linker doesn't understand bitcode.
>>>
>>
>> Why do we want the plugin to do anything here? We just need the linker to
>> concatenate the bitcode sections and produce a combined bitcode file.
>>
>>
>>>
>>>
>>>> > Agreed. The ar situation is interesting because one thing we discussed
>>>> > after you wandered off was just adding a ToC section to bitcode as it
>>>> > is and then having the tools handle that. That would seem to
>>>> > accomplish at least the goals as I've seen them up to this point
>>>> > without worrying too much.
>>>>
>>>> The ToC section is a way we can encode the function index/summary into
>>>> bitcode, but it won't help integrate with existing tools. The main issue
>>>> we are trying to solve is integrating transparently with existing
>>>> binutils tools in use in our build system and probably elsewhere.
>>>>
>>>>
>>> Right. I'm not entirely sure what use we're going to see in the existing
>>> tools that we want to encompass here. There's some of it for convenience
>>> (e.g., nm etc. for developers), but they can use a tool that understands
>>> bitcode, and we can make the existing LLVM tools suffice for these needs.
>>>
>>> I think the way of looking at this is that we can:
>>>
>>> a) go with wrapping things in native object formats; this means
>>>  - some tools continue to work at the cost of additional I/O and space
>>>    at compile/link time
>>>
>>
>> Are you sure about the additional I/O? With a native symtab, existing tools
>> just need to read it, while the plugin-based approach needs to read the
>> bitcode section to feed symbols back to the tool.
>>
>>
>>>  - we still have to update some tools to work at all
>>>
>>
>> If any, it will be minimal.
>>
>>
>>>
>>> b) we extend those tools/our own tools and have them be drop-in
>>> replacements for the existing tools. They'll understand the bitcode
>>> format natively, they'll be smaller, and we'll be able to push the state
>>> of the art in tooling/analysis a bit more in the future without having
>>> to rework ThinLTO.
>>>
>>> It's basically a set of trade-offs, and for LLVM we've historically gone
>>> in the b direction.
>>>
>>>
>> I am fine making LLVM tools work with it, but we should not require/force
>> users to use them. I think this is an orthogonal feature.
>>
>> David
>>
>>
>>
>>
>>> >
>>>> > At any rate, I think this aspect of the proposal needs a bit of
>>>> > discussion and some mapping out of the pros and cons here.
>>>>
>>>> Sure, we can continue to discuss and I will try to lay out the
>>>> pros/cons.
>>>>
>>>
>>> Excellent.
>>>
>>> -eric
>>>
>>>
>>>>
>>>> Teresa
>>>>
>>>> >
>>>> > -eric
>>>> >
>>>> >>>
>>>> >>> I've talked to Teresa a bit offline and we're going to talk more
>>>> >>> later (and discuss on the list), but there are some discussions
>>>> >>> about how to make this work with just bitcode/LLVM tools, and so not
>>>> >>> require integration on all platforms. The latter is what I consider
>>>> >>> particularly friendly :)
>>>> >>>
>>>> >>> -eric
>>>> >>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> > I also can't imagine how it's necessary for any of the LTO aspects
>>>> >>>> > as currently written in the proposal.
>>>> >>>> >
>>>> >>>> > -eric
>>>> >>>> >
>>>> >>>> > On Thu, May 14, 2015 at 9:26 AM Xinliang David Li
>>>> >>>> > <xinliangli at gmail.com> wrote:
>>>> >>>> >>
>>>> >>>> >> The design objective is to make ThinLTO mostly transparent to
>>>> >>>> >> binutils tools to enable easy integration with any build system in
>>>> >>>> >> the wild. 'Pass-through' mode with 'ld -r' instead of the partial
>>>> >>>> >> LTO mode is another reason.
>>>> >>>> >>
>>>> >>>> >> David
>>>> >>>> >>
>>>> >>>> >> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson
>>>> >>>> >> <tejohnson at google.com> wrote:
>>>> >>>> >>>
>>>> >>>> >>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher
>>>> >>>> >>> <echristo at gmail.com> wrote:
>>>> >>>> >>> > So, what Alex is saying is that we have these tools as well, and
>>>> >>>> >>> > they understand bitcode just fine, as well as every object
>>>> >>>> >>> > format, not just ELF. :)
>>>> >>>> >>>
>>>> >>>> >>> Right, there are also LLVM-specific versions (llvm-ar, llvm-nm)
>>>> >>>> >>> that handle bitcode similarly to the way the standard tool +
>>>> >>>> >>> plugin does. But the goal we are trying to achieve is to allow the
>>>> >>>> >>> standard system versions of the tools to handle these files
>>>> >>>> >>> without requiring a plugin. I know the LLVM tools handle other
>>>> >>>> >>> object formats, but I'm not sure how that helps here. We're not
>>>> >>>> >>> planning to replace those tools, just to allow the standard system
>>>> >>>> >>> versions to handle the intermediate objects produced by ThinLTO.
>>>> >>>> >>>
>>>> >>>> >>> Thanks,
>>>> >>>> >>> Teresa
>>>> >>>> >>>
>>>> >>>> >>> >
>>>> >>>> >>> > -eric
>>>> >>>> >>> >
>>>> >>>> >>> >
>>>> >>>> >>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson
>>>> >>>> >>> > <tejohnson at google.com> wrote:
>>>> >>>> >>> >>
>>>> >>>> >>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li
>>>> >>>> >>> >> <xinliangli at gmail.com> wrote:
>>>> >>>> >>> >> >
>>>> >>>> >>> >> >
>>>> >>>> >>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg
>>>> >>>> >>> >> > <alexr at leftfield.org> wrote:
>>>> >>>> >>> >> >>
>>>> >>>> >>> >> >>
>>>> >>>> >>> >> >> "ELF-wrapped bitcode" seems potentially controversial to me.
>>>> >>>> >>> >> >>
>>>> >>>> >>> >> >> What about ar, nm, and various ld implementations adds this
>>>> >>>> >>> >> >> requirement?
>>>> >>>> >>> >> >> What about the LLVM implementations of these tools is lacking?
>>>> >>>> >>> >> >
>>>> >>>> >>> >> >
>>>> >>>> >>> >> > Sorry, I cannot parse your questions properly. Can you make
>>>> >>>> >>> >> > them clearer?
>>>> >>>> >>> >>
>>>> >>>> >>> >> Alex is asking what the issue is with ar, nm, ld -r and regular
>>>> >>>> >>> >> bitcode that makes using elf-wrapped bitcode easier.
>>>> >>>> >>> >>
>>>> >>>> >>> >> The issue is that generally you need to provide a plugin to
>>>> >>>> >>> >> these tools in order for them to understand and handle bitcode
>>>> >>>> >>> >> files. We'd like standard tools to work without requiring a
>>>> >>>> >>> >> plugin as much as possible. And in some cases we want them to be
>>>> >>>> >>> >> handled differently from the way bitcode files are handled with
>>>> >>>> >>> >> the plugin.
>>>> >>>> >>> >>
>>>> >>>> >>> >> nm: Without a plugin, normal bitcode files are inscrutable. When
>>>> >>>> >>> >> provided the gold plugin, it can emit the symbols.
>>>> >>>> >>> >>
>>>> >>>> >>> >> ar: Without a plugin, it will create an archive of bitcode files,
>>>> >>>> >>> >> but without an index, so it can't be handled by the linker even
>>>> >>>> >>> >> with a plugin on an -flto link. When ar is provided the gold
>>>> >>>> >>> >> plugin, it does create an index, so the linker + gold plugin
>>>> >>>> >>> >> handle it appropriately on an -flto link.
>>>> >>>> >>> >>
>>>> >>>> >>> >> ld -r: Without a plugin, it fails when provided bitcode inputs.
>>>> >>>> >>> >> When provided the gold plugin, it handles them but compiles them
>>>> >>>> >>> >> all the way through to ELF executable instructions via a partial
>>>> >>>> >>> >> LTO link. This is where we would like to differ in behavior
>>>> >>>> >>> >> (while also not requiring a plugin) with ELF-wrapped bitcode: we
>>>> >>>> >>> >> would like the ld -r output file to still contain ELF-wrapped
>>>> >>>> >>> >> bitcode, delaying the LTO until the full link step.
>>>> >>>> >>> >>
>>>> >>>> >>> >> Let me know if that helps address your concerns.
>>>> >>>> >>> >>
>>>> >>>> >>> >> Thanks,
>>>> >>>> >>> >> Teresa
>>>> >>>> >>> >>
>>>> >>>> >>> >> >
>>>> >>>> >>> >> > David
>>>> >>>> >>> >> >
>>>> >>>> >>> >> >>
>>>> >>>> >>> >> >>
>>>> >>>> >>> >> >> Alex
>>>> >>>> >>> >> >>
>>>> >>>> >>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson
>>>> >>>> >>> >> >> > <tejohnson at google.com> wrote:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > I've included below an RFC for implementing ThinLTO in LLVM.
>>>> >>>> >>> >> >> > Looking forward to feedback and questions.
>>>> >>>> >>> >> >> > Thanks!
>>>> >>>> >>> >> >> > Teresa
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > RFC to discuss plans for implementing ThinLTO upstream.
>>>> >>>> >>> >> >> > Background can be found in slides from EuroLLVM 2015:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > As described in the talk, we have a prototype implementation,
>>>> >>>> >>> >> >> > and would like to start staging patches upstream. This RFC
>>>> >>>> >>> >> >> > describes a breakdown of the major pieces. We would like to
>>>> >>>> >>> >> >> > commit upstream gradually in several stages, with all
>>>> >>>> >>> >> >> > functionality off by default. The core ThinLTO importing support
>>>> >>>> >>> >> >> > and tuning will require frequent change and iteration during
>>>> >>>> >>> >> >> > testing and tuning, and for that part we would like to commit
>>>> >>>> >>> >> >> > rapidly (off by default). See the proposed staged
>>>> >>>> >>> >> >> > implementation described in the Implementation Plan section.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > ThinLTO Overview
>>>> >>>> >>> >> >> > ================
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > See the talk slides linked above for more details. The following
>>>> >>>> >>> >> >> > is a high-level overview of the motivation.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Cross Module Optimization (CMO) is an effective means for
>>>> >>>> >>> >> >> > improving runtime performance by extending the scope of
>>>> >>>> >>> >> >> > optimizations across source module boundaries. Without CMO, the
>>>> >>>> >>> >> >> > compiler is limited to optimizing within the scope of single
>>>> >>>> >>> >> >> > source modules. Two solutions for enabling CMO are Link-Time
>>>> >>>> >>> >> >> > Optimization (LTO), which is currently supported in LLVM and GCC,
>>>> >>>> >>> >> >> > and Lightweight-Interprocedural Optimization (LIPO). However,
>>>> >>>> >>> >> >> > each of these solutions has limitations that prevent it from
>>>> >>>> >>> >> >> > being enabled by default. ThinLTO is a new approach that attempts
>>>> >>>> >>> >> >> > to address these limitations, with a goal of being enabled more
>>>> >>>> >>> >> >> > broadly. ThinLTO is designed with many of the same principles as
>>>> >>>> >>> >> >> > LIPO, and therefore shares its advantages, without any of its
>>>> >>>> >>> >> >> > inherent weaknesses. Unlike LIPO, where the module group decision
>>>> >>>> >>> >> >> > is made at profile training runtime, ThinLTO makes the decision
>>>> >>>> >>> >> >> > at compile time, but in a lazy mode that facilitates large-scale
>>>> >>>> >>> >> >> > parallelism. The serial linker plugin phase is designed to be
>>>> >>>> >>> >> >> > razor thin and blazingly fast. By default, this step only does
>>>> >>>> >>> >> >> > minimal preparation work to enable the parallel lazy importing
>>>> >>>> >>> >> >> > performed later. ThinLTO aims to be scalable like a regular O2
>>>> >>>> >>> >> >> > build, enabling CMO on machines without large memory
>>>> >>>> >>> >> >> > configurations, while also integrating well with distributed
>>>> >>>> >>> >> >> > build systems. Results from early prototyping on SPEC cpu2006 C++
>>>> >>>> >>> >> >> > benchmarks are in line with expectations that ThinLTO can scale
>>>> >>>> >>> >> >> > like O2 while enabling much of the CMO performed during a full
>>>> >>>> >>> >> >> > LTO build.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > A ThinLTO build is divided into 3 phases, which are referred to
>>>> >>>> >>> >> >> > in the following implementation plan:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > phase-1: IR and Function Summary Generation (-c compile)
>>>> >>>> >>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive linker step)
>>>> >>>> >>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Implementation Plan
>>>> >>>> >>> >> >> > ===================
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > This section gives a high-level breakdown of the ThinLTO support
>>>> >>>> >>> >> >> > that will be added, in roughly the order that the patches would
>>>> >>>> >>> >> >> > be staged. The patches are divided into three stages. The first
>>>> >>>> >>> >> >> > stage contains a minimal amount of preparation work that is not
>>>> >>>> >>> >> >> > ThinLTO-specific. The second stage contains most of the
>>>> >>>> >>> >> >> > infrastructure for ThinLTO, which will be off by default. The
>>>> >>>> >>> >> >> > third stage includes enhancements/improvements/tunings that can
>>>> >>>> >>> >> >> > be performed after the main ThinLTO infrastructure is in.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > The second and third implementation stages will initially be
>>>> >>>> >>> >> >> > very volatile, requiring a lot of iterations and tuning with
>>>> >>>> >>> >> >> > large apps to get stabilized. Therefore, it will be important to
>>>> >>>> >>> >> >> > do fast commits for these implementation stages.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > 1. Stage 1: Preparation
>>>> >>>> >>> >> >> > -------------------------------
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > The first planned sets of patches are enablers for ThinLTO work:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > a. LTO directory structure:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Restructure the LTO directory to remove the circular dependence
>>>> >>>> >>> >> >> > that arises when the ThinLTO pass is added. Because ThinLTO is
>>>> >>>> >>> >> >> > being implemented as an SCC pass within Transforms/IPO, and
>>>> >>>> >>> >> >> > leverages the LTOModule class for linking in functions from
>>>> >>>> >>> >> >> > modules, IPO then requires the LTO library. This creates a
>>>> >>>> >>> >> >> > circular dependence between LTO and IPO. To break that, we need
>>>> >>>> >>> >> >> > to split the lib/LTO directory/library into lib/LTO/CodeGen and
>>>> >>>> >>> >> >> > lib/LTO/Module, containing LTOCodeGenerator and LTOModule,
>>>> >>>> >>> >> >> > respectively. Only LTOCodeGenerator has a dependence on IPO,
>>>> >>>> >>> >> >> > which removes the circular dependence.
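The layering argument in (a) can be checked mechanically with a depth-first cycle search over a library dependence graph. The library names mirror the RFC. The edge lists are a simplification assumed for illustration (for example, that LTO/CodeGen also uses LTO/Module).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Library -> libraries it depends on.
using Graph = std::map<std::string, std::vector<std::string>>;

// Depth-first search; reaching a node that is still on the stack means
// there is a back edge, i.e. a dependence cycle.
static bool visit(const Graph &G, const std::string &N,
                  std::set<std::string> &OnStack, std::set<std::string> &Done) {
  if (Done.count(N))
    return false;
  if (!OnStack.insert(N).second)
    return true;  // back edge: cycle
  auto It = G.find(N);
  if (It != G.end())
    for (const std::string &Dep : It->second)
      if (visit(G, Dep, OnStack, Done))
        return true;
  OnStack.erase(N);
  Done.insert(N);
  return false;
}

bool hasCycle(const Graph &G) {
  std::set<std::string> OnStack, Done;
  for (const auto &Entry : G)
    if (visit(G, Entry.first, OnStack, Done))
      return true;
  return false;
}
```

With the single LTO library, IPO -> LTO -> IPO forms a cycle. After the split, IPO only needs LTO/Module, and only LTO/CodeGen needs IPO.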
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > b. ELF wrapper generation support:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Implement an ELF-wrapped bitcode writer. To interact more easily
>>>> >>>> >>> >> >> > with tools such as $AR, $NM, and “$LD -r”, we plan to emit the
>>>> >>>> >>> >> >> > phase-1 bitcode wrapped in ELF via the .llvmbc section, along
>>>> >>>> >>> >> >> > with a symbol table. The goal is both to interact with these
>>>> >>>> >>> >> >> > tools without requiring a plugin, and to avoid doing partial
>>>> >>>> >>> >> >> > LTO/ThinLTO across files linked with “$LD -r” (i.e., the
>>>> >>>> >>> >> >> > resulting object file should still contain ELF-wrapped bitcode
>>>> >>>> >>> >> >> > to enable ThinLTO at the full link step). I will send a separate
>>>> >>>> >>> >> >> > design document for these changes, but the following is a
>>>> >>>> >>> >> >> > high-level overview.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Support for reading ELF-wrapped bitcode was added to LLVM (http://reviews.llvm.org/rL218078), but LLVM/Clang does not yet support emitting bitcode wrapped in ELF. I plan to add support for optionally generating bitcode in an ELF file containing a single .llvmbc section holding the bitcode. Specifically, the patch would add a new option “emit-llvm-bc-elf” (object file) and a corresponding “emit-llvm-elf” (textual assembly equivalent). Eventually these would be triggered automatically under “-fthinlto -c” and “-fthinlto -S”, respectively.
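To make the wrapper layout concrete, here is a minimal sketch that builds an ELF64 relocatable file whose only data section is .llvmbc, then reads that section back by name. It is not LLVM's implementation: it omits the symbol table the plan also calls for, assumes a little-endian x86-64 target (EM_X86_64), and the helper names are hypothetical. The struct layouts follow the ELF64 `Elf64_Ehdr` and `Elf64_Shdr` definitions.

```python
import struct

# ELF constants from the ELF specification.
ET_REL, EM_X86_64 = 1, 62
SHT_PROGBITS, SHT_STRTAB = 1, 3
EHDR_FMT = "<16sHHIQQQIHHHHHH"  # Elf64_Ehdr, 64 bytes
SHDR_FMT = "<IIQQQQIIQQ"        # Elf64_Shdr, 64 bytes


def wrap_bitcode_in_elf(bitcode: bytes) -> bytes:
    """Build a minimal ELF64 relocatable file whose only data section is .llvmbc."""
    shstrtab = b"\0.llvmbc\0.shstrtab\0"
    llvmbc_name, shstrtab_name = 1, 9  # offsets of the names inside shstrtab
    data_off = 64                      # section data follows the ELF header
    strtab_off = data_off + len(bitcode)
    shoff = (strtab_off + len(shstrtab) + 7) & ~7  # 8-byte align header table
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # 64-bit, LE, version 1
    ehdr = struct.pack(EHDR_FMT, ident, ET_REL, EM_X86_64, 1, 0, 0, shoff,
                       0, 64, 0, 0, 64, 3, 2)  # 3 sections, shstrtab is #2
    shdrs = (
        struct.pack(SHDR_FMT, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),  # index 0: null
        struct.pack(SHDR_FMT, llvmbc_name, SHT_PROGBITS, 0, 0, data_off,
                    len(bitcode), 0, 0, 1, 0),
        struct.pack(SHDR_FMT, shstrtab_name, SHT_STRTAB, 0, 0, strtab_off,
                    len(shstrtab), 0, 0, 1, 0),
    )
    body = ehdr + bitcode + shstrtab
    return body + bytes(shoff - len(body)) + b"".join(shdrs)


def extract_section(elf: bytes, wanted: str) -> bytes:
    """Return the contents of the named section of an ELF64 little-endian file."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("not an ELF64 little-endian file")
    fields = struct.unpack_from(EHDR_FMT, elf, 0)
    shoff, shentsize, shnum, shstrndx = fields[6], fields[11], fields[12], fields[13]

    def header(i: int) -> tuple:
        return struct.unpack_from(SHDR_FMT, elf, shoff + i * shentsize)

    str_hdr = header(shstrndx)
    names = elf[str_hdr[4]:str_hdr[4] + str_hdr[5]]
    for i in range(shnum):
        sh = header(i)  # sh[0]=name offset, sh[4]=file offset, sh[5]=size
        name = names[sh[0]:names.index(b"\0", sh[0])].decode()
        if name == wanted:
            return elf[sh[4]:sh[4] + sh[5]]
    raise KeyError(wanted)


bitcode = b"BC\xc0\xde" + bytes(12)  # bitcode magic followed by a dummy payload
assert extract_section(wrap_bitcode_in_elf(bitcode), ".llvmbc") == bitcode
```

Because the result is an ordinary ELF object, generic tools can carry it around without understanding bitcode; only the final consumer needs to look inside .llvmbc.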
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Additionally, a symbol table holding the function symbols within the bitcode will be generated in the ELF file. This facilitates handling archives of the ELF-wrapped bitcode created with $AR, since the archive will have a symbol table as well. The archive symbol table enables gold to extract the constituent ELF-wrapped bitcode files and pass them to the plugin. To support the concatenated .llvmbc section generated by “$LD -r”, some handling needs to be added to gold and to the backend driver to process each original module’s bitcode.
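The archive symbol table is what lets a linker pull in only the members it needs without parsing their contents. The following sketch models the classic member-selection loop under simplifying assumptions (first definition wins, and each member is described by plain sets of defined and referenced symbols); it is not gold's code, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def build_archive_index(member_defs: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each defined symbol to the first member that defines it (like an ar index)."""
    index: Dict[str, str] = {}
    for member, symbols in member_defs.items():
        for sym in sorted(symbols):
            index.setdefault(sym, member)
    return index


def select_members(index: Dict[str, str],
                   member_defs: Dict[str, Set[str]],
                   member_refs: Dict[str, Set[str]],
                   undefined: Set[str]) -> List[str]:
    """Extract members until no undefined symbol can be satisfied by the index."""
    extracted: List[str] = []
    defined: Set[str] = set()
    pending = set(undefined)
    while True:
        satisfiable = sorted(s for s in pending if s in index)
        if not satisfiable:
            return extracted  # whatever remains must come from elsewhere (e.g. libc)
        member = index[satisfiable[0]]
        extracted.append(member)
        defined |= member_defs[member]
        # The new member's own references may require further members.
        pending = (pending | member_refs.get(member, set())) - defined


defs = {"a.o": {"foo"}, "b.o": {"bar"}, "c.o": {"baz"}}
refs = {"a.o": {"bar"}}
print(select_members(build_archive_index(defs), defs, refs, {"foo", "printf"}))
# ['a.o', 'b.o']
```

With ELF-wrapped bitcode, each member here would be a wrapped bitcode file, and only the selected members would be handed to the plugin.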
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > The function index/summary will later be added as a special ELF section alongside the .llvmbc sections.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > 2. Stage 2: ThinLTO Infrastructure
>>>> >>>> >>> >> >> > ----------------------------------------------
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > The next set of patches adds the base implementation of the ThinLTO infrastructure, specifically the pieces required to make ThinLTO functional and to generate correct, but not necessarily high-performing, binaries. It also does not include the support needed to make debug info (-g) handling efficient with ThinLTO.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > a. Clang/LLVM/gold linker options:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > An early set of clang/llvm patches is needed to provide options to enable ThinLTO (off by default), so that the rest of the implementation can be disabled by default as it is added. Specifically, the clang option -fthinlto (used instead of -flto) will cause clang to invoke the phase-1 emission of LLVM bitcode and the function summary/index on a compile step, and to pass the appropriate option to the gold plugin on a link step. The -thinlto option will be added to the gold plugin and the llvm-lto tool to launch the phase-2 thin archive step. The -thinlto option will also be added to the ‘opt’ tool to invoke it as a phase-3 parallel backend instance.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > b. Thin-archive linking support in Gold plugin and llvm-lto:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Under the new plugin option (see above), the plugin needs to perform the phase-2 (thin archive) link, which simply emits a combined function map from the linked modules without actually performing the normal link. Corresponding support should be added to the standalone llvm-lto tool to enable testing/debugging without involving the linker and plugin.
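As a rough model of what the phase-2 link produces, the sketch below merges per-module function indexes into one combined map keyed by function name. The entry fields follow the description of the index/summary (bitcode offset, instruction count), but the data layout, the names, and the rule for duplicate definitions (keep the copy from the first module in name order) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FuncEntry:
    module: str          # bitcode file holding the function body
    bitcode_offset: int  # where the body starts, for lazy single-function reads
    num_insts: int       # summary data for import heuristics


def build_combined_map(
        per_module: Dict[str, Dict[str, Tuple[int, int]]]) -> Dict[str, FuncEntry]:
    """Merge {module: {function: (offset, num_insts)}} into one lookup table."""
    combined: Dict[str, FuncEntry] = {}
    for module in sorted(per_module):  # deterministic regardless of input order
        for func, (offset, num_insts) in per_module[module].items():
            # Assumed policy for duplicates (e.g. comdat copies): keep the first.
            combined.setdefault(func, FuncEntry(module, offset, num_insts))
    return combined


combined = build_combined_map({
    "b.bc": {"bar": (128, 40)},
    "a.bc": {"foo": (64, 12), "bar": (200, 40)},
})
print(combined["bar"])  # FuncEntry(module='a.bc', bitcode_offset=200, num_insts=40)
```

No function bodies are touched here; the phase-2 step only reads and merges summaries, which is what keeps it cheap compared with a full LTO link.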
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > c. ThinLTO backend support:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Support for a phase-3 backend invocation (including importing) on a module should be added to the ‘opt’ tool under the new option. The main changes under the option are to instantiate a Linker object used to manage the process of linking imported functions into the module, to read the combined function map efficiently, and to enable the ThinLTO import pass.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > d. Function index/summary support:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > This includes infrastructure for writing and reading the function index/summary section. As noted earlier, this will be encoded in a special ELF section within the module, alongside the .llvmbc section containing the bitcode. The thin archive generated by phase-2 of ThinLTO simply contains all of the function index/summary sections across the linked modules, organized for efficient function lookup.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Each function available for importing from the module has an entry in the module’s function index/summary section and in the resulting combined function map. Each function entry contains that function’s offset within the bitcode file, used to locate and import just that function efficiently. The entry also contains summary information (e.g. basic information determined during parsing, such as the number of instructions in the function), which will be used to help guide later import decisions. Because the contents of this section will change frequently during ThinLTO tuning, it should also be marked with a version id for backwards compatibility or version checking.
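A version id makes it possible to reject an index written by an incompatible compiler instead of misreading it. The sketch below serializes a per-module index with a small versioned header; the byte layout, `INDEX_VERSION`, and the function names are hypothetical, chosen only to show where the version check sits.

```python
import struct
from typing import Dict, Tuple

INDEX_VERSION = 1  # hypothetical; bumped whenever the entry layout changes


def write_index(entries: Dict[str, Tuple[int, int]]) -> bytes:
    """Serialize {function: (bitcode_offset, num_insts)} behind a version header."""
    out = [struct.pack("<II", INDEX_VERSION, len(entries))]
    for name in sorted(entries):
        offset, num_insts = entries[name]
        raw = name.encode()
        out.append(struct.pack("<H", len(raw)) + raw
                   + struct.pack("<QI", offset, num_insts))
    return b"".join(out)


def read_index(blob: bytes) -> Dict[str, Tuple[int, int]]:
    """Parse an index section, refusing layouts this reader does not know."""
    version, count = struct.unpack_from("<II", blob, 0)
    if version != INDEX_VERSION:
        raise ValueError(f"unsupported index version {version}")
    pos, result = 8, {}
    for _ in range(count):
        (name_len,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        name = blob[pos:pos + name_len].decode()
        pos += name_len
        offset, num_insts = struct.unpack_from("<QI", blob, pos)
        pos += 12  # size of "<QI"
        result[name] = (offset, num_insts)
    return result


print(read_index(write_index({"foo": (64, 12), "bar": (200, 40)})))
# {'bar': (200, 40), 'foo': (64, 12)}
```

A stricter scheme could accept a range of older versions and upgrade them on read; a single exact match is simply the easiest policy to show.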
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > e. ThinLTO importing support:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Support for the mechanics of importing functions from other modules can go in gradually as a set of patches, since it will be off by default. Separate patches can include:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > - BitcodeReader changes to use the function index to import/deserialize a single function of interest (small changes that leverage the existing lazy streamer support).
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > - Minor LTOModule changes to pass the ThinLTO function to import, and its index, into the bitcode reader.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > - Marking of imported functions (for use in ThinLTO-specific symbol linking and global DCE, for example). This can be in-memory initially, but IR support may be required in order to support streaming bitcode out and back in again after importing.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol linking and static promotion when necessary. For example, the linkage type of imported functions changes to AvailableExternallyLinkage. Statics must be promoted in certain cases, and renamed in consistent ways.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > - GlobalDCE changes to support removing imported functions that were not inlined (very small changes to existing pass logic).
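Static promotion only works if the exporting module and every importing module independently derive the same new name for a promoted static. One way to get that is to derive the suffix from something both sides already know, such as the source module path. In the sketch below, the `.llvm.<hash>` naming scheme, the `Func` record, and promoting every static unconditionally (the plan says only "in certain cases") are all simplifying assumptions.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Func:
    name: str
    linkage: str  # "internal", "external" or "available_externally"


def promoted_name(name: str, module_path: str) -> str:
    """Hypothetical scheme; hashlib (unlike hash()) is stable across processes."""
    digest = hashlib.sha1(module_path.encode()).hexdigest()[:8]
    return f"{name}.llvm.{digest}"


def promote_statics(funcs: List[Func], module_path: str) -> List[Func]:
    """Exporting side: give each internal function a global, module-unique name."""
    return [replace(f, name=promoted_name(f.name, module_path), linkage="external")
            if f.linkage == "internal" else f
            for f in funcs]


def import_copy(f: Func, source_path: str) -> Func:
    """Importing side: the copy exists only for inlining, so it is
    available_externally; a static gets the name its home module promotes it to."""
    name = promoted_name(f.name, source_path) if f.linkage == "internal" else f.name
    return Func(name, "available_externally")


src = [Func("helper", "internal"), Func("api", "external")]
exported = promote_statics(src, "lib/a.c")
imported = import_copy(src[0], "lib/a.c")
assert imported.name == exported[0].name  # both sides agree without coordination
```

Any copy that is still unreferenced after inlining can then be dropped, since an available_externally definition is never emitted in the importing module's object.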
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > f. ThinLTO Import Driver SCC pass:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with a framework for doing ThinLTO via an SCC pass, enabled only under the -fthinlto options. The pass covers using the thin archive (global function index/summary), import decision heuristics, invocation of the LTOModule/ModuleLinker routines that perform the import, and any necessary callgraph updates and verification.
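The import decision heuristics can start very simply. The sketch below is a hypothetical, size-only policy: for each call in the module to a function defined elsewhere, it consults the combined map and imports the callee if its instruction count is under a cutoff. It ignores SCC ordering, call-site context, profile data, and iterative importing of callees' callees, all of which a real pass would need to consider. Grouping the result by source module (so each source bitcode file could be opened once) is also an assumption.

```python
from typing import Dict, List, Set, Tuple


def choose_imports(calls: List[Tuple[str, str]],
                   local_defs: Set[str],
                   combined: Dict[str, Tuple[str, int]],
                   max_insts: int = 100) -> Dict[str, List[str]]:
    """Return {defining module: [callees to import]} for one module.

    calls: (caller, callee) edges in this module.
    combined: callee -> (defining module, instruction count) from the thin archive.
    max_insts: hypothetical size cutoff standing in for real heuristics.
    """
    imports: Dict[str, List[str]] = {}
    for _caller, callee in calls:
        if callee in local_defs or callee not in combined:
            continue  # already defined here, or not available (e.g. libc)
        module, num_insts = combined[callee]
        if num_insts > max_insts:
            continue  # too big to be a profitable inline candidate
        chosen = imports.setdefault(module, [])
        if callee not in chosen:
            chosen.append(callee)
    return imports


combined = {"foo": ("a.bc", 12), "bar": ("a.bc", 40), "big": ("b.bc", 5000)}
calls = [("main", "foo"), ("main", "big"), ("main", "bar"),
         ("main", "printf"), ("main", "foo")]
print(choose_imports(calls, {"main"}, combined))  # {'a.bc': ['foo', 'bar']}
```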
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > g. Backend Driver:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > For a single-node build, the gold plugin can simply write a makefile and fork the parallel backend instances directly via parallel make.
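The makefile approach can be sketched as a small generator that emits one rule per module. The command template assumes the proposed `opt -thinlto` option from the plan; the exact backend command line, the index file name, and the function name are hypothetical placeholders.

```python
from typing import Dict


def write_backend_makefile(ir_to_obj: Dict[str, str], index_path: str,
                           cmd_template: str) -> str:
    """Emit one make rule per IR file so that `make -jN` runs backends in parallel.

    cmd_template may use {ir}, {obj} and {index}. Every backend depends on the
    combined index, so a changed index reruns all of them.
    """
    objs = [ir_to_obj[ir] for ir in sorted(ir_to_obj)]
    lines = [".PHONY: all", "all: " + " ".join(objs), ""]
    for ir in sorted(ir_to_obj):
        obj = ir_to_obj[ir]
        lines.append(f"{obj}: {ir} {index_path}")
        # Recipe lines in a makefile must start with a tab character.
        lines.append("\t" + cmd_template.format(ir=ir, obj=obj, index=index_path))
        lines.append("")
    return "\n".join(lines)


print(write_backend_makefile(
    {"a.bc": "out/a.o", "b.bc": "out/b.o"},
    index_path="thin.index",                    # hypothetical phase-2 output
    cmd_template="opt -thinlto {ir} -o {obj}",  # hypothetical command line
))
```

The plugin could write this text to a file and run something like `make -j8 -f backends.mk`, leaving job scheduling to make.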
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements
>>>> >>>> >>> >> >> > ----------------------------------------------------------------
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > This refers to the patches that are not required for ThinLTO to work, but rather improve compile time, memory usage, run-time performance, and usability.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > a. Lazy Debug Metadata Linking:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > The prototype implementation included lazy importing of module-level metadata during the ThinLTO pass finalization (i.e. after all function importing is complete). This actually applies to all module-level metadata, not just debug metadata, although debug metadata is the largest. This can be added as a separate set of patches, with changes to BitcodeReader, ValueMapper, and ModuleLinker.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > b. Import Tuning:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > Tuning the import strategy will be an iterative process that will continue to be refined over time. It involves several different types of changes: adding support for recording additional metrics in the function summary, such as profile data and optional heavier-weight IPA analyses, and tuning the import heuristics based on the summary and callsite context.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > c. Combined Function Map Pruning:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > The combined function map can be pruned of functions that are unlikely to benefit from being imported. For example, during the phase-2 thin archive plugin step we can safely omit large functions and (with profile data) cold functions, which are unlikely to benefit from being inlined. Additionally, all but one copy of comdat functions can be suppressed.
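The pruning rules above reduce to a filter over the combined map. The sketch assumes each copy of a function is described by its name, defining module, and instruction count, and that the caller supplies the size cutoff and the set of functions that profile data marks cold; all names are hypothetical. Keeping the first comdat copy in module-name order is an arbitrary but deterministic choice.

```python
from typing import Dict, Iterable, Set, Tuple


def prune_combined_map(copies: Iterable[Tuple[str, str, int]],
                       max_insts: int,
                       cold: Set[str]) -> Dict[str, Tuple[str, int]]:
    """copies: (function, defining module, instruction count), one per copy.

    Drops functions above max_insts and functions listed in `cold` (from
    profile data, if any). When a comdat function appears in several modules,
    only the copy from the first module in name order is kept.
    """
    pruned: Dict[str, Tuple[str, int]] = {}
    for func, module, num_insts in sorted(copies):
        if num_insts > max_insts or func in cold:
            continue  # unlikely to be inlined, so not worth importing
        pruned.setdefault(func, (module, num_insts))
    return pruned


copies = [("helper", "b.bc", 8), ("helper", "a.bc", 8), ("huge", "a.bc", 9000),
          ("rarely_called", "c.bc", 20), ("hot", "c.bc", 30)]
print(prune_combined_map(copies, max_insts=500, cold={"rarely_called"}))
# {'helper': ('a.bc', 8), 'hot': ('c.bc', 30)}
```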
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > d. Distributed Build System Integration:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > For a distributed build system, the gold plugin should write the parallel backend invocations into a makefile, including the mapping from the IR file to the real object file path, and exit. Additional work needs to be done in the distributed build system itself to distribute and dispatch the parallel backend jobs to the build cluster.
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > e. Dependence Tracking and Incremental Compiles:
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > In order to support build systems that stage from local disks or network storage, the plugin will optionally support computing, for each module, the set of IR files it may import from. These sets can be computed from profile data, if it exists, or from the symbol table and heuristics if not. These dependence sets also enable support for incremental backend compiles.
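Below is a sketch of the symbol-table fallback for computing dependence sets, plus the incremental-rebuild check those sets enable. It assumes each IR file is summarized by the symbols it defines and references, and it takes the transitive closure because an imported function can itself trigger further imports; that conservative choice is an assumption, not something the plan specifies. Function names are hypothetical.

```python
from typing import Dict, Set


def dependence_sets(defs: Dict[str, Set[str]],
                    refs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each IR file, the other IR files it may import from (transitively).

    defs/refs: symbols each IR file defines/references, e.g. from its symbol table.
    """
    owner = {sym: mod for mod, syms in defs.items() for sym in syms}
    direct = {mod: {owner[s] for s in refs.get(mod, set()) if s in owner} - {mod}
              for mod in defs}
    closure: Dict[str, Set[str]] = {}
    for mod in defs:
        seen: Set[str] = set()
        work = list(direct[mod])
        while work:
            dep = work.pop()
            if dep != mod and dep not in seen:
                seen.add(dep)
                work.extend(direct[dep])  # an imported body may pull in more
        closure[mod] = seen
    return closure


def backends_to_rerun(deps: Dict[str, Set[str]], changed: Set[str]) -> Set[str]:
    """A backend reruns if its own IR file or any file it may import from changed."""
    return {mod for mod, ds in deps.items() if mod in changed or ds & changed}


defs = {"a.bc": {"main"}, "b.bc": {"foo"}, "c.bc": {"bar"}}
refs = {"a.bc": {"foo", "printf"}, "b.bc": {"bar"}}
deps = dependence_sets(defs, refs)
print(sorted(backends_to_rerun(deps, {"c.bc"})))  # ['a.bc', 'b.bc', 'c.bc']
print(sorted(backends_to_rerun(deps, {"a.bc"})))  # ['a.bc']
```

Profile data, where available, would let these sets shrink to the callees that are actually hot, which in turn means fewer spurious rebuilds.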
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > --
>>>> >>>> >>> >> >> > Teresa Johnson | Software Engineer | tejohnson at google.com |
>>>> >>>> >>> >> >> > 408-460-2413
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> >
>>>> >>>> >>> >> >> > _______________________________________________
>>>> >>>> >>> >> >> > LLVM Developers mailing list
>>>> >>>> >>> >> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> >>>> >>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150515/1842c0bc/attachment.html>

Xinliang David Li

2015-May-15 16:47 UTC


[LLVMdev] RFC: ThinLTO Impementation Plan

On Fri, May 15, 2015 at 5:11 AM, Dave Bozier <seifsta at gmail.com> wrote:
> > Are you sure about the additional I/O? With native symtab, existing
> tools just need to read those, while plugin based approach needs to read
> bit code section to feedback symbols to the tool.
>
> The additional I/O will be quite big if you are going to emit the full
> symbol table. Looking at some of our real world links the symbol table and
> string tables of all the inputs seen by the linker add up to about 50 -
> 100mb.
> (resent as the previous message got bounced)

There is no need to emit the full symtab. I checked the overhead with a huge internal C++ source: compared with the bitcode (with debug info), the symtab + string table overhead is about 3%.

More importantly, there is a plan to also use the symtab for ThinLTO indexing purposes, which makes the space usage completely 'unwasted'. That gets into the details, which will follow when the patches are in (with design docs).

thanks,

David
