thr3ads.net - llvm dev - [LLVMdev] RFC: ThinLTO Impementation Plan [May 2015]

Xinliang David Li

2015-May-15 16:47 UTC

[LLVMdev] RFC: ThinLTO Impementation Plan

On Fri, May 15, 2015 at 5:11 AM, Dave Bozier <seifsta at gmail.com> wrote:
>> Are you sure about the additional I/O? With native symtab, existing
>> tools just need to read those, while plugin based approach needs to read
>> bitcode section to feed back symbols to the tool.
>
> The additional I/O will be quite big if you are going to emit the full
> symbol table. Looking at some of our real world links the symbol table and
> string tables of all the inputs seen by the linker add up to about
> 50-100 MB.
> (resent as the previous message got bounced)

There is no need to emit the full symtab. I checked the overhead with a
huge internal C++ source: the overhead of the symtab + string table,
compared with bitcode with debug info, is about 3%.

More importantly, there is a plan to also use the symtab for ThinLTO
indexing purposes, which makes the space usage completely 'unwasted'. That
gets into the details, which will follow when the patches are in (with
design docs).

thanks,

David

Eric Christopher

2015-May-17 21:29 UTC

[LLVMdev] RFC: ThinLTO Impementation Plan

Hi Teresa, David,

I've got a proposed solution to the object/bitcode discussion below, please
take a look - it's a bit light on details because I'm designing an API
without quite enough information, but based on our discussions I think this
is the best way forward.

On Fri, May 15, 2015 at 9:47 AM Xinliang David Li <xinliangli at gmail.com> wrote:
> On Fri, May 15, 2015 at 5:11 AM, Dave Bozier <seifsta at gmail.com> wrote:
>
>>> Are you sure about the additional I/O? With native symtab, existing
>>> tools just need to read those, while plugin based approach needs to read
>>> bitcode section to feed back symbols to the tool.
>>
>> The additional I/O will be quite big if you are going to emit the full
>> symbol table. Looking at some of our real world links the symbol table and
>> string tables of all the inputs seen by the linker add up to about
>> 50-100 MB.
>>
> (resent as the previous message got bounced)
>
> There is no need for emitting the full symtab. I checked the overhead with
> a huge internal C++ source. The overhead of symtab + str table compared
> with byte code with debug is about 3%.
>
(We should take this up later; I'm not sure this is an apples-to-apples
comparison if you're including debug info, but it's orthogonal to this
email.)

> More importantly, there is a plan to also use the symtab for ThinLTO
> indexing purposes, which makes the space usage completely 'unwasted'. That
> gets into the details, which will follow when the patches are in (with
> design docs).
>
Well, let's come up with a design that makes sense before we put patches
in :)

That said, I have what's probably a compromise, but also what I think might
be a good API design for this and avoids the question of container format
completely.

I think what we're going to want is to solve this (as with about half of
all problems in CS) with another layer of indirection. Let's get an API
proposed that'll be a way to get the data in an abstract way outside of
whatever container format we happen to be speaking about, and then vending
that data to the ThinLTO functionality, or to any process that wants
to grab information across all of the inputs. Then, underneath that, we can
use the existing object/module layer to crack open the container and read
the information we want.

This way we'll be able to abstract away any issues and still offer what's
likely going to be a stable enough API, so that we can experiment with the
underlying formats without too much worry. Not a Stable(tm) format; it is
still too early for that, but stable enough.

So we'll want a simple wrapper class with a few functions to encompass what
we want (pointers to functions, symbol information, etc) with ties to the
Object library (for native object files) and IR (for Module based). Make
sense? I think it'd make sense to have a single instance wrap a single
input for ease of iterating over them, but I'm definitely open to other
ideas.

Thoughts?

-eric
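Eric's proposed layer of indirection (one wrapper instance per input, backed by either the Object library for native objects or the IR for bitcode) might look roughly like the following C++ sketch. Every name in it (`ThinLTOInput`, `SymbolInfo`, `RawSymbol`, `NativeObjectInput`, `BitcodeInput`, `definedFunctions`) is hypothetical and is not an LLVM API. To keep it self-contained, the two backends take pre-extracted data instead of actually opening object files or parsing bitcode; the point is only the layering, where cross-input queries are written once against the wrapper and never see the container format.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, format-independent description of one symbol.
struct SymbolInfo {
  std::string Name;
  bool IsDefined;  // defined in this input (vs. only referenced)
  bool IsFunction; // function symbol (vs. data)
};

// Hypothetical wrapper: one instance per linker input. Clients such as a
// ThinLTO indexer see only this interface, never the container format.
class ThinLTOInput {
public:
  virtual ~ThinLTOInput() = default;
  virtual const std::string &path() const = 0;
  virtual std::vector<SymbolInfo> symbols() const = 0;
};

// Stand-in for a native symbol table entry. As in ELF, section index 0
// marks an undefined symbol.
struct RawSymbol {
  std::string Name;
  unsigned SectionIndex;
  bool IsFunc;
};

// Native-object backend. A real one would read the symbol table through
// the Object library; this sketch takes the entries directly.
class NativeObjectInput : public ThinLTOInput {
public:
  NativeObjectInput(std::string Path, std::vector<RawSymbol> Table)
      : Path(std::move(Path)), Table(std::move(Table)) {}
  const std::string &path() const override { return Path; }
  std::vector<SymbolInfo> symbols() const override {
    std::vector<SymbolInfo> Out;
    for (const RawSymbol &S : Table)
      Out.push_back({S.Name, S.SectionIndex != 0, S.IsFunc});
    return Out;
  }

private:
  std::string Path;
  std::vector<RawSymbol> Table;
};

// Bitcode backend. A real one would walk the functions of an IR Module;
// this sketch takes the defined and declared function names directly.
class BitcodeInput : public ThinLTOInput {
public:
  BitcodeInput(std::string Path, std::vector<std::string> Defined,
               std::vector<std::string> Declared)
      : Path(std::move(Path)), Defined(std::move(Defined)),
        Declared(std::move(Declared)) {}
  const std::string &path() const override { return Path; }
  std::vector<SymbolInfo> symbols() const override {
    std::vector<SymbolInfo> Out;
    for (const std::string &N : Defined)
      Out.push_back({N, true, true});
    for (const std::string &N : Declared)
      Out.push_back({N, false, true});
    return Out;
  }

private:
  std::string Path;
  std::vector<std::string> Defined, Declared;
};

// A cross-input query written once against the wrapper: names of all
// functions defined by any input, in input order.
std::vector<std::string>
definedFunctions(const std::vector<std::unique_ptr<ThinLTOInput>> &Inputs) {
  std::vector<std::string> Out;
  for (const auto &In : Inputs)
    for (const SymbolInfo &S : In->symbols())
      if (S.IsDefined && S.IsFunction)
        Out.push_back(S.Name);
  return Out;
}
```

Usage would be to build a `std::vector<std::unique_ptr<ThinLTOInput>>` mixing both backends and pass it to `definedFunctions`; swapping the container format of an input only changes which backend is constructed, not the query. Whether the real interface should expose a materialized symbol vector or an iterator is left open here, as it was in the thread.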

Xinliang David Li

2015-May-18 04:14 UTC

[LLVMdev] RFC: ThinLTO Impementation Plan

Eric, I don't think anybody would disagree with defining APIs to hide the
details of the container format; that can be taken for granted.

Such APIs are mostly used internally in the ThinLTO implementation. Having
'stable' APIs is also not the right goal, at least initially, because it
serves no real purpose. We expect them to evolve and get refined over many
iterations of testing (with very large apps). Someday, when there is a need
to make the interfaces public (to be used by other clients), we can start
to talk about stable APIs.

thanks,

David

