thr3ads.net - llvm dev - [LLVMdev] RFC: ThinLTO Impementation Plan [May 2015]

If this information is useful, please help other people find it:
Share via:

Xinliang David Li

2015-May-15 19:36 UTC

[LLVMdev] RFC: ThinLTO Impementation Plan

On Fri, May 15, 2015 at 12:02 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:
>
> > On 2015-May-15, at 07:30, Teresa Johnson <tejohnson at
google.com> wrote:
> >
> >>> a. Lazy Debug Metadata Linking:
> >>>
> >>> The prototype implementation included lazy importing of
module-level
> >>> metadata during the ThinLTO pass finalization (i.e. after all
function
> >>> importing is complete). This actually applies to all
module-level
> >>> metadata, not just debug, although it is the largest. This can
be
> >>> added as a separate set of patches. Changes to BitcodeReader,
> >>> ValueMapper, ModuleLinker
> >>
> >> It sounds like this would work well with the "full" LTO
implemented
> >> by tools/gold-plugin right now.  What exactly did you do to
improve
> >> this?
> >
> > I don't think it will help with full LTO. The parsing of the
metadata
> > is only delayed until the ThinLTO pass finalization, and the delayed
> > metadata import is necessary to avoid reading and linking in the
> > metadata multiple times (for each function imported from that module).
> > Coming out of the ThinLTO pass you still have all the metadata
> > necessary for each function that was imported. For a full LTO that
> > would end up being all of the metadata in the module.
> >
> > The high level summary is that during the initial import it leaves the
> > temporary metadata on the instructions that were imported, but saves
> > the index used by the bitcode reader used to correlate with the
> > metadata when it is ready (i.e. the MDValuePtrs index), and skips the
> > metadata parsing. During finalization we parse just the metadata, and
> > suture it up during metadata value mapping using the saved index.
>
> AFAICT, the gold-plugin currently does similar work.  (Rafael knows
> this code better (I've only introduced bugs there), but IIRC he's
on
> vacation until next week.)  Even in "full" LTO.
>
> Have a look at `getModuleForFile()` and its calling loop inside
> `allSymbolsReadHook()`:
>
>  1. Load a single module, lazily.
>  2. Delete the bodies of unwanted functions (without ever loading them)
>     and fiddle with linkage as necessary.
 3. Link in the module.>  4. Delete the module.
>
> How is what you're proposing different?  Does it need to be different?
>
The difference is that for thinLTO, the lazy import happens in backend
compilation instead of running at linker plugin time. It is really lazy and
imports bare minimal depends on only references from imported functions.

Say module B has 1000 functions. Module A only imports 2 function foo1 and
foo2 from Module B transitively. The lazy reading will import necessary
module level data from B only to satisfy foo1 and foo2.

David

> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150515/eec65dfb/attachment.html>

Teresa Johnson

2015-May-15 20:06 UTC

head link

[LLVMdev] RFC: ThinLTO Impementation Plan

On Fri, May 15, 2015 at 12:36 PM, Xinliang David Li
<xinliangli at gmail.com> wrote:>
>
> On Fri, May 15, 2015 at 12:02 PM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>>
>>
>> > On 2015-May-15, at 07:30, Teresa Johnson <tejohnson at
google.com> wrote:
>> >
>> >>> a. Lazy Debug Metadata Linking:
>> >>>
>> >>> The prototype implementation included lazy importing of
module-level
>> >>> metadata during the ThinLTO pass finalization (i.e. after
all function
>> >>> importing is complete). This actually applies to all
module-level
>> >>> metadata, not just debug, although it is the largest. This
can be
>> >>> added as a separate set of patches. Changes to
BitcodeReader,
>> >>> ValueMapper, ModuleLinker
>> >>
>> >> It sounds like this would work well with the "full"
LTO implemented
>> >> by tools/gold-plugin right now.  What exactly did you do to
improve
>> >> this?
>> >
>> > I don't think it will help with full LTO. The parsing of the
metadata
>> > is only delayed until the ThinLTO pass finalization, and the
delayed
>> > metadata import is necessary to avoid reading and linking in the
>> > metadata multiple times (for each function imported from that
module).
>> > Coming out of the ThinLTO pass you still have all the metadata
>> > necessary for each function that was imported. For a full LTO that
>> > would end up being all of the metadata in the module.
>> >
>> > The high level summary is that during the initial import it leaves
the
>> > temporary metadata on the instructions that were imported, but
saves
>> > the index used by the bitcode reader used to correlate with the
>> > metadata when it is ready (i.e. the MDValuePtrs index), and skips
the
>> > metadata parsing. During finalization we parse just the metadata,
and
>> > suture it up during metadata value mapping using the saved index.
>>
>> AFAICT, the gold-plugin currently does similar work.  (Rafael knows
>> this code better (I've only introduced bugs there), but IIRC
he's on
>> vacation until next week.)  Even in "full" LTO.
>>
>> Have a look at `getModuleForFile()` and its calling loop inside
>> `allSymbolsReadHook()`:
>>
>>  1. Load a single module, lazily.
>>  2. Delete the bodies of unwanted functions (without ever loading them)
>>     and fiddle with linkage as necessary.
>>
>>  3. Link in the module.
>>  4. Delete the module.
>>
>> How is what you're proposing different?  Does it need to be
different?
>
>
> The difference is that for thinLTO, the lazy import happens in backend
> compilation instead of running at linker plugin time. It is really lazy and
> imports bare minimal depends on only references from imported functions.
>
> Say module B has 1000 functions. Module A only imports 2 function foo1 and
> foo2 from Module B transitively. The lazy reading will import necessary
> module level data from B only to satisfy foo1 and foo2.
Right, so there are a couple of differences. As David mentions, we do
this later and also in an iterative fashion. I.e. it may import
function foo1 from module B, then in function foo1 see hot calls to
functions bar1 in module C and bar2 in module D, importing each of
those in turn, then later decide to import function foo2 from module
B. So the BitcodeReader is created and opens a module multiple times.
Each time we import we create a BitcodeReader, read in what we need,
create a Module (which contains exactly 1 function), and invoke
linkInModule, delete the module.

This is in contrast to gold/LTO and the lazy bitcode reader which does
instantiate the BitcodeReader once per module, and so it can defer
parsing the metadata until the first function is materialized (at
which point it parses and materializes all of the metadata). We don't
want to parse/materialize the metadata until we have completed all
imports from the module.

The alternative if we wanted to leverage the existing lazy metadata
parsing support would be to keep the BitcodeReader instantiated for
each module and not close it until ThinLTO importing finalization. So
we would have a number of BitcodeReaders active at once (one per
module we import from). I need to think more about whether this is
feasible and investigate if this adds a lot of overhead.

The other thing that David alluded to above, that I should call out
more explicitly in my RFC, is that we only link in the metadata
transitively reached by the imported functions. This is handled during
metadata value mapping.

I think I will update the RFC based on some of the discussion so far
and will send out an updated version early next week.

Thanks,
Teresa
>
> David
>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>


-- 
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413

Duncan P. N. Exon Smith

2015-May-15 21:45 UTC

head link

[LLVMdev] RFC: ThinLTO Impementation Plan

> On 2015-May-15, at 13:06, Teresa Johnson <tejohnson at google.com>
wrote:
> 
> On Fri, May 15, 2015 at 12:36 PM, Xinliang David Li
> <xinliangli at gmail.com> wrote:
>> 
>> 
>> On Fri, May 15, 2015 at 12:02 PM, Duncan P. N. Exon Smith
>> <dexonsmith at apple.com> wrote:
>>> 
>>> 
>>>> On 2015-May-15, at 07:30, Teresa Johnson <tejohnson at
google.com> wrote:
>>>> 
>>>>>> a. Lazy Debug Metadata Linking:
>>>>>> 
>>>>>> The prototype implementation included lazy importing of
module-level
>>>>>> metadata during the ThinLTO pass finalization (i.e.
after all function
>>>>>> importing is complete). This actually applies to all
module-level
>>>>>> metadata, not just debug, although it is the largest.
This can be
>>>>>> added as a separate set of patches. Changes to
BitcodeReader,
>>>>>> ValueMapper, ModuleLinker
>>>>> 
>>>>> It sounds like this would work well with the
"full" LTO implemented
>>>>> by tools/gold-plugin right now.  What exactly did you do to
improve
>>>>> this?
>>>> 
>>>> I don't think it will help with full LTO. The parsing of
the metadata
>>>> is only delayed until the ThinLTO pass finalization, and the
delayed
>>>> metadata import is necessary to avoid reading and linking in
the
>>>> metadata multiple times (for each function imported from that
module).
>>>> Coming out of the ThinLTO pass you still have all the metadata
>>>> necessary for each function that was imported. For a full LTO
that
>>>> would end up being all of the metadata in the module.
>>>> 
>>>> The high level summary is that during the initial import it
leaves the
>>>> temporary metadata on the instructions that were imported, but
saves
>>>> the index used by the bitcode reader used to correlate with the
>>>> metadata when it is ready (i.e. the MDValuePtrs index), and
skips the
>>>> metadata parsing. During finalization we parse just the
metadata, and
>>>> suture it up during metadata value mapping using the saved
index.
>>> 
>>> AFAICT, the gold-plugin currently does similar work.  (Rafael knows
>>> this code better (I've only introduced bugs there), but IIRC
he's on
>>> vacation until next week.)  Even in "full" LTO.
>>> 
>>> Have a look at `getModuleForFile()` and its calling loop inside
>>> `allSymbolsReadHook()`:
>>> 
>>> 1. Load a single module, lazily.
>>> 2. Delete the bodies of unwanted functions (without ever loading
them)
>>>    and fiddle with linkage as necessary.
>>> 
>>> 3. Link in the module.
>>> 4. Delete the module.
>>> 
>>> How is what you're proposing different?  Does it need to be
different?
>> 
>> 
>> The difference is that for thinLTO, the lazy import happens in backend
>> compilation instead of running at linker plugin time. It is really lazy
and
>> imports bare minimal depends on only references from imported
functions.
>> 
>> Say module B has 1000 functions. Module A only imports 2 function foo1
and
>> foo2 from Module B transitively. The lazy reading will import necessary
>> module level data from B only to satisfy foo1 and foo2.
> 
> Right, so there are a couple of differences. As David mentions, we do
> this later and also in an iterative fashion. I.e. it may import
> function foo1 from module B, then in function foo1 see hot calls to
> functions bar1 in module C and bar2 in module D,
How do you know they're hot?  Are you selectively reading the profile
metadata?

> importing each of
> those in turn, then later decide to import function foo2 from module
> B. So the BitcodeReader is created and opens a module multiple times.
> Each time we import we create a BitcodeReader, read in what we need,
> create a Module (which contains exactly 1 function), and invoke
> linkInModule, delete the module.
> 
> This is in contrast to gold/LTO and the lazy bitcode reader which does
> instantiate the BitcodeReader once per module, and so it can defer
> parsing the metadata until the first function is materialized (at
> which point it parses and materializes all of the metadata). We don't
> want to parse/materialize the metadata until we have completed all
> imports from the module.
> 
> The alternative if we wanted to leverage the existing lazy metadata
> parsing support would be to keep the BitcodeReader instantiated for
> each module and not close it until ThinLTO importing finalization. So
> we would have a number of BitcodeReaders active at once (one per
> module we import from). I need to think more about whether this is
> feasible and investigate if this adds a lot of overhead.
> 
> The other thing that David alluded to above, that I should call out
> more explicitly in my RFC, is that we only link in the metadata
> transitively reached by the imported functions. This is handled during
> metadata value mapping.
> 
> I think I will update the RFC based on some of the discussion so far
> and will send out an updated version early next week.
> 
> Thanks,
> Teresa
> 
>> 
>> David
>> 
>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> 
>> 
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413

Teresa Johnson

2015-May-16 06:07 UTC

head link

[LLVMdev] RFC: ThinLTO Impementation Plan

On Fri, May 15, 2015 at 1:06 PM, Teresa Johnson <tejohnson at google.com>
wrote:> On Fri, May 15, 2015 at 12:36 PM, Xinliang David Li
> <xinliangli at gmail.com> wrote:
>>
>>
>> On Fri, May 15, 2015 at 12:02 PM, Duncan P. N. Exon Smith
>> <dexonsmith at apple.com> wrote:
>>>
>>>
>>> > On 2015-May-15, at 07:30, Teresa Johnson <tejohnson at
google.com> wrote:
>>> >
>>> >>> a. Lazy Debug Metadata Linking:
>>> >>>
>>> >>> The prototype implementation included lazy importing
of module-level
>>> >>> metadata during the ThinLTO pass finalization (i.e.
after all function
>>> >>> importing is complete). This actually applies to all
module-level
>>> >>> metadata, not just debug, although it is the largest.
This can be
>>> >>> added as a separate set of patches. Changes to
BitcodeReader,
>>> >>> ValueMapper, ModuleLinker
>>> >>
>>> >> It sounds like this would work well with the
"full" LTO implemented
>>> >> by tools/gold-plugin right now.  What exactly did you do
to improve
>>> >> this?
>>> >
>>> > I don't think it will help with full LTO. The parsing of
the metadata
>>> > is only delayed until the ThinLTO pass finalization, and the
delayed
>>> > metadata import is necessary to avoid reading and linking in
the
>>> > metadata multiple times (for each function imported from that
module).
>>> > Coming out of the ThinLTO pass you still have all the metadata
>>> > necessary for each function that was imported. For a full LTO
that
>>> > would end up being all of the metadata in the module.
>>> >
>>> > The high level summary is that during the initial import it
leaves the
>>> > temporary metadata on the instructions that were imported, but
saves
>>> > the index used by the bitcode reader used to correlate with
the
>>> > metadata when it is ready (i.e. the MDValuePtrs index), and
skips the
>>> > metadata parsing. During finalization we parse just the
metadata, and
>>> > suture it up during metadata value mapping using the saved
index.
>>>
>>> AFAICT, the gold-plugin currently does similar work.  (Rafael knows
>>> this code better (I've only introduced bugs there), but IIRC
he's on
>>> vacation until next week.)  Even in "full" LTO.
>>>
>>> Have a look at `getModuleForFile()` and its calling loop inside
>>> `allSymbolsReadHook()`:
>>>
>>>  1. Load a single module, lazily.
>>>  2. Delete the bodies of unwanted functions (without ever loading
them)
>>>     and fiddle with linkage as necessary.
>>>
>>>  3. Link in the module.
>>>  4. Delete the module.
>>>
>>> How is what you're proposing different?  Does it need to be
different?
>>
>>
>> The difference is that for thinLTO, the lazy import happens in backend
>> compilation instead of running at linker plugin time. It is really lazy
and
>> imports bare minimal depends on only references from imported
functions.
>>
>> Say module B has 1000 functions. Module A only imports 2 function foo1
and
>> foo2 from Module B transitively. The lazy reading will import necessary
>> module level data from B only to satisfy foo1 and foo2.
>
> Right, so there are a couple of differences. As David mentions, we do
> this later and also in an iterative fashion. I.e. it may import
> function foo1 from module B, then in function foo1 see hot calls to
> functions bar1 in module C and bar2 in module D, importing each of
> those in turn, then later decide to import function foo2 from module
> B. So the BitcodeReader is created and opens a module multiple times.
> Each time we import we create a BitcodeReader, read in what we need,
> create a Module (which contains exactly 1 function), and invoke
> linkInModule, delete the module.
>
> This is in contrast to gold/LTO and the lazy bitcode reader which does
> instantiate the BitcodeReader once per module, and so it can defer
> parsing the metadata until the first function is materialized (at
> which point it parses and materializes all of the metadata). We don't
> want to parse/materialize the metadata until we have completed all
> imports from the module.
>
> The alternative if we wanted to leverage the existing lazy metadata
> parsing support would be to keep the BitcodeReader instantiated for
> each module and not close it until ThinLTO importing finalization. So
> we would have a number of BitcodeReaders active at once (one per
> module we import from). I need to think more about whether this is
> feasible and investigate if this adds a lot of overhead.
I like the idea of reusing the lazy metadata support, but after
thinking some more there are some additional issues that need to be
thought about.

We also need to keep around the source Module created during bitcode
reading for each imported module. And in the prototype as I import
functions during the ThinLTO SCC pass I update the callgraph with the
imported functions for later processing. To do that I believe I need
to link them into the combined module, which I currently do via
linkInModule (where only one function has a body to link, and I skip
metadata since that is handled later). To do this with a single
bitcode read over each module, where I am keeping a BitcodeReader and
source Module instantiated for each module being imported from, means
that the linking from different source modules will be interleaved.
I'm not sure how much restructuring of the linking will need to be
done to get this to work.

Teresa
>
> The other thing that David alluded to above, that I should call out
> more explicitly in my RFC, is that we only link in the metadata
> transitively reached by the imported functions. This is handled during
> metadata value mapping.
>
> I think I will update the RFC based on some of the discussion so far
> and will send out an updated version early next week.
>
> Thanks,
> Teresa
>
>>
>> David
>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413


-- 
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413

llvm dev - May 2015 - [LLVMdev] RFC: ThinLTO Impementation Plan

[LLVMdev] RFC: ThinLTO Impementation Plan

[LLVMdev] RFC: ThinLTO Impementation Plan

[LLVMdev] RFC: ThinLTO Impementation Plan

[LLVMdev] RFC: ThinLTO Impementation Plan