On Fri, May 15, 2015 at 12:02 PM, Duncan P. N. Exon Smith < dexonsmith at apple.com> wrote:> > > On 2015-May-15, at 07:30, Teresa Johnson <tejohnson at google.com> wrote: > > > >>> a. Lazy Debug Metadata Linking: > >>> > >>> The prototype implementation included lazy importing of module-level > >>> metadata during the ThinLTO pass finalization (i.e. after all function > >>> importing is complete). This actually applies to all module-level > >>> metadata, not just debug, although it is the largest. This can be > >>> added as a separate set of patches. Changes to BitcodeReader, > >>> ValueMapper, ModuleLinker > >> > >> It sounds like this would work well with the "full" LTO implemented > >> by tools/gold-plugin right now. What exactly did you do to improve > >> this? > > > > I don't think it will help with full LTO. The parsing of the metadata > > is only delayed until the ThinLTO pass finalization, and the delayed > > metadata import is necessary to avoid reading and linking in the > > metadata multiple times (for each function imported from that module). > > Coming out of the ThinLTO pass you still have all the metadata > > necessary for each function that was imported. For a full LTO that > > would end up being all of the metadata in the module. > > > > The high level summary is that during the initial import it leaves the > > temporary metadata on the instructions that were imported, but saves > > the index used by the bitcode reader used to correlate with the > > metadata when it is ready (i.e. the MDValuePtrs index), and skips the > > metadata parsing. During finalization we parse just the metadata, and > > suture it up during metadata value mapping using the saved index. > > AFAICT, the gold-plugin currently does similar work. (Rafael knows > this code better (I've only introduced bugs there), but IIRC he's on > vacation until next week.) Even in "full" LTO. > > Have a look at `getModuleForFile()` and its calling loop inside > `allSymbolsReadHook()`: > > 1. Load a single module, lazily. > 2. Delete the bodies of unwanted functions (without ever loading them) > and fiddle with linkage as necessary.3. Link in the module.> 4. Delete the module. > > How is what you're proposing different? Does it need to be different? >The difference is that for thinLTO, the lazy import happens in backend compilation instead of running at linker plugin time. It is really lazy and imports bare minimal depends on only references from imported functions. Say module B has 1000 functions. Module A only imports 2 function foo1 and foo2 from Module B transitively. The lazy reading will import necessary module level data from B only to satisfy foo1 and foo2. David> _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150515/eec65dfb/attachment.html>
On Fri, May 15, 2015 at 12:36 PM, Xinliang David Li <xinliangli at gmail.com> wrote:> > > On Fri, May 15, 2015 at 12:02 PM, Duncan P. N. Exon Smith > <dexonsmith at apple.com> wrote: >> >> >> > On 2015-May-15, at 07:30, Teresa Johnson <tejohnson at google.com> wrote: >> > >> >>> a. Lazy Debug Metadata Linking: >> >>> >> >>> The prototype implementation included lazy importing of module-level >> >>> metadata during the ThinLTO pass finalization (i.e. after all function >> >>> importing is complete). This actually applies to all module-level >> >>> metadata, not just debug, although it is the largest. This can be >> >>> added as a separate set of patches. Changes to BitcodeReader, >> >>> ValueMapper, ModuleLinker >> >> >> >> It sounds like this would work well with the "full" LTO implemented >> >> by tools/gold-plugin right now. What exactly did you do to improve >> >> this? >> > >> > I don't think it will help with full LTO. The parsing of the metadata >> > is only delayed until the ThinLTO pass finalization, and the delayed >> > metadata import is necessary to avoid reading and linking in the >> > metadata multiple times (for each function imported from that module). >> > Coming out of the ThinLTO pass you still have all the metadata >> > necessary for each function that was imported. For a full LTO that >> > would end up being all of the metadata in the module. >> > >> > The high level summary is that during the initial import it leaves the >> > temporary metadata on the instructions that were imported, but saves >> > the index used by the bitcode reader used to correlate with the >> > metadata when it is ready (i.e. the MDValuePtrs index), and skips the >> > metadata parsing. During finalization we parse just the metadata, and >> > suture it up during metadata value mapping using the saved index. >> >> AFAICT, the gold-plugin currently does similar work. (Rafael knows >> this code better (I've only introduced bugs there), but IIRC he's on >> vacation until next week.) Even in "full" LTO. >> >> Have a look at `getModuleForFile()` and its calling loop inside >> `allSymbolsReadHook()`: >> >> 1. Load a single module, lazily. >> 2. Delete the bodies of unwanted functions (without ever loading them) >> and fiddle with linkage as necessary. >> >> 3. Link in the module. >> 4. Delete the module. >> >> How is what you're proposing different? Does it need to be different? > > > The difference is that for thinLTO, the lazy import happens in backend > compilation instead of running at linker plugin time. It is really lazy and > imports bare minimal depends on only references from imported functions. > > Say module B has 1000 functions. Module A only imports 2 function foo1 and > foo2 from Module B transitively. The lazy reading will import necessary > module level data from B only to satisfy foo1 and foo2.Right, so there are a couple of differences. As David mentions, we do this later and also in an iterative fashion. I.e. it may import function foo1 from module B, then in function foo1 see hot calls to functions bar1 in module C and bar2 in module D, importing each of those in turn, then later decide to import function foo2 from module B. So the BitcodeReader is created and opens a module multiple times. Each time we import we create a BitcodeReader, read in what we need, create a Module (which contains exactly 1 function), and invoke linkInModule, delete the module. This is in contrast to gold/LTO and the lazy bitcode reader which does instantiate the BitcodeReader once per module, and so it can defer parsing the metadata until the first function is materialized (at which point it parses and materializes all of the metadata). We don't want to parse/materialize the metadata until we have completed all imports from the module. The alternative if we wanted to leverage the existing lazy metadata parsing support would be to keep the BitcodeReader instantiated for each module and not close it until ThinLTO importing finalization. So we would have a number of BitcodeReaders active at once (one per module we import from). I need to think more about whether this is feasible and investigate if this adds a lot of overhead. The other thing that David alluded to above, that I should call out more explicitly in my RFC, is that we only link in the metadata transitively reached by the imported functions. This is handled during metadata value mapping. I think I will update the RFC based on some of the discussion so far and will send out an updated version early next week. Thanks, Teresa> > David > >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-- Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
> On 2015-May-15, at 13:06, Teresa Johnson <tejohnson at google.com> wrote: > > On Fri, May 15, 2015 at 12:36 PM, Xinliang David Li > <xinliangli at gmail.com> wrote: >> >> >> On Fri, May 15, 2015 at 12:02 PM, Duncan P. N. Exon Smith >> <dexonsmith at apple.com> wrote: >>> >>> >>>> On 2015-May-15, at 07:30, Teresa Johnson <tejohnson at google.com> wrote: >>>> >>>>>> a. Lazy Debug Metadata Linking: >>>>>> >>>>>> The prototype implementation included lazy importing of module-level >>>>>> metadata during the ThinLTO pass finalization (i.e. after all function >>>>>> importing is complete). This actually applies to all module-level >>>>>> metadata, not just debug, although it is the largest. This can be >>>>>> added as a separate set of patches. Changes to BitcodeReader, >>>>>> ValueMapper, ModuleLinker >>>>> >>>>> It sounds like this would work well with the "full" LTO implemented >>>>> by tools/gold-plugin right now. What exactly did you do to improve >>>>> this? >>>> >>>> I don't think it will help with full LTO. The parsing of the metadata >>>> is only delayed until the ThinLTO pass finalization, and the delayed >>>> metadata import is necessary to avoid reading and linking in the >>>> metadata multiple times (for each function imported from that module). >>>> Coming out of the ThinLTO pass you still have all the metadata >>>> necessary for each function that was imported. For a full LTO that >>>> would end up being all of the metadata in the module. >>>> >>>> The high level summary is that during the initial import it leaves the >>>> temporary metadata on the instructions that were imported, but saves >>>> the index used by the bitcode reader used to correlate with the >>>> metadata when it is ready (i.e. the MDValuePtrs index), and skips the >>>> metadata parsing. During finalization we parse just the metadata, and >>>> suture it up during metadata value mapping using the saved index. >>> >>> AFAICT, the gold-plugin currently does similar work. (Rafael knows >>> this code better (I've only introduced bugs there), but IIRC he's on >>> vacation until next week.) Even in "full" LTO. >>> >>> Have a look at `getModuleForFile()` and its calling loop inside >>> `allSymbolsReadHook()`: >>> >>> 1. Load a single module, lazily. >>> 2. Delete the bodies of unwanted functions (without ever loading them) >>> and fiddle with linkage as necessary. >>> >>> 3. Link in the module. >>> 4. Delete the module. >>> >>> How is what you're proposing different? Does it need to be different? >> >> >> The difference is that for thinLTO, the lazy import happens in backend >> compilation instead of running at linker plugin time. It is really lazy and >> imports bare minimal depends on only references from imported functions. >> >> Say module B has 1000 functions. Module A only imports 2 function foo1 and >> foo2 from Module B transitively. The lazy reading will import necessary >> module level data from B only to satisfy foo1 and foo2. > > Right, so there are a couple of differences. As David mentions, we do > this later and also in an iterative fashion. I.e. it may import > function foo1 from module B, then in function foo1 see hot calls to > functions bar1 in module C and bar2 in module D,How do you know they're hot? Are you selectively reading the profile metadata?> importing each of > those in turn, then later decide to import function foo2 from module > B. So the BitcodeReader is created and opens a module multiple times. > Each time we import we create a BitcodeReader, read in what we need, > create a Module (which contains exactly 1 function), and invoke > linkInModule, delete the module. > > This is in contrast to gold/LTO and the lazy bitcode reader which does > instantiate the BitcodeReader once per module, and so it can defer > parsing the metadata until the first function is materialized (at > which point it parses and materializes all of the metadata). We don't > want to parse/materialize the metadata until we have completed all > imports from the module. > > The alternative if we wanted to leverage the existing lazy metadata > parsing support would be to keep the BitcodeReader instantiated for > each module and not close it until ThinLTO importing finalization. So > we would have a number of BitcodeReaders active at once (one per > module we import from). I need to think more about whether this is > feasible and investigate if this adds a lot of overhead. > > The other thing that David alluded to above, that I should call out > more explicitly in my RFC, is that we only link in the metadata > transitively reached by the imported functions. This is handled during > metadata value mapping. > > I think I will update the RFC based on some of the discussion so far > and will send out an updated version early next week. > > Thanks, > Teresa > >> >> David >> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > > > > -- > Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
On Fri, May 15, 2015 at 1:06 PM, Teresa Johnson <tejohnson at google.com> wrote:> On Fri, May 15, 2015 at 12:36 PM, Xinliang David Li > <xinliangli at gmail.com> wrote: >> >> >> On Fri, May 15, 2015 at 12:02 PM, Duncan P. N. Exon Smith >> <dexonsmith at apple.com> wrote: >>> >>> >>> > On 2015-May-15, at 07:30, Teresa Johnson <tejohnson at google.com> wrote: >>> > >>> >>> a. Lazy Debug Metadata Linking: >>> >>> >>> >>> The prototype implementation included lazy importing of module-level >>> >>> metadata during the ThinLTO pass finalization (i.e. after all function >>> >>> importing is complete). This actually applies to all module-level >>> >>> metadata, not just debug, although it is the largest. This can be >>> >>> added as a separate set of patches. Changes to BitcodeReader, >>> >>> ValueMapper, ModuleLinker >>> >> >>> >> It sounds like this would work well with the "full" LTO implemented >>> >> by tools/gold-plugin right now. What exactly did you do to improve >>> >> this? >>> > >>> > I don't think it will help with full LTO. The parsing of the metadata >>> > is only delayed until the ThinLTO pass finalization, and the delayed >>> > metadata import is necessary to avoid reading and linking in the >>> > metadata multiple times (for each function imported from that module). >>> > Coming out of the ThinLTO pass you still have all the metadata >>> > necessary for each function that was imported. For a full LTO that >>> > would end up being all of the metadata in the module. >>> > >>> > The high level summary is that during the initial import it leaves the >>> > temporary metadata on the instructions that were imported, but saves >>> > the index used by the bitcode reader used to correlate with the >>> > metadata when it is ready (i.e. the MDValuePtrs index), and skips the >>> > metadata parsing. During finalization we parse just the metadata, and >>> > suture it up during metadata value mapping using the saved index. >>> >>> AFAICT, the gold-plugin currently does similar work. (Rafael knows >>> this code better (I've only introduced bugs there), but IIRC he's on >>> vacation until next week.) Even in "full" LTO. >>> >>> Have a look at `getModuleForFile()` and its calling loop inside >>> `allSymbolsReadHook()`: >>> >>> 1. Load a single module, lazily. >>> 2. Delete the bodies of unwanted functions (without ever loading them) >>> and fiddle with linkage as necessary. >>> >>> 3. Link in the module. >>> 4. Delete the module. >>> >>> How is what you're proposing different? Does it need to be different? >> >> >> The difference is that for thinLTO, the lazy import happens in backend >> compilation instead of running at linker plugin time. It is really lazy and >> imports bare minimal depends on only references from imported functions. >> >> Say module B has 1000 functions. Module A only imports 2 function foo1 and >> foo2 from Module B transitively. The lazy reading will import necessary >> module level data from B only to satisfy foo1 and foo2. > > Right, so there are a couple of differences. As David mentions, we do > this later and also in an iterative fashion. I.e. it may import > function foo1 from module B, then in function foo1 see hot calls to > functions bar1 in module C and bar2 in module D, importing each of > those in turn, then later decide to import function foo2 from module > B. So the BitcodeReader is created and opens a module multiple times. > Each time we import we create a BitcodeReader, read in what we need, > create a Module (which contains exactly 1 function), and invoke > linkInModule, delete the module. > > This is in contrast to gold/LTO and the lazy bitcode reader which does > instantiate the BitcodeReader once per module, and so it can defer > parsing the metadata until the first function is materialized (at > which point it parses and materializes all of the metadata). We don't > want to parse/materialize the metadata until we have completed all > imports from the module. > > The alternative if we wanted to leverage the existing lazy metadata > parsing support would be to keep the BitcodeReader instantiated for > each module and not close it until ThinLTO importing finalization. So > we would have a number of BitcodeReaders active at once (one per > module we import from). I need to think more about whether this is > feasible and investigate if this adds a lot of overhead.I like the idea of reusing the lazy metadata support, but after thinking some more there are some additional issues that need to be thought about. We also need to keep around the source Module created during bitcode reading for each imported module. And in the prototype as I import functions during the ThinLTO SCC pass I update the callgraph with the imported functions for later processing. To do that I believe I need to link them into the combined module, which I currently do via linkInModule (where only one function has a body to link, and I skip metadata since that is handled later). To do this with a single bitcode read over each module, where I am keeping a BitcodeReader and source Module instantiated for each module being imported from, means that the linking from different source modules will be interleaved. I'm not sure how much restructuring of the linking will need to be done to get this to work. Teresa> > The other thing that David alluded to above, that I should call out > more explicitly in my RFC, is that we only link in the metadata > transitively reached by the imported functions. This is handled during > metadata value mapping. > > I think I will update the RFC based on some of the discussion so far > and will send out an updated version early next week. > > Thanks, > Teresa > >> >> David >> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > > > > -- > Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413-- Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413