Eric Christopher via llvm-dev
2016-Mar-23 03:11 UTC
[llvm-dev] [RFC] Lazy-loading of debug info metadata
On Tue, Mar 22, 2016 at 8:04 PM David Blaikie <dblaikie at gmail.com> wrote:

> +pcc, who had some other ideas/patch out for improving memory usage of
> debug info
> +Reid, who's responsible for the Windows/CodeView/PDB debug info which is
> motivating some of the ideas about changes to type emission

So I discussed this with Adrian and Mehdi at the social last Thursday and
I'm getting set to finish the write up. I think it'll have some bearing on
this proposal as I think it'll change how we want to take a look at the
format of DISubprogram metadata a bit more.

That said, most of it is orthogonal to the changes Duncan is talking about
here. Just puts the pressure on to get the other proposal written up.

-eric

> So how does this relate, or not, to Peter's (pcc) work trying to reduce
> the DIE overhead during code gen? Are you folks chasing different memory
> bottlenecks? Are they both relevant (perhaps in different scenarios)?
>
> Baking into the IR more about types as units has pretty direct overlap
> with Reid/CodeView/etc - so, yeah, that'll take some discussion (but, as
> you say, it's not in your immediate plan anyway, so we can come back to
> that - but would be good for whoever gets there first to discuss it with
> the others)
>
> On Tue, Mar 22, 2016 at 7:28 PM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
>
>> I have some ideas to allow the BitcodeReader to lazy-load debug info
>> metadata, and wanted to air this on llvm-dev before getting too deep
>> into the code.
>>
>> Motivation
>> ==========
>>
>> Based on some analysis Mehdi ran (ping him for details), there are three
>> (related) compile-time bottlenecks we're seeing with `-flto=thin -g`:
>>
>>  a) Reading the large number of Metadata bitcode records in the global
>>     metadata block. I'm talking about raw `BitStreamer` calls here.
>>
>>  b) Creating unnecessary `DI*` instances (that aren't relevant to code).
>>
>>  c) Emitting unnecessary `DI*` instances (that aren't relevant to code).
>>
>> Here is my recollection of some peak memory stats on a small testcase
>> during thin-LTO, which should be a decent indicator of (b):
>>
>>   - ~150MB: DILocation
>>   - ~100MB: DISubprogram
>>   - ~70MB: DILocalVariable
>>   - ~50MB: (cumulative) DIType descendants
>>
>> It looks, surprisingly, like types are not the primary bottleneck.
>>
>> There are caveats:
>>
>>   - `DISubprogram` declarations -- member function descriptors -- are
>>     part of the type hierarchy.
>>   - Most of the type hierarchy gets uniqued at parse time.
>>   - As a result, these data are a poor indicator for (a).
>>
>> Even so, non-types are substantial.
>>
>> Related work
>> ============
>>
>> Teresa has some post-processing in-place/in-review to avoid importing
>> metadata unnecessarily, but IIUC: it won't address (a) and (b), only
>> (c) (maybe I'm wrong?); and it only helps -flto=thin, not other
>> lazy-loaders.
>>
>> I heard a rumour that Eric has a grand plan to factor away the type
>> hierarchy -- awesome if true -- but I think most of this is worthwhile
>> regardless.
>>
>> Proposal
>> ========
>>
>> Short version
>> -------------
>>
>>  1. Serialize metadata in Function blocks where possible.
>>  2. Reverse the `DISubprogram`/`DICompileUnit` link.
>>  3. Create a `METADATA_SUBPROGRAM_BLOCK`.
>>
>> Type-related work Eric will make unnecessary if he's fast:
>>
>>  4. Remove `DICompositeType`s from `retainedTypes:`, similar to (2).
>>  5. Create a `METADATA_COMPOSITE_TYPE_BLOCK`, similar to (3).
>>
>> Long version
>> ------------
>>
>>  1. If a piece of metadata is referenced from only a single `Function`,
>>     serialize that metadata in the function's metadata block instead of
>>     the global metadata block.
>>
>>     This addresses problems (a) and (b), primarily targeting
>>     `DILocation`s. It should pick up lots of other stuff, depending on
>>     how much inlining has happened.
>>
>>     (I have a draft of the writer side, still working on the reader.)
>>
>>  2. Reverse the `DISubprogram`/`DICompileUnit` link (David and I have
>>     talked about this in the past in barely-related threads). The
>>     direct effect is that subprograms that are not pointed at by any
>>     code (!dbg attachments or @llvm.dbg.value intrinsics) get dropped.
>>
>>     This addresses problem (c). If a consumer is only linking/loading a
>>     subset of a module's functions, this naturally filters subprograms
>>     to the relevant ones. Also, with limited inlining (and assuming
>>     (1)), it addresses problems (a) and (b), too.
>>
>>     Adrian volunteered to implement this and is apparently almost ready
>>     to post a patch (still working on testcase update script logic I
>>     believe (probably other details, don't let me oversell it)).
>>
>>  3. Create a special `METADATA_SUBPROGRAM_BLOCK` for each `DISubprogram`
>>     in the global metadata block. Store the relevant `DISubprogram` and
>>     all of the subprogram's `DILexicalBlock`s and `DILocalVariable`s.
>>     The block can be lazy-loaded on an all-or-nothing basis.
>>
>>     In combination with (2), this addresses (a) and (b) in cases that
>>     (1) doesn't catch. A lazy-loading module will only load the
>>     subprogram blocks that get referenced.
>>
>>     (I have a basic design for this that accounts for references into
>>     the middle of a block; I'll see what happens when I flesh it out.)
>>
>> I think this will solve the non-type bottlenecks.
>>
>> If Eric hasn't solved types by then, we can do similar things to the IR
>> for the debug info type hierarchy.
>>
>>  4. Implement my proposal to remove the `DICompositeType` name map from
>>     `retainedTypes:`.
>>
>>     http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160125/327936.html
>>
>>     Similar to (2) above, this will naturally filter the types that get
>>     linked in to the ones actually used by the code being linked.
>>
>>     It should also allow the reader to skip records for types that have
>>     already been loaded in the main module.
>>
>>  5. Create a special `METADATA_COMPOSITE_TYPE_BLOCK`, similar to (3) but
>>     for composite types and their members. This avoids the raw bitcode
>>     reading overhead. (This is totally undesigned at this point.)
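Item (1) of the long version can be illustrated with a small standalone model. This is a Python sketch, not LLVM code: `reachable`, `partition_metadata`, and the dict-based graph are all hypothetical names, and it assumes that "referenced from only a single `Function`" means "reachable from exactly one function's attachments and not from any global root". That is one plausible reading of the proposal, not the logic of the draft writer mentioned above.

```python
from collections import defaultdict


def reachable(roots, operands):
    """Return every node id reachable from `roots` by following operands."""
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(operands.get(node, ()))
    return seen


def partition_metadata(function_roots, global_roots, operands):
    """Split metadata nodes into per-function blocks and a global block.

    function_roots: function name -> node ids its instructions reference.
    global_roots:   node ids referenced from module level (assumption:
                    e.g. named metadata), which must stay global.
    operands:       node id -> node ids it points at.
    """
    global_nodes = reachable(global_roots, operands)

    # Record which functions can reach each node.
    users = defaultdict(set)
    for fn, roots in function_roots.items():
        for node in reachable(roots, operands):
            users[node].add(fn)

    per_function = defaultdict(set)
    for node, fns in users.items():
        if node in global_nodes or len(fns) > 1:
            global_nodes.add(node)      # shared: keep in the global block
        else:
            (fn,) = fns                 # exactly one user
            per_function[fn].add(node)  # move into that function's block
    return dict(per_function), global_nodes


# Hypothetical module: two functions sharing a compile unit and a type.
function_roots = {"f": ["loc_f"], "g": ["loc_g"]}
global_roots = ["cu"]
operands = {
    "loc_f": ["sp_f"],
    "loc_g": ["sp_g"],
    "sp_f": ["cu", "int_ty"],
    "sp_g": ["cu", "int_ty"],
    "cu": [],
}
per_function, global_nodes = partition_metadata(
    function_roots, global_roots, operands)
```

In this toy module the locations and subprograms end up in their function's block, while the compile unit and the shared type stay global. A lazy loader that materializes only `f` would then never touch `loc_g` or `sp_g`.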
Duncan P. N. Exon Smith via llvm-dev
2016-Mar-30 01:46 UTC
[llvm-dev] [RFC] Lazy-loading of debug info metadata
> On 2016-Mar-22, at 20:11, Eric Christopher <echristo at gmail.com> wrote:
>
>> On Tue, Mar 22, 2016 at 8:04 PM David Blaikie <dblaikie at gmail.com> wrote:
>> +pcc, who had some other ideas/patch out for improving memory usage of
>> debug info
>> +Reid, who's responsible for the Windows/CodeView/PDB debug info which is
>> motivating some of the ideas about changes to type emission
>
> So I discussed this with Adrian and Mehdi at the social last Thursday and
> I'm getting set to finish the write up. I think it'll have some bearing on
> this proposal as I think it'll change how we want to take a look at the
> format of DISubprogram metadata a bit more.

(The interesting bit here is to make a clearer split between
DISubprogram declarations (part of the type hierarchy) and
DISubprogram definitions (part of the code/line table/variable
locations). I think that'll end up being mostly orthogonal to what
I'm trying to do.)

> That said, most of it is orthogonal to the changes Duncan is talking about
> here. Just puts the pressure on to get the other proposal written up.

Which is now here:
http://lists.llvm.org/pipermail/llvm-dev/2016-March/097773.html

>> Baking into the IR more about types as units has pretty direct overlap
>> with Reid/CodeView/etc - so, yeah, that'll take some discussion (but, as
>> you say, it's not in your immediate plan anyway, so we can come back to
>> that - but would be good for whoever gets there first to discuss it with
>> the others)

After thinking for a few days, I don't think this will bake anything
new into the IR. If anything it removes a few special cases.

More at the bottom.

>>> Motivation
>>> ==========
>>>
>>> Based on some analysis Mehdi ran (ping him for details), there are three
>>> (related) compile-time bottlenecks we're seeing with `-flto=thin -g`:
>>>
>>>  a) Reading the large number of Metadata bitcode records in the global
>>>     metadata block. I'm talking about raw `BitStreamer` calls here.
>>>
>>>  b) Creating unnecessary `DI*` instances (that aren't relevant to code).
>>>
>>>  c) Emitting unnecessary `DI*` instances (that aren't relevant to code).
>>>
>>> Here is my recollection of some peak memory stats on a small testcase
>>> during thin-LTO, which should be a decent indicator of (b):
>>>
>>>   - ~150MB: DILocation
>>>   - ~100MB: DISubprogram
>>>   - ~70MB: DILocalVariable
>>>   - ~50MB: (cumulative) DIType descendants
>>>
>>> It looks, surprisingly, like types are not the primary bottleneck.

(Probably wrong for (a), BTW. Caveats matter.)

>>> There are caveats:
>>>
>>>   - `DISubprogram` declarations -- member function descriptors -- are
>>>     part of the type hierarchy.
>>>   - Most of the type hierarchy gets uniqued at parse time.
>>>   - As a result, these data are a poor indicator for (a).

((a) is the main bottleneck for compile-time of -flto=thin (since it's
quadratic in the number of files). (b) only affects memory. Also
important, but at least it scales linearly.)

>>> Even so, non-types are substantial.
>>>
>>> Proposal
>>> ========
>>>
>>> Short version
>>> -------------
>>>
>>>  4. Remove `DICompositeType`s from `retainedTypes:`, similar to (2).

This is the part that's relevant to the new RFC Eric just posted.

>>> Long version
>>> ------------
>>>
>>>  4. Implement my proposal to remove the `DICompositeType` name map from
>>>     `retainedTypes:`.
>>>
>>>     http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160125/327936.html
>>>
>>>     Similar to (2) above, this will naturally filter the types that get
>>>     linked in to the ones actually used by the code being linked.
>>>
>>>     It should also allow the reader to skip records for types that have
>>>     already been loaded in the main module.

The essential things I want to accomplish with this part:

  - Make `type:` operands less special: instead of referencing types
    indirectly through MDString, point directly at the type node.

  - Stop using `retainedTypes:` by default (only for -gfull, etc.).

  - Avoid blowing up memory in -flto=full (which converting MDString
    references back to pointers would do naively, through
    re-introducing cycles). Note that this needs to be handled
    somehow at BitcodeReader time.

After chatting with Eric, I don't think this conflicts at all with the
other RFC. Unifying the `type:` operands might actually help both.

One good point David mentioned last week was that we don't want to
teach the IR any more about types. Rather than inventing some new
context (as I suggested originally), I figure LTO plugins can just
pass a (StringRef -> DIType*) map to the BitcodeReader, and the Module
itself doesn't need to know anything about it.
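The (StringRef -> DIType*) map idea can be modelled in a few lines. The sketch below is Python, not the BitcodeReader API: `DIType` here is a stand-in dataclass, `load_types` is a hypothetical helper, and the mangled-looking identifiers are illustrative only. It assumes the caller (playing the role of the LTO plugin) owns the map and hands the same one to every load, so the module being read never needs to know about it.

```python
from dataclasses import dataclass


@dataclass
class DIType:
    """Stand-in for a debug info type node; not LLVM's class."""
    identifier: str
    name: str


def load_types(records, type_map):
    """Resolve (identifier, name) records against a caller-owned map.

    If an identifier is already in `type_map`, the existing node is reused
    and the record is effectively skipped; otherwise a new node is created
    and registered. Returns (resolved nodes, number newly created).
    """
    resolved = []
    created = 0
    for identifier, name in records:
        node = type_map.get(identifier)
        if node is None:
            node = DIType(identifier, name)
            type_map[identifier] = node
            created += 1
        resolved.append(node)
    return resolved, created


# The caller owns the map and passes it to each load.
type_map = {}
main_types, main_created = load_types(
    [("_ZTS3Foo", "Foo"), ("_ZTS3Bar", "Bar")], type_map)
other_types, other_created = load_types(
    [("_ZTS3Foo", "Foo"), ("_ZTS3Baz", "Baz")], type_map)
```

The second load creates only `Baz`; its reference to `Foo` resolves to the node built during the first load, which is the "skip records for types that have already been loaded" effect described above. Handling cycles between type nodes is deliberately left out of this sketch.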
Eric Christopher via llvm-dev
2016-Mar-30 02:11 UTC
[llvm-dev] [RFC] Lazy-loading of debug info metadata
I have no objections to any of this FWIW :)

-eric