David Blaikie via llvm-dev
2016-Apr-16 00:12 UTC
[llvm-dev] [RFC] Lazy-loading of debug info metadata
On Fri, Apr 15, 2016 at 4:04 PM, Duncan P. N. Exon Smith < dexonsmith at apple.com> wrote:> > > On 2016-Apr-15, at 14:53, David Blaikie <dblaikie at gmail.com> wrote: > > > > > > > >> On Fri, Apr 15, 2016 at 2:27 PM, Duncan P. N. Exon Smith < > dexonsmith at apple.com> wrote: > >> > >> > On 2016-Apr-15, at 10:27, David Blaikie <dblaikie at gmail.com> wrote: > >> > > >> > > >> > > >> > On Fri, Apr 15, 2016 at 9:50 AM, Duncan P. N. Exon Smith < > dexonsmith at apple.com> wrote: > >> > Since I haven't heard any objections to the direction, I'm planning to > >> > commit this (step 4) when I find some cycles; likely over the weekend. > >> > > >> > To make this more concrete (in case someone is silently concerned) > I've > >> > posted my WIP patches below. They apply cleanly to r266414. There > are > >> > a few mechanical changes missing that are tracked in the commit > >> > messages (such as LangRef, or new tests in some cases). To get > >> > `ninja check` to pass after the final commit you'll need to run the > >> > attached upgrade script. The clang changes are truly uninteresting so > >> > I've left them out. > >> > > >> > - 0001/2: Prep commits I need to flush out (sorry for the noise). > >> > - 0003: Add an explicit type map for ODR type uniquing. > >> > - 0004: Prep. > >> > - 0005: Add ODR type uniquing of members. > >> > > >> > What's this ^ for? > >> > > >> > In theory, the list of /members/ of a type shouldn't vary between > instances. The special cases (implicit special members, member function > template specializations, and nested types) don't appear in the member list > - instead they just list the type as their scope (so it's one directional, > instead of bidirectional) > >> > > >> > Is this to handle ODR violations? We currently don't handle them, > right - we just pick one instance of the type & the user gets to keep the > pieces. Is there a particular motivation for changing that? Some place > where it's more problematic, we have to support it, etc? > >> > >> This is a strange one; I probably should have called it out actually. > >> It's a bugfix vs. ToT; and without it, 0006 regresses DWARF output. > >> > >> File and line can change without violating ODR, which means that we > >> end up with duplicate members. I think I may have written something > >> up in the WIP commit message for 0005, but also have a look in tree at > >> test/Linker/type-unique-odr-a.ll. > >> > >> It's legal, e.g., to have the following in two separate files, as long > >> as the sequence of tokens is identical: > >> -- > >> struct S { void foo(); }; > >> -- > >> However, the DISubprogram(isDefinition: false) for S::foo will not be > >> structurally equivalent (since the 'file:' node won't match), so we > >> currently end up with two metadata nodes. The same thing happens for > >> the DICompositeType for `S` (and also applies to any fields). > >> > >> The checked-in testcase confirms that the description of `S` only has > >> one copy of `S::foo` in the DWARF output. (The testcase doesn't provide > >> good enough coverage actually; it gets lucky because the definition of > >> S::foo is in the second file being linked (type-unique-odr-b.ll). If > >> you reverse the order of the files, then `S` will have two copies of > >> `S::foo`.) > >> > >> 0005 solves the problem up front (and in both directions; I'll add a RUN > >> line to test/Linker/type-unique-odr-a.ll). > >> > >> I came across this because the testcase started failing with 0006. The > >> current (but not-quite-sufficient) logic relies on the resolve() logic > >> in DwarfDebug.cpp, which indirects through the merged 'retainedTypes:' > >> map (the asymmetry in building the merged map explains why the in-tree > >> logic only works for one linking direction). > > > > Not sure I'm quite following. > > > > I think I understand the circumstance you're describing (indeed, the > type itself can be defined on different lines/files without violating the > ODR, or its members, or, etc... ). But if the subprogram declaration is in > a different file in one definition - that'd make the composite type itself > different (metadata uniquing - composite type references the subprogram, > etc). > > > > But we should only every find one version of the type (all references to > the type should be via string-based dityperefs). Sure, we'll pick one at > random, but when/how would we get two? > > We get two because one of the modules contributes a definition of S::foo, > and it may be pointing at the declaration from a version of S that isn't > blessed. If you reverse the RUN line in the in-tree testcase you can see > what happens, but here's a quick hand-written version: > -- > !llvm.dbg.cus = !{!0, !4} > > !0 = distinct !DICompileUnit(file: !1, retainedTypes: !{!2}) > !1 = !DIFile(filename: "a.cpp") > !2 = !DICompositeType(file: !1, name: "S", elements: !{!3}, > identifier: "_ZTS1S") > !3 = !DISubprogram(isDefinition: false, file: !1, name: "foo", > scope: !2, linkageName: "_ZN1S3fooEv") > > !4 = distinct !DICompileUnit(file: !5, retainedTypes: !{!6}) > !5 = !DIFile(filename: "b.cpp") > !6 = !DICompositeType(file: !5, name: "S", elements: !{!7}, > identifier: "_ZTS1S") > !7 = !DISubprogram(isDefinition: false, file: !5, name: "foo", > scope: !6, linkageName: "_ZN1S3fooEv") > > define void @foo() !dbg !8 { > ret void > } > > !8 = !DISubprogram(isDefinition: true, file: !5, name: "foo", > unit: !4, linkageName: "_Z3foov", declaration: !7) > -- > > If DwarfDebug picks !6 for the canonical "_ZTS1S", then there's no > problem. !6 points at its function !7, and the definition !8 also > points at !7. > > On the other hand, if DwarfDebug picks !2 for "_ZTS1S", then !2 will > pull in !3 and !8 will still pull in !7. The llvm-dwarfdebug output > will emit both !3 and !7 inside of !2. >Ah, thanks for explaining it, I understand now. The problem is that the definition reaches into the type blob via the declaration with a direct reference. We're going to want to abstract that reference away at some point to cross the same boundary as for type references, I think - I imagine that's what your patch/plan is to do? As I think Reid chimed in here or elsewhere, he's going to need something like that to cross the boundary into the opaque type blob for CodeView output too, I imagine.> > > > > > > > > - 0006: Remove DITypeRef (string-based references) and strip down > > > 'retainedTypes:'. > > > > > > Once this is done, I expect the bitcode block layout improvements for > > > lazy-loading of subprograms and composite types (steps 3 and 5 from > > > this RFC) to be fairly straightforward. > > > > > > > > > > > > > > > > On 2016-Mar-29, at 19:11, Eric Christopher <echristo at gmail.com> > wrote: > > > > > > > > I have no objections to any of this FWIW :) > > > > > > > > -eric > > > > > > > > On Tue, Mar 29, 2016 at 6:46 PM Duncan P. N. Exon Smith < > dexonsmith at apple.com> wrote: > > > > > > > > > On 2016-Mar-22, at 20:11, Eric Christopher <echristo at gmail.com> > wrote: > > > > > > > > > >> On Tue, Mar 22, 2016 at 8:04 PM David Blaikie <dblaikie at gmail.com> > wrote: > > > > >> +pcc, who had some other ideas/patch out for improving memory > usage of debug info > > > > >> +Reid, who's responsible for the windows/CodeView/PDB debug info > which is motivating some of the ideas about changes to type emission > > > > > > > > > > So I discussed this with Adrian and Mehdi at the social last > Thursday and I'm getting set to finish the write up. I think it'll have > some bearing on this proposal as I think it'll change how we want to take a > look at the format of DISubprogram metadata a bit more. > > > > > > > > (The interesting bit here is to make a clearer split between > > > > DISubprogram declarations (part of the type hierarchY) and > > > > DISubprogram definitions (part of the code/line table/variable > > > > locations). I think that'll end up being mostly orthogonal to what > > > > I'm trying to do.) > > > > > > > > > That said, most of it is orthogonal to the changes Duncan is > talking about here. Just puts the pressure on to get the other proposal > written up. > > > > > > > > Which is now here: > > > > lists.llvm.org/pipermail/llvm-dev/2016-March/097773.html > > > > > > > > >> Baking into the IR more about types as units has pretty direct > overlap with Reid/CodeView/etc - so, yeah, that'll takes ome discussion > (but, as you say, it's not in your immediate plan anyway, so we can come > back to that - but would be good for whoever gets there first to discuss it > with the others) > > > > > > > > After thinking for a few days, I don't think this will bake anything > > > > new into the IR. If anything it removes a few special cases. > > > > > > > > More at the bottom. > > > > > > > > >>> Motivation > > > > >>> =========> > > > >>> > > > > >>> Based on some analysis Mehdi ran (ping him for details), there > are three > > > > >>> (related) compile-time bottlenecks we're seeing with `-flto=thin > -g`: > > > > >>> > > > > >>> a) Reading the large number of Metadata bitcode records in the > global > > > > >>> metadata block. I'm talking about raw `BitStreamer` calls > here. > > > > >>> > > > > >>> b) Creating unnecessary `DI*` instances (that aren't relevant > to code). > > > > >>> > > > > >>> c) Emitting unnecessary `DI*` instances (that aren't relevant > to code). > > > > >>> > > > > >>> Here is my recollection of some peak memory stats on a small > testcase > > > > >>> during thin-LTO, which should be a decent indicator of (b): > > > > >>> > > > > >>> - ~150MB: DILocation > > > > >>> - ~100MB: DISubprogram > > > > >>> - ~70MB: DILocalVariable > > > > >>> - ~50MB: (cumulative) DIType descendents > > > > >>> > > > > >>> It looks, suprisingly, like types are not the primary bottleneck. > > > > > > > > (Probably wrong for (a), BTW. Caveats matter.) > > > > > > > > >>> There are caveats: > > > > >>> > > > > >>> - `DISubprogram` declarations -- member function descriptors > -- are > > > > >>> part of the type hierarchy. > > > > >>> - Most of the type hierarchy gets uniqued at parse time. > > > > >>> - As a result, these data are a poor indicator for (a). > > > > > > > > ((a) is the main bottleneck for compile-time of -flto=thin (since > it's > > > > quadratic in the number of files). (b) only affects memory. Also > > > > important, but at least it scales linearly.) > > > > > > > > >>> Even so, non-types are substantial. > > > > >>> > > > > >>> Proposal > > > > >>> =======> > > > >>> > > > > >>> Short version > > > > >>> ------------- > > > > >>> > > > > >>> 4. Remove `DICompositeType`s from `retainedTypes:`, similar to > (2). > > > > > > > > This is the part that's relevant to the new RFC Eric just posted. > > > > > > > > >>> Long version > > > > >>> ------------- > > > > >>> > > > > >>> 4. Implement my proposal to remove the `DICompositeType` name > map from > > > > >>> `retainedTypes:`. > > > > >>> > > > > >>> > lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160125/327936.html > > > > >>> > > > > >>> Similar to (2) above, this will naturally filter the types > that get > > > > >>> linked in to the ones actually used by the code being linked. > > > > >>> > > > > >>> It should also allow the reader to skip records for types > that have > > > > >>> already been loaded in the main module. > > > > > > > > The essential things I want to accomplish with this part: > > > > > > > > - Make `type:` operands less special: instead of referencing types > > > > indirectly through MDString, point directly at the type node. > > > > > > > > - Stop using `retainedTypes:` by default (only for -gfull, etc.). > > > > > > > > - Avoid blowing up memory in -flto=full (which converting MDString > > > > references back to pointers would do naively, through > > > > re-introducing cycles). Note that this needs to be handled > > > > somehow at BitcodeReader time. > > > > > > > > After chatting with Eric, I don't think this conflicts at all with > the > > > > other RFC. Unifying the `type:` operands might actually help both. > > > > > > > > One good point David mentioned last week was that we don't want to > > > > teach the IR any more about types. Rather than inventing some new > > > > context (as I suggested originally), I figure LTO plugins can just > > > > pass a (StringRef -> DIType*) map to the BitcodeReader, and the > Module > > > > itself doesn't need to know anything about it. > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20160415/1a9d085f/attachment.html>
Duncan P. N. Exon Smith via llvm-dev
2016-Apr-18 15:08 UTC
[llvm-dev] [RFC] Lazy-loading of debug info metadata
> >> On 2016-Apr-15, at 17:12, David Blaikie <dblaikie at gmail.com> wrote: >> >> >> >> On Fri, Apr 15, 2016 at 4:04 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote: >> >> > On 2016-Apr-15, at 14:53, David Blaikie <dblaikie at gmail.com> wrote: >> > >> > >> > >> >> On Fri, Apr 15, 2016 at 2:27 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote: >> >> >> >> > On 2016-Apr-15, at 10:27, David Blaikie <dblaikie at gmail.com> wrote: >> >> > >> >> > >> >> > >> >> > On Fri, Apr 15, 2016 at 9:50 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote: >> >> > Since I haven't heard any objections to the direction, I'm planning to >> >> > commit this (step 4) when I find some cycles; likely over the weekend. >> >> > >> >> > To make this more concrete (in case someone is silently concerned) I've >> >> > posted my WIP patches below. They apply cleanly to r266414. There are >> >> > a few mechanical changes missing that are tracked in the commit >> >> > messages (such as LangRef, or new tests in some cases). To get >> >> > `ninja check` to pass after the final commit you'll need to run the >> >> > attached upgrade script. The clang changes are truly uninteresting so >> >> > I've left them out. >> >> > >> >> > - 0001/2: Prep commits I need to flush out (sorry for the noise). >> >> > - 0003: Add an explicit type map for ODR type uniquing. >> >> > - 0004: Prep. >> >> > - 0005: Add ODR type uniquing of members. >> >> >FYI, I hit some major problems with all of this. The short version is that I was testing -flto memory usage with the wrong command-line options (-gline-tables-only instead of -g), and now that I've done it correctly I've discovered a major memory blowup. I'm still investigating. I also have a question at the end about a semantic problem I uncovered. -- The attached patches were supposed to finish this off -- they've been mostly ready since Saturday -- but I had a big surprise when I finally configured correctly (apparently the Apple CMake caches default to line-tables only for RelWithDebInfo): 30GB peak for linking clang. Then I went back and ran a fresh bootstrap of ToT, and got 27GB. This is pretty terrible (we were down to around 17GB back in October). I'd been doing spot checks of memory usage (glancing at top) and it was surprisingly low; I just assumed someone had made improvements when I wasn't looking. Shame on me :(. (I do I have a bot setup that is supposed to catch major regressions like this (it doesn't exactly track memory usage directly, but it does a half- bootstrap of -flto=full -g). Except at some point it switched to using a CMake cache, and it has been using -gline-tables-only as well: lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build/8599/consoleFull ) I hope the blowup is from the work I've done over the last couple of weeks; I'll continue investigating today and revert whatever I have to. Sorry for the noise and the breakage. (It's also possible this is not a regression from my recent work, but that would be harder to recover from; let's hope it was from me.) -- I'm also concerned about the semantics of a particular case that I learned debugging correctness issues. If a type is defined inside a function, we create a DICompositeType (CT) whose "scope:" is the DISubprogram (SP) for that function's definition. This SP is not part of the type graph, it's the actual definition that eventually describes the code. However, the CT has an ODR identifier, which means it will get uniqued by the DITypeMap. This seems wrong. If the function is defined in a header you can end up with two copies of this type, and their scope is completely unrelated. Here it is without using the type map: -- !0 = distinct !DICompileUnit() !1 = distinct !DISubprogram(name: "foo", unit: !0) !2 = distinct !DICompositeType(name: "T", identifier: "foo_T", scope: !1) !3 = distinct !DICompileUnit() !4 = distinct !DISubprogram(name: "foo", unit: !3) !5 = distinct !DICompositeType(name: "T", identifier: "foo_T", scope: !4) -- I had thought that when !DICompositeType had an 'identifier:' (followed ODR-rules) then its 'scope:' must also have an 'identifier:', or at least follow ODR-rules. That's clearly not always the case right now. There are two simple ways to make it so: 1. Drop the 'identifier:' from composite types local to a subprogram. 2. Change the 'scope:' to point at a *declaration* of the subprogram (which lives in the type graph) and unique them there (and disallow having 'identifier:' types point at subprogram definitions). I'd appreciate feedback on whether either of these is a good idea. (Another way of looking at this, is that I'd imagined there were two parts of the debug info graph: the "types" part, and the "codegen" part (by which I mean stuff that describes the actual emitted object code); and I'd imagined that the "types" part never referenced the "codegen" part. However it does in this case.) -- (the patches that increased memory from 27GB to 30GB) -------------- next part -------------- A non-text attachment was scrubbed... Name: 0006-IR-Remove-DITypeRef-DIScopeRef-and-DINodeRef-llvm.patch Type: application/octet-stream Size: 322088 bytes Desc: not available URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20160418/bef754e6/attachment-0002.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0007-WIP-DebugInfo-Adapt-to-loss-of-DITypeRef-in-LL-clang.patch Type: application/octet-stream Size: 67049 bytes Desc: not available URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20160418/bef754e6/attachment-0003.obj> -------------- next part -------------->> >> > What's this ^ for? >> >> > >> >> > In theory, the list of /members/ of a type shouldn't vary between instances. The special cases (implicit special members, member function template specializations, and nested types) don't appear in the member list - instead they just list the type as their scope (so it's one directional, instead of bidirectional) >> >> > >> >> > Is this to handle ODR violations? We currently don't handle them, right - we just pick one instance of the type & the user gets to keep the pieces. Is there a particular motivation for changing that? Some place where it's more problematic, we have to support it, etc? >> >> >> >> This is a strange one; I probably should have called it out actually. >> >> It's a bugfix vs. ToT; and without it, 0006 regresses DWARF output. >> >> >> >> File and line can change without violating ODR, which means that we >> >> end up with duplicate members. I think I may have written something >> >> up in the WIP commit message for 0005, but also have a look in tree at >> >> test/Linker/type-unique-odr-a.ll. >> >> >> >> It's legal, e.g., to have the following in two separate files, as long >> >> as the sequence of tokens is identical: >> >> -- >> >> struct S { void foo(); }; >> >> -- >> >> However, the DISubprogram(isDefinition: false) for S::foo will not be >> >> structurally equivalent (since the 'file:' node won't match), so we >> >> currently end up with two metadata nodes. The same thing happens for >> >> the DICompositeType for `S` (and also applies to any fields). >> >> >> >> The checked-in testcase confirms that the description of `S` only has >> >> one copy of `S::foo` in the DWARF output. (The testcase doesn't provide >> >> good enough coverage actually; it gets lucky because the definition of >> >> S::foo is in the second file being linked (type-unique-odr-b.ll). If >> >> you reverse the order of the files, then `S` will have two copies of >> >> `S::foo`.) >> >> >> >> 0005 solves the problem up front (and in both directions; I'll add a RUN >> >> line to test/Linker/type-unique-odr-a.ll). >> >> >> >> I came across this because the testcase started failing with 0006. The >> >> current (but not-quite-sufficient) logic relies on the resolve() logic >> >> in DwarfDebug.cpp, which indirects through the merged 'retainedTypes:' >> >> map (the asymmetry in building the merged map explains why the in-tree >> >> logic only works for one linking direction). >> > >> > Not sure I'm quite following. >> > >> > I think I understand the circumstance you're describing (indeed, the type itself can be defined on different lines/files without violating the ODR, or its members, or, etc... ). But if the subprogram declaration is in a different file in one definition - that'd make the composite type itself different (metadata uniquing - composite type references the subprogram, etc). >> > >> > But we should only every find one version of the type (all references to the type should be via string-based dityperefs). Sure, we'll pick one at random, but when/how would we get two? >> >> We get two because one of the modules contributes a definition of S::foo, >> and it may be pointing at the declaration from a version of S that isn't >> blessed. If you reverse the RUN line in the in-tree testcase you can see >> what happens, but here's a quick hand-written version: >> -- >> !llvm.dbg.cus = !{!0, !4} >> >> !0 = distinct !DICompileUnit(file: !1, retainedTypes: !{!2}) >> !1 = !DIFile(filename: "a.cpp") >> !2 = !DICompositeType(file: !1, name: "S", elements: !{!3}, >> identifier: "_ZTS1S") >> !3 = !DISubprogram(isDefinition: false, file: !1, name: "foo", >> scope: !2, linkageName: "_ZN1S3fooEv") >> >> !4 = distinct !DICompileUnit(file: !5, retainedTypes: !{!6}) >> !5 = !DIFile(filename: "b.cpp") >> !6 = !DICompositeType(file: !5, name: "S", elements: !{!7}, >> identifier: "_ZTS1S") >> !7 = !DISubprogram(isDefinition: false, file: !5, name: "foo", >> scope: !6, linkageName: "_ZN1S3fooEv") >> >> define void @foo() !dbg !8 { >> ret void >> } >> >> !8 = !DISubprogram(isDefinition: true, file: !5, name: "foo", >> unit: !4, linkageName: "_Z3foov", declaration: !7) >> -- >> >> If DwarfDebug picks !6 for the canonical "_ZTS1S", then there's no >> problem. !6 points at its function !7, and the definition !8 also >> points at !7. >> >> On the other hand, if DwarfDebug picks !2 for "_ZTS1S", then !2 will >> pull in !3 and !8 will still pull in !7. The llvm-dwarfdebug output >> will emit both !3 and !7 inside of !2. > > Ah, thanks for explaining it, I understand now. The problem is that the definition reaches into the type blob via the declaration with a direct reference. > > We're going to want to abstract that reference away at some point to cross the same boundary as for type references, I think - I imagine that's what your patch/plan is to do?(I missed this question before. I didn't exactly abstract it, just canonicalized it. I think we had a discussion when I committed it in r266548.)> As I think Reid chimed in here or elsewhere, he's going to need something like that to cross the boundary into the opaque type blob for CodeView output too, I imagine. > >> > >> > > >> > > - 0006: Remove DITypeRef (string-based references) and strip down >> > > 'retainedTypes:'. >> > > >> > > Once this is done, I expect the bitcode block layout improvements for >> > > lazy-loading of subprograms and composite types (steps 3 and 5 from >> > > this RFC) to be fairly straightforward. >> > > >> > > >> > > >> > > >> > > > On 2016-Mar-29, at 19:11, Eric Christopher <echristo at gmail.com> wrote: >> > > > >> > > > I have no objections to any of this FWIW :) >> > > > >> > > > -eric >> > > > >> > > > On Tue, Mar 29, 2016 at 6:46 PM Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote: >> > > > >> > > > > On 2016-Mar-22, at 20:11, Eric Christopher <echristo at gmail.com> wrote: >> > > > > >> > > > >> On Tue, Mar 22, 2016 at 8:04 PM David Blaikie <dblaikie at gmail.com> wrote: >> > > > >> +pcc, who had some other ideas/patch out for improving memory usage of debug info >> > > > >> +Reid, who's responsible for the windows/CodeView/PDB debug info which is motivating some of the ideas about changes to type emission >> > > > > >> > > > > So I discussed this with Adrian and Mehdi at the social last Thursday and I'm getting set to finish the write up. I think it'll have some bearing on this proposal as I think it'll change how we want to take a look at the format of DISubprogram metadata a bit more. >> > > > >> > > > (The interesting bit here is to make a clearer split between >> > > > DISubprogram declarations (part of the type hierarchY) and >> > > > DISubprogram definitions (part of the code/line table/variable >> > > > locations). I think that'll end up being mostly orthogonal to what >> > > > I'm trying to do.) >> > > > >> > > > > That said, most of it is orthogonal to the changes Duncan is talking about here. Just puts the pressure on to get the other proposal written up. >> > > > >> > > > Which is now here: >> > > > lists.llvm.org/pipermail/llvm-dev/2016-March/097773.html >> > > > >> > > > >> Baking into the IR more about types as units has pretty direct overlap with Reid/CodeView/etc - so, yeah, that'll takes ome discussion (but, as you say, it's not in your immediate plan anyway, so we can come back to that - but would be good for whoever gets there first to discuss it with the others) >> > > > >> > > > After thinking for a few days, I don't think this will bake anything >> > > > new into the IR. If anything it removes a few special cases. >> > > > >> > > > More at the bottom. >> > > > >> > > > >>> Motivation >> > > > >>> =========>> > > > >>> >> > > > >>> Based on some analysis Mehdi ran (ping him for details), there are three >> > > > >>> (related) compile-time bottlenecks we're seeing with `-flto=thin -g`: >> > > > >>> >> > > > >>> a) Reading the large number of Metadata bitcode records in the global >> > > > >>> metadata block. I'm talking about raw `BitStreamer` calls here. >> > > > >>> >> > > > >>> b) Creating unnecessary `DI*` instances (that aren't relevant to code). >> > > > >>> >> > > > >>> c) Emitting unnecessary `DI*` instances (that aren't relevant to code). >> > > > >>> >> > > > >>> Here is my recollection of some peak memory stats on a small testcase >> > > > >>> during thin-LTO, which should be a decent indicator of (b): >> > > > >>> >> > > > >>> - ~150MB: DILocation >> > > > >>> - ~100MB: DISubprogram >> > > > >>> - ~70MB: DILocalVariable >> > > > >>> - ~50MB: (cumulative) DIType descendents >> > > > >>> >> > > > >>> It looks, suprisingly, like types are not the primary bottleneck. >> > > > >> > > > (Probably wrong for (a), BTW. Caveats matter.) >> > > > >> > > > >>> There are caveats: >> > > > >>> >> > > > >>> - `DISubprogram` declarations -- member function descriptors -- are >> > > > >>> part of the type hierarchy. >> > > > >>> - Most of the type hierarchy gets uniqued at parse time. >> > > > >>> - As a result, these data are a poor indicator for (a). >> > > > >> > > > ((a) is the main bottleneck for compile-time of -flto=thin (since it's >> > > > quadratic in the number of files). (b) only affects memory. Also >> > > > important, but at least it scales linearly.) >> > > > >> > > > >>> Even so, non-types are substantial. >> > > > >>> >> > > > >>> Proposal >> > > > >>> =======>> > > > >>> >> > > > >>> Short version >> > > > >>> ------------- >> > > > >>> >> > > > >>> 4. Remove `DICompositeType`s from `retainedTypes:`, similar to (2). >> > > > >> > > > This is the part that's relevant to the new RFC Eric just posted. >> > > > >> > > > >>> Long version >> > > > >>> ------------- >> > > > >>> >> > > > >>> 4. Implement my proposal to remove the `DICompositeType` name map from >> > > > >>> `retainedTypes:`. >> > > > >>> >> > > > >>> lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160125/327936.html >> > > > >>> >> > > > >>> Similar to (2) above, this will naturally filter the types that get >> > > > >>> linked in to the ones actually used by the code being linked. >> > > > >>> >> > > > >>> It should also allow the reader to skip records for types that have >> > > > >>> already been loaded in the main module. >> > > > >> > > > The essential things I want to accomplish with this part: >> > > > >> > > > - Make `type:` operands less special: instead of referencing types >> > > > indirectly through MDString, point directly at the type node. >> > > > >> > > > - Stop using `retainedTypes:` by default (only for -gfull, etc.). >> > > > >> > > > - Avoid blowing up memory in -flto=full (which converting MDString >> > > > references back to pointers would do naively, through >> > > > re-introducing cycles). Note that this needs to be handled >> > > > somehow at BitcodeReader time. >> > > > >> > > > After chatting with Eric, I don't think this conflicts at all with the >> > > > other RFC. Unifying the `type:` operands might actually help both. >> > > > >> > > > One good point David mentioned last week was that we don't want to >> > > > teach the IR any more about types. Rather than inventing some new >> > > > context (as I suggested originally), I figure LTO plugins can just >> > > > pass a (StringRef -> DIType*) map to the BitcodeReader, and the Module >> > > > itself doesn't need to know anything about it. >
Duncan P. N. Exon Smith via llvm-dev
2016-Apr-18 15:12 UTC
[llvm-dev] [RFC] Lazy-loading of debug info metadata
> > (I do I have a bot setup that is supposed to catch major regressions like > this (it doesn't exactly track memory usage directly, but it does a half- > bootstrap of -flto=full -g). Except at some point it switched to using > a CMake cache, and it has been using -gline-tables-only as well: > lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build > lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build/8599/consoleFullScratch this, that's the wrong bot. This is the right one, and it still seems to be doing the right thing. It just doesn't seem to be blowing up as badly: lab.llvm.org:8080/green/job/llvm-stage2-cmake-RgLTO_build/buildTimeTrend> )
Adrian Prantl via llvm-dev
2016-Apr-18 15:33 UTC
[llvm-dev] [RFC] Lazy-loading of debug info metadata
> On Apr 18, 2016, at 8:08 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote: > <snip> > > I'm also concerned about the semantics of a particular case that I > learned debugging correctness issues. If a type is defined inside a > function, we create a DICompositeType (CT) whose "scope:" is the > DISubprogram (SP) for that function's definition. This SP is not part of > the type graph, it's the actual definition that eventually describes the > code. > > However, the CT has an ODR identifier, which means it will get uniqued > by the DITypeMap. This seems wrong. If the function is defined in a > header you can end up with two copies of this type, and their scope is > completely unrelated. Here it is without using the type map: > -- > !0 = distinct !DICompileUnit() > !1 = distinct !DISubprogram(name: "foo", unit: !0) > !2 = distinct !DICompositeType(name: "T", identifier: "foo_T", scope: !1) > !3 = distinct !DICompileUnit() > !4 = distinct !DISubprogram(name: "foo", unit: !3) > !5 = distinct !DICompositeType(name: "T", identifier: "foo_T", scope: !4) > -- > > I had thought that when !DICompositeType had an 'identifier:' (followed > ODR-rules) then its 'scope:' must also have an 'identifier:', or at least > follow ODR-rules. That's clearly not always the case right now. There > are two simple ways to make it so: > > 1. Drop the 'identifier:' from composite types local to a subprogram. > > 2. Change the 'scope:' to point at a *declaration* of the subprogram > (which lives in the type graph) and unique them there (and disallow > having 'identifier:' types point at subprogram definitions).Intuitively option (2) seems to be the better solution — we want to separate out the type hierarchy from the function definitions. In the spirit of PR27352[1], could we even stop DISubprograms from being DIScopes in the class hierarchy and use uniqueable DIPrototypes as scopes instead? -- adrian [1] llvm.org/bugs/show_bug.cgi?id=27352> > I'd appreciate feedback on whether either of these is a good idea. > > (Another way of looking at this, is that I'd imagined there were two > parts of the debug info graph: the "types" part, and the "codegen" part > (by which I mean stuff that describes the actual emitted object code); > and I'd imagined that the "types" part never referenced the "codegen" > part. However it does in this case.)
Duncan P. N. Exon Smith via llvm-dev
2016-Apr-18 20:59 UTC
[llvm-dev] [RFC] Lazy-loading of debug info metadata
>>>>>> >>>>>> On Fri, Apr 15, 2016 at 9:50 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote: >>>>>> Since I haven't heard any objections to the direction, I'm planning to >>>>>> commit this (step 4) when I find some cycles; likely over the weekend. >>>>>> >>>>>> To make this more concrete (in case someone is silently concerned) I've >>>>>> posted my WIP patches below. They apply cleanly to r266414. There are >>>>>> a few mechanical changes missing that are tracked in the commit >>>>>> messages (such as LangRef, or new tests in some cases). To get >>>>>> `ninja check` to pass after the final commit you'll need to run the >>>>>> attached upgrade script. The clang changes are truly uninteresting so >>>>>> I've left them out. >>>>>> >>>>>> - 0001/2: Prep commits I need to flush out (sorry for the noise). >>>>>> - 0003: Add an explicit type map for ODR type uniquing. >>>>>> - 0004: Prep. >>>>>> - 0005: Add ODR type uniquing of members. >>>>>> > >> FYI, I hit some major problems with all of this. >> >> The short version is that I was testing -flto memory usage with the wrong >> command-line options (-gline-tables-only instead of -g), and now that >> I've done it correctly I've discovered a major memory blowup. I'm >> still investigating. >> >> I also have a question at the end about a semantic problem I uncovered. >> >> -- >> >> The attached patches were supposed to finish this off -- they've been >> mostly ready since Saturday -- but I had a big surprise when I finally >> configured correctly (apparently the Apple CMake caches default to >> line-tables only for RelWithDebInfo): 30GB peak for linking clang. >> >> Then I went back and ran a fresh bootstrap of ToT, and got 27GB. This is >> pretty terrible (we were down to around 17GB back in October). I'd been >> doing spot checks of memory usage (glancing at top) and it was >> surprisingly low; I just assumed someone had made improvements when I >> wasn't looking. Shame on me :(. >> >> (I do I have a bot setup that is supposed to catch major regressions like >> this (it doesn't exactly track memory usage directly, but it does a half- >> bootstrap of -flto=full -g). Except at some point it switched to using >> a CMake cache, and it has been using -gline-tables-only as well: >> lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build >> lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build/8599/consoleFull >> Scratch this, that's the wrong bot. This is the right one, and it still > seems to be doing the right thing. It just doesn't seem to be blowing up > as badly: > lab.llvm.org:8080/green/job/llvm-stage2-cmake-RgLTO_build/buildTimeTrend(Not blowing up at all.)>> ) >> >> I hope the blowup is from the work I've done over the last couple of >> weeks; I'll continue investigating today and revert whatever I have to. >> Sorry for the noise and the breakage. (It's also possible this is not a >> regression from my recent work, but that would be harder to recover from; >> let's hope it was from me.) >What a waste of time. It looks like the command-line tool I use to collect memory usage doesn't give correct numbers anymore. I spent the morning trying to find a revision that *didn't* use a ton of memory. "Activity Monitor" and "top" give completely different answers than my tool. "Instruments.app" isn't scaling to linking clang with `-flto -g`, but on smaller cases (like linking verify-uselistorder) it agrees with "Activity Monitor" and "top". I guess there hasn't been a major regression after all? On the upside, I took the opportunity to collect new numbers for running `llc` on `verify-uselistorder.lto.opt.bc`, which is the series of commits starting with (and referencing) r236629. It looks like we've made improvements since I last checked (not entirely sure, since I used a different verify-uselistorder.lto.opt.bc as input). - r236629 mentions a starting point of 1160 MB. - ... - r240736 mentions 718 MB. - ... - r266588 gave me 665 MB (with the new input). Actual LTO is higher (~750 MB) since it runs the module linker and the optimizer (and ld64 uses a little bit of memory itself). (LTO for clang is still somewhere in the neighbourhood of ~16 GB.)