Peter Collingbourne via llvm-dev
2016-Feb-06  01:04 UTC
[llvm-dev] Reducing DWARF emitter memory consumption
Thanks, I'll look into that. (Though earlier you told me that debug info
for types could be extended while walking the IR, so I wouldn't have
thought that would have worked.)

Peter

On Fri, Feb 05, 2016 at 03:52:19PM -0800, David Blaikie wrote:
> Will look more closely soon - but I'd really try just writing out type
> units to MC as soon as they're done. It should be relatively non-intrusive
> (we build type units once, there's no ambiguity about when they're done) -
> for non-fission+type units it might be a bit tricky, because the type units
> still need a relocation for the stmt_list* (I'm trying to find where that's
> added now... I seem to have lost it), but fission+type units should produce
> entirely static type units that are knowable the moment the type is being
> emitted so far as I can tell (including the type hash and everything - you
> can write the bytes out to the AsmStreamer, etc and forget about them
> entirely except to keep the hash to know that you don't need to emit it
> again).
>
> I imagine this would provide all the memory savings we would need for much
> of anything (since types are most of the debug info), and, if not, would be
> a good start.
>
> *I think we might know what the stmt_list relocation is up-front, though -
> if that's the case we'd be able to be as aggressive as I described is the
> case for fission
>
> On Fri, Feb 5, 2016 at 3:17 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
> > Hi all,
> >
> > We have profiled [1] the memory usage in LLVM when LTO'ing Chromium, and
> > we've found that one of the top consumers of memory is the DWARF emitter
> > in lib/CodeGen/AsmPrinter/Dwarf*. I've been reading the DWARF emitter
> > code and I have a few ideas in mind for how to reduce its memory
> > consumption. One idea I've had is to restructure the emitter so that (for
> > the most part) it directly produces the bytes and relocations that need
> > to go into the DWARF sections without going through other data structures
> > such as DIE and DIEValue.
> >
> > I understand that the DWARF emitter needs to accommodate incomplete
> > entities that may be completed elsewhere during tree construction (e.g.
> > abstract origins for inlined functions, special members for types), so
> > here's a quick high-level sketch of the data structures that I believe
> > could support this design:
> >
> > struct DIEBlock {
> >   SmallVector<char, 1> Data;
> >   std::vector<InternalReloc> IntRelocs;
> >   std::vector<ExternalReloc> ExtRelocs;
> >   DIEBlock *Next;
> > };
> >
> > // This would be used to represent things like DW_AT_type references to
> > // types
> > struct InternalReloc {
> >   size_t Offset;    // offset within DIEBlock::Data
> >   DIEBlock *Target; // the offset within Target is at
> >                     // Data[Offset...Offset+Size]
> > };
> >
> > // This would be used to represent things like pointers to
> > // .debug_loc/.debug_str or to functions/globals
> > struct ExternalReloc {
> >   size_t Offset;    // offset within DIEBlock::Data
> >   MCSymbol *Target; // the offset within Target is at
> >                     // Data[Offset...Offset+Size]
> > };
> >
> > struct DwarfBuilder {
> >   DIEBlock *First;
> >   DIEBlock *Cur;
> >   DenseMap<DISubprogram *, DIEBlock *> Subprograms;
> >   DenseMap<DIType *, DIEBlock *> Types;
> >   DwarfBuilder() : First(new DIEBlock), Cur(First) {}
> >   // builder implementation goes here...
> > };
> >
> > Normally, the DwarfBuilder will just emit bytes to Cur->Data (with
> > possibly internal or external relocations to IntRelocs/ExtRelocs), but if
> > it ever needs to create a "gap" for an incomplete data structure (e.g. at
> > the end of a subprogram or a struct type), it will create a new DIEBlock
> > New, store it to Cur->Next, store Cur in a DenseMap associated with the
> > subprogram/type/etc and store New to Cur. To fill a gap later, the
> > DwarfBuilder can pull the DIEBlock out of the DenseMap and start
> > appending there. Once the IR is fully visited, the debug info writer will
> > walk the linked list starting at First, calculate a byte offset for each
> > DIEBlock, apply any internal relocations and write Data using the
> > AsmPrinter (e.g. using EmitBytes, or maybe some other new interface that
> > also supports relocations and avoids copying).
> >
> > Does that sound reasonable? Is there anything I haven't accounted for?
> >
> > Thanks,
> > --
> > Peter
> >
> > [1] https://code.google.com/p/chromium/issues/detail?id=583551#c15

--
Peter
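
For illustration, the gap-filling scheme in the quoted proposal could look roughly like the standalone C++ sketch below. This is not the LLVM implementation, only one reading of the description. The names Block, Builder, openGap, fillGap, emitRef and demo are hypothetical; std::string and std::unordered_map stand in for SmallVector and DenseMap; entities are keyed by a plain int rather than by DISubprogram or DIType pointers; external relocations are left out; and an internal reference is assumed to be a 4-byte little-endian offset of the start of the target block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct Block;

// Simplified InternalReloc: a 4-byte slot that receives the target's offset.
struct InternalReloc {
  std::size_t Offset;  // position of the slot within Block::Data
  const Block *Target; // block whose final offset is written into the slot
};

struct Block {
  std::string Data;                     // raw bytes emitted so far
  std::vector<InternalReloc> IntRelocs; // references to other blocks
  Block *Next = nullptr;                // next block in output order
  std::size_t FinalOffset = 0;          // assigned during layout
};

class Builder {
  std::vector<std::unique_ptr<Block>> Storage; // owns every block
  Block *First;
  Block *Cur;
  std::unordered_map<int, Block *> Gaps; // entity id -> block to extend

  Block *newBlock() {
    Storage.emplace_back(new Block());
    return Storage.back().get();
  }

public:
  Builder() : First(newBlock()), Cur(First) {}

  void emit(const std::string &Bytes) { Cur->Data += Bytes; }

  // Emit a placeholder that will later hold the offset of Id's block.
  void emitRef(int Id) {
    Cur->IntRelocs.push_back({Cur->Data.size(), Gaps.at(Id)});
    Cur->Data.append(4, '\0');
  }

  // Remember the current block under Id and continue in a fresh block that
  // is linked directly after it.
  void openGap(int Id) {
    Block *New = newBlock();
    New->Next = Cur->Next;
    Cur->Next = New;
    Gaps[Id] = Cur;
    Cur = New;
  }

  // Append late-arriving bytes at the position where Id's gap was opened.
  void fillGap(int Id, const std::string &Bytes) {
    Gaps.at(Id)->Data += Bytes;
  }

  // Assign offsets in list order, patch references, concatenate the bytes.
  std::string finish() {
    std::size_t Offset = 0;
    for (Block *B = First; B; B = B->Next) {
      B->FinalOffset = Offset;
      Offset += B->Data.size();
    }
    std::string Out;
    for (Block *B = First; B; B = B->Next) {
      for (const InternalReloc &R : B->IntRelocs) {
        assert(R.Offset + 4 <= B->Data.size());
        std::uint32_t V = static_cast<std::uint32_t>(R.Target->FinalOffset);
        for (int I = 0; I < 4; ++I)
          B->Data[R.Offset + I] = static_cast<char>((V >> (8 * I)) & 0xff);
      }
      Out += B->Data;
    }
    return Out;
  }
};

// Two gaps, one reference, and one late fill.
std::string demo() {
  Builder B;
  B.emit("AA");
  B.openGap(1); // first block may still grow
  B.emit("BB");
  B.openGap(2);
  B.emitRef(2);         // reference to the block holding "BB"
  B.fillGap(1, "CCC");  // first block grows from 2 to 5 bytes
  return B.finish();
}
```

In demo(), the reference is patched only in finish(), after the late fillGap call has grown the first block, so it resolves to offset 5 and not 2. That is the reason the proposal keeps internal relocations symbolic until the whole IR has been visited.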
David Blaikie via llvm-dev
2016-Feb-06  01:35 UTC
[llvm-dev] Reducing DWARF emitter memory consumption
On Fri, Feb 5, 2016 at 5:04 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> Thanks, I'll look into that. (Though earlier you told me that debug info
> for types could be extended while walking the IR, so I wouldn't have
> thought that would have worked.)

Yeah, had to think about it more - and as I think about it - I'm moderately
sure type units (which don't include these latent extensions) will be
pretty close to static. With just the stmt_list relocation in non-fission
type units which /should/ still be knowable up-front.
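
The "write it out and keep only the hash" idea discussed above can be sketched in a few lines of standalone C++. This is an illustration under assumptions, not LLVM code: ByteSink is a hypothetical stand-in for the MC/asm streamer, TypeUnitEmitter and emitOnce are made-up names, and a type unit is modelled as an already-serialized byte string with a 64-bit signature, which is only plausible when the unit is fully static as described for fission + type units.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the streamer: it only accumulates bytes.
struct ByteSink {
  std::string Bytes;
  void write(const std::string &Unit) { Bytes += Unit; }
};

class TypeUnitEmitter {
  ByteSink &Sink;
  std::unordered_set<std::uint64_t> Emitted; // signatures already written

public:
  explicit TypeUnitEmitter(ByteSink &S) : Sink(S) {}

  // Write a finished unit immediately unless its signature was seen before.
  // Returns true if bytes were written. Either way the caller can drop its
  // in-memory form of the unit; only the 8-byte signature is retained here.
  bool emitOnce(std::uint64_t Signature, const std::string &UnitBytes) {
    if (!Emitted.insert(Signature).second)
      return false;
    Sink.write(UnitBytes);
    return true;
  }

  std::size_t retainedSignatures() const { return Emitted.size(); }
};
```

With this shape, memory held per emitted type is a single set entry rather than a tree of DIE objects. A non-fission unit would additionally need its stmt_list relocation to be known at the time of the write, which the thread leaves as an open question.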
Peter Collingbourne via llvm-dev
2016-Feb-10  22:42 UTC
[llvm-dev] Reducing DWARF emitter memory consumption
On Fri, Feb 05, 2016 at 05:35:14PM -0800, David Blaikie wrote:
> On Fri, Feb 5, 2016 at 5:04 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
> > Thanks, I'll look into that. (Though earlier you told me that debug info
> > for types could be extended while walking the IR, so I wouldn't have
> > thought that would have worked.)
>
> Yeah, had to think about it more - and as I think about it - I'm moderately
> sure type units (which don't include these latent extensions) will be
> pretty close to static. With just the stmt_list relocation in non-fission
> type units which /should/ still be knowable up-front.

I've implemented a change which does this, and looked at the impact on
memory consumption and binary size when running "llc" on Chromium's 50
largest (by bitcode size) translation units. Bottom line: *huge* savings in
total memory consumption, median 17% when compared to before the change,
median 7% when compared to type units disabled. (I'm not yet confident that
my patch is correct: some of the section sizes are different and I'll need
to double check what's going on there. I'll send it out once I'm confident
in it.)

I think we can do better, though. With type units enabled, the size of
.debug_info as a fraction of (.debug_info + .debug_types) is median ~40%,
so I think there's another ~12% that can be saved by avoiding DIE/DIEValue
retention for .debug_info, bringing the total to ~30%. I expect numbers
with type units disabled to be in the same ballpark (with type units
enabled, we consume ~25% more space in the object file on .debug_info +
.debug_types, so the proportional savings may be less, but the absolute
memory consumption should be lower). This also roughly lines up with the
heap profiler figures from before.

My conclusion from all this: I think we should do it, and I think it would
especially help in LTO mode with type units disabled. The type units
feature is redundant with LTO deduplication and would therefore add
unnecessary bloat to object files, which would mean increased memory usage.
(I measured a ~10% median increase in memory usage comparing the current
type units implementation against type units disabled -- not an entirely
fair comparison, but probably good enough.)

I have a plan in mind for doing this incrementally: we will start using the
more efficient data structure at the leaves of the DIE tree, and gradually
expand out to the root. You'll see what that looks like once I have my
first patch ready.

Thanks,
--
Peter
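
The message does not show how the ~12% and ~30% figures were derived. One possible reading, offered only as a back-of-the-envelope reconstruction, is that the 17% saving from type units is scaled by the ratio of .debug_info to .debug_types sizes. The C++ sketch below encodes that reading; the function names are hypothetical, and the linear-scaling assumption is not stated in the thread.

```cpp
#include <cassert>
#include <cmath>

// Assumption (not from the thread): memory saved by not retaining
// DIE/DIEValue objects scales linearly with the size of the section they
// describe. InfoFraction is .debug_info / (.debug_info + .debug_types).
double extraInfoSavingsPct(double TypeSavingsPct, double InfoFraction) {
  assert(InfoFraction > 0.0 && InfoFraction < 1.0);
  double TypesFraction = 1.0 - InfoFraction;
  return TypeSavingsPct * InfoFraction / TypesFraction;
}

// Savings already measured for type units plus the extrapolated remainder.
double totalSavingsPct(double TypeSavingsPct, double InfoFraction) {
  return TypeSavingsPct + extraInfoSavingsPct(TypeSavingsPct, InfoFraction);
}
```

With the medians quoted above (17% and a 40% fraction), this gives roughly 11% extra and 28% in total, which is in the neighbourhood of the ~12% and ~30% stated in the message.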