Peter Collingbourne via llvm-dev
2016-Feb-05 23:17 UTC
[llvm-dev] Reducing DWARF emitter memory consumption
Hi all,

We have profiled [1] the memory usage in LLVM when LTO'ing Chromium, and we've found that one of the top consumers of memory is the DWARF emitter in lib/CodeGen/AsmPrinter/Dwarf*. I've been reading the DWARF emitter code and I have a few ideas in mind for how to reduce its memory consumption. One idea I've had is to restructure the emitter so that (for the most part) it directly produces the bytes and relocations that need to go into the DWARF sections without going through other data structures such as DIE and DIEValue.

I understand that the DWARF emitter needs to accommodate incomplete entities that may be completed elsewhere during tree construction (e.g. abstract origins for inlined functions, special members for types), so here's a quick high-level sketch of the data structures that I believe could support this design:

    struct DIEBlock {
      SmallVector<char, 1> Data;
      std::vector<InternalReloc> IntRelocs;
      std::vector<ExternalReloc> ExtRelocs;
      DIEBlock *Next;
    };

    // This would be used to represent things like DW_AT_type references to types
    struct InternalReloc {
      size_t Offset;    // offset within DIEBlock::Data
      DIEBlock *Target; // the offset within Target is at Data[Offset...Offset+Size]
    };

    // This would be used to represent things like pointers to
    // .debug_loc/.debug_str or to functions/globals
    struct ExternalReloc {
      size_t Offset;    // offset within DIEBlock::Data
      MCSymbol *Target; // the offset within Target is at Data[Offset...Offset+Size]
    };

    struct DwarfBuilder {
      DIEBlock *First;
      DIEBlock *Cur;
      DenseMap<DISubprogram *, DIEBlock *> Subprograms;
      DenseMap<DIType *, DIEBlock *> Types;
      DwarfBuilder() : First(new DIEBlock), Cur(First) {}
      // builder implementation goes here...
    };

Normally, the DwarfBuilder will just emit bytes to Cur->Data (with possibly internal or external relocations to IntRelocs/ExtRelocs), but if it ever needs to create a "gap" for an incomplete data structure (e.g. at the end of a subprogram or a struct type), it will create a new DIEBlock New, store it to Cur->Next, store Cur in a DenseMap associated with the subprogram/type/etc and store New to Cur. To fill a gap later, the DwarfBuilder can pull the DIEBlock out of the DenseMap and start appending there.

Once the IR is fully visited, the debug info writer will walk the linked list starting at First, calculate a byte offset for each DIEBlock, apply any internal relocations and write Data using the AsmPrinter (e.g. using EmitBytes, or maybe some other new interface that also supports relocations and avoids copying).

Does that sound reasonable? Is there anything I haven't accounted for?

Thanks,
--
Peter

[1] https://code.google.com/p/chromium/issues/detail?id=583551#c15
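The gap mechanism and the final layout pass described above can be illustrated with a small standalone sketch. It is not the proposed LLVM code: it swaps SmallVector and DenseMap for std::vector and std::unordered_map, keys gaps by a plain integer id instead of DISubprogram/DIType pointers, leaves out external relocations, and assumes every internal reference is a 4-byte little-endian offset to the start of the target block. The names Block, Builder, openGap, fillGap, finalize and demoGapLayout are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

struct Block;

// A 4-byte slot that is patched with the target block's offset at layout time.
struct InternalReloc {
  size_t Offset;  // position of the slot within Block::Data
  Block *Target;  // block whose final byte offset is written into the slot
};

struct Block {
  std::vector<uint8_t> Data;
  std::vector<InternalReloc> IntRelocs;
  Block *Next = nullptr;
};

class Builder {
  std::vector<std::unique_ptr<Block>> Owned;  // keeps every block alive
  Block *First;
  Block *Cur;
  std::unordered_map<int, Block *> Gaps;  // entity id -> block that ends at its gap

  Block *newBlock() {
    Owned.push_back(std::make_unique<Block>());
    return Owned.back().get();
  }

public:
  Builder() : First(newBlock()), Cur(First) {}

  Block *current() { return Cur; }
  void emitByte(uint8_t B) { Cur->Data.push_back(B); }

  // Reserve a zeroed 4-byte slot and remember which block it refers to.
  void emitRef(Block *Target) {
    Cur->IntRelocs.push_back({Cur->Data.size(), Target});
    Cur->Data.insert(Cur->Data.end(), 4, 0);
  }

  // Record the current block as the gap for entity Id, then continue in a
  // fresh block linked directly after it.
  void openGap(int Id) {
    Block *New = newBlock();
    New->Next = Cur->Next;
    Cur->Next = New;
    Gaps[Id] = Cur;
    Cur = New;
  }

  // Append a late-arriving byte for entity Id at the end of its gap block.
  void fillGap(int Id, uint8_t B) {
    auto It = Gaps.find(Id);
    assert(It != Gaps.end() && "no gap recorded for this entity");
    It->second->Data.push_back(B);
  }

  // Walk the list once to assign offsets, then patch relocations and concatenate.
  std::vector<uint8_t> finalize() {
    std::unordered_map<const Block *, uint32_t> Offsets;
    uint32_t Off = 0;
    for (Block *B = First; B; B = B->Next) {
      Offsets[B] = Off;
      Off += static_cast<uint32_t>(B->Data.size());
    }
    std::vector<uint8_t> Out;
    for (Block *B = First; B; B = B->Next) {
      for (const InternalReloc &R : B->IntRelocs) {
        uint32_t V = Offsets.at(R.Target);
        for (int I = 0; I < 4; ++I)
          B->Data[R.Offset + I] = static_cast<uint8_t>(V >> (8 * I));
      }
      Out.insert(Out.end(), B->Data.begin(), B->Data.end());
    }
    return Out;
  }
};

// Emits one byte for entity 7, opens a gap, emits later data plus a reference
// to the block after the gap, and only then fills the gap.
std::vector<uint8_t> demoGapLayout() {
  Builder B;
  B.emitByte(0xA1);
  B.openGap(7);
  Block *After = B.current();
  B.emitByte(0xB1);
  B.emitRef(After);
  B.fillGap(7, 0xA2);
  return B.finalize();
}
```

In this walk-through the byte added by fillGap lands before the data that was emitted after the gap, and the reference is resolved only in finalize, so it reflects the offset of the following block after the gap has grown. Deferring offsets to a final pass is what lets blocks grow independently.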
David Blaikie via llvm-dev
2016-Feb-05 23:52 UTC
[llvm-dev] Reducing DWARF emitter memory consumption
Will look more closely soon - but I'd really try just writing out type units to MC as soon as they're done. It should be relatively non-intrusive (we build type units once, there's no ambiguity about when they're done). For non-fission + type units it might be a bit tricky, because the type units still need a relocation for the stmt_list* (I'm trying to find where that's added now... I seem to have lost it), but fission + type units should produce entirely static type units that are knowable the moment the type is being emitted, so far as I can tell. That includes the type hash and everything: you can write the bytes out to the AsmStreamer, etc. and forget about them entirely, except to keep the hash so you know that you don't need to emit the unit again.

I imagine this would provide all the memory savings we would need for much of anything (since types are most of the debug info), and, if not, would be a good start.

* I think we might know what the stmt_list relocation is up front, though - if that's the case we'd be able to be as aggressive as I described for fission.
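A rough standalone sketch of the "write it out and keep only the hash" idea follows. It assumes the type unit's bytes and its 64-bit signature are already known when the type is emitted (the fission + type units case above); how the signature is computed is out of scope. ByteSink is a stand-in for the real output streamer, and TypeUnitEmitter, emitOnce and demoTypeUnits are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Stand-in for the output streamer: bytes are appended and never revisited.
struct ByteSink {
  std::string Bytes;
  void write(const std::vector<uint8_t> &Data) {
    Bytes.append(Data.begin(), Data.end());
  }
};

class TypeUnitEmitter {
  ByteSink &Out;
  // The only per-unit state retained after emission.
  std::unordered_set<uint64_t> EmittedSignatures;

public:
  explicit TypeUnitEmitter(ByteSink &Sink) : Out(Sink) {}

  // Writes the unit and returns true the first time a signature is seen;
  // returns false without writing anything on later requests.
  bool emitOnce(uint64_t Signature, const std::vector<uint8_t> &UnitBytes) {
    if (!EmittedSignatures.insert(Signature).second)
      return false;
    Out.write(UnitBytes);
    return true;
  }
};

// Requests the same unit twice and a different unit once; returns the number
// of bytes that actually reached the sink.
size_t demoTypeUnits() {
  ByteSink Sink;
  TypeUnitEmitter E(Sink);
  bool First = E.emitOnce(0x1234, {1, 2, 3});
  bool Second = E.emitOnce(0x1234, {1, 2, 3});
  bool Third = E.emitOnce(0x5678, {4, 5});
  assert(First && !Second && Third);
  return Sink.Bytes.size();
}
```

After a unit is written, the emitter holds 8 bytes of signature per type rather than the unit's full in-memory tree, which is where the savings in this approach would come from.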
Duncan P. N. Exon Smith via llvm-dev
2016-Feb-06 00:28 UTC
[llvm-dev] Reducing DWARF emitter memory consumption
> On 2016-Feb-05, at 15:17, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
> Does that sound reasonable? Is there anything I haven't accounted for?

Does this design work well with the way llvm-dsymutil uses DIEs and DIEValues?

I'm also interested in whether this will be faster than the current one. I spent some time optimizing the teardown of the DIE tree in the summer and it would be nice not to lose that.

(Sorry, maybe it's obvious from above, but I've only had a moment to skim your proposal. I'll try to look in more detail over the weekend.)
Mehdi Amini via llvm-dev
2016-Feb-06 00:58 UTC
[llvm-dev] Reducing DWARF emitter memory consumption
> On Feb 5, 2016, at 3:17 PM, Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi all,
>
> We have profiled [1] the memory usage in LLVM when LTO'ing Chromium, and
> we've found that one of the top consumers of memory is the DWARF emitter in
> lib/CodeGen/AsmPrinter/Dwarf*.

I'm staring at the profile attached to comment #15 on the link you posted. Can you confirm that the DWARF emitter accounts for 6.7% + 15.6% = 22.3% of the total allocated memory?

If I understand the numbers correctly, this does not say anything about how much the DWARF emitter contributes to *peak memory* usage (could be more, could be nothing...).

Limiting the number of calls to the memory system is always welcome, so whatever the answer to my question is, it does not take any value away from the improvements you could make here :)

Thanks,
--
Mehdi
Peter Collingbourne via llvm-dev
2016-Feb-06 01:04 UTC
[llvm-dev] Reducing DWARF emitter memory consumption
Thanks, I'll look into that. (Though earlier you told me that debug info for types could be extended while walking the IR, so I wouldn't have thought that would have worked.)

Peter

On Fri, Feb 05, 2016 at 03:52:19PM -0800, David Blaikie wrote:
> Will look more closely soon - but I'd really try just writing out type
> units to MC as soon as they're done.
Peter Collingbourne via llvm-dev
2016-Feb-06 01:25 UTC
[llvm-dev] Reducing DWARF emitter memory consumption
On Fri, Feb 05, 2016 at 04:28:53PM -0800, Duncan P. N. Exon Smith wrote:
> Does this design work well with the way llvm-dsymutil uses DIEs and DIEValues?

I haven't looked too closely at what llvm-dsymutil does, so I can't say for sure. If it only uses DIE/DIEValue to produce DIEs, then it most likely should work.

> I'm also interested in whether this will be faster than the current one. I spent some time optimizing the teardown of the DIE tree in the summer and it would be nice not to lose that.

I think it should be possible to tweak the design to use a bump pointer allocator, like we do now for DIE/DIEValue, instead of allocating vectors on the heap, but I haven't fully thought it through.

Thanks,
--
Peter
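For readers unfamiliar with the term, a bump pointer allocator hands out memory by advancing an offset inside large slabs and frees everything at once when the allocator is destroyed, which keeps teardown cheap. The sketch below is a simplified standalone illustration of that technique, not LLVM's allocator; Arena, Reloc, the 4096-byte slab size and demoArena are all assumptions made for the example, and it only supports trivially destructible objects no larger than one slab.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <vector>

class Arena {
  static constexpr size_t SlabSize = 4096;
  std::vector<std::unique_ptr<uint8_t[]>> Slabs;
  size_t Used = SlabSize;  // "full", so the first request allocates a slab

public:
  // Returns Size bytes aligned to Align (a power of two). Nothing is freed
  // individually; all slabs are released when the Arena is destroyed.
  void *allocate(size_t Size, size_t Align) {
    assert(Size > 0 && Size <= SlabSize);
    assert((Align & (Align - 1)) == 0 && Align <= alignof(std::max_align_t));
    size_t Start = (Used + Align - 1) & ~(Align - 1);
    if (Start + Size > SlabSize) {
      Slabs.push_back(std::make_unique<uint8_t[]>(SlabSize));
      Start = 0;
    }
    Used = Start + Size;
    return Slabs.back().get() + Start;
  }

  size_t slabCount() const { return Slabs.size(); }
};

// Hypothetical record type standing in for a relocation entry.
struct Reloc {
  size_t Offset;
  void *Target;
};

// Allocates 1000 records from the arena and reports how many slabs were
// needed, i.e. how many calls reached the system allocator for slab memory.
size_t demoArena() {
  Arena A;
  for (int I = 0; I < 1000; ++I) {
    void *Mem = A.allocate(sizeof(Reloc), alignof(Reloc));
    new (Mem) Reloc{static_cast<size_t>(I), nullptr};
  }
  return A.slabCount();
}
```

The point of the design is that a thousand small records cost only a handful of slab allocations, and destroying the arena releases them without walking each record.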
Peter Collingbourne via llvm-dev
2016-Feb-06 01:40 UTC
[llvm-dev] Reducing DWARF emitter memory consumption
On Fri, Feb 05, 2016 at 04:58:45PM -0800, Mehdi Amini wrote:
> I'm staring at the profile attached to the post #15 on the link you posted, can you confirm that the Dwarf emitter accounts for 6.7%+15.6%=22.3% of the total allocated memory?
> If I understand correctly the numbers, this does not tell anything about how much the Dwarf emitter accounts on the *peak memory* usage (could be more, could be nothing...).

I think these nodes represent allocations from the DWARF emitter:

    DwarfDebug::DwarfDebug      9.5%
    DwarfDebug::endFunction    15.6%
    DIEValueList::addValue      9.1%
    total                      34.2%

I believe they are totals, but my reading of the code is that the DWARF emitter does not deallocate its memory until the end of code generation, so total ~= peak in this case.

I am not surprised by these figures: see e.g. DIEValueList::Node, which in the worst case can use up to 24 bytes for a 1-byte DWARF attribute record.

Ivan was the person who collected the numbers; he may be able to comment more.

Thanks,
--
Peter
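To make the 24-bytes-for-1-byte point concrete, here is a small sketch of how a linked attribute node can dwarf its encoded payload. The layout below is a hypothetical approximation (a next pointer plus a value with pointer-sized storage and some tags), not the actual DIEValueList::Node definition; AttrValue, AttrNode and nodeOverheadBytes are made-up names. On common 64-bit ABIs this particular struct comes to 24 bytes, but the exact size depends on the platform's alignment rules.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory form of one attribute value.
struct AttrValue {
  uint64_t Payload;    // integer value or pointer-sized storage
  uint16_t Attribute;  // which attribute this is
  uint16_t Form;       // how it is encoded in the output
  uint8_t Kind;        // discriminator for what Payload holds
};

// Hypothetical singly linked list node wrapping one value.
struct AttrNode {
  AttrNode *Next;
  AttrValue V;
};

// Size of the same attribute once encoded, e.g. with a 1-byte form such as
// DW_FORM_data1.
constexpr size_t EncodedSize = 1;

static_assert(sizeof(AttrNode) > EncodedSize,
              "the in-memory node is larger than the encoded attribute");

// Bytes held in memory beyond what ends up in the output for this attribute.
constexpr size_t nodeOverheadBytes() { return sizeof(AttrNode) - EncodedSize; }
```

Emitting the encoded byte directly into a buffer, as the proposal suggests, would avoid holding this per-attribute overhead until the end of code generation.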