Duncan P. N. Exon Smith
2015-May-21 00:39 UTC
[LLVMdev] RFC: Reduce the memory footprint of DIEs (and DIEValues)
With just those four patches, memory usage went *up* slightly. Add in
the 5th patch (which does #2 below), and we get an overall memory drop
of 4%.

The intermediate result of a memory increase makes sense. While the
first four patches reduce the number of (and size of) `DIEValue`
allocations, they increase the cost of the `SmallVector` overhead.
0005 (attached) squeezes the abbreviation data into `DIEValue` for
free, next to the discriminator for the union. The 5 patches together
are strictly an improvement to memory usage.

It's nice to see the 4% memory drop, but this is all prep work for #3,
where I expect the biggest memory usage improvements.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0005-WIP-Store-abbreviation-data-directly-in-DIEValue.patch
Type: application/octet-stream
Size: 25110 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/3f1b2889/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: all-2.patch
Type: application/octet-stream
Size: 84163 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/3f1b2889/attachment-0001.obj>
-------------- next part --------------

> On 2015 May 20, at 15:56, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>
> To make this a little more concrete, I just hacked up a couple of
> patches that achieve step #1. (0004 is the key patch, and probably
> should be split up somehow before commit.) I'll collect some
> results and report back.
>
>
> <all.patch><0001-CodeGen-Remove-redundant-DIETypeSignature-dump.patch><0002-CodeGen-Remove-the-vtable-entry-from-DIEValue.patch><0003-CodeGen-Make-DIEValue-Ty-private-NFC.patch><0004-WIP-Change-DIEValue-to-be-stored-by-value.patch>
>
>> On 2015 May 20, at 11:28, Duncan P. N. Exon Smith <duncan at exonsmith.com> wrote:
>>
>> Pete Cooper and I have been looking at memory profiles of running llc on
>> verify-uselistorder.lto.opt.bc (ld -save-temps dump just before CodeGen
>> of building verify-uselistorder with -flto -g). I've attached
>> leak-backend.patch, which we're using to make Instruments more
>> accurate (instead of effectively leaking things onto BumpPtrAllocators,
>> really leak them with malloc()). (I've collected this data on top of a
>> few not-yet-committed patches to cheapen `MCSymbol` and
>> `EmitLabelDifference()` that chop around 8% of memory off the top, but
>> otherwise these numbers should be reproducible in ToT.)
>>
>> The `DIE` class is huge. Directly, it accounts for about 15% of backend
>> memory:
>>
>>       Bytes Used      Count  Symbol Name
>>     77.87 MB   8.4%  318960  llvm::DwarfUnit::createAndAddDIE(unsigned int, llvm::DIE&, llvm::DINode const*)
>>     46.34 MB   5.0%  189810  llvm::DwarfCompileUnit::constructVariableDIEImpl(llvm::DbgVariable const&, bool)
>>     25.57 MB   2.7%  104752  llvm::DwarfCompileUnit::constructInlinedScopeDIE(llvm::LexicalScope*)
>>      8.19 MB   0.8%   33547  llvm::DwarfCompileUnit::constructImportedEntityDIE(llvm::DIImportedEntity const*)
>>
>> A lot of this is the pair of `SmallVector<, 12>` it has for its values
>> (look into `DIEAbbrev` for the second one). Here's a histogram of how
>> many DIEs have each value count:
>>
>>     # of Values  DIEs with #  with # or fewer
>>               0         3128             3128
>>               1       109522           112650
>>               2       180382           293032
>>               3        90836           383868
>>               4       115552           499420
>>               5        90713           590133
>>               6         4125           594258
>>               7        17211           611469
>>               8        18144           629613
>>               9        22805           652418
>>              10          325           652743
>>              11          203           652946
>>              12          245           653191
>>
>> It's crazy that we're paying for 12 up front on every DIE. (This is
>> a reformatted version of num-values-with-totals.txt, which I've
>> attached along with a few other histograms Pete collected.)
>>
>> The `DIEValue`s themselves, which get leaked on the BumpPtrAllocator,
>> also take up a huge amount of memory (around 4%):
>>
>>     Category          Persistent Bytes  # Persistent  # Transient  Total Bytes  # Total
>>     llvm::DIEInteger          19.91 MB        652389            0     19.91 MB   652389
>>     llvm::DIEString           13.83 MB        302181            0     13.83 MB   302181
>>     llvm::DIEEntry            10.91 MB        357506            0     10.91 MB   357506
>>     llvm::DIEDelta            10.03 MB        328542            0     10.03 MB   328542
>>     llvm::DIELabel             5.14 MB        168551            0      5.14 MB   168551
>>     llvm::DIELoc               3.41 MB         13154            0      3.41 MB    13154
>>     llvm::DIELocList           1.86 MB         61055            0      1.86 MB    61055
>>     llvm::DIEBlock            11.69 KB            44            0     11.69 KB       44
>>     llvm::DIEExpr             32 Bytes             1            0     32 Bytes        1
>>
>> We can do better.
>>
>> 1. DIEValue should be a discriminated union that's passed by value
>>    instead of pointer. Most types just have 1 pointer of data. There
>>    are four "big" ones, which still need a side-allocation on the
>>    BumpPtrAllocator: DIELoc, DIEBlock, DIEString, and DIEDelta.
>>    Even for these, the side allocation just needs to store the data
>>    itself (skipping the discriminator and the vtable entry).
>> 2. The contents of DIE's Abbrev field should be integrated with the
>>    list of DIEValues. In particular, DIEValue should contain a
>>    `dwarf::Form` and `dwarf::Attribute`. In total, `sizeof(DIEValue)`
>>    will still be just two pointers (1st pointer: discriminator, Form,
>>    and Attribute; 2nd pointer: data). DIE should stop storing a
>>    `DIEAbbrev` itself, instead constructing one on demand, renaming
>>    `DIE::getAbbrev()` to
>>    `DIE::getOrCreateAbbrev(FoldingSet<DIEAbbrev>&)` or some such.
>> 3. DIE's list of DIEValues is currently a `SmallVector<, 12>`, but a
>>    histogram Pete ran shows that half of DIEs have 2 or fewer values,
>>    and 85% have 4 or fewer values. We're paying for 12 (!) upfront
>>    right now for each DIE. Instead, we should optimize for 2-4
>>    DIEValues. Not sure whether a std::forward_list would suffice, or if
>>    we should do something fancy like:
>>
>>        struct List {
>>          DIEValue Values[2];
>>          PointerIntPair<List *, 1> NextAndSize;
>>        };
>>
>>    Either way we should move the allocations to a BumpPtrAllocator
>>    (trivial if it's a list instead of vector).
>> 4. `DIEBlock` and `DIELoc` inherit both from `DIEValue` and `DIE`, but
>>    they're only ever used as the former. This is just a convenience
>>    for building up and emitting their DIEValues. Now that we've trimmed
>>    down and simplified that functionality in `DIE`, we can extract it
>>    out and make it reusable -- `DIELoc` should "have-a" DIEValue list,
>>    not "be-a" DIE.
>> 5. The children of DIE are stored in a `vector<unique_ptr<DIE>>`, which
>>    requires side allocations. If we use an intrusively linked list,
>>    it'll be easy to avoid side allocations without hitting the
>>    pointer-validity problem highlighted in the header file.
>> 6. Now that DIE has no side allocations, we can move all the DIEs to a
>>    BumpPtrAllocator and remove the malloc traffic.
>>
>> <leak-backend.patch><num-children-by-tag.txt><num-values-by-tag.txt><num-values-with-totals.txt><num-values.txt>
>
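
To make the layout in items 1 and 2 of the quoted proposal concrete, here is
a minimal sketch of a by-value discriminated union whose first word carries
the discriminator, attribute and form, and whose second word carries the
data. Everything in it is hypothetical: `SketchDIEValue`, `SketchLoc`, the
`Kind` enumerators and the 16-bit field widths are assumptions made for
illustration, not the classes in LLVM's DIE.h and not the contents of the
attached patches. Only two kinds are shown, and the numeric attribute/form
codes passed in are left to the caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a "big" payload that keeps its side allocation.
struct SketchLoc {
  unsigned ByteSize;
};

// Hypothetical by-value DIEValue: kind, attribute and form share the first
// 8-byte word, and the payload (or a pointer to it) takes the second.
class SketchDIEValue {
public:
  enum Kind : uint16_t { IsNone, IsInteger, IsLoc };

  SketchDIEValue() : Ty(IsNone), Attribute(0), Form(0) { Data.Integer = 0; }

  // Small payloads are stored inline in the union.
  static SketchDIEValue integer(uint16_t A, uint16_t F, uint64_t V) {
    SketchDIEValue Val(IsInteger, A, F);
    Val.Data.Integer = V;
    return Val;
  }

  // Big payloads live elsewhere; only a pointer is stored here.
  static SketchDIEValue loc(uint16_t A, uint16_t F, const SketchLoc *L) {
    SketchDIEValue Val(IsLoc, A, F);
    Val.Data.Loc = L;
    return Val;
  }

  Kind getKind() const { return Ty; }
  uint16_t getAttribute() const { return Attribute; }
  uint16_t getForm() const { return Form; }

  uint64_t getInteger() const {
    assert(Ty == IsInteger && "not an integer");
    return Data.Integer;
  }
  const SketchLoc *getLoc() const {
    assert(Ty == IsLoc && "not a location");
    return Data.Loc;
  }

private:
  SketchDIEValue(Kind K, uint16_t A, uint16_t F)
      : Ty(K), Attribute(A), Form(F) {
    Data.Integer = 0;
  }

  Kind Ty;            // discriminator for the union below
  uint16_t Attribute; // would be a dwarf::Attribute in the proposal
  uint16_t Form;      // would be a dwarf::Form in the proposal
  union {
    uint64_t Integer;
    const SketchLoc *Loc;
  } Data;
};

static_assert(sizeof(SketchDIEValue) <= 16, "should fit in two 64-bit words");
```

With a layout like this, values can be copied around freely, and only the
"big" kinds need a side allocation, reached through the pointer in the
second word.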
Duncan P. N. Exon Smith
2015-May-21 00:51 UTC
[LLVMdev] RFC: Reduce the memory footprint of DIEs (and DIEValues)
> On 2015 May 20, at 17:39, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>
> With just those four patches, memory usage went *up* slightly. Add in
> the 5th patch (which does #2 below), and we get an overall memory drop
> of 4%.

Forgot to post numbers for this. Peak memory was at 920 MB before
the five patches, and 884 MB after. (These exact numbers won't quite
reproduce in ToT since I still haven't finished cleaning up and
committing the MCSymbol and emitLabelDiff() work I hacked on top of,
but the 36 MB drop should hold.)
Duncan P. N. Exon Smith
2015-May-24 18:55 UTC
[LLVMdev] RFC: Reduce the memory footprint of DIEs (and DIEValues)
> On 2015 May 20, at 17:51, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>
>
>> On 2015 May 20, at 17:39, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>>
>> With just those four patches, memory usage went *up* slightly. Add in
>> the 5th patch (which does #2 below), and we get an overall memory drop
>> of 4%.
>
> Forgot to post numbers for this. Peak memory was at 920 MB before
> the five patches, and 884 MB after. (These exact numbers won't quite
> reproduce in ToT since I still haven't finished cleaning up and
> committing the MCSymbol and emitLabelDiff() work I hacked on top of,
> but the 36 MB drop should hold.)

I've cleaned all this up and committed the most obvious parts, as well
as a few unrelated memory improvements. I'm attaching my (almost?)
ready-to-go patches, which have the following effects on peak memory:

  - 0000: 845 MB (baseline)
  - 0001: 845 MB - refactor
  - 0002: 879 MB - pass DIEValue by value (momentary setback)
  - 0003: 829 MB - merge DIEAbbrevData into DIEValue
  - 0004: 829 MB - refactor
  - 0005: 829 MB - refactor
  - 0006: 829 MB - refactor
  - 0007: 764 MB - change DIE::Values to a linked list
  - 0008: 756 MB - change DIE::Children to a linked list

(Still measuring memory on `llc` for `-flto -g`; details in r236629.)

@Eric, you mentioned offline you'd like to have a look at this proposal
before I proceed -- obviously I've been impatient ;). Let me know if
I'm okay to move forward and start committing (modulo a couple of these
that I'll want David and Fred to review).

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-AsmPrinter-Reorganize-DIE.h-NFC.patch
Type: application/octet-stream
Size: 9324 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150524/bf7f761c/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-AsmPrinter-Change-DIEValue-to-be-stored-by-value.patch
Type: application/octet-stream
Size: 105558 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150524/bf7f761c/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-AsmPrinter-Store-abbreviation-data-directly-in-DIE-a.patch
Type: application/octet-stream
Size: 23713 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150524/bf7f761c/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-AsmPrinter-Stop-exposing-underlying-DIEValue-list-NF.patch
Type: application/octet-stream
Size: 13310 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150524/bf7f761c/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0005-AsmPrinter-Return-added-DIE-from-DIE-addChild.patch
Type: application/octet-stream
Size: 1754 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150524/bf7f761c/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0006-AsmPrinter-Stop-exposing-underlying-DIE-children-lis.patch
Type: application/octet-stream
Size: 4045 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150524/bf7f761c/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0007-AsmPrinter-Convert-DIE-Values-to-a-linked-list.patch
Type: application/octet-stream
Size: 63526 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150524/bf7f761c/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0008-AsmPrinter-Use-an-intrusively-linked-list-for-DIE-Ch.patch
Type: application/octet-stream
Size: 44101 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150524/bf7f761c/attachment-0007.obj>
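
For readers who want a feel for the "two values per node" list floated in
item 3 of the original proposal, here is a rough sketch. It is not taken
from patch 0007 and may differ from it substantially. All names
(`SketchValue`, `SketchChunk`, `SketchValueList`) are hypothetical, a
`std::deque` stands in for a BumpPtrAllocator as the arena, and a plain
pointer plus a count replaces the `PointerIntPair` from the proposal to keep
the example free of LLVM headers.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a by-value DIEValue (two words in the proposal).
struct SketchValue {
  uint64_t Header;
  uint64_t Data;
};

// One chunk holds up to two values inline and links to the next chunk.
struct SketchChunk {
  SketchValue Values[2];
  SketchChunk *Next = nullptr;
  unsigned Size = 0; // number of Values in use: 0, 1 or 2
};

class SketchValueList {
public:
  // Chunks are carved out of A, which stands in for a bump allocator.
  // std::deque keeps existing elements at stable addresses when more are
  // appended with emplace_back(), so the Next pointers stay valid.
  explicit SketchValueList(std::deque<SketchChunk> &A) : Arena(A) {}

  // Appends V, starting a new chunk when the tail chunk is full.
  void push_back(const SketchValue &V) {
    if (!Tail || Tail->Size == 2) {
      Arena.emplace_back();
      SketchChunk *C = &Arena.back();
      if (Tail)
        Tail->Next = C;
      else
        Head = C;
      Tail = C;
    }
    Tail->Values[Tail->Size++] = V;
  }

  // Calls F on every value in insertion order.
  template <typename Fn> void forEach(Fn F) const {
    for (const SketchChunk *C = Head; C; C = C->Next)
      for (unsigned I = 0; I != C->Size; ++I)
        F(C->Values[I]);
  }

  // Counts values by walking the chunks; nothing is cached.
  unsigned size() const {
    unsigned N = 0;
    forEach([&](const SketchValue &) { ++N; });
    return N;
  }

private:
  std::deque<SketchChunk> &Arena;
  SketchChunk *Head = nullptr;
  SketchChunk *Tail = nullptr;
};
```

The trade-off in a layout like this is that a DIE with one or two values
pays for a single small chunk, while a DIE with many values pays one extra
pointer hop per pair instead of reserving 12 slots up front.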