Duncan P. N. Exon Smith
2015-May-20 18:28 UTC
[LLVMdev] RFC: Reduce the memory footprint of DIEs (and DIEValues)
Pete Cooper and I have been looking at memory profiles of running llc on
verify-uselistorder.lto.opt.bc (ld -save-temps dump just before CodeGen
of building verify-uselistorder with -flto -g).  I've attached
leak-backend.patch, which we're using to make Instruments more
accurate (instead of effectively leaking things onto BumpPtrAllocators,
really leak them with malloc()).  (I've collected this data on top of a
few not-yet-committed patches to cheapen `MCSymbol` and
`EmitLabelDifference()` that chop around 8% of memory off the top, but
otherwise these numbers should be reproducible in ToT.)

The `DIE` class is huge.  Directly, it accounts for about 15% of backend
memory:

    Bytes Used         Count   Symbol Name
    77.87 MB   8.4%   318960   llvm::DwarfUnit::createAndAddDIE(unsigned int, llvm::DIE&, llvm::DINode const*)
    46.34 MB   5.0%   189810   llvm::DwarfCompileUnit::constructVariableDIEImpl(llvm::DbgVariable const&, bool)
    25.57 MB   2.7%   104752   llvm::DwarfCompileUnit::constructInlinedScopeDIE(llvm::LexicalScope*)
     8.19 MB   0.8%    33547   llvm::DwarfCompileUnit::constructImportedEntityDIE(llvm::DIImportedEntity const*)

A lot of this is the pair of `SmallVector<, 12>` it has for its values
(look into `DIEAbbrev` for the second one).  Here's a histogram of how
many DIEs have each value count:

    # of Values   DIEs with #   with # or fewer
              0          3128              3128
              1        109522            112650
              2        180382            293032
              3         90836            383868
              4        115552            499420
              5         90713            590133
              6          4125            594258
              7         17211            611469
              8         18144            629613
              9         22805            652418
             10           325            652743
             11           203            652946
             12           245            653191

It's crazy that we're paying for 12 up front on every DIE.  (This is
a reformatted version of num-values-with-totals.txt, which I've
attached along with a few other histograms Pete collected.)
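The cumulative column can be turned into shares of all DIEs with a few lines of code. This is only a sketch: the counts are copied from the histogram above, and the helper name `cumulativeShare` is made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// DIE counts indexed by number of values (0..12), copied from the histogram.
constexpr std::array<std::uint64_t, 13> DIEsByValueCount = {
    3128,  109522, 180382, 90836, 115552, 90713, 4125,
    17211, 18144,  22805,  325,   203,    245};

// Hypothetical helper: fraction of DIEs that have MaxValues or fewer values.
constexpr double cumulativeShare(std::size_t MaxValues) {
  std::uint64_t Covered = 0, Total = 0;
  for (std::size_t I = 0; I < DIEsByValueCount.size(); ++I) {
    Total += DIEsByValueCount[I];
    if (I <= MaxValues)
      Covered += DIEsByValueCount[I];
  }
  return static_cast<double>(Covered) / static_cast<double>(Total);
}
```

For this table, `cumulativeShare(2)` comes out at roughly 0.45 and `cumulativeShare(4)` at roughly 0.76, so inline storage for 12 values goes unused in the large majority of DIEs.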
The `DIEValue`s themselves, which get leaked on the BumpPtrAllocator,
also take up a huge amount of memory (around 4%):

    Category           Persistent Bytes   # Persistent   # Transient   Total Bytes   # Total
    llvm::DIEInteger   19.91 MB                 652389             0   19.91 MB       652389
    llvm::DIEString    13.83 MB                 302181             0   13.83 MB       302181
    llvm::DIEEntry     10.91 MB                 357506             0   10.91 MB       357506
    llvm::DIEDelta     10.03 MB                 328542             0   10.03 MB       328542
    llvm::DIELabel      5.14 MB                 168551             0    5.14 MB       168551
    llvm::DIELoc        3.41 MB                  13154             0    3.41 MB        13154
    llvm::DIELocList    1.86 MB                  61055             0    1.86 MB        61055
    llvm::DIEBlock     11.69 KB                     44             0   11.69 KB           44
    llvm::DIEExpr      32 Bytes                      1             0   32 Bytes            1

We can do better.

 1. DIEValue should be a discriminated union that's passed by value
    instead of pointer.  Most types just have 1 pointer of data.  There
    are four "big" ones, which still need a side-allocation on the
    BumpPtrAllocator: DIELoc, DIEBlock, DIEString, and DIEDelta.
    Even for these, the side allocation just needs to store the data
    itself (skipping the discriminator and the vtable entry).
 2. The contents of DIE's Abbrev field should be integrated with the
    list of DIEValues.  In particular, DIEValue should contain a
    `dwarf::Form` and `dwarf::Attribute`.  In total, `sizeof(DIEValue)`
    will still be just two pointers (1st pointer: discriminator, Form,
    and Attribute; 2nd pointer: data).  DIE should stop storing a
    `DIEAbbrev` itself, instead constructing one on demand, renaming
    `DIE::getAbbrev()` to
    `DIE::getOrCreateAbbrev(FoldingSet<DIEAbbrev>&)` or some such.
 3. DIE's list of DIEValues is currently a `SmallVector<, 12>`, but a
    histogram Pete ran shows that half of DIEs have 2 or fewer values,
    and 85% have 4 or fewer values.  We're paying for 12 (!) upfront
    right now for each DIE.  Instead, we should optimize for 2-4
    DIEValues.  Not sure whether a std::forward_list would suffice, or if
    we should do something fancy like:

        struct List {
          DIEValue Values[2];
          PointerIntPair<List *, 1> NextAndSize;
        };

    Either way we should move the allocations to a BumpPtrAllocator
    (trivial if it's a list instead of vector).
 4. `DIEBlock` and `DIELoc` inherit both from `DIEValue` and `DIE`, but
    they're only ever used as the former.  This is just a convenience
    for building up and emitting their DIEValues.  Now that we've trimmed
    down and simplified that functionality in `DIE`, we can extract it
    out and make it reusable -- `DIELoc` should "have-a" DIEValue list,
    not "be-a" DIE.
 5. The children of DIE are stored in a `vector<unique_ptr<DIE>>`, which
    requires side allocations.  If we use an intrusively linked list,
    it'll be easy to avoid side allocations without hitting the
    pointer-validity problem highlighted in the header file.
 6. Now that DIE has no side allocations, we can move all the DIEs to a
    BumpPtrAllocator and remove the malloc traffic.

Attachments:
  leak-backend.patch (5240 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/687bdfa6/attachment.obj>
  num-children-by-tag.txt: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/687bdfa6/attachment.txt>
  num-values-by-tag.txt: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/687bdfa6/attachment-0001.txt>
  num-values-with-totals.txt: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/687bdfa6/attachment-0002.txt>
  num-values.txt: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/687bdfa6/attachment-0003.txt>
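To illustrate point 1, here is a minimal sketch of a value type that is a discriminated union passed by value, with small payloads stored inline and a "big" payload referenced by pointer. It is not LLVM's actual class: `ValueSketch`, `DeltaPayload`, and the accessor names are hypothetical stand-ins.

```cpp
#include <cstdint>

// Stand-in for a "big" payload that lives in a side allocation.
struct DeltaPayload {
  const char *Hi;
  const char *Lo;
};

// Hypothetical by-value value type: a discriminator plus one word of data,
// with no vtable pointer.
class ValueSketch {
public:
  enum class Kind : std::uint8_t { None, Integer, Delta };

  ValueSketch() = default;
  explicit ValueSketch(std::uint64_t V) : K(Kind::Integer) { Data.Int = V; }
  explicit ValueSketch(const DeltaPayload *D) : K(Kind::Delta) {
    Data.Delta = D;
  }

  Kind kind() const { return K; }
  // Callers are expected to check kind() before using these accessors.
  std::uint64_t integer() const { return Data.Int; }
  const DeltaPayload *delta() const { return Data.Delta; }

private:
  Kind K = Kind::None;
  union Storage {
    std::uint64_t Int;         // small payloads are stored inline
    const DeltaPayload *Delta; // big payloads are referenced by pointer
  } Data = {0};
};

static_assert(sizeof(ValueSketch) <= 2 * sizeof(std::uint64_t),
              "discriminator plus payload should fit in two 64-bit words");
```

Copying a `ValueSketch` copies two words and never allocates; only the payload behind `Delta` needs separate storage.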
Frédéric Riss
2015-May-20 18:47 UTC
[LLVMdev] RFC: Reduce the memory footprint of DIEs (and DIEValues)
This is awesome.

> On May 20, 2015, at 11:28 AM, Duncan P. N. Exon Smith <duncan at exonsmith.com> wrote:
>
> [...]
>
> 3. DIE's list of DIEValues is currently a `SmallVector<, 12>`, but a
>    histogram Pete ran shows that half of DIEs have 2 or fewer values,
>    and 85% have 4 or fewer values.  We're paying for 12 (!) upfront
>    right now for each DIE.  Instead, we should optimize for 2-4
>    DIEValues.  Not sure whether a std::forward_list would suffice, or if
>    we should do something fancy like:

DIEValues are already allocated from a BumpPtrAllocator, thus using a
forward_list wouldn’t be practical. You’d also need to use an ilist, or your
fancy alternative.

>        struct List {
>          DIEValue Values[2];
>          PointerIntPair<List *, 1> NextAndSize;
>        };
>
>    Either way we should move the allocations to a BumpPtrAllocator
>    (trivial if it's a list instead of vector).
> 4. `DIEBlock` and `DIELoc` inherit both from `DIEValue` and `DIE`, but
>    they're only ever used as the former.  This is just a convenience
>    for building up and emitting their DIEValues.  Now that we've trimmed
>    down and simplified that functionality in `DIE`, we can extract it
>    out and make it reusable -- `DIELoc` should "have-a" DIEValue list,
>    not "be-a" DIE.

Much needed cleanup!

> 5. The children of DIE are stored in a `vector<unique_ptr<DIE>>`, which
>    requires side allocations.  If we use an intrusively linked list,
>    it'll be easy to avoid side allocations without hitting the
>    pointer-validity problem highlighted in the header file.
> 6. Now that DIE has no side allocations, we can move all the DIEs to a
>    BumpPtrAllocator and remove the malloc traffic.

Thanks!
Fred
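One way to read the "fancy alternative" quoted above is a chunked singly linked list: each node holds two values, and the low bit of the next pointer records whether the second slot is in use. The sketch below assumes that reading. `Value`, `Chunk`, and `ChunkList` are hypothetical names, the pointer tagging is done by hand rather than with `PointerIntPair`, and a `std::deque` stands in for a bump allocator because it also keeps element addresses stable.

```cpp
#include <cstdint>
#include <deque>

struct Value { std::uint64_t Bits[2]; }; // stand-in for a two-word value

struct Chunk {
  Value Values[2];
  std::uintptr_t NextAndFull = 0; // next pointer; low bit set if Values[1] is used

  bool full() const { return NextAndFull & 1; }
  void setFull() { NextAndFull |= 1; }
  Chunk *next() const {
    return reinterpret_cast<Chunk *>(NextAndFull & ~std::uintptr_t(1));
  }
  void setNext(Chunk *N) {
    NextAndFull = reinterpret_cast<std::uintptr_t>(N) | (NextAndFull & 1);
  }
};
static_assert(alignof(Chunk) >= 2, "low pointer bit must be free for the tag");

class ChunkList {
public:
  explicit ChunkList(std::deque<Chunk> &Storage) : Arena(Storage) {}

  void push_back(const Value &V) {
    if (Tail && !Tail->full()) { // second slot of the last chunk is free
      Tail->Values[1] = V;
      Tail->setFull();
      return;
    }
    Arena.emplace_back(); // "bump" a fresh chunk; older chunks do not move
    Chunk *C = &Arena.back();
    C->Values[0] = V;
    if (Tail)
      Tail->setNext(C);
    else
      Head = C;
    Tail = C;
  }

  // Visits values in insertion order.
  template <typename Fn> void forEach(Fn F) const {
    for (const Chunk *C = Head; C; C = C->next()) {
      F(C->Values[0]);
      if (C->full())
        F(C->Values[1]);
    }
  }

private:
  std::deque<Chunk> &Arena;
  Chunk *Head = nullptr;
  Chunk *Tail = nullptr;
};
```

Only the last chunk can be half full, which is why a single bit is enough. Five values occupy three chunks here, and nothing is reserved up front for an empty list.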
Duncan P. N. Exon Smith
2015-May-20 22:52 UTC
[LLVMdev] RFC: Reduce the memory footprint of DIEs (and DIEValues)
> On 2015 May 20, at 11:47, Frédéric Riss <friss at apple.com> wrote:
>
>> On May 20, 2015, at 11:28 AM, Duncan P. N. Exon Smith <duncan at exonsmith.com> wrote:
>>
>> [...]
>>
>> 3. DIE's list of DIEValues is currently a `SmallVector<, 12>`, but a
>>    histogram Pete ran shows that half of DIEs have 2 or fewer values,
>>    and 85% have 4 or fewer values.  We're paying for 12 (!) upfront
>>    right now for each DIE.  Instead, we should optimize for 2-4
>>    DIEValues.  Not sure whether a std::forward_list would suffice, or if
>>    we should do something fancy like:
>
> DIEValues are already allocated from a BumpPtrAllocator, thus using a
> forward_list wouldn’t be practical. You’d also need to use an ilist, or your
> fancy alternative.

Well, we could pass a BumpPtrAllocator to forward_list somehow, but maybe
an ilist would be better.  I'll look more closely once I get there :).
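For reference, handing a bump-style allocator to `std::forward_list` is possible through the standard allocator interface. The sketch below uses a toy fixed-size arena rather than LLVM's `BumpPtrAllocator`. `Arena` and `ArenaAllocator` are hypothetical names, and the 4096-byte slab is an arbitrary choice.

```cpp
#include <cstddef>
#include <forward_list>
#include <new>

// Toy bump arena: hands out memory from one fixed slab and never frees it.
struct Arena {
  alignas(std::max_align_t) unsigned char Slab[4096];
  std::size_t Used = 0;

  void *allocate(std::size_t Bytes, std::size_t Align) {
    std::size_t Start = (Used + Align - 1) / Align * Align; // round up
    if (Start + Bytes > sizeof(Slab))
      throw std::bad_alloc();
    Used = Start + Bytes;
    return Slab + Start;
  }
};

// Minimal C++11 allocator that forwards to the arena.
template <typename T> struct ArenaAllocator {
  using value_type = T;
  Arena *Mem;

  explicit ArenaAllocator(Arena &A) : Mem(&A) {}
  template <typename U>
  ArenaAllocator(const ArenaAllocator<U> &Other) : Mem(Other.Mem) {}

  T *allocate(std::size_t N) {
    return static_cast<T *>(Mem->allocate(N * sizeof(T), alignof(T)));
  }
  void deallocate(T *, std::size_t) {} // the arena is released all at once
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T> &L, const ArenaAllocator<U> &R) {
  return L.Mem == R.Mem;
}
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T> &L, const ArenaAllocator<U> &R) {
  return !(L == R);
}
```

A list is then declared as `std::forward_list<int, ArenaAllocator<int>> L(Alloc);` where `Alloc` wraps an `Arena`. The cost is that every list object carries the allocator, which is one reason an intrusive list may be the simpler fit.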
Duncan P. N. Exon Smith
2015-May-20 22:56 UTC
[LLVMdev] RFC: Reduce the memory footprint of DIEs (and DIEValues)
To make this a little more concrete, I just hacked up a couple of
patches that achieve step #1.  (0004 is the key patch, and probably
should be split up somehow before commit.)  I'll collect some
results and report back.

Attachments:
  all.patch (73699 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/d59e8a6f/attachment.obj>
  0001-CodeGen-Remove-redundant-DIETypeSignature-dump.patch (1259 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/d59e8a6f/attachment-0001.obj>
  0002-CodeGen-Remove-the-vtable-entry-from-DIEValue.patch (23708 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/d59e8a6f/attachment-0002.obj>
  0003-CodeGen-Make-DIEValue-Ty-private-NFC.patch (721 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/d59e8a6f/attachment-0003.obj>
  0004-WIP-Change-DIEValue-to-be-stored-by-value.patch (81233 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/d59e8a6f/attachment-0004.obj>
Duncan P. N. Exon Smith
2015-May-21 00:39 UTC
[LLVMdev] RFC: Reduce the memory footprint of DIEs (and DIEValues)
With just those four patches, memory usage went *up* slightly.  Add in
the 5th patch (which does #2 below), and we get an overall memory drop
of 4%.

The intermediate result of a memory increase makes sense.  While the
first four patches reduce the number of (and size of) `DIEValue`
allocations, they increase the cost of the `SmallVector` overhead.
0005 (attached) squeezes the abbreviation data into `DIEValue` for free,
next to the discriminator for the union.  The 5 patches together are
strictly an improvement to memory usage.

It's nice to see the 4% memory drop, but this is all prep work for #3,
where I expect the biggest memory usage improvements.

Attachments:
  0005-WIP-Store-abbreviation-data-directly-in-DIEValue.patch (25110 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/3f1b2889/attachment.obj>
  all-2.patch (84163 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/3f1b2889/attachment-0001.obj>

> On 2015 May 20, at 15:56, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>
> To make this a little more concrete, I just hacked up a couple of
> patches that achieve step #1.  (0004 is the key patch, and probably
> should be split up somehow before commit.)  I'll collect some
> results and report back.
>
>> On 2015 May 20, at 11:28, Duncan P. N. Exon Smith <duncan at exonsmith.com> wrote:
>>
>> [...]
>>
>> 2. The contents of DIE's Abbrev field should be integrated with the
>>    list of DIEValues.  In particular, DIEValue should contain a
>>    `dwarf::Form` and `dwarf::Attribute`.  In total, `sizeof(DIEValue)`
>>    will still be just two pointers (1st pointer: discriminator, Form,
>>    and Attribute; 2nd pointer: data).  DIE should stop storing a
>>    `DIEAbbrev` itself, instead constructing one on demand, renaming
>>    `DIE::getAbbrev()` to
>>    `DIE::getOrCreateAbbrev(FoldingSet<DIEAbbrev>&)` or some such.
>>
>> [...]
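The "for free" part can be illustrated with struct padding: when a small discriminator sits in front of an 8-byte-aligned payload, the bytes between them are otherwise wasted, so two 16-bit fields fit there without growing the struct. The types below are hypothetical stand-ins, not the contents of patch 0005. `alignas(8)` pins the payload alignment to what a typical 64-bit target uses, and the `uint16_t` fields stand in for `dwarf::Attribute` and `dwarf::Form`.

```cpp
#include <cstdint>

// Discriminator plus one word of payload; bytes 2..7 are padding.
struct ValueOnly {
  std::uint16_t Kind;
  alignas(8) std::uint64_t Data;
};

// Same payload, with attribute and form stored in what was padding.
struct ValueWithAbbrev {
  std::uint16_t Kind;
  std::uint16_t Attribute; // stand-in for dwarf::Attribute
  std::uint16_t Form;      // stand-in for dwarf::Form
  alignas(8) std::uint64_t Data;
};

static_assert(sizeof(ValueWithAbbrev) == sizeof(ValueOnly),
              "attribute and form should fit in the existing padding");
static_assert(sizeof(ValueWithAbbrev) == 16, "two 64-bit words in total");
```

With the attribute and form carried per value, a separate per-DIE vector of abbreviation data is no longer needed, which matches the direction described in point 2 of the RFC.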
Sean Silva
2015-May-21 01:10 UTC
[LLVMdev] RFC: Reduce the memory footprint of DIEs (and DIEValues)
Just wanted to say awesome data!

-- Sean Silva