Duncan P. N. Exon Smith
2014-Oct-18 01:04 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
> On Oct 17, 2014, at 3:54 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> this seems like the classic situation where someone comes to you asking
> for X, but what they really want is a solution to underlying problem Y,
> for which the best solution, once you actually analyze Y, is Z.

On the contrary, I came into this expecting to work with Eric on
parallelizing the backend, but consistently found that callback-based
RAUW traffic for metadata took almost as much CPU.

Since debug info IR is at the heart of the RAUW bottleneck, I looked
into its memory layout (it's a hog). I started working on PR17891
because, besides improving the memory usage, the plan promised to
greatly reduce the number of nodes (indirectly reducing RAUW traffic).

In the context of `llvm-lto`, "stage 1" knocked memory usage down from
~5GB to ~3GB -- but didn't reduce the number of nodes. Before starting
stages "2" and "3", I counted nodes and operands to find which to tackle
first. Unfortunately, our need to reference local variables and line
table entries directly from the IR proper limits our ability to refactor
the schema, and those are the nodes we have the most of.

This work will drop debug info memory usage in `llvm-lto` further, from
~3GB down to ~1GB. It's also a big step toward improving debug info
maintainability.

More importantly (for me), it enables us to refactor uniquing models and
reorder serialization and linking to design away debug info RAUW
traffic -- assuming switching to use-lists doesn't drop it off the
profile.

Regarding "the bigger problem" of LTO memory usage, I expect to see more
than a 2GB drop from this work due to the nature of metadata uniquing
and expiration. I'm not motivated to quantify it, since even a 2GB drop
-- when combined with a first-class IR and the RAUW-related speedup --
is motivation enough.

There's a lot of work left to do in LTO -- once I've finished this, I
plan to look for another bottleneck. I'm not sure yet whether I'll
tackle memory usage or performance.

As Bob suggested, please feel free to join the party! Less work for me
to do later.
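
To make the contrast concrete, here is a rough, self-contained sketch of
the two tracking schemes at play. It is illustrative only -- it is not
LLVM's actual ValueHandle or Metadata implementation, and every name in it
is made up -- but it shows why a callback-style side table makes each RAUW
a map lookup plus per-reference bookkeeping, while an intrusive use-list
makes it a straight walk of the value's uses.

    // Illustrative sketch only -- not LLVM's actual ValueHandle/Metadata code.
    // (a) models the callback/side-table style of operand tracking; (b) models
    // Value-style intrusive use-lists.  All names here are made up.
    #include <cstdio>
    #include <map>

    struct Value;
    struct Use;

    // --- (a) callback-style tracking ---------------------------------------
    // Every referencing slot registers in a side table keyed by the pointed-to
    // value.  RAUW is a map lookup plus one "callback" per reference, and each
    // entry carries its own map-node overhead.
    struct Slot { Value *Ptr = nullptr; };
    static std::multimap<Value *, Slot *> HandleTable;

    struct Value {
      const char *Name;
      Use *FirstUse = nullptr;            // head of the intrusive use-list (b)
      explicit Value(const char *N) : Name(N) {}
    };

    static void setTracked(Slot &S, Value *V) {
      S.Ptr = V;
      HandleTable.insert({V, &S});
    }

    static void rauwViaCallbacks(Value *Old, Value *New) {
      auto Range = HandleTable.equal_range(Old);
      for (auto I = Range.first; I != Range.second; ++I) {
        I->second->Ptr = New;             // fire the per-reference "callback"
        HandleTable.insert({New, I->second});
      }
      HandleTable.erase(Range.first, Range.second);
    }

    // --- (b) use-list tracking ----------------------------------------------
    // Each use is a node in an intrusive list hanging off the value, so RAUW
    // is a plain walk of that list: no side-table lookups, no allocations.
    struct Use {
      Value *Ptr = nullptr;
      Use *Next = nullptr;
    };

    static void addUse(Use &U, Value *V) {
      U.Ptr = V;
      U.Next = V->FirstUse;
      V->FirstUse = &U;
    }

    static void rauwViaUseList(Value *Old, Value *New) {
      for (Use *U = Old->FirstUse; U;) {
        Use *Next = U->Next;
        U->Ptr = New;
        U->Next = New->FirstUse;          // splice the use onto New's list
        New->FirstUse = U;
        U = Next;
      }
      Old->FirstUse = nullptr;
    }

    int main() {
      Value A("A"), B("B");
      Slot S0, S1;
      setTracked(S0, &A);
      setTracked(S1, &A);
      rauwViaCallbacks(&A, &B);
      std::printf("callback RAUW: %s %s\n", S0.Ptr->Name, S1.Ptr->Name);

      Value C("C"), D("D");
      Use U0, U1;
      addUse(U0, &C);
      addUse(U1, &C);
      rauwViaUseList(&C, &D);
      std::printf("use-list RAUW: %s %s\n", U0.Ptr->Name, U1.Ptr->Name);
    }

The second variant is the use-list infrastructure the proposal wants
metadata operands to share; that sharing is what is expected to make RAUW
cheap enough to fall off the profile.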
Sean Silva
2014-Oct-18 17:27 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On Fri, Oct 17, 2014 at 6:04 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:

> > On Oct 17, 2014, at 3:54 PM, Sean Silva <chisophugis at gmail.com> wrote:
> >
> > this seems like the classic situation where someone comes to you
> > asking for X, but what they really want is a solution to underlying
> > problem Y, for which the best solution, once you actually analyze Y,
> > is Z.
>
> On the contrary, I came into this expecting to work with Eric on
> parallelizing the backend, but consistently found that callback-based
> RAUW traffic for metadata took almost as much CPU.

Derp. My bad. It would be nice in the future if you communicated this
better in the OP. In the OP it sounds like you are doing this solely for
memory, since there is no mention of CPU time or the excessive
callback-based RAUW traffic.

> Since debug info IR is at the heart of the RAUW bottleneck, I looked
> into its memory layout (it's a hog). I started working on PR17891
> because, besides improving the memory usage, the plan promised to
> greatly reduce the number of nodes (indirectly reducing RAUW traffic).
>
> In the context of `llvm-lto`, "stage 1" knocked memory usage down from
> ~5GB to ~3GB -- but didn't reduce the number of nodes.

Please put these numbers in context. In the OP you were talking about
15.3GB peak for llvm-lto. Why is ~5GB now the peak? Also in the OP, the
theoretical improvement, out of 15.3GB, was 2GB after stage 4. How are
you getting a 2GB improvement out of ~5GB with only stage 1?

> Before starting stages "2" and "3", I counted nodes and operands to
> find which to tackle first. Unfortunately, our need to reference local
> variables and line table entries directly from the IR proper limits our
> ability to refactor the schema, and those are the nodes we have the
> most of.
>
> This work will drop debug info memory usage in `llvm-lto` further, from
> ~3GB down to ~1GB. It's also a big step toward improving debug info
> maintainability.
>
> More importantly (for me), it enables us to refactor uniquing models
> and reorder serialization and linking to design away debug info RAUW
> traffic -- assuming switching to use-lists doesn't drop it off the
> profile.
>
> Regarding "the bigger problem" of LTO memory usage, I expect to see
> more than a 2GB drop from this work due to the nature of metadata
> uniquing and expiration. I'm not motivated to quantify it, since even a
> 2GB drop -- when combined with a first-class IR and the RAUW-related
> speedup -- is motivation enough.
>
> There's a lot of work left to do in LTO -- once I've finished this, I
> plan to look for another bottleneck. I'm not sure yet whether I'll
> tackle memory usage or performance.
>
> As Bob suggested, please feel free to join the party! Less work for me
> to do later.

I'm planning on it.

-- Sean Silva
Duncan P. N. Exon Smith
2014-Oct-18 21:04 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
> On 2014 Oct 18, at 10:27, Sean Silva <chisophugis at gmail.com> wrote:
>
> Derp. My bad. It would be nice in the future if you communicated this
> better in the OP. In the OP it sounds like you are doing this solely
> for memory, since there is no mention of CPU time or the excessive
> callback-based RAUW traffic.

It's clear that you found the OP misleading. I focused this RFC on what I
thought the debug info maintainers would find most compelling. FTR, it
was there, but I admit I assumed (too much) prior familiarity with the
problem space needed to appreciate its import:

>> By leveraging the use-list infrastructure for metadata operands --
>> i.e., only using value handles for non-metadata operands -- we'll [...]
>> increase RAUW speed.

[snip]

>> 7. (Optional) Refactor `DebugMDNode` sub-classes to minimize RAUW
>>    traffic during bitcode serialization. Now that metadata types are
>>    known, we can write debug info out in an order that makes it cheap
>>    to read back in.
>>
>>    Note that using `MDUser` will make RAUW much cheaper, since we're
>>    using the use-list infrastructure for most of them. If RAUW isn't
>>    showing up in a profile, I may skip this.

> On 2014 Oct 18, at 10:27, Sean Silva <chisophugis at gmail.com> wrote:
>
>> Since debug info IR is at the heart of the RAUW bottleneck, I looked
>> into its memory layout (it's a hog). I started working on PR17891
>> because, besides improving the memory usage, the plan promised to
>> greatly reduce the number of nodes (indirectly reducing RAUW traffic).
>>
>> In the context of `llvm-lto`, "stage 1" knocked memory usage down from
>> ~5GB to ~3GB -- but didn't reduce the number of nodes.
>
> Please put these numbers in context. In the OP you were talking about
> 15.3GB peak for llvm-lto. Why is ~5GB now the peak? Also in the OP, the
> theoretical improvement, out of 15.3GB, was 2GB after stage 4. How are
> you getting a 2GB improvement out of ~5GB with only stage 1?

I'm talking variously about PR17891 and this proposal; I can see how that
could be confusing.

"Stage 1" of PR17891 -- have a look at the PR for context -- yielded a
2.2GB reduction in peak memory usage in `llvm-lto`. After that change,
we're at 15.3GB peak in `llvm-lto`.

A conservative estimate of the allocated memory for debug info metadata,
based on counting live nodes and operands (post-change), is ~3GB. Given
that "stage 1" of PR17891 dropped peak memory usage by 2.2GB, I assume
that the original cost was ~5GB.

This proposal drops the conservative estimate by a further ~2GB, to ~1GB.

>> As Bob suggested, please feel free to join the party! Less work for me
>> to do later.
>
> I'm planning on it.

Great!
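
As a back-of-the-envelope illustration of the kind of "count live nodes
and operands" estimate described above, here is a minimal sketch. Every
number in it -- the node and operand counts and the per-node/per-operand
byte costs -- is a placeholder assumption, not a measurement from
`llvm-lto` or LLVM's real data-structure layouts.

    // Hypothetical numbers only: this shows the shape of a "count live nodes
    // and operands" estimate, not real counts or LLVM's real struct layouts.
    #include <cstdio>

    int main() {
      const double LiveNodes    = 10e6; // assumed count of live metadata nodes
      const double LiveOperands = 40e6; // assumed count of operands across them

      const double BytesPerNode    = 64; // assumed per-node allocation overhead
      const double BytesPerOperand = 32; // assumed per-operand tracking cost

      double Bytes = LiveNodes * BytesPerNode + LiveOperands * BytesPerOperand;
      std::printf("conservative estimate: %.2f GB\n",
                  Bytes / (1024.0 * 1024.0 * 1024.0));
    }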