Duncan P. N. Exon Smith
2014-Oct-17 15:47 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
> On 2014 Oct 16, at 22:09, Sean Silva <chisophugis at gmail.com> wrote:
>
> Dig into this first!

This isn't the right forum for digging into ld64.

> In the OP you are talking about essentially a pure "optimization" (in
> the programmer-wisdom "beware of it" sense), to "save" 2GB of peak memory.
> But from your analysis it's not clear that this 2GB savings actually is
> reflected as peak memory usage saving

It's reflected in both links.

> (since the ~30GB peak might be happening elsewhere in the LTO process).
> It is this ~30GB peak, and not the one you originally analyzed, which
> your customers presumably care about.

This discussion is intentionally focused on llvm-lto.

> To recap, the analysis you performed seems to support neither of the
> following conclusions:
> - Peak memory usage during LTO would be improved by this plan

The analysis is based on the nodes allocated at peak memory.

> - Build time for LTO would be improved by this plan (from what you have
>   posted, you didn't measure time at all)

CPU profiles blame 25-35% of the CPU of the ld64 LTO link on callback-based
metadata RAUW traffic, depending on the C++ program.

> Of course, this is all tangential to the discussion of e.g. a more
> readable/writable .ll form for debug info, or debug info compatibility.
> However, it seems like you jumped into this from the point of view of it
> being an optimization, rather than a maintainability/compatibility thing.

It's both.
Alex Rosenberg
2014-Oct-17 19:02 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On Oct 17, 2014, at 8:47 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>
>> On 2014 Oct 16, at 22:09, Sean Silva <chisophugis at gmail.com> wrote:
>>
>> Dig into this first!
>
> This isn't the right forum for digging into ld64.

I would at least hope that if the issue is in ld64 itself and not LTO, that
we are making sure that lld is not repeating the same choices here.

FWIW, we have our own custom linker, and LTO builds easily chew through
memory into this peak-size realm, way beyond what a non-LTO link would do.

+------------------------------------------------------------+
| Alexander M. Rosenberg       <mailto:alexr at leftfield.org>  |
| Nobody cares what I say, so no disclaimer appears here.     |
+------------------------------------------------------------+
Sean Silva
2014-Oct-17 22:54 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On Fri, Oct 17, 2014 at 8:47 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:

> > On 2014 Oct 16, at 22:09, Sean Silva <chisophugis at gmail.com> wrote:
> >
> > Dig into this first!
>
> This isn't the right forum for digging into ld64.
>
> > In the OP you are talking about essentially a pure "optimization" (in
> > the programmer-wisdom "beware of it" sense), to "save" 2GB of peak memory.
> > But from your analysis it's not clear that this 2GB savings actually is
> > reflected as peak memory usage saving
>
> It's reflected in both links.

Then it follows that there is ~15GB of low-hanging fruit that can be
trivially shaved off by just splitting the last part of LTO into an
independent call into llvm-lto. Although identifying the root cause would
be better; as Alex said, we don't want to make the same mistake in LLD.

It doesn't make sense to follow an "aggressive plan" for 2GB of savings
when there is 15GB of low-hanging fruit. 2GB is *at most* 12% of the total
we can *ever* expect to shave off (2/(15 + 2 + any other saving) <=
2/(15+2) = 12%); this 15GB is *at least* 50% of the memory we can ever
expect to shave off (15/(30 - anything we can't eliminate) >= 15/30 = 50%).

> > (since the ~30GB peak might be happening elsewhere in the LTO process).
> > It is this ~30GB peak, and not the one you originally analyzed, which
> > your customers presumably care about.
>
> This discussion is intentionally focused on llvm-lto.

What is the intention? Do your customers actually run llvm-lto? I and at
least one of my officemates didn't even know it existed.

> > To recap, the analysis you performed seems to support neither of the
> > following conclusions:
> > - Peak memory usage during LTO would be improved by this plan
>
> The analysis is based on the nodes allocated at peak memory.
>
> > - Build time for LTO would be improved by this plan (from what you have
> >   posted, you didn't measure time at all)
>
> CPU profiles blame 25-35% of the CPU of the ld64 LTO link on
> callback-based metadata RAUW traffic, depending on the C++ program.

Wow, that's a lot. How much of this do you think your plan will be able to
shave off? Did you see anything else on the profile? A pie chart would be
much appreciated.

Sorry if I sound "grouchy", but this seems like the classic situation where
someone comes to you asking for X, but what they really want is a solution
to underlying problem Y, for which the best solution, once you actually
analyze Y, is Z. Here X is debug info size, Y is excessive LTO time or
excessive LTO memory usage, and Z is yet to be determined. It sounds to me
like you started with an a priori idea of changing debug info and then
tried to justify it a posteriori. A "solution looking for a problem". And
since the focus has been on debug info, the results haven't been put in
context: 2GB of savings can be small or big; it's negligible compared to
the 15GB flying under the radar.

Is there a particular reason you are so intent on changing the debug info?
It may very well be that a change to debug info will be the right solution.
But your analyses don't seem to be aimed at establishing what the right
solution is (nor does it seem like anybody has done such analyses); your
analyses seem to be aimed at generating numbers for debug info, and as a
result insufficient attention has been paid to putting the numbers in
proper context so that it is clear what their significance is.
To summarize the discussion so far, it seems that the plan ranks on the
following dimensions:

compatibility     - looks interesting
maintainability   - unclear
peak memory usage - ~6% improvement (2GB of 30GB)
build time        - promising, maybe up to ~30%

-- Sean Silva

>
> > Of course, this is all tangential to the discussion of e.g. a more
> > readable/writable .ll form for debug info, or debug info compatibility.
> > However, it seems like you jumped into this from the point of view of it
> > being an optimization, rather than a maintainability/compatibility thing.
>
> It's both.
Bob Wilson
2014-Oct-17 23:53 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
> On Oct 17, 2014, at 12:02 PM, Alex Rosenberg <alexr at leftfield.org> wrote:
>
> On Oct 17, 2014, at 8:47 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>
>>
>>> On 2014 Oct 16, at 22:09, Sean Silva <chisophugis at gmail.com> wrote:
>>>
>>> Dig into this first!
>>
>> This isn't the right forum for digging into ld64.
>
> I would at least hope that if the issue is in ld64 itself and not LTO,
> that we are making sure that lld is not repeating the same choices here.
>
> FWIW, we have our own custom linker, and LTO builds easily chew through
> memory into this peak-size realm, way beyond what a non-LTO link would do.

Yes, absolutely. There are quite a few different aspects of improving LTO
scalability. Most of them are not specific to a particular linker. This is
just one of them. If you guys want to tackle other problems, feel free.
Duncan P. N. Exon Smith
2014-Oct-18 01:04 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
> On Oct 17, 2014, at 3:54 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> this seems like the classic situation where someone comes to you asking
> for X, but what they really want is a solution to underlying problem Y,
> for which the best solution, once you actually analyze Y, is Z.

On the contrary, I came into this expecting to work with Eric on
parallelizing the backend, but consistently found that callback-based RAUW
traffic for metadata took almost as much CPU. Since debug info IR is at the
heart of the RAUW bottleneck, I looked into its memory layout (it's a hog).

I started working on PR17891 because, besides improving the memory usage,
the plan promised to greatly reduce the number of nodes (indirectly
reducing RAUW traffic). In the context of `llvm-lto`, "stage 1" knocked
memory usage down from ~5GB to ~3GB -- but didn't reduce the number of
nodes.

Before starting stages "2" and "3", I counted nodes and operands to find
which to tackle first. Unfortunately, our need to reference local variables
and line table entries directly from the IR proper limits our ability to
refactor the schema, and those are the nodes we have the most of.

This work will drop debug info memory usage in `llvm-lto` further, from
~3GB down to ~1GB. It's also a big step toward improving debug info
maintainability. More importantly (for me), it enables us to refactor
uniquing models and reorder serialization and linking to design away debug
info RAUW traffic -- assuming switching to use-lists doesn't drop it off
the profile.

Regarding "the bigger problem" of LTO memory usage, I expect to see more
than a 2GB drop from this work due to the nature of metadata uniquing and
expiration. I'm not motivated to quantify it, since even a 2GB drop --
when combined with a first-class IR and the RAUW-related speedup -- is
motivation enough.

There's a lot of work left to do in LTO -- once I've finished this, I plan
to look for another bottleneck. Not sure if I'll tackle memory usage or
performance. As Bob suggested, please feel free to join the party! Less
work for me to do later.
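For readers who haven't met the term: below is a minimal, self-contained
C++ sketch of the "callback-based RAUW" pattern the profiles above point
at. It is not LLVM's metadata API -- the class and member names are
invented for illustration -- but it shows why resolving a forward
reference with replaceAllUsesWith costs work proportional to the number of
registered users, which is the traffic that piles up while linking
debug-info-heavy bitcode.

#include <functional>
#include <iostream>
#include <vector>

// Stand-in for a metadata node (NOT an LLVM class).  Each user that points
// at the node registers a callback so it can be told when the node is
// replaced, e.g. when a temporary forward reference is resolved during
// bitcode linking.
struct Node {
  int ID;
  std::vector<std::function<void(Node *)>> ReplaceCallbacks;

  // "Replace all uses with": fire every registered callback.  The cost is
  // proportional to the number of users, and for uniqued debug-info nodes
  // this happens over and over during an LTO link.
  void replaceAllUsesWith(Node *New) {
    for (auto &CB : ReplaceCallbacks)
      CB(New);
    ReplaceCallbacks.clear();
  }
};

// Stand-in for anything holding a reference to a Node.
struct User {
  Node *Operand = nullptr;
  void track(Node *N) {
    Operand = N;
    N->ReplaceCallbacks.push_back([this](Node *New) { Operand = New; });
  }
};

int main() {
  Node Temp{0};   // temporary forward reference
  Node Real{42};  // the node it eventually resolves to
  User U;
  U.track(&Temp);
  Temp.replaceAllUsesWith(&Real);      // walks the callback list
  std::cout << U.Operand->ID << "\n";  // prints 42
}

The plan discussed in this thread attacks that cost from two sides: fewer
uniqued debug-info nodes to begin with, and a uniquing/serialization model
in which most of these replacements never need to happen at all.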