thr3ads.net - llvm dev - [LLVMdev] [RFC] Less memory and greater maintainability for debug info IR [Oct 2014]


Duncan P. N. Exon Smith

2014-Oct-18 01:04 UTC

[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

> On Oct 17, 2014, at 3:54 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> this seems like the classic situation where the someone comes to you asking
> for X, but what they really want is a solution to underlying problem Y, for
> which the best solution, once you actually analyze Y, is Z.

On the contrary, I came into this expecting to work with Eric on
parallelizing the backend, but consistently found that callback-based
RAUW traffic for metadata took almost as much CPU.

Since debug info IR is at the heart of the RAUW bottleneck, I looked
into its memory layout (it's a hog).  I started working on PR17891
because, besides improving the memory usage, the plan promised to
greatly reduce the number of nodes (indirectly reducing RAUW traffic).

In the context of `llvm-lto`, "stage 1" knocked memory usage down from
~5GB to ~3GB -- but didn't reduce the number of nodes.  Before starting
stages "2" and "3", I counted nodes and operands to find which to tackle
first.  Unfortunately, our need to reference local variables and line
table entries directly from the IR-proper limits our ability to refactor
the schema, and those are the nodes we have the most of.

This work will drop debug info memory usage in `llvm-lto` further, from
~3GB down to ~1GB.  It's also a big step toward improving debug info
maintainability.

More importantly (for me), it enables us to refactor uniquing models and
reorder serialization and linking to design away debug info RAUW
traffic -- assuming switching to use-lists doesn't drop it off the
profile.

Regarding "the bigger problem" of LTO memory usage, I expect to see more
than a 2GB drop from this work due to the nature of metadata uniquing
and expiration.  I'm not motivated to quantify it, since even a 2GB drop
-- when combined with a first-class IR and the RAUW-related speedup --
is motivation enough.

There's a lot of work left to do in LTO -- once I've finished this, I
plan to look for another bottleneck.  Not sure if I'll tackle memory
usage or performance.

As Bob suggested, please feel free to join the party!  Less work for me
to do later.

Sean Silva

2014-Oct-18 17:27 UTC


[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

On Fri, Oct 17, 2014 at 6:04 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> > On Oct 17, 2014, at 3:54 PM, Sean Silva <chisophugis at gmail.com> wrote:
> >
> > this seems like the classic situation where the someone comes to you
> asking for X, but what they really want is a solution to underlying problem
> Y, for which the best solution, once you actually analyze Y, is Z.
>
> On the contrary, I came into this expecting to work with Eric on
> parallelizing the backend, but consistently found that callback-based
> RAUW traffic for metadata took almost as much CPU.
>
Derp. My bad. It would be nice in the future if you communicated this
better in the OP. In the OP it sounds like you are doing this solely for
memory, since there is no mention of CPU time or the excessive
callback-based RAUW traffic.

>
> Since debug info IR is at the heart of the RAUW bottleneck, I looked
> into its memory layout (it's a hog).  I started working on PR17891
> because, besides improving the memory usage, the plan promised to
> greatly reduce the number of nodes (indirectly reducing RAUW traffic).
>
> In the context of `llvm-lto`, "stage 1" knocked memory usage down from
> ~5GB to ~3GB -- but didn't reduce the number of nodes.

Please put these numbers in context. In the OP you were talking about
15.3GB peak for llvm-lto. Why is ~5GB now the peak? Also in the OP, the
theoretical improvement, out of 15.3GB, was 2GB after stage 4. How are you
getting 2GB improvement out of ~5GB with only stage 1?

> Before starting
> stages "2" and "3", I counted nodes and operands to find which to tackle
> first.  Unfortunately, our need to reference local variables and line
> table entries directly from the IR-proper limits our ability to refactor
> the schema, and those are the nodes we have the most of.
>
> This work will drop debug info memory usage in `llvm-lto` further, from
> ~3GB down to ~1GB.  It's also a big step toward improving debug info
> maintainability.
>
> More importantly (for me), it enables us to refactor uniquing models and
> reorder serialization and linking to design away debug info RAUW
> traffic -- assuming switching to use-lists doesn't drop it off the
> profile.
>
> Regarding "the bigger problem" of LTO memory usage, I expect to see more
> than a 2GB drop from this work due to the nature of metadata uniquing
> and expiration.  I'm not motivated to quantify it, since even a 2GB drop
> -- when combined with a first-class IR and the RAUW-related speedup --
> is motivation enough.
>
> There's a lot of work left to do in LTO -- once I've finished this, I
> plan to look for another bottleneck.  Not sure if I'll tackle memory
> usage or performance.
>
> As Bob suggested, please feel free to join the party!  Less work for me
> to do later.
>
I'm planning on it.

-- Sean Silva

Duncan P. N. Exon Smith

2014-Oct-18 21:04 UTC


[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

> On 2014 Oct 18, at 10:27, Sean Silva <chisophugis at gmail.com> wrote:
>
> Derp. My bad. It would be nice in the future if you communicated this
> better in the OP. In the OP it sounds like you are doing this solely for
> memory, since there is no mention of CPU time or the excessive
> callback-based RAUW traffic.

It's clear that you found the OP misleading.  I focused this RFC on what
I thought the debug info maintainers would find most compelling.

FTR, it was there, but I admit I assumed (too much) prior familiarity
with the problem space in order to appreciate its import:
>> By leveraging the use-list
>> infrastructure for metadata operands -- i.e., only using value handles
>> for non-metadata operands -- we'll [...] increase
>> RAUW speed.
[snip]
>> 7. (Optional) Refactor `DebugMDNode` sub-classes to minimize RAUW
>>    traffic during bitcode serialization.  Now that metadata types are
>>    known, we can write debug info out in an order that makes it cheap
>>    to read back in.
>> 
>>    Note that using `MDUser` will make RAUW much cheaper, since we're
>>    using the use-list infrastructure for most of them.  If RAUW isn't
>>    showing up in a profile, I may skip this.
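
[Editorial sketch, not part of the original message.]  For readers less
familiar with the term, the idea behind use-list-based RAUW ("replace all
uses with") can be illustrated with a toy model.  This is not LLVM's
implementation: the `Value` and `Node` classes and the
`replace_all_uses_with` function below are hypothetical names chosen for
illustration.  The assumption is only that each value keeps a list of the
operand slots that refer to it, so a replacement is a single walk over that
list rather than one callback per tracked handle.

```python
class Value:
    """A toy value that records every operand slot referring to it."""

    def __init__(self, name):
        self.name = name
        self.uses = []  # list of (user_node, operand_index) pairs


class Node(Value):
    """A toy metadata-like node whose operands are other values."""

    def __init__(self, name, operands):
        super().__init__(name)
        self.operands = list(operands)
        # Register each operand slot in the operand's use-list.
        for index, operand in enumerate(self.operands):
            operand.uses.append((self, index))


def replace_all_uses_with(old, new):
    """Rewrite every operand slot that refers to `old` so it refers to `new`.

    Returns the number of slots rewritten.
    """
    rewritten = len(old.uses)
    for user, index in old.uses:
        user.operands[index] = new
        new.uses.append((user, index))
    old.uses = []
    return rewritten


if __name__ == "__main__":
    a, b = Value("a"), Value("b")
    n1 = Node("n1", [a])
    n2 = Node("n2", [a, a])
    print(replace_all_uses_with(a, b))
    print([op.name for op in n2.operands])
```

The cost of a replacement in this model is proportional to the number of
uses of the old value, which is why reducing the number of nodes (and hence
uses) also reduces RAUW traffic.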

> On 2014 Oct 18, at 10:27, Sean Silva <chisophugis at gmail.com> wrote:
> 
>> Since debug info IR is at the heart of the RAUW bottleneck, I looked
>> into its memory layout (it's a hog).  I started working on PR17891
>> because, besides improving the memory usage, the plan promised to
>> greatly reduce the number of nodes (indirectly reducing RAUW traffic).
>> 
>> In the context of `llvm-lto`, "stage 1" knocked memory usage down from
>> ~5GB to ~3GB -- but didn't reduce the number of nodes.
> 
> Please put these numbers in context. In the OP you were talking about
> 15.3GB peak for llvm-lto. Why is ~5GB now the peak? Also in the OP, the
> theoretical improvement, out of 15.3GB, was 2GB after stage 4. How are you
> getting 2GB improvement out of ~5GB with only stage 1?

I'm talking variously about PR17891 and this proposal.  I suppose that
could be confusing.  "Stage 1" of PR17891 -- have a look at the PR for
context -- yielded 2.2GB reduction in peak memory usage in `llvm-lto`.
After that change, we're at 15.3GB peak in `llvm-lto`.

A conservative estimate of the allocated memory for debug info metadata,
based on counting live nodes and operands (post-change), is ~3GB.  Given
that "stage 1" of PR17891 dropped peak memory usage by 2.2GB, I assume
that the original cost was ~5GB.  This proposal drops the conservative
estimate by a further ~2GB to ~1GB.
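
[Editorial sketch, not part of the original message.]  The figures above
can be laid out as simple arithmetic.  The numbers are the ones quoted in
this message; the variable and function names are illustrative, and the
"peak before stage 1" value is only what the stated 2.2GB reduction
implies, not a separately reported measurement.

```python
def memory_estimates():
    """Reproduce the back-of-the-envelope figures from the thread (in GB)."""
    peak_after_stage1 = 15.3        # reported `llvm-lto` peak after stage 1
    stage1_saving = 2.2             # reported drop in peak from stage 1
    debug_info_after_stage1 = 3.0   # conservative estimate (live nodes/operands)
    proposal_saving = 2.0           # further drop estimated for this proposal

    return {
        # Implied by the two reported numbers above.
        "peak_before_stage1": peak_after_stage1 + stage1_saving,
        # The "~5GB" original debug info cost is this sum, rounded.
        "debug_info_before_stage1": debug_info_after_stage1 + stage1_saving,
        # The "~1GB" target for debug info metadata.
        "debug_info_after_proposal": debug_info_after_stage1 - proposal_saving,
    }


if __name__ == "__main__":
    for name, gigabytes in memory_estimates().items():
        print(f"{name}: ~{gigabytes:.1f}GB")
```

So "~5GB to ~3GB" and "~3GB to ~1GB" refer to the estimated debug info
metadata footprint, while 15.3GB is the overall peak of `llvm-lto`.
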
>> As Bob suggested, please feel free to join the party!  Less work for me
>> to do later.
> 
> I'm planning on it.
Great!
