thr3ads.net - llvm dev - [LLVMdev] [RFC] Less memory and greater maintainability for debug info IR [Oct 2014]


Duncan P. N. Exon Smith

2014-Oct-17 15:47 UTC

[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

> On 2014 Oct 16, at 22:09, Sean Silva <chisophugis at gmail.com> wrote:
> 
> Dig into this first!
This isn't the right forum for digging into ld64.
> In the OP you are talking about essentially a pure "optimization"
> (in the programmer-wisdom "beware of it" sense), to "save"
> 2GB of peak memory. But from your analysis it's not clear that this 2GB
> savings actually is reflected as peak memory usage saving
It's reflected in both links.
> (since the ~30GB peak might be happening elsewhere in the LTO process). It
> is this ~30GB peak, and not the one you originally analyzed, which your
> customers presumably care about.
This discussion is intentionally focused on llvm-lto.
> To recap, the analysis you performed seems to support neither of the
> following conclusions:
> - Peak memory usage during LTO would be improved by this plan
The analysis is based on the nodes allocated at peak memory.
> - Build time for LTO would be improved by this plan (from what you have
> posted, you didn't measure time at all)
CPU profiles blame 25-35% of the CPU of the ld64 LTO link on
callback-based metadata RAUW traffic, depending on the C++ program.
> Of course, this is all tangential to the discussion of e.g. a more
> readable/writable .ll form for debug info, or debug info compatibility. However,
> it seems like you jumped into this from the point of view of it being an
> optimization, rather than a maintainability/compatibility thing.
It's both.

Alex Rosenberg

2014-Oct-17 19:02 UTC

[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

On Oct 17, 2014, at 8:47 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
>> On 2014 Oct 16, at 22:09, Sean Silva <chisophugis at gmail.com> wrote:
>> 
>> Dig into this first!
> 
> This isn't the right forum for digging into ld64.
I would at least hope that if the issue is in ld64 itself and not LTO, that we
are making sure that lld is not repeating the same choices here.

FWIW, we have our own custom linker and LTO builds easily chew through memory
into this peak size realm, way beyond what a non-LTO link would do.

+------------------------------------------------------------+
| Alexander M. Rosenberg        <mailto:alexr at leftfield.org> |
| Nobody cares what I say, so no disclaimer appears here.    |

Sean Silva

2014-Oct-17 22:54 UTC

[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

On Fri, Oct 17, 2014 at 8:47 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>
> > On 2014 Oct 16, at 22:09, Sean Silva <chisophugis at gmail.com> wrote:
> >
> > Dig into this first!
>
> This isn't the right forum for digging into ld64.
>
> > In the OP you are talking about essentially a pure "optimization" (in
> > the programmer-wisdom "beware of it" sense), to "save" 2GB of peak memory.
> > But from your analysis it's not clear that this 2GB savings actually is
> > reflected as peak memory usage saving
>
> It's reflected in both links.
>
Then it follows that there is ~15GB of low-hanging fruit that can be
trivially shaved off by just splitting the last part of LTO into an
independent call into llvm-lto. Although identifying the root cause would
be better; as Alex said we don't want to make the same mistake in LLD.

It doesn't make sense to follow an "aggressive plan" for 2GB savings when
there is 15GB of low-hanging fruit. 2GB is *at most* 12% of the total we
can *ever* expect to shave off (12% = 2/(15 + 2) >= 2/(15 + 2 + any other
saving)); this 15GB is *at least* 50% of the memory we can ever expect
to shave off (50% = 15/30 <= 15/(30 - anything we can't eliminate)).
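
The two bounds in the paragraph above can be checked with a few lines of arithmetic. This is only a sketch: the 2GB, 15GB and 30GB figures come from the thread, while the function names and the `other_savings_gb` and `unavoidable_gb` parameters are hypothetical, introduced here for illustration.

```python
# Figures taken from the thread (all in GB).
DEBUG_INFO_SAVING = 2.0   # saving attributed to the debug info plan
SPLIT_SAVING = 15.0       # the "low-hanging fruit" described above
LD64_PEAK = 30.0          # approximate peak of the full LTO link


def debug_info_share(other_savings_gb=0.0):
    """Fraction of all achievable savings contributed by the debug info plan.

    other_savings_gb is a hypothetical extra saving found elsewhere; the
    fraction can only shrink as it grows, so 2/17 is an upper bound.
    """
    return DEBUG_INFO_SAVING / (SPLIT_SAVING + DEBUG_INFO_SAVING + other_savings_gb)


def split_share(unavoidable_gb=0.0):
    """Fraction of the removable memory contributed by the 15GB.

    unavoidable_gb is hypothetical memory that can never be eliminated; the
    fraction can only grow as it grows, so 15/30 is a lower bound.
    """
    return SPLIT_SAVING / (LD64_PEAK - unavoidable_gb)


print(f"debug info plan: at most {debug_info_share():.0%}")
print(f"15GB of low-hanging fruit: at least {split_share():.0%}")
```

Passing a positive `other_savings_gb` lowers the first fraction and a positive `unavoidable_gb` raises the second, which is why 12% is a ceiling and 50% is a floor.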

>
> > (since the ~30GB peak might be happening elsewhere in the LTO process).
> > It is this ~30GB peak, and not the one you originally analyzed, which your
> > customers presumably care about.
>
> This discussion is intentionally focused on llvm-lto.
>
What is the intention? Do your customers actually run llvm-lto? I and at
least one of my officemates didn't even know it existed.

>
> > To recap, the analysis you performed seems to support neither of the
> > following conclusions:
> > - Peak memory usage during LTO would be improved by this plan
>
> The analysis is based on the nodes allocated at peak memory.
>
> > - Build time for LTO would be improved by this plan (from what you have
> > posted, you didn't measure time at all)
>
> CPU profiles blame 25-35% of the CPU of the ld64 LTO link on
> callback-based metadata RAUW traffic, depending on the C++ program.
>
Wow, that's a lot. How much of this do you think your plan will be able to
shave off?
Did you see anything else on the profile? A pie chart would be much
appreciated.

Sorry if I sound "grouchy", but this seems like the classic situation where
someone comes to you asking for X, but what they really want is a
solution to an underlying problem Y, for which the best solution, once you
actually analyze Y, is Z. Here X is debug info size, Y is excessive LTO
time or excessive LTO memory usage, and Z is yet to be determined. It
sounds to me like you started with an a priori idea of changing debug info
and then tried to justify it a posteriori. A "solution looking for a
problem". And since the focus has been on debug info, the results haven't
been put in context: 2GB savings can be small or big; it's negligible
compared to the 15GB flying under the radar.

Is there a particular reason you are so intent on changing the debug info?
It may very well be that a change to debug info will be the right solution.
But your analyses don't seem to be aimed at establishing what is the right
solution (nor does it seem like anybody has done such analyses); your
analyses seem to be aimed at generating numbers for debug info, and as a
result insufficient attention has been paid to putting the numbers in
proper context so that it is clear what their significance is.

To summarize the discussion so far, it seems that the plan ranks on the
following dimensions:
compatibility - looks interesting
maintainability - unclear
peak memory usage - ~6% improvement (2GB of 30GB)
build time - promising, maybe up to ~30%
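
The build-time entry can be bounded in the same spirit. As a rough sketch, assuming (hypothetically) that all callback-based metadata RAUW traffic could be removed, that nothing else changes, and that the CPU share is a fair proxy for wall-clock time, the 25-35% figure quoted earlier caps the achievable speedup. The function name `max_speedup` is introduced here for illustration only.

```python
def max_speedup(removed_fraction):
    """Upper bound on the speedup when a fraction of the run time is removed entirely."""
    if not 0.0 <= removed_fraction < 1.0:
        raise ValueError("removed_fraction must be in [0, 1)")
    return 1.0 / (1.0 - removed_fraction)


# CPU shares blamed on metadata RAUW traffic, as quoted in the thread.
for share in (0.25, 0.35):
    print(f"RAUW share {share:.0%}: at most {max_speedup(share):.2f}x faster")
```

The time saved can never exceed the share itself, so "up to ~30%" is the best case, not an expected result.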

-- Sean Silva

>
> > Of course, this is all tangential to the discussion of e.g. a more
> > readable/writable .ll form for debug info, or debug info compatibility.
> > However, it seems like you jumped into this from the point of view of it
> > being an optimization, rather than a maintainability/compatibility thing.
>
> It's both.

Bob Wilson

2014-Oct-17 23:53 UTC

[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

> On Oct 17, 2014, at 12:02 PM, Alex Rosenberg <alexr at leftfield.org> wrote:
> 
> On Oct 17, 2014, at 8:47 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
>> 
>>> On 2014 Oct 16, at 22:09, Sean Silva <chisophugis at gmail.com> wrote:
>>> 
>>> Dig into this first!
>> 
>> This isn't the right forum for digging into ld64.
> 
> I would at least hope that if the issue is in ld64 itself and not LTO, that
> we are making sure that lld is not repeating the same choices here.
> 
> FWIW, we have our own custom linker and LTO builds easily chew through
> memory into this peak size realm, way beyond what a non-LTO link would do.
Yes, absolutely. There are quite a few different aspects of improving LTO
scalability. Most of them are not specific to a particular linker. This is just
one of them. If you guys want to tackle other problems, feel free.

Duncan P. N. Exon Smith

2014-Oct-18 01:04 UTC

[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

> On Oct 17, 2014, at 3:54 PM, Sean Silva <chisophugis at gmail.com> wrote:
> 
> this seems like the classic situation where someone comes to you asking
> for X, but what they really want is a solution to an underlying problem Y, for
> which the best solution, once you actually analyze Y, is Z.
On the contrary, I came into this expecting to work with Eric on
parallelizing the backend, but consistently found that callback-based
RAUW traffic for metadata took almost as much CPU.

Since debug info IR is at the heart of the RAUW bottleneck, I looked
into its memory layout (it's a hog).  I started working on PR17891
because, besides improving the memory usage, the plan promised to
greatly reduce the number of nodes (indirectly reducing RAUW traffic).

In the context of `llvm-lto`, "stage 1" knocked memory usage down from
~5GB to ~3GB -- but didn't reduce the number of nodes.  Before starting
stages "2" and "3", I counted nodes and operands to find which to tackle
first.  Unfortunately, our need to reference local variables and line
table entries directly from the IR-proper limits our ability to refactor
the schema, and those are the nodes we have the most of.

This work will drop debug info memory usage in `llvm-lto` further, from
~3GB down to ~1GB.  It's also a big step toward improving debug info
maintainability.

More importantly (for me), it enables us to refactor uniquing models and
reorder serialization and linking to design away debug info RAUW
traffic -- assuming switching to use-lists doesn't drop it off the
profile.

Regarding "the bigger problem" of LTO memory usage, I expect to see more
than a 2GB drop from this work due to the nature of metadata uniquing
and expiration.  I'm not motivated to quantify it, since even a 2GB drop
-- when combined with a first-class IR and the RAUW-related speedup --
is motivation enough.

There's a lot of work left to do in LTO -- once I've finished this, I
plan to look for another bottleneck.  Not sure if I'll tackle memory
usage or performance.

As Bob suggested, please feel free to join the party!  Less work for me
to do later.
