Peter Collingbourne via llvm-dev
2016-Feb-06 01:04 UTC
[llvm-dev] Reducing DWARF emitter memory consumption
Thanks, I'll look into that. (Though earlier you told me that debug info
for types could be extended while walking the IR, so I wouldn't have
thought that would have worked.)

Peter

On Fri, Feb 05, 2016 at 03:52:19PM -0800, David Blaikie wrote:
> Will look more closely soon - but I'd really try just writing out type
> units to MC as soon as they're done. It should be relatively
> non-intrusive (we build type units once, so there's no ambiguity about
> when they're done). For non-fission + type units it might be a bit
> tricky, because the type units still need a relocation for the
> stmt_list* (I'm trying to find where that's added now... I seem to have
> lost it), but fission + type units should produce entirely static type
> units that are knowable the moment the type is emitted, so far as I can
> tell (including the type hash and everything - you can write the bytes
> out to the AsmStreamer, etc. and forget about them entirely, except to
> keep the hash so you know that you don't need to emit the type again).
>
> I imagine this would provide most of the memory savings we would need
> (since types are most of the debug info), and, if not, would be a good
> start.
>
> *I think we might know what the stmt_list relocation is up front,
> though - if that's the case, we'd be able to be as aggressive as I
> described for the fission case.
>
> On Fri, Feb 5, 2016 at 3:17 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
>
> > Hi all,
> >
> > We have profiled [1] the memory usage in LLVM when LTO'ing Chromium,
> > and we've found that one of the top consumers of memory is the DWARF
> > emitter in lib/CodeGen/AsmPrinter/Dwarf*. I've been reading the DWARF
> > emitter code and I have a few ideas in mind for how to reduce its
> > memory consumption. One idea I've had is to restructure the emitter
> > so that (for the most part) it directly produces the bytes and
> > relocations that need to go into the DWARF sections, without going
> > through other data structures such as DIE and DIEValue.
> >
> > I understand that the DWARF emitter needs to accommodate incomplete
> > entities that may be completed elsewhere during tree construction
> > (e.g. abstract origins for inlined functions, special members for
> > types), so here's a quick high-level sketch of the data structures
> > that I believe could support this design:
> >
> >   struct DIEBlock {
> >     SmallVector<char, 1> Data;
> >     std::vector<InternalReloc> IntRelocs;
> >     std::vector<ExternalReloc> ExtRelocs;
> >     DIEBlock *Next;
> >   };
> >
> >   // This would be used to represent things like DW_AT_type
> >   // references to types.
> >   struct InternalReloc {
> >     size_t Offset;    // offset within DIEBlock::Data
> >     DIEBlock *Target; // the offset within Target is at
> >                       // Data[Offset...Offset+Size]
> >   };
> >
> >   // This would be used to represent things like pointers to
> >   // .debug_loc/.debug_str or to functions/globals.
> >   struct ExternalReloc {
> >     size_t Offset;    // offset within DIEBlock::Data
> >     MCSymbol *Target; // the offset within Target is at
> >                       // Data[Offset...Offset+Size]
> >   };
> >
> >   struct DwarfBuilder {
> >     DIEBlock *First;
> >     DIEBlock *Cur;
> >     DenseMap<DISubprogram *, DIEBlock *> Subprograms;
> >     DenseMap<DIType *, DIEBlock *> Types;
> >     DwarfBuilder() : First(new DIEBlock), Cur(First) {}
> >     // builder implementation goes here...
> >   };
> >
> > Normally, the DwarfBuilder will just emit bytes to Cur->Data (with
> > possibly internal or external relocations recorded in
> > IntRelocs/ExtRelocs), but if it ever needs to create a "gap" for an
> > incomplete data structure (e.g. at the end of a subprogram or a
> > struct type), it will create a new DIEBlock New, store it to
> > Cur->Next, store Cur in a DenseMap associated with the
> > subprogram/type/etc. and store New to Cur. To fill a gap later, the
> > DwarfBuilder can pull the DIEBlock out of the DenseMap and start
> > appending there. Once the IR is fully visited, the debug info writer
> > will walk the linked list starting at First, calculate a byte offset
> > for each DIEBlock, apply any internal relocations and write Data
> > using the AsmPrinter (e.g. using EmitBytes, or maybe some other new
> > interface that also supports relocations and avoids copying).
> >
> > Does that sound reasonable? Is there anything I haven't accounted
> > for?
> >
> > Thanks,
> > --
> > Peter
> >
> > [1] https://code.google.com/p/chromium/issues/detail?id=583551#c15

--
Peter
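To make the block-and-gap scheme above concrete, here is a minimal
self-contained sketch of how such a builder could look. It is illustrative
only, not the proposed implementation: the helper names (emitBytes,
startGap, resumeGap, finalize) are invented, external relocations are
omitted for brevity, and internal relocations are assumed to be 4-byte
little-endian section offsets.

  // Illustrative sketch only - not actual LLVM code. Helper names are
  // hypothetical, and internal relocations are assumed to be 4-byte
  // little-endian section offsets for simplicity.
  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/SmallVector.h"
  #include <cstdint>
  #include <cstring>
  #include <vector>

  using namespace llvm;

  struct DIEBlock;

  struct InternalReloc {
    size_t Offset;    // offset of the 4-byte slot within DIEBlock::Data
    DIEBlock *Target; // patched with Target's final section offset
  };

  struct DIEBlock {
    SmallVector<char, 1> Data;
    std::vector<InternalReloc> IntRelocs;
    DIEBlock *Next = nullptr;
  };

  struct DwarfBuilder {
    DIEBlock *First = new DIEBlock;
    DIEBlock *Cur = First;
    // Keyed on the DISubprogram*/DIType*/etc. that owns the gap.
    DenseMap<const void *, DIEBlock *> Gaps;

    void emitBytes(const char *Bytes, size_t N) {
      Cur->Data.append(Bytes, Bytes + N);
    }

    // Leave a gap for an entity that may grow later (e.g. a struct type
    // whose special members are discovered while visiting other IR).
    // Subsequent unrelated output goes into a fresh block chained after it.
    void startGap(const void *Entity) {
      Gaps[Entity] = Cur;
      DIEBlock *New = new DIEBlock;
      Cur->Next = New;
      Cur = New;
    }

    // Resume appending into a previously recorded gap; the caller saves and
    // restores Cur around the extra emission.
    DIEBlock *resumeGap(const void *Entity) { return Gaps.lookup(Entity); }

    // Final pass once the IR walk is done: assign a section offset to each
    // block, then patch internal relocations with the target block's
    // offset. The patched Data can then be streamed out (e.g. via
    // EmitBytes).
    void finalize() {
      DenseMap<DIEBlock *, uint64_t> Offsets;
      uint64_t Off = 0;
      for (DIEBlock *B = First; B; B = B->Next) {
        Offsets[B] = Off;
        Off += B->Data.size();
      }
      for (DIEBlock *B = First; B; B = B->Next)
        for (const InternalReloc &R : B->IntRelocs) {
          uint32_t Val = Offsets[R.Target];
          std::memcpy(&B->Data[R.Offset], &Val, sizeof(Val));
        }
    }
  };

Because offsets are assigned only in finalize(), a gap block can keep
growing until the IR walk completes without invalidating anything already
recorded; only the final pass fixes the layout and patches the references.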
David Blaikie via llvm-dev
2016-Feb-06 01:35 UTC
[llvm-dev] Reducing DWARF emitter memory consumption
On Fri, Feb 5, 2016 at 5:04 PM, Peter Collingbourne <peter at pcc.me.uk>
wrote:

> Thanks, I'll look into that. (Though earlier you told me that debug
> info for types could be extended while walking the IR, so I wouldn't
> have thought that would have worked.)

Yeah, had to think about it more - and as I think about it - I'm
moderately sure type units (which don't include these latent extensions)
will be pretty close to static, with just the stmt_list relocation in
non-fission type units, which /should/ still be knowable up-front.
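As an aside, a rough sketch of the "write the bytes out and keep only the
hash" idea for the fission + type units case, where a finished unit needs
no further relocations. The class and method names are invented for
illustration and do not reflect how DwarfDebug/DwarfTypeUnit is actually
structured; MCStreamer::EmitBytes is the streaming interface available at
the time of this thread. The non-fission case would additionally need the
stmt_list relocation discussed above, which this sketch ignores.

  // Illustrative sketch only (invented names, not LLVM's DwarfDebug code):
  // once a fission type unit's bytes are final, stream them out immediately
  // and retain only the 64-bit type signature to avoid emitting it twice.
  #include "llvm/ADT/DenseSet.h"
  #include "llvm/ADT/StringRef.h"
  #include "llvm/MC/MCStreamer.h"

  using namespace llvm;

  class EagerTypeUnitEmitter {
    MCStreamer &Streamer;
    DenseSet<uint64_t> EmittedSignatures; // the only per-type state kept

  public:
    explicit EagerTypeUnitEmitter(MCStreamer &S) : Streamer(S) {}

    // Signature is the DWARF type hash; Bytes is the fully laid-out unit.
    // Returns false if a unit with this signature was already emitted.
    bool emitTypeUnit(uint64_t Signature, StringRef Bytes) {
      if (!EmittedSignatures.insert(Signature).second)
        return false;            // duplicate: drop the bytes
      Streamer.EmitBytes(Bytes); // write and forget; no DIEs retained
      return true;
    }
  };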
Peter Collingbourne via llvm-dev
2016-Feb-10 22:42 UTC
[llvm-dev] Reducing DWARF emitter memory consumption
On Fri, Feb 05, 2016 at 05:35:14PM -0800, David Blaikie wrote:
> Yeah, had to think about it more - and as I think about it - I'm
> moderately sure type units (which don't include these latent
> extensions) will be pretty close to static, with just the stmt_list
> relocation in non-fission type units, which /should/ still be knowable
> up-front.

I've implemented a change which does this, and looked at the impact on
memory consumption and binary size when running "llc" on Chromium's 50
largest (by bitcode size) translation units.

Bottom line: *huge* savings in total memory consumption - a median 17%
reduction compared to before the change, and a median 7% reduction
compared to building with type units disabled. (I'm not yet confident
that my patch is correct - some of the section sizes are different and
I'll need to double-check what's going on there - but I'll send it out
once I'm confident in it.)

I think we can do better, though. With type units enabled, the size of
.debug_info as a fraction of (.debug_info + .debug_types) is a median
~40%, so I think there's another ~12% that can be saved by avoiding
DIE/DIEValue retention for .debug_info, bringing the total to ~30%. I
expect the numbers with type units disabled to be in the same ballpark
(with type units enabled, we consume ~25% more space in the object file
on .debug_info + .debug_types, so the proportional savings may be less,
but the absolute memory consumption should be lower). This also roughly
lines up with the heap profiler figures from before.

My conclusion from all this: I think we should do it, and I think it
would especially help in LTO mode with type units disabled. The type
units feature is redundant with LTO deduplication and would therefore
add unnecessary bloat to the object files, which would mean increased
memory usage (I measured a ~10% median increase in memory usage when
comparing the current type units implementation against type units
disabled - not an entirely fair comparison, but probably good enough).

I have a plan in mind for doing this incrementally: we will start using
the more efficient data structure at the leaves of the DIE tree, and
gradually expand out to the root. You'll see what that looks like once I
have my first patch ready.

Thanks,
--
Peter
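A back-of-the-envelope check of that projection, under the assumption
(mine, not stated in the thread) that DIE/DIEValue retention cost scales
with emitted section size: the ~17% saving came from streaming out the
~60% of DIE bytes that land in .debug_types, so eliminating retention for
the remaining ~40% (.debug_info) scales to roughly

\[ 17\% \times \tfrac{40}{60} \approx 11\text{--}12\%, \qquad 17\% + 12\% \approx 30\%, \]

in line with the ~30% total projected above.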