Sean Silva
2014-Oct-15 21:30 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On Mon, Oct 13, 2014 at 7:01 PM, Eric Christopher <echristo at gmail.com> wrote:
> On Mon, Oct 13, 2014 at 6:59 PM, Sean Silva <chisophugis at gmail.com> wrote:
> > For those interested, I've attached some pie charts based on Duncan's
> > data in one of the other posts; successive slides break down the usage
> > increasingly finely. To my understanding, they represent the number of
> > `Value`s (and subclasses) allocated.
> >
> > On Mon, Oct 13, 2014 at 3:02 PM, Duncan P. N. Exon Smith
> > <dexonsmith at apple.com> wrote:
> >>
> >> In r219010, I merged integer and string fields into a single header
> >> field.  By reducing the number of metadata operands used in debug info,
> >> this saved 2.2GB on an `llvm-lto` bootstrap.  I've done some profiling
> >> of DW_TAGs to see what parts of PR17891 and PR17892 to tackle next, and
> >> I've concluded that they will be insufficient.
> >>
> >> Instead, I'd like to implement a more aggressive plan, which as a
> >> side-effect cleans up the much "loved" debug info IR assembly syntax.
> >>
> >> At a high level, the idea is to create distinct subclasses of `Value`
> >> for each debug info concept, starting with line table entries and
> >> moving on to the DIDescriptor hierarchy.  By leveraging the use-list
> >> infrastructure for metadata operands -- i.e., only using value handles
> >> for non-metadata operands -- we'll improve memory usage and increase
> >> RAUW speed.
> >>
> >> My rough plan follows.  I quote some numbers for memory savings below
> >> based on an -flto -g bootstrap of `llvm-lto` (i.e., running `llvm-lto`
> >> on `llvm-lto.lto.bc`, an already-linked bitcode file dumped by ld64's
> >> -save-temps option) that currently peaks at 15.3GB.
> >
> > Stupid question, but when I was working on LTO last summer the primary
> > culprit for excessive memory use was that we were not being smart when
> > linking the IR together (Espindola would know more details).  Do we
> > still have that problem?  For starters, how does the memory usage of
> > just llvm-link compare to the memory usage of the actual LTO run?  If
> > the issue I was seeing last summer is still there, you should see that
> > the invocation of llvm-link is actually the most memory-intensive part
> > of the LTO step, by far.
>
> This is vague.  Could you be more specific on where you saw all of the
> memory?

Running `llvm-link *.bc` would OOM a machine with 64GB of RAM (with -g;
without -g it completed with much less).  The increase could easily be
watched in the system "process monitor" in real time.

-- Sean Silva

> -eric
>
> > Also, you seem to really like saying "peak" here.  Is there a definite
> > peak?  When does it occur?
> >
> >> 1. Introduce `MDUser`, which inherits from `User`, and whose `Use`s
> >>    must all be metadata.  The cost per operand is 1 pointer, vs. 4
> >>    pointers in an `MDNode`.
> >>
> >> 2. Create `MDLineTable` as the first subclass of `MDUser`.  Use normal
> >>    fields (not `Value`s) for the line and column, and use `Use`
> >>    operands for the metadata operands.
> >>
> >>    On x86-64, this will save 104B / line table entry.  Linking
> >>    `llvm-lto` uses ~7M line-table entries, so this on its own saves
> >>    ~700MB.
> >>
> >>    Sketch of class definition:
> >>
> >>        class MDLineTable : public MDUser {
> >>          unsigned Line;
> >>          unsigned Column;
> >>        public:
> >>          static MDLineTable *get(unsigned Line, unsigned Column,
> >>                                  MDNode *Scope);
> >>          static MDLineTable *getInlined(MDLineTable *Base,
> >>                                         MDNode *Scope);
> >>          static MDLineTable *getBase(MDLineTable *Inlined);
> >>
> >>          unsigned getLine() const { return Line; }
> >>          unsigned getColumn() const { return Column; }
> >>          bool isInlined() const { return getNumOperands() == 2; }
> >>          MDNode *getScope() const { return getOperand(0); }
> >>          MDNode *getInlinedAt() const { return getOperand(1); }
> >>        };
> >>
> >>    Proposed assembly syntax:
> >>
> >>        ; Not inlined.
> >>        !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9)
> >>
> >>        ; Inlined.
> >>        !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9,
> >>                                   inlinedAt: metadata !10)
> >>
> >>        ; Column defaulted to 0.
> >>        !7 = metadata !MDLineTable(line: 45, scope: metadata !9)
> >>
> >>    (What colour should that bike shed be?)
> >>
> >> 3. (Optional) Rewrite `DebugLoc` lookup tables.  My profiling shows
> >>    that we have 3.5M entries in the `DebugLoc` side-vectors for 7M
> >>    line table entries.  The cost of these is ~180B each, for another
> >>    ~600MB.
> >>
> >>    If we integrate a side-table of `MDLineTable`s into its uniquing,
> >>    the overhead is only ~12B / line table entry, or ~80MB.  This saves
> >>    520MB.
> >>
> >>    This is somewhat perpendicular to redesigning the metadata format,
> >>    but IMO it's worth doing as soon as it's possible.
> >>
> >> 4. Create `GenericDebugMDNode`, a transitional subclass of `MDUser`
> >>    through an intermediate class `DebugMDNode` with an
> >>    allocation-time-optional `CallbackVH` available for referencing
> >>    non-metadata.  Change `DIDescriptor` to wrap a `DebugMDNode`
> >>    instead of an `MDNode`.
> >>
> >>    This saves another ~960MB, for a running total of ~2GB.
> >
> > 2GB (out of 15.3GB, i.e. ~13%) seems like pretty pathetic savings when
> > we have a single pie slice near 40% of the number of `Value`s allocated
> > and another at 21%.  Especially with this being "step 4".
> >
> > As a rough back-of-the-envelope calculation, dividing 15.3GB by ~24
> > million Values gives about 600 bytes per Value.  That seems sort of
> > excessive (but is it realistic?).  All of the data types that you are
> > proposing to shrink fall far short of this "average size", meaning that
> > if you are trying to reduce memory usage, you might be looking in the
> > wrong place.  Something smells fishy.  At the very least, this would
> > indicate that the real memory usage is elsewhere.
> >
> > A pie chart breaking down the total memory usage seems essential to
> > have here.
> >
> >>    Proposed assembly syntax:
> >>
> >>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,
> >>                                          fields: "0\00clang 3.6\00...",
> >>                                          operands: { metadata !8, ... })
> >>
> >>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,
> >>                                          fields: "global_var\00...",
> >>                                          operands: { metadata !8, ... },
> >>                                          handle: i32* @global_var)
> >>
> >>    This syntax pulls the tag out of the current header-string, calls
> >>    the rest of the header "fields", and includes the metadata operands
> >>    in "operands".
> >>
> >> 5. Incrementally create subclasses of `DebugMDNode`, such as
> >>    `MDCompileUnit` and `MDSubprogram`.  Sub-classed nodes replace the
> >>    "fields" and "operands" catch-alls with explicit names for each
> >>    operand.
> >>
> >>    Proposed assembly syntax:
> >>
> >>        !7 = metadata !MDSubprogram(line: 45, name: "foo",
> >>                                    displayName: "foo",
> >>                                    linkageName: "_Z3foov",
> >>                                    file: metadata !8,
> >>                                    function: i32 (i32)* @foo)
> >>
> >> 6. Remove the dead code for `GenericDebugMDNode`.
> >>
> >> 7. (Optional) Refactor `DebugMDNode` sub-classes to minimize RAUW
> >>    traffic during bitcode serialization.  Now that metadata types are
> >>    known, we can write debug info out in an order that makes it cheap
> >>    to read back in.
> >>
> >>    Note that using `MDUser` will make RAUW much cheaper, since we're
> >>    using the use-list infrastructure for most of them.  If RAUW isn't
> >>    showing up in a profile, I may skip this.
> >>
> >> Does this direction seem reasonable?  Any major problems I've missed?
> >
> > You need more data.  Right now you have essentially one data point, and
> > it's not even clear what you measured, really.  If your goal is saving
> > memory, I would expect at least a pie chart that breaks down LLVM's
> > memory usage (not just the number of allocations of different sorts; an
> > approximation is fine, as long as you explain how you arrived at it and
> > in what sense it approximates the true number).
> >
> > Do the numbers change significantly for different projects?  (e.g.
> > Chromium or Firefox or a kernel or a large app you have handy to
> > compile with LTO?)  If you have specific data you want (and a
> > suggestion for how to gather it), I can also get you numbers for one of
> > our internal games as well.
> >
> > Once you have some more data, then as a first step, I would like to see
> > an analysis of how much we can "ideally" expect to gain
> > (back-of-the-envelope calculations == win).
> >
> > -- Sean Silva
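[A note on step 1's cost claim.  The 1-vs-4-pointer comparison is about the
operand representation: an `MDNode` operand at the time is a callback value
handle, while a `Use` rides the intrusive use-list infrastructure.  Below is
a compilable sketch with illustrative layouts -- these structs are NOT the
actual LLVM classes (the real `llvm::Use` compresses its back-links with
waymarking); they only make the per-operand accounting concrete.]

    #include <cstdio>

    // Illustrative layouts only -- not the actual LLVM classes.

    // MDNode-style operand: a callback value handle.  Virtual dispatch for
    // the RAUW/deleted hooks plus links into the per-value handle list put
    // it at roughly four pointers per operand.
    struct HandleOperand {
      void *VTable;       // CallbackVH has virtual onRAUW/onDelete hooks
      void *Val;          // the referenced metadata
      void *Prev, *Next;  // membership in the per-value handle list
    };

    // Use-style operand: the value pointer plus an intrusive use-list link;
    // the RFC counts the marginal overhead as about one pointer per operand.
    struct UseOperand {
      void *Val;
      void *NextUse;
    };

    int main() {
      // On x86-64: 32 bytes vs. 16 bytes per operand slot.
      std::printf("handle operand: %zu bytes\n", sizeof(HandleOperand));
      std::printf("use operand:    %zu bytes\n", sizeof(UseOperand));
      return 0;
    }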
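[The savings quoted in steps 2 and 3, and the bytes-per-Value estimate in
the reply, all check out as back-of-the-envelope arithmetic using only the
figures stated in the thread:]

    step 2:  104 B * 7.0M line-table entries            ~ 700 MB  saved
    step 3:  180 B * 3.5M DebugLoc side-vector entries  ~ 600 MB  (current)
              12 B * 7.0M line-table entries            ~  80 MB  (proposed)
                                          difference    ~ 520 MB  saved
    reply:   15.3 GB / ~24M Values                      ~ 640 B per Value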
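[Step 7's note that `MDUser` makes RAUW much cheaper follows from the
use-list representation: replacing a node touches only its actual users.  A
minimal compilable sketch under simplified, assumed types -- the real
`llvm::Value::replaceAllUsesWith` walks an intrusive list rather than a
vector, and value handles are notified separately:]

    #include <vector>

    // Simplified stand-ins -- not the real LLVM types.
    struct Node;
    struct Use {
      Node *Val;  // the operand; the real llvm::Use also links to its User
    };

    struct Node {
      std::vector<Use *> Uses;  // every operand slot pointing at this node
    };

    // Cost is O(number of users of From), independent of how many metadata
    // nodes exist in the module -- which is why use-list-based operands
    // make bulk RAUW traffic (e.g. during bitcode reading) cheap.
    void replaceAllUsesWith(Node &From, Node &To) {
      for (Use *U : From.Uses) {
        U->Val = &To;          // retarget the operand
        To.Uses.push_back(U);  // move it onto To's use-list
      }
      From.Uses.clear();
    }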
Eric Christopher
2014-Oct-15 21:31 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On Wed, Oct 15, 2014 at 2:30 PM, Sean Silva <chisophugis at gmail.com> wrote:
> On Mon, Oct 13, 2014 at 7:01 PM, Eric Christopher <echristo at gmail.com> wrote:
>> This is vague.  Could you be more specific on where you saw all of the
>> memory?
>
> Running `llvm-link *.bc` would OOM a machine with 64GB of RAM (with -g;
> without -g it completed with much less).  The increase could easily be
> watched in the system "process monitor" in real time.

This is likely what we've already discussed and was handled a long
while ago now.

-eric
Sean Silva
2014-Oct-15 21:32 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On Wed, Oct 15, 2014 at 2:31 PM, Eric Christopher <echristo at gmail.com> wrote:
> On Wed, Oct 15, 2014 at 2:30 PM, Sean Silva <chisophugis at gmail.com> wrote:
>> Running `llvm-link *.bc` would OOM a machine with 64GB of RAM (with -g;
>> without -g it completed with much less).  The increase could easily be
>> watched in the system "process monitor" in real time.
>
> This is likely what we've already discussed and was handled a long
> while ago now.

I was reading the thread in sequential order (and replying without
finishing).  derp.

-- Sean Silva
Diego Novillo
2014-Oct-17 14:27 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On 15/10/2014, 17:31, Eric Christopher wrote:
> On Wed, Oct 15, 2014 at 2:30 PM, Sean Silva <chisophugis at gmail.com> wrote:
>> Running `llvm-link *.bc` would OOM a machine with 64GB of RAM (with -g;
>> without -g it completed with much less).  The increase could easily be
>> watched in the system "process monitor" in real time.
>
> This is likely what we've already discussed and was handled a long
> while ago now.

Wait, really?  I can definitely get my 64GB box to thrash just trying to
llvm-link -g bitcode files.  By 'handled' do you mean fixed in trunk, or
'plan to fix'?

Diego.