thr3ads.net - llvm dev - [LLVMdev] [RFC] Less memory and greater maintainability for debug info IR [Oct 2014]


Eric Christopher

2014-Oct-16 06:30 UTC

[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

On Wed, Oct 15, 2014 at 8:53 PM, Alex Rosenberg <alexr at leftfield.org> wrote:
> As all of these transforms are 1-to-1, can we still support the older
> metadata and convert it on the fly?
>
I'd prefer not to keep all of that code around to interpret both
versions without a very good reason.

-eric
> Alex
>
>> On Oct 13, 2014, at 3:02 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>>
>> In r219010, I merged integer and string fields into a single header
>> field.  By reducing the number of metadata operands used in debug info,
>> this saved 2.2GB on an `llvm-lto` bootstrap.  I've done some profiling
>> of DW_TAGs to see what parts of PR17891 and PR17892 to tackle next, and
>> I've concluded that they will be insufficient.
>>
>> Instead, I'd like to implement a more aggressive plan, which as a
>> side effect cleans up the much "loved" debug info IR assembly syntax.
>>
>> At a high level, the idea is to create distinct subclasses of `Value`
>> for each debug info concept, starting with line table entries and moving
>> on to the DIDescriptor hierarchy.  By leveraging the use-list
>> infrastructure for metadata operands -- i.e., only using value handles
>> for non-metadata operands -- we'll improve memory usage and increase
>> RAUW speed.
>>
>> My rough plan follows.  I quote some numbers for memory savings below
>> based on an -flto -g bootstrap of `llvm-lto` (i.e., running `llvm-lto`
>> on `llvm-lto.lto.bc`, an already-linked bitcode file dumped by ld64's
>> -save-temps option) that currently peaks at 15.3GB.
>>
>> 1. Introduce `MDUser`, which inherits from `User`, and whose `Use`s
>>    must all be metadata.  The cost per operand is 1 pointer, vs. 4
>>    pointers in an `MDNode`.
>>
>> 2. Create `MDLineTable` as the first subclass of `MDUser`.  Use normal
>>    fields (not `Value`s) for the line and column, and use `Use`
>>    operands for the metadata operands.
>>
>>    On x86-64, this will save 104B / line table entry.  Linking
>>    `llvm-lto` uses ~7M line-table entries, so this on its own saves
>>    ~700MB.
>>
>>    Sketch of class definition:
>>
>>        class MDLineTable : public MDUser {
>>          unsigned Line;
>>          unsigned Column;
>>        public:
>>          static MDLineTable *get(unsigned Line, unsigned Column,
>>                                  MDNode *Scope);
>>          static MDLineTable *getInlined(MDLineTable *Base, MDNode *Scope);
>>          static MDLineTable *getBase(MDLineTable *Inlined);
>>
>>          unsigned getLine() const { return Line; }
>>          unsigned getColumn() const { return Column; }
>>          bool isInlined() const { return getNumOperands() == 2; }
>>          MDNode *getScope() const { return getOperand(0); }
>>          MDNode *getInlinedAt() const {
>>            return isInlined() ? getOperand(1) : nullptr;
>>          }
>>        };
>>
>>    Proposed assembly syntax:
>>
>>        ; Not inlined.
>>        !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9)
>>
>>        ; Inlined.
>>        !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9,
>>                                   inlinedAt: metadata !10)
>>
>>        ; Column defaulted to 0.
>>        !7 = metadata !MDLineTable(line: 45, scope: metadata !9)
>>
>>    (What colour should that bike shed be?)
>>
>> 3. (Optional) Rewrite `DebugLoc` lookup tables.  My profiling shows
>>    that we have 3.5M entries in the `DebugLoc` side-vectors for 7M line
>>    table entries.  The cost of these is ~180B each, for another
>>    ~600MB.
>>
>>    If we integrate a side-table of `MDLineTable`s into its uniquing,
>>    the overhead is only ~12B / line table entry, or ~80MB.  This saves
>>    520MB.
>>
>>    This is somewhat perpendicular to redesigning the metadata format,
>>    but IMO it's worth doing as soon as it's possible.
>>
>> 4. Create `GenericDebugMDNode`, a transitional subclass of `MDUser`
>>    through an intermediate class `DebugMDNode` with an
>>    allocation-time-optional `CallbackVH` available for referencing
>>    non-metadata.  Change `DIDescriptor` to wrap a `DebugMDNode` instead
>>    of an `MDNode`.
>>
>>    This saves another ~960MB, for a running total of ~2GB.
>>
>>    Proposed assembly syntax:
>>
>>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,
>>                                          fields: "0\00clang 3.6\00...",
>>                                          operands: { metadata !8, ... })
>>
>>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,
>>                                          fields: "global_var\00...",
>>                                          operands: { metadata !8, ... },
>>                                          handle: i32* @global_var)
>>
>>    This syntax pulls the tag out of the current header-string, calls
>>    the rest of the header "fields", and includes the metadata operands
>>    in "operands".
>>
>> 5. Incrementally create subclasses of `DebugMDNode`, such as
>>    `MDCompileUnit` and `MDSubprogram`.  Sub-classed nodes replace the
>>    "fields" and "operands" catch-alls with explicit names for each
>>    operand.
>>
>>    Proposed assembly syntax:
>>
>>        !7 = metadata !MDSubprogram(line: 45, name: "foo", displayName: "foo",
>>                                    linkageName: "_Z3foov", file: metadata !8,
>>                                    function: i32 ()* @foo)
>>
>> 6. Remove the dead code for `GenericDebugMDNode`.
>>
>> 7. (Optional) Refactor `DebugMDNode` sub-classes to minimize RAUW
>>    traffic during bitcode serialization.  Now that metadata types are
>>    known, we can write debug info out in an order that makes it cheap
>>    to read back in.
>>
>>    Note that using `MDUser` will make RAUW much cheaper, since we're
>>    using the use-list infrastructure for most of them.  If RAUW isn't
>>    showing up in a profile, I may skip this.
>>
>> Does this direction seem reasonable?  Any major problems I've missed?
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
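
To make step 2 of the RFC concrete outside of LLVM, here is a standalone toy model. All names (`ToyScope`, `ToyLineEntry`, `ToyLineTableContext`) are hypothetical and this is not LLVM's API: line and column are ordinary integer members, the scope and optional inlinedAt play the role of operands, and a context object uniques entries so identical locations share one object, which is what the RFC's `get` and the uniquing mentioned in step 3 rely on. It deliberately does not model `Use` lists, RAUW, or `getInlined`/`getBase`, whose exact semantics the RFC does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <tuple>
#include <utility>

// Hypothetical stand-in for a scope node (an MDNode in LLVM).
struct ToyScope {
  int Id;
};

// Toy model of the proposed MDLineTable: line and column are plain members;
// Scope and InlinedAt stand in for the metadata operands.
class ToyLineEntry {
  unsigned Line;
  unsigned Column;
  const ToyScope *Scope;
  const ToyScope *InlinedAt;  // null when the location is not inlined

public:
  ToyLineEntry(unsigned Line, unsigned Column, const ToyScope *Scope,
               const ToyScope *InlinedAt)
      : Line(Line), Column(Column), Scope(Scope), InlinedAt(InlinedAt) {}

  unsigned getLine() const { return Line; }
  unsigned getColumn() const { return Column; }
  const ToyScope *getScope() const { return Scope; }
  bool isInlined() const { return InlinedAt != nullptr; }
  const ToyScope *getInlinedAt() const { return InlinedAt; }
};

// Hypothetical uniquing context: equal (line, column, scope, inlinedAt)
// tuples always map to the same entry object.
class ToyLineTableContext {
  // Pointers are keyed by integer value so std::map gets a total order.
  using Key = std::tuple<unsigned, unsigned, std::uintptr_t, std::uintptr_t>;
  std::map<Key, std::unique_ptr<ToyLineEntry>> Entries;

public:
  const ToyLineEntry *get(unsigned Line, unsigned Column, const ToyScope *Scope,
                          const ToyScope *InlinedAt = nullptr) {
    assert(Scope && "every line-table entry needs a scope");
    Key K(Line, Column, reinterpret_cast<std::uintptr_t>(Scope),
          reinterpret_cast<std::uintptr_t>(InlinedAt));
    auto It = Entries.find(K);
    if (It != Entries.end())
      return It->second.get();  // already uniqued: reuse the existing entry
    std::unique_ptr<ToyLineEntry> E(
        new ToyLineEntry(Line, Column, Scope, InlinedAt));
    const ToyLineEntry *Result = E.get();
    Entries.emplace(K, std::move(E));
    return Result;
  }

  std::size_t size() const { return Entries.size(); }
};
```

Because equal locations are uniqued to one object, pointer equality is enough to compare two locations, and a per-entry side-table (as in step 3) can hang off that single object rather than off every use.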
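
The memory figures quoted in the RFC can be cross-checked with a little arithmetic. The compile-time C++ sketch below recomputes steps 2 through 4 using only the per-entry sizes and entry counts stated in the message, treating 1 MB as 10^6 bytes; the struct and constant names are illustrative only. Unrounded, the inputs give about 728MB for step 2, 546MB for step 3 and 2.2GB in total, consistent with the RFC's rounded ~700MB, ~520MB and ~2GB.

```cpp
#include <cstdint>

// Back-of-the-envelope check of the RFC's memory estimates.
// Inputs are the figures quoted in the message; 1 MB = 10^6 bytes.
struct Estimate {
  std::uint64_t Entries;        // number of objects affected
  std::uint64_t BytesPerEntry;  // bytes per object
};

constexpr std::uint64_t toMB(std::uint64_t Bytes) { return Bytes / 1000000; }

constexpr std::uint64_t totalBytes(Estimate E) {
  return E.Entries * E.BytesPerEntry;
}

// Step 2: ~7M line-table entries, 104 B saved per entry on x86-64.
constexpr Estimate kLineTableSaving{7000000, 104};
// Step 3: 3.5M DebugLoc side-vector entries at ~180 B each today...
constexpr Estimate kDebugLocToday{3500000, 180};
// ...versus ~12 B per line-table entry with an integrated side-table.
constexpr Estimate kDebugLocProposed{7000000, 12};
// Step 4: quoted directly in the RFC as ~960 MB.
constexpr std::uint64_t kGenericNodeSavingMB = 960;

constexpr std::uint64_t step2SavingMB() {
  return toMB(totalBytes(kLineTableSaving));
}
constexpr std::uint64_t step3SavingMB() {
  return toMB(totalBytes(kDebugLocToday)) - toMB(totalBytes(kDebugLocProposed));
}
constexpr std::uint64_t runningTotalMB() {
  return step2SavingMB() + step3SavingMB() + kGenericNodeSavingMB;
}

static_assert(step2SavingMB() == 728, "step 2: RFC says ~700MB");
static_assert(step3SavingMB() == 546, "step 3: RFC says ~520MB");
static_assert(runningTotalMB() == 2234, "running total: RFC says ~2GB");
```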
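
Since r219010, the integer and string fields of a debug-info node share one header string whose fields are separated by NUL bytes, written `\00` in the IR assembly as in `"0\00clang 3.6\00..."` above. Step 4's `GenericDebugMDNode` syntax lifts the tag out of that header. The sketch below is not LLVM's parser: `splitHeader`, `liftTag` and `SplitNode` are hypothetical names, and it assumes, as the RFC's wording suggests, that the tag is the first field of the current header.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split a NUL-separated debug-info header string
// into its fields. An empty header yields a single empty field.
std::vector<std::string> splitHeader(const std::string &Header) {
  std::vector<std::string> Fields;
  std::string::size_type Start = 0;
  while (true) {
    std::string::size_type End = Header.find('\0', Start);
    if (End == std::string::npos) {
      Fields.push_back(Header.substr(Start));  // last field (may be empty)
      return Fields;
    }
    Fields.push_back(Header.substr(Start, End - Start));
    Start = End + 1;
  }
}

// The tag lifted out of the header, plus the remaining "fields".
struct SplitNode {
  std::string Tag;
  std::vector<std::string> Fields;
};

// Assumes the tag is the first NUL-separated field of the header.
SplitNode liftTag(const std::string &Header) {
  std::vector<std::string> All = splitHeader(Header);
  assert(!All.empty() && "splitHeader always yields at least one field");
  SplitNode N;
  N.Tag = All.front();
  N.Fields.assign(All.begin() + 1, All.end());
  return N;
}
```

When building test headers in C++, append `'\0'` as a separate character (e.g. `std::string("0x34") + '\0' + "global_var"`) rather than writing `"\0"` inside a literal, because a following digit would be absorbed into an octal escape.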
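
Step 7 is only outlined in the RFC. One plausible reading, offered here as an assumption rather than the planned implementation, is that a writer which knows the node types can emit operands before the nodes that use them, so a reader rarely needs a placeholder for a not-yet-seen node that must later be RAUW'd. The standalone sketch below uses a plain adjacency list instead of LLVM metadata (`Graph`, `operandsFirstOrder` and `countForwardRefs` are hypothetical names): it orders nodes by post-order DFS and counts the forward references a given order would leave, which only cycles still require.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node graph: G[i] lists the nodes that node i references.
using Graph = std::vector<std::vector<std::size_t>>;

// Post-order DFS: each node is emitted after every operand reachable
// without going around a cycle.
static void visit(const Graph &G, std::size_t N, std::vector<int> &State,
                  std::vector<std::size_t> &Order) {
  State[N] = 1;  // in progress
  for (std::size_t Op : G[N])
    if (State[Op] == 0)
      visit(G, Op, State, Order);
  State[N] = 2;  // done
  Order.push_back(N);
}

std::vector<std::size_t> operandsFirstOrder(const Graph &G) {
  std::vector<int> State(G.size(), 0);
  std::vector<std::size_t> Order;
  for (std::size_t N = 0; N < G.size(); ++N)
    if (State[N] == 0)
      visit(G, N, State, Order);
  return Order;
}

// Count operand references to nodes not yet written in this order; on read,
// each one would need a placeholder and a later RAUW to fix it up.
std::size_t countForwardRefs(const Graph &G,
                             const std::vector<std::size_t> &Order) {
  assert(Order.size() == G.size() && "order must list every node once");
  std::vector<bool> Written(G.size(), false);
  std::size_t Forward = 0;
  for (std::size_t N : Order) {
    for (std::size_t Op : G[N])
      if (!Written[Op])
        ++Forward;
    Written[N] = true;
  }
  return Forward;
}
```

For an acyclic graph this order leaves no forward references at all; a two-node cycle still leaves exactly one, which is the irreducible cost a reader would pay.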

Alex Rosenberg

2014-Oct-16 14:05 UTC


[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

On Oct 15, 2014, at 11:30 PM, Eric Christopher <echristo at gmail.com> wrote:
> 
>> On Wed, Oct 15, 2014 at 8:53 PM, Alex Rosenberg <alexr at leftfield.org> wrote:
>> As all of these transforms are 1-to-1, can we still support the older
>> metadata and convert it on the fly?
> 
> I'd prefer not to keep all of that code around to interpret both
> versions without a very good reason.

I was thinking of this as a first step toward IR compatibility going forward.

Alex

Eric Christopher

2014-Oct-16 20:11 UTC


[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

On Thu, Oct 16, 2014 at 7:05 AM, Alex Rosenberg <alexr at leftfield.org> wrote:
> On Oct 15, 2014, at 11:30 PM, Eric Christopher <echristo at gmail.com> wrote:
>>
>>> On Wed, Oct 15, 2014 at 8:53 PM, Alex Rosenberg <alexr at leftfield.org> wrote:
>>> As all of these transforms are 1-to-1, can we still support the older
>>> metadata and convert it on the fly?
>>
>> I'd prefer not to keep all of that code around to interpret both
>> versions without a very good reason.
>
> I was thinking of this as a first step toward IR compatibility going forward.

I'll keep it in mind. No promises.

-eric
