Eric Christopher
2014-Oct-16 06:30 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On Wed, Oct 15, 2014 at 8:53 PM, Alex Rosenberg <alexr at leftfield.org> wrote:
> As all of these transforms are 1-to-1, can we still support the older metadata and convert it on the fly?
>

I'd prefer not to keep all of that code around to interpret both
versions without a very good reason.

-eric

> Alex
>
>> On Oct 13, 2014, at 3:02 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>>
>> In r219010, I merged integer and string fields into a single header
>> field.  By reducing the number of metadata operands used in debug info,
>> this saved 2.2GB on an `llvm-lto` bootstrap.  I've done some profiling
>> of DW_TAGs to see what parts of PR17891 and PR17892 to tackle next, and
>> I've concluded that they will be insufficient.
>>
>> Instead, I'd like to implement a more aggressive plan, which as a
>> side-effect cleans up the much "loved" debug info IR assembly syntax.
>>
>> At a high-level, the idea is to create distinct subclasses of `Value`
>> for each debug info concept, starting with line table entries and moving
>> on to the DIDescriptor hierarchy.  By leveraging the use-list
>> infrastructure for metadata operands -- i.e., only using value handles
>> for non-metadata operands -- we'll improve memory usage and increase
>> RAUW speed.
>>
>> My rough plan follows.  I quote some numbers for memory savings below
>> based on an -flto -g bootstrap of `llvm-lto` (i.e., running `llvm-lto`
>> on `llvm-lto.lto.bc`, an already-linked bitcode file dumped by ld64's
>> -save-temps option) that currently peaks at 15.3GB.
>>
>> 1. Introduce `MDUser`, which inherits from `User`, and whose `Use`s
>>    must all be metadata.  The cost per operand is 1 pointer, vs. 4
>>    pointers in an `MDNode`.
>>
>> 2. Create `MDLineTable` as the first subclass of `MDUser`.  Use normal
>>    fields (not `Value`s) for the line and column, and use `Use`
>>    operands for the metadata operands.
>>
>>    On x86-64, this will save 104B / line table entry.  Linking
>>    `llvm-lto` uses ~7M line-table entries, so this on its own saves
>>    ~700MB.
>>
>>    Sketch of class definition:
>>
>>        class MDLineTable : public MDUser {
>>          unsigned Line;
>>          unsigned Column;
>>        public:
>>          static MDLineTable *get(unsigned Line, unsigned Column,
>>                                  MDNode *Scope);
>>          static MDLineTable *getInlined(MDLineTable *Base, MDNode *Scope);
>>          static MDLineTable *getBase(MDLineTable *Inlined);
>>
>>          unsigned getLine() const { return Line; }
>>          unsigned getColumn() const { return Column; }
>>          bool isInlined() const { return getNumOperands() == 2; }
>>          MDNode *getScope() const { return getOperand(0); }
>>          MDNode *getInlinedAt() const { return getOperand(1); }
>>        };
>>
>>    Proposed assembly syntax:
>>
>>        ; Not inlined.
>>        !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9)
>>
>>        ; Inlined.
>>        !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9,
>>                                   inlinedAt: metadata !10)
>>
>>        ; Column defaulted to 0.
>>        !7 = metadata !MDLineTable(line: 45, scope: metadata !9)
>>
>>    (What colour should that bike shed be?)
>>
>> 3. (Optional) Rewrite `DebugLoc` lookup tables.  My profiling shows
>>    that we have 3.5M entries in the `DebugLoc` side-vectors for 7M line
>>    table entries.  The cost of these is ~180B each, for another
>>    ~600MB.
>>
>>    If we integrate a side-table of `MDLineTable`s into its uniquing,
>>    the overhead is only ~12B / line table entry, or ~80MB.  This saves
>>    520MB.
>>
>>    This is somewhat perpendicular to redesigning the metadata format,
>>    but IMO it's worth doing as soon as it's possible.
>>
>> 4. Create `GenericDebugMDNode`, a transitional subclass of `MDUser`
>>    through an intermediate class `DebugMDNode` with an
>>    allocation-time-optional `CallbackVH` available for referencing
>>    non-metadata.  Change `DIDescriptor` to wrap a `DebugMDNode` instead
>>    of an `MDNode`.
>>
>>    This saves another ~960MB, for a running total of ~2GB.
>>
>>    Proposed assembly syntax:
>>
>>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,
>>                                          fields: "0\00clang 3.6\00...",
>>                                          operands: { metadata !8, ... })
>>
>>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,
>>                                          fields: "global_var\00...",
>>                                          operands: { metadata !8, ... },
>>                                          handle: i32* @global_var)
>>
>>    This syntax pulls the tag out of the current header-string, calls
>>    the rest of the header "fields", and includes the metadata operands
>>    in "operands".
>>
>> 5. Incrementally create subclasses of `DebugMDNode`, such as
>>    `MDCompileUnit` and `MDSubprogram`.  Sub-classed nodes replace the
>>    "fields" and "operands" catch-alls with explicit names for each
>>    operand.
>>
>>    Proposed assembly syntax:
>>
>>        !7 = metadata !MDSubprogram(line: 45, name: "foo", displayName: "foo",
>>                                    linkageName: "_Z3foov", file: metadata !8,
>>                                    function: i32 (i32)* @foo)
>>
>> 6. Remove the dead code for `GenericDebugMDNode`.
>>
>> 7. (Optional) Refactor `DebugMDNode` sub-classes to minimize RAUW
>>    traffic during bitcode serialization.  Now that metadata types are
>>    known, we can write debug info out in an order that makes it cheap
>>    to read back in.
>>
>>    Note that using `MDUser` will make RAUW much cheaper, since we're
>>    using the use-list infrastructure for most of them.  If RAUW isn't
>>    showing up in a profile, I may skip this.
>>
>> Does this direction seem reasonable?  Any major problems I've missed?
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
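
For readers who want to experiment with the shape of the `MDLineTable` proposal in step 2 of the quoted RFC, here is a minimal sketch that compiles outside LLVM. It is not LLVM code: `MDNode` and `MDUser` below are hypothetical stand-ins, operands are kept in a plain `std::vector` rather than real `Use` objects, uniquing is left out, and the second parameter of `getInlined` is named `InlinedAt` here as an assumption about what the RFC's sketch intends. Returning `nullptr` from `getInlinedAt()` for a non-inlined location is also a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a metadata node; NOT LLVM's MDNode.
struct MDNode {
  virtual ~MDNode() = default;
};

// Hypothetical stand-in for the proposed MDUser: every operand is metadata,
// stored here as one raw pointer per operand.
class MDUser : public MDNode {
  std::vector<MDNode *> Operands;

protected:
  explicit MDUser(std::vector<MDNode *> Ops) : Operands(std::move(Ops)) {}

public:
  std::size_t getNumOperands() const { return Operands.size(); }
  MDNode *getOperand(std::size_t I) const { return Operands.at(I); }
};

// Line and column are ordinary fields; scope and inlined-at are operands.
class MDLineTable : public MDUser {
  unsigned Line;
  unsigned Column;

  MDLineTable(unsigned Line, unsigned Column, std::vector<MDNode *> Ops)
      : MDUser(std::move(Ops)), Line(Line), Column(Column) {}

public:
  // Non-inlined location: exactly one operand (the scope).
  static std::unique_ptr<MDLineTable> get(unsigned Line, unsigned Column,
                                          MDNode *Scope) {
    return std::unique_ptr<MDLineTable>(
        new MDLineTable(Line, Column, {Scope}));
  }

  // Inlined location: same line, column and scope as Base, plus a second
  // operand recording where it was inlined (assumed meaning).
  static std::unique_ptr<MDLineTable> getInlined(const MDLineTable &Base,
                                                 MDNode *InlinedAt) {
    return std::unique_ptr<MDLineTable>(new MDLineTable(
        Base.Line, Base.Column, {Base.getScope(), InlinedAt}));
  }

  unsigned getLine() const { return Line; }
  unsigned getColumn() const { return Column; }
  bool isInlined() const { return getNumOperands() == 2; }
  MDNode *getScope() const { return getOperand(0); }
  MDNode *getInlinedAt() const {
    return isInlined() ? getOperand(1) : nullptr;
  }
};

// Small usage check mirroring the "Not inlined" / "Inlined" examples.
inline void checkLineTableSketch() {
  MDNode Scope, CallSite;
  auto Loc = MDLineTable::get(45, 7, &Scope);
  assert(!Loc->isInlined() && Loc->getScope() == &Scope);
  auto Inlined = MDLineTable::getInlined(*Loc, &CallSite);
  assert(Inlined->isInlined() && Inlined->getInlinedAt() == &CallSite);
}
```

Calling `checkLineTableSketch()` from a `main` exercises both cases. The point of the layout is the one the RFC makes: the integers never become operands, so only the scope and inlined-at links pay the per-operand cost.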
Alex Rosenberg
2014-Oct-16 14:05 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On Oct 15, 2014, at 11:30 PM, Eric Christopher <echristo at gmail.com> wrote:
>
>> On Wed, Oct 15, 2014 at 8:53 PM, Alex Rosenberg <alexr at leftfield.org> wrote:
>> As all of these transforms are 1-to-1, can we still support the older metadata and convert it on the fly?
>
> I'd prefer not to keep all of that code around to interpret both
> versions without a very good reason.

I was thinking of this as a first step toward IR compatibility going forward.

Alex

> -eric
>
>> Alex
Eric Christopher
2014-Oct-16 20:11 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On Thu, Oct 16, 2014 at 7:05 AM, Alex Rosenberg <alexr at leftfield.org> wrote:
> On Oct 15, 2014, at 11:30 PM, Eric Christopher <echristo at gmail.com> wrote:
>>
>>> On Wed, Oct 15, 2014 at 8:53 PM, Alex Rosenberg <alexr at leftfield.org> wrote:
>>> As all of these transforms are 1-to-1, can we still support the older metadata and convert it on the fly?
>>
>> I'd prefer not to keep all of that code around to interpret both
>> versions without a very good reason.
>
> I was thinking of this as a first step toward IR compatibility going forward.

I'll keep it in mind. No promises.

-eric

>
> Alex
>
>> -eric
>>
>>> Alex
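
The memory figures in steps 2 and 3 of the RFC quoted in this thread are rounded, and they can be checked with a few lines of arithmetic. The sketch below assumes decimal megabytes (10^6 bytes) and takes the entry counts and per-entry sizes exactly as quoted; all constant and function names are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Figures as quoted in the RFC (all approximate in the original).
constexpr std::uint64_t LineTableEntries = 7000000;  // ~7M line-table entries
constexpr std::uint64_t BytesSavedPerEntry = 104;    // step 2, x86-64
constexpr std::uint64_t DebugLocEntries = 3500000;   // 3.5M side-vector entries
constexpr std::uint64_t BytesPerDebugLocEntry = 180; // ~180B each
constexpr std::uint64_t SideTableBytesPerEntry = 12; // ~12B / line table entry

// Assumption: "MB" means 10^6 bytes; integer division truncates.
constexpr std::uint64_t toMB(std::uint64_t Bytes) { return Bytes / 1000000; }

// Step 2: savings from storing line and column as plain fields.
constexpr std::uint64_t step2SavingsMB() {
  return toMB(LineTableEntries * BytesSavedPerEntry);
}

// Step 3: current cost of the DebugLoc side-vectors.
constexpr std::uint64_t debugLocCostMB() {
  return toMB(DebugLocEntries * BytesPerDebugLocEntry);
}

// Step 3: cost of the proposed side-table integrated into uniquing.
constexpr std::uint64_t sideTableCostMB() {
  return toMB(LineTableEntries * SideTableBytesPerEntry);
}

// Step 3: net savings using the unrounded products.
constexpr std::uint64_t step3SavingsMB() {
  return debugLocCostMB() - sideTableCostMB();
}

inline void checkQuotedFigures() {
  assert(step2SavingsMB() == 728);  // quoted as ~700MB
  assert(debugLocCostMB() == 630);  // quoted as ~600MB
  assert(sideTableCostMB() == 84);  // quoted as ~80MB
  assert(step3SavingsMB() == 546);  // quoted as 520MB (= ~600MB - ~80MB)
}
```

The unrounded products land slightly above each quoted figure, which is consistent with the RFC rounding down at every step; the quoted 520MB is the difference of the two already-rounded numbers.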