Sean Silva
2014-Oct-15 21:30 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On Mon, Oct 13, 2014 at 7:01 PM, Eric Christopher <echristo at gmail.com> wrote:
> On Mon, Oct 13, 2014 at 6:59 PM, Sean Silva <chisophugis at gmail.com> wrote:
> > For those interested, I've attached some pie charts based on Duncan's
> > data in one of the other posts; successive slides break down the usage
> > increasingly finely. To my understanding, they represent the number of
> > `Value`s (and subclasses) allocated.
> >
> > On Mon, Oct 13, 2014 at 3:02 PM, Duncan P. N. Exon Smith
> > <dexonsmith at apple.com> wrote:
> >>
> >> In r219010, I merged integer and string fields into a single header
> >> field.  By reducing the number of metadata operands used in debug info,
> >> this saved 2.2GB on an `llvm-lto` bootstrap.  I've done some profiling
> >> of DW_TAGs to see what parts of PR17891 and PR17892 to tackle next, and
> >> I've concluded that they will be insufficient.
> >>
> >> Instead, I'd like to implement a more aggressive plan, which as a
> >> side-effect cleans up the much "loved" debug info IR assembly syntax.
> >>
> >> At a high level, the idea is to create distinct subclasses of `Value`
> >> for each debug info concept, starting with line table entries and
> >> moving on to the DIDescriptor hierarchy.  By leveraging the use-list
> >> infrastructure for metadata operands -- i.e., only using value handles
> >> for non-metadata operands -- we'll improve memory usage and increase
> >> RAUW speed.
> >>
> >> My rough plan follows.  I quote some numbers for memory savings below
> >> based on an -flto -g bootstrap of `llvm-lto` (i.e., running `llvm-lto`
> >> on `llvm-lto.lto.bc`, an already-linked bitcode file dumped by ld64's
> >> -save-temps option) that currently peaks at 15.3GB.
> >
> > Stupid question, but when I was working on LTO last summer the primary
> > culprit for excessive memory use was that we were not being smart when
> > linking the IR together (Espindola would know more details).  Do we
> > still have that problem?  For starters, how does the memory usage of
> > just llvm-link compare to the memory usage of the actual LTO run?  If
> > the issue I was seeing last summer is still there, you should see that
> > the invocation of llvm-link is actually the most memory-intensive part
> > of the LTO step, by far.
>
> This is vague.  Could you be more specific on where you saw all of the
> memory?

Running `llvm-link *.bc` would OOM a machine with 64GB of RAM (with -g;
without -g it completed with much less).  The increase could easily be
watched in the system "process monitor" in real time.

-- Sean Silva

> -eric
>
> > Also, you seem to really like saying "peak" here.  Is there a definite
> > peak?  When does it occur?
> >
> >> 1. Introduce `MDUser`, which inherits from `User`, and whose `Use`s
> >>    must all be metadata.  The cost per operand is 1 pointer, vs. 4
> >>    pointers in an `MDNode`.
> >>
> >> 2. Create `MDLineTable` as the first subclass of `MDUser`.  Use normal
> >>    fields (not `Value`s) for the line and column, and use `Use`
> >>    operands for the metadata operands.
> >>
> >>    On x86-64, this will save 104B / line table entry.  Linking
> >>    `llvm-lto` uses ~7M line-table entries, so this on its own saves
> >>    ~700MB.
> >>
> >>    Sketch of class definition:
> >>
> >>        class MDLineTable : public MDUser {
> >>          unsigned Line;
> >>          unsigned Column;
> >>        public:
> >>          static MDLineTable *get(unsigned Line, unsigned Column,
> >>                                  MDNode *Scope);
> >>          static MDLineTable *getInlined(MDLineTable *Base,
> >>                                         MDNode *Scope);
> >>          static MDLineTable *getBase(MDLineTable *Inlined);
> >>
> >>          unsigned getLine() const { return Line; }
> >>          unsigned getColumn() const { return Column; }
> >>          bool isInlined() const { return getNumOperands() == 2; }
> >>          MDNode *getScope() const { return getOperand(0); }
> >>          MDNode *getInlinedAt() const { return getOperand(1); }
> >>        };
> >>
> >>    Proposed assembly syntax:
> >>
> >>        ; Not inlined.
> >>        !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9)
> >>
> >>        ; Inlined.
> >>        !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9,
> >>                                   inlinedAt: metadata !10)
> >>
> >>        ; Column defaulted to 0.
> >>        !7 = metadata !MDLineTable(line: 45, scope: metadata !9)
> >>
> >>    (What colour should that bike shed be?)
> >>
> >> 3. (Optional) Rewrite `DebugLoc` lookup tables.  My profiling shows
> >>    that we have 3.5M entries in the `DebugLoc` side-vectors for 7M
> >>    line table entries.  The cost of these is ~180B each, for another
> >>    ~600MB.
> >>
> >>    If we integrate a side-table of `MDLineTable`s into its uniquing,
> >>    the overhead is only ~12B / line table entry, or ~80MB.  This saves
> >>    520MB.
> >>
> >>    This is somewhat perpendicular to redesigning the metadata format,
> >>    but IMO it's worth doing as soon as it's possible.
> >>
> >> 4. Create `GenericDebugMDNode`, a transitional subclass of `MDUser`
> >>    through an intermediate class `DebugMDNode` with an
> >>    allocation-time-optional `CallbackVH` available for referencing
> >>    non-metadata.  Change `DIDescriptor` to wrap a `DebugMDNode`
> >>    instead of an `MDNode`.
> >>
> >>    This saves another ~960MB, for a running total of ~2GB.
> >
> > 2GB (out of 15.3GB, i.e. ~13%) seems like pretty pathetic savings when
> > we have a single pie slice near 40% of the number of `Value`s allocated
> > and another at 21%.  Especially with this being "step 4".
> >
> > As a rough back-of-the-envelope calculation, dividing 15.3GB by ~24
> > million Values gives about 600 bytes per Value.  That seems sort of
> > excessive (but is it realistic?).  All of the data types that you are
> > proposing to shrink fall far short of this "average size", meaning that
> > if you are trying to reduce memory usage, you might be looking in the
> > wrong place.  Something smells fishy.  At the very least, this would
> > indicate that the real memory usage is elsewhere.
> >
> > A pie chart breaking down the total memory usage seems essential to
> > have here.
> >
> >>    Proposed assembly syntax:
> >>
> >>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,
> >>                                          fields: "0\00clang 3.6\00...",
> >>                                          operands: { metadata !8, ... })
> >>
> >>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,
> >>                                          fields: "global_var\00...",
> >>                                          operands: { metadata !8, ... },
> >>                                          handle: i32* @global_var)
> >>
> >>    This syntax pulls the tag out of the current header-string, calls
> >>    the rest of the header "fields", and includes the metadata operands
> >>    in "operands".
> >>
> >> 5. Incrementally create subclasses of `DebugMDNode`, such as
> >>    `MDCompileUnit` and `MDSubprogram`.  Sub-classed nodes replace the
> >>    "fields" and "operands" catch-alls with explicit names for each
> >>    operand.
> >>
> >>    Proposed assembly syntax:
> >>
> >>        !7 = metadata !MDSubprogram(line: 45, name: "foo",
> >>                                    displayName: "foo",
> >>                                    linkageName: "_Z3foov",
> >>                                    file: metadata !8,
> >>                                    function: i32 (i32)* @foo)
> >>
> >> 6. Remove the dead code for `GenericDebugMDNode`.
> >>
> >> 7. (Optional) Refactor `DebugMDNode` sub-classes to minimize RAUW
> >>    traffic during bitcode serialization.  Now that metadata types are
> >>    known, we can write debug info out in an order that makes it cheap
> >>    to read back in.
> >>
> >>    Note that using `MDUser` will make RAUW much cheaper, since we're
> >>    using the use-list infrastructure for most of them.  If RAUW isn't
> >>    showing up in a profile, I may skip this.
> >>
> >> Does this direction seem reasonable?  Any major problems I've missed?
> >
> > You need more data.  Right now you have essentially one data point, and
> > it's not even clear what you measured, really.  If your goal is saving
> > memory, I would expect at least a pie chart that breaks down LLVM's
> > memory usage (not just the number of allocations of different sorts; an
> > approximation is fine, as long as you explain how you arrived at it and
> > in what sense it approximates the true number).
> >
> > Do the numbers change significantly for different projects?  (e.g.
> > Chromium or Firefox or a kernel or a large app you have handy to
> > compile with LTO?)  If you have specific data you want (and a
> > suggestion for how to gather it), I can also get you numbers for one of
> > our internal games as well.
> >
> > Once you have some more data, then as a first step, I would like to see
> > an analysis of how much we can "ideally" expect to gain
> > (back-of-the-envelope calculations == win).
> >
> > -- Sean Silva
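[A note on step 1's cost claim.  The 1-vs-4-pointer comparison is about the
operand representation: an `MDNode` operand at the time is a callback value
handle, while a `Use` rides the intrusive use-list infrastructure.  Below is
a compilable sketch with illustrative layouts -- these structs are NOT the
actual LLVM classes (the real `llvm::Use` compresses its back-links with
waymarking); they only make the per-operand accounting concrete.]

    #include <cstdio>

    // Illustrative layouts only -- not the actual LLVM classes.

    // MDNode-style operand: a callback value handle.  Virtual dispatch for
    // the RAUW/deleted hooks plus links into the per-value handle list put
    // it at roughly four pointers per operand.
    struct HandleOperand {
      void *VTable;       // CallbackVH has virtual onRAUW/onDelete hooks
      void *Val;          // the referenced metadata
      void *Prev, *Next;  // membership in the per-value handle list
    };

    // Use-style operand: the value pointer plus an intrusive use-list link;
    // the RFC counts the marginal overhead as about one pointer per operand.
    struct UseOperand {
      void *Val;
      void *NextUse;
    };

    int main() {
      // On x86-64: 32 bytes vs. 16 bytes per operand slot.
      std::printf("handle operand: %zu bytes\n", sizeof(HandleOperand));
      std::printf("use operand:    %zu bytes\n", sizeof(UseOperand));
      return 0;
    }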
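[The savings quoted in steps 2 and 3, and the bytes-per-Value estimate in
the reply, all check out as back-of-the-envelope arithmetic using only the
figures stated in the thread:]

    step 2:  104 B * 7.0M line-table entries            ~ 700 MB  saved
    step 3:  180 B * 3.5M DebugLoc side-vector entries  ~ 600 MB  (current)
              12 B * 7.0M line-table entries            ~  80 MB  (proposed)
                                          difference    ~ 520 MB  saved
    reply:   15.3 GB / ~24M Values                      ~ 640 B per Value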
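[Step 7's note that `MDUser` makes RAUW much cheaper follows from the
use-list representation: replacing a node touches only its actual users.  A
minimal compilable sketch under simplified, assumed types -- the real
`llvm::Value::replaceAllUsesWith` walks an intrusive list rather than a
vector, and value handles are notified separately:]

    #include <vector>

    // Simplified stand-ins -- not the real LLVM types.
    struct Node;
    struct Use {
      Node *Val;  // the operand; the real llvm::Use also links to its User
    };

    struct Node {
      std::vector<Use *> Uses;  // every operand slot pointing at this node
    };

    // Cost is O(number of users of From), independent of how many metadata
    // nodes exist in the module -- which is why use-list-based operands
    // make bulk RAUW traffic (e.g. during bitcode reading) cheap.
    void replaceAllUsesWith(Node &From, Node &To) {
      for (Use *U : From.Uses) {
        U->Val = &To;          // retarget the operand
        To.Uses.push_back(U);  // move it onto To's use-list
      }
      From.Uses.clear();
    }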
Eric Christopher
2014-Oct-15 21:31 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On Wed, Oct 15, 2014 at 2:30 PM, Sean Silva <chisophugis at gmail.com> wrote:
> On Mon, Oct 13, 2014 at 7:01 PM, Eric Christopher <echristo at gmail.com> wrote:
>> This is vague.  Could you be more specific on where you saw all of the
>> memory?
>
> Running `llvm-link *.bc` would OOM a machine with 64GB of RAM (with -g;
> without -g it completed with much less).  The increase could easily be
> watched in the system "process monitor" in real time.

This is likely what we've already discussed and was handled a long
while ago now.

-eric
Sean Silva
2014-Oct-15 21:32 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On Wed, Oct 15, 2014 at 2:31 PM, Eric Christopher <echristo at gmail.com> wrote:
> On Wed, Oct 15, 2014 at 2:30 PM, Sean Silva <chisophugis at gmail.com> wrote:
>> Running `llvm-link *.bc` would OOM a machine with 64GB of RAM (with -g;
>> without -g it completed with much less).  The increase could easily be
>> watched in the system "process monitor" in real time.
>
> This is likely what we've already discussed and was handled a long
> while ago now.

I was reading the thread in sequential order (and replying without
finishing).  derp.

-- Sean Silva
Diego Novillo
2014-Oct-17 14:27 UTC
[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
On 15/10/2014, 17:31, Eric Christopher wrote:
> On Wed, Oct 15, 2014 at 2:30 PM, Sean Silva <chisophugis at gmail.com> wrote:
>> Running `llvm-link *.bc` would OOM a machine with 64GB of RAM (with -g;
>> without -g it completed with much less).  The increase could easily be
>> watched in the system "process monitor" in real time.
>
> This is likely what we've already discussed and was handled a long
> while ago now.

Wait, really?  I can definitely get my 64GB box to thrash just trying to
llvm-link -g bitcode files.  By 'handled' do you mean fixed in trunk, or
'plan to fix'?

Diego.