Eric Christopher via llvm-dev
2016-Mar-30 02:43 UTC
[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm
On Tue, Mar 29, 2016 at 7:31 PM Peter Collingbourne <peter at pcc.me.uk> wrote:

> Thanks for sharing this. Mostly seems like a reasonable plan to me. A few
> comments below.

Thanks Peter!

> On Tue, Mar 29, 2016 at 6:00 PM, Eric Christopher via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Hi All,
>>
>> This is something that's been talked about for some time and it's
>> probably time to propose it.
>>
>> The "We" in this document is everyone on the cc line plus me.
>>
>> Please go ahead and take a look.
>>
>> Thanks!
>>
>> -eric
>>
>>
>> Objective (and TL;DR)
>> =====================
>>
>> Migrate debug type information generation from the backends to the front
>> end.
>>
>> This will enable:
>> 1. Separation of concerns and maintainability: LLVM shouldn’t have to
>> know about C preprocessor macros, Obj-C properties, or extensive details
>> about debug information binary formats.
>> 2. Performance: Skipping a serialization should speed up normal
>> compilations.
>> 3. Memory usage: The DI metadata structures are smaller than they were,
>> but are still fairly large and pointer heavy.
>>
>> Motivation
>> ==========
>>
>> Currently, types in LLVM debug info are described by the DIType class
>> hierarchy. This hierarchy evolved organically from a more flexible
>> sea-of-nodes representation into what it is today - a large, only somewhat
>> format neutral representation of debug types. Making this more format
>> neutral will only increase the memory use - and for no reason as type
>> information is static (or nearly so). Debug formats already have a memory
>> efficient serialization, their own binary format so we should support a
>> front end emitting type information with sufficient representation to allow
>> the backend to emit debug information based on the more normal IR features:
>> functions, scopes, variables, etc.
>>
>> Scope/Impact
>> ============
>>
>> This is going to involve large scale changes across both LLVM and clang.
>> This will also affect any out-of-tree front ends, however, we expect the
>> impact to be on the order of a large API change rather than needing massive
>> infrastructure changes.
>>
>> Related work
>> ============
>>
>> This is related to the efforts to support CodeView in LLVM and clang as
>> well as efforts to reduce overall memory consumption when compiling with
>> debug information enabled; in particular efforts to prune LTO memory usage.
>>
>>
>> Concerns
>> ========
>>
>> We need a good story for transitioning all the debug info testcases in
>> the backend without giving up coverage and/or readability. David believes
>> he has a plan here.
>>
>> Proposal
>> ========
>>
>> Short version
>> -------------
>>
>> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line
>> Table.
>> 2. Split the clang CGDebugInfo API into Types and Line Table to match.
>> 3. Add a LLVM DWARF emission library similar to the existing CodeView one.
>> 4. Migrate the Types API into a clang internal API taking clang AST
>> structures and use the LLVM binary emission libraries to produce type
>> information.
>> 5. Remove the old binary emission out of LLVM.
>>
>>
>> Questions/Thoughts/Elaboration
>> ------------------------------
>>
>> Splitting the DIBuilder API
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Will DISubprogram be part of both?
>> * We should split it in two: Full declarations with type and a slimmed
>> down version with an abstract origin.
>>
>> How will we reference types in the DWARF blob?
>> * ODR types can be referenced by name
>> * Non-odr types by full DWARF hash
>> * Each type can be a pair(tuple) of identifier (DITypeRef today) and
>> blob.
>> * For < DWARF4 we can emit each type as a unit, but not a DWARF Type
>> Unit and use references and module relocations for the offsets. (See below)
>>
>> How will we handle references in DWARF2 or global relocations for
>> non-type template parameters?
>> * We can use a “relocation” metadata as part of the format.
>> * Representable as a tuple that has the DIType and the offset within
>> the DIBlob as where to write the final relocation/offset for the reference
>> at emission time.
>>
>> Why break up the types at all?
>> * To enable non-debug format aware linking and type uniquing for LTO
>> that won’t be huge in size. We break up the types so we don’t need to parse
>> debug information to link two modules together efficiently.
>>
>
> How do you plan to handle abbreviations? You wouldn't necessarily be able
> to embed them directly in the blob, as when doing LTO each compilation unit
> would have its own set of abbreviations. I suppose you could do something
> like treat them as a special sort of reference to an abbreviation table
> entry, or maybe pre-allocate in the frontend (but would complicate
> cross-frontend LTO) but curious what you have in mind.

Thanks for reminding me, I knew I was forgetting something I'd talked about when writing all of this down. :)

Basically to handle abbreviations you can do them similarly to types by creating a blob with an index/hash/etc and then reference that as part of the type tuple, e.g.:

$1 = { DIAbbrev: 0x1234, DIBlob: <blah> }
$2 = { DIType: <ID>, DIAbbrev: $1, DIBlob: <blah> }

and keep them uniqued during emission and remember to merge these as well during module merge time.

>> Any other concerns there?
>> * Debug information without type units might be slightly larger in
>> this scheme due to parents being duplicated (declarations and abstract
>> origin, not full parents). It may be possible to extend dsymutil/etc to
>> merge all siblings into a common parent. Open question for better ways to
>> solve this.
>>
>
> When we were thinking about teaching the backend to produce blobs from IR
> metadata we were thinking about cases where the debug info emitter would
> discover special member functions during IR traversal. I guess since we're
> moving all of that to the frontend we can just ask the frontend directly
> which special members are needed on the class. That solves the problem for
> a single translation unit. But what do you plan to do in the multiple
> translation unit case where two TUs declare different special members on a
> class? Would it be fine to just emit the two definitions and let the
> debugger sort it out? I guess this is the type of thing that debuggers
> normally deal with in the non-LTO case, so I suppose so?

Pretty much. This is one area where I have... disagreements with the DWARF committee and I don't think there's anything else we can do here. TBH right now I think we'd have issues with type units and special member functions since we're using ODR-ness to unique.

-eric

>> How should we handle DWARF5/Apple Accelerator Tables?
>> * Thoughts:
>>   * We can parse the dwarf in the back end and generate them.
>>   * We can emit in the front end for the base case of non-LTO (with help
>>     from the backend for relocation aspects).
>>   * We can use dsymutil on LTO debug information to generate them.
>>
>> Why isn’t this a more detailed spec?
>> * Mostly because we’ve thought about the issues, but we can’t plan for
>> everything during implementation.
>>
>>
>> Future work
>> -----------
>>
>> Not contained as part of this, but an obvious future direction is that
>> the Module linker could grow support for debug aware linking. Then we can
>> have all of the type information for a single translation unit in a single
>> blob and use the debug aware linking to handle merging types.
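The abbreviation handling described in the reply above (an abbreviation blob keyed by an index/hash, referenced from the type tuple, uniqued and merged at module merge time) might be modeled roughly as follows. This is only a sketch in Python, not LLVM code: the class names `DIAbbrev` and `DIType` mirror the pseudo-metadata in the message, while the field names, `merge_modules`, and the assumption that an equal key implies an equal abbreviation blob are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DIAbbrev:
    key: int     # index/hash identifying the abbreviation, e.g. 0x1234 (assumed unique per content)
    blob: bytes  # encoded abbreviation declaration


@dataclass(frozen=True)
class DIType:
    type_id: str      # type identifier (ODR name or hash)
    abbrev: DIAbbrev  # reference to the abbreviation used by the blob
    blob: bytes       # encoded attribute data


def merge_modules(*modules):
    """Merge lists of DIType, uniquing abbreviations by key and types by id."""
    abbrevs = {}
    types = {}
    for module in modules:
        for ty in module:
            # Reuse the first abbreviation seen for this key (hypothetical policy).
            canonical = abbrevs.setdefault(ty.abbrev.key, ty.abbrev)
            # Keep the first definition of each type, pointing at the canonical abbrev.
            types.setdefault(ty.type_id, DIType(ty.type_id, canonical, ty.blob))
    return list(abbrevs.values()), list(types.values())
```

The point of the sketch is that merging needs only the keys and identifiers; neither blob has to be parsed to link two modules.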
Peter Collingbourne via llvm-dev
2016-Mar-30 03:11 UTC
[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm
On Tue, Mar 29, 2016 at 7:43 PM, Eric Christopher <echristo at gmail.com> wrote:

> Basically to handle abbreviations you can do them similarly to types
> by creating a blob with an index/hash/etc and then reference that as part
> of the type tuple, e.g.:
>
> $1 = { DIAbbrev: 0x1234, DIBlob: <blah> }
> $2 = { DIType: <ID>, DIAbbrev: $1, DIBlob: <blah> }
>
> and keep them uniqued during emission and remember to merge these as well
> during module merge time.

Makes sense, but wouldn't you need multiple abbreviations for each DIType, in order to represent DITypes formed of multiple DIEs (e.g. enums, records)?

Maybe something like this would work:

$1 = { DIAbbrev: 0x1234, DIBlob: DW_TAG_enumeration_type<blah> }
$2 = { DIAbbrev: 0x5678, DIBlob: DW_TAG_enumerator<blah> }
$3 = { DIType: <ID>, DIAbbrev: [(0, $1), (8, $2), (16, $2)], DIBlob: <8 bytes of DW_TAG_enumeration_type attrs><8 bytes of DW_TAG_enumerator attrs><8 bytes of DW_TAG_enumerator attrs><0> }

?

--
Peter
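The multi-DIE layout suggested above, a list of (offset, abbreviation) pairs alongside one attribute blob, could be turned into final DIE bytes at emission time along these lines. This is a hedged sketch rather than anything from LLVM: `emit_type`, its parameters, and the `codes` mapping from abbreviation key to final abbreviation code are hypothetical names. It relies only on the DWARF facts that each DIE starts with a ULEB128 abbreviation code and that a zero byte ends a sibling list.

```python
def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def emit_type(abbrev_refs, blob, codes):
    """Splice final abbreviation codes into an attribute blob.

    abbrev_refs: list of (offset_in_blob, abbrev_key), sorted by offset.
    blob:        attribute bytes for all DIEs, ending with the 0 terminator.
    codes:       abbrev_key -> abbreviation code chosen at emission time.
    """
    out = bytearray()
    # Each DIE's attributes run up to the next DIE's offset (or the blob end).
    ends = [offset for offset, _ in abbrev_refs[1:]] + [len(blob)]
    for (start, key), end in zip(abbrev_refs, ends):
        out += uleb128(codes[key])
        out += blob[start:end]
    return bytes(out)
```

Because the codes are only assigned here, each compilation unit in an LTO link can number its abbreviation table independently while sharing the same type blobs.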
Eric Christopher via llvm-dev
2016-Mar-30 03:15 UTC
[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm
On Tue, Mar 29, 2016 at 8:11 PM Peter Collingbourne <peter at pcc.me.uk> wrote:

> Makes sense, but wouldn't you need multiple abbreviations for each DIType,
> in order to represent DITypes formed of multiple DIEs (e.g. enums, records)?
>
> Maybe something like this would work:
>
> $1 = { DIAbbrev: 0x1234, DIBlob: DW_TAG_enumeration_type<blah> }
> $2 = { DIAbbrev: 0x5678, DIBlob: DW_TAG_enumerator<blah> }
> $3 = { DIType: <ID>, DIAbbrev: [(0, $1), (8, $2), (16, $2)], DIBlob: <8
> bytes of DW_TAG_enumeration_type attrs><8 bytes of DW_TAG_enumerator
> attrs><8 bytes of DW_TAG_enumerator attrs><0> }
>
> ?

*nod* That (or something similar) will work.

-eric
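The "relocation" metadata from the RFC (a tuple of the referenced DIType and the offset within the DIBlob where the final offset is written at emission time) can be sketched in the same spirit. Everything below is an assumption made for illustration: the function name, laying types out back to back in insertion order, and patching a 4-byte little-endian section offset are hypothetical choices, not something the thread specifies.

```python
import struct


def layout_and_relocate(types, relocs):
    """Lay out type blobs sequentially and patch cross-type references.

    types:  dict of type_id -> blob bytes (insertion order is layout order).
    relocs: list of (source_id, offset_in_source_blob, target_id).
    Returns (section_bytes, offsets) where offsets maps type_id -> position.
    """
    offsets = {}
    cursor = 0
    for type_id, blob in types.items():
        offsets[type_id] = cursor
        cursor += len(blob)

    section = bytearray(b"".join(types.values()))
    for source_id, where, target_id in relocs:
        at = offsets[source_id] + where
        # Assumed encoding: 32-bit little-endian offset of the target type.
        struct.pack_into("<I", section, at, offsets[target_id])
    return bytes(section), offsets
```

For example, a pointer type whose blob reserves four bytes for its pointee would carry one relocation naming the pointee's identifier; the front end never needs to know where the pointee finally lands.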