Jeremy Morse via llvm-dev
2020-Nov-17 19:20 UTC
[llvm-dev] [DebugInfo] Enabling constructor homing by default
Hi debug-info folks,

I've recently been experimenting with the -debug-info-kind=constructor model for debug-info creation, which is leading to some significant reductions in .debug_info on our large C++ benchmarks, which is great! I see in PR46537 that there's a plan to eventually enable this by default -- is this something we can target for LLVM12, or are there outstanding issues?

While experimenting I was also interested to see that for DWARF and constructor homing, we emit skeleton type definitions if functions have an inlined copy in the translation unit. For example in [0] where I've uploaded a couple of dexter tests for constructor homing, in partial-type/main.cpp we get:

  DW_TAG_class_type
    DW_AT_name ("foo")
    DW_AT_declaration (true)

    DW_TAG_subprogram
      DW_AT_linkage_name ("_ZNK3foo8asStringB5cxx11Ev")
      DW_AT_name ("asString")
      DW_AT_decl_file ("./theclass.h")
      DW_AT_decl_line (12)
      DW_AT_type (0x000014e6 "string")
      DW_AT_declaration (true)
      DW_AT_external (true)
      DW_AT_accessibility (DW_ACCESS_public)

      DW_TAG_formal_parameter
        DW_AT_type (0x0000371b "const foo*")
        DW_AT_artificial (true)

      NULL

And as expected no further type information (aside from the destructor, also inlined). It seems gdb and lldb are able to find the full type definition even when there's a skeleton type.

When exploring this with Paul, we worried a bit that LTO could de-duplicate to the skeleton type definition rather than the full one, is there protection against that happening somewhere?

[0] https://reviews.llvm.org/D91648

--
Thanks,
Jeremy
David Blaikie via llvm-dev
2020-Nov-17 19:52 UTC
[llvm-dev] [DebugInfo] Enabling constructor homing by default
On Tue, Nov 17, 2020 at 11:20 AM Jeremy Morse <jeremy.morse.llvm at gmail.com> wrote:
>
> Hi debug-info folks,
>
> I've recently been experimenting with the -debug-info-kind=constructor
> model for debug-info creation, which is leading to some significant
> reductions in .debug_info on our large C++ benchmarks, which is great!
> I see in PR46537 that there's a plan to eventually enable this by
> default -- is this something we can target for LLVM12, or are there
> outstanding issues?

There's some discussion around the issue found/patch proposed here: https://reviews.llvm.org/D90719 - I hope we can fix libc++ instead of adding a workaround in ctor homing itself.

> While experimenting I was also interested to see that for DWARF and
> constructor homing, we emit skeleton type definitions if functions
> have an inlined copy in the translation unit. For example in [0] where
> I've uploaded a couple of dexter tests for constructor homing, in
> partial-type/main.cpp we get:
>
> DW_TAG_class_type
>   DW_AT_name ("foo")
>   DW_AT_declaration (true)
>
>   DW_TAG_subprogram
>     DW_AT_linkage_name ("_ZNK3foo8asStringB5cxx11Ev")
>     DW_AT_name ("asString")
>     DW_AT_decl_file ("./theclass.h")
>     DW_AT_decl_line (12)
>     DW_AT_type (0x000014e6 "string")
>     DW_AT_declaration (true)
>     DW_AT_external (true)
>     DW_AT_accessibility (DW_ACCESS_public)
>
>     DW_TAG_formal_parameter
>       DW_AT_type (0x0000371b "const foo*")
>       DW_AT_artificial (true)
>
>     NULL
>
> And as expected no further type information (aside from the
> destructor, also inlined).
> It seems gdb and lldb are able to find the
> full type definition even when there's a skeleton type.

Yep, this kind of DWARF is already generated for a number of other cases - the other two forms of type homing that are implemented in clang already:

* vtable based type homing (gcc implements this as well):

    struct t1 { virtual void f1(); };
    t1 v1; // use.cpp
    void t1::f1() { } // definition.cpp

  (the file with "use" will not have a definition of t1, the definition of t1 will appear in the file containing the definition of f1)

* explicit template instantiation decl/def (gcc doesn't implement this):

    template<typename T> struct t1 { };
    extern template struct t1<int>; // any use of t1<int> that is covered by this decl will have t1 as a declaration, not a definition
    template struct t1<int>; // this will force the definition of t1<int> to be emitted, even if it's otherwise unreferenced

All sorts of members can appear in these skeletal definitions. Even non-inline members can appear there - if the ctor isn't defined in this translation unit, for instance (eg: if you have an inline ctor - and your implementation file defines the non-inline members, as it should, then the type may not be defined in the DWARF for that translation unit).

This sort of behavior is also seen with type units - where the type is declared in the CU (DW_TAG_structure_type with DW_AT_declaration true and DW_AT_signature) but then any members that need to be referenced (eg: member function declarations that need to be referenced from member function definitions outside the type definition/type unit - or nested types that need to be referenced, etc) are included in this skeletal type declaration. GCC implements this the same way.
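For readers who want to try the homing cases described above, here is a minimal single-file sketch. The names `Shape`, `Box`, `Widget` and `useAll` are hypothetical and do not come from the thread or from the dexter tests in D91648. In a real project the out-of-line definitions would sit in separate .cpp files; they are folded into one file here only so the example compiles on its own, and the comments mark where, per the description above, the full type definition would be expected to land.

```cpp
#include <cassert>

// Case 1: vtable based homing. Shape has one non-inline virtual function.
struct Shape {
    virtual int sides() const;
};
// In a multi-file build this definition would live in, say, shape.cpp.
// Per the description above, that file would carry the full definition of
// Shape; files that only use Shape would carry a declaration.
int Shape::sides() const { return 0; }

// Case 2: explicit template instantiation declaration/definition.
template <typename T>
struct Box {
    T value;
    T get() const { return value; }
};
// Uses of Box<int> covered by this declaration would see a declaration only.
extern template struct Box<int>;
// This forces Box<int> to be instantiated, and its definition described,
// here (normally in exactly one .cpp file).
template struct Box<int>;

// Case 3: constructor homing, the model discussed in this thread.
struct Widget {
    explicit Widget(int v);              // non-inline constructor
    int value() const { return v_; }     // inline member: may show up in a
                                         // skeleton type in other files
private:
    int v_;
};
// Assumption: the file that emits this constructor is the one expected to
// carry the full definition of Widget.
Widget::Widget(int v) : v_(v) {}

// Exercises all three types so each is actually used.
int useAll() {
    Shape s;
    Box<int> b{41};
    Widget w(1);
    return s.sides() + b.get() + w.value();
}
```

Splitting each case across two files, compiling with -g, and dumping the result with llvm-dwarfdump should show which file receives the definition and which receives only a DW_AT_declaration skeleton.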
Because GCC implements the vtable homing and the type unit member situation the same way (not a coincidence, I copied both of these from GCC when working on reducing Clang's debug info size), it's a pretty solid foundation to build other homing strategies on top of.

> When exploring
> this with Paul, we worried a bit that LTO could de-duplicate to the
> skeleton type definition rather than the full one, is there protection
> against that happening somewhere?

Yep - this is the same logic that's used for the simpler cases (eg: one file contains "struct x; x *y;" and another file contains "struct x { }; x z;" - LTO is already designed to deduplicate those two and prefer the definition).

For more detail, consider the LLVM IR metadata representation of these "skeleton types" - actually they're closer to a pure declaration (as would be produced by the 'y' example above): type definitions in LLVM IR debug info metadata include a list of members, but this list is not exhaustive (even without any interesting type homing) - for instance implicit special members, member/nested types, and instantiations of member function templates - all those kinds of members do not appear in the member list, but instead they appear separately (held alive by the llvm::Function or similar that refers to them) and declare that their scope is the type they are a member of. Effectively they insert themselves into the type.

This means that when two LLVM IR modules are linked together, the types can be deduplicated based on the ODR (using the type's mangled name as the key) without trying to merge the member lists - but if one module has mem<int> defined and another has mem<float> defined, they naturally merge - the type is deduplicated and so the "scope" of those members that aren't in the member list naturally ends up referring to the singular chosen type definition.
Type homing adds the possibility that these non-member-list definitions can also be plain member functions, not in that special list of 3 kinds of entities, and that these non-member-list definitions can refer to a declaration of a type rather than a definition. Same merging happens - declaration gets deduplicated with the definition, and all these non-member-list definitions now refer to the definition.

(this would be important even if there were no member functions, eg in C code one file might have "struct t1; void f1(t1*) {}" and another might have "struct t1 { }; void f2(t1) {}" and we would want to ensure that when those modules are linked together, both the type of the pointer in f1 and the type of the parameter in f2 refer to the same type - and that that type is the definition, not the declaration)

>
> [0] https://reviews.llvm.org/D91648
>
> --
> Thanks,
> Jeremy
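The merging rule described in this reply can be illustrated with a small toy model. This is only a sketch under stated assumptions, not LLVM's implementation: `TypeNode`, `OutOfListMember`, `Module`, `Linked`, `linkModules`, `scopeOf` and `demo` are all hypothetical names, and the key string "_ZTS3foo" is used purely as an illustrative ODR identifier. It shows types being uniqued by key with the definition preferred over a declaration, member lists never being merged, and out-of-list members resolving their scope through the key, so they end up pointing at whichever node survived.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a type node in debug info metadata (hypothetical).
struct TypeNode {
    std::string identifier;            // ODR key, e.g. a mangled name
    bool isDeclaration;                // true for a declaration/skeleton
    std::vector<std::string> members;  // member list; not exhaustive
};

// Toy stand-in for a member that lives outside the member list and
// names its scope by the type's ODR key (hypothetical).
struct OutOfListMember {
    std::string name;
    std::string scopeIdentifier;
};

struct Module {
    std::vector<TypeNode> types;
    std::vector<OutOfListMember> members;
};

struct Linked {
    std::map<std::string, TypeNode> types;
    std::vector<OutOfListMember> members;
};

// Link modules: one node per identifier, a definition replaces a
// declaration, and member lists are never merged with each other.
Linked linkModules(const std::vector<Module>& modules) {
    Linked out;
    for (const Module& m : modules) {
        for (const TypeNode& t : m.types) {
            auto it = out.types.find(t.identifier);
            if (it == out.types.end()) {
                out.types.emplace(t.identifier, t);
            } else if (it->second.isDeclaration && !t.isDeclaration) {
                it->second = t;  // prefer the definition
            }
        }
        out.members.insert(out.members.end(),
                           m.members.begin(), m.members.end());
    }
    return out;
}

// Resolve a member's scope after linking.
const TypeNode& scopeOf(const Linked& linked, const OutOfListMember& m) {
    return linked.types.at(m.scopeIdentifier);
}

// Module a holds only a declaration of foo plus an out-of-list member;
// module b holds the full definition.
Linked demo(bool definitionFirst) {
    Module a;
    a.types.push_back({"_ZTS3foo", true, {}});
    a.members.push_back({"asString", "_ZTS3foo"});
    Module b;
    b.types.push_back({"_ZTS3foo", false, {"foo", "asString"}});
    if (definitionFirst) return linkModules({b, a});
    return linkModules({a, b});
}
```

In this model the link order does not matter: calling `demo(true)` or `demo(false)` leaves a single node for the key, and `scopeOf` on the `asString` member returns the definition in both cases.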