In an effort to fix inlined information for backtraces under DWARF Fission
in the absence of the split DWARF (.dwo) files, I'm planning on adding
-gmlt-like data to the .o file, alongside the skeleton CU.

Since that will involve teaching LLVM about -gmlt (more so than it already
has - the debug info LLVM metadata already describes -gmlt for the purposes
of omitting pubnames in that case) I figured I'd take the opportunity to
move the existing -gmlt functionality to the backend to begin with, and, in
doing so, minimize it a little further since we wouldn't need to emit debug
info for every function - possibly just those that have functions inlined
into them.

So here's an example of some of my ideas about minimized debug info. I'm
wondering if I'm right about what's needed for backtracing.

I've removed uninteresting things, like DW_AT_accessibility (which is a bug
anyway), DW_AT_external (there's no reason symbolication needs that, is
there?), but also less obviously uninteresting things like DW_AT_frame_base
(the location of the frame pointer - is that needed for symbolication?)

Also I've made a frontend (for now) change (see mgmlt_clang.diff) to omit
the data that causes DW_AT_decl_file/DW_AT_decl_line to be emitted - are
those needed? I don't think so.

But importantly: the only DW_TAG_subprograms are either functions that have
been inlined, or functions that have been inlined into. Is that enough?

Is it OK that I haven't included debug info for out of line definitions of
inline functions?

I'm assuming all that information can be retrieved from the symbol table.

(one other thing I noticed is that we don't use the mangled names for
functions in -gmlt - how on earth does that work? The backtrace would look
really strange if it included the unmangled names of functions - or does the
symbolizer use the address range of the out of line definition (if there is
one?) of the inlined function (in which case I'd need to provide it...) to
find it in the symbol table, get the mangled name, and use that?)

One thing I was thinking of doing as well, is that since the
DW_AT_abstract_origin just points to a trivial subprogram with a name and
DW_AT_inline - perhaps instead of an abstract origin, we could just use
DW_AT_name directly? (with the mangled name, probably) That'd save us
emitting the extra indirection and the name is uniqued already anyway. (and
DW_FORM_strp is the same size as DW_FORM_ref4 (though DW_FORM_strp would
mean extra relocations...) - and perhaps in the near future, DW_FORM_strp
could be replaced by DW_FORM_str_index to reduce relocations)

So... yes/no/maybe?
-------------- next part --------------
diff --git lib/CodeGen/CGDebugInfo.cpp lib/CodeGen/CGDebugInfo.cpp
index 0b20f54..e3f23c4 100644
--- lib/CodeGen/CGDebugInfo.cpp
+++ lib/CodeGen/CGDebugInfo.cpp
@@ -2534,8 +2534,12 @@ void CGDebugInfo::EmitFunctionStart(GlobalDecl GD,
       if (Loc.isInvalid())
         CurLoc = SourceLocation();
     }
-  unsigned LineNo = getLineNumber(Loc);
-  unsigned ScopeLine = getLineNumber(ScopeLoc);
+  unsigned LineNo = 0;
+  unsigned ScopeLine = 0;
+  if (DebugKind > CodeGenOptions::DebugLineTablesOnly) {
+    LineNo = getLineNumber(Loc);
+    ScopeLine = getLineNumber(ScopeLoc);
+  }
 
   // FIXME: The function declaration we're constructing here is mostly reusing
   // declarations from CXXMethodDecl and not constructing new ones for arbitrary
diff --git test/CodeGenCXX/debug-info-blocks.cpp test/CodeGenCXX/debug-info-blocks.cpp
index 5b20db5..207e2b0 100644
--- test/CodeGenCXX/debug-info-blocks.cpp
+++ test/CodeGenCXX/debug-info-blocks.cpp
@@ -10,5 +10,5 @@ void test() {
   __block A a;
 }
 
-// CHECK: [ DW_TAG_subprogram ] [line 10] [local] [def] [__Block_byref_object_copy_]
-// CHECK: [ DW_TAG_subprogram ] [line 10] [local] [def] [__Block_byref_object_dispose_]
+// CHECK: [ DW_TAG_subprogram ] [line 0] [local] [def] [__Block_byref_object_copy_]
+// CHECK: [ DW_TAG_subprogram ] [line 0] [local] [def] [__Block_byref_object_dispose_]
-------------- next part --------------
diff --git lib/CodeGen/AsmPrinter/DwarfDebug.cpp lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index 58bc96d..da48e24 100644
--- lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -319,9 +319,12 @@ DIE &DwarfDebug::updateSubprogramScopeDIE(DwarfCompileUnit &SPCU,
 
   attachLowHighPC(SPCU, *SPDie, FunctionBeginSym, FunctionEndSym);
 
-  const TargetRegisterInfo *RI = Asm->TM.getSubtargetImpl()->getRegisterInfo();
-  MachineLocation Location(RI->getFrameRegister(*Asm->MF));
-  SPCU.addAddress(*SPDie, dwarf::DW_AT_frame_base, Location);
+  if (SPCU.getCUNode().getEmissionKind() != DIBuilder::LineTablesOnly) {
+    const TargetRegisterInfo *RI =
+        Asm->TM.getSubtargetImpl()->getRegisterInfo();
+    MachineLocation Location(RI->getFrameRegister(*Asm->MF));
+    SPCU.addAddress(*SPDie, dwarf::DW_AT_frame_base, Location);
+  }
 
   // Add name to the name table, we do this here because we're guaranteed
   // to have concrete versions of our DW_TAG_subprogram nodes.
@@ -751,6 +754,11 @@ void DwarfDebug::beginModule() {
   for (MDNode *N : CU_Nodes->operands()) {
     DICompileUnit CUNode(N);
     DwarfCompileUnit &CU = constructDwarfCompileUnit(CUNode);
+    DIArray SPs = CUNode.getSubprograms();
+    for (unsigned i = 0, e = SPs.getNumElements(); i != e; ++i)
+      SPMap.insert(std::make_pair(SPs.getElement(i), &CU));
+    if (CU.getCUNode().getEmissionKind() == DIBuilder::LineTablesOnly)
+      continue;
     DIArray ImportedEntities = CUNode.getImportedEntities();
     for (unsigned i = 0, e = ImportedEntities.getNumElements(); i != e; ++i)
       ScopesWithImportedEntities.push_back(std::make_pair(
@@ -761,9 +769,6 @@ void DwarfDebug::beginModule() {
     DIArray GVs = CUNode.getGlobalVariables();
     for (unsigned i = 0, e = GVs.getNumElements(); i != e; ++i)
       CU.createGlobalVariableDIE(DIGlobalVariable(GVs.getElement(i)));
-    DIArray SPs = CUNode.getSubprograms();
-    for (unsigned i = 0, e = SPs.getNumElements(); i != e; ++i)
-      SPMap.insert(std::make_pair(SPs.getElement(i), &CU));
     DIArray EnumTypes = CUNode.getEnumTypes();
     for (unsigned i = 0, e = EnumTypes.getNumElements(); i != e; ++i) {
       DIType Ty(EnumTypes.getElement(i));
@@ -833,12 +838,13 @@ void DwarfDebug::finishSubprogramDefinitions() {
         // If this subprogram has an abstract definition, reference that
         SPCU->addDIEEntry(*D, dwarf::DW_AT_abstract_origin, *AbsSPDIE);
       } else {
-        if (!D)
+        if (!D && TheCU.getEmissionKind() != DIBuilder::LineTablesOnly)
           // Lazily construct the subprogram if we didn't see either concrete or
           // inlined versions during codegen.
           D = SPCU->getOrCreateSubprogramDIE(SP);
-        // And attach the attributes
-        SPCU->applySubprogramAttributesToDefinition(SP, *D);
+        if (D)
+          // And attach the attributes
+          SPCU->applySubprogramAttributesToDefinition(SP, *D);
       }
     }
   }
@@ -1670,6 +1676,17 @@ void DwarfDebug::endFunction(const MachineFunction *MF) {
 
   LexicalScope *FnScope = LScopes.getCurrentFunctionScope();
   DwarfCompileUnit &TheCU = *SPMap.lookup(FnScope->getScopeNode());
+  if (TheCU.getCUNode().getEmissionKind() == DIBuilder::LineTablesOnly && LScopes.getAbstractScopesList().empty()) {
+    assert(ScopeVariables.empty());
+    assert(CurrentFnArguments.empty());
+    assert(DbgValues.empty());
+    assert(AbstractVariables.empty());
+    LabelsBeforeInsn.clear();
+    LabelsAfterInsn.clear();
+    PrevLabel = nullptr;
+    CurFn = nullptr;
+    return;
+  }
 
   // Construct abstract scopes.
   for (LexicalScope *AScope : LScopes.getAbstractScopesList()) {
diff --git lib/CodeGen/AsmPrinter/DwarfUnit.cpp lib/CodeGen/AsmPrinter/DwarfUnit.cpp
index e0be080..78ac1e6 100644
--- lib/CodeGen/AsmPrinter/DwarfUnit.cpp
+++ lib/CodeGen/AsmPrinter/DwarfUnit.cpp
@@ -1518,6 +1518,9 @@ void DwarfUnit::applySubprogramAttributes(DISubprogram SP, DIE &SPDie) {
     constructSubprogramArguments(SPDie, Args);
   }
 
+  if(getCUNode().getEmissionKind() == DIBuilder::LineTablesOnly)
+    return;
+
   if (SP.isArtificial())
     addFlag(SPDie, dwarf::DW_AT_artificial);
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: funcs.cpp
Type: text/x-c++src
Size: 175 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140827/81edfe40/attachment.cpp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: funcs.s
Type: application/octet-stream
Size: 9751 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140827/81edfe40/attachment.obj>
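
The emission rule proposed in the message above (under -gmlt, keep a DW_TAG_subprogram only for functions that were inlined or had something inlined into them) can be sketched roughly as follows. This is only an illustration of the idea, not the patch's actual logic: EmissionKind, FunctionInfo, needsSubprogramDIE and subprogramsToEmit are hypothetical names and do not exist in LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not LLVM types.
enum class EmissionKind { FullDebug, LineTablesOnly };

struct FunctionInfo {
  std::string Name;
  bool InlinedElsewhere;   // at least one call to this function was inlined
  bool HasInlinedCallees;  // at least one callee was inlined into this function
};

// Full debug info describes every function. Under line-tables-only, keep a
// DW_TAG_subprogram only for functions on either side of an inlining
// decision; everything else is left to the line table and the symbol table.
bool needsSubprogramDIE(EmissionKind Kind, const FunctionInfo &F) {
  if (Kind != EmissionKind::LineTablesOnly)
    return true;
  return F.InlinedElsewhere || F.HasInlinedCallees;
}

// Collect the names of the functions that would get a subprogram DIE.
std::vector<std::string>
subprogramsToEmit(EmissionKind Kind, const std::vector<FunctionInfo> &Fns) {
  std::vector<std::string> Names;
  for (const FunctionInfo &F : Fns)
    if (needsSubprogramDIE(Kind, F))
      Names.push_back(F.Name);
  return Names;
}
```

With three functions (one that has inlined callees, one that was inlined, one untouched by inlining), only the first two would be described under LineTablesOnly, while FullDebug keeps all three.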
On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com> wrote:

> DW_AT_frame_base (the location of the frame pointer - is that needed for
> symbolication?)

I think this is used by libunwind style stack unwinders in conjunction with
-fomit-frame-pointer
On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com> wrote:

> So... yes/no/maybe?

Other than the frame pointer, all of these sound abstractly good to me....
but what do I know. Looking forward to others with more information chiming
in.
On Wed, Aug 27, 2014 at 4:53 PM, Chandler Carruth <chandlerc at google.com> wrote:

> On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>> DW_AT_frame_base (the location of the frame pointer - is that needed for
>> symbolication?)
>
> I think this is used by libunwind style stack unwinders in conjunction
> with -fomit-frame-pointer

Sounds plausible - any way I might test such a thing?

- David
This sounds great. Teaching the backend about -gmlt might help us in another
way: we might enforce full debug info generation in the frontend for
-fsanitize= flags, then rely on some parts of this debug info in
instrumentation passes and prune it before the actual object file
generation. This would be somewhat similar to what -Rpass does, only it
kills all the debug info, while we would need to turn full debug info into
gmlt-like.

Anyway, to backtracing:

On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com> wrote:

> In an effort to fix inlined information for backtraces under DWARF Fission
> in the absence of the split DWARF (.dwo) files, I'm planning on adding
> -gmlt-like data to the .o file, alongside the skeleton CU.
>
> Since that will involve teaching LLVM about -gmlt (more so than it
> already has - the debug info LLVM metadata already describes -gmlt for the
> purposes of omitting pubnames in that case) I figured I'd take the
> opportunity to move the existing -gmlt functionality to the backend to
> begin with, and, in doing so, minimize it a little further since we
> wouldn't need to emit debug info for every function - possibly just those
> that have functions inlined into them.

Right. Currently, if the symbolizer is unable to find a subprogram DIE
corresponding to a PC, it tries to at least fetch the file/line info from
the line table, and assumes that the function name might be available in the
symbol table.

> So here's an example of some of my ideas about minimized debug info. I'm
> wondering if I'm right about what's needed for backtracing.
>
> I've removed uninteresting things, like DW_AT_accessibility (which is a
> bug anyway), DW_AT_external (there's no reason symbolication needs that, is
> there?), but also less obviously uninteresting things like DW_AT_frame_base
> (the location of the frame pointer - is that needed for symbolication?)

We don't use DW_AT_accessibility and DW_AT_external. As Chandler suggests,
DW_AT_frame_base might be required for unwinders, but I don't really know
that.

> Also I've made a frontend (for now) change (see mgmlt_clang.diff) to omit
> the data that causes DW_AT_decl_file/DW_AT_decl_line to be emitted - are
> those needed? I don't think so.

We don't use them.

> But importantly: the only DW_TAG_subprograms are either functions that
> have been inlined, or functions that have been inlined into. Is that enough?
>
> Is it OK that I haven't included debug info for out of line definitions of
> inline functions?
>
> I'm assuming all that information can be retrieved from the symbol table.

See above. Looks like this information is not necessary.

> (one other thing I noticed is that we don't use the mangled names for
> functions in -gmlt - how on earth does that work?

Yeah, IIRC currently -gmlt doesn't produce DW_AT_linkage_name entries, only
DW_AT_name (DW_AT_linkage_name significantly increases the binary size for
heavily templated code). So, instead of Foo::Bar<double>::Baz we have only
"Baz". And we live with that - we fetch just "Baz" from subprogram entries.
If a function is not inlined, then we're able to fetch its fully-qualified
name from the symbol table; if it is inlined and there's no symbol table
entry - fine then, we print just the short name. Generally this is enough
for readable stack traces, as we still have file/line info (stored in
DW_AT_call_file / DW_AT_call_line). The function names fetched from
DW_AT_linkage_name and/or the symbol table are demangled with a call to
__cxa_demangle (we assume that it's just available on the system, and 95% of
the time we are right).

> The backtrace would look really strange if it included the unmangled names
> of functions - or does the symbolizer use the address range of the out of
> line definition (if there is one?) of the inlined function (in which case
> I'd need to provide it...) to find it in the symbol table, get the mangled
> name, and use that?)
>
> One thing I was thinking of doing as well, is that since the
> DW_AT_abstract_origin just points to a trivial subprogram with a name and
> DW_AT_inline - perhaps instead of an abstract origin, we could just use
> DW_AT_name directly? (with the mangled name, probably) That'd save us
> emitting the extra indirection and the name is uniqued already anyway. (and
> DW_FORM_strp is the same size as DW_FORM_ref4 (though DW_FORM_strp would
> mean extra relocations...) - and perhaps in the near future, DW_FORM_strp
> could be replaced by DW_FORM_str_index to reduce relocations)

Yes, this might work. Generally, when we find a
subprogram/inlined_subroutine DIE we calculate its name by following the
DW_AT_specification/DW_AT_abstract_origin links until we find a DIE with
DW_AT_name provided. If we're able to get the name directly things will only
be better.

> So... yes/no/maybe?

Speaking of testing, we have some nontrivial amount of sanitizer tests in
compiler-rt that match the expected symbolized stack trace. Currently the
sources are built with "-g", but I think we can detect if the compiler we
test supports -gmlt and/or fission and use the strictest debug info flag
settings we still want to provide nice reports for.

> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
Alexey Samsonov
vonosmas at gmail.com
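
The naming behaviour described in the reply above (fully-qualified name from the symbol table for out-of-line functions, short DW_AT_name for inlined frames, demangling via __cxa_demangle) might look roughly like this. It is a sketch under assumptions, not the sanitizer symbolizer's code: demangle and frameName are hypothetical helpers, and the only real API used is abi::__cxa_demangle from <cxxabi.h>, which is assumed to be available (GCC/Clang toolchains).

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium-ABI symbol name; anything else is returned unchanged.
std::string demangle(const std::string &Name) {
  if (Name.compare(0, 2, "_Z") != 0)
    return Name;  // not a mangled C++ name (e.g. a C function such as "main")
  int Status = 0;
  char *Buf = abi::__cxa_demangle(Name.c_str(), nullptr, nullptr, &Status);
  if (Status != 0 || Buf == nullptr)
    return Name;
  std::string Result(Buf);
  std::free(Buf);  // __cxa_demangle allocates the result with malloc
  return Result;
}

// Hypothetical helper: choose the name to print for one stack frame.
// SymtabName is the mangled symbol found for an out-of-line function (empty
// if none was found); DwarfShortName is the DW_AT_name from the DIE.
std::string frameName(bool IsInlinedFrame, const std::string &SymtabName,
                      const std::string &DwarfShortName) {
  if (!IsInlinedFrame && !SymtabName.empty())
    return demangle(SymtabName);
  return DwarfShortName;  // inlined frame: only the short name is available
}
```

For example, an out-of-line frame whose symbol is _ZN3Foo3BarIdE3BazEv would print as Foo::Bar<double>::Baz(), while the same function seen as an inlined frame would print as just Baz.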
On Thu, Aug 28, 2014 at 11:51 AM, Alexey Samsonov <vonosmas at gmail.com> wrote:

> This sounds great. Teaching the backend about -gmlt might help us in
> another way: we might enforce full debug info generation in the frontend
> for -fsanitize= flags, then rely on some parts of this debug info in
> instrumentation passes and prune it before the actual object file
> generation. This would be somewhat similar to what -Rpass does, only it
> kills all the debug info, while we would need to turn full debug info into
> gmlt-like.

Yep, this crossed my mind (removing most of the extra codepaths from Clang
would be nice) but I figured we'd probably keep it this way for now, since
it reduces the amount of metadata we have to build when we don't need it.
But if sanitizers end up needing more of that information for whatever
reason (while not wanting to emit more debug info) this will provide a basis
for such a state of affairs in the future.

> Anyway, to backtracing:
>
> On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>> In an effort to fix inlined information for backtraces under DWARF
>> Fission in the absence of the split DWARF (.dwo) files, I'm planning on
>> adding -gmlt-like data to the .o file, alongside the skeleton CU.
>>
>> Since that will involve teaching LLVM about -gmlt (more so than it
>> already has - the debug info LLVM metadata already describes -gmlt for the
>> purposes of omitting pubnames in that case) I figured I'd take the
>> opportunity to move the existing -gmlt functionality to the backend to
>> begin with, and, in doing so, minimize it a little further since we
>> wouldn't need to emit debug info for every function - possibly just those
>> that have functions inlined into them.
>
> Right. Currently, if the symbolizer is unable to find a subprogram DIE
> corresponding to a PC, it tries to at least fetch the file/line info from
> the line table, and assumes that the function name might be available in
> the symbol table.
>
>> So here's an example of some of my ideas about minimized debug info. I'm
>> wondering if I'm right about what's needed for backtracing.
>>
>> I've removed uninteresting things, like DW_AT_accessibility (which is a
>> bug anyway), DW_AT_external (there's no reason symbolication needs that, is
>> there?), but also less obviously uninteresting things like DW_AT_frame_base
>> (the location of the frame pointer - is that needed for symbolication?)
>
> We don't use DW_AT_accessibility and DW_AT_external.

Great

> As Chandler suggests, DW_AT_frame_base might be required for unwinders,
> but I don't really know that.
>
>> Also I've made a frontend (for now) change (see mgmlt_clang.diff) to omit
>> the data that causes DW_AT_decl_file/DW_AT_decl_line to be emitted - are
>> those needed? I don't think so.
>
> We don't use them.

Excellent

>> But importantly: the only DW_TAG_subprograms are either functions that
>> have been inlined, or functions that have been inlined into. Is that enough?
>>
>> Is it OK that I haven't included debug info for out of line definitions
>> of inline functions?
>>
>> I'm assuming all that information can be retrieved from the symbol table.
>
> See above. Looks like this information is not necessary.

Perfect.

>> (one other thing I noticed is that we don't use the mangled names for
>> functions in -gmlt - how on earth does that work?
>
> Yeah, IIRC currently -gmlt doesn't produce DW_AT_linkage_name entries,
> only DW_AT_name (DW_AT_linkage_name significantly increases the binary
> size for heavily templated code). So, instead of Foo::Bar<double>::Baz we
> have only "Baz". And we live with that - we fetch just "Baz" from
> subprogram entries. If a function is not inlined, then we're able to fetch
> its fully-qualified name from the symbol table; if it is inlined and
> there's no symbol table entry - fine then, we print just the short name.
> Generally this is enough for readable stack traces, as we still have
> file/line info (stored in DW_AT_call_file / DW_AT_call_line). The function
> names fetched from DW_AT_linkage_name and/or the symbol table are demangled
> with a call to __cxa_demangle (we assume that it's just available on the
> system, and 95% of the time we are right).

OK - if that's the tradeoff you guys have made, I'm happy not to meddle with
it. (did you do a comparison with compression enabled for the strings
section? At Google I know we don't compress the linked debug info, but we
could - this might help in general, and make it not so costly to go from
short names to fully mangled names)

>> The backtrace would look really strange if it included the unmangled
>> names of functions - or does the symbolizer use the address range of the
>> out of line definition (if there is one?) of the inlined function (in which
>> case I'd need to provide it...) to find it in the symbol table, get the
>> mangled name, and use that?)
>>
>> One thing I was thinking of doing as well, is that since the
>> DW_AT_abstract_origin just points to a trivial subprogram with a name and
>> DW_AT_inline - perhaps instead of an abstract origin, we could just use
>> DW_AT_name directly? (with the mangled name, probably) That'd save us
>> emitting the extra indirection and the name is uniqued already anyway. (and
>> DW_FORM_strp is the same size as DW_FORM_ref4 (though DW_FORM_strp would
>> mean extra relocations...) - and perhaps in the near future, DW_FORM_strp
>> could be replaced by DW_FORM_str_index to reduce relocations)
>
> Yes, this might work. Generally, when we find a
> subprogram/inlined_subroutine DIE we calculate its name by following the
> DW_AT_specification/DW_AT_abstract_origin links until we find a DIE with
> DW_AT_name provided. If we're able to get the name directly things will
> only be better.

So long as you look for the name on the inlined_subroutine first, before
walking DW_AT_specification/DW_AT_abstract_origin links, that'll work
perfectly if/when we do this. (might have to teach it about
DW_FORM_str_index, at some point, though)

>> So... yes/no/maybe?
>
> Speaking of testing, we have some nontrivial amount of sanitizer tests in
> compiler-rt that match the expected symbolized stack trace. Currently the
> sources are built with "-g", but I think we can detect if the compiler we
> test supports -gmlt and/or fission and use the strictest debug info flag
> settings we still want to provide nice reports for.

Right, that sounds like a thing to do - I'd rather not make my changes until
we've got that in place (& once it's in place I'll try a few obvious "break
this and see if the tests fail" sort of things to check that my changes are
being properly validated). Can you let me know if you need help/want me to
do that work (not that I'm terribly well versed in CMake, but I guess that's
true of most of us) and/or when it's done, and then I'll see about getting
this work committed and moving on to the gmlt-esque+fission stuff.

(side note, just to write it down: the gmlt+fission part of this (after this
patch that minimizes gmlt by using backend knowledge) will require a fair
bit of refactoring, but it'll be good to have the minimized-gmlt work in
first and actively tested so I have that as a good baseline that my
refactorings are making sense)

> --
> Alexey Samsonov
> vonosmas at gmail.com
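
The lookup order discussed in the message above (use a DW_AT_name found directly on the DIE, and only otherwise follow DW_AT_abstract_origin / DW_AT_specification) could be sketched as below. The DIE struct here is a hypothetical, heavily simplified model for illustration, not LLVM's DIE class, and the 16-hop limit is an arbitrary safety bound chosen for the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified DIE: only the attributes relevant to naming.
struct DIE {
  std::string Name;           // DW_AT_name; empty if the attribute is absent
  const DIE *AbstractOrigin;  // DW_AT_abstract_origin target, or nullptr
  const DIE *Specification;   // DW_AT_specification target, or nullptr
};

// Return the first name found, starting at D itself and then following the
// abstract-origin / specification references. Empty string if none is found.
std::string resolveName(const DIE *D) {
  // Bound the walk so malformed, cyclic references cannot loop forever.
  for (int Hops = 0; D != nullptr && Hops < 16; ++Hops) {
    if (!D->Name.empty())
      return D->Name;  // a name on the DIE itself wins over any referenced DIE
    D = D->AbstractOrigin ? D->AbstractOrigin : D->Specification;
  }
  return std::string();
}
```

With this ordering, today's layout (a nameless inlined_subroutine pointing at an abstract subprogram) and the proposed layout (name attached directly to the inlined_subroutine) both resolve without any special casing.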
DW_AT_frame_base is largely used as a beginning offset to find things for
other expressions, for example it can be used to shortcut location
information for variables (+/- from the frame base which could be a
complicated expression) or used to find captured variables that might be on
another function's stack (largely nested functions). It can also help with
giving a shortcut for inlined function stack frames if they're distinct, but
afaik we don't do that :)

So unless we're going to start giving out variable contents with gmlt we
don't need to worry.

-eric

On Wed, Aug 27, 2014 at 4:55 PM, Chandler Carruth <chandlerc at google.com> wrote:

> On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>> So... yes/no/maybe?
>
> Other than the frame pointer, all of these sound abstractly good to me....
> but what do I know. Looking forward to others with more information
> chiming in.
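
To make the "+/- from the frame base" point above concrete, here is a small sketch of how a consumer might evaluate a variable location that consists of a single DW_OP_fbreg operation (opcode 0x91, followed by a signed LEB128 offset). The helper names decodeSLEB128 and evalFbreg are made up for this illustration, and the frame base value is assumed to have been computed already from DW_AT_frame_base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode one signed LEB128 value from Data[0..Size).
int64_t decodeSLEB128(const uint8_t *Data, size_t Size) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  size_t I = 0;
  uint8_t Byte = 0;
  do {
    assert(I < Size && Shift < 64 && "truncated or oversized SLEB128");
    Byte = Data[I++];
    Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  if (Shift < 64 && (Byte & 0x40))
    Value |= ~static_cast<uint64_t>(0) << Shift;  // sign-extend the result
  return static_cast<int64_t>(Value);
}

const uint8_t DW_OP_fbreg = 0x91;  // DWARF opcode: frame base + SLEB128 offset

// Evaluate a location expression that is exactly one DW_OP_fbreg operation.
// FrameBase is the already-evaluated value of the function's DW_AT_frame_base.
uint64_t evalFbreg(const uint8_t *Expr, size_t Size, uint64_t FrameBase) {
  assert(Size >= 2 && Expr[0] == DW_OP_fbreg);
  return FrameBase + static_cast<uint64_t>(decodeSLEB128(Expr + 1, Size - 1));
}
```

The two-byte expression 0x91 0x6c, for instance, means "frame base minus 20". Without DW_AT_frame_base the expression cannot be evaluated, which only matters if variable locations are emitted at all.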