On Thu, Aug 28, 2014 at 11:51 AM, Alexey Samsonov <vonosmas at gmail.com> wrote:> This sounds great. Teaching backend about the -gmlt might help us in > another way: we might enforce full debug info generation in the frontend > for -fsanitize= flags, then rely on some parts of this debug info in > instrumentation passes and prune it before the actual object file > generation. This would be somewhat similar to what -Rpass does, only it > kills all the debug info, while we would need to turn full debug info into > gmlt-like. >Yep, this crossed my mind (removing most of the extra codepaths from Clang would be nice) but I figured we'd probably keep it this way for now, since it reduces the amount of metadata we have to build when we don't need it. But if sanitizers end up needing more of that information for whatever reason (while not wanting to emit more debug info) this will provide a basis for such a state of affairs in the future.> Anyway, to backtracing: > > On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com> wrote: > >> In an effort to fix inlined information for backtraces under DWARF >> Fission in the absence of the split DWARF (.dwo) files, I'm planning on >> adding -gmlt-like data to the .o file, alongside the skeleton CU. >> >> Since that will involve teaching the LLVM about -gmlt (moreso than it >> already has - the debug info LLVM metadata already describes -gmlt for the >> purposes of omitting pubnames in that case) I figured I'd take the >> opportunity to move the existing -gmlt functionality to the backend to >> begin with, and, in doing so, minimize it a little further since we >> wouldn't need to emit debug info for every function - possibly just those >> that have functions inlined into them. >> > > Right. 
Currently, if the symbolizer is unable to find a subprogram DIE > corresponding to a PC, it tries to at least fetch the file/line info from > the line table, and assumes that function name might be available in the > symbol table. > >> >> So here's an example of some of my ideas about minimized debug info. I'm >> wondering if I'm right about what's needed for backtracing. >> >> I've removed uninteresting things, like DW_AT_accessibility (which is a >> bug anyway), DW_AT_external (there's no reason symbolication needs that, is >> there?), but also less obviously uninteresting things like DW_AT_frame_base >> (the location of the frame pointer - is that needed for symbolication?) >> > > We don't use DW_AT_accessibility and DW_AT_external. >Great> As Chandler suggests, DW_AT_frame_base might be required for unwinders, > but I don't really know that. >> >> >> Also I've made a frontend (for now) change (see mgmlt_clang.diff) to omit >> the data that causes DW_AT_decl_file/DW_AT_decl_line to be emitted - are >> those needed? I don't think so. >> > > We don't use them. >Excellent> > >> >> But importantly: the only DW_TAG_subprograms are either functions that >> have been inlined, or functions that have been inlined into. Is that enough? >> >> Is it OK that I haven't included debug info for out of line definitions >> of inline functions? >> >> I'm assuming all that information can be retrieved from the symbol table. >> > > > See above. Looks like this information is not necessary. >Perfect.> > >> >> (one other thing I noticed is that we don't use the mangled names for >> functions in -gmlt - how on earth does that work? >> > > Yeah, IIRC currently -gmlt doesn't produce DW_AT_linkage_name entries, > only DW_AT_name (DW_AT_linkage_name significantly increases the binary > size for heavily templated code). So, instead of Foo::Bar<double>::Baz we > have only "Baz". And we live with that - we fetch just "Baz" from > subprogram entries. 
If a function is not inlined, then we're able to fetch > its fully-qualified name from the symbol table; if it is inlined and > there's no symbol table entry - fine then, we print just the short name. > Generally this is enough for readable stack traces, as we still have > file/line info (stored in DW_AT_call_file / DW_AT_call_line). The function > names fetched from DW_AT_linkage_name and/or symbol table are demangled > with a call to __cxa_demangle (we assume that it's just available on the > system, and 95% of the time we are right). >OK - if that's the tradeoff you guys have made, I'm happy not to meddle with it. (did you do a comparison with compression enabled for the strings section? At Google I know we don't compress the linked debug info, but we could - this might help in general, and make it not so costly to go from short names to fully mangled names)> > >> The backtrace would look really strange if it included the unmangled >> names of functions - or does the symbolizer use the address range of the >> out of line definition (if there is one?) of the inlined function (in which >> case I'd need to provide it... ) to find it in the symbol table, get the >> mangled name, and use that?) >> >> One thing I was thinking of doing as well, is that since the >> DW_AT_abstract_origin just points to a trivial subprogram with a name and >> DW_AT_inline - perhaps instead of an abstract origin, we could just use >> DW_AT_name directly? (with the mangled name, probably) That'd save us >> emitting the extra indirection and the name is uniqued already anyway. (and >> DW_FORM_strp is the same size as DW_FORM_ref4 (though DW_FORM_strp would >> mean extra relocations...) - and perhaps in the near future, DW_FORM_strp >> could be replaced by DW_FORM_str_index to reduce relocations) >> > > Yes, this might work. 
Generally, when we find a > subprogram/inlined_subroutine DIE we calculate its name by following the > DW_AT_specification/DW_AT_abstract_origin links until we find a DIE with > DW_AT_name provided. If we're able to get the name directly things will > only be better. >So long as you look for the name on the inlined_subroutine first, before walking DW_AT_specification/DW_AT_abstract_origin links, that'll work perfectly if/when we do this. (might have to teach it about DW_FORM_str_index, at some point, though)> > >> >> So... yes/no/maybe? >> > > Speaking of testing, we have some nontrivial amount of sanitizer tests in > compiler-rt that match the expected symbolized stack trace. Currently the > sources are built with "-g", but I think we can detect if the compiler we > test supports -gmlt and/or fission and use the strictest debug info flag > settings we still want to provide nice reports for. >Right, that sounds like a thing to do - I'd rather not make my changes until we've got that in place (& once it's in place I'll try a few obvious "break this and see if the tests fail" sort of things to check that my changes are being properly validated). Can you let me know if you need help/want me to do that work (not that I'm terribly well versed in CMake, but I guess that's true of most of us) and/or when it's done and then I'll see about getting this work committed and moving onto the gmlt-esque+fission stuff. 
(side note, just to write it down: the gmlt+fission part of this (after this patch that minimizes gmlt by using backend knowledge) will require a fair bit of refactoring, but it'll be good to have the minimized-gmlt work in first and actively tested so I have that as a good baseline that my refactorings are making sense)> > >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > > > -- > Alexey Samsonov > vonosmas at gmail.com >
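The name lookup order agreed on in this message (check the DIE itself for DW_AT_name first, and only then follow DW_AT_abstract_origin / DW_AT_specification links) can be illustrated with a small sketch. This is not the symbolizer's actual code: the `Die` class and `resolve_name` function are hypothetical stand-ins that model a DIE as a tag plus a dictionary of attributes, with link attributes holding a reference to another `Die`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Die:
    """Hypothetical, simplified stand-in for a DWARF DIE (not an LLVM type)."""
    tag: str
    attrs: dict = field(default_factory=dict)


# Attributes that may point at another DIE carrying the name.
LINK_ATTRS = ("DW_AT_abstract_origin", "DW_AT_specification")


def resolve_name(die: Die) -> Optional[str]:
    """Return DW_AT_name from the DIE itself if present, else follow links."""
    seen = set()  # guards against malformed, cyclic link chains
    current = die
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        name = current.attrs.get("DW_AT_name")
        if name is not None:
            return name
        # No name here: move on to the first linked DIE, if any.
        current = next(
            (current.attrs[a] for a in LINK_ATTRS if a in current.attrs), None
        )
    return None


# Today's layout: the inlined_subroutine only has an abstract origin.
abstract = Die("DW_TAG_subprogram", {"DW_AT_name": "Baz", "DW_AT_inline": True})
via_origin = Die("DW_TAG_inlined_subroutine", {"DW_AT_abstract_origin": abstract})

# Proposed layout: the name sits directly on the inlined_subroutine.
direct = Die("DW_TAG_inlined_subroutine", {"DW_AT_name": "_ZN3Foo3BazEv"})
```

Because the DIE's own DW_AT_name is checked before any link is followed, the same lookup handles both the existing abstract-origin layout and the proposed direct-name layout.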
On Thu, Aug 28, 2014 at 1:51 PM, David Blaikie <dblaikie at gmail.com> wrote:> > > > On Thu, Aug 28, 2014 at 11:51 AM, Alexey Samsonov <vonosmas at gmail.com> > wrote: > >> This sounds great. Teaching backend about the -gmlt might help us in >> another way: we might enforce full debug info generation in the frontend >> for -fsanitize= flags, then rely on some parts of this debug info in >> instrumentation passes and prune it before the actual object file >> generation. This would be somewhat similar to what -Rpass does, only it >> kills all the debug info, while we would need to turn full debug info into >> gmlt-like. >> > > Yep, this crossed my mind (removing most of the extra codepaths from Clang > would be nice) but I figured we'd probably keep it this way for now, since > it reduces the amount of metadata we have to build when we don't need it. > > But if sanitizers end up needing more of that information for whatever > reason (while not wanting to emit more debug info) this will provide a > basis for such a state of affairs in the future. >Sounds good.> > >> Anyway, to backtracing: >> >> On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com> >> wrote: >> >>> In an effort to fix inlined information for backtraces under DWARF >>> Fission in the absence of the split DWARF (.dwo) files, I'm planning on >>> adding -gmlt-like data to the .o file, alongside the skeleton CU. >>> >>> Since that will involve teaching the LLVM about -gmlt (moreso than it >>> already has - the debug info LLVM metadata already describes -gmlt for the >>> purposes of omitting pubnames in that case) I figured I'd take the >>> opportunity to move the existing -gmlt functionality to the backend to >>> begin with, and, in doing so, minimize it a little further since we >>> wouldn't need to emit debug info for every function - possibly just those >>> that have functions inlined into them. >>> >> >> Right. 
Currently, if the symbolizer is unable to find a subprogram DIE >> corresponding to a PC, it tries to at least fetch the file/line info from >> the line table, and assumes that function name might be available in the >> symbol table. >> >>> >>> So here's an example of some of my ideas about minimized debug info. I'm >>> wondering if I'm right about what's needed for backtracing. >>> >>> I've removed uninteresting things, like DW_AT_accessibility (which is a >>> bug anyway), DW_AT_external (there's no reason symbolication needs that, is >>> there?), but also less obviously uninteresting things like DW_AT_frame_base >>> (the location of the frame pointer - is that needed for symbolication?) >>> >> >> We don't use DW_AT_accessibility and DW_AT_external. >> > > Great > > >> As Chandler suggests, DW_AT_frame_base might be required for unwinders, >> but I don't really know that. >> > >> >>> >>> Also I've made a frontend (for now) change (see mgmlt_clang.diff) to >>> omit the data that causes DW_AT_decl_file/DW_AT_decl_line to be emitted - >>> are those needed? I don't think so. >>> >> >> We don't use them. >> > > Excellent > > >> >> >>> >>> But importantly: the only DW_TAG_subprograms are either functions that >>> have been inlined, or functions that have been inlined into. Is that enough? >>> >>> Is it OK that I haven't included debug info for out of line definitions >>> of inline functions? >>> >>> I'm assuming all that information can be retrieved from the symbol table. >>> >> >> >> See above. Looks like this information is not necessary. >> > > Perfect. > > >> >> >>> >>> (one other thing I noticed is that we don't use the mangled names for >>> functions in -gmlt - how on earth does that work? >>> >> >> Yeah, IIRC currently -gmlt doesn't produce DW_AT_linkage_name entries, >> only DW_AT_name (DW_AT_linkage_name significantly increases the binary >> size for heavily templated code). So, instead of Foo::Bar<double>::Baz we >> have only "Baz". 
And we live with that - we fetch just "Baz" from >> subprogram entries. If a function is not inlined, then we're able to fetch >> its fully-qualified name from the symbol table; if it is inlined and >> there's no symbol table entry - fine then, we print just the short name. >> Generally this is enough for readable stack traces, as we still have >> file/line info (stored in DW_AT_call_file / DW_AT_call_line). The function >> names fetched from DW_AT_linkage_name and/or symbol table are demangled >> with a call to __cxa_demangle (we assume that it's just available on the >> system, and 95% of the time we are right). >> > > OK - if that's the tradeoff you guys have made, I'm happy not to meddle > with it. > > (did you do a comparison with compression enabled for the strings section? > At Google I know we don't compress the linked debug info, but we could - > this might help in general, and make it not so costly to go from short > names to fully mangled names) >In fact, we do use -Wl,--compress-debug-sections=zlib in ASan builds... I haven't measured the difference linkage names would cause for compressed sections, though. I don't remember any user complaints about missing names for inlined functions, but, sure, we might want to add them later.> > >> >> >>> The backtrace would look really strange if it included the unmangled >>> names of functions - or does the symbolizer use the address range of the >>> out of line definition (if there is one?) of the inlined function (in which >>> case I'd need to provide it... ) to find it in the symbol table, get the >>> mangled name, and use that?) >>> >>> One thing I was thinking of doing as well, is that since the >>> DW_AT_abstract_origin just points to a trivial subprogram with a name and >>> DW_AT_inline - perhaps instead of an abstract origin, we could just use >>> DW_AT_name directly? (with the mangled name, probably) That'd save us >>> emitting the extra indirection and the name is uniqued already anyway. 
(and >>> DW_FORM_strp is the same size as DW_FORM_ref4 (though DW_FORM_strp would >>> mean extra relocations...) - and perhaps in the near future, DW_FORM_strp >>> could be replaced by DW_FORM_str_index to reduce relocations) >>> >> >> Yes, this might work. Generally, when we find a >> subprogram/inlined_subroutine DIE we calculate its name by following the >> DW_AT_specification/DW_AT_abstract_origin links until we find a DIE with >> DW_AT_name provided. If we're able to get the name directly things will >> only be better. >> > > So long as you look for the name on the inlined_subroutine first, before > walking DW_AT_specification/DW_AT_abstract_origin links, that'll work > perfectly if/when we do this. > > (might have to teach it about DW_FORM_str_index, at some point, though) > > >> >> >>> >>> So... yes/no/maybe? >>> >> >> Speaking of testing, we have some nontrivial amount of sanitizer tests in >> compiler-rt that match the expected symbolized stack trace. Currently the >> sources are built with "-g", but I think we can detect if the compiler we >> test supports -gmlt and/or fission and use the strictest debug info flag >> settings we still want to provide nice reports for. >> > > Right, that sounds like a thing to do - I'd rather not make my changes > until we've got that in place (& once it's in place I'll try a few obvious > "break this and see if the tests fail" sort of things to check that my > changes are being properly validated). >OK, I'll let you know once we use -gmlt for sanitizers' test suite.> > Can you let me know if you need help/want me to do that work (not that I'm > terribly well versed in CMake, but I guess that's true of most of us) > and/or when it's done and then I'll see about getting this work committed > and moving onto the gmlt-esque+fission stuff. 
> > (side note, just to write it down: the gmlt+fission part of this (after > this patch that minimizes gmlt by using backend knowledge) will require a > fair bit of refactoring, but it'll be good to have the minimized-gmlt work > in first and actively tested so I have that as a good baseline that my > refactorings are making sense) > > -- Alexey Samsonov vonosmas at gmail.com
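The fallback described in this thread (use the subprogram DIE when one covers the PC, otherwise take the function name from the symbol table, with file/line coming from the line table) might look roughly like the following. It is only a sketch under simplifying assumptions: `Symbol`, `LineRow`, `symbolize` and the tuple-based subprogram ranges are hypothetical, the order of preference between DIE and symbol table is assumed, and a real line table has sequences and end markers that are ignored here.

```python
import bisect
from typing import List, NamedTuple, Optional, Tuple


class Symbol(NamedTuple):
    """Hypothetical symbol table entry."""
    start: int
    size: int
    name: str


class LineRow(NamedTuple):
    """Hypothetical, simplified line table row."""
    address: int
    file: str
    line: int


def lookup_symbol(symbols: List[Symbol], pc: int) -> Optional[str]:
    """Name of the symbol whose address range contains pc, if any."""
    for sym in symbols:
        if sym.start <= pc < sym.start + sym.size:
            return sym.name
    return None


def lookup_line(rows: List[LineRow], pc: int) -> Optional[LineRow]:
    """Last row at or before pc; rows must be sorted by address."""
    index = bisect.bisect_right([row.address for row in rows], pc) - 1
    return rows[index] if index >= 0 else None


def symbolize(
    pc: int,
    subprograms: List[Tuple[int, int, str]],  # (low_pc, high_pc, DW_AT_name)
    symbols: List[Symbol],
    rows: List[LineRow],
) -> Tuple[Optional[str], Optional[str], Optional[int]]:
    """Return (function name, file, line); any part may be None."""
    # Assumed order: try the subprogram DIE covering pc first.
    name = next((n for lo, hi, n in subprograms if lo <= pc < hi), None)
    if name is None:
        # No DIE for this PC (minimized -gmlt): fall back to the symbol table.
        name = lookup_symbol(symbols, pc)
    row = lookup_line(rows, pc)
    return (name, row.file, row.line) if row else (name, None, None)
```

With no subprogram DIE at all, a call such as `symbolize(0x1014, [], symbols, rows)` still yields a name as long as the symbol table covers the address, which is the property the minimized -gmlt proposal relies on.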
On Fri, Aug 29, 2014 at 10:49 AM, Alexey Samsonov <vonosmas at gmail.com> wrote:> > > > On Thu, Aug 28, 2014 at 1:51 PM, David Blaikie <dblaikie at gmail.com> wrote: > >> >> >> >> On Thu, Aug 28, 2014 at 11:51 AM, Alexey Samsonov <vonosmas at gmail.com> >> wrote: >> >>> This sounds great. Teaching backend about the -gmlt might help us in >>> another way: we might enforce full debug info generation in the frontend >>> for -fsanitize= flags, then rely on some parts of this debug info in >>> instrumentation passes and prune it before the actual object file >>> generation. This would be somewhat similar to what -Rpass does, only it >>> kills all the debug info, while we would need to turn full debug info into >>> gmlt-like. >>> >> >> Yep, this crossed my mind (removing most of the extra codepaths from >> Clang would be nice) but I figured we'd probably keep it this way for now, >> since it reduces the amount of metadata we have to build when we don't need >> it. >> >> But if sanitizers end up needing more of that information for whatever >> reason (while not wanting to emit more debug info) this will provide a >> basis for such a state of affairs in the future. >> > > Sounds good. > > >> >> >>> Anyway, to backtracing: >>> >>> On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com> >>> wrote: >>> >>>> In an effort to fix inlined information for backtraces under DWARF >>>> Fission in the absence of the split DWARF (.dwo) files, I'm planning on >>>> adding -gmlt-like data to the .o file, alongside the skeleton CU. 
>>>> >>>> Since that will involve teaching the LLVM about -gmlt (moreso than it >>>> already has - the debug info LLVM metadata already describes -gmlt for the >>>> purposes of omitting pubnames in that case) I figured I'd take the >>>> opportunity to move the existing -gmlt functionality to the backend to >>>> begin with, and, in doing so, minimize it a little further since we >>>> wouldn't need to emit debug info for every function - possibly just those >>>> that have functions inlined into them. >>>> >>> >>> Right. Currently, if the symbolizer is unable to find a subprogram DIE >>> corresponding to a PC, it tries to at least fetch the file/line info from >>> the line table, and assumes that function name might be available in the >>> symbol table. >>> >>>> >>>> So here's an example of some of my ideas about minimized debug info. >>>> I'm wondering if I'm right about what's needed for backtracing. >>>> >>>> I've removed uninteresting things, like DW_AT_accessibility (which is a >>>> bug anyway), DW_AT_external (there's no reason symbolication needs that, is >>>> there?), but also less obviously uninteresting things like DW_AT_frame_base >>>> (the location of the frame pointer - is that needed for symbolication?) >>>> >>> >>> We don't use DW_AT_accessibility and DW_AT_external. >>> >> >> Great >> >> >>> As Chandler suggests, DW_AT_frame_base might be required for unwinders, >>> but I don't really know that. >>> >> >>> >>>> >>>> Also I've made a frontend (for now) change (see mgmlt_clang.diff) to >>>> omit the data that causes DW_AT_decl_file/DW_AT_decl_line to be emitted - >>>> are those needed? I don't think so. >>>> >>> >>> We don't use them. >>> >> >> Excellent >> >> >>> >>> >>>> >>>> But importantly: the only DW_TAG_subprograms are either functions that >>>> have been inlined, or functions that have been inlined into. Is that enough? >>>> >>>> Is it OK that I haven't included debug info for out of line definitions >>>> of inline functions? 
>>>> >>>> I'm assuming all that information can be retrieved from the symbol >>>> table. >>>> >>> >>> >>> See above. Looks like this information is not necessary. >>> >> >> Perfect. >> >> >>> >>> >>>> >>>> (one other thing I noticed is that we don't use the mangled names for >>>> functions in -gmlt - how on earth does that work? >>>> >>> >>> Yeah, IIRC currently -gmlt doesn't produce DW_AT_linkage_name entries, >>> only DW_AT_name (DW_AT_linkage_name significantly increases the binary >>> size for heavily templated code). So, instead of Foo::Bar<double>::Baz we >>> have only "Baz". And we live with that - we fetch just "Baz" from >>> subprogram entries. If a function is not inlined, then we're able to fetch >>> its fully-qualified name from the symbol table; if it is inlined and >>> there's no symbol table entry - fine then, we print just the short name. >>> Generally this is enough for readable stack traces, as we still have >>> file/line info (stored in DW_AT_call_file / DW_AT_call_line). The function >>> names fetched from DW_AT_linkage_name and/or symbol table are demangled >>> with a call to __cxa_demangle (we assume that it's just available on the >>> system, and 95% of the time we are right). >>> >> >> OK - if that's the tradeoff you guys have made, I'm happy not to meddle >> with it. >> >> (did you do a comparison with compression enabled for the strings >> section? At Google I know we don't compress the linked debug info, but we >> could - this might help in general, and make it not so costly to go from >> short names to fully mangled names) >> > > In fact, we do use -Wl,--compress-debug-sections=zlib in > ASan builds... I haven't measured the difference linkage names would cause > for compressed sections, though. I don't remember any user complaints about > missing names for inlined functions, but, sure, we might want to add them > later. 
> > >> >> >>> >>> >>>> The backtrace would look really strange if it included the unmangled >>>> names of functions - or does the symbolizer use the address range of the >>>> out of line definition (if there is one?) of the inlined function (in which >>>> case I'd need to provide it... ) to find it in the symbol table, get the >>>> mangled name, and use that?) >>>> >>>> One thing I was thinking of doing as well, is that since the >>>> DW_AT_abstract_origin just points to a trivial subprogram with a name and >>>> DW_AT_inline - perhaps instead of an abstract origin, we could just use >>>> DW_AT_name directly? (with the mangled name, probably) That'd save us >>>> emitting the extra indirection and the name is uniqued already anyway. (and >>>> DW_FORM_strp is the same size as DW_FORM_ref4 (though DW_FORM_strp would >>>> mean extra relocations...) - and perhaps in the near future, DW_FORM_strp >>>> could be replaced by DW_FORM_str_index to reduce relocations) >>>> >>> >>> Yes, this might work. Generally, when we find a >>> subprogram/inlined_subroutine DIE we calculate its name by following the >>> DW_AT_specification/DW_AT_abstract_origin links until we find a DIE with >>> DW_AT_name provided. If we're able to get the name directly things will >>> only be better. >>> >> >> So long as you look for the name on the inlined_subroutine first, before >> walking DW_AT_specification/DW_AT_abstract_origin links, that'll work >> perfectly if/when we do this. >> >> (might have to teach it about DW_FORM_str_index, at some point, though) >> >> >>> >>> >>>> >>>> So... yes/no/maybe? >>>> >>> >>> Speaking of testing, we have some nontrivial amount of sanitizer tests >>> in compiler-rt that match the expected symbolized stack trace. Currently >>> the sources are built with "-g", but I think we can detect if the compiler >>> we test supports -gmlt and/or fission and use the strictest debug info flag >>> settings we still want to provide nice reports for. 
>>> >> >> Right, that sounds like a thing to do - I'd rather not make my changes >> until we've got that in place (& once it's in place I'll try a few obvious >> "break this and see if the tests fail" sort of things to check that my >> changes are being properly validated). >> > > OK, I'll let you know once we use -gmlt for sanitizers' test suite. >I've switched sanitizers' test suites to -gmlt in r217284.> > >> >> Can you let me know if you need help/want me to do that work (not that >> I'm terribly well versed in CMake, but I guess that's true of most of us) >> and/or when it's done and then I'll see about getting this work committed >> and moving onto the gmlt-esque+fission stuff. >> >> (side note, just to write it down: the gmlt+fission part of this (after >> this patch that minimizes gmlt by using backend knowledge) will require a >> fair bit of refactoring, but it'll be good to have the minimized-gmlt work >> in first and actively tested so I have that as a good baseline that my >> refactorings are making sense) >> >-- Alexey Samsonov vonosmas at gmail.com