On Wed, Mar 30, 2011 at 11:17 AM, Devang Patel <dpatel at apple.com> wrote:
>
> On Mar 29, 2011, at 7:29 PM, Talin wrote:
>
> I've been trying to track down the problem with the DWARF info that is
> being emitted by my front end, which has been broken for about a month now.
> Here's what happens when I attempt to use gdb to debug one of my programs on
> OS X:
>
> gdb stack crawl at point of internal error:
> [ 0 ] /usr/libexec/gdb/gdb-i386-apple-darwin (align_down+0x0) [0x122300]
> [ 1 ] /usr/libexec/gdb/gdb-i386-apple-darwin
> (find_partial_die_in_comp_unit+0x65) [0xc0e19]
> [ 2 ] /usr/libexec/gdb/gdb-i386-apple-darwin (find_partial_die+0x2d4)
> [0xcf07f]
> [ 3 ] /usr/libexec/gdb/gdb-i386-apple-darwin (fixup_partial_die+0x29)
> [0xcf0b3]
> [ 4 ] /usr/libexec/gdb/gdb-i386-apple-darwin (scan_partial_symbols+0x26)
> [0xcf9e7]
> [ 5 ] /usr/libexec/gdb/gdb-i386-apple-darwin (dwarf2_build_psymtabs+0xc54)
> [0xd093c]
> [ 6 ] /usr/libexec/gdb/gdb-i386-apple-darwin (macho_symfile_read+0x145)
> [0x163b15]
> [ 7 ] /usr/libexec/gdb/gdb-i386-apple-darwin (syms_from_objfile+0x62d)
> [0x52259]
> [ 8 ] /usr/libexec/gdb/gdb-i386-apple-darwin
> (symbol_file_add_with_addrs_or_offsets_using_objfile+0x338) [0x561e7]
> [ 9 ] /usr/libexec/gdb/gdb-i386-apple-darwin
> (symbol_file_add_with_addrs_or_offsets_using_objfile+0x2da) [0x56189]
> [ 10 ] /usr/libexec/gdb/gdb-i386-apple-darwin
> (symbol_file_add_name_with_addrs_or_offsets+0x7a) [0x563c9]
> [ 11 ] /usr/libexec/gdb/gdb-i386-apple-darwin (symbol_file_add_main_1+0xf2)
> [0x56e36]
> [ 12 ] /usr/libexec/gdb/gdb-i386-apple-darwin (catch_command_errors+0x4d)
> [0x7ac88]
> /SourceCache/gdb/gdb-966/src/gdb/dwarf2read.c:7593: internal-error: could
> not find partial DIE in cache
>
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
> Quit this debugging session? (y or n)
>
>
> Now, all of this was working earlier, and I don't know whether it was
> something I did or a change in LLVM, but that's not important. The real
> question is how to track down the problem.
>
>
> I have seen gdb crash with this backtrace when it encounters a subprogram
> specification DIE at the top level but cannot find the actual subprogram
> definition. The definition DIE may not be found either because it is hidden
> deep inside a nested class or because it is missing altogether from the
> compiler output. One easy way to rule this out is to check the indentation
> level of every specification DIE in the dwarfdump output against the level
> of the definition DIE it refers to.
>
OK, given that much information I was able to track it down: I was passing
my struct type as the context parameter to DIBuilder.createMethod. If I
change it to the compile unit, this problem goes away. I thought I had read
somewhere that it was legal to use the enclosing class definition as the
subroutine context, but now I can't find where I read it. In any case, I
guess this means that I don't know the proper way to declare member functions
in DWARF. That is, how can I declare method A of class B so that I can say
"B.A" in the debugger and gdb knows where to find it?

> In the past, the way that I have dealt with DWARF-related problems is to
> try a number of strategies:
>
> 1) Reduce the problem to the smallest reproducible case. In the past I have
> had some success with this, but not in this case. You see, one of the
> problems with object-oriented languages is that even simple operations -
> such as appending an element to an array - can end up pulling in a very
> large number of classes (For example, the array class might throw an
> exception if your index is invalid, which pulls in the exception hierarchy
> and so on...)
>
> I have a special script which attempts to compile a "minimal" test case,
> without the standard library and with garbage collection disabled.
> Unfortunately, none of the "small" test cases that I have been able to come
> up with exhibit the problem, and any time I use certain language features I
> am forced to link in the standard library, which makes the test program huge.
> I have plenty of example cases which exhibit the problem, but they are all
> bitcode files on the order of 100K or more in size. And I'm not going to
> have much luck tracking down a needle in such a large haystack.
>
> 2) Use dwarfdump to try and verify the validity of the debug symbols.
>
> Unfortunately, the information from dwarfdump is not too useful in this
> case. Here's what I get:
>
> - On OS X, with the "small" test cases I created, I get no errors at
> all.
> - On OS X, with my normal unit tests (with the standard library) I get
> hundreds of error messages of the following form:
>
> 0x00000882: DIE attribute 0x00000883: AT_type/FORM_ref4 has a value
> 0x00000592 that is not in the current compile unit in the .debug_info
> section.
>
>
> This indicates that while DwarfDebug.cpp was preparing the DWARF info, it
> created a DIE 0x00000592 that is referred to by another DIE 0x00000883, but
> somehow DIE 0x00000592 was not emitted. This could be a bug in
> DwarfDebug.cpp or in how the debug info is generated by the front end.
>
> In DwarfDebug.cpp, you'll see code like
>
> addDIEEntry(VariableSpecDIE,
> dwarf::DW_AT_specification, dwarf::DW_FORM_ref4, VariableDIE);
>
> Here VariableSpecDIE refers to VariableDIE, but VariableDIE is missing
> from the output. There are other uses of DW_FORM_ref4 as well. So check in
> your dwarfdump output what 0x00000883 is, set an appropriate breakpoint in
> the debugger, and see why it is not reaching DwarfDebug::emitDIE().
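
With hundreds of these messages, it may help to group them by the missing
target before setting breakpoints, since several referring DIEs often point
at the same unemitted DIE (0x0000055c shows up three times in the list in
this thread). The following is only a rough sketch in Python. It assumes the
message shape quoted in this thread, and it reads the first offset on each
line as the referring DIE and the second as the position of the attribute,
which is a guess about the message layout rather than documented dwarfdump
behavior. The function name group_dangling_refs is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the message shape quoted in this thread. Whitespace is matched
# loosely because the messages are wrapped across several lines.
ERROR_RE = re.compile(
    r"(0x[0-9a-fA-F]+):\s+DIE attribute\s+(0x[0-9a-fA-F]+):\s+"
    r"(\w+)/(\w+)\s+has a value\s+(0x[0-9a-fA-F]+)\s+that is not in"
)


def group_dangling_refs(text):
    """Map each missing target offset to the offsets of the DIEs that refer to it.

    Assumption: the first offset in a message is the referring DIE and the
    value after "has a value" is the target that was never emitted.
    """
    # Drop mail quote markers, then undo the line wrapping.
    cleaned = re.sub(r"^[> ]+", "", text, flags=re.MULTILINE)
    cleaned = " ".join(cleaned.split())

    targets = defaultdict(list)
    for die, _attr_offset, _attr, _form, target in ERROR_RE.findall(cleaned):
        targets[int(target, 16)].append(int(die, 16))
    return dict(targets)


def report(targets):
    """Print the missing targets, most frequently referenced first."""
    for target, dies in sorted(targets.items(), key=lambda kv: -len(kv[1])):
        refs = ", ".join(f"0x{d:08x}" for d in dies)
        print(f"0x{target:08x} missing, referenced by {len(dies)} DIE(s): {refs}")
```

Feeding the saved dwarfdump output to group_dangling_refs and then to report
gives one line per missing DIE, which narrows the search to a handful of
targets instead of hundreds of individual messages.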
>
OK, I'm still trying to track this one down; it is apparently unrelated to
the earlier problem. After fixing the problem with the subroutine context
mentioned above, I now see the following in gdb:
Die: DW_TAG_formal_parameter (abbrev = 27, offset = 14760)
has children: FALSE
attributes:
DW_AT_name (DW_FORM_strp) string: "testType"
DW_AT_decl_file (DW_FORM_data1) constant: 74
DW_AT_decl_line (DW_FORM_data1) constant: 47
DW_AT_type (DW_FORM_ref4) constant ref: 43711 (adjusted)
DW_AT_location (DW_FORM_block1) block: size 2
Dwarf Error: Cannot find type of die [in module
/Users/talin/Projects/tart/build-eclipse/test/stdlib/BitTricksTest.dSYM/Contents/Resources/DWARF/BitTricksTest]
This is good because I know exactly where that parameter is - now the
question is to figure out what is wrong with it.
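
One practical wrinkle: gdb prints these offsets in decimal, while the
dwarfdump messages earlier in this thread use zero-padded hex, so the two
have to be converted before they can be matched up. Here is a small Python
sketch of that conversion. It assumes that the "(adjusted)" reference gdb
prints is already an offset from the start of .debug_info (I have not
verified that against the gdb source), and the helper name dwarfdump_offset
is made up for illustration.

```python
def dwarfdump_offset(decimal_offset):
    """Format a decimal DIE offset the way the dwarfdump messages print it."""
    return f"0x{decimal_offset:08x}"


# The parameter DIE and the type it refers to, taken from the gdb output above.
param_die = dwarfdump_offset(14760)  # "0x000039a8"
type_die = dwarfdump_offset(43711)   # "0x0000aabf"
print(f"parameter DIE at {param_die}, DW_AT_type points at {type_die}")
```

If the assumption about "(adjusted)" holds, searching the dwarfdump output
for 0x0000aabf should show whether a type DIE was emitted at that offset.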
> -
> Devang
>
> 0x000009a9: DIE attribute 0x000009ae: AT_type/FORM_ref4 has a value
> 0x000001c2 that is not in the current compile unit in the .debug_info
> section.
> 0x00000b85: DIE attribute 0x00000b8a: AT_type/FORM_ref4 has a value
> 0x0000055c that is not in the current compile unit in the .debug_info
> section.
> 0x00000c88: DIE attribute 0x00000c89: AT_type/FORM_ref4 has a value
> 0x0000055c that is not in the current compile unit in the .debug_info
> section.
> 0x00000d2f: DIE attribute 0x00000d34: AT_type/FORM_ref4 has a value
> 0x0000055c that is not in the current compile unit in the .debug_info
> section.
> 0x00000d9a: DIE attribute 0x00000d9f: AT_type/FORM_ref4 has a value
> 0x00000584 that is not in the current compile unit in the .debug_info
> section.
> 0x00000e43: DIE attribute 0x00000e48: AT_type/FORM_ref4 has a value
> 0x000011ac that is not in the current compile unit in the .debug_info
> section.
> 0x00000ea3: DIE attribute 0x00000ea8: AT_type/FORM_ref4 has a value
> 0x00001225 that is not in the current compile unit in the .debug_info
> section.
> 0x00000ebe: DIE attribute 0x00000ebf: AT_type/FORM_ref4 has a value
> 0x00001248 that is not in the current compile unit in the .debug_info
> section.
> 0x00000ee3: DIE attribute 0x00000ee4: AT_type/FORM_ref4 has a value
> 0x00001285 that is not in the current compile unit in the .debug_info
> section.
>
>
> - On Linux - well, the problem here is that even when my DWARF info was
> working, dwarfdump would spit out a ton of error messages about bad file
> DIEs and other spam. In other words, I've never been able to use LLVM to
> produce a binary on Linux that was dwarfdump-error free. So any "new" errors
> are mixed in with all of the "old" errors I was seeing before.
>
> 3) Use llbrowse to manually inspect the DIEs and see if they make sense.
> (Which is part of the reason why I wrote llbrowse.) Again, the problem is
> that I don't know where to look, and the files are simply too large to
> inspect manually.
>
> --
> -- Talin
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
--
-- Talin