Richard Smith
2008-Apr-23 16:37 UTC
[LLVMdev] Compile units in debugging intrinsics / globals
I have a question about the llvm debugging records, especially with respect to compile units.

In the non-LLVM sense, a compile unit is essentially everything contained within a single .o file, and it is derived from one or more source and header files. Included in a compile unit are functions and global data. Dwarf records refer to compile units in the same way: a compile unit record has children which define subprograms and variables, and definitions of the types used with these.

In the LLVM sense, there is a compile unit record for _every_ source file. The globals in the llvm.metadata section and the debug intrinsics reference the appropriate compile unit record to indicate where in the source the item they are describing appears.

By the time the llvm linker has finished its work, all the functions and global data, along with their debugging records, may have been rearranged, and I want to pull them all together into the appropriate Dwarf compile units. To do that I can look at the (llvm) compile unit records, but this only works when everything is defined in the same source file. If data or a function is defined in an included file then it appears to be in a different compile unit.

Suppose I have the following source:

file1:
    #include "file2"
    #include "file3"
    int fn1(void) ...

file2:
    int a;

file3:
    int fn2(void) ...

Then fn1, along with all the base types etc., appears to be in compile unit "file1", the variable a appears to be in compile unit "file2" (and there are no basic types in file2, so int is not defined), and fn2 appears to be in compile unit "file3". My dwarf records are therefore incorrect, appearing something like

    TAG_compile_unit "file1"
        TAG_subprogram "fn1" ...
        ...
        TAG_base_type "int" ...

    TAG_compile_unit "file2"
        TAG_variable "a" ...

    TAG_compile_unit "file3"
        TAG_subprogram "fn2" ...
        ...

when, in fact, the compile units "file2" and "file3" are bogus and everything should be part of compile unit "file1".
My question is: can I tell that these three (llvm) compile units are in fact components of the single (non-LLVM) compile unit? Or is there some other way I should be determining which (non-LLVM) compile unit the records are part of? Many thanks! -- Regards, Richard Smith Antix Labs Ltd 400 Thames Valley Park Drive, Reading, Berkshire, RG6 1PT Tel.: +44 (0) 118 357 0 357
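The regrouping asked about above can be illustrated with a minimal sketch. It assumes that each per-file compile unit record could carry a link to the file that includes it; the records described in this thread do not provide such a link, so `struct source_file`, `included_by`, `example_files` and `root_unit` are all hypothetical names invented for illustration and not part of LLVM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not an LLVM data structure: one entry per LLVM
 * compile unit record (that is, per source file). `included_by` is the
 * index of the file that #includes this one, or -1 for the file that was
 * passed to the compiler. This link is an assumption of the sketch. */
struct source_file {
    const char *name;
    int included_by;
};

/* The three files from the example in the message above. */
static const struct source_file example_files[] = {
    { "file1", -1 },  /* the file handed to the compiler */
    { "file2",  0 },  /* included by file1 */
    { "file3",  0 },  /* included by file1 */
};

/* Follow the include chain upward and return the index of the file that
 * should name the Dwarf compile unit. Returns -1 for an out-of-range
 * index, a dangling link, or a cyclic chain. */
static int root_unit(const struct source_file *files, size_t count, int index)
{
    assert(files != NULL);
    if (index < 0 || (size_t)index >= count)
        return -1;
    for (size_t hops = 0; hops < count; hops++) {
        int parent = files[index].included_by;
        if (parent < 0)
            return index;          /* reached the top-level source file */
        if ((size_t)parent >= count)
            return -1;             /* dangling link */
        index = parent;
    }
    return -1;                     /* more hops than files: a cycle */
}
```

With such a link, a code generator could emit one Dwarf compile unit per root file and still keep the per-file record for the declaration's source location. Whether the front end can supply the link is exactly the open question of the message.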
Duncan Sands
2008-Apr-24 07:43 UTC
[LLVMdev] Compile units in debugging intrinsics / globals
Hi,

> Suppose I have the following source:
>
> file1:
> #include "file2"
> #include "file3"
> int fn1(void) ...
>
> file2:
> int a;
>
> file3:
> int fn2(void) ...
>
> then fn1, along with all the base types etc appear to be in compile unit
> "file1", the variable a appears to be in compile unit "file2" (and there are
> no basic types in file2, so int is not defined), and fn2 appears to be in
> compile unit "file3". My dwarf records are therefore incorrect, appearing
> something like
>
> TAG_compile_unit "file1"
> TAG_subprogram "fn1" ...
> ...
> TAG_base_type "int" ...
>
> TAG_compile_unit "file2"
> TAG_variable "a" ...
>
> TAG_compile_unit "file3"
> TAG_subprogram "fn2" ...
> ...
>
> When, in fact, these compile units "file2" and "file3" are bogus and
> everything should be part of compile_unit "file1".

This is not clear to me. Isn't it useful to know where to find the definition of fn2 (in file3)? I'm pretty sure this is how gcc does things too: the debugger seems to know that some objects were defined in header files.

> My question is: can I tell that these three (llvm) compile units are in fact
> components of the single (non-LLVM) compile unit? Or is there some other way
> I should be determining which (non-LLVM) compile unit the records are part
> of?

If you compile file1 into an LLVM module M, then by definition all debug info in M is for the compile unit file1. So as long as you're not doing link time optimization, can't you just grab all debug info from M?

Ciao,

Duncan.
Richard Smith
2008-Apr-24 11:41 UTC
[LLVMdev] Compile units in debugging intrinsics / globals
Hi, thanks for responding. I think I did not explain my problem well. To illustrate it further, consider these two modules, which I will compile and link together using gcc.

Module 1 consists of one source file:

main.c:
    static int a = 1;
    extern int fn1(void);
    int main (int argc, char **argv)
    {
        return fn1();
    }

I compile this with the command line

    gcc main.c -g -c -o main.o

Module 2 consists of three source files:

file1.c:
    #include "file2.h"
    #include "file3.h"
    int fn1(void)
    {
        return fn2(a);
    }

file2.h:
    static int a = 2;

file3.h:
    int fn2(int p)
    {
        return p * 2;
    }

I compile this with the command line

    gcc file1.c -g -c -o file1.o

Finally I link the modules:

    gcc main.o file1.o -o main

In the non-llvm sense, each of these two modules is a compile unit. To see the debug records I use:

    objdump -W main > objdump.gcc.txt

Looking at this file, I see two compile units as I would expect (plus the C libraries):

    Compilation Unit @ offset 0x1a1:
    ...
     <0><1ac>: Abbrev Number: 1 (DW_TAG_compile_unit)
    ...
         DW_AT_name        : main.c
    ...
     <1><208>: Abbrev Number: 2 (DW_TAG_subprogram)
    ...
         DW_AT_name        : main
    ...
     <1><25a>: Abbrev Number: 6 (DW_TAG_variable)
         DW_AT_name        : a

And

    Compilation Unit @ offset 0x25b:
    ...
     <0><266>: Abbrev Number: 1 (DW_TAG_compile_unit)
    ...
         DW_AT_name        : file1.c
    ...
     <1><2c3>: Abbrev Number: 2 (DW_TAG_subprogram)
    ...
         DW_AT_name        : fn2
    ...
     <1><2f4>: Abbrev Number: 5 (DW_TAG_subprogram)
    ...
         DW_AT_name        : fn1
    ...
     <1><30d>: Abbrev Number: 6 (DW_TAG_variable)
         DW_AT_name        : a

The problem I have is that llvm considers a _source file_ to be a compile unit. My code generator - a back-end I have built for llc - uses the llvm compile unit information, *but an llvm compile unit is indistinguishable from a source file*.
It is true that I _also_ want to know what source file the declarations are in, but using the information I have, my code generator erroneously emits debug records for _four_ different compile units:

- compile unit "main.c" contains the definition of main and variable a,
- compile unit "file1.c" contains the definition of fn1,
- compile unit "file2.h" contains the definition of variable a, and
- compile unit "file3.h" contains the definition of fn2.

The problem with using the module information as you suggested is that at the time of code generation the linker has created a single module, and using this technique you only get _one_ compile unit, which is also wrong.

The 2.2 release seems to have this problem. If I compile my sources as follows:

    llvm-gcc -c -g main.c -o main.o
    llvm-gcc -c -g file1.c -o file1.o
    llvm-ld -disable-opt main.o file1.o -o main
    llc main.bc -f -o main.s -march=x86
    gcc main.s -o main
    objdump -W main > objdump.llvm.txt

I find that the debug records claim that everything is contained in a single compile unit named "file1.c". I also note that because both of the compile units contain a variable named a, llvm has only emitted one debug record for such a variable, and no matter where I query its value when debugging I am always given the value of the variable in main.c.

As the "standard" code generators get this wrong I suspect the answer is "no", but what I wanted to establish was whether I could determine the actual compile units (in the non-llvm sense) that the debug records are part of, not simply the source files. It appears that the llvm records are incorrect in not making a distinction between compile units and source files, but this could be resolved if there were some way of linking the source files (llvm compile units) together to determine the modules (non-llvm compile units).
-- Regards, Richard Smith Antix Labs Ltd 400 Thames Valley Park Drive, Reading, Berkshire, RG6 1PT Tel.: +44 (0) 118 357 0 357
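The symptom with the two variables named a can be modelled with a small sketch. It is only an illustration of why a record keyed by name alone cannot distinguish the two statics while a record keyed by compile unit and name can; it is not how llc or any debugger actually stores its records, and `struct var_record`, `linked_vars`, `find_by_name` and `find_in_unit` are hypothetical names. The initial values 1 and 2 come from main.c and file2.h in the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one file-scope variable after linking. */
struct var_record {
    const char *unit;   /* compile unit in the non-llvm sense */
    const char *name;
    int value;          /* initial value taken from the source */
};

static const struct var_record linked_vars[] = {
    { "main.c",  "a", 1 },
    { "file1.c", "a", 2 },   /* defined in file2.h, included by file1.c */
};

static const size_t linked_var_count =
    sizeof linked_vars / sizeof linked_vars[0];

/* Lookup by name only: the first match wins, so the second `a` can
 * never be reached. This mirrors the symptom reported in the thread. */
static const struct var_record *find_by_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < linked_var_count; i++)
        if (strcmp(linked_vars[i].name, name) == 0)
            return &linked_vars[i];
    return NULL;
}

/* Lookup by (compile unit, name): both statics stay distinguishable. */
static const struct var_record *find_in_unit(const char *unit,
                                             const char *name)
{
    assert(unit != NULL && name != NULL);
    for (size_t i = 0; i < linked_var_count; i++)
        if (strcmp(linked_vars[i].unit, unit) == 0 &&
            strcmp(linked_vars[i].name, name) == 0)
            return &linked_vars[i];
    return NULL;
}
```

Note that the second lookup only helps if the unit is the real compile unit ("file1.c"); keying on the source file ("file2.h") would again fail to match what a debugger expects from the Dwarf records.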
Mike Stump
2008-Apr-24 19:04 UTC
[LLVMdev] Compile units in debugging intrinsics / globals
On Apr 24, 2008, at 4:41 AM, Richard Smith wrote:

> Hi, thanks for responding. I think I did not explain my problem
> well. To illustrate it further

You might be interested in:

    gcc -combine file1.c main.c -S -o t.s -g -dA

:-)

> no matter where I query the value of it when debugging I always get
> given the value of the variable in main.c.

gcc does the same thing. Yeah, seems like a bug, would be nice to fix it.