Robinson, Paul via llvm-dev
2017-Jul-05 20:34 UTC
[llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
There was some discussion about this in D34765, and I had a follow-up chat with Wolfgang separately, plus spent a fair amount of time reading and thinking today. I thought I would write my understanding down here so we can reach a common understanding of how it ought to work, and therefore what our code should do.

For any non-DWARF-experts who might be interested: in principle this section is straightforward. It's an array of offsets into .debug_str, which in turn is a standard object-file string section. The idea behind .debug_str_offsets is that string references from other parts of the DWARF can use an index into the array, instead of a direct reference to the string section. This means we end up with only one object-file relocation per string, rather than one per reference. Fewer relocations = smaller object files and faster link times. To the extent that strings are referenced more than once, we win.

The devil is in the details. There are three distinct interesting cases when we are talking about the standard DWARF layout of this section, and then some wrinkles added by the GCC split-DWARF style, which does not use exactly the same layout. [And then the DWARF committee failed to use a different section name. Our bad.]

First, the three cases for standard DWARF. These are the "normal" (or relocatable/executable) case, the "split" case, and the "package" case.

(a) For a .o file, the compiler produces a .debug_str_offsets section which has one or more "contributions" in it. Each contribution has a header, which gives its size and whether the array elements are 32 or 64 bits wide. Any DWARF compile-unit or type-unit that uses the array (that is, any unit that uses any of the "strx" forms) has a DW_AT_str_offsets_base attribute that points to the 0th element of the array. The producer chooses whether to have one contribution shared by all units, one contribution per unit, or somewhere in between.
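To make the indirection concrete, here is a minimal Python sketch of resolving one strx-style index, assuming a little-endian target and that the unit's DW_AT_str_offsets_base has already been read. The function name `read_strx` and its parameters are hypothetical, not an LLVM API.

```python
import struct


def read_strx(str_offsets: bytes, debug_str: bytes, base: int, index: int,
              offset_size: int = 4, little_endian: bool = True) -> str:
    """Resolve a string index through .debug_str_offsets into .debug_str.

    `base` is the unit's DW_AT_str_offsets_base: the offset of element 0 of
    the array, i.e. just past the contribution header, not the header itself.
    """
    code = "I" if offset_size == 4 else "Q"          # 32- or 64-bit entries
    fmt = ("<" if little_endian else ">") + code
    (str_off,) = struct.unpack_from(fmt, str_offsets, base + index * offset_size)
    end = debug_str.index(b"\0", str_off)            # strings are NUL-terminated
    return debug_str[str_off:end].decode("utf-8")


# One 32-bit contribution: unit_length, version 5, padding, then two entries.
debug_str = b"main\0int\0"
str_offsets = struct.pack("<IHH", 4 + 2 * 4, 5, 0) + struct.pack("<II", 0, 5)
print(read_strx(str_offsets, debug_str, base=8, index=1))
```

Only the two array entries would need relocations in a .o file; every reference elsewhere in the DWARF is just a small index.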
The producer's freedom here has an implication for how to read the .debug_str_offsets section: the reader has to parse the section before using it to look up any strings. "Parsing" here really means just following the sequence of contribution-headers to determine what element size to associate with each contribution. [I have previously described the layout differently, and I think that was wrong. Specifically, there is no "array slicing" or other disjoint sharing of contributions across units. Thanks to Wolfgang for getting me to understand that.]

An executable (or linker "-r" output) where all input files use the standard DWARF style can be handled exactly the same way. The linker will append all the .debug_str_offsets contributions together, do all the relocations, and the net result is a new sequence of headers and arrays.

(b) For a .dwo file, the compiler produces a .debug_str_offsets.dwo section, which is laid out like a single "contribution" in the .o file (that is, there is one header describing the entire section, and only one array of offsets). Unlike the .o file, the DWARF spec says units in the .dwo file do *not* have a DW_AT_str_offsets_base attribute; this means all units in the .dwo file must share the one and only array. The missing attribute implicitly points to the 0th element of that array, and "parsing" the section means looking at exactly one header. [I currently think that requiring a single contribution in the .dwo file is not a bug, but a feature, because it means .debug_line.dwo can use .debug_str_offsets.dwo without worrying about which contribution to use.]

(c) For a .dwp file, the packaging tool (like the linker) will append all the section contributions from the various .dwo files together. In lieu of relocations, the packager is required to construct an "index" table, which allows a consumer to associate a particular DWARF unit with the .debug_str_offsets.dwo contribution from the same .dwo file.
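A minimal Python sketch of that parsing step for cases (a) through (c), assuming a little-endian target and the DWARF v5 header layout (unit_length, 2-byte version, 2-byte padding). `Contribution` and `parse_contributions` are hypothetical names.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Contribution:
    header_offset: int  # where this contribution's header starts
    base: int           # offset of element 0 (what DW_AT_str_offsets_base points at)
    offset_size: int    # 4 for 32-bit DWARF, 8 for 64-bit DWARF
    count: int          # number of array elements


def parse_contributions(section: bytes, little_endian: bool = True) -> List[Contribution]:
    """Follow the chain of v5 contribution headers through the section."""
    e = "<" if little_endian else ">"
    out: List[Contribution] = []
    pos = 0
    while pos < len(section):
        header_offset = pos
        (length,) = struct.unpack_from(e + "I", section, pos)
        pos += 4
        if length == 0xFFFFFFFF:            # 64-bit DWARF escape code
            (length,) = struct.unpack_from(e + "Q", section, pos)
            pos += 8
            offset_size = 8
        else:
            offset_size = 4
        end = pos + length                  # unit_length excludes the length field
        version, _padding = struct.unpack_from(e + "HH", section, pos)
        if version != 5:
            raise ValueError(f"unexpected version {version} at 0x{header_offset:x}")
        pos += 4                            # skip version + padding
        count = (end - pos) // offset_size
        out.append(Contribution(header_offset, pos, offset_size, count))
        pos = end                           # next header follows this array
    return out


sec = (struct.pack("<IHH", 12, 5, 0) + struct.pack("<II", 0, 5)              # 32-bit, 2 entries
       + struct.pack("<IQHH", 0xFFFFFFFF, 12, 5, 0) + struct.pack("<Q", 9))  # 64-bit, 1 entry
for c in parse_contributions(sec):
    print(c)
```

For a .dwo file the result should contain exactly one contribution, and every unit implicitly uses its `base`. For a .dwp file, the same function would be applied to the slice that the index assigns to a unit.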
Note that the .dwp index tells the reader how to find the header, and from there it can find the 0th element, just as when it is reading a .dwo file.

That's how things work for standard DWARF. There was also a prototype of this done in GCC, prior to standardization, which differs slightly. (Clang also does this, but for simplicity I'll call it the GCC style.) It differs in that there is no header and the offsets are always 32-bit. AFAIK the GCC style is tied to split DWARF, meaning we see this style only in a .debug_str_offsets.dwo section and not .debug_str_offsets. Certainly we should see the GCC style only in the context of DWARF v4, as a v5 producer ought to be using the standard style. Also, GCC has defined its own "form" codes, so any references to the table from other parts of the DWARF can be decoded unambiguously.

What this does mean is that we can't look at .debug_str_offsets.dwo in isolation and be sure how to interpret it. It might have a standard header, or it might be a GCC-style table with no header. We need to look at the version of the associated .debug_info.dwo section, or know which form codes are used to reference the table, before we can decide whether a given .debug_str_offsets.dwo contribution is standard or GCC style. The same holds true for a .dwp file, where we do have the index to slice up the section for us, but any individual slice has the same problem as the .dwo file it came from.

Things get way trickier in an object (executable or "-r" output) that has a mix of GCC and standard contributions. AFAICT there's no equivalent of DW_AT_str_offsets_base in the GCC style, so about all we can do is something like this:
(1) Walk through all units to find all DW_AT_str_offsets_base pointers;
(2) for each one, poke around in the prior 8-16 bytes looking for the header; this is more reliable than it sounds;
(3) assume everything else in the section is GCC style.
At least that's what the dumper will have to do.
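A minimal Python sketch of steps (1) through (3), assuming the DW_AT_str_offsets_base values have already been collected from the units (step 1) and a little-endian target. `probe_header` and `split_mixed` are hypothetical names, and the plausibility checks in step (2) (version 5, zero padding, a length that stays in bounds and covers a whole number of entries) are one reasonable choice, not something the spec prescribes.

```python
import struct
from typing import List, Optional, Tuple

Header = Tuple[int, int, int]  # (header_offset, offset_size, end_of_contribution)


def probe_header(section: bytes, base: int, little_endian: bool = True) -> Optional[Header]:
    """Step (2): look just before `base` for a plausible DWARF v5 header."""
    e = "<" if little_endian else ">"
    if not 0 < base <= len(section):
        return None
    # 64-bit DWARF: 0xffffffff escape, 8-byte length, version, padding (16 bytes).
    if base >= 16:
        esc, length, version, pad = struct.unpack_from(e + "IQHH", section, base - 16)
        end = base - 4 + length  # length counts from the version field onward
        if (esc == 0xFFFFFFFF and version == 5 and pad == 0
                and base <= end <= len(section) and (end - base) % 8 == 0):
            return base - 16, 8, end
    # 32-bit DWARF: 4-byte length, version, padding (8 bytes).
    if base >= 8:
        length, version, pad = struct.unpack_from(e + "IHH", section, base - 8)
        end = base - 4 + length
        if (length < 0xFFFFFFF0 and version == 5 and pad == 0
                and base <= end <= len(section) and (end - base) % 4 == 0):
            return base - 8, 4, end
    return None


def split_mixed(section: bytes, bases: List[int]) -> Tuple[List[Header], List[Tuple[int, int]]]:
    """Steps (1)-(3): headers found via the bases; uncovered bytes are GCC style."""
    found = (probe_header(section, b) for b in bases)
    standard = sorted({h for h in found if h is not None})  # units may share one
    gcc_ranges: List[Tuple[int, int]] = []
    pos = 0
    for start, _size, end in standard:
        if start > pos:
            gcc_ranges.append((pos, start))  # bytes before this header
        pos = max(pos, end)
    if pos < len(section):
        gcc_ranges.append((pos, len(section)))
    return standard, gcc_ranges
```

Checking the 64-bit layout first is a design choice: in the 64-bit case the 4 bytes just before the version field are the upper half of the length, which is normally zero and so fails the 32-bit check anyway.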
The debugger can probably handle a mixed section more lazily, but it's still kind of annoying.

Questions and brickbats welcome.
--paulr

P.S. Ah, you clever reader, who noticed I carefully said nothing about LTO of mixed-DWARF-version compilations! Haven't thought about it.
Pieb, Wolfgang via llvm-dev
2017-Jul-05 22:13 UTC
[llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> Robinson, Paul via llvm-dev
> Sent: Wednesday, July 05, 2017 1:35 PM
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
>
snip ...
>
> Things get way trickier in an object (executable or "-r" output) that
> has a mix of GCC and standard contributions. AFAICT there's no
> equivalent of DW_AT_str_offsets_base in the GCC style, so about all
> we can do is something like this:
> (1) Walk through all units to find all DW_AT_str_offsets_base pointers;
> (2) for each one, poke around in the prior 8-16 bytes looking for
> the header; this is more reliable than it sounds;
> (3) assume everything else in the section is GCC style.

I believe a mix of GCC and standard contributions should only be an issue in a split-DWARF (fission) scenario, as there is no .debug_str_offsets section in a non-split pre-V5 compilation AFAIK.

And given that we don't have a DW_AT_str_offsets_base attribute in .debug_info.dwo sections by standard decree, all units (whether standard V5 or GCC-style) would have to share the single contribution in the .debug_str_offsets.dwo section (or the single contribution in a slice of the section via dwp index table).

So the only tricky part for the reader would be to figure out whether a .debug_str_offsets.dwo section (or a slice of it) is GCC-style or a (single) v5 standard contribution. As you pointed out earlier, either looking at the individual unit versions in the .debug_info.dwo section or the individual forms used could do the trick.

-- wolfgangp

> Questions and brickbats welcome.
> --paulr
>
> P.S. Ah, you clever reader, who noticed I carefully said nothing about
> LTO of mixed-DWARF-version compilations! Haven't thought about it.
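A minimal Python sketch of that decision, assuming the unit versions (and optionally the string forms) have already been gathered from .debug_info.dwo and a little-endian target. The form values are those of DWARF v5 and the GNU split-DWARF extension; the function names and the "any v5 unit or any v5 strx form means standard" rule are assumptions of this sketch.

```python
import struct
from typing import Iterable, Tuple

# DWARF v5 strx forms, and the pre-standard GNU extension form.
DW_FORM_strx = 0x1A
DW_FORM_strx1, DW_FORM_strx2, DW_FORM_strx3, DW_FORM_strx4 = 0x25, 0x26, 0x27, 0x28
DW_FORM_GNU_str_index = 0x1F02
STANDARD_STRX_FORMS = {DW_FORM_strx, DW_FORM_strx1, DW_FORM_strx2,
                       DW_FORM_strx3, DW_FORM_strx4}


def str_offsets_dwo_style(unit_versions: Iterable[int], forms_used: Iterable[int] = ()) -> str:
    """Decide how to read a .debug_str_offsets.dwo section (or one .dwp slice).

    All units in the .dwo share the single contribution, so evidence from
    any one unit applies to the whole section.
    """
    if any(v >= 5 for v in unit_versions) or STANDARD_STRX_FORMS & set(forms_used):
        return "standard"   # one v5 header, element 0 follows it
    return "gcc"            # no header, 32-bit entries from offset 0


def element0(section: bytes, style: str, little_endian: bool = True) -> Tuple[int, int]:
    """Return (offset of element 0, entry size) for the chosen style."""
    if style == "gcc":
        return 0, 4
    (length,) = struct.unpack_from(("<" if little_endian else ">") + "I", section, 0)
    return (16, 8) if length == 0xFFFFFFFF else (8, 4)
```

Once the style is known, string lookup is the same indexed read in both cases; only the position of element 0 and the entry size differ.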
Robinson, Paul via llvm-dev
2017-Jul-06 13:35 UTC
[llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Pieb,
> Wolfgang via llvm-dev
> Sent: Wednesday, July 05, 2017 6:14 PM
> To: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
>
> > -----Original Message-----
> > From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> > Robinson, Paul via llvm-dev
> > Sent: Wednesday, July 05, 2017 1:35 PM
> > To: llvm-dev at lists.llvm.org
> > Subject: [llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
> >
> snip ...
> >
> > Things get way trickier in an object (executable or "-r" output) that
> > has a mix of GCC and standard contributions. AFAICT there's no
> > equivalent of DW_AT_str_offsets_base in the GCC style, so about all
> > we can do is something like this:
> > (1) Walk through all units to find all DW_AT_str_offsets_base pointers;
> > (2) for each one, poke around in the prior 8-16 bytes looking for
> > the header; this is more reliable than it sounds;
> > (3) assume everything else in the section is GCC style.
>
> I believe a mix of GCC and standard contributions should only be an issue
> in a split-DWARF (fission) scenario, as there is no .debug_str_offsets
> section in a non-split pre-V5 compilation AFAIK.

Oh, of course! So a normal object file is always standard. Excellent!

>
> And given that we don't have a DW_AT_str_offsets_base attribute in
> .debug_info.dwo sections by standard decree, all units (whether standard
> V5 or GCC-style) would have to share the single contribution in the
> .debug_str_offsets.dwo section (or the single contribution in a slice of
> the section via dwp index table).
>
> So the only tricky part for the reader would be to figure out whether a
> .debug_str_offsets.dwo section (or a slice of it) is GCC-style or a
> (single) v5 standard contribution.
> As you pointed out earlier, either
> looking at the individual unit versions in the .debug_info.dwo section or
> the individual forms used could do the trick.
>
> -- wolfgangp
>
> > Questions and brickbats welcome.
> > --paulr
> >
> > P.S. Ah, you clever reader, who noticed I carefully said nothing about
> > LTO of mixed-DWARF-version compilations! Haven't thought about it.

This is still a question. However, I think it's not that hard to handle. If LTO sees a mix of v4 and v5, we emit a standard .debug_str_offsets.dwo section but force 32-bit offsets (i.e., leave a comment for our descendants to make sure that happens), and then the GCC forms will just DTRT. I think. The dumper would treat it as a standard section as long as there are any v5 units in the .dwo (or in that contribution to the .dwp). IIRC the string sections aren't actually emitted until after all units have been processed, because everything shares the same string section, so remembering whether any v5 units have crossed our path and then adding the v5 header at the last moment should be feasible.

--paulr
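A minimal Python sketch of the deferred-header idea above, assuming a little-endian target. `StrOffsetsDwoWriter` and its methods are hypothetical and are not LLVM's actual string-pool API. Entries are always 32-bit, and the v5 header is prepended only at finalize time, and only if some v5 unit was seen.

```python
import struct
from typing import Dict, List, Tuple


class StrOffsetsDwoWriter:
    """Hypothetical string pool for one .dwo, emitted only after all units."""

    def __init__(self) -> None:
        self.strings: List[str] = []        # unique strings, in index order
        self.index_of: Dict[str, int] = {}
        self.saw_v5 = False                 # has any v5 unit crossed our path?

    def note_unit(self, version: int) -> None:
        self.saw_v5 = self.saw_v5 or version >= 5

    def intern(self, s: str) -> int:
        """Return the string's index (what a strx or GNU_str_index form encodes)."""
        if s not in self.index_of:
            self.index_of[s] = len(self.strings)
            self.strings.append(s)
        return self.index_of[s]

    def finalize(self) -> Tuple[bytes, bytes]:
        """Return the (.debug_str_offsets.dwo, .debug_str.dwo) contents."""
        debug_str = bytearray()
        offsets = []
        for s in self.strings:
            offsets.append(len(debug_str))
            debug_str += s.encode("utf-8") + b"\0"
        # Always 32-bit entries, matching the GCC-style entry width.
        array = b"".join(struct.pack("<I", off) for off in offsets)
        if not self.saw_v5:
            return array, bytes(debug_str)   # headerless, GCC style
        # v5 header added at the last moment: unit_length, version 5, padding.
        header = struct.pack("<IHH", 4 + len(array), 5, 0)
        return header + array, bytes(debug_str)
```

The sketch only shows that the header decision can be deferred until every unit has been seen; it does not settle whether v4 consumers read the resulting table correctly.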