thr3ads.net - llvm dev - [llvm-dev] [DWARFv5] Reading the .debug_str

Robinson, Paul via llvm-dev

2017-Jul-05 20:34 UTC

[llvm-dev] [DWARFv5] Reading the .debug_str_offsets section

There was some discussion about this in D34765, and I had a follow-up
chat with Wolfgang separately, plus spent a fair amount of time in
reading and thinking today.  I thought I would write my understanding
all down here so we can reach a common understanding of how it ought
to work, and therefore what our code should do.

For any non-DWARF-experts who might be interested, in principle this
section is straightforward: It's an array of offsets into .debug_str,
which in turn is a standard object-file string section.  The idea
behind .debug_str_offsets is that string references from other parts
of the DWARF can use an index into the array, instead of a direct
reference to the string section.  This means we end up with only one
object-file relocation per string, rather than one per reference.
Fewer relocations = smaller object files and faster link times.
To the extent that strings are referenced more than once, we win.
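
To make the indirection concrete, here is a minimal sketch of resolving
a string by index.  It is an illustration only, not LLVM code: the
function names are made up, the data is assumed little-endian, and the
offsets array is assumed to be passed in without any header.

```python
import struct

def read_cstring(debug_str: bytes, offset: int) -> str:
    """Read the NUL-terminated string that starts at `offset` in .debug_str."""
    end = debug_str.index(b"\x00", offset)
    return debug_str[offset:end].decode("utf-8")

def string_by_index(debug_str: bytes, offsets_array: bytes,
                    index: int, entry_size: int = 4) -> str:
    """Look up entry `index` in the offsets array, then follow it into .debug_str.

    `offsets_array` is assumed to start at element 0 (no header), and
    `entry_size` is 4 or 8 bytes.
    """
    fmt = "<I" if entry_size == 4 else "<Q"
    (str_offset,) = struct.unpack_from(fmt, offsets_array, index * entry_size)
    return read_cstring(debug_str, str_offset)

# Three strings, referenced by index 0, 1, 2 instead of by section offset.
debug_str = b"main\x00int\x00argc\x00"
offsets = struct.pack("<3I", 0, 5, 9)
```

Only the entries of `offsets` would need relocating; every reference
elsewhere in the DWARF is a plain integer index.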

The devil is in the details.  There are three distinct interesting
cases, when we are talking about the standard DWARF layout of this
section, and then some wrinkles added by the GCC split-DWARF style
which does not use exactly the same layout.  [And then the DWARF
committee failed to use a different section name.  Our bad.]

First the three cases for standard DWARF.  These are the "normal" (or
relocatable/executable) case, the "split" case, and the
"package" case.

(a) For a .o file, the compiler produces a .debug_str_offsets section
which has one or more "contributions" in it.  Each contribution has a
header, which gives its size and whether the array elements are 32 or 
64 bits wide.  Any DWARF compile-unit or type-unit that uses the array
(that is, any unit that uses any of the "strx" forms) has a 
DW_AT_str_offsets_base attribute that points to the 0th element of the
array.  The producer chooses whether to have one contribution shared by
all units, one contribution per unit, or somewhere in between.

There's an implication for how to read the .debug_str_offsets section,
which is that the reader has to parse the section before using it to
look up any strings.  "Parsing" here really means just following the
sequence of contribution-headers to determine what element size to
associate with each contribution.
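
A rough sketch of that header walk follows.  It assumes the DWARF v5
header layout (a 4-byte unit_length, or the 0xffffffff escape followed
by an 8-byte length for 64-bit entries, then a 2-byte version and 2
bytes of padding), little-endian data, an already-relocated section,
and no gaps between contributions.  The `Contribution` type and the
function name are hypothetical.

```python
import struct
from typing import List, NamedTuple

class Contribution(NamedTuple):
    base: int        # section offset of element 0 (the DW_AT_str_offsets_base value)
    entry_size: int  # 4 for 32-bit offsets, 8 for 64-bit offsets
    count: int       # number of offsets in the array

def parse_contributions(section: bytes) -> List[Contribution]:
    """Follow the chain of contribution headers through .debug_str_offsets."""
    result = []
    pos = 0
    while pos < len(section):
        (length,) = struct.unpack_from("<I", section, pos)
        pos += 4
        entry_size = 4
        if length == 0xFFFFFFFF:          # 64-bit escape: real length follows
            (length,) = struct.unpack_from("<Q", section, pos)
            pos += 8
            entry_size = 8
        after_length = pos                # unit_length counts bytes from here
        version, _padding = struct.unpack_from("<HH", section, pos)
        if version != 5:
            raise ValueError(f"unexpected version {version} at offset {pos}")
        base = pos + 4                    # skip version + padding
        count = (length - 4) // entry_size
        result.append(Contribution(base, entry_size, count))
        pos = after_length + length       # start of the next header
    return result

# One 32-bit contribution with two entries, then one 64-bit with one entry.
section = (struct.pack("<IHH", 4 + 8, 5, 0) + struct.pack("<2I", 0, 5) +
           struct.pack("<IQHH", 0xFFFFFFFF, 4 + 8, 5, 0) + struct.pack("<Q", 9))
contributions = parse_contributions(section)
```

A reader could then map each unit's DW_AT_str_offsets_base to the
contribution whose `base` matches, and take the element size from it.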

[I have previously described the layout differently, and I think that
was wrong.  Specifically there is no "array slicing" or other disjoint
sharing of contributions across units.  Thanks to Wolfgang for getting 
me to understand that.]

An executable (or linker "-r" output) where all input files use the
standard DWARF style can be handled exactly the same way.  The linker
will append all the .debug_str_offsets contributions together, do all
the relocations, and the net result is a new sequence of headers and
arrays.

(b) For a .dwo file, the compiler produces a .debug_str_offsets.dwo
section, which is laid out like a single "contribution" in the .o file
(that is, there is one header describing the entire section, and only
one array of offsets).  Unlike the .o file, the DWARF spec says units in
the .dwo file do *not* have a DW_AT_str_offsets_base attribute; this 
means all units in the .dwo file must share the one and only array.  The 
missing attribute implicitly points to the 0th element of that array,
and "parsing" the section means looking at exactly one header.

[I currently think that requiring a single contribution in the .dwo file
is not a bug, but a feature, because it means .debug_line.dwo can use 
.debug_str_offsets.dwo without worrying about which contribution to use.]

(c) For a .dwp file, the packaging tool (like the linker) will append
all the section contributions from the various .dwo files together.
In lieu of relocations, the packager is required to construct an "index"
table, which allows a consumer to associate a particular DWARF unit with
the .debug_str_offsets.dwo contribution from the same .dwo file.  Note 
that this index tells the reader how to find the header, and from there
it can find the 0th element, just as when it is reading a .dwo file.
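
As a simplified illustration of the package case: the real index table
has its own binary format, which is not modeled here.  The sketch
below stands in for it with a plain dictionary from a unit's DWO id to
the (offset, size) of its slice, and handles only 32-bit standard
contributions.  All names are hypothetical.

```python
import struct
from typing import Dict, Tuple

def offset_entry(section: bytes, index: Dict[int, Tuple[int, int]],
                 dwo_id: int, i: int) -> int:
    """Return the i-th string offset for the unit identified by `dwo_id`.

    `index` is a stand-in for the package index: it maps a DWO id to the
    (offset, size) of that .dwo file's slice of .debug_str_offsets.dwo.
    """
    start, size = index[dwo_id]
    contribution = section[start:start + size]
    length, version, _padding = struct.unpack_from("<IHH", contribution, 0)
    if length == 0xFFFFFFFF:
        raise NotImplementedError("64-bit contributions omitted from this sketch")
    if version != 5:
        raise ValueError("not a standard v5 contribution")
    # The index points at the header; element 0 follows the 8-byte header.
    (value,) = struct.unpack_from("<I", contribution, 8 + 4 * i)
    return value

# Two .dwo contributions appended by the packager.
first = struct.pack("<IHH", 4 + 8, 5, 0) + struct.pack("<2I", 0, 5)
second = struct.pack("<IHH", 4 + 4, 5, 0) + struct.pack("<I", 9)
dwp_section = first + second
dwp_index = {0xA: (0, len(first)), 0xB: (len(first), len(second))}
```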


That's how things work for standard DWARF.  There was also a prototype
of this done in GCC, prior to standardization, which differs slightly.
(Clang also does this, but for simplicity I'll call it the GCC style.)
It differs in that there is no header and the offsets are always 32-bit.
AFAIK the GCC style is tied to split DWARF, meaning we see this style
only in a .debug_str_offsets.dwo section and not .debug_str_offsets.
Certainly we should see the GCC style only in the context of DWARF v4,
as a v5 producer ought to be using the standard style.  Also, GCC has
defined its own "form" codes, so any references to the table from other
parts of the DWARF can be decoded unambiguously.

What this does mean is that we can't look at .debug_str_offsets.dwo in
isolation and be sure how to interpret it.  It might have a standard
header, or it might be a GCC style table with no header.  We need to
look at the version of the associated .debug_info.dwo section, or know
which form codes are used to reference the table, before we can decide
whether a given .debug_str_offsets.dwo contribution is standard or GCC
style.
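
One way to sketch that decision is shown below.  The form values are
the standard DW_FORM_strx family and the GNU extension
DW_FORM_GNU_str_index; the function names and the "standard"/"gcc"
labels are made up for illustration, and the data is assumed
little-endian.

```python
import struct
from typing import Iterable, Optional, Tuple

# Standard v5 forms: DW_FORM_strx, strx1, strx2, strx3, strx4.
STANDARD_STRX_FORMS = {0x1A, 0x25, 0x26, 0x27, 0x28}
DW_FORM_GNU_str_index = 0x1F02  # pre-standard GCC form

def table_style(unit_version: Optional[int] = None,
                forms: Iterable[int] = ()) -> str:
    """Guess the table layout from the forms used, else from the unit version."""
    forms = set(forms)
    if forms & STANDARD_STRX_FORMS:
        return "standard"
    if DW_FORM_GNU_str_index in forms:
        return "gcc"
    if unit_version is not None:
        return "standard" if unit_version >= 5 else "gcc"
    raise ValueError("need a unit version or at least one string-index form")

def array_layout(section: bytes, style: str) -> Tuple[int, int]:
    """Return (offset of element 0, entry size) for a .dwo section or .dwp slice."""
    if style == "gcc":
        return 0, 4                # no header, always 32-bit offsets
    (length,) = struct.unpack_from("<I", section, 0)
    if length == 0xFFFFFFFF:
        return 16, 8               # 64-bit header is 16 bytes
    return 8, 4                    # 32-bit header is 8 bytes
```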

The same holds true for a .dwp file, where we do have the index to
slice up the section for us but any individual slice has the same
problem as the .dwo file it came from.

Things get way trickier in an object (executable or "-r" output) that
has a mix of GCC and standard contributions. AFAICT there's no 
equivalent of DW_AT_str_offsets_base in the GCC style, so about all 
we can do is something like this:
(1) Walk through all units to find all DW_AT_str_offsets_base pointers;
(2) for each one, poke around in the prior 8-16 bytes looking for
    the header; this is more reliable than it sounds;
(3) assume everything else in the section is GCC style.

At least that's what the dumper will have to do.  The debugger can
probably do it more lazily, but still kind of annoying.
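
Steps (1) through (3) might look roughly like the following.  This is
a heuristic sketch only: it assumes little-endian data and the v5
header layout (8 bytes for 32-bit offsets, 16 bytes for 64-bit), takes
the DW_AT_str_offsets_base values as an already-collected list, and
uses invented function names.

```python
import struct
from typing import List, Optional, Tuple

def probe_header(section: bytes, base: int) -> Optional[Tuple[int, int, int]]:
    """Look just before `base` for a plausible v5 header.

    Returns (header_start, entry_size, contribution_end) or None.
    """
    if base >= 8:   # 32-bit: length(4) version(2) padding(2)
        length, version, padding = struct.unpack_from("<IHH", section, base - 8)
        end = base - 4 + length
        if (version == 5 and padding == 0 and 4 <= length < 0xFFFFFFF0
                and (length - 4) % 4 == 0 and end <= len(section)):
            return base - 8, 4, end
    if base >= 16:  # 64-bit: escape(4) length(8) version(2) padding(2)
        escape, length, version, padding = struct.unpack_from(
            "<IQHH", section, base - 16)
        end = base - 4 + length
        if (escape == 0xFFFFFFFF and version == 5 and padding == 0
                and length >= 4 and (length - 4) % 8 == 0
                and end <= len(section)):
            return base - 16, 8, end
    return None

def classify(section: bytes, bases: List[int]):
    """Split the section into standard contributions and presumed-GCC ranges."""
    standard = []
    for base in sorted(set(bases)):       # step (1) supplied the bases
        hit = probe_header(section, base)  # step (2)
        if hit is not None:
            standard.append(hit)
    gcc, pos = [], 0
    for start, _entry_size, end in standard:
        if start > pos:
            gcc.append((pos, start))       # step (3): the leftovers
        pos = max(pos, end)
    if pos < len(section):
        gcc.append((pos, len(section)))
    return standard, gcc

# A headerless GCC-style table followed by one standard contribution.
mixed = (struct.pack("<2I", 0, 5) +
         struct.pack("<IHH", 4 + 8, 5, 0) + struct.pack("<2I", 9, 14))
standard_ranges, gcc_ranges = classify(mixed, [16])
```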

Questions and brickbats welcome.
--paulr

P.S. Ah, you clever reader, who noticed I carefully said nothing about
LTO of mixed-DWARF-version compilations!  Haven't thought about it.

Pieb, Wolfgang via llvm-dev

2017-Jul-05 22:13 UTC

[llvm-dev] [DWARFv5] Reading the .debug_str_offsets section

> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> Robinson, Paul via llvm-dev
> Sent: Wednesday, July 05, 2017 1:35 PM
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
> 
> snip ...
>
> Things get way trickier in an object (executable or "-r" output) that
> has a mix of GCC and standard contributions. AFAICT there's no
> equivalent of DW_AT_str_offsets_base in the GCC style, so about all
> we can do is something like this:
> (1) Walk through all units to find all DW_AT_str_offsets_base pointers;
> (2) for each one, poke around in the prior 8-16 bytes looking for
>     the header; this is more reliable than it sounds;
> (3) assume everything else in the section is GCC style.

I believe a mix of GCC and standard contributions should only be an issue in a
split-DWARF (fission) scenario, as there is no .debug_str_offsets section in a
non-split pre-V5 compilation AFAIK.

And given that we don't have a DW_AT_str_offsets_base attribute in
.debug_info.dwo sections by standard decree, all units (whether standard V5 or
GCC-style) would have to share the single contribution in the
.debug_str_offsets.dwo section (or the single contribution in a slice of the
section via dwp index table).

So the only tricky part for the reader would be to figure out whether a
.debug_str_offsets.dwo section (or a slice of it) is GCC-style or a (single) v5
standard contribution. As you pointed out earlier, either looking at the
individual unit versions in the .debug_info.dwo section or the individual forms
used could do the trick.

-- wolfgangp
> Questions and brickbats welcome.
> --paulr
> 
> P.S. Ah, you clever reader, who noticed I carefully said nothing about
> LTO of mixed-DWARF-version compilations!  Haven't thought about it.
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Robinson, Paul via llvm-dev

2017-Jul-06 13:35 UTC

[llvm-dev] [DWARFv5] Reading the .debug_str_offsets section

> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> Pieb, Wolfgang via llvm-dev
> Sent: Wednesday, July 05, 2017 6:14 PM
> To: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
> 
> > -----Original Message-----
> > From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> > Robinson, Paul via llvm-dev
> > Sent: Wednesday, July 05, 2017 1:35 PM
> > To: llvm-dev at lists.llvm.org
> > Subject: [llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
> >
> snip ...
> >
> > Things get way trickier in an object (executable or "-r" output) that
> > has a mix of GCC and standard contributions. AFAICT there's no
> > equivalent of DW_AT_str_offsets_base in the GCC style, so about all
> > we can do is something like this:
> > (1) Walk through all units to find all DW_AT_str_offsets_base pointers;
> > (2) for each one, poke around in the prior 8-16 bytes looking for
> >     the header; this is more reliable than it sounds;
> > (3) assume everything else in the section is GCC style.
> 
> I believe a mix of GCC and standard contributions should only be an issue
> in a split-DWARF (fission) scenario, as there is no .debug_str_offsets
> section in a non-split pre-V5 compilation AFAIK.

Oh, of course!  So a normal object file is always standard.  Excellent!

> 
> And given that we don't have a DW_AT_str_offsets_base attribute in
> .debug_info.dwo sections by standard decree, all units (whether standard
> V5 or GCC-style) would have to share the single contribution in the
> .debug_str_offsets.dwo section (or the single contribution in a slice of
> the section via dwp index table).
> 
> So the only tricky part for the reader would be to figure out whether a
> .debug_str_offsets.dwo section (or a slice of it) is GCC-style or a
> (single) v5 standard contribution. As you pointed out earlier, either
> looking at the individual unit versions in the .debug_info.dwo section or
> the individual forms used could do the trick.
> 
> -- wolfgangp
> 
> > Questions and brickbats welcome.
> > --paulr
> >
> > P.S. Ah, you clever reader, who noticed I carefully said nothing about
> > LTO of mixed-DWARF-version compilations!  Haven't thought about it.

This is still a question.  However I think it's not that hard to 
handle.  If LTO sees a mix of v4 and v5, we emit a standard 
.debug_str_offsets.dwo section but force 32-bit offsets (i.e. leave 
a comment for our descendants to make sure that happens) and then 
the GCC forms will just DTRT.  I think.
The dumper would treat it as a standard section as long as there are
any v5 units in the .dwo (or that contribution to the .dwp).

IIRC the string sections aren't actually emitted until after all 
units have been processed, because everything shares the same string 
section, so remembering whether any v5 units have crossed our path 
and then adding the v5 header at the last moment should be feasible.
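
A toy sketch of that "decide at the last moment" idea is below.  The
class and method names are invented and this is not how LLVM's emitter
is structured.  It assumes little-endian output and always writes
32-bit offsets, and it does not check that the GCC forms really do
resolve correctly once a header is present.

```python
import struct
from typing import List

class StrOffsetsDwoEmitter:
    """Collects string offsets and picks the section layout when finishing."""

    def __init__(self) -> None:
        self.offsets: List[int] = []
        self.saw_v5_unit = False

    def note_unit(self, dwarf_version: int) -> None:
        """Remember whether any v5 unit has crossed our path."""
        if dwarf_version >= 5:
            self.saw_v5_unit = True

    def add_string_offset(self, offset: int) -> int:
        """Append an offset into the string section and return its index."""
        self.offsets.append(offset)
        return len(self.offsets) - 1

    def finish(self) -> bytes:
        """Emit the table; prepend a v5 header only if a v5 unit was seen."""
        body = b"".join(struct.pack("<I", o) for o in self.offsets)  # forced 32-bit
        if not self.saw_v5_unit:
            return body                      # GCC style: no header
        # unit_length covers version + padding + the array itself.
        header = struct.pack("<IHH", 4 + len(body), 5, 0)
        return header + body

# Mixed v4/v5 LTO: the header is added because one unit is v5.
emitter = StrOffsetsDwoEmitter()
emitter.note_unit(4)
emitter.note_unit(5)
emitter.add_string_offset(0)
emitter.add_string_offset(5)
mixed_output = emitter.finish()
```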
--paulr
