Robinson, Paul via llvm-dev
2017-Jul-06 13:35 UTC
[llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Pieb, Wolfgang via llvm-dev
> Sent: Wednesday, July 05, 2017 6:14 PM
> To: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
>
> > -----Original Message-----
> > From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Robinson, Paul via llvm-dev
> > Sent: Wednesday, July 05, 2017 1:35 PM
> > To: llvm-dev at lists.llvm.org
> > Subject: [llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
> >
> snip ...
> >
> > Things get way trickier in an object (executable or "-r" output) that
> > has a mix of GCC and standard contributions. AFAICT there's no
> > equivalent of DW_AT_str_offsets_base in the GCC style, so about all
> > we can do is something like this:
> > (1) Walk through all units to find all DW_AT_str_offsets_base pointers;
> > (2) for each one, poke around in the prior 8-16 bytes looking for
> >     the header; this is more reliable than it sounds;
> > (3) assume everything else in the section is GCC style.
>
> I believe a mix of GCC and standard contributions should only be an issue
> in a split-DWARF (fission) scenario, as there is no .debug_str_offsets
> section in a non-split pre-V5 compilation AFAIK.

Oh, of course! So a normal object file is always standard. Excellent!

> And given that we don't have a DW_AT_str_offsets_base attribute in
> .debug_info.dwo sections by standard decree, all units (whether standard
> V5 or GCC-style) would have to share the single contribution in the
> .debug_str_offsets.dwo section (or the single contribution in a slice of
> the section via dwp index table).
>
> So the only tricky part for the reader would be to figure out whether a
> .debug_str_offsets.dwo section (or a slice of it) is GCC-style or a
> (single) v5 standard contribution. As you pointed out earlier, either
> looking at the individual unit versions in the .debug_info.dwo section or
> the individual forms used could do the trick.
>
> -- wolfgangp
>
> > Questions and brickbats welcome.
> > --paulr
> >
> > P.S. Ah, you clever reader, who noticed I carefully said nothing about
> > LTO of mixed-DWARF-version compilations! Haven't thought about it.

This is still a question. However I think it's not that hard to
handle. If LTO sees a mix of v4 and v5, we emit a standard
.debug_str_offsets.dwo section but force 32-bit offsets (i.e. leave
a comment for our descendants to make sure that happens) and then
the GCC forms will just DTRT. I think.
The dumper would treat it as a standard section as long as there are
any v5 units in the .dwo (or that contribution to the .dwp).

IIRC the string sections aren't actually emitted until after all
units have been processed, because everything shares the same string
section, so remembering whether any v5 units have crossed our path
and then adding the v5 header at the last moment should be feasible.

--paulr

> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
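
The reader-side decision discussed above could be sketched roughly as follows. This is only an illustration, not LLVM's implementation: the function names are hypothetical, the data is assumed to be little-endian, and a GCC-style contribution is assumed to be nothing but 4-byte entries. The standard header layout used here (unit_length, 2-byte version, 2-byte padding) is the DWARF v5 one, which is 8 bytes in the 32-bit format and 16 bytes in the 64-bit format.

```python
import struct

DWARF64_ESCAPE = 0xFFFFFFFF  # initial-length value that marks the 64-bit format


def classify_contribution(unit_versions):
    """Treat the contribution as standard if any unit is DWARF v5 or later."""
    return "standard" if any(v >= 5 for v in unit_versions) else "gcc"


def read_str_offsets(data, unit_versions):
    """Return (style, offsets) for one .debug_str_offsets.dwo contribution.

    Assumption: little-endian data; a GCC-style contribution has no header
    and consists of 4-byte entries that fill the whole buffer.
    """
    style = classify_contribution(unit_versions)
    pos, end, entry_fmt = 0, len(data), "<I"
    if style == "standard":
        (length,) = struct.unpack_from("<I", data, 0)
        pos = 4
        if length == DWARF64_ESCAPE:
            # 64-bit format: the real length follows as an 8-byte value.
            (length,) = struct.unpack_from("<Q", data, 4)
            pos, entry_fmt = 12, "<Q"
        end = pos + length  # unit_length counts the bytes that follow it
        version, _padding = struct.unpack_from("<HH", data, pos)
        if version != 5:
            raise ValueError(f"unexpected str_offsets version {version}")
        pos += 4  # skip version (2 bytes) and padding (2 bytes)
    entry_size = struct.calcsize(entry_fmt)
    offsets = [struct.unpack_from(entry_fmt, data, p)[0]
               for p in range(pos, end, entry_size)]
    return style, offsets
```

Classifying by unit version follows the suggestion that a dumper treat the section as standard as soon as any v5 unit is present; checking the forms used in the units would be the alternative mentioned in the thread.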
David Blaikie via llvm-dev
2017-Jul-06 16:45 UTC
[llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
Yep, Wolfgang picked up on the one thing I saw too.

Minor comments:

Yeah, it's a pity some stuff (pre-standard split DWARF) doesn't have headers to allow standalone decoding. (I assume there's a mandate on the DWARF committee to have version headers for everything in every section that can have them now (the only ones that can't that I can think of are the actual string dedup sections because the linker actually looks at them as strings, manipulates them, etc))

I think it's a bit of a pity that str_offsets_base can point into the middle of a str_offsets contribution in some ways (because it means the actual str offsets will be harder to read in that case "oh, string 5, that means take the str_offsets_base address and add 5 * pointer size (which you find by looking at the address range of the str_offsets_base)" - so if you were dumping str_offsets you might print "header, string 1: "foo", string 2: "bar", etc... " - but now "string 5" might actually be "string 11" because str_offsets_base is the start of string 6 (but it's not '6', it's the address... so you have to do all the decoding to figure that out) - a dumper could do some of this so when it's dumping str_offsets_base it could dump "0xdeadbeef, which is string 5 in the 3rd contribution"). Still, that's a pretty unique form of cross-CU sharing that I've not seen elsewhere.

I think there'd be some point/benefit to being able to have multiple contributions in a DWO file, but I don't think it's /buggy/ that it's not allowed - just a possible future enhancement.
"if you have a str_offsets_base, great, otherwise it's assumed to be 0 + <header size> (or just 0 if the CU uses GNU_dwo_id, rather than DWARF5 dwo_id, for example))
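
The defaulting rule floated above ("0 + <header size>", or just 0 for the GNU style) can be written down as a small sketch. The function names and parameters are hypothetical, and the entry size is assumed to follow the DWARF format (4 bytes for 32-bit DWARF, 8 bytes for 64-bit DWARF), which is how the v5 string offsets table is laid out.

```python
def str_offsets_header_size(dwarf64=False):
    """Size of a v5 .debug_str_offsets header: unit_length + version + padding."""
    unit_length_size = 12 if dwarf64 else 4
    return unit_length_size + 2 + 2


def effective_str_offsets_base(explicit_base=None, gnu_style=False, dwarf64=False):
    """Pick the base used to resolve string indices for one unit.

    explicit_base: value of DW_AT_str_offsets_base if the unit has one.
    gnu_style: True for a pre-standard (GNU) contribution with no header.
    """
    if explicit_base is not None:
        return explicit_base
    return 0 if gnu_style else str_offsets_header_size(dwarf64)


def entry_position(base, index, dwarf64=False):
    """Section offset of the table entry for a given string index."""
    return base + index * (8 if dwarf64 else 4)
```

With these assumptions, string index 5 in a 32-bit v5 contribution at the start of the section lives at `entry_position(effective_str_offsets_base(), 5)`, i.e. 8 bytes of header plus five 4-byte entries.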
Robinson, Paul via llvm-dev
2017-Jul-06 18:59 UTC
[llvm-dev] [DWARFv5] Reading the .debug_str_offsets section
> Yep, Wolfgang picked up on the one thing I saw too.

This is why I like having people review my stuff.

> I think it's a bit of a pity that str_offsets_base can point into the
> middle of a str_offsets contribution in some ways

Actually I changed my mind after saying that in the review, and in this writeup I concluded that it cannot do that. str_offsets_base points to the element immediately after the header. You can have multiple units sharing the (entire) same contribution, but there's no slicing.

> I think there'd be some point/benefit to being able to have multiple
> contributions in a DWO file, but I don't think it's /buggy/ that it's
> not allowed - just a possible future enhancement.

The trick with having multiple contributions in a DWO is that the .debug_line.dwo section has no way to specify which contribution to use. I mean, you could identify which CU points to that line table header, and then find the contribution associated with that unit, but it seems less complicated to say the DWO has only one contribution and not have to bother with all that.

I'm also not persuaded that having multiple str_offsets contributions in a normal .o file is all that helpful, or able to save space. You save a byte, maybe two, per reference; but each unique string costs you 4, plus the header. You can construct examples where it's a savings but in the general case I'm not so sure.

Thanks,
--paulr
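
One way to read the space estimate above is as a simple cost model; the sketch below is an interpretation, not a measurement. It assumes the per-reference saving comes from shorter ULEB128-encoded string indices, that each unique string costs one 4-byte table entry, and that the header is the 8-byte 32-bit one. The function names and the sample numbers are hypothetical.

```python
def uleb128_size(value):
    """Number of bytes needed to encode a non-negative integer as ULEB128."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def extra_contribution_net_savings(references, unique_strings,
                                   bytes_saved_per_ref,
                                   entry_size=4, header_size=8):
    """Bytes saved (positive) or lost (negative) by adding a contribution.

    Savings: each reference gets shorter by bytes_saved_per_ref.
    Cost: one table entry per unique string, plus the contribution header.
    """
    saved = references * bytes_saved_per_ref
    cost = unique_strings * entry_size + header_size
    return saved - cost
```

For instance, with 100 unique strings the extra contribution costs 408 bytes, so at one byte saved per reference it only pays off once those strings are referenced more than 408 times in total, which is consistent with the doubt expressed about the general case.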