>> Oh, is there any consequence for deduplication in LTO? Isn’t that name-based?
> Should be OK - that's based on the fully mangled/linkage name of the type, which would be untouched by this.

I’ve recently been reminded that type-unit signatures are hashes of the name, not the standard-recommended algorithm of hashing the content. I tried to work out which name is actually used, but it’s buried deeper than I am comfortable excavating. Can we make sure that hash is using either the name-with-parameters, or the linkage name, as the input string? We don’t want “foo<int>” and “foo<float>” using the same type-unit signature!

--paulr
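To make the concern concrete, here is a small Python sketch of why the hash input matters. It assumes a stand-in signature scheme (MD5 of the input string, first 8 bytes taken as a 64-bit integer) purely for illustration; the exact scheme LLVM uses isn't spelled out in this thread. The helper `type_signature` is hypothetical, and the `_ZTS...` strings are example Itanium-mangled type names of the kind that could serve as a unique identifier.

```python
import hashlib


def type_signature(name: str) -> int:
    """Hypothetical 64-bit type-unit signature derived from a name string.

    Stand-in scheme for illustration only: MD5 the string and keep the
    first 8 bytes of the digest as a little-endian integer.
    """
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


# Candidate input strings for two instantiations of the same template.
candidates = {
    "simplified name": ("foo", "foo"),
    "name with parameters": ("foo<int>", "foo<float>"),
    "mangled identifier": ("_ZTS3fooIiE", "_ZTS3fooIfE"),
}

for label, (int_key, float_key) in candidates.items():
    a, b = type_signature(int_key), type_signature(float_key)
    print(f"{label:22} {a:#018x} {b:#018x} collide={a == b}")
```

Only the simplified-name row collides, because both instantiations hash the identical string "foo"; hashing either of the other two inputs keeps foo<int> and foo<float> apart.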
David Blaikie via llvm-dev
2021-Jun-23 22:06 UTC
[llvm-dev] [DWARF] using simplified template names
On Wed, Jun 23, 2021 at 1:14 PM <paul.robinson at sony.com> wrote:
> Can we make sure that hash is using either the name-with-parameters, or the linkage name, as the input string? We don’t want “foo<int>” and “foo<float>” using the same type-unit signature!

Worth checking, but yes, it's not a problem. We don't emit class linkage names; the only reasons we carry the linkage name on types are ODR deduplication during LTO linking and, when type units are enabled, type units. That name is stored in the DICompositeType's "identifier" field, which is used only as a unique identifier: it isn't guaranteed to be the linkage name, and it isn't used for DW_AT_linkage_name. Simplified template names won't touch it.

As an aside, I have another direction I'm interested in pursuing that relates to linkage names rather than the pretty names: we could reduce the number of DW_AT_linkage_names we emit by having symbolizers reconstitute linkage names instead. For example, if we see a function called "f3" with a single "int" formal parameter and a void return type, we can reconstruct its linkage name as _Z2f3i.
On one particularly pathological case I'm looking at, simplified pretty template names are worth a 43% reduction in the final .dwp's .debug_str.dwo. A rough estimate for the linkage-name change (omitting linkage names in most cases when Clang builds the IR; certain kinds of template cases are hard to reconstruct, but others are easy or doable with our current DWARF) is a 52% reduction, and the two combined give a 95% reduction in debug string size. On a less pathological case, one of Google's largest binaries, the figures were 26% and 56%, for an 82% total reduction.
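For the easiest case in the aside above, reconstruction can be sketched in a few lines of Python. This is a minimal sketch, not llvm-symbolizer code: `reconstruct_linkage_name` and `BUILTIN_CODES` are hypothetical names, and it assumes a non-template, non-member function in the global namespace whose parameters are all fundamental types. The single-letter codes are the Itanium C++ ABI builtin-type encodings. Namespaces, class types, substitutions, and templates (the harder cases mentioned above) are deliberately out of scope.

```python
# Itanium C++ ABI <builtin-type> codes for a handful of fundamental types.
BUILTIN_CODES = {
    "void": "v", "bool": "b", "char": "c", "signed char": "a",
    "unsigned char": "h", "short": "s", "unsigned short": "t",
    "int": "i", "unsigned int": "j", "long": "l", "unsigned long": "m",
    "long long": "x", "unsigned long long": "y",
    "float": "f", "double": "d", "long double": "e",
}


def reconstruct_linkage_name(name: str, param_types: list) -> str:
    """Rebuild the Itanium linkage name of a simple global function.

    Assumes: global namespace, not a template, not a member, not
    extern "C", and every parameter type is a fundamental type.
    The return type is not mangled for non-template functions, so only
    the name and the formal parameter types (from DW_TAG_formal_parameter
    children) are needed. An empty parameter list is encoded as "v".
    """
    params = "".join(BUILTIN_CODES[t] for t in param_types) or "v"
    # <source-name> is the identifier prefixed by its length.
    return f"_Z{len(name)}{name}{params}"


print(reconstruct_linkage_name("f3", ["int"]))
print(reconstruct_linkage_name("bar", ["unsigned int", "double"]))
```

Here `f3(int)` maps to `_Z2f3i` and `bar(unsigned int, double)` to `_Z3barjd`; the void return type contributes nothing, which is part of why this class of names is cheap to rebuild from existing DWARF.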
David Blaikie via llvm-dev
2021-Oct-09 00:48 UTC
[llvm-dev] [DWARF] using simplified template names
I think I'm down to one of the last pieces for rebuilding the names and round-tripping them through llvm-dwarfdump --verify: https://reviews.llvm.org/D111477, in case anyone has opinions on what we should do with integer type suffixes on non-type template parameters.

A few other remaining pieces:

1) Make the "operator" detection a bit better: https://github.com/llvm/llvm-project/blob/main/llvm/lib/DebugInfo/DWARF/DWARFDie.cpp#L308-L311. I don't think we can rely on there being a space after the word "operator" (it might be "operator<", for instance), so maybe I need a full regex or an exhaustive list of valid identifier characters: if "operator" is followed by an identifier character, it doesn't trigger this special case. Or the inverse: special-case the whitespace and first characters of the operator overloads, which is probably a smaller set.
2) Integrate this into llvm-symbolizer so it rebuilds the names automatically.

Then there are some lldb bugs to fix, etc.
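The round-trip check in the first paragraph can be sketched in Python as: take the simplified name plus the template parameters described in DWARF, rebuild the full name, and compare it with the name originally emitted. Everything here is hypothetical: `TypeArg` and `ValueArg` stand in for DW_TAG_template_type_parameter and DW_TAG_template_value_parameter DIEs, the ", " separator and the `INTEGER_SUFFIXES` table are assumptions for illustration, and none of this is what D111477 actually implements.

```python
from dataclasses import dataclass
from typing import List, Union

# Assumed suffix convention for integer non-type template arguments.
# Illustrative only: choosing this convention is the open question in D111477.
INTEGER_SUFFIXES = {
    "int": "", "unsigned int": "U", "long": "L", "unsigned long": "UL",
    "long long": "LL", "unsigned long long": "ULL",
}


@dataclass
class TypeArg:
    type_name: str  # stand-in for a DW_TAG_template_type_parameter


@dataclass
class ValueArg:
    type_name: str  # stand-in for the DW_AT_type of a value parameter
    value: int      # stand-in for its DW_AT_const_value


def render_arg(arg: Union[TypeArg, ValueArg]) -> str:
    if isinstance(arg, TypeArg):
        return arg.type_name
    if arg.type_name == "bool":
        return "true" if arg.value else "false"
    return f"{arg.value}{INTEGER_SUFFIXES[arg.type_name]}"


def rebuild_name(simplified: str, args: List[Union[TypeArg, ValueArg]]) -> str:
    """Append a template argument list to a simplified name such as 't1'."""
    if not args:
        return simplified
    return f"{simplified}<{', '.join(render_arg(a) for a in args)}>"


def verify_round_trip(full_name: str, simplified: str, args) -> bool:
    """A --verify style check: does the rebuilt name match the emitted one?"""
    return rebuild_name(simplified, args) == full_name


args = [TypeArg("int"), ValueArg("unsigned int", 3)]
print(rebuild_name("t1", args))
print(verify_round_trip("t1<int, 3>", "t1", args))
```

The second call shows why the suffix question matters: if the compiler's full name omits the "U" but the rebuilder adds it (or vice versa), an otherwise correct reconstruction fails verification.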
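The two options in item 1) can be sketched as Python predicates over a DW_AT_name string. This is an illustration, not the DWARFDie.cpp logic: both function names are hypothetical, identifier characters are assumed to be ASCII letters, digits and underscore, and the follow-character set is an assumption covering the punctuation that begins the overloadable operators, a space (for "operator new", "operator delete", and conversion operators such as "operator int"), and a double quote (for literal operators).

```python
OPERATOR_KEYWORD = "operator"

# Option 2 whitelist: characters assumed able to follow "operator" in an
# overload name. Space covers new/delete/co_await and conversion operators;
# '"' covers literal operators such as operator""_km.
_OPERATOR_FOLLOW_CHARS = set(' +-*/%^&|~!=<>,(["')


def _is_identifier_char(c: str) -> bool:
    return c.isascii() and (c.isalnum() or c == "_")


def is_operator_by_identifier_check(name: str) -> bool:
    """Option 1: "operator" not followed by an identifier character."""
    if not name.startswith(OPERATOR_KEYWORD) or len(name) == len(OPERATOR_KEYWORD):
        return False
    return not _is_identifier_char(name[len(OPERATOR_KEYWORD)])


def is_operator_by_follow_char(name: str) -> bool:
    """Option 2: "operator" followed by a known operator-starting character."""
    if not name.startswith(OPERATOR_KEYWORD) or len(name) == len(OPERATOR_KEYWORD):
        return False
    return name[len(OPERATOR_KEYWORD)] in _OPERATOR_FOLLOW_CHARS


for n in ["operator<", "operator()", "operator new", "operator int",
          "operators", "operator_count", "operator2", "foo"]:
    print(f"{n:15} {is_operator_by_identifier_check(n)} {is_operator_by_follow_char(n)}")
```

On these samples the two predicates agree; the trade-off is that option 1 needs an accurate notion of identifier characters, while option 2 needs the follow set to stay complete as operators are added to the language.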