thr3ads.net - llvm dev - [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld. [Jun 2020]

Alexey Lapshin via llvm-dev

2020-Jun-03 17:25 UTC

[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

>DWARF was designed in an era when COMDAT and ICF were not a thing, or at least
>not common, certainly not when talking about function code.  The overhead of a
>unit occurred only once per translation unit, so that expense was reasonably
>amortized.
>Splitting functions into their own object-file sections and making them
>excludable is an evolution of compiler/linker technology that DWARF has not
>kept up with.  The linker-friendly solutions (COMDAT DWARF) would put
>function-related .debug_* contributions into a section-group along with the
>function .text itself; this multiplies the total number of sections to deal
>with, regardless of the tactics used for the content of each per-function
>DWARF section.  The fully DWARF-conformant solution would create one
>partial_unit per function, with the corresponding overhead of unit headers
>(especially painful in the .debug_line section).  Alternatively we fragment
>DWARF into sections without headers and rely on the linker to make everything
>look right in the linked executable; this produces .o files that are not
>DWARF conformant (unless we can standardize this in DWARF v6) and would be a
>big hassle for consumers other than the linker.
>Or we pay the cost of parsing, trimming, and rewriting all the DWARF in the
>linker.
Could we perhaps make DWARF easier to parse, trim, and rewrite, so that a full
DWARF-parsing solution would not take too much time?

For example, the -fdebug-types-section solution uses COMDAT sections to split
and deduplicate types. That solution works quite fast. Its already-mentioned
drawback is a large size overhead (because of the section headers and type unit
headers). But the fact that type units can be identified just by hash-id
(without parsing type names and type hierarchies) allows the linker to reject
duplicates quickly. Another point is that the linker drops duplicated COMDAT
sections without any additional checks. After the duplicates are deleted, the
debug info is still consistent.

A DWARF-aware solution could be built on the same two principles:

1. Compare types by hash-id.
2. Drop duplicates without analyzing their contents.

If all types are put into a separate type table and have a hash-id, then it
would be much easier to deduplicate them. The idea is demonstrated here:
https://reviews.llvm.org/P8164. (It still has open questions: whether base types
should be put into the type table, whether references into the type table should
be made by DW_AT_signature or just by offset, etc.) While handling that separate
type table, the DWARF-aware linker would check only the hash-id and put only one
type description with a given id into the final type table. It would also allow
us to solve the -flto=thin problem described at
http://lists.llvm.org/pipermail/llvm-dev/2020-May/141938.html (there is a
dsymutil example there), i.e., the case where a type definition is removed would
not occur.
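
As an illustration only, the two principles above can be sketched in a few
lines of Python. This is not code from D74169 or P8164: the TypeEntry record,
its field names, and merge_type_tables are hypothetical, and the type table
itself is the proposal under discussion rather than an existing DWARF feature.
The sketch keeps the first entry seen for each hash-id and never inspects the
payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeEntry:
    # Hypothetical record: a 64-bit id playing the role of a type signature,
    # plus the encoded type description, treated as opaque bytes.
    hash_id: int
    payload: bytes


def merge_type_tables(object_tables):
    """Merge per-object type tables, keeping one entry per hash_id.

    Later entries with an already-seen hash_id are dropped without
    looking at their payload (principles 1 and 2 above).
    """
    merged = {}
    for table in object_tables:
        for entry in table:
            merged.setdefault(entry.hash_id, entry)
    # dicts preserve insertion order, so output order is first-seen order
    return list(merged.values())


if __name__ == "__main__":
    a = [TypeEntry(0x11, b"struct Foo"), TypeEntry(0x22, b"struct Bar")]
    b = [TypeEntry(0x22, b"struct Bar"), TypeEntry(0x33, b"struct Baz")]
    for entry in merge_type_tables([a, b]):
        print(hex(entry.hash_id), entry.payload)
```

The cost per entry is a single hash-table lookup, which is the property the
proposal relies on; whether real hash-ids are collision-safe enough for this is
a separate question that the sketch does not address.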

Thank you, Alexey.
>--paulr
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of James
Henderson via llvm-dev
Sent: Wednesday, June 3, 2020 3:48 AM
To: David Blaikie <dblaikie at gmail.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in
lld.

It makes me sad that the linker (via a library or otherwise) has to be
"DWARF-aware" to be able to effectively handle --gc-sections, COMDATs,
--icf etc for debug info, without leaving large blocks of data kicking around.

The patching to -1 (or equivalent) is probably a good lightweight solution
(though I'd love it if it could be done based on section type in the future
rather than section name, but that's probably outside the realm of DWARF),
as it requires only minimal understanding in the linker, but anything beyond
that seems to be complicated logic that is mostly due to the structure of DWARF.
Patching to -1 does feel a bit like a sticking plaster/band aid to patch over
the issue rather than properly solving it too - there will still be debug data
(potentially significant amounts in COMDAT-heavy objects) that the linker has to
write and the debugger has to somehow know how to skip (even if it knows that -1
is special-case due to the standard being updated, it needs to get as far as the
-1), which is all wasted effort.
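
To make the cost described above concrete, here is a minimal Python sketch of
the consumer side, assuming the linker has written the maximum address value
("-1") into the low address of every discarded range. The function names and
the (low, high) tuple representation are illustrative assumptions, not any
debugger's actual API.

```python
def tombstone(address_size):
    """Return the all-ones value ("-1") for the given address size in bytes."""
    return (1 << (8 * address_size)) - 1


def live_ranges(ranges, address_size=8):
    """Filter out ranges whose low address was patched to the tombstone.

    Note that every entry is still read and compared: the dead data is
    skipped, but only after the consumer has reached it.
    """
    dead = tombstone(address_size)
    return [(low, high) for low, high in ranges if low != dead]


if __name__ == "__main__":
    ranges = [(0x1000, 0x1040), (tombstone(8), tombstone(8)), (0x2000, 0x2010)]
    print(live_ranges(ranges))
```

The filter is cheap, but it runs over data that the linker still had to write
and the consumer still had to load, which is the wasted effort in question.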

We've already seen from Alexey's prototyping, and from our own
experiences with the Sony proprietary linker (which tried to rewrite .debug_line
only) that deconstructing the DWARF so that it can be more optimally reassembled
at link time is slow going, and will probably inevitably be however much effort
is put into optimising it. For a start, given the current standards, it's
impossible to know how to deconstruct it without having to parse vast amounts of
DWARF, which is typically going to mean a lot more parsing work than the linker
would normally have to deal with. Additionally, much of this parsing work is
wasted effort, since it seems unlikely in many links that large amounts of the
DWARF will be redundant. Having an option to opt-in doesn't help much there,
since it just means the logic exists without most people using it, due to it not
being good enough, or potentially they don't even know it exists.

I don't have particularly concrete suggestions as to how to solve the
structural problems with DWARF at this point. The only thing that seems obvious
to me is a more "blessed" approach to fragmentation of sections,
similar to what I tried with my prototype mentioned earlier in the thread,
although we'd need to figure out the previously stated performance issues.
Other ideas might tie into this, like somehow sharing the various table headers
a bit like CIEs in .eh_frame that could be merged by the linker - each object
could have separate table header sections, which are referenced by the
individual .debug_* blocks, which in turn are one per function/data piece and
easily discardable/merged by the linker.
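
A rough Python sketch of that last idea, under loudly stated assumptions: the
object layout below (a "headers" map plus a list of per-function "fragments"
that name the header they depend on) is invented for illustration and does not
correspond to any existing object-file format or to the prototype mentioned
earlier. Identical headers are merged by content, much as identical CIEs can
be shared, and fragments belonging to discarded functions are dropped without
being parsed.

```python
def link_fragments(objects, live_functions):
    """Merge shared table headers and keep only fragments of live functions.

    objects: list of dicts with
        "headers":   {header_name: header_bytes}
        "fragments": [(function_name, header_name, body_bytes), ...]
    Returns (merged_headers, output_fragments), where each output fragment
    is (header_index, function_name, body_bytes).
    """
    header_index = {}  # header bytes -> index in the merged header list
    output = []
    for obj in objects:
        for function, header_name, body in obj["fragments"]:
            if function not in live_functions:
                continue  # discarded with its function; body never inspected
            header = obj["headers"][header_name]
            index = header_index.setdefault(header, len(header_index))
            output.append((index, function, body))
    return list(header_index), output


if __name__ == "__main__":
    objs = [
        {"headers": {"h": b"HDR"},
         "fragments": [("f", "h", b"f-lines"), ("g", "h", b"g-lines")]},
        {"headers": {"h": b"HDR"},
         "fragments": [("k", "h", b"k-lines")]},
    ]
    print(link_fragments(objs, {"f", "k"}))
```

A header that is only referenced by discarded fragments never reaches the
output, so the linker needs no DWARF knowledge beyond "fragment X refers to
header Y".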

Just some thoughts.

James

On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
<alapshin at accesssoftek.com> wrote:
>
> Hi David, please find my comments inside:
>
>
> >>>Broad question: Do you have any specific motivation/users/etc
in implementing this (if you can speak about it)?
>
> >>> - it might help motivate the work, understand what tradeoffs
might be suitable for you/your users, etc.
>
> >>There are two general requirements:
> >> 1) Remove (or clean) invalid debug info.
>
> >
> >Perhaps a simpler direct solution for your immediate needs might be a
much narrower,
> >and more efficient linker-DWARF-awareness feature:
> >
> > With DWARFv5, rnglists present an opportunity for a DWARF linker to rewrite
> > the ranges without parsing the rest of the DWARF. /technically/ this isn't
> > guaranteed - rnglist entries can be referenced either directly, or by index.
> > If all rnglists are referenced by index, then a linker could parse only the
> > debug_rnglists section and rewrite ranges to remove any address ranges that
> > refer to optimized-out code.
> >
> > This would only be correct for rnglists that had no direct references to
> > them (that only were referenced via the indexes) - but we could either
> > implement it with that assumption, or could add an LLVM extension attribute
> > on the CU that would say "I promise I only referenced rnglists via rnglistx
> > forms/indexes". If this DWARF-aware linking would have to read the CU DIE
> > (not all the other DIEs) it /could/ also then rewrite high/low_pc if the CU
> > wasn't using ranges... but that wouldn't come up in the function-removal
> > case, because then you'd have ranges anyway, so no need for that.
> >
> > Such a DWARF-aware rnglist linking could also simplify rnglists, in
cases where functions
> > ended up being laid out next to each other, the linker could coalesce
their ranges together.
> >
> > I imagine this could be implemented with very little overhead to
linking, especially compared
> > to the overhead of full DWARF-aware linking.
> >
> >Though none of this fixes Split DWARF, where the linker doesn't get
a chance to see the
> > addresses being used - but if you only want/need the CU-level ranges
to be correct, this
> > might be a viable fix, and quite efficient.
>
> Yes, we think about that alternative. This would resolve our problem of
invalid debug info
> and would work much faster. Thus, if we would not have good results for
D74169 then we
> will implement it. Do you think it could be useful to have this solution in
upstream?
A pure rnglist rewriting - I think it'd be OK to have in upstream -
again, cost/benefit/etc would have to be weighed. I'm not sure it
would save enough space to be particularly valuable beyond the
correctness issue - and it doesn't completely solve the correctness
issue for zero-address usage or low-address usage (because you could
still have overlapping subprograms inside a CU - so if you were
symbolizing you could use the correct rnglist to filter, but then go
look inside the CU only to find two subprograms that had that address
& not know which one was the correct one and which one was the
discarded one).

rnglist rewriting might be easy enough to prototype - but depends what
you want to spend your time on, I know this whole issue has been a
huge investment of your time already - but maybe this recent
revitalization of the conversation around having an explicit value in
the linker might be sufficient to address everyone's needs... *fingers
crossed*
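
For readers who want to see what a pure rnglist rewrite amounts to, here is a
small Python sketch. It assumes the linker already knows, for each range entry,
which input section it came from (for example via relocations) and which
sections survived; the (section, low, high) tuples and the function name are
illustrative, not LLD data structures. As noted in the thread, this is only
safe when every rnglist is referenced by index rather than by direct offset.

```python
def rewrite_rnglist(entries, live_sections):
    """Drop ranges of discarded sections and coalesce adjacent ranges.

    entries: list of (section_name, low_pc, high_pc), high_pc exclusive.
    live_sections: set of section names kept by the linker.
    Returns a sorted list of (low_pc, high_pc) with touching or
    overlapping ranges merged.
    """
    kept = sorted((low, high) for section, low, high in entries
                  if section in live_sections)
    merged = []
    for low, high in kept:
        if merged and low <= merged[-1][1]:
            # Range starts where the previous one ends (or overlaps it).
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


if __name__ == "__main__":
    entries = [
        (".text.a", 0x1000, 0x1020),
        (".text.dead", 0x0, 0x10),      # discarded function, resolved to 0
        (".text.b", 0x1020, 0x1050),    # laid out right after .text.a
        (".text.c", 0x2000, 0x2008),
    ]
    print(rewrite_rnglist(entries, {".text.a", ".text.b", ".text.c"}))
```

This fixes only the CU-level ranges. The overlapping-subprogram problem
described above remains, because the subprogram DIEs inside the CU are never
touched.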

> >> 2) Optimize the DWARF size.
>
>
> > Do your users care much about this? I imagine if they had significant
DWARF size issues,
> > they'd have significant link time issues and the kind of cost to
link time this feature has would
> > be prohibitive - but perhaps they're sharing linked binaries much
more often than they're
> > actually performing linking.
>
> Yes, they do. They also have significant link-time issues.
> So current performance results of D74169 are not very acceptable.
> We hope to improve it.
>
>
>
> >>The specifics which our users have:
> >>  - embedded platform which uses 0 as start of .text section.
> >>  - custom toolset which does not support all features yet(f.e.
split dwarf).
> >>  - tolerant of the link-time increase.
> >>  - need a useful way to share debug builds.
>
>
> > Sharing two files (executable and dwp) is significantly less useful
than sharing one file?
>
> Probably not significantly, but yes, it looks less useful comparing to
D74169.
> Having only two files (executable and .dwp) looks significantly better than
having executable and multiple .dwo files.
> Having only one file(executable) with minimal size looks better than the
two files with a bigger size.
>
> clang compiled with -gsplit-dwarf takes 0.9G for executable and 0.9G for .dwp.
> clang compiled with --gc-debuginfo takes only 0.76G for single executable.
>
>
>
> >>For the first point: we have a problem "Overlapping address
ranges starting from 0"(D59553).
>
> >>We use custom solution, but the general solution like D74169 would
be better here.
>
>
> > If CU ranges are the only ones that need fixing, then I think the
above solution might be as
> > good/better - if more than CU ranges need fixing, then I think we
might want to start talking about
> > how to fix DWARF itself (split and non-split) to signal certain
addresses point to dead code with a
> > specific blessed value that linkers would need to implement - because
with Split DWARF there's
> > no way to solve the non-CU addresses at the linker.
>
> I think a worthwhile choice for that signal value would be LowPC > HighPC.
> That does not require additional bits in DWARF.
> It would be natural to skip such address ranges since they are explicitly
> marked as invalid.
> It could be implemented in a linker very easily. Probably, it would make sense
> to describe that usage in the DWARF standard.
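
A minimal Python sketch of how a consumer might apply the LowPC > HighPC
convention proposed above. This is an assumption-laden illustration: it treats
both values as absolute addresses (DW_AT_high_pc can also be encoded as an
offset from the low address, which this sketch does not model), and the
function names are hypothetical.

```python
def is_marked_dead(low_pc, high_pc):
    """Proposed convention: a range with low_pc > high_pc is explicitly invalid."""
    return low_pc > high_pc


def usable_ranges(ranges):
    """Return only the ranges a symbolizer should consider."""
    return [(low, high) for low, high in ranges if not is_marked_dead(low, high)]


if __name__ == "__main__":
    # The second range is what a linker might emit for a discarded function.
    print(usable_ranges([(0x1000, 0x1040), (0x1, 0x0), (0x2000, 0x2010)]))
```

An empty range (low_pc == high_pc) is deliberately not treated as dead here,
since the proposal only names the strictly-greater case.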
>
> As to the addresses which are not seen by the linker(since they are in .dwo
files) - yes,
> they need to have another solution. Could you show an example of such a
case, please?
>
>
>
> >>>2. Support of type units.
>
> >>>
>
> >>>>  That could be implemented further.
>
> >>>Enabling type units increases object size to make it easier to
deduplicate at link time by a DWARF-unaware
>
> >>>linker. With a DWARF aware linker it'd be generally
desirable not to have to add that object size overhead to
>
> >>>get the linking improvements.
>
> >>
>
> >>But, DWARFLinker should adequately work with type units since they
are already implemented.
>
>
> > Maybe - it'd be nice & all, but I don't think it's an
outright necessity - if someone knows they're using
> > a DWARF-aware linker, they'd probably not use type units in their
object files. It's possible someone
> > doesn't know for sure & maybe they have pre-canned debug
object files from someone else, etc.
>
> I see.
>
> >>Another thing is that the idea behind type units has the potential
to help Dwarf-aware linker to work faster.
>
> >>Currently, DWARFLinker analyzes context to understand whether types
are the same or not.
>
>
> >When you say "analyzes context" what do you mean? Usually
I'd take that to mean
> > "looks at things outside the type itself - like what namespace
it's in, etc" - which, yes,
> > it should do that, but it doesn't seem very expensive to do. But I
guess you actually
> > mean something about doing structural equivalence in some way, looking
at things inside the type?
>
> I think it could be useful for both cases. Currently, dsymutil does only the
> first thing (looking at the type name, namespace name, etc.) and does not do
> the second thing (checking structural equivalence). Analyzing type names is
> currently quite expensive (the string pool search alone takes ~10 sec out of
> 70 sec of overall time). It is expensive because many things have to be done
> to work with strings: parse DWARF, search for and resolve relocations, compute
> a hash for strings, put data into a string pool, create a fully qualified name
> (like namespace::function::name). It looks like it could be optimized and
> eventually require less time, but it would still be a noticeable part of the
> overall time.
>
> If dsymutil starts to check for structural equivalence, then the process
> would be even slower. So, if a single hash-id were checked instead of
> comparing type structure, this process would also be faster.
>
> Thus I think using hash-id to compare types would make the current
> implementation faster and would also allow DWARFLinker to handle incomplete
> types without massive performance degradation.
>
> >> But the context is known when types are generated. So, no need to
spent the time analyzing it.
>
> >> If types could be compared without analyzing context, then
Dwarf-aware linker would work faster.
>
> >> That is just an idea(not for immediate implementation): If types
would be stored in some "type table"
>
> >> (instead of COMDAT section group) and could be accessed through
hash-id(like type units
>
> >> - then it would be the solution requiring fewer bits to store but
allowing to compare types
>
> >> by hash-id(not analysing context).
> >> In this case, size increasing would be small. And processing time
could be done faster.
> >>
> >> this is just an idea and could be discussed separately from the
problem of integrating of D74169.
>
> >> >> 6. -flto=thin
>
> >> >>    That problem was described in this review
https://reviews.llvm.org/D54747#1503720. It also exists in
>
> >> >> current DWARFLinker/dsymutil implementation. I think that
problem should be discussed more: it could
>
> >> >> probably be fixed by avoiding generation of such
incomplete declaration during thinlto,
>
> >> >> That would be costly to produce extra/redundant debug
info in ThinLTO - actually ThinLTO could be doing
>
> >> >> more to reduce that redundancy early on (actually
removing definitions from some llvm Modules if the type
>
> >> >> definition is known to exist in another Module, etc)
> >> >I don't know if it's a problem since that patch was
reverted.
>
> >>
>
> >> Yes. That patch was reverted, but this patch(D74169) has the same
problem.
>
> >> if D74169 would be applied and --gc-debuginfo used then structure
type
> >> definition would be removed.
>
> >> DWARFLinker could handle that case - "removing definitions
from some llvm Modules if the type
> >> definition is known to exist in another Module".
> >> i.e. DWARFLinker could replace the declaration with the
definition.
>
> >> But that problem could be more easily resolved when debug info is
generated(probably without
> >> significant increase of debug info size):
>
> >> Here we have:
>
> >> DW_TAG_compile_unit(0x0000000b) - compile unit containing concrete
instance for function "f".
> >> DW_TAG_compile_unit(0x00000073) - compile unit containing abstract
instance root for function "f".
> >> DW_TAG_compile_unit(0x000000c1) - compile unit containing function
"f" definition.
>
> >> Code for function "f" was deleted. gc-debuginfo deletes
compile unit DW_TAG_compile_unit(0x000000c1)
> >> containing "f" definition (since there is no
corresponding code). But it has structure "Foo" definition
> >> DW_TAG_structure_type(0x0000011e) referenced from
DW_TAG_compile_unit(0x00000073)
> >> by declaration DW_TAG_structure_type(0x000000ae). That declaration
is exactly the case when definition
> >> was removed by thinlto and replaced with declaration.
>
> >> Would it cost too much if type definition would not be replaced
with declaration for "abstract instance root"?
> >> The number of concrete instances is bigger than number of abstract
instance roots.
> >> Probably, it would not be too costly to leave definition in
abstract instance root?
>
>
>
> >> Alternatively, Would it cost too much if type definition would not
be replaced with declaration when
> >> declaration references type from not used function? (lto could
understand that concrete function is not used).
>
>
> >I don't follow this example - could you provide a small concrete
test case I could reproduce?
>
> I would provide a test case if necessary. But it looks like this issue is
finally clear, and you already commented on that.
>
> > Oh, I guess this is happening perhaps because ThinLTO can't know
for sure that a standalone
> > definition of 'f' won't be needed - so it produces one in
case one of the inlining opportunities
> > doesn't end up inlining. Then it turns out all calls got inlined,
so the external definition wasn't needed.
>
> > Oh, you're suggesting that these 3 CUs got emitted into one object
file during LTO, but that DWARFLinker
> > drops a CU without any code in it - even though... So far as I know,
in LTO, LLVM directly references
> > types across units if the CUs are all emitted in the same object file.
(and if they weren't in the same
> > object file - then the abstract_origin couldn't be pointing
cross-CU).
>
> > I guess some basic things to say:
>
> > With ThinLTO, the concrete/standalone function definition is emitted
in case some call sites don't end up
> > being inlined. So we know it'll be emitted (but might not be
needed by the actual linker)
> > Any number of inline calls might exist - but we shouldn't put the
type information into those, because
> > they aren't guaranteed to emit it (if the inline function gets
optimized away, there would be nothing to
> > enforce the type being emitted) - and even if we forced the type
information to be emitted into one
> > object file that has an inline copy of the function - there's no
guarantee that object file will get linked in either.
>
> > So, no, I don't think there's much we can do to keep the size
of object files down, while guaranteeing
> > the type information will be emitted with the usual linker semantics.
>
> Then dsymutil/DWARFLinker could be changed to handle that (though it would
> probably not be very efficient).
> If ThinLTO could determine that the function is ultimately unused (and so must
> not contain the referenced type definition), then this situation could be
> handled more effectively.
>
> Thank you, Alexey.
>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

David Blaikie via llvm-dev

2020-Jun-03 21:43 UTC

[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

On Wed, Jun 3, 2020 at 10:25 AM Alexey Lapshin
<alapshin at accesssoftek.com> wrote:
>
> >DWARF was designed in an era when COMDAT and ICF were not a thing, or at
> >least not common, certainly not when talking about function code.  The
> >overhead of a unit occurred only once per translation unit, so that expense
> >was reasonably amortized.
> >
> >Splitting functions into their own object-file sections and making them
> >excludable is an evolution of compiler/linker technology that DWARF has not
> >kept up with.  The linker-friendly solutions (COMDAT DWARF) would put
> >function-related .debug_* contributions into a section-group along with the
> >function .text itself; this multiplies the total number of sections to deal
> >with, regardless of the tactics used for the content of each per-function
> >DWARF section.  The fully DWARF-conformant solution would create one
> >partial_unit per function, with the corresponding overhead of unit headers
> >(especially painful in the .debug_line section).  Alternatively we fragment
> >DWARF into sections without headers and rely on the linker to make
> >everything look right in the linked executable; this produces .o files that
> >are not DWARF conformant (unless we can standardize this in DWARF v6) and
> >would be a big hassle for consumers other than the linker.
> >Or we pay the cost of parsing, trimming, and rewriting all the DWARF in the
> >linker.
>
> Could we perhaps make DWARF easier to parse, trim, and rewrite, so that a
> full DWARF-parsing solution would not take too much time?
>
> For example, the -fdebug-types-section solution uses COMDAT sections to split
> and deduplicate types. That solution works quite fast. Its already-mentioned
> drawback is a large size overhead (because of the section headers and type
> unit headers). But the fact that type units can be identified just by hash-id
> (without parsing type names and type hierarchies) allows the linker to reject
> duplicates quickly. Another point is that the linker drops duplicated COMDAT
> sections without any additional checks. After the duplicates are deleted, the
> debug info is still consistent.
> A DWARF-aware solution could be built on the same two principles:
> 1. Compare types by hash-id.
> 2. Drop duplicates without analyzing their contents.
>
> If all types are put into a separate type table and have a hash-id, then it
> would be much easier to deduplicate them. The idea is demonstrated here:
> https://reviews.llvm.org/P8164. (It still has open questions: whether base
> types should be put into the type table, whether references into the type
> table should be made by DW_AT_signature or just by offset, etc.) While
> handling that separate type table, the DWARF-aware linker would check only the
> hash-id and put only one type description with a given id into the final type
> table. It would also allow us to solve the -flto=thin problem described at
> http://lists.llvm.org/pipermail/llvm-dev/2020-May/141938.html (there is a
> dsymutil example there), i.e., the case where a type definition is removed
> would not occur.
>
I think there is scope for lower-overhead type deduplication,
especially now with type units being merged into the debug_info
section. Perhaps we could drop dwo_ids and use section references to
refer to types & rely on the linker to keep those referenced sections
alive - though section references are longer than CU-relative
references. (but we need the extra length - because if the linker
deduplicates a type definition - one CU may be referencing a type very
far away, so the shorter reference might be inadequate) I don't think
the indirection through the type hash is /super/ significant to the
cost - I think it's more in the duplication of many DIEs especially
for function definitions (since the type unit sig8 system only
provides a way to reference the type - not its member functions, their
parameters, etc - so all those DIEs get duplicated in any CU that
needs to provide a definition of a member function). We could
prototype cross-unit DIE references to lower the cost of that
duplication, though rumor has it that constructor based type homing
might provide enough value to obviate the need for type units (or at
least make the overhead not worthwhile - so revisiting the overhead to
reduce it might make it worthwhile again... ).

Probably wouldn't be super hard to use LLVM's existing cross-unit DIE
referencing machinery (implemented for LTO) to refer directly to DIEs
in a type unit without using the signature system... - hmm, that'd
only work if your type unit DIEs were identical? /maybe/ ? Not sure
how that'd work if you wanted to refer into a type unit, but the type
unit got deduplicated. Might be able to rely on the linker to preserve
every unique copy of the type unit that's referenced if we phrase
things carefully - so if your compiler does produce exactly identical
type units they get deduplicated and sec_refs refer to the uniquely
preserved copy - but otherwise it preserves as many distinct copies as
needed. (I don't know enough about how that works to be sure - but I
know that linkonce/inline function deduplication does seem to
cause the DWARF to refer to the singular function if that function is
identical, and if it isn't, then you get 0 - so there's /something/ in
the linker that can adjust for deduplicating identical duplicates... )
>
>
> Thank you, Alexey.
>
>
> >--paulr
>
>
>
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of
James Henderson via llvm-dev
> Sent: Wednesday, June 3, 2020 3:48 AM
> To: David Blaikie <dblaikie at gmail.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info
in lld.
>
>
>
> It makes me sad that the linker (via a library or otherwise) has to be
"DWARF-aware" to be able to effectively handle --gc-sections, COMDATs,
--icf etc for debug info, without leaving large blocks of data kicking around.
>
>
>
> The patching to -1 (or equivalent) is probably a good lightweight solution
(though I'd love it if it could be done based on section type in the future
rather than section name, but that's probably outside the realm of DWARF),
as it requires only minimal understanding in the linker, but anything beyond
that seems to be complicated logic that is mostly due to the structure of DWARF.
Patching to -1 does feel a bit like a sticking plaster/band aid to patch over
the issue rather than properly solving it too - there will still be debug data
(potentially significant amounts in COMDAT-heavy objects) that the linker has to
write and the debugger has to somehow know how to skip (even if it knows that -1
is special-case due to the standard being updated, it needs to get as far as the
-1), which is all wasted effort.
>
>
>
> We've already seen from Alexey's prototyping, and from our own
experiences with the Sony proprietary linker (which tried to rewrite .debug_line
only) that deconstructing the DWARF so that it can be more optimally reassembled
at link time is slow going, and will probably inevitably be however much effort
is put into optimising it. For a start, given the current standards, it's
impossible to know how to deconstruct it without having to parse vast amounts of
DWARF, which is typically going to mean a lot more parsing work than the linker
would normally have to deal with. Additionally, much of this parsing work is
wasted effort, since it seems unlikely in many links that large amounts of the
DWARF will be redundant. Having an option to opt-in doesn't help much there,
since it just means the logic exists without most people using it, due to it not
being good enough, or potentially they don't even know it exists.
>
>
>
> I don't have particularly concrete suggestions as to how to solve the
structural problems with DWARF at this point. The only thing that seems obvious
to me is a more "blessed" approach to fragmentation of sections,
similar to what I tried with my prototype mentioned earlier in the thread,
although we'd need to figure out the previously stated performance issues.
Other ideas might tie into this, like somehow sharing the various table headers
a bit like CIEs in .eh_frame that could be merged by the linker - each object
could have separate table header sections, which are referenced by the
individual .debug_* blocks, which in turn are one per function/data piece and
easily discardable/merged by the linker.
>
>
>
> Just some thoughts.
>
>
>
> James
>
>
>
> On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>
> On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
> <alapshin at accesssoftek.com> wrote:
> >
> > Hi David, please find my comments inside:
> >
> >
> > >>>Broad question: Do you have any specific
motivation/users/etc in implementing this (if you can speak about it)?
> >
> > >>> - it might help motivate the work, understand what
tradeoffs might be suitable for you/your users, etc.
> >
> > >>There are two general requirements:
> > >> 1) Remove (or clean) invalid debug info.
> >
> > >
> > >Perhaps a simpler direct solution for your immediate needs might
be a much narrower,
> > >and more efficient linker-DWARF-awareness feature:
> > >
> > > With DWARFv5, rnglists present an opportunity for a DWARF linker
to rewrite the ranges
> > > without parsing the rest of the DWARF. /technically/ this
isn't guaranteed - rnglist entries
> > > can be referenced either directly, or by index. If all rnglists
are referenced by index, then
> > > a linker could parse only the debug_rnglists section and rewrite
ranges to remove any
> > > address ranges that refer to optimized-out code.
> > >
> > > This would only be correct for rnglists that had no direct
references to them (that only were
> > > referenced via the indexes) - but we could either implement it
with that assumption, or could
> > > add an LLVM extension attribute on the CU that would say "I
promise I only referenced rnglists
> > > via rnglistx forms/indexes). If this DWARF-aware linking would
have to read the CU DIE (not
> > > all the other DIEs) it /could/ also then rewrite high/low_pc if
the CU wasn't using ranges...
> > > but that wouldn't come up in the function-removal case,
because then you'd have ranges anyway,
> > > so no need for that.
> > >
> > > Such a DWARF-aware rnglist linking could also simplify rnglists,
in cases where functions
> > > ended up being laid out next to each other, the linker could
coalesce their ranges together.
> > >
> > > I imagine this could be implemented with very little overhead to
linking, especially compared
> > > to the overhead of full DWARF-aware linking.
> > >
> > >Though none of this fixes Split DWARF, where the linker
doesn't get a chance to see the
> > > addresses being used - but if you only want/need the CU-level
ranges to be correct, this
> > > might be a viable fix, and quite efficient.
> >
> > Yes, we have been thinking about that alternative. It would resolve our problem of invalid debug info
> > and would work much faster. Thus, if we do not get good results for D74169, then we
> > will implement it. Do you think it would be useful to have this solution upstream?
>
> A pure rnglist rewriting - I think it'd be OK to have in upstream -
> again, cost/benefit/etc would have to be weighed. I'm not sure it
> would save enough space to be particularly valuable beyond the
> correctness issue - and it doesn't completely solve the correctness
> issue for zero-address usage or low-address usage (because you could
> still have overlapping subprograms inside a CU - so if you were
> symbolizing you could use the correct rnglist to filter, but then go
> look inside the CU only to find two subprograms that had that address
> & not know which one was the correct one and which one was the
> discarded one).
>
> rnglist rewriting might be easy enough to prototype - but depends what
> you want to spend your time on, I know this whole issue has been a
> huge investment of your time already - but maybe this recent
> revitalization of the conversation around having an explicit value in
> the linker might be sufficient to address everyone's needs... *fingers
> crossed*)
>
>
> > >> 2) Optimize the DWARF size.
> >
> >
> > > Do your users care much about this? I imagine if they had
significant DWARF size issues,
> > > they'd have significant link time issues and the kind of cost
to link time this feature has would
> > > be prohibitive - but perhaps they're sharing linked binaries
much more often than they're
> > > actually performing linking.
> >
> > Yes, they do. They also have significant link-time issues.
> > So current performance results of D74169 are not very acceptable.
> > We hope to improve it.
> >
> >
> >
> > >>The specifics which our users have:
> > >>  - embedded platform which uses 0 as start of .text section.
> > >>  - custom toolset which does not support all features
yet(f.e. split dwarf).
> > >>  - tolerant of the link-time increase.
> > >>  - need a useful way to share debug builds.
> >
> >
> > > Sharing two files (executable and dwp) is significantly less
useful than sharing one file?
> >
> > Probably not significantly, but yes, it looks less useful compared to D74169.
> > Having only two files (executable and .dwp) looks significantly better than having an executable and multiple .dwo files.
> > Having only one file (executable) with minimal size looks better than two files with a bigger size.
> >
> > clang compiled with -gsplit-dwarf takes 0.9G for the executable and 0.9G for the .dwp.
> > clang compiled with --gc-debuginfo takes only 0.76G for a single executable.
> >
> >
> >
> > >>For the first point: we have a problem "Overlapping
address ranges starting from 0"(D59553).
> >
> > >>We use custom solution, but the general solution like D74169
would be better here.
> >
> >
> > > If CU ranges are the only ones that need fixing, then I think the
above solution might be as
> > > good/better - if more than CU ranges need fixing, then I think we
might want to start talking about
> > > how to fix DWARF itself (split and non-split) to signal certain
addresses point to dead code with a
> > > specific blessed value that linkers would need to implement -
because with Split DWARF there's
> > > no way to solve the non-CU addresses at the linker.
> >
> > I think a worthwhile solution for that signal value would be LowPC > HighPC.
> > That does not require additional bits in DWARF.
> > It would be natural to skip such address ranges since they are explicitly marked as invalid.
> > It could be implemented in a linker very easily. Probably, it would make sense to describe that
> > usage in the DWARF standard.
> >
> > As to the addresses which are not seen by the linker(since they are in
.dwo files) - yes,
> > they need to have another solution. Could you show an example of such
a case, please?
> >
> >
> >
> > >>>2. Support of type units.
> >
> > >>>
> >
> > >>>>  That could be implemented further.
> >
> > >>>Enabling type units increases object size to make it
easier to deduplicate at link time by a DWARF-unaware
> >
> > >>>linker. With a DWARF aware linker it'd be generally
desirable not to have to add that object size overhead to
> >
> > >>>get the linking improvements.
> >
> > >>
> >
> > >>But, DWARFLinker should adequately work with type units since
they are already implemented.
> >
> >
> > > Maybe - it'd be nice & all, but I don't think
it's an outright necessity - if someone knows they're using
> > > a DWARF-aware linker, they'd probably not use type units in
their object files. It's possible someone
> > > doesn't know for sure & maybe they have pre-canned debug
object files from someone else, etc.
> >
> > I see.
> >
> > >>Another thing is that the idea behind type units has the
potential to help Dwarf-aware linker to work faster.
> >
> > >>Currently, DWARFLinker analyzes context to understand whether
types are the same or not.
> >
> >
> > >When you say "analyzes context" what do you mean?
Usually I'd take that to mean
> > > "looks at things outside the type itself - like what
namespace it's in, etc" - which, yes,
> > > it should do that, but it doesn't seem very expensive to do.
But I guess you actually
> > > mean something about doing structural equivalence in some way,
looking at things inside the type?
> >
> > I think it could be useful for both cases. Currently, dsymutil does only the first thing
> > (looking at the type name, namespace name, etc.) and does not do the second thing
> > (checking structural equivalence). Analyzing type names is currently quite expensive
> > (the search in the string pool alone takes ~10 sec out of 70 sec of overall time).
> > That is expensive because many things have to be done to work with strings:
> > parse DWARF, search for and resolve relocations, compute a hash for strings,
> > put data into a string pool, create a fully qualified name (like namespace::function::name).
> > It looks like it could be optimized and eventually require less time, but it would still be a noticeable
> > part of the overall time.
> >
> > If dsymutil starts to check for structural equivalence, then the process would be even slower.
> > So, if a single hash-id were checked instead of comparing type structures, this process
> > would also be faster.
> >
> > Thus I think using a hash-id to compare types would make the current implementation faster and would
> > also allow DWARFLinker to handle incomplete types without massive performance degradation.
> >
> > >> But the context is known when types are generated. So, no need to spend time analyzing it.
> >
> > >> If types could be compared without analyzing context, then
Dwarf-aware linker would work faster.
> >
> > >> That is just an idea(not for immediate implementation): If
types would be stored in some "type table"
> >
> > >> (instead of COMDAT section group) and could be accessed
through hash-id(like type units
> >
> > >> - then it would be the solution requiring fewer bits to store
but allowing to compare types
> >
> > >> by hash-id(not analysing context).
> > >> In this case, size increasing would be small. And processing
time could be done faster.
> > >>
> > >> this is just an idea and could be discussed separately from
the problem of integrating of D74169.
> >
> > >> >> 6. -flto=thin
> >
> > >> >>    That problem was described in this review
https://reviews.llvm.org/D54747#1503720. It also exists in
> >
> > >> >> current DWARFLinker/dsymutil implementation. I think
that problem should be discussed more: it could
> >
> > >> >> probably be fixed by avoiding generation of such
incomplete declaration during thinlto,
> >
> > >> >> That would be costly to produce extra/redundant
debug info in ThinLTO - actually ThinLTO could be doing
> >
> > >> >> more to reduce that redundancy early on (actually
removing definitions from some llvm Modules if the type
> >
> > >> >> definition is known to exist in another Module, etc)
> > >> >I don't know if it's a problem since that patch
was reverted.
> >
> > >>
> >
> > >> Yes. That patch was reverted, but this patch(D74169) has the
same problem.
> >
> > >> if D74169 would be applied and --gc-debuginfo used then
structure type
> > >> definition would be removed.
> >
> > >> DWARFLinker could handle that case - "removing
definitions from some llvm Modules if the type
> > >> definition is known to exist in another Module".
> > >> i.e. DWARFLinker could replace the declaration with the
definition.
> >
> > >> But that problem could be more easily resolved when debug
info is generated(probably without
> > >> significant increase of debug info size):
> >
> > >> Here we have:
> >
> > >> DW_TAG_compile_unit(0x0000000b) - compile unit containing
concrete instance for function "f".
> > >> DW_TAG_compile_unit(0x00000073) - compile unit containing
abstract instance root for function "f".
> > >> DW_TAG_compile_unit(0x000000c1) - compile unit containing
function "f" definition.
> >
> > >> Code for function "f" was deleted. gc-debuginfo
deletes compile unit DW_TAG_compile_unit(0x000000c1)
> > >> containing "f" definition (since there is no
corresponding code). But it has structure "Foo" definition
> > >> DW_TAG_structure_type(0x0000011e) referenced from
DW_TAG_compile_unit(0x00000073)
> > >> by declaration DW_TAG_structure_type(0x000000ae). That
declaration is exactly the case when definition
> > >> was removed by thinlto and replaced with declaration.
> >
> > >> Would it cost too much if type definition would not be
replaced with declaration for "abstract instance root"?
> > >> The number of concrete instances is bigger than number of
abstract instance roots.
> > >> Probably, it would not be too costly to leave definition in
abstract instance root?
> >
> >
> >
> > >> Alternatively, Would it cost too much if type definition
would not be replaced with declaration when
> > >> declaration references type from not used function? (lto
could understand that concrete function is not used).
> >
> >
> > >I don't follow this example - could you provide a small
concrete test case I could reproduce?
> >
> > I would provide a test case if necessary. But it looks like this issue
is finally clear, and you already commented on that.
> >
> > > Oh, I guess this is happening perhaps because ThinLTO can't
know for sure that a standalone
> > > definition of 'f' won't be needed - so it produces
one in case one of the inlining opportunities
> > > doesn't end up inlining. Then it turns out all calls got
inlined, so the external definition wasn't needed.
> >
> > > Oh, you're suggesting that these 3 CUs got emitted into one
object file during LTO, but that DWARFLinker
> > > drops a CU without any code in it - even though... So far as I
know, in LTO, LLVM directly references
> > > types across units if the CUs are all emitted in the same object
file. (and if they weren't in the same
> > > object file - then the abstract_origin couldn't be pointing
cross-CU).
> >
> > > I guess some basic things to say:
> >
> > > With ThinLTO, the concrete/standalone function definition is
emitted in case some call sites don't end up
> > > being inlined. So we know it'll be emitted (but might not be
needed by the actual linker)
> > > Any number of inline calls might exist - but we shouldn't put
the type information into those, because
> > > they aren't guaranteed to emit it (if the inline function
gets optimized away, there would be nothing to
> > > enforce the type being emitted) - and even if we forced the type
information to be emitted into one
> > > object file that has an inline copy of the function - there's
no guarantee that object file will get linked in either.
> >
> > > So, no, I don't think there's much we can do to keep the
size of object files down, while guaranteeing
> > > the type information will be emitted with the usual linker
semantics.
> >
> > Then dsymutil/DWARFLinker could be changed to handle that (though it would probably not be very efficient).
> > If ThinLTO could determine that a function ends up unused (and therefore must not contain the referenced type definition),
> > then this situation could be handled more effectively.
> >
> > Thank you, Alexey.
> >
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> llvm-dev at lists.llvm.org
> >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Alexey Lapshin via llvm-dev

2020-Jun-05 20:55 UTC


[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

>>
>> >DWARF was designed in an era when COMDAT and ICF were not a thing,
or at least not common,
>> >certainly not when talking about function code.  The overhead of a
unit occurred only once per
>> >translation unit, so that expense was reasonably amortized.
>> >
>> >Splitting functions into their own object-file sections and making
them excludable is an evolution of
>> >compiler/linker technology that DWARF has not kept up with.  The
linker-friendly solutions (COMDAT
>> >DWARF) would put function-related .debug_* contributions into a
section-group along with the function
>> >.text itself; this multiplies the total number of sections to deal
with, regardless of the tactics used for the
>> > content of each per-function DWARF section.  The fully
DWARF-conformant solution would create one
>> > partial_unit per function, with the corresponding overhead of unit
headers (especially painful in the
>> > .debug_line section).  Alternatively we fragment DWARF into sections without headers and rely on the
>> > linker to make everything look right in the linked executable;
this produces .o files that are not DWARF
>> >conformant (unless we can standardize this in DWARF v6) and would
be a big hassle for consumers
>> >other than the linker.
>> >Or we pay the cost of parsing, trimming, and rewriting all the
DWARF in the linker.
>>
>Alexey> Probably we could try to make DWARF easy to parse, trim, and rewrite so that a full DWARF
>Alexey> parsing solution would not take too much time?
>Alexey>
>Alexey> E.g. the -debug-types-section solution uses COMDAT sections to split and deduplicate types.
>Alexey> That solution works quite fast. It has the already mentioned drawback of a big size
>Alexey> overhead (because of the sizes of section headers/type unit headers). But the fact that type units
>Alexey> can be identified just by hash-id (without parsing type names and type hierarchies)
>Alexey> allows the linker to reject duplicates quickly. Another thing is that the linker drops
>Alexey> duplicated COMDAT sections without any additional check. After duplicates are deleted,
>Alexey> the debug info is still consistent.
>Alexey> A DWARF-aware solution could be built on the same two principles:
>Alexey> 1. compare types by hash-id.
>Alexey> 2. drop duplicates without analyzing contents.
>Alexey>
>Alexey> If all types are put into a separate type table and have a hash-id, then it would be much easier to
>Alexey> deduplicate them. The idea is demonstrated here - https://reviews.llvm.org/P8164. (It still has open
>Alexey> questions: whether base types should be put into the type table, whether references into the type table
>Alexey> should be done by DW_AT_signature or just by offset, etc.) While handling that separate type table,
>Alexey> the DWARF-aware linker would check only the hash_id and put only one type description
>Alexey> with a given id into the final type table. It would also allow us to solve that -flto=thin problem -
>Alexey>  http://lists.llvm.org/pipermail/llvm-dev/2020-May/141938.html (there is a dsymutil example there),
>Alexey> i.e., the case where a type definition is removed would not occur.
David>I think there is scope for lower-overhead type deduplication,
David>especially now with type units being merged into the debug_info
David>section. Perhaps we could drop dwo_ids and use section references to
David>refer to types & rely on the linker to keep those referenced
sections
David>alive - though section references are longer than CU-relative
David>references. (but we need the extra length - because if the linker
David>deduplicates a type definition - one CU may be referencing a type very
David>far away, so the shorter reference might be inadequate) I don't
think
David>the indirection through the type hash is /super/ significant to the
David>cost - I think it's more in the duplication of many DIEs especially
David>for function definitions (since the type unit sig8 system only
David>provides a way to reference the type - not its member functions, their
David>parameters, etc - so all those DIEs get duplicated in any CU that
David>needs to provide a definition of a member function). We could
David>prototype cross-unit DIE references to lower the cost of that
David>duplication, though rumor has it that constructor based type homing
David>might provide enough value to obviate the need for type units (or at
David>least make the overhead not worthwhile - so revisiting the overhead to
David>reduce it might make it worthwhile again... ).

David>Probably wouldn't be super hard to use LLVM's existing
cross-unit DIE
David>Referencing machinery (implemented for LTO) to refer directly to DIEs
David>in a type unit without using the signature system... - hmm, that'd
David>only work if your type unit DIEs were identical? /maybe/ ? Not sure
David>how that'd work if you wanted to refer into a type unit, but the
type
David>unit got deduplicated. Might be able to rely on the linker to preserve
David>every unique copy of the type unit that's referenced if we phrase
David>things carefully - so if your compiler does produce exactly identical
David>type units they get deduplicated and sec_refs refer to the uniquely
David>preserved copy - but otherwise it preserves as many distinct copies as
David>needed. (I don't know enough about how that works to be sure - but
I
David>know that this linkonce/inline function deduplication does seem to
David>cause the DWARF to refer to the singular function if that function is
David>identical, and if it isn't, then you get 0 - so there's
/something/ in
David>the linker that can adjust for deduplicating identical duplicates... )

Probably I was a bit unclear: the above idea is not for types
(placed in COMDAT sections) deduplicated by the linker.
This idea goes in a different direction from fragmenting DWARF
using ELF sections & tricks. It seems to me that the cost of fragmenting is too high.
It is not only the size of the structures describing fragments but also the complexity
of the tools that would have to be taught to work with fragmented DWARF
(e.g. llvm-dwarfdump applied to an object file should be able to read fragmented DWARF,
but applied to a linked executable it should work with non-fragmented DWARF).
The idea is for a tool that works the same way as dsymutil ODR.

I will briefly describe the idea of making DWARF easier for dsymutil/DWARFLinker to process:

The idea is to have only one "type table" per object file (a special section, .debug_types_table).
This "type table" would contain all types.
There could be a special kind of reference, type_offset, whose offset points into the type table.
Basic types could always be placed at the start of the "type table", so offsets to basic types
would most often be 1 byte. There would also be a special kind of reference for
references inside a type.
The type unit sig8 system would not be used to reference types.
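
As a rough illustration of the 1-byte claim: a minimal sketch, assuming type_offset were stored as a ULEB128 value (the encoding is an assumption here, the proposal does not fix it). Under that assumption, any type that starts within the first 128 bytes of the table can be referenced with a single byte. The helper name uleb128_size is hypothetical.

```python
def uleb128_size(value: int) -> int:
    """Return how many bytes a ULEB128 encoding of `value` occupies.

    Assumption: type_offset is ULEB128-encoded. Each encoded byte
    carries 7 bits of payload, so offsets 0..127 fit in one byte.
    """
    if value < 0:
        raise ValueError("offsets are unsigned")
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


# Basic types placed at the start of the table get small offsets.
for offset in (0, 127, 128, 16384):
    print(offset, uleb128_size(offset))
```

With a fixed-width reference form instead, the placement of basic types would not change the reference size, so the benefit depends on choosing a variable-length encoding.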

Type deduplication is assumed to be done not by the linker's COMDAT mechanism
but by a tool like dsymutil. This tool would create the resulting .debug_types_table by putting
types from the source .debug_types_table sections into it. Only one copy of each type would be placed into the
resulting table. All references pointing to a deleted copy would be corrected to point
to the single copy inside the "type table" (that is how dsymutil works currently).

The sig8 hash-id would be used to compare types and to deduplicate them.
It would speed up the current dsymutil context analysis.
Types having the same hash-id could be deduplicated.
This would allow deduplicating more types than current dsymutil does.
Incomplete type definitions having a similar set of members are not deduplicated by dsymutil currently.
In this case they would have the same hash-id.
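
The deduplication described above can be sketched roughly as follows. This is only an illustration under stated assumptions: TypeEntry, merge_type_tables, and the flat (hash_id, payload) layout are hypothetical and are not taken from dsymutil or from any existing .debug_types_table format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeEntry:
    hash_id: int    # hypothetical 64-bit signature of the type (sig8-like)
    payload: bytes  # encoded type description as it appears in the table


def merge_type_tables(tables):
    """Merge per-object type tables, keeping one copy per hash_id.

    Returns (merged_bytes, remap), where remap maps
    (object_index, old_offset) -> new_offset in the merged table,
    so that references to dropped copies can be rewritten.
    """
    merged = bytearray()
    offset_by_hash = {}
    remap = {}
    for obj_index, table in enumerate(tables):
        old_offset = 0
        for entry in table:
            new_offset = offset_by_hash.get(entry.hash_id)
            if new_offset is None:
                # First time this hash_id is seen: keep this copy.
                new_offset = len(merged)
                offset_by_hash[entry.hash_id] = new_offset
                merged += entry.payload
            # Duplicates are dropped without inspecting their contents.
            remap[(obj_index, old_offset)] = new_offset
            old_offset += len(entry.payload)
    return bytes(merged), remap
```

For two object files that both contain a type with the same hash_id, the second copy is never compared structurally; it is dropped on the hash match alone, and the remap entry redirects its users to the surviving copy.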

This "type table" would take less space than the current "type units" and the current ODR solution.

The above is just an idea of how to help a DWARF-aware linker (based on the idea of removing
obsolete debug info) work faster (if that is of interest).

Alexey.
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of
James Henderson via llvm-dev
> Sent: Wednesday, June 3, 2020 3:48 AM
> To: David Blaikie <dblaikie at gmail.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info
in lld.
>
>
>
> It makes me sad that the linker (via a library or otherwise) has to be
"DWARF-aware" to be able to effectively handle --gc-sections, COMDATs,
--icf etc for debug info, without leaving large blocks of data kicking around.
>
>
>
> The patching to -1 (or equivalent) is probably a good lightweight solution
(though I'd love it if it could be done based on section type in the future
rather than section name, but that's probably outside the realm of DWARF),
as it requires only minimal understanding in the linker, but anything beyond
that seems to be complicated logic that is mostly due to the structure of DWARF.
Patching to -1 does feel a bit like a sticking plaster/band aid to patch over
the issue rather than properly solving it too - there will still be debug data
(potentially significant amounts in COMDAT-heavy objects) that the linker has to
write and the debugger has to somehow know how to skip (even if it knows that -1
is special-case due to the standard being updated, it needs to get as far as the
-1), which is all wasted effort.
>
>
>
> We've already seen from Alexey's prototyping, and from our own
experiences with the Sony proprietary linker (which tried to rewrite .debug_line
only) that deconstructing the DWARF so that it can be more optimally reassembled
at link time is slow going, and will probably inevitably be however much effort
is put into optimising it. For a start, given the current standards, it's
impossible to know how to deconstruct it without having to parse vast amounts of
DWARF, which is typically going to mean a lot more parsing work than the linker
would normally have to deal with. Additionally, much of this parsing work is
wasted effort, since it seems unlikely in many links that large amounts of the
DWARF will be redundant. Having an option to opt-in doesn't help much there,
since it just means the logic exists without most people using it, due to it not
being good enough, or potentially they don't even know it exists.
>
>
>
> I don't have particularly concrete suggestions as to how to solve the
structural problems with DWARF at this point. The only thing that seems obvious
to me is a more "blessed" approach to fragmentation of sections,
similar to what I tried with my prototype mentioned earlier in the thread,
although we'd need to figure out the previously stated performance issues.
Other ideas might tie into this, like somehow sharing the various table headers
a bit like CIEs in .eh_frame that could be merged by the linker - each object
could have separate table header sections, which are referenced by the
individual .debug_* blocks, which in turn are one per function/data piece and
easily discardable/merged by the linker.
>
>
>
> Just some thoughts.
>
>
>
> James
>
>
>
> On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>
> On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
> <alapshin at accesssoftek.com> wrote:
> >
> > Hi David, please find my comments inside:
> >
> >
> > >>>Broad question: Do you have any specific
motivation/users/etc in implementing this (if you can speak about it)?
> >
> > >>> - it might help motivate the work, understand what
tradeoffs might be suitable for you/your users, etc.
> >
> > >>There are two general requirements:
> > >> 1) Remove (or clean) invalid debug info.
> >
> > >
> > >Perhaps a simpler direct solution for your immediate needs might
be a much narrower,
> > >and more efficient linker-DWARF-awareness feature:
> > >
> > > With DWARFv5, rnglists present an opportunity for a DWARF linker
to rewrite the ranges
> > > without parsing the rest of the DWARF. /technically/ this
isn't guaranteed - rnglist entries
> > > can be referenced either directly, or by index. If all rnglists
are referenced by index, then
> > > a linker could parse only the debug_rnglists section and rewrite
ranges to remove any
> > > address ranges that refer to optimized-out code.
> > >
> > > This would only be correct for rnglists that had no direct
references to them (that only were
> > > referenced via the indexes) - but we could either implement it
with that assumption, or could
> > > add an LLVM extension attribute on the CU that would say "I
promise I only referenced rnglists
> > > > via rnglistx forms/indexes". If this DWARF-aware linking would
have to read the CU DIE (not
> > > all the other DIEs) it /could/ also then rewrite high/low_pc if
the CU wasn't using ranges...
> > > but that wouldn't come up in the function-removal case,
because then you'd have ranges anyway,
> > > so no need for that.
> > >
> > > Such a DWARF-aware rnglist linking could also simplify rnglists,
in cases where functions
> > > ended up being laid out next to each other, the linker could
coalesce their ranges together.
> > >
> > > I imagine this could be implemented with very little overhead to
linking, especially compared
> > > to the overhead of full DWARF-aware linking.
> > >
> > >Though none of this fixes Split DWARF, where the linker
doesn't get a chance to see the
> > > addresses being used - but if you only want/need the CU-level
ranges to be correct, this
> > > might be a viable fix, and quite efficient.
> >
> > Yes, we have been thinking about that alternative. It would resolve our problem of invalid debug info
> > and would work much faster. Thus, if we do not get good results for D74169, then we
> > will implement it. Do you think it would be useful to have this solution upstream?
>
> A pure rnglist rewriting - I think it'd be OK to have in upstream -
> again, cost/benefit/etc would have to be weighed. I'm not sure it
> would save enough space to be particularly valuable beyond the
> correctness issue - and it doesn't completely solve the correctness
> issue for zero-address usage or low-address usage (because you could
> still have overlapping subprograms inside a CU - so if you were
> symbolizing you could use the correct rnglist to filter, but then go
> look inside the CU only to find two subprograms that had that address
> & not know which one was the correct one and which one was the
> discarded one).
>
> rnglist rewriting might be easy enough to prototype - but depends what
> you want to spend your time on, I know this whole issue has been a
> huge investment of your time already - but maybe this recent
> revitalization of the conversation around having an explicit value in
> the linker might be sufficient to address everyone's needs... *fingers
> crossed*)
>
>
> > >> 2) Optimize the DWARF size.
> >
> >
> > > Do your users care much about this? I imagine if they had
significant DWARF size issues,
> > > they'd have significant link time issues and the kind of cost
to link time this feature has would
> > > be prohibitive - but perhaps they're sharing linked binaries
much more often than they're
> > > actually performing linking.
> >
> > Yes, they do. They also have significant link-time issues.
> > So current performance results of D74169 are not very acceptable.
> > We hope to improve it.
> >
> >
> >
> > >>The specifics which our users have:
> > >>  - embedded platform which uses 0 as start of .text section.
> > >>  - custom toolset which does not support all features
yet(f.e. split dwarf).
> > >>  - tolerant of the link-time increase.
> > >>  - need a useful way to share debug builds.
> >
> >
> > > Sharing two files (executable and dwp) is significantly less
useful than sharing one file?
> >
> > Probably not significantly, but yes, it looks less useful compared to D74169.
> > Having only two files (executable and .dwp) looks significantly better than having an executable and multiple .dwo files.
> > Having only one file (executable) with minimal size looks better than two files with a bigger size.
> >
> > clang compiled with -gsplit-dwarf takes 0.9G for the executable and 0.9G for the .dwp.
> > clang compiled with --gc-debuginfo takes only 0.76G for a single executable.
> >
> >
> >
> > >>For the first point: we have a problem "Overlapping
address ranges starting from 0"(D59553).
> >
> > >>We use custom solution, but the general solution like D74169
would be better here.
> >
> >
> > > If CU ranges are the only ones that need fixing, then I think the
above solution might be as
> > > good/better - if more than CU ranges need fixing, then I think we
might want to start talking about
> > > how to fix DWARF itself (split and non-split) to signal certain
addresses point to dead code with a
> > > specific blessed value that linkers would need to implement -
because with Split DWARF there's
> > > no way to solve the non-CU addresses at the linker.
> >
> > I think a worthwhile solution for that signal value would be LowPC > HighPC.
> > That does not require additional bits in DWARF.
> > It would be natural to skip such address ranges since they are explicitly marked as invalid.
> > It could be implemented in a linker very easily. Probably, it would make sense to describe that
> > usage in the DWARF standard.
> >
> > As to the addresses which are not seen by the linker(since they are in
.dwo files) - yes,
> > they need to have another solution. Could you show an example of such
a case, please?
> >
> >
> >
> > >>>2. Support of type units.
> >
> > >>>
> >
> > >>>>  That could be implemented further.
> >
> > >>>Enabling type units increases object size to make it
easier to deduplicate at link time by a DWARF-unaware
> >
> > >>>linker. With a DWARF aware linker it'd be generally
desirable not to have to add that object size overhead to
> >
> > >>>get the linking improvements.
> >
> > >>
> >
> > >>But, DWARFLinker should adequately work with type units since
they are already implemented.
> >
> >
> > > Maybe - it'd be nice & all, but I don't think
it's an outright necessity - if someone knows they're using
> > > a DWARF-aware linker, they'd probably not use type units in
their object files. It's possible someone
> > > doesn't know for sure & maybe they have pre-canned debug
object files from someone else, etc.
> >
> > I see.
> >
> > >>Another thing is that the idea behind type units has the
potential to help Dwarf-aware linker to work faster.
> >
> > >>Currently, DWARFLinker analyzes context to understand whether
types are the same or not.
> >
> >
> > >When you say "analyzes context" what do you mean?
Usually I'd take that to mean
> > > "looks at things outside the type itself - like what
namespace it's in, etc" - which, yes,
> > > it should do that, but it doesn't seem very expensive to do.
But I guess you actually
> > > mean something about doing structural equivalence in some way,
looking at things inside the type?
> >
> > I think it could be useful for both cases. Currently, dsymutil does only the
> > first thing (looking at the type name, namespace name, etc.) and does not do
> > the second thing (checking structural equivalence). Analyzing type names is
> > currently quite expensive (the string pool search alone takes ~10 sec out of
> > 70 sec of overall time).
> > It is expensive because many things must be done to work with strings:
> > parse DWARF, search for and resolve relocations, compute a hash for the
> > strings, put the data into a string pool, and create a fully qualified name
> > (like namespace::function::name).
> > It looks like this could be optimized to take less time, but it would still
> > be a noticeable part of the overall time.
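
The string work listed above can be sketched roughly as follows. This is only an illustration in Python, not dsymutil code: StringPool, qualified_name and type_key are hypothetical names, and DWARF parsing and relocation handling are left out entirely. It shows why every type costs at least one name construction, one string hash and one pool lookup.

```python
class StringPool:
    """Interns strings so that equal names share one integer id."""

    def __init__(self):
        self._ids = {}      # string -> id; each lookup hashes the whole string
        self._strings = []  # id -> string

    def intern(self, s):
        idx = self._ids.get(s)
        if idx is None:
            idx = len(self._strings)
            self._ids[s] = idx
            self._strings.append(s)
        return idx

    def __len__(self):
        return len(self._strings)


def qualified_name(scopes, name):
    """Builds a fully qualified name from the enclosing scope names."""
    return "::".join(list(scopes) + [name])


def type_key(pool, scopes, name):
    """Returns the pool id that stands for a type's declaration context."""
    return pool.intern(qualified_name(scopes, name))


pool = StringPool()
a = type_key(pool, ["ns", "f"], "Foo")  # type seen in a first compile unit
b = type_key(pool, ["ns", "f"], "Foo")  # same type seen in another unit
c = type_key(pool, ["ns", "g"], "Foo")  # same short name, different context
```

Here a and b compare equal while c does not, which is the name-based comparison described above; structural equivalence is not checked at all.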
> >
> > If dsymutil starts to check for structural equivalence, then the process
> > would become even slower. So if, instead of comparing type structure, a
> > single hash-id were checked, this process would also be faster.
> >
> > Thus I think using a hash-id to compare types would make the current
> > implementation faster and would also allow DWARFLinker to handle incomplete
> > types without massive performance degradation.
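
A minimal sketch of that idea, assuming every type record already carries a precomputed hash-id (TypeRecord and deduplicate are hypothetical names, and the ids are made up). One dictionary lookup per record replaces both the name analysis and any structural comparison, and a declaration is resolved to a definition that has the same id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeRecord:
    hash_id: int          # assumed to be precomputed by the producer
    name: str
    is_declaration: bool  # True for an incomplete type


def deduplicate(records):
    """Keeps one record per hash-id, preferring a full definition."""
    chosen = {}
    for rec in records:
        kept = chosen.get(rec.hash_id)
        if kept is None or (kept.is_declaration and not rec.is_declaration):
            chosen[rec.hash_id] = rec
    return chosen


records = [
    TypeRecord(0x1111, "Foo", True),   # declaration only in this unit
    TypeRecord(0x1111, "Foo", False),  # full definition in another unit
    TypeRecord(0x2222, "Bar", False),
    TypeRecord(0x2222, "Bar", False),  # duplicate definition
]
unique = deduplicate(records)
```

The sketch trusts the hash-id completely; whether a real linker could do that depends on how the id is computed, which is outside what this thread specifies.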
> >
> > >> But the context is known when types are generated. So, there is no need
> > >> to spend time analyzing it.
> >
> > >> If types could be compared without analyzing context, then a DWARF-aware
> > >> linker would work faster.
> >
> > >> This is just an idea (not for immediate implementation): if types were
> > >> stored in some "type table" (instead of a COMDAT section group) and could
> > >> be accessed through a hash-id (like type units), then it would be a
> > >> solution requiring fewer bits to store while allowing types to be compared
> > >> by hash-id (without analyzing context).
> > >> In this case, the size increase would be small, and processing could be
> > >> faster.
> > >>
> > >> This is just an idea and could be discussed separately from the problem of
> > >> integrating D74169.
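
To make the "type table" idea concrete, here is a rough sketch under stated assumptions. TypeTable and type_hash_id are hypothetical, the flattened description is invented for the example, and truncated SHA-256 is only a stand-in: it is not the type signature computation that the DWARF specification defines for type units.

```python
import hashlib


def type_hash_id(qualified_name, members):
    """Derives a 64-bit id from a flattened, ordered type description."""
    flat = qualified_name + "{" + ";".join(f"{t} {n}" for t, n in members) + "}"
    digest = hashlib.sha256(flat.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


class TypeTable:
    """Hypothetical table that stores each distinct type once, keyed by id."""

    def __init__(self):
        self._types = {}

    def add(self, qualified_name, members):
        hid = type_hash_id(qualified_name, members)
        self._types.setdefault(hid, (qualified_name, tuple(members)))
        return hid

    def lookup(self, hid):
        return self._types.get(hid)

    def __len__(self):
        return len(self._types)


table = TypeTable()
id1 = table.add("ns::Foo", [("int", "x"), ("char", "y")])
id2 = table.add("ns::Foo", [("int", "x"), ("char", "y")])  # from another unit
id3 = table.add("ns::Foo", [("int", "x")])                 # different layout
```

The second add stores nothing new because its id matches the first, and no section group or per-type unit header is involved, which is where the size saving in the idea above would come from.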
> >
> > >> >> 6. -flto=thin
> >
> > >> >>    That problem was described in this review
https://reviews.llvm.org/D54747#1503720. It also exists in
> >
> > >> >> current DWARFLinker/dsymutil implementation. I think
that problem should be discussed more: it could
> >
> > >> >> probably be fixed by avoiding generation of such
incomplete declaration during thinlto,
> >
> > >> >> That would be costly to produce extra/redundant
debug info in ThinLTO - actually ThinLTO could be doing
> >
> > >> >> more to reduce that redundancy early on (actually
removing definitions from some llvm Modules if the type
> >
> > >> >> definition is known to exist in another Module, etc)
> > >> >I don't know if it's a problem since that patch
was reverted.
> >
> > >>
> >
> > >> Yes. That patch was reverted, but this patch (D74169) has the same problem.
> >
> > >> If D74169 were applied and --gc-debuginfo used, then the structure type
> > >> definition would be removed.
> >
> > >> DWARFLinker could handle that case - "removing definitions from some llvm
> > >> Modules if the type definition is known to exist in another Module" -
> > >> i.e. DWARFLinker could replace the declaration with the definition.
> >
> > >> But that problem could be resolved more easily when debug info is
> > >> generated (probably without a significant increase in debug info size):
> >
> > >> Here we have:
> >
> > >> DW_TAG_compile_unit(0x0000000b) - compile unit containing
concrete instance for function "f".
> > >> DW_TAG_compile_unit(0x00000073) - compile unit containing
abstract instance root for function "f".
> > >> DW_TAG_compile_unit(0x000000c1) - compile unit containing
function "f" definition.
> >
> > >> Code for function "f" was deleted. gc-debuginfo deletes compile unit
> > >> DW_TAG_compile_unit(0x000000c1) containing the definition of "f" (since
> > >> there is no corresponding code). But that unit holds the definition of
> > >> structure "Foo", DW_TAG_structure_type(0x0000011e), which is referenced
> > >> from DW_TAG_compile_unit(0x00000073) by the declaration
> > >> DW_TAG_structure_type(0x000000ae). That declaration is exactly the case
> > >> where the definition was removed by thinlto and replaced with a declaration.
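
The situation with the three compile units can be modeled in a few lines. This is a toy model, not the D74169 or DWARFLinker code: CompileUnit, its live flag and the helper functions are hypothetical, and "live" simply assumes that a unit is kept when it has code or is reached through DW_AT_abstract_origin.

```python
from dataclasses import dataclass, field


@dataclass
class CompileUnit:
    offset: int
    live: bool  # assumed: has code or is reached via DW_AT_abstract_origin
    definitions: set = field(default_factory=set)   # types defined here
    declarations: set = field(default_factory=set)  # types only declared here


def drop_dead_units(units):
    """Models garbage collection that removes units without live code."""
    return [cu for cu in units if cu.live]


def dangling_declarations(units):
    """Returns declared type names that no remaining unit defines."""
    defined = set().union(*(cu.definitions for cu in units))
    declared = set().union(*(cu.declarations for cu in units))
    return declared - defined


def restore_definitions(all_units, kept):
    """Turns a declaration into a definition when its unit was removed."""
    kept_offsets = {cu.offset for cu in kept}
    removed = [cu for cu in all_units if cu.offset not in kept_offsets]
    removed_defs = set().union(*(cu.definitions for cu in removed))
    for cu in kept:
        promote = cu.declarations & removed_defs
        cu.definitions |= promote
        cu.declarations -= promote
    return kept


units = [
    CompileUnit(0x0B, True),                         # concrete instance of f
    CompileUnit(0x73, True, declarations={"Foo"}),   # abstract instance root
    CompileUnit(0xC1, False, definitions={"Foo"}),   # standalone f, code removed
]
kept = drop_dead_units(units)
before = dangling_declarations(kept)
after = dangling_declarations(restore_definitions(units, kept))
```

After the drop, "Foo" is declared but no longer defined anywhere; the restore step stands in for the option mentioned above where DWARFLinker replaces the declaration with the definition.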
> >
> > >> Would it cost too much if the type definition were not replaced with a
> > >> declaration for the "abstract instance root"?
> > >> The number of concrete instances is bigger than the number of abstract
> > >> instance roots.
> > >> Probably, it would not be too costly to leave the definition in the
> > >> abstract instance root?
> >
> > >> Alternatively, would it cost too much if the type definition were not
> > >> replaced with a declaration when the declaration references a type from an
> > >> unused function? (LTO could understand that the concrete function is not
> > >> used.)
> >
> >
> > >I don't follow this example - could you provide a small
concrete test case I could reproduce?
> >
> > I can provide a test case if necessary. But it looks like this issue is now
> > clear, and you have already commented on it.
> >
> > > Oh, I guess this is happening perhaps because ThinLTO can't
know for sure that a standalone
> > > definition of 'f' won't be needed - so it produces
one in case one of the inlining opportunities
> > > doesn't end up inlining. Then it turns out all calls got
inlined, so the external definition wasn't needed.
> >
> > > Oh, you're suggesting that these 3 CUs got emitted into one
object file during LTO, but that DWARFLinker
> > > drops a CU without any code in it - even though... So far as I
know, in LTO, LLVM directly references
> > > types across units if the CUs are all emitted in the same object
file. (and if they weren't in the same
> > > object file - then the abstract_origin couldn't be pointing
cross-CU).
> >
> > > I guess some basic things to say:
> >
> > > With ThinLTO, the concrete/standalone function definition is
emitted in case some call sites don't end up
> > > being inlined. So we know it'll be emitted (but might not be
needed by the actual linker)
> > > Any number of inline calls might exist - but we shouldn't put
the type information into those, because
> > > they aren't guaranteed to emit it (if the inline function
gets optimized away, there would be nothing to
> > > enforce the type being emitted) - and even if we forced the type
information to be emitted into one
> > > object file that has an inline copy of the function - there's
no guarantee that object file will get linked in either.
> >
> > > So, no, I don't think there's much we can do to keep the
size of object files down, while guaranteeing
> > > the type information will be emitted with the usual linker
semantics.
> >
> > Then dsymutil/DWARFLinker could be changed to handle that (though it would
> > probably not be very efficient).
> > If thinlto could determine that the function ends up unused (and so must not
> > contain the referenced type definition), then this situation could be
> > handled more effectively.
> >
> > Thank you, Alexey.
> >
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> llvm-dev at lists.llvm.org
> >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
