thr3ads.net - llvm dev - [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld. [Jul 2020]

Alexey Lapshin via llvm-dev

2020-Jun-26 16:28 UTC

[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

>> >> >> >> This idea goes in another direction than fragmenting dwarf
>> >> >> >> using elf sections&tricks. It seems to me that the cost of fragmenting is too high.
>> >> >>
>> >> >> >I tend to agree - but I'm sort of leaning towards trying to use object
>> >> >> >features as much as possible, then implementing just enough custom
>> >> >> >handling in the linker to recoup overhead, etc. (eg: add some kind of
>> >> >> >small header/brief description that makes it easy for the linker to
>> >> >> >slice-and-dice - but hopefully a domain-specific such header can be a
>> >> >> >bit more compact than the fully general ELF form)
>> >> >>
>> >> >> I think this indeed should be implemented and evaluated,
>> >> >> so that the various approaches can be compared.
>> >> >>
>> >> >> >> It is not only the sizes of structures describing fragments but also the complexity
>> >> >> >> of tools that should be taught to work with fragmented DWARF.
>> >> >> >> (f.e. llvm-dwarfdump applied to object file should be able to read fragmented DWARF,
>> >> >> >> but applied to linked executable it should work with non-fragmented DWARF).
>> >> >> >> That idea is for the tool which works the same way as dsymutil ODR.
>> >> >> >>
>> >> >> >> I will briefly describe an idea for making DWARF easier to process by dsymutil/DWARFLinker:
>> >> >> >>
>> >> >> >> The idea is to have only one "type table" per object file (a special section, .debug_types_table).
>> >> >> >> This "type table" would contain all types.
>> >> >> >> There could be a special type of reference - type_offset - whose offset points into the type table.
>> >> >> >> Basic types could always be placed at the start of the "type table"; thus, offsets to basic types
>> >> >> >> would most often be 1 byte. There would also be a special kind of reference - a reference inside the type.
>> >> >> >> The type units sig8 system would not be used to reference types.
>> >> >> >>
>> >> >> >> Types deduplication is assumed to be done, not by linker mechanism for COMDAT,
>> >> >> >> but by a tool like dsymutil. This tool would create resulting .debug_types_table by putting there
>> >> >> >> types from source .debug_types_table-s. Only one copy of the type would be placed into the
>> >> >> >> resulting table. All references pointing to the deleted copy would be corrected to point
>> >> >> >> to the single copy inside "type table". (that is how dsymutil works currently)
>> >> >>
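The merge step described above (keep one copy of each type, then redirect every reference to it) can be sketched roughly as follows. This is only an illustration under stated assumptions, not how dsymutil is implemented: the `TypeEntry` record, the `merge_type_tables` function, and the idea that each type carries a precomputed identity key are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeEntry:
    key: str   # hypothetical identity of the type (for example a qualified name or a hash)
    size: int  # encoded size of the entry in bytes


def merge_type_tables(tables):
    """Merge per-object type tables, keeping a single copy of each type.

    Returns (merged, remaps). merged is a list of (new_offset, entry).
    remaps[i] maps an old offset in tables[i] to the offset of the
    surviving copy in the merged table.
    """
    merged = []
    offset_of_key = {}
    next_offset = 0
    remaps = []
    for table in tables:
        remap = {}
        old_offset = 0
        for entry in table:
            if entry.key not in offset_of_key:
                # First time this type is seen: copy it into the output table.
                offset_of_key[entry.key] = next_offset
                merged.append((next_offset, entry))
                next_offset += entry.size
            # Every reference to the old offset must now point at the kept copy.
            remap[old_offset] = offset_of_key[entry.key]
            old_offset += entry.size
        remaps.append(remap)
    return merged, remaps


a = [TypeEntry("int", 4), TypeEntry("Foo", 16)]
b = [TypeEntry("int", 4), TypeEntry("Bar", 8), TypeEntry("Foo", 16)]
merged, remaps = merge_type_tables([a, b])
```

The expensive part discussed in the replies is not this table merge itself but applying `remaps` to every reference, which in plain DWARF means visiting the DIEs that hold those references.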
>> >> >> >^ that's the step that's probably a bit expensive for a general-use
>> >> >> >tool - it implies parsing all the DWARF to find those references and
>> >> >> >rewrite them, I think. For a high-performance solution that could be
>> >> >> >run by the linker I think it'd be necessary to have a solution that
>> >> >> >doesn't involve parsing all the DIEs.
>> >> >>
>> >> >> According to the current dsymutil processing,
>> >> >> exactly this process is not the most time-consuming.
>> >> >> That could be done relatively fast.
>> >>
>> >> >Fair enough - though I'd still imagine any solution that involves
>> >> >parsing all the DIEs still wouldn't be fast enough (maybe an order of
>> >> >magnitude faster than the current solution even - but that's still,
>> >> >what, 6 or 7x slower than linking without the feature?) for most users
>> >> >to consider it a good trade-off.
>> >>
>> >> It seems to me that even the current 6x-7x slowdown could be useful.
>> >> Users who already use dsymutil or llvm-dwp (assuming DWARFLinker
>> >> would be taught to work with split dwarf) spend this time and,
>> >> in some scenarios, waste disk space on intermediate files.
>> >
>> >FWIW, dwp (llvm-dwp hasn't really been optimized compared to binutils
>> >dwp) is designed to be very quick - by not needing to do a lot of
>> >parsing/fixups. Which, yes, means larger output files than would be
>> >possible with more parsing/etc. It also doesn't take any input from
>> >the linker (so it can run in parallel with the linker) - so it can't
>> >remove dead subprograms. Given Google's the major (perhaps only
>> >significant?) user of Split DWARF - I can say that the needs don't
>> >necessarily overlap well with something that would take significantly
>> >longer to run or use significantly more memory. Faster/cheaper/with
>> >somewhat bigger output files is probably the right tradeoff for
>> >Google's use case, at least.
>> >
>> >I imagine Apple's use for dsymutil is somewhat similar - it's not used
>> >in the iterative development cycle, only in final releases - well,
>> >maybe their situation is more "neutral" - not a major pain point in
>> >any case I'd guess.
>> >
>>>
>> I see. FWIW, Comparison splitdwarf+dwp and DWARFLinker from lld:
>>
>> 1. split-dwarf+llvm-dwp = linking time for clang 6 sec,
>>     generating time for .dwp 53 sec, clang=997M clang.dwp=1.1G.
>FWIW, llvm-dwp is not very well optimized (which is to say: it is not
>optimized), binutils dwp might be a better comparison (& even that
>doesn't have the parallelism & some potential further memory savings
>that lld has that we could take advantage of in a dwp-like tool)
>
>What build mode was the clang binary built in? Optimized or unoptimized?
Right, that is an unoptimized build with -ffunction-sections.
>> 2. DWARFLinker from lld = linking time for clang 72 sec, clang=760M.
>It does seem a tad strange that the clang binary would be smaller
>non-split with DWARF linking than it was split. Though I could imagine
>this might be possible in an optimized build (where debug_ranges
>become quite relatively expensive in the .o file contribution with
>Split DWARF)
>Could you compare the section sizes between these two clang binaries, perhaps?
.debug_ranges is three times bigger and .debug_line is twice bigger.
>> >> Thus if they would use this LLD feature in its current state
>> >> - they would still receive benefits.
>> >>
>> >> Speaking of performance results - LLD is a multi-threaded linker;
>> >> it handles sections in parallel. DWARFLinker generates DWARF using
>> >> AsmPrinter, which is a stream - so it can produce the resulting DWARF only
>> >> sequentially. It is not surprising that the parallel solution works faster.
>> >> Making DWARFLinker truly multi-threaded would probably allow us
>> >> to bring the slowdown into the 2x-4x range.
>> >
>> >*nod* that's still a really expensive link - but I understand that's a
>> >suitable tradeoff for your users
>> >
>>
>> Btw, 2x or 7x is for pure linking time. Overall compilation slowdown
>> is not so significant. Building LLVM codebase has only 20% slowdown.
>
>Understood - that's still quite significant to most users, I'd imagine.
I see.
>> >> >> Anyway, I think the dsymutil approach is still valuable, and it
>> >> >> would be useful to optimize it.
>> >> >> Do you think it would be useful to make dsymutil/DWARFLinker truly multi-threaded?
>> >> >> (To make dsymutil/DWARFLinker able to process each object file in a separate thread)
>> >>
>> >> >Perhaps - that I'd probably leave up to the folks who are more
>> >> >invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get it
>> >> >integrated into llvm-dwp and then I'll be interested in getting as
>> >> >much performance out of it as lld - so multithreading and things would
>> >> >be on the books.
>> >>
>> >> I think improving dsymutil is a valuable thing.
>> >> Though there are several directions which might be considered
>> >> to make it more robust:
>> >>
>> >> 1. support of latest DWARF - DWARF5/DWARF64...
>> >
>> >I expect/thought some of the Apple folks had already worked on DWARF5 support?
>> >DWARF64 - that's been around for a while, and just hasn't been needed
>> >by LLVM users thus far, it seems (until recently - where some
>> >developers have started working on that)
>>
>> The debug_names table is already implemented, but debug_rnglists,
>> debug_loclists, type units - are not implemented yet.
>
>Superficially, type units wouldn't be on the list of features (like
>DWARF64 - it's optional) I'd try to support in dsymutil - since their
>size overhead is more justified for a DWARF-agnostic linker that's
>using comdat groups. With a DWARF-aware linker I'd be specifically
>hoping to avoid using type units to help
>> The thing which
>> should probably be changed is that dsymutil should not have its own version
>> of the code generating DWARF tables. It should call the already existing
>> DWARF5/DWARF64 implementations. Then dsymutil would always
>> use the latest DWARF generators.
>
>Possibly - I don't know what the architectural tradeoffs for that look
>like - I'd imagine DWARFLinker has sufficiently different
>needs/tradeoffs than LLVM's DWARF generation code (rewriting existing
>DIEs compared to building new ones from scratch, etc) that it might be
>hard for them to share a lot of their implementation.
It is not easy, and would require some additions, but the benefit would be
that the whole format implementation is in one place. Thus a change in that place
would be reflected everywhere else. There are at least three implementations of
.debug_ranges and .debug_aranges currently...

>> >> 2. implement multi-threaded execution.
>> >> 3. support of split DWARF.
>> >
>> >Maybe, though I'm still not sure it'd be the right tradeoff -
>> >especially if it involved having to wait to run the .dwo merger (call
>> >it DWARF-aware dwp, or dsymutil with dwp support) until after the
>> >linker ran.
>> >
>> >> 4. implement dsymutil for non-darwin platform.
>> >
>> >That's probably, essentially (3), more-or-less. Split DWARF is
>> >somewhat of a formalization of Apple's/MachO DWARF distribution model
>> >(leave DWARF in files that aren't linked/use them from a debugger,
>> >but also be able to merge them into some final file (dsym or dwp) for
>> >archival purposes)
>> >
>> >> All of this is a massive piece of work.
>> >> Our original investment was to solve two problems:
>> >>
>> >> 1. Overlapped address ranges, which is currently close to being solved. Thank you for helping with that!
>> >
>> >Yeah, again, sorry that's taken quite so long/somewhat circuitous route.
>> >
>> >> 2. Size of debug info. That is still an issue, but we are unsure whether we are ready to
>> >>    invest in solving all the above 1-4 problems and how much the community is interested in it.
>> >
>> >Fair, for sure - I don't think you'd need to sign up to solve all of
>> >them (don't think they necessarily need solving). Potentially moving
>> >the logic out into a separate tool as Fangrui's considering - a
>> >post-link DWARF optimizer, rather than in-linker DWARF optimization.
>> >
>> >I really don't want to give you the runaround like this - but multiple
>> >times slower links is something that seems pretty problematic for most
>> >users, to the point of weighing the maintainability of lld against the
>> >convenience of having this functionality in-linker rather than in a
>> >post-link optimizer.
>> >
>> >(I know you've spoken a bit before about your users needs - but if
>> >it's possible, could you explain (again :/) why they have such a
>> >strong need for smaller DWARF? While DWARF size is an ongoing concern
>> >for many users (Google certainly - hence the invention of Split DWARF,
>> >use of type units and compressed DWARF, etc) - usually it's in rather
>> >large programs, but it sounds like you're dealing with relatively
>> >small ones (otherwise the increase in link time, I'd imagine, would be
>> >prohibitive for your users?)?
>>
>> We have many large programs and keep Daily/Nightly debug builds,
>> which take a lot of disk space. Compilation time for these programs is long.
>> The scenario is "compile once" (not compile-debug-compile-debug).
>> So we think that a solution (like dsymutil/DWARFLinker) would not slow down
>> the overall build significantly (see the numbers above for the
>> llvm codebase) and would allow us to reduce the disk space required to keep
>> all of these builds.
>Ah, OK - for archival purposes. So the interactive developers wouldn't
>necessarily be using this feature. Makes sense - similar to dsymutil
>and dwp, mostly used for archival purposes & you can debug straight
>from .o/.dwos for interactive/iterative development.
>In that case, it seems more likely that a separate tool might suffice.
Agreed: if this work continues, it makes sense to do it as a separate tool
and make it fast enough. If there is interest in it, it would probably be
possible to return to the idea of calling it from the linker.
>Also, out of curiosity - have you tried just compressing the output
>(-gz (I think that does the right thing for the linker level
>compression too, otherwise -Wl,-compress-debug-sections might do it))
>or are you already doing that in addition?
Sure, we use -Wl,-compress-debug-sections.

Thank you, Alexey.
>> >You mentioned that the usability cost of
>> >Split DWARF for your users was too high (or high enough to justify
>> >this alternative work of DWARF-aware linking)? That all seems a bit
>> >surprising to me - though I understand the deployment issues of Split
>> >DWARF do present some challenges to users in more heterogeneous
>> >environments than Google's... still, I'd have thought there was some
>> >hope there)
>>
>> Our tools do not support split dwarf yet, though we plan to implement it.
>> Once we have support for split dwarf, it would be
>> convenient to have an easy way to share built debug binaries. llvm-dwp is the
>> answer to this. DWARFLinker could probably be another answer.
>Ah, fair enough - thanks for the context!
> > >> >One way to do that would be to have a CU-local type indirection table.
> > >> >DIEs reference local type numbers (like local address/string numbers -
> > >> >addrx/strx/rnglistx) and that table contains either sig8 (no linker
> > >> >fixups required) or the local type offsets you describe - the linker
> > >> >would then only need to read this type number indirection table and
> > >> >rewrite them to the final type numbers.
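A rough sketch of the indirection-table idea above: DIEs would only hold small local indexes, so the linker touches the table and never the DIEs. The entry encoding (`"sig8"` / `"offset"` tuples), the function name `rewrite_type_table`, and the `final_offset_of` mapping are assumptions made for illustration; no such table exists in DWARF today.

```python
def rewrite_type_table(entries, final_offset_of):
    """Rewrite one CU-local type indirection table.

    entries: list of (kind, value) where kind is "sig8" (a type signature)
             or "offset" (an object-local type offset).
    final_offset_of: mapping from object-local offset to final offset.
    DIEs keep referring to the same table index, so they need no changes.
    """
    rewritten = []
    for kind, value in entries:
        if kind == "sig8":
            # A signature does not depend on layout: no linker fixup required.
            rewritten.append((kind, value))
        else:
            # A local offset must be redirected to the final location.
            rewritten.append((kind, final_offset_of[value]))
    return rewritten


cu_table = [("offset", 0x10), ("sig8", 0xDEADBEEF), ("offset", 0x40)]
final_table = rewrite_type_table(cu_table, {0x10: 0x100, 0x40: 0x180})
```

The cost is then proportional to the number of table entries rather than to the number of DIEs, which is the property the reply is after.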
> > >>
> > >> Yes, that could be additionally done if this process would be time-consuming.
> > >>
> > >> David, thank you for all your comments and explanations. They are extremely helpful.
> >
> > >Sure thing - really appreciate your patience with all this - it's... a
> > >lot of moving parts.
> >
> > >- Dave
> >
> > >
> > > Thank you, Alexey.
> > >
> > > > sig8 hash-id would be used to compare types and to deduplicate them.
> > > > It would speed up the current dsymutil context analysis.
> > > > Types having the same hash-id could be deduplicated.
> > > > This would allow deduplicating more types than current dsymutil does.
> > > > Incomplete type definitions having a similar set of members are not deduplicated by dsymutil currently.
> > > > In this case they would have the same hash-id.
> > > >
> > > > This "type table" would take less space than current "type units" and current ODR solution.
> > > >
> > > > Above is just an idea on how to help a DWARF-aware linker (based on the idea of removing obsolete debug info)
> > > > to work faster (if that is interesting).
> > > >
> > > > Alexey.
> > > >
> > > > > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of James Henderson via llvm-dev
> > > > > Sent: Wednesday, June 3, 2020 3:48 AM
> > > > > To: David Blaikie <dblaikie at gmail.com>
> > > > > Cc: llvm-dev at lists.llvm.org
> > > > > Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.
> > > > >
> > > > >
> > > > >
> > > > > It makes me sad that the linker (via a library or otherwise) has to be "DWARF-aware" to be able to effectively handle --gc-sections, COMDATs, --icf etc for debug info, without leaving large blocks of data kicking around.
> > > > >
> > > > >
> > > > >
> > > > > The patching to -1 (or equivalent) is probably a good lightweight solution (though I'd love it if it could be done based on section type in the future rather than section name, but that's probably outside the realm of DWARF), as it requires only minimal understanding in the linker, but anything beyond that seems to be complicated logic that is mostly due to the structure of DWARF. Patching to -1 does feel a bit like a sticking plaster/band aid to patch over the issue rather than properly solving it too - there will still be debug data (potentially significant amounts in COMDAT-heavy objects) that the linker has to write and the debugger has to somehow know how to skip (even if it knows that -1 is special-case due to the standard being updated, it needs to get as far as the -1), which is all wasted effort.
> > > > >
> > > > >
> > > > >
> > > > > We've already seen from Alexey's prototyping, and from our own experiences with the Sony proprietary linker (which tried to rewrite .debug_line only) that deconstructing the DWARF so that it can be more optimally reassembled at link time is slow going, and will probably inevitably be so however much effort is put into optimising it. For a start, given the current standards, it's impossible to know how to deconstruct it without having to parse vast amounts of DWARF, which is typically going to mean a lot more parsing work than the linker would normally have to deal with. Additionally, much of this parsing work is wasted effort, since it seems unlikely in many links that large amounts of the DWARF will be redundant. Having an option to opt-in doesn't help much there, since it just means the logic exists without most people using it, due to it not being good enough, or potentially they don't even know it exists.
> > > > >
> > > > >
> > > > >
> > > > > I don't have particularly concrete suggestions as to how to solve the structural problems with DWARF at this point. The only thing that seems obvious to me is a more "blessed" approach to fragmentation of sections, similar to what I tried with my prototype mentioned earlier in the thread, although we'd need to figure out the previously stated performance issues. Other ideas might tie into this, like somehow sharing the various table headers a bit like CIEs in .eh_frame that could be merged by the linker - each object could have separate table header sections, which are referenced by the individual .debug_* blocks, which in turn are one per function/data piece and easily discardable/merged by the linker.
> > > > >
> > > > >
> > > > >
> > > > > Just some thoughts.
> > > > >
> > > > >
> > > > >
> > > > > James
> > > > >
> > > > >
> > > > >
> > > > > On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > > > >
> > > > > On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
> > > > > <alapshin at accesssoftek.com> wrote:
> > > > > >
> > > > > > Hi David, please find my comments inside:
> > > > > >
> > > > > >
> > > > > > >>>Broad question: Do you have any specific motivation/users/etc in implementing this (if you can speak about it)?
> > > > > >
> > > > > > >>> - it might help motivate the work, understand what tradeoffs might be suitable for you/your users, etc.
> > > > > >
> > > > > > >>There are two general requirements:
> > > > > > >> 1) Remove (or clean) invalid debug info.
> > > > > >
> > > > > > >
> > > > > > >Perhaps a simpler direct solution for your immediate needs might be a much narrower,
> > > > > > >and more efficient linker-DWARF-awareness feature:
> > > > > > >
> > > > > > > With DWARFv5, rnglists present an opportunity for a DWARF linker to rewrite the ranges
> > > > > > > without parsing the rest of the DWARF. /technically/ this isn't guaranteed - rnglist entries
> > > > > > > can be referenced either directly, or by index. If all rnglists are referenced by index, then
> > > > > > > a linker could parse only the debug_rnglists section and rewrite ranges to remove any
> > > > > > > address ranges that refer to optimized-out code.
> > > > > > >
> > > > > > > This would only be correct for rnglists that had no direct references to them (that only were
> > > > > > > referenced via the indexes) - but we could either implement it with that assumption, or could
> > > > > > > add an LLVM extension attribute on the CU that would say "I promise I only referenced rnglists
> > > > > > > via rnglistx forms/indexes". If this DWARF-aware linking would have to read the CU DIE (not
> > > > > > > all the other DIEs) it /could/ also then rewrite high/low_pc if the CU wasn't using ranges...
> > > > > > > but that wouldn't come up in the function-removal case, because then you'd have ranges anyway,
> > > > > > > so no need for that.
> > > > > > >
> > > > > > > Such a DWARF-aware rnglist linking could also simplify rnglists, in cases where functions
> > > > > > > ended up being laid out next to each other, the linker could coalesce their ranges together.
> > > > > > >
> > > > > > > I imagine this could be implemented with very little overhead to linking, especially compared
> > > > > > > to the overhead of full DWARF-aware linking.
> > > > > > >
> > > > > > >Though none of this fixes Split DWARF, where the linker doesn't get a chance to see the
> > > > > > > addresses being used - but if you only want/need the CU-level ranges to be correct, this
> > > > > > > might be a viable fix, and quite efficient.
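The rnglist-only rewrite discussed above (drop ranges of discarded code, then coalesce neighbours) could look roughly like this. It is a sketch on already-decoded ranges, not on the real `.debug_rnglists` encoding: the `(start, end, live)` tuples and the function name `rewrite_range_list` are assumptions, with `live` standing in for whatever the linker knows about which sections survived.

```python
def rewrite_range_list(ranges):
    """Drop dead ranges and coalesce ranges that touch or overlap.

    ranges: list of (start, end, live) with half-open [start, end) addresses;
            live is False when the range refers to discarded code.
    Returns a sorted list of (start, end) pairs.
    """
    kept = sorted((start, end) for start, end, live in ranges if live)
    result = []
    for start, end in kept:
        if result and start <= result[-1][1]:
            # Adjacent or overlapping with the previous range: extend it.
            prev_start, prev_end = result[-1]
            result[-1] = (prev_start, max(prev_end, end))
        else:
            result.append((start, end))
    return result


cu_ranges = [
    (0x1000, 0x1040, True),
    (0x0, 0x20, False),      # function removed by --gc-sections
    (0x1040, 0x1080, True),  # laid out right after the first function
    (0x2000, 0x2010, True),
]
rewritten = rewrite_range_list(cu_ranges)
```

As the thread notes, this only fixes the CU-level ranges; subprogram DIEs inside the CU would still carry their original low_pc/high_pc values.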
> > > > > >
> > > > > > Yes, we think about that alternative. This would resolve our problem of invalid debug info
> > > > > > and would work much faster. Thus, if we would not have good results for D74169 then we
> > > > > > will implement it. Do you think it could be useful to have this solution in upstream?
> > > > >
> > > > > A pure rnglist rewriting - I think it'd be OK to have in upstream -
> > > > > again, cost/benefit/etc would have to be weighed. I'm not sure it
> > > > > would save enough space to be particularly valuable beyond the
> > > > > correctness issue - and it doesn't completely solve the correctness
> > > > > issue for zero-address usage or low-address usage (because you could
> > > > > still have overlapping subprograms inside a CU - so if you were
> > > > > symbolizing you could use the correct rnglist to filter, but then go
> > > > > look inside the CU only to find two subprograms that had that address
> > > > > & not know which one was the correct one and which one was the
> > > > > discarded one).
> > > > >
> > > > > rnglist rewriting might be easy enough to prototype - but depends what
> > > > > you want to spend your time on, I know this whole issue has been a
> > > > > huge investment of your time already - but maybe this recent
> > > > > revitalization of the conversation around having an explicit value in
> > > > > the linker might be sufficient to address everyone's needs... *fingers
> > > > > crossed*)
> > > > >
> > > > >
> > > > > > >> 2) Optimize the DWARF size.
> > > > > >
> > > > > >
> > > > > > > Do your users care much about this? I imagine if they had significant DWARF size issues,
> > > > > > > they'd have significant link time issues and the kind of cost to link time this feature has would
> > > > > > > be prohibitive - but perhaps they're sharing linked binaries much more often than they're
> > > > > > > actually performing linking.
> > > > > >
> > > > > > Yes, they do. They also have significant link-time issues.
> > > > > > So current performance results of D74169 are not very acceptable.
> > > > > > We hope to improve it.
> > > > > >
> > > > > >
> > > > > >
> > > > > > >>The specifics which our users have:
> > > > > > >>  - embedded platform which uses 0 as start of .text section.
> > > > > > >>  - custom toolset which does not support all features yet(f.e. split dwarf).
> > > > > > >>  - tolerant of the link-time increase.
> > > > > > >>  - need a useful way to share debug builds.
> > > > > >
> > > > > >
> > > > > > > Sharing two files (executable and dwp) is significantly less useful than sharing one file?
> > > > > >
> > > > > > Probably not significantly, but yes, it looks less useful comparing to D74169.
> > > > > > Having only two files (executable and .dwp) looks significantly better than having executable and multiple .dwo files.
> > > > > > Having only one file(executable) with minimal size looks better than the two files with a bigger size.
> > > > > >
> > > > > > clang compiled with -gsplitdwarf takes 0.9G for executable and 0.9G for .dwp.
> > > > > > clang compiled with -gc-debuginfo takes only 0.76G for single executable.
> > > > > >
> > > > > >
> > > > > >
> > > > > > >>For the first point: we have a problem "Overlapping address ranges starting from 0"(D59553).
> > > > > >
> > > > > > >>We use custom solution, but the general solution like D74169 would be better here.
> > > > > >
> > > > > >
> > > > > > > If CU ranges are the only ones that need fixing, then I think the above solution might be as
> > > > > > > good/better - if more than CU ranges need fixing, then I think we might want to start talking about
> > > > > > > how to fix DWARF itself (split and non-split) to signal certain addresses point to dead code with a
> > > > > > > specific blessed value that linkers would need to implement - because with Split DWARF there's
> > > > > > > no way to solve the non-CU addresses at the linker.
> > > > > >
> > > > > > I think a worthwhile solution for that signal value would be LowPC > HighPC.
> > > > > > That does not require additional bits in DWARF.
> > > > > > It would be natural to skip such address ranges since they are explicitly marked as invalid.
> > > > > > It could be implemented in a linker very easily. Probably, it would make sense to describe that
> > > > > > usage in the DWARF standard.
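A small sketch of how a consumer might honour the LowPC > HighPC convention proposed above. It assumes both values are absolute addresses and that a linker marks a discarded subprogram with, say, low=1 and high=0; the tuple layout, the marker values, and `find_subprogram` are hypothetical.

```python
def find_subprogram(subprograms, address):
    """Return the name of the subprogram covering address, or None.

    subprograms: list of (name, low_pc, high_pc) with half-open [low, high).
    Entries with low_pc > high_pc are treated as explicitly invalid.
    """
    for name, low_pc, high_pc in subprograms:
        if low_pc > high_pc:
            continue  # marked as dead by the linker: skip it
        if low_pc <= address < high_pc:
            return name
    return None


# On a platform where .text starts at 0, a discarded function that was left
# at address 0 would overlap the real code; marking it invalid avoids that.
subprograms = [("discarded_fn", 1, 0), ("main", 0x0, 0x40)]
hit = find_subprogram(subprograms, 0x10)
miss = find_subprogram(subprograms, 0x80)
```

This only applies where the linker can see and patch the addresses, so it does not by itself cover the .dwo case raised in the next paragraph.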
> > > > > >
> > > > > > As to the addresses which are not seen by the linker(since they are in .dwo files) - yes,
> > > > > > they need to have another solution. Could you show an example of such a case, please?
> > > > > >
> > > > > >
> > > > > >
> > > > > > >>>2. Support of type units.
> > > > > >
> > > > > > >>>
> > > > > >
> > > > > > >>>>  That could be implemented further.
> > > > > >
> > > > > > >>>Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware
> > > > > > >>>linker. With a DWARF aware linker it'd be generally desirable not to have to add that object size overhead to
> > > > > > >>>get the linking improvements.
> > > > > >
> > > > > > >>
> > > > > >
> > > > > > >>But, DWARFLinker should adequately work with type units since they are already implemented.
> > > > > >
> > > > > >
> > > > > > > Maybe - it'd be nice & all, but I don't think it's an outright necessity - if someone knows they're using
> > > > > > > a DWARF-aware linker, they'd probably not use type units in their object files. It's possible someone
> > > > > > > doesn't know for sure & maybe they have pre-canned debug object files from someone else, etc.
> > > > > >
> > > > > > I see.
> > > > > >
> > > > > > >>Another thing is that the idea behind type units has the potential to help Dwarf-aware linker to work faster.
> > > > > >
> > > > > > >>Currently, DWARFLinker analyzes context to understand whether types are the same or not.
> > > > > >
> > > > > >
> > > > > > >When you say "analyzes context" what do you mean? Usually I'd take that to mean
> > > > > > > "looks at things outside the type itself - like what namespace it's in, etc" - which, yes,
> > > > > > > it should do that, but it doesn't seem very expensive to do. But I guess you actually
> > > > > > > mean something about doing structural equivalence in some way, looking at things inside the type?
> > > > > >
> > > > > > I think it could be useful for both cases. Currently, dsymutil does only the first thing
> > > > > > (look at type name, namespace name, etc..) and does not do the second thing
> > > > > > (doing structural equivalence). Analyzing type names is currently quite expensive
> > > > > > (the search in the string pool alone takes ~10 sec out of 70 sec of overall time).
> > > > > > That is expensive because many things have to be done to work with strings:
> > > > > > parse DWARF, search and resolve relocations, compute a hash for strings,
> > > > > > put data into a string pool, create a fully qualified name(like namespace::function::name).
> > > > > > It looks like it could be optimized and finally require less time, but it still would be a noticeable
> > > > > > part of the overall time.
> > > > > >
> > > > > > If dsymutil starts to check for structural equivalence, then the process would be even slower.
> > > > > > So, if instead of comparing the structure of types a single hash-id were checked, then this process
> > > > > > would also be faster.
> > > > > >
> > > > > > Thus I think using hash-id to compare types would make the current implementation faster and would
> > > > > > also allow DWARFLinker to handle incomplete types without massive performance degradation.
> > > > > >
> > > > > > >> But the context is known when types are generated. So, no need to spend the time analyzing it.
> > > > > >
> > > > > > >> If types could be compared without analyzing context, then Dwarf-aware linker would work faster.
> > > > > >
> > > > > > >> That is just an idea(not for immediate implementation): If types would be stored in some "type table"
> > > > > > >> (instead of COMDAT section group) and could be accessed through hash-id(like type units)
> > > > > > >> - then it would be the solution requiring fewer bits to store but allowing to compare types
> > > > > > >> by hash-id(not analysing context).
> > > > > > >> In this case, the size increase would be small. And processing could be done faster.
> > > > > > >>
> > > > > > >> this is just an idea and could be discussed separately from the problem of integrating of D74169.
> > > > > >
> > > > > > >> >> 6. -flto=thin
> > > > > >
> > > > > > >> >>    That problem was described in this review https://reviews.llvm.org/D54747#1503720. It also exists in
> > > > > > >> >> current DWARFLinker/dsymutil implementation. I think that problem should be discussed more: it could
> > > > > > >> >> probably be fixed by avoiding generation of such incomplete declaration during thinlto,
> > > > > >
> > > > > > >> >> That would be costly to produce extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
> > > > > > >> >> more to reduce that redundancy early on (actually removing definitions from some llvm Modules if the type
> > > > > > >> >> definition is known to exist in another Module, etc)
> > > > > > >> >I don't know if it's a problem since that patch was reverted.
> > > > > >
> > > > > > >>
> > > > > >
> > > > > > >> Yes. That patch was reverted, but this patch(D74169) has the same problem.
> > > > > >
> > > > > > >> if D74169 would be applied and --gc-debuginfo used then structure type
> > > > > > >> definition would be removed.
> > > > > >
> > > > > > >> DWARFLinker could handle that case - "removing definitions from some llvm Modules if the type
> > > > > > >> definition is known to exist in another Module".
> > > > > > >> i.e. DWARFLinker could replace the declaration with the definition.
> > > > > >
> > > > > > >> But that problem could be more easily resolved when debug info is generated(probably without
> > > > > > >> significant increase of debug info size):
> > > > > >
> > > > > > >> Here we have:
> > > > > >
> > > > > > >> DW_TAG_compile_unit(0x0000000b) - compile unit containing concrete instance for function "f".
> > > > > > >> DW_TAG_compile_unit(0x00000073) - compile unit containing abstract instance root for function "f".
> > > > > > >> DW_TAG_compile_unit(0x000000c1) - compile unit containing function "f" definition.
> > > > > >
> > > > > > >> Code for function "f" was deleted. gc-debuginfo deletes compile unit DW_TAG_compile_unit(0x000000c1)
> > > > > > >> containing "f" definition (since there is no corresponding code). But it has structure "Foo" definition
> > > > > > >> DW_TAG_structure_type(0x0000011e) referenced from DW_TAG_compile_unit(0x00000073)
> > > > > > >> by declaration DW_TAG_structure_type(0x000000ae). That declaration is exactly the case when definition
> > > > > > >> was removed by thinlto and replaced with declaration.
> > > > > >
> > > > > > >> Would it cost too much if type definition would not be replaced with declaration for "abstract instance root"?
> > > > > > >> The number of concrete instances is bigger than number of abstract instance roots.
> > > > > > >> Probably, it would not be too costly to leave definition in abstract instance root?
> > > > > >
> > > > > >
> > > > > >
> > > > > > >> Alternatively, Would it cost too much if
type definition would not be replaced with declaration when
> > > > > > >> declaration references type from not used
function? (lto could understand that concrete function is not used).
> > > > > >
> > > > > >
> > > > > > >I don't follow this example - could you
provide a small concrete test case I could reproduce?
> > > > > >
> > > > > > I would provide a test case if necessary. But it
looks like this issue is finally clear, and you already commented on that.
> > > > > >
> > > > > > > Oh, I guess this is happening perhaps because
ThinLTO can't know for sure that a standalone
> > > > > > > definition of 'f' won't be needed
- so it produces one in case one of the inlining opportunities
> > > > > > > doesn't end up inlining. Then it turns
out all calls got inlined, so the external definition wasn't needed.
> > > > > >
> > > > > > > Oh, you're suggesting that these 3 CUs
got emitted into one object file during LTO, but that DWARFLinker
> > > > > > > drops a CU without any code in it - even
though... So far as I know, in LTO, LLVM directly references
> > > > > > > types across units if the CUs are all emitted
in the same object file. (and if they weren't in the same
> > > > > > > object file - then the abstract_origin
couldn't be pointing cross-CU).
> > > > > >
> > > > > > > I guess some basic things to say:
> > > > > >
> > > > > > > With ThinLTO, the concrete/standalone
function definition is emitted in case some call sites don't end up
> > > > > > > being inlined. So we know it'll be
emitted (but might not be needed by the actual linker)
> > > > > > > > Any number of inline calls might exist - but
we shouldn't put the type information into those, because
> > > > > > > they aren't guaranteed to emit it (if the
inline function gets optimized away, there would be nothing to
> > > > > > > enforce the type being emitted) - and even if
we forced the type information to be emitted into one
> > > > > > > object file that has an inline copy of the
function - there's no guarantee that object file will get linked in either.
> > > > > >
> > > > > > > So, no, I don't think there's much we
can do to keep the size of object files down, while guaranteeing
> > > > > > > the type information will be emitted with the
usual linker semantics.
> > > > > >
> > > > > > Then dsymutil/DWARFLinker could be changed to
handle that(though it would probably be not very efficient).
> > > > > > If thinlto would understand that function is not
used finally(and then must not contain referenced type definition),
> > > > > > then this situation could be handled more
effectively.
> > > > > >
> > > > > > Thank you, Alexey.
> > > > > >
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
_______________________________________________
> > > > > >>> LLVM Developers mailing list
> > > > > >>> llvm-dev at lists.llvm.org
> > > > > >>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

David Blaikie via llvm-dev

2020-Jul-28 07:29 UTC


[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
<alapshin at accesssoftek.com> wrote:
>
> >> >> >> >> This idea goes in another direction
than fragmenting dwarf
> >> >> >> >> using elf sections&tricks. It seems
to me that the cost of fragmenting is too high.
> >> >> >>
> >> >> >> >I tend to agree - but I'm sort of
leaning towards trying to use object
> >> >> >> >features as much as possible, then
implementing just enough custom
> >> >> >> >handling in the linker to recoup overhead,
etc. (eg: add some kind of
> >> >> >> >small header/brief description that makes it
easy for the linker to
> >> >> >> >slice-and-dice - but hopefully a
domain-specific such header can be a
> >> >> >> >bit more compact than the fully general ELF
form)
> >> >> >>
> >> >> >> I think this indeed should be implemented and
evaluated.
> >> >> >> So that various approaches could be compared.
> >> >> >>
> >> >> >> >> It is not only the sizes of structures
describing fragments but also the complexity
> >> >> >> >> of tools that should be taught to work
with fragmented DWARF.
> >> >> >> >> (f.e. llvm-dwarfdump applied to object
file should be able to read fragmented DWARF,
> >> >> >> >> but applied to linked executable it
should work with non-fragmented DWARF).
> >> >> >> >> That idea is for the tool which works
the same way as dsymutil ODR.
> >> >> >> >>
> >> >> >> >> I will shortly describe the idea of
making DWARF be easier processed by dsymutil/DWARFLinker:
> >> >> >> >>
> >> >> >> >> The idea is to have only one "type
table" per object file(special section .debug_types_table).
> >> >> >> >> This "type table" would
contain all types.
> >> >> >> >> There could be a special type of
reference - type_offset - that offset points into the type table.
> >> >> >> >> Basic types could always be placed into
the start of "type table" thus, offsets to basic types
> >> >> >> >> most often would be 1 byte. There also
would be a special kind of reference - reference inside the type.
> >> >> >> >> Type units sig8 system - would not be
used to reference types.
> >> >> >> >>
> >> >> >> >> Types deduplication is assumed to be
done, not by linker mechanism for COMDAT,
> >> >> >> >> but by a tool like dsymutil. This tool
would create resulting .debug_types_table by putting there
> >> >> >> >> types from source .debug_types_table-s.
Only one copy of the type would be placed into the
> >> >> >> >> resulting table. All references
pointing to the deleted copy would be corrected to point
> >> >> >> >> to the single copy inside "type
table". (that is how dsymutil works currently)
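The merge step described above can be sketched roughly as follows. This is only an illustration, not how dsymutil is actually implemented: the function names, the use of list indexes in place of byte offsets, and the `type_key` used to decide that two types are the same are all assumptions made for the example.

```python
def merge_type_tables(tables):
    """Merge per-object type tables, keeping one copy of each type.

    `tables` is a list of per-object tables. Each table is a list of
    (type_key, payload) entries; `type_key` is a hypothetical stand-in for
    whatever identity the tool uses to decide two types are equal.
    List indexes stand in for offsets into the type table.

    Returns (merged, remap), where remap[(obj_index, old_offset)] is the
    offset of the surviving copy in the merged table.
    """
    merged = []
    offset_of = {}  # type_key -> offset of the single kept copy
    remap = {}
    for obj_index, table in enumerate(tables):
        for old_offset, (type_key, payload) in enumerate(table):
            if type_key not in offset_of:
                offset_of[type_key] = len(merged)
                merged.append((type_key, payload))
            remap[(obj_index, old_offset)] = offset_of[type_key]
    return merged, remap


def rewrite_type_refs(refs, obj_index, remap):
    """Correct every type reference of one object to point at the kept copy."""
    return [remap[(obj_index, old_offset)] for old_offset in refs]
```

Note that `rewrite_type_refs` has to be given every reference in the object, which is the part that implies walking all the DIEs.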
> >> >> >>
> >> >> >> >^ that's the step that's probably a
bit expensive for a general-use
> >> >> >> >tool - it implies parsing all the DWARF to
find those references and
> >> >> >> >rewrite them, I think. For a
high-performance solution that could be
> >> >> >> >run by the linker I think it'd be
necessary to have a solution that
> >> >> >> >doesn't involve parsing all the DIEs.
> >> >> >>
> >> >> >> According to the current dsymutil processing,
> >> >> >> exactly this process is not the most
time-consuming.
> >> >> >> That could be done relatively fast.
> >> >>
> >> >> >Fair enough - though I'd still imagine any
solution that involves
> >> >> >parsing all the DIEs still wouldn't be fast
enough (maybe an order of
> >> >> >magnitude faster than the current solution even - but
that's still,
> >> >> >what, 6 or 7x slower than linking without the
feature?) for most users
> >> >> >to consider it a good trade-off.
> >> >>
> >> >> It seems to me that even the current 6x-7x slowdown could
be useful.
> >> >> Users who already use dsymutil or llvm-dwp(assuming
DWARFLinker
> >> >> would be taught to work with a split dwarf) tools spend
this time and,
> >> >> in some scenarios, waste disk space by intermediate
files.
> >> >
> >> >FWIW, dwp (llvm-dwp hasn't really been optimized compared
to binutils
> >> >dwp) is designed to be very quick - by not needing to do a lot
of
> >> >parsing/fixups. Which, yes, means larger output files than
would be
> >> >possible with more parsing/etc. It also doesn't take any
input from
> >> >the linker (so it can run in parallel with the linker) - so it
can't
> >> >remove dead subprograms. Given Google's the major (perhaps
only
> >> >significant?) user of Split DWARF - I can say that the needs
don't
> >> >necessarily overlap well with something that would take
significantly
> >> >longer to run or use significantly more memory.
Faster/cheaper/with
> >> >somewhat bigger output files is probably the right tradeoff
for
> >> >Google's use case, at least.
> >> >
> >> >I imagine Apple's use for dsymutil is somewhat similar -
it's not used
> >> >in the iterative development cycle, only in final releases -
well,
> >> >maybe their situation is more "neutral" - not a
major pain point in
> >> >any case I'd guess.
> >> >
> >>>
> >> I see. FWIW, Comparison splitdwarf+dwp and DWARFLinker from lld:
> >>
> >> 1. split-dwarf+llvm-dwp = linking time for clang 6 sec,
> >>     generating time for .dwp 53 sec, clang=997M clang.dwp=1.1G.
>
> >FWIW, llvm-dwp is not very well optimized (which is to say: it is not
> >optimized), binutils dwp might be a better comparison (& even that
> >doesn't have the parallelism & some potential further memory
savings
> >that lld has that we could take advantage of in a dwp-like tool)
> >
> >What build mode was the clang binary built in? Optimized or
unoptimized?
>
> right, that is unoptimized build with -ffunction-sections.
>
> >> 2. DWARFLinker from lld = linking time for clang 72 sec,
clang=760M.
And this is without Split DWARF? Without linker DWARF compression? -
that seems quite a bit surprising, that the deduplication of DWARF
could fit into less space than the wasted/reclaimed space in ranges (&
line)?

Could you double check these numbers & provide a clearer summary?

Here's my attempt at numbers (all with function-sections+gc-sections)...

Split DWARF tests didn't seem meaningful - gc-debuginfo + Split DWARF
seemed to drop all the debug info (except gdb_index), so it wasn't
working and an apples-to-apples comparison wasn't possible. I've still
included Split DWARF numbers for comparing gc'd non-split to non-gc'd
split. For those I disabled gnu-pubnames/gdb-index (-gsplit-dwarf
-gno-gnu-pubnames): it's on by default with Split DWARF because gdb
needs it, but it's a bit of an unfair comparison unless
gnu-pubnames/gdb-index is also turned on in the other build modes,
since it... /shouldn't/ be necessary. That might've been a factor in
the data you were looking at.

* -O0: (baseline, just using strip -g: 356 MB)
  * compressed: 25% smaller with gc-debuginfo (481 MB / 641 MB) (407 MB split/non-gc)
  * uncompressed: 30% smaller (820 MB / 1.2 GB) (566 MB split/non-gc)
* -O3: (baseline: 116 MB)
  * compressed: 16% smaller (361 MB / 462 MB) (283 MB split/non-gc)
  * uncompressed: 22% smaller (1022 MB / 1.2 GB) (156 MB split/non-gc)




>
> >It does seem a tad strange that the clang binary would be smaller
> >non-split with DWARF linking than it was split. Though I could imagine
> > >this might be possible in an optimized build (where debug_ranges
> >become quite relatively expensive in the .o file contribution with
> >Split DWARF)
>
> >Could you compare the section sizes between these two clang binaries,
perhaps?
>
> .debug_ranges is three times bigger and .debug_line is twice as big.
>
> >> >> Thus if they would use this LLD feature in its current
state
> >> >> - they would still receive benefits.
> >> >>
> >> >> Speaking of performance results - LLD is a multi-thread
linker;
> >> >> it handles sections in parallel. DWARFLinker generates
DWARF using
> >> >> AsmPrinter which is a stream - so it could make resulting
DWARF only
> >> >> continuously. It is not surprising that the parallel
solution works faster.
> >> >> Making DWARFLinker truly multi-threaded would probably
allow us
> >> >> to make slowdown to be at 2x-4x range.
> >> >
> >> >*nod* that's still a really expensive link - but I
understand that's a
> >> >suitable tradeoff for your users
> >> >
> >>
> >> Btw, 2x or 7x is for pure linking time. Overall compilation
slowdown
> >> is not so significant. Building LLVM codebase has only 20%
slowdown.
> >
> >Understood - that's still quite significant to most users, I'd
imagine.
>
> I see.
>
> >> >> >> Anyway, I think the dsymutil approach is still
valuable, and it
> >> >> >> would be useful to optimize it.
> >> >> >> Do you think it would be useful to make
dsymutil/DWARFLinker truly multi-thread?
> >> >> >> (To make dsymutil/DWARFLinker able to process
each object file in a separate thread)
> >> >>
> >> >> >Perhaps - that I'd probably leave up to the folks
who are more
> >> >> >invested in dsymutil (Adrian Prantl et al). Maybe one
day we'll get it
> >> >> >integrated into llvm-dwp and then I'll be
interested in getting as
> >> >> >much performance out of it as lld - so multithreading
and things would
> >> >> >be on the books.
> >> >>
> >> >> I think improving dsymutil is a valuable thing.
> >> >> Though there are several directions which might be
considered
> >> >> to make it more robust:
> >> >>
> >> >> 1. support of latest DWARF - DWARF5/DWARF64...
> >> >
> >> >I expect/thought some of the Apple folks had already worked on
DWARF5 support?
> >> >DWARF64 - that's been around for a while, and just
hasn't been needed
> >> >by LLVM users thus far, it seems (until recently - where some
> >> >developers have started working on that)
> >>
> >> The debug_names table is already implemented, but debug_rnglists,
> >> debug_loclists, type units - are not implemented yet.
> >
> >Superficially, type units wouldn't be on the list of features (like
> >DWARF64 - it's optional) I'd try to support in dsymutil - since
their
> >size overhead is more justified for a DWARF-agnostic linker that's
> >using comdat groups. With a DWARF-aware linker I'd be specifically
> >hoping to avoid using type units to help
>
> >> The thing which
> >> should probably be changed is that dsymutil should not have its
own version
> >> of the code generating DWARF tables. It should call the already existing
> >> DWARF5/DWARF64 implementations. Then dsymutil would always
> >> use the latest DWARF generators.
>
> >
> >Possibly - I don't know what the architectural tradeoffs for that
look
> >like - I'd imagine DWARFLinker has sufficiently different
> >needs/tradeoffs than LLVM's DWARF generation code (rewriting
existing
> >DIEs compared to building new ones from scratch, etc) that it might be
> >hard for them to share a lot of their implementation.
>
> It is not easy, and would require some additions, but it would benefit
> in that all format implementation is in one place. Thus changing that place
> would reflect in other places. There are at least three implementations for
> .debug_ranges, .debug_aranges currently...
>
>
> >> >> 2. implement multi-threaded execution.
> >> >> 3. support of split DWARF.
> >> >
> >> >Maybe, though I'm still not sure it'd be the right
tradeoff -
> >> >especially if it involved having to wait to run the .dwo
merger (call
> >> >it DWARF-aware dwp, or dsymutil with dwp support) until after
the
> >> >linker ran.
> >> >
> >> >> 4. implement dsymutil for non-darwin platform.
> >> >
> >> >That's probably, essentially (3), more-or-less. Split
DWARF is
> >> >somewhat of a formalization of Apple's/MachO DWARF
distribution model
> >> >(leave DWARF in files that aren't linked/use them from
a debugger,
> >> >but also be able to merge them into some final file (dsym or
dwp) for
> >> >archival purposes)
> >> >
> >> >> All of this is a massive piece of work.
> >> >> Our original investment was to solve two problems:
> >> >>
> >> >> 1. Overlapped address ranges, which is currently close to
being solved. Thank you for helping with that!
> >> >
> >> >Yeah, again, sorry that's taken quite so long/somewhat
circuitous route.
> >> >
> >> >> 2. Size of debug info. That still becomes an issue, but
we are unsure whether we are ready to
> >> >>    invest in solving all the above 1-4 problems and how
much community interested in it.
> >> >
> >> >Fair, for sure - I don't think you'd need to sign up
to solve all of
> >> >them (don't think they necessarily need solving).
Potentially moving
> >> >the logic out into a separate tool as Fangrui's
considering - a
> >> >post-link DWARF optimizer, rather than in-linker DWARF
optimization.
> >> >
> >> >I really don't want to give you the runaround like this -
but multiple
> >> >times slower links is something that seems pretty problematic
for most
> >> >users, to the point of weighing the maintainability of lld
against the
> >> >convenience of having this functionality in-linker rather than
in a
> >> >post-link optimizer.
> >> >
> >> >(I know you've spoken a bit before about your users needs
- but if
> >> >it's possible, could you explain (again :/) why they have
such a
> >> >strong need for smaller DWARF? While DWARF size is an ongoing
concern
> >> >for many users (Google certainly - hence the invention of
Split DWARF,
> >> >use of type units and compressed DWARF, etc) - usually
it's in rather
> >> >large programs, but it sounds like you're dealing with
relatively
> >> >small ones (otherwise the increase in link time, I'd
imagine, would be
> >> >prohibitive for your users?)?
> >>
> >> We have many large programs and keep Daily/Nightly debug builds,
> >> which takes a lot of disk space. Compilation time for these
programs is big.
> >> The scenario is "compile once".(not
compile-debug-compile-debug).
> >> So we think that a solution (like dsymutil/DWARFLinker) would not
slow down
> >> the compilation time of overall build significantly(see above
numbers for
> >> llvm codebase) and would allow us to reduce disk space required to
keep
> >> all of these builds.
>
> >Ah, OK - for archival purposes. So the interactive developers
wouldn't
> >necessarily be using this feature. Makes sense - similar to dsymutil
> >and dwp, mostly used for archival purposes & you can debug straight
> >from .o/.dwos for interactive/iterative development.
>
> >In that case, it seems more likely that a separate tool might suffice.
>
> agreed: if the work on this continues, then it makes sense to
> do it as a separate tool and make it fast enough. And if there is interest
> in it, then it would probably be possible to return to the idea of calling it
from the linker.
>
> >Also, out of curiosity - have you tried just compressing the output
> >(-gz (I think that does the right thing for the linker level
> >compression too, otherwise -Wl,-compress-debug-sections might do it))
> >or are you already doing that in addition?
>
> sure. we use  -Wl,-compress-debug-sections.
>
> Thank you, Alexey.
>
> >> >You mentioned that the usability cost of
> >> >Split DWARF for your users was too high (or high enough to
justify
> >> >this alternative work of DWARF-aware linking)? That all seems
a bit
> >> >surprising to me - though I understand the deployment issues
of Split
> >> >DWARF do present some challenges to users in more heterogenous
> >> >environments than Google's... still, I'd have thought
there was some
> >> >hope there)
> >>
> >> Our tools do not support split dwarf yet. Though we plan to
implement it.
> >> When we would have support of split dwarf then it would be
> >> convenient to have easy way to share built debug binaries.
llvm-dwp is the
> >> answer to this. DWARFLinker could probably be another answer.
>
> >Ah, fair enough - thanks for the context!
>
> > > >> >One way to do that would be to have a CU-local type
indirection table.
> > > >> >DIEs reference local type numbers (like local
address/string numbers -
> > > >> >addrx/strx/rnglistx) and that table contains either
sig8 (no linker
> > > >> >fixups required) or the local type offsets you
describe - the linker
> > > >> >would then only need to read this type number
indirection table and
> > > >> >rewrite them to the final type numbers.
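A rough sketch of that indirection table, assuming a made-up in-memory representation: the `Sig8` and `LocalOffset` classes, the `offset_remap` dictionary and the function name are hypothetical and only illustrate which parts the linker would have to touch.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Sig8:
    """Entry that names a type by its 8-byte signature; no fixup needed."""
    signature: int


@dataclass(frozen=True)
class LocalOffset:
    """Entry that names a type by an offset the linker may have to rewrite."""
    offset: int


Entry = Union[Sig8, LocalOffset]


def relink_indirection_table(table: List[Entry],
                             offset_remap: Dict[int, int]) -> List[Entry]:
    """Rewrite only the per-CU table; DIEs keep their local type numbers."""
    relinked: List[Entry] = []
    for entry in table:
        if isinstance(entry, LocalOffset):
            relinked.append(LocalOffset(offset_remap[entry.offset]))
        else:
            relinked.append(entry)  # sig8 entries pass through untouched
    return relinked
```

The point of the design is that a DIE referring to "local type number 2" is never parsed or modified: only the table entry at index 2 changes.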
> > > >>
> > > >> Yes, that could be additionally done if this process
would be time-consuming.
> > > >>
> > > >> David, thank you for all your comments and explanations.
They are extremely helpful.
> > >
> > > >Sure thing - really appreciate your patience with all this -
it's... a
> > > >lot of moving parts.
> > >
> > > >- Dave
> > >
> > > >
> > > > Thank you, Alexey.
> > > >
> > > > > sig8 hash-id would be used to compare types and to
deduplicate them.
> > > > > It would speed up the current dsymutil context
analysis.
> > > > > Types having the same hash-id could be deduplicated.
> > > > > > This would allow deduplicating a larger number of types
than current dsymutil.
> > > > > Incomplete type definitions having a similar set of
members are not deduplicated by dsymutil currently.
> > > > > In this case they would have the same hash-id.
> > > > >
> > > > > This "type table" would take less space than
current "type units" and current ODR solution.
> > > > >
> > > > > Above is just an idea on how to help DWARF-aware
linker(based on idea removing obsolete debug info)
> > > > > to work faster(if that is interesting).
> > > > >
> > > > > Alexey.
> > > > >
> > > > > > From: llvm-dev <llvm-dev-bounces at
lists.llvm.org> On Behalf Of James Henderson via llvm-dev
> > > > > > Sent: Wednesday, June 3, 2020 3:48 AM
> > > > > > To: David Blaikie <dblaikie at gmail.com>
> > > > > > Cc: llvm-dev at lists.llvm.org
> > > > > > Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD]
Remove obsolete debug info in lld.
> > > > > >
> > > > > >
> > > > > >
> > > > > > It makes me sad that the linker (via a library or
otherwise) has to be "DWARF-aware" to be able to effectively handle
--gc-sections, COMDATs, --icf etc for debug info, without leaving large blocks
of data kicking around.
> > > > > >
> > > > > >
> > > > > >
> > > > > > The patching to -1 (or equivalent) is probably a
good lightweight solution (though I'd love it if it could be done based on
section type in the future rather than section name, but that's probably
outside the realm of DWARF), as it requires only minimal understanding in the
linker, but anything beyond that seems to be complicated logic that is mostly
due to the structure of DWARF. Patching to -1 does feel a bit like a sticking
plaster/band aid to patch over the issue rather than properly solving it too -
there will still be debug data (potentially significant amounts in COMDAT-heavy
objects) that the linker has to write and the debugger has to somehow know how
to skip (even if it knows that -1 is special-case due to the standard being
updated, it needs to get as far as the -1), which is all wasted effort.
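To illustrate the cost being described, here is a minimal sketch of what a consumer has to do with tombstoned entries. It assumes 64-bit addresses (so -1 is 0xFFFFFFFFFFFFFFFF) and models each entry as a hypothetical (low_pc, size) pair; neither detail comes from the thread.

```python
# Assumption: 8-byte addresses, so the "-1" tombstone is all ones.
TOMBSTONE_64 = 0xFFFF_FFFF_FFFF_FFFF


def live_entries(entries, tombstone=TOMBSTONE_64):
    """Return the entries whose address was not patched to the tombstone.

    Every entry is still read and compared, dead or not: the dead data
    stays in the output file and the consumer pays to skip over it.
    """
    return [(low_pc, size) for (low_pc, size) in entries if low_pc != tombstone]
```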
> > > > > >
> > > > > >
> > > > > >
> > > > > > We've already seen from Alexey's
prototyping, and from our own experiences with the Sony proprietary linker
(which tried to rewrite .debug_line only) that deconstructing the DWARF so that
it can be more optimally reassembled at link time is slow going, and will
probably inevitably remain so however much effort is put into optimising it. For a
start, given the current standards, it's impossible to know how to
deconstruct it without having to parse vast amounts of DWARF, which is typically
going to mean a lot more parsing work than the linker would normally have to
deal with. Additionally, much of this parsing work is wasted effort, since it
seems unlikely in many links that large amounts of the DWARF will be redundant.
Having an option to opt-in doesn't help much there, since it just means the
logic exists without most people using it, due to it not being good enough, or
potentially they don't even know it exists.
> > > > > >
> > > > > >
> > > > > >
> > > > > > I don't have particularly concrete suggestions
as to how to solve the structural problems with DWARF at this point. The only
thing that seems obvious to me is a more "blessed" approach to
fragmentation of sections, similar to what I tried with my prototype mentioned
earlier in the thread, although we'd need to figure out the previously
stated performance issues. Other ideas might tie into this, like somehow sharing
the various table headers a bit like CIEs in .eh_frame that could be merged by
the linker - each object could have separate table header sections, which are
referenced by the individual .debug_* blocks, which in turn are one per
function/data piece and easily discardable/merged by the linker.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Just some thoughts.
> > > > > >
> > > > > >
> > > > > >
> > > > > > James
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, 2 Jun 2020 at 19:24, David Blaikie via
llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > > > > >
> > > > > > On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
> > > > > > <alapshin at accesssoftek.com> wrote:
> > > > > > >
> > > > > > > Hi David, please find my comments inside:
> > > > > > >
> > > > > > >
> > > > > > > >>>Broad question: Do you have any
specific motivation/users/etc in implementing this (if you can speak about it)?
> > > > > > >
> > > > > > > >>> - it might help motivate the
work, understand what tradeoffs might be suitable for you/your users, etc.
> > > > > > >
> > > > > > > >>There are two general requirements:
> > > > > > > >> 1) Remove (or clean) invalid debug
info.
> > > > > > >
> > > > > > > >
> > > > > > > >Perhaps a simpler direct solution for
your immediate needs might be a much narrower,
> > > > > > > >and more efficient linker-DWARF-awareness
feature:
> > > > > > > >
> > > > > > > > With DWARFv5, rnglists present an
opportunity for a DWARF linker to rewrite the ranges
> > > > > > > > without parsing the rest of the DWARF.
/technically/ this isn't guaranteed - rnglist entries
> > > > > > > > can be referenced either directly, or by
index. If all rnglists are referenced by index, then
> > > > > > > > a linker could parse only the
debug_rnglists section and rewrite ranges to remove any
> > > > > > > > address ranges that refer to
optimized-out code.
> > > > > > > >
> > > > > > > > This would only be correct for rnglists
that had no direct references to them (that only were
> > > > > > > > referenced via the indexes) - but we
could either implement it with that assumption, or could
> > > > > > > > add an LLVM extension attribute on the
CU that would say "I promise I only referenced rnglists
> > > > > > > > > via rnglistx forms/indexes"). If this
DWARF-aware linking would have to read the CU DIE (not
> > > > > > > > all the other DIEs) it /could/ also then
rewrite high/low_pc if the CU wasn't using ranges...
> > > > > > > > but that wouldn't come up in the
function-removal case, because then you'd have ranges anyway,
> > > > > > > > so no need for that.
> > > > > > > >
> > > > > > > > > Such a DWARF-aware rnglist linking could also simplify rnglists, in cases where functions
> > > > > > > > > ended up being laid out next to each other, the linker could coalesce their ranges together.
> > > > > > > > >
> > > > > > > > > I imagine this could be implemented with very little overhead to linking, especially compared
> > > > > > > > > to the overhead of full DWARF-aware linking.
> > > > > > > > >
> > > > > > > > >Though none of this fixes Split DWARF, where the linker doesn't get a chance to see the
> > > > > > > > > addresses being used - but if you only want/need the CU-level ranges to be correct, this
> > > > > > > > > might be a viable fix, and quite efficient.
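
To make the rnglist rewriting described above concrete, here is a minimal Python sketch. It is an illustration only, not code from lld or D74169: the function name `rewrite_ranges`, the half-open `(low, high)` representation, and the `is_live` callback (standing in for whatever the linker knows about discarded sections) are all assumptions.

```python
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # half-open address range [low, high); an assumed representation


def rewrite_ranges(ranges: List[Range],
                   is_live: Callable[[Range], bool]) -> List[Range]:
    """Drop ranges that refer to discarded code, then coalesce neighbours.

    `is_live` is a hypothetical predicate; a real linker would answer it
    from its own section garbage-collection results.
    """
    # Keep only non-empty ranges that still have code behind them.
    live = sorted(r for r in ranges if r[0] < r[1] and is_live(r))
    merged: List[Range] = []
    for low, high in live:
        if merged and low <= merged[-1][1]:
            # Adjacent or overlapping with the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


# One function was garbage-collected and its range resolved to address 0.
discarded = {(0x0, 0x20)}
cu_ranges = [(0x1000, 0x1040), (0x0, 0x20), (0x1040, 0x1080), (0x2000, 0x2010)]
result = rewrite_ranges(cu_ranges, lambda r: r not in discarded)
print([(hex(lo), hex(hi)) for lo, hi in result])
# prints [('0x1000', '0x1080'), ('0x2000', '0x2010')]
```

The two functions laid out back to back collapse into one range, and the dead range disappears, without touching any DIE other than the range list itself.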
> > > > > > >
> > > > > > > > Yes, we have considered that alternative. It would resolve our problem of invalid debug info
> > > > > > > > and would work much faster. So if we do not get good results for D74169, we
> > > > > > > > will implement it. Do you think it would be useful to have this solution upstream?
> > > > > >
> > > > > > A pure rnglist rewriting - I think it'd be OK to have in upstream -
> > > > > > again, cost/benefit/etc would have to be weighed. I'm not sure it
> > > > > > would save enough space to be particularly valuable beyond the
> > > > > > correctness issue - and it doesn't completely solve the correctness
> > > > > > issue for zero-address usage or low-address usage (because you could
> > > > > > still have overlapping subprograms inside a CU - so if you were
> > > > > > symbolizing you could use the correct rnglist to filter, but then go
> > > > > > look inside the CU only to find two subprograms that had that address
> > > > > > & not know which one was the correct one and which one was the
> > > > > > discarded one).
> > > > > >
> > > > > > rnglist rewriting might be easy enough to prototype - but depends what
> > > > > > you want to spend your time on, I know this whole issue has been a
> > > > > > huge investment of your time already - but maybe this recent
> > > > > > revitalization of the conversation around having an explicit value in
> > > > > > the linker might be sufficient to address everyone's needs... *fingers
> > > > > > crossed*
> > > > > >
> > > > > >
> > > > > > > >> 2) Optimize the DWARF size.
> > > > > > >
> > > > > > >
> > > > > > > > > Do your users care much about this? I imagine if they had significant DWARF size issues,
> > > > > > > > > they'd have significant link time issues and the kind of cost to link time this feature has would
> > > > > > > > > be prohibitive - but perhaps they're sharing linked binaries much more often than they're
> > > > > > > > > actually performing linking.
> > > > > > > >
> > > > > > > > Yes, they do. They also have significant link-time issues,
> > > > > > > > so the current performance results of D74169 are not really acceptable.
> > > > > > > > We hope to improve them.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >>The specifics which our users have:
> > > > > > > > >>  - embedded platform which uses 0 as start of .text section.
> > > > > > > > >>  - custom toolset which does not support all features yet (e.g. split dwarf).
> > > > > > > > >>  - tolerant of the link-time increase.
> > > > > > > > >>  - need a useful way to share debug builds.
> > > > > > >
> > > > > > >
> > > > > > > > > Sharing two files (executable and dwp) is significantly less useful than sharing one file?
> > > > > > > >
> > > > > > > > Probably not significantly, but yes, it looks less useful compared to D74169.
> > > > > > > > Having only two files (executable and .dwp) looks significantly better than having an executable and multiple .dwo files.
> > > > > > > > Having only one file (executable) with minimal size looks better than two files with a bigger size.
> > > > > > > >
> > > > > > > > clang built with -gsplit-dwarf takes 0.9G for the executable and 0.9G for the .dwp.
> > > > > > > > clang built with --gc-debuginfo takes only 0.76G for a single executable.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >>For the first point: we have a problem "Overlapping address ranges starting from 0"(D59553).
> > > > > > > >
> > > > > > > > >>We use a custom solution, but a general solution like D74169 would be better here.
> > > > > > >
> > > > > > >
> > > > > > > > > If CU ranges are the only ones that need fixing, then I think the above solution might be as
> > > > > > > > > good/better - if more than CU ranges need fixing, then I think we might want to start talking about
> > > > > > > > > how to fix DWARF itself (split and non-split) to signal certain addresses point to dead code with a
> > > > > > > > > specific blessed value that linkers would need to implement - because with Split DWARF there's
> > > > > > > > > no way to solve the non-CU addresses at the linker.
> > > > > > >
> > > > > > > > I think a worthwhile choice for that signal value would be LowPC > HighPC.
> > > > > > > > That does not require additional bits in DWARF.
> > > > > > > > It would be natural to skip such address ranges since they are explicitly marked as invalid.
> > > > > > > > It could be implemented in a linker very easily. Probably, it would make sense to describe that
> > > > > > > > usage in the DWARF standard.
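
As a rough illustration of the LowPC > HighPC proposal, the sketch below shows how a consumer such as a symbolizer might skip ranges marked that way. The names `is_marked_dead` and `symbolize`, and the `(name, low_pc, high_pc)` tuple layout, are hypothetical and chosen only for this example; no existing tool is claimed to work this way.

```python
from typing import List, Optional, Tuple

Subprogram = Tuple[str, int, int]  # (name, low_pc, high_pc), half-open; assumed layout


def is_marked_dead(low_pc: int, high_pc: int) -> bool:
    """Assumed convention from the proposal: low_pc > high_pc means 'invalid range'."""
    return low_pc > high_pc


def symbolize(addr: int, subprograms: List[Subprogram]) -> Optional[str]:
    """Return the name of the live subprogram covering addr, if any."""
    for name, low_pc, high_pc in subprograms:
        if is_marked_dead(low_pc, high_pc):
            continue  # explicitly marked as invalid, so never a candidate
        if low_pc <= addr < high_pc:
            return name
    return None


# .text starts at 0, so a discarded function must not look like [0, 0x10).
subprograms = [("discarded_f", 0x10, 0x0), ("main", 0x0, 0x40)]
print(symbolize(0x8, subprograms))  # prints main
```

With the marker in place, the lookup at address 0x8 is unambiguous even though the platform places live code at address 0.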
> > > > > > >
> > > > > > > > As to the addresses which are not seen by the linker (since they are in .dwo files) - yes,
> > > > > > > > they need another solution. Could you show an example of such a case, please?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >>>2. Support of type units.
> > > > > > > >
> > > > > > > > >>>>  That could be implemented further.
> > > > > > > >
> > > > > > > > >>>Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware
> > > > > > > > >>>linker. With a DWARF aware linker it'd be generally desirable not to have to add that object size overhead to
> > > > > > > > >>>get the linking improvements.
> > > > > > > >
> > > > > > > > >>But, DWARFLinker should adequately work with type units since they are already implemented.
> > > > > > > >
> > > > > > > > > Maybe - it'd be nice & all, but I don't think it's an outright necessity - if someone knows they're using
> > > > > > > > > a DWARF-aware linker, they'd probably not use type units in their object files. It's possible someone
> > > > > > > > > doesn't know for sure & maybe they have pre-canned debug object files from someone else, etc.
> > > > > > >
> > > > > > > I see.
> > > > > > >
> > > > > > > > >>Another thing is that the idea behind type units has the potential to help Dwarf-aware linker to work faster.
> > > > > > > >
> > > > > > > > >>Currently, DWARFLinker analyzes context to understand whether types are the same or not.
> > > > > > > >
> > > > > > > >
> > > > > > > > >When you say "analyzes context" what do you mean? Usually I'd take that to mean
> > > > > > > > > "looks at things outside the type itself - like what namespace it's in, etc" - which, yes,
> > > > > > > > > it should do that, but it doesn't seem very expensive to do. But I guess you actually
> > > > > > > > > mean something about doing structural equivalence in some way, looking at things inside the type?
> > > > > > >
> > > > > > > > I think it could be useful for both cases. Currently, dsymutil does only the first thing
> > > > > > > > (look at type name, namespace name, etc.) and does not do the second thing
> > > > > > > > (checking structural equivalence). Analyzing type names is currently quite expensive
> > > > > > > > (the search in the string pool alone takes ~10 sec out of 70 sec of overall time).
> > > > > > > > That is expensive because many things must be done to work with strings:
> > > > > > > > parse DWARF, search and resolve relocations, compute a hash for strings,
> > > > > > > > put data into a string pool, create a fully qualified name (like namespace::function::name).
> > > > > > > > It looks like it could be optimized and finally require less time, but it still would be a noticeable
> > > > > > > > part of the overall time.
> > > > > > > >
> > > > > > > > If dsymutil starts to check for structural equivalence, then the process would be even slower.
> > > > > > > > So, if instead of comparing type structure a single hash-id were checked, this process
> > > > > > > > would also be faster.
> > > > > > > >
> > > > > > > > Thus I think using a hash-id to compare types would make the current implementation faster and would
> > > > > > > > also allow DWARFLinker to handle incomplete types without massive performance degradation.
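
The hash-id idea can be sketched as follows. This is a toy model under stated assumptions: each object file is assumed to supply `(hash_id, description)` pairs computed when the types were generated, hash collisions are ignored, and `deduplicate_types` is a made-up name, not a DWARFLinker or dsymutil API.

```python
from typing import Dict, List, Tuple

TypeEntry = Tuple[int, str]  # (hash_id, type description); both hypothetical


def deduplicate_types(units: List[List[TypeEntry]]) -> Tuple[List[str], List[List[int]]]:
    """Merge per-object type lists into one table, comparing hash ids only.

    Returns the merged table and, per unit, a remap from the local type
    index to the index in the merged table (what a reference fixup needs).
    """
    table: List[str] = []
    index_by_hash: Dict[int, int] = {}
    remap: List[List[int]] = []
    for types in units:
        local: List[int] = []
        for hash_id, description in types:
            if hash_id not in index_by_hash:
                # First time this hash is seen: keep this copy.
                index_by_hash[hash_id] = len(table)
                table.append(description)
            local.append(index_by_hash[hash_id])
        remap.append(local)
    return table, remap


units = [
    [(0xA1, "struct Foo"), (0xB2, "int")],
    [(0xB2, "int"), (0xC3, "struct Bar"), (0xA1, "struct Foo")],
]
table, remap = deduplicate_types(units)
print(table)  # prints ['struct Foo', 'int', 'struct Bar']
print(remap)  # prints [[0, 1], [1, 2, 0]]
```

Each comparison is a single dictionary lookup on the hash id, so no names are built and no type structure is walked; the cost moves to whoever computes the hash at generation time.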
> > > > > > >
> > > > > > > > >> But the context is known when types are generated. So, no need to spend the time analyzing it.
> > > > > > > >
> > > > > > > > >> If types could be compared without analyzing context, then Dwarf-aware linker would work faster.
> > > > > > > >
> > > > > > > > >> That is just an idea (not for immediate implementation): If types would be stored in some "type table"
> > > > > > > > >> (instead of COMDAT section group) and could be accessed through hash-id (like type units)
> > > > > > > > >> - then it would be the solution requiring fewer bits to store but allowing to compare types
> > > > > > > > >> by hash-id (not analysing context).
> > > > > > > > >> In this case, the size increase would be small. And processing could be done faster.
> > > > > > > > >>
> > > > > > > > >> this is just an idea and could be discussed separately from the problem of integrating of D74169.
> > > > > > >
> > > > > > > > >> >> 6. -flto=thin
> > > > > > > >
> > > > > > > > >> >>    That problem was described in this review https://reviews.llvm.org/D54747#1503720. It also exists in
> > > > > > > > >> >> current DWARFLinker/dsymutil implementation. I think that problem should be discussed more: it could
> > > > > > > > >> >> probably be fixed by avoiding generation of such incomplete declaration during thinlto,
> > > > > > > >
> > > > > > > > >> >> That would be costly to produce extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
> > > > > > > > >> >> more to reduce that redundancy early on (actually removing definitions from some llvm Modules if the type
> > > > > > > > >> >> definition is known to exist in another Module, etc)
> > > > > > > > >> >I don't know if it's a problem since that patch was reverted.
> > > > > > > >
> > > > > > > > >> Yes. That patch was reverted, but this patch(D74169) has the same problem.
> > > > > > > >
> > > > > > > > >> if D74169 would be applied and --gc-debuginfo used then structure type
> > > > > > > > >> definition would be removed.
> > > > > > > >
> > > > > > > > >> DWARFLinker could handle that case - "removing definitions from some llvm Modules if the type
> > > > > > > > >> definition is known to exist in another Module".
> > > > > > > > >> i.e. DWARFLinker could replace the declaration with the definition.
> > > > > > > >
> > > > > > > > >> But that problem could be more easily resolved when debug info is generated(probably without
> > > > > > > > >> significant increase of debug info size):
> > > > > > >
> > > > > > > > >> Here we have:
> > > > > > > >
> > > > > > > > >> DW_TAG_compile_unit(0x0000000b) - compile unit containing concrete instance for function "f".
> > > > > > > > >> DW_TAG_compile_unit(0x00000073) - compile unit containing abstract instance root for function "f".
> > > > > > > > >> DW_TAG_compile_unit(0x000000c1) - compile unit containing function "f" definition.
> > > > > > > >
> > > > > > > > >> Code for function "f" was deleted. gc-debuginfo deletes compile unit DW_TAG_compile_unit(0x000000c1)
> > > > > > > > >> containing "f" definition (since there is no corresponding code). But it has structure "Foo" definition
> > > > > > > > >> DW_TAG_structure_type(0x0000011e) referenced from DW_TAG_compile_unit(0x00000073)
> > > > > > > > >> by declaration DW_TAG_structure_type(0x000000ae). That declaration is exactly the case when definition
> > > > > > > > >> was removed by thinlto and replaced with declaration.
> > > > > > > >
> > > > > > > > >> Would it cost too much if type definition would not be replaced with declaration for "abstract instance root"?
> > > > > > > > >> The number of concrete instances is bigger than number of abstract instance roots.
> > > > > > > > >> Probably, it would not be too costly to leave definition in abstract instance root?
> > > > > > > >
> > > > > > > > >> Alternatively, would it cost too much if type definition would not be replaced with declaration when
> > > > > > > > >> declaration references type from not used function? (lto could understand that concrete function is not used).
> > > > > > >
> > > > > > >
> > > > > > > > >I don't follow this example - could you provide a small concrete test case I could reproduce?
> > > > > > > >
> > > > > > > > I would provide a test case if necessary. But it looks like this issue is finally clear, and you already commented on that.
> > > > > > > >
> > > > > > > > > Oh, I guess this is happening perhaps because ThinLTO can't know for sure that a standalone
> > > > > > > > > definition of 'f' won't be needed - so it produces one in case one of the inlining opportunities
> > > > > > > > > doesn't end up inlining. Then it turns out all calls got inlined, so the external definition wasn't needed.
> > > > > > > >
> > > > > > > > > Oh, you're suggesting that these 3 CUs got emitted into one object file during LTO, but that DWARFLinker
> > > > > > > > > drops a CU without any code in it - even though... So far as I know, in LTO, LLVM directly references
> > > > > > > > > types across units if the CUs are all emitted in the same object file. (and if they weren't in the same
> > > > > > > > > object file - then the abstract_origin couldn't be pointing cross-CU).
> > > > > > > >
> > > > > > > > > I guess some basic things to say:
> > > > > > > >
> > > > > > > > > With ThinLTO, the concrete/standalone function definition is emitted in case some call sites don't end up
> > > > > > > > > being inlined. So we know it'll be emitted (but might not be needed by the actual linker)
> > > > > > > > > Any number of inline calls might exist - but we shouldn't put the type information into those, because
> > > > > > > > > they aren't guaranteed to emit it (if the inline function gets optimized away, there would be nothing to
> > > > > > > > > enforce the type being emitted) - and even if we forced the type information to be emitted into one
> > > > > > > > > object file that has an inline copy of the function - there's no guarantee that object file will get linked in either.
> > > > > > > >
> > > > > > > > > So, no, I don't think there's much we can do to keep the size of object files down, while guaranteeing
> > > > > > > > > the type information will be emitted with the usual linker semantics.
> > > > > > > >
> > > > > > > > Then dsymutil/DWARFLinker could be changed to handle that (though it would probably be not very efficient).
> > > > > > > > If thinlto would understand that the function is not used in the end (and then must not contain the referenced type definition),
> > > > > > > > then this situation could be handled more effectively.
> > > > > > >
> > > > > > > Thank you, Alexey.
> > > > > > >
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
_______________________________________________
> > > > > > >>> LLVM Developers mailing list
> > > > > > >>> llvm-dev at lists.llvm.org
> > > > > > >>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Alexey Lapshin via llvm-dev

2020-Jul-28 15:55 UTC


[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

On 28.07.2020 10:29, David Blaikie via llvm-dev wrote:
> On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
> <alapshin at accesssoftek.com> wrote:
>>
>>>>>>>>>> This idea goes in another direction than fragmenting dwarf
>>>>>>>>>> using elf sections&tricks. It seems to me that the cost of fragmenting is too high.
>>>>>>>>> I tend to agree - but I'm sort of leaning towards trying to use object
>>>>>>>>> features as much as possible, then implementing just enough custom
>>>>>>>>> handling in the linker to recoup overhead, etc. (eg: add some kind of
>>>>>>>>> small header/brief description that makes it easy for the linker to
>>>>>>>>> slice-and-dice - but hopefully a domain-specific such header can be a
>>>>>>>>> bit more compact than the fully general ELF form)
>>>>>>>> I think this indeed should be implemented and evaluated.
>>>>>>>> So that various approaches could be compared.
>>>>>>>>
>>>>>>>>>> It is not only the sizes of structures describing fragments but also the complexity
>>>>>>>>>> of tools that should be taught to work with fragmented DWARF.
>>>>>>>>>> (f.e. llvm-dwarfdump applied to object file should be able to read fragmented DWARF,
>>>>>>>>>> but applied to linked executable it should work with non-fragmented DWARF).
>>>>>>>>>> That idea is for the tool which works the same way as dsymutil ODR.
>>>>>>>>>>
>>>>>>>>>> I will shortly describe the idea of making DWARF be easier processed by dsymutil/DWARFLinker:
>>>>>>>>>>
>>>>>>>>>> The idea is to have only one "type table" per object file(special section .debug_types_table).
>>>>>>>>>> This "type table" would contain all types.
>>>>>>>>>> There could be a special type of reference - type_offset - that offset points into the type table.
>>>>>>>>>> Basic types could always be placed into the start of "type table" thus, offsets to basic types
>>>>>>>>>> most often would be 1 byte. There also would be a special kind of reference - reference inside the type.
>>>>>>>>>> Type units sig8 system - would not be used to reference types.
>>>>>>>>>>
>>>>>>>>>> Types deduplication is assumed to be done, not by linker mechanism for COMDAT,
>>>>>>>>>> but by a tool like dsymutil. This tool would create resulting .debug_types_table by putting there
>>>>>>>>>> types from source .debug_types_table-s. Only one copy of the type would be placed into the
>>>>>>>>>> resulting table. All references pointing to the deleted copy would be corrected to point
>>>>>>>>>> to the single copy inside "type table". (that is how dsymutil works currently)
>>>>>>>>> ^ that's the step that's probably a bit expensive for a general-use
>>>>>>>>> tool - it implies parsing all the DWARF to find those references and
>>>>>>>>> rewrite them, I think. For a high-performance solution that could be
>>>>>>>>> run by the linker I think it'd be necessary to have a solution that
>>>>>>>>> doesn't involve parsing all the DIEs.
>>>>>>>> According to the current dsymutil processing,
>>>>>>>> exactly this process is not the most time-consuming.
>>>>>>>> That could be done relatively fast.
>>>>>>> Fair enough - though I'd still imagine any solution that involves
>>>>>>> parsing all the DIEs still wouldn't be fast enough (maybe an order of
>>>>>>> magnitude faster than the current solution even - but that's still,
>>>>>>> what, 6 or 7x slower than linking without the feature?) for most users
>>>>>>> to consider it a good trade-off.
>>>>>> It seems to me that even the current 6x-7x slowdown could be useful.
>>>>>> Users who already use dsymutil or llvm-dwp(assuming DWARFLinker
>>>>>> would be taught to work with a split dwarf) tools spend this time and,
>>>>>> in some scenarios, waste disk space on intermediate files.
>>>>> FWIW, dwp (llvm-dwp hasn't really been optimized compared to binutils
>>>>> dwp) is designed to be very quick - by not needing to do a lot of
>>>>> parsing/fixups. Which, yes, means larger output files than would be
>>>>> possible with more parsing/etc. It also doesn't take any input from
>>>>> the linker (so it can run in parallel with the linker) - so it can't
>>>>> remove dead subprograms. Given Google's the major (perhaps only
>>>>> significant?) user of Split DWARF - I can say that the needs don't
>>>>> necessarily overlap well with something that would take significantly
>>>>> longer to run or use significantly more memory. Faster/cheaper/with
>>>>> somewhat bigger output files is probably the right tradeoff for
>>>>> Google's use case, at least.
>>>>>
>>>>> I imagine Apple's use for dsymutil is somewhat similar - it's not used
>>>>> in the iterative development cycle, only in final releases - well,
>>>>> maybe their situation is more "neutral" - not a major pain point in
>>>>> any case I'd guess.
>>>>>
>>>>>
>>>> I see. FWIW, Comparison splitdwarf+dwp and DWARFLinker from lld:
>>>>
>>>> 1. split-dwarf+llvm-dwp = linking time for clang 6 sec,
>>>>      generating time for .dwp 53 sec, clang=997M clang.dwp=1.1G.
>>> FWIW, llvm-dwp is not very well optimized (which is to say: it is not
>>> optimized), binutils dwp might be a better comparison (& even that
>>> doesn't have the parallelism & some potential further memory savings
>>> that lld has that we could take advantage of in a dwp-like tool)
>>>
>>> What build mode was the clang binary built in? Optimized or unoptimized?
>> right, that is unoptimized build with -ffunction-sections.
>>
>>>> 2. DWARFLinker from lld = linking time for clang 72 sec, clang=760M.
> And this is without Split DWARF? Without linker DWARF compression? -
> that seems quite a bit surprising, that the deduplication of DWARF
> could fit into less space than the wasted/reclaimed space in ranges (&
> line)?
that was without split dwarf, without linker compression.
>
> Could you double check these numbers & provide a clearer summary?
sure, I would re-check it.
>
> Here's my attempt at numbers (all with function-sections+gc-sections)...
>
> Split DWARF tests didn't seem meaningful - gc-debuginfo + split DWARF
> seemed to drop all the debug info (except gdb_index) so wasn't
> working/comparison wasn't meaningful for Apples to Apples, but
> included it for comparing gc'd non-split to non-gc'd split (disabled
> gnu-pubnames/gdb-index (-gsplit-dwarf -gno-gnu-pubnames) (which turns
> on by default with Split DWARF because gdb needs it - but a bit of an
> unfair comparison without turning on gnu-pubnames/gdb-index in other
> build modes too, since it... /shouldn't/ be necessary) which might've
> been a factor in the data you were looking at)
That might be the case, i.e. clang=997M for split dwarf (from my previous
measurement) might include gnu-pubnames.

I will recheck it, and if that is the case then it is an unfair comparison.


My point was that "DWARFLinker from lld" takes less space than a singleton
split dwarf file + .dwp file.

For -O0 uncompressed:

- .dwp took 1.1G (if I built it correctly), singleton clang (from your
measurements) 566 MB,

    overall 1.6G.

- The "DWARFLinker from lld" 820 MB (from your measurements).


So "DWARFLinker from lld" looks two times better.
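
The arithmetic behind that comparison can be written out as a short Python check. The only inputs are the sizes quoted above; treating 1.1G as 1100 MB is an assumption, and the variable names are chosen for this example.

```python
# Sizes quoted above for the -O0 uncompressed build.
dwp_mb = 1.1 * 1000          # .dwp file, reported as 1.1G (assumed to mean 1100 MB)
split_binary_mb = 566        # singleton split-DWARF clang binary
dwarflinker_binary_mb = 820  # clang linked with "DWARFLinker from lld"

# Split DWARF has to ship both files, so their sizes add up.
split_total_mb = dwp_mb + split_binary_mb
ratio = split_total_mb / dwarflinker_binary_mb

print(f"{split_total_mb:.0f} MB vs {dwarflinker_binary_mb} MB -> {ratio:.2f}x")
# prints 1666 MB vs 820 MB -> 2.03x
```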


Anyway, thank you for pointing out a possible mistake. I will recheck
it and update the results.


Alexey.

>
> * -O0: (baseline, just using strip -g: 356 MB)
>    * compressed: 25% smaller with gc-debuginfo (481 MB / 641 MB) (407 MB split/non-gc)
>    * uncompressed: 30% smaller (820 MB / 1.2 GB) (566 MB split/non-gc)
> * -O3: (baseline: 116 MB)
>    * compressed: 16% smaller (361 MB / 462 MB) (283 MB split/non-gc)
>    * uncompressed: 22% smaller (1022 MB / 1.2 GB) (156 MB split/non-gc)
>
>
>
>
>>>> 2. DWARFLinker from lld = linking time for clang 72 sec, clang=760M.
>>> It does seem a tad strange that the clang binary would be smaller
>>> non-split with DWARF linking than it was split. Though I could imagine
>>> this might be possible in an optimized build (where debug_ranges
>>> become quite relatively expensive in the .o file contribution with
>>> Split DWARF)
>>> Could you compare the section sizes between these two clang binaries, perhaps?
>> .debug_ranges is three times bigger and .debug_line is twice as big.
>>
>>>>>> Thus if they would use this LLD feature in its current state
>>>>>> - they would still receive benefits.
>>>>>>
>>>>>> Speaking of performance results - LLD is a multi-threaded linker;
>>>>>> it handles sections in parallel. DWARFLinker generates DWARF using
>>>>>> AsmPrinter which is a stream - so it could make resulting DWARF only
>>>>>> sequentially. It is not surprising that the parallel solution works faster.
>>>>>> Making DWARFLinker truly multi-threaded would probably allow us
>>>>>> to bring the slowdown into the 2x-4x range.
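
The shape of the multi-threaded design being discussed (process each object file independently, then emit the results in a fixed order) can be sketched in Python as below. This only illustrates the structure: `process_object` is a hypothetical placeholder for parsing and rewriting one file's DWARF, `link_debug_info` is a made-up name, and Python threads would not give real CPU parallelism for such work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def process_object(name: str) -> bytes:
    """Hypothetical per-object step standing in for 'rewrite this file's DWARF'."""
    return f"<dwarf for {name}>".encode()


def link_debug_info(object_files: List[str], workers: int = 4) -> bytes:
    """Process every object file on a worker thread, then concatenate in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so the output stays
        # deterministic regardless of which worker finishes first.
        chunks = list(pool.map(process_object, object_files))
    return b"".join(chunks)


print(link_debug_info(["a.o", "b.o", "c.o"]))
# prints b'<dwarf for a.o><dwarf for b.o><dwarf for c.o>'
```

The serial part shrinks to the final concatenation, which is the property a stream-based emitter cannot offer.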
>>>>> *nod* that's still a really expensive link - but I understand that's a
>>>>> suitable tradeoff for your users
>>>>>
>>>> Btw, 2x or 7x is for pure linking time. Overall compilation slowdown
>>>> is not so significant. Building LLVM codebase has only 20% slowdown.
>>> Understood - that's still quite significant to most users, I'd imagine.
>> I see.
>>
>>>>>>>> Anyway, I think the dsymutil approach is still valuable, and it
>>>>>>>> would be useful to optimize it.
>>>>>>>> Do you think it would be useful to make dsymutil/DWARFLinker truly multi-threaded?
>>>>>>>> (To make dsymutil/DWARFLinker able to process each object file in a separate thread)
>>>>>>> Perhaps - that I'd probably leave up to the folks who are more
>>>>>>> invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get it
>>>>>>> integrated into llvm-dwp and then I'll be interested in getting as
>>>>>>> much performance out of it as lld - so multithreading and things would
>>>>>>> be on the books.
>>>>>> I think improving dsymutil is a valuable thing.
>>>>>> Though there are several directions which might be considered
>>>>>> to make it more robust:
>>>>>>
>>>>>> 1. support of latest DWARF - DWARF5/DWARF64...
>>>>> I expect/thought some of the Apple folks had already worked on DWARF5 support?
>>>>> DWARF64 - that's been around for a while, and just hasn't been needed
>>>>> by LLVM users thus far, it seems (until recently - where some
>>>>> developers have started working on that)
>>>> The debug_names table is already implemented, but debug_rnglists,
>>>> debug_loclists, type units - are not implemented yet.
>>> Superficially, type units wouldn't be on the list of features
(like
>>> DWARF64 - it's optional) I'd try to support in dsymutil -
since their
>>> size overhead is more justified for a DWARF-agnostic linker
that's
>>> using comdat groups. With a DWARF-aware linker I'd be
specifically
>>> hoping to avoid using type units to help
>>>> The thing which
>>>> should probably be changed is that dsymutil should not have its
version
>>>> of code generating DWARF tables. It should call already existed
>>>> DWARF5/DWARF64 implementations. Then dsymutil would always
>>>> use last DWARF generators.
>>> Possibly - I don't know what the architectural tradeoffs for
that look
>>> like - I'd imagine DWARFLinker has sufficiently different
>>> needs/tradeoffs than LLVM's DWARF generation code (rewriting
existing
>>> DIEs compared to building new ones from scratch, etc) that it might
be
>>> hard for them to share a lot of their implementation.
>> It is not easy and would require some additions, but the benefit is that
>> the whole format implementation would be in one place, so a change there
>> would be reflected everywhere. There are currently at least three
>> implementations for .debug_ranges and .debug_aranges...
>>
>>
>>>>>> 2. implement multi-threaded execution.
>>>>>> 3. support of split DWARF.
>>>>> Maybe, though I'm still not sure it'd be the right
tradeoff -
>>>>> especially if it involved having to wait to run the .dwo
merger (call
>>>>> it DWARF-aware dwp, or dsymutil with dwp support) until
after the
>>>>> linker ran.
>>>>>
>>>>>> 4. implement dsymutil for non-darwin platform.
>>>>> That's probably, essentially (3), more-or-less. Split DWARF is
>>>>> somewhat of a formalization of Apple's/MachO DWARF distribution model
>>>>> (leave DWARF in files that aren't linked/use them from a debugger,
>>>>> but also be able to merge them into some final file (dsym or dwp) for
>>>>> archival purposes)
>>>>>
>>>>>> All of this is a massive piece of work.
>>>>>> Our original investment was to solve two problems:
>>>>>>
>>>>>> 1. Overlapped address ranges, which is currently close
to being solved. Thank you for helping with that!
>>>>> Yeah, again, sorry that's taken quite so long/somewhat
circuitous route.
>>>>>
>>>>>> 2. Size of debug info. That remains an issue, but we are unsure whether we are ready to
>>>>>>     invest in solving all of the above problems 1-4 and how much the community is interested in it.
>>>>> Fair, for sure - I don't think you'd need to sign
up to solve all of
>>>>> them (don't think they necessarily need solving).
Potentially moving
>>>>> the logic out into a separate tool as Fangrui's
considering - a
>>>>> post-link DWARF optimizer, rather than in-linker DWARF
optimization.
>>>>>
>>>>> I really don't want to give you the runaround like this
- but multiple
>>>>> times slower links is something that seems pretty
problematic for most
>>>>> users, to the point of weighing the maintainability of lld
against the
>>>>> convenience of having this functionality in-linker rather
than in a
>>>>> post-link optimizer.
>>>>>
>>>>> (I know you've spoken a bit before about your users
needs - but if
>>>>> it's possible, could you explain (again :/) why they
have such a
>>>>> strong need for smaller DWARF? While DWARF size is an
ongoing concern
>>>>> for many users (Google certainly - hence the invention of
Split DWARF,
>>>>> use of type units and compressed DWARF, etc) - usually
it's in rather
>>>>> large programs, but it sounds like you're dealing with
relatively
>>>>> small ones (otherwise the increase in link time, I'd
imagine, would be
>>>>> prohibitive for your users?)?
>>>> We have many large programs and keep Daily/Nightly debug builds,
>>>> which take a lot of disk space. Compilation time for these programs is long.
>>>> The scenario is "compile once" (not compile-debug-compile-debug).
>>>> So we think that a solution like dsymutil/DWARFLinker would not
>>>> significantly slow down the overall build (see the numbers above for
>>>> the llvm codebase) and would allow us to reduce the disk space required
>>>> to keep all of these builds.
>>> Ah, OK - for archival purposes. So the interactive developers wouldn't
>>> necessarily be using this feature. Makes sense - similar to dsymutil
>>> and dwp, mostly used for archival purposes & you can debug straight
>>> from .o/.dwos for interactive/iterative development.
>>
>>> In that case, it seems more likely that a separate tool might
suffice.
>> Agreed: if the work on this continues, then it makes sense to
>> do it as a separate tool and make it fast enough. If there is interest
>> in it, then it would probably be possible to return to the idea of calling it
>> from the linker.
>>
>>> Also, out of curiosity - have you tried just compressing the output
>>> (-gz (I think that does the right thing for the linker level
>>> compression too, otherwise -Wl,-compress-debug-sections might do
it))
>>> or are you already doing that in addition?
>> Sure. We use -Wl,-compress-debug-sections.
>>
>> Thank you, Alexey.
>>
>>>>> You mentioned that the usability cost of
>>>>> Split DWARF for your users was too high (or high enough to
justify
>>>>> this alternative work of DWARF-aware linking)? That all
seems a bit
>>>>> surprising to me - though I understand the deployment issues of Split
>>>>> DWARF do present some challenges to users in more heterogeneous
>>>>> environments than Google's... still, I'd have thought there was some
>>>>> hope there)
>>>> Our tools do not support split DWARF yet, though we plan to implement it.
>>>> Once we have split DWARF support, it would be convenient to have an
>>>> easy way to share built debug binaries. llvm-dwp is the
>>>> answer to this. DWARFLinker could probably be another answer.
>>> Ah, fair enough - thanks for the context!
>>>>>>> One way to do that would be to have a CU-local type
indirection table.
>>>>>>> DIEs reference local type numbers (like local
address/string numbers -
>>>>>>> addrx/strx/rnglistx) and that table contains either
sig8 (no linker
>>>>>>> fixups required) or the local type offsets you
describe - the linker
>>>>>>> would then only need to read this type number
indirection table and
>>>>>>> rewrite them to the final type numbers.
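
To make the indirection-table idea above concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `TypeSlot` layout, the `rewrite_type_table` helper, and the offset-to-final-number map are illustration only, not an existing DWARF form or LLVM API. It only shows that the linker would rewrite the per-CU table and never look at the DIEs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TypeSlot:
    # Hypothetical slot layout: kind is "sig8" or "local".
    kind: str
    value: int  # a type signature, or an offset/number in the type table


def rewrite_type_table(slots: List[TypeSlot],
                       final_number_of: Dict[int, int]) -> List[TypeSlot]:
    """Return a copy of one CU's indirection table with local type
    offsets replaced by the final type numbers chosen by the linker.

    DIEs keep referring to slot indexes (in the spirit of
    addrx/strx/rnglistx), so they are not parsed or modified here.
    """
    out = []
    for slot in slots:
        if slot.kind == "sig8":
            # Signatures are position independent: no fixup required.
            out.append(TypeSlot("sig8", slot.value))
        else:
            out.append(TypeSlot("local", final_number_of[slot.value]))
    return out


table = [TypeSlot("local", 0x40),
         TypeSlot("sig8", 0x1122334455667788),
         TypeSlot("local", 0x90)]
linked = rewrite_type_table(table, {0x40: 7, 0x90: 12})
```

The cost of the fixup is proportional to the number of table slots, not to the number of DIEs that reference types.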
>>>>>> Yes, that could be additionally done if this process
would be time-consuming.
>>>>>>
>>>>>> David, thank you for all your comments and
explanations. They are extremely helpful.
>>>>> Sure thing - really appreciate your patience with all this
- it's... a
>>>>> lot of moving parts.
>>>>> - Dave
>>>>> Thank you, Alexey.
>>>>>
>>>>>> A sig8 hash-id would be used to compare types and to deduplicate them.
>>>>>> It would speed up the current dsymutil context analysis.
>>>>>> Types having the same hash-id could be deduplicated.
>>>>>> This would allow deduplicating more types than the current dsymutil does.
>>>>>> Incomplete type definitions having a similar set of members are not deduplicated by dsymutil currently.
>>>>>> In this case they would have the same hash-id.
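
The hash-id deduplication described above can be sketched as follows. This is a rough illustration, not dsymutil's implementation: the `dedup_types` helper and the `(hash_id, payload)` representation of a type are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

TypeEntry = Tuple[int, str]  # (hash_id, payload); hypothetical representation


def dedup_types(types: Iterable[TypeEntry]) -> Tuple[List[TypeEntry], Dict[int, int]]:
    """Keep the first type seen for each hash-id.

    Returns the kept types and a map from hash-id to the index of the
    kept copy, which is what references would be redirected to.
    No names or namespaces are inspected: equality is decided by the
    hash-id alone.
    """
    kept: List[TypeEntry] = []
    index_of: Dict[int, int] = {}
    for hash_id, payload in types:
        if hash_id not in index_of:
            index_of[hash_id] = len(kept)
            kept.append((hash_id, payload))
    return kept, index_of


inputs = [(0xA1, "struct Foo from a.o"),
          (0xB2, "struct Bar from a.o"),
          (0xA1, "struct Foo from b.o")]
kept, index_of = dedup_types(inputs)
```

Each incoming type costs one dictionary lookup, in contrast to building and comparing fully qualified names.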
>>>>>>
>>>>>> This "type table" would take less space than
current "type units" and current ODR solution.
>>>>>>
>>>>>> The above is just an idea on how to help a DWARF-aware linker (based on the idea of removing obsolete debug info)
>>>>>> work faster (if that is interesting).
>>>>>>
>>>>>> Alexey.
>>>>>>
>>>>>>> From: llvm-dev <llvm-dev-bounces at
lists.llvm.org> On Behalf Of James Henderson via llvm-dev
>>>>>>> Sent: Wednesday, June 3, 2020 3:48 AM
>>>>>>> To: David Blaikie <dblaikie at gmail.com>
>>>>>>> Cc: llvm-dev at lists.llvm.org
>>>>>>> Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD]
Remove obsolete debug info in lld.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> It makes me sad that the linker (via a library or
otherwise) has to be "DWARF-aware" to be able to effectively handle
--gc-sections, COMDATs, --icf etc for debug info, without leaving large blocks
of data kicking around.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The patching to -1 (or equivalent) is probably a
good lightweight solution (though I'd love it if it could be done based on
section type in the future rather than section name, but that's probably
outside the realm of DWARF), as it requires only minimal understanding in the
linker, but anything beyond that seems to be complicated logic that is mostly
due to the structure of DWARF. Patching to -1 does feel a bit like a sticking
plaster/band aid to patch over the issue rather than properly solving it too -
there will still be debug data (potentially significant amounts in COMDAT-heavy
objects) that the linker has to write and the debugger has to somehow know how
to skip (even if it knows that -1 is special-case due to the standard being
updated, it needs to get as far as the -1), which is all wasted effort.
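
A small sketch of what "patching to -1" means for a consumer, assuming 64-bit addresses and an all-ones tombstone value. The entry layout and the `live_entries` helper are hypothetical and do not correspond to any debugger's real API.

```python
TOMBSTONE_64 = 0xFFFF_FFFF_FFFF_FFFF  # "-1" as an unsigned 64-bit address


def live_entries(entries):
    """entries: list of (low_pc, size, name) tuples.

    Drops entries whose address was patched to the tombstone value.
    Every entry is still read before it can be rejected, which is the
    wasted effort mentioned above.
    """
    return [e for e in entries if e[0] != TOMBSTONE_64]


entries = [(0x1000, 0x20, "main"),
           (TOMBSTONE_64, 0x10, "unused_inline"),
           (0x1020, 0x08, "helper")]
live = live_entries(entries)
```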
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> We've already seen from Alexey's
prototyping, and from our own experiences with the Sony proprietary linker
(which tried to rewrite .debug_line only) that deconstructing the DWARF so that
it can be more optimally reassembled at link time is slow going, and will
probably inevitably remain so however much effort is put into optimising it. For a
start, given the current standards, it's impossible to know how to
deconstruct it without having to parse vast amounts of DWARF, which is typically
going to mean a lot more parsing work than the linker would normally have to
deal with. Additionally, much of this parsing work is wasted effort, since it
seems unlikely in many links that large amounts of the DWARF will be redundant.
Having an option to opt-in doesn't help much there, since it just means the
logic exists without most people using it, due to it not being good enough, or
potentially they don't even know it exists.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I don't have particularly concrete suggestions
as to how to solve the structural problems with DWARF at this point. The only
thing that seems obvious to me is a more "blessed" approach to
fragmentation of sections, similar to what I tried with my prototype mentioned
earlier in the thread, although we'd need to figure out the previously
stated performance issues. Other ideas might tie into this, like somehow sharing
the various table headers a bit like CIEs in .eh_frame that could be merged by
the linker - each object could have separate table header sections, which are
referenced by the individual .debug_* blocks, which in turn are one per
function/data piece and easily discardable/merged by the linker.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Just some thoughts.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> James
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 2 Jun 2020 at 19:24, David Blaikie via
llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>> On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
>>>>>>> <alapshin at accesssoftek.com> wrote:
>>>>>>>> Hi David, please find my comments inside:
>>>>>>>>
>>>>>>>>
>>>>>>>>>>> Broad question: Do you have any
specific motivation/users/etc in implementing this (if you can speak about it)?
>>>>>>>>>>> - it might help motivate the work,
understand what tradeoffs might be suitable for you/your users, etc.
>>>>>>>>>> There are two general requirements:
>>>>>>>>>> 1) Remove (or clean) invalid debug
info.
>>>>>>>>> Perhaps a simpler direct solution for your
immediate needs might be a much narrower,
>>>>>>>>> and more efficient linker-DWARF-awareness
feature:
>>>>>>>>>
>>>>>>>>> With DWARFv5, rnglists present an
opportunity for a DWARF linker to rewrite the ranges
>>>>>>>>> without parsing the rest of the DWARF.
/technically/ this isn't guaranteed - rnglist entries
>>>>>>>>> can be referenced either directly, or by
index. If all rnglists are referenced by index, then
>>>>>>>>> a linker could parse only the
debug_rnglists section and rewrite ranges to remove any
>>>>>>>>> address ranges that refer to optimized-out
code.
>>>>>>>>>
>>>>>>>>> This would only be correct for rnglists
that had no direct references to them (that only were
>>>>>>>>> referenced via the indexes) - but we could
either implement it with that assumption, or could
>>>>>>>>> add an LLVM extension attribute on the CU that would say "I promise I only referenced rnglists
>>>>>>>>> via rnglistx forms/indexes". If this DWARF-aware linking would have to read the CU DIE (not
>>>>>>>>> all the other DIEs) it /could/ also then rewrite high/low_pc if the CU wasn't using ranges...
>>>>>>>>> but that wouldn't come up in the function-removal case, because then you'd have ranges anyway,
>>>>>>>>> so no need for that.
>>>>>>>>>
>>>>>>>>> Such a DWARF-aware rnglist linking could
also simplify rnglists, in cases where functions
>>>>>>>>> ended up being laid out next to each other,
the linker could coalesce their ranges together.
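
The range rewriting and coalescing described above might look roughly like this. It is a sketch under stated assumptions: ranges are already decoded into half-open `(start, end)` pairs, and the `is_live` predicate stands in for the linker's knowledge of which sections were discarded. The `rewrite_ranges` name is hypothetical, and no real .debug_rnglists encoding is parsed here.

```python
def rewrite_ranges(ranges, is_live):
    """Drop ranges that refer to discarded code, then merge ranges
    that ended up adjacent (or overlapping) in the final layout.

    ranges:  list of half-open (start, end) address pairs for one CU.
    is_live: predicate telling whether a range still has code behind it.
    """
    live = sorted(r for r in ranges if is_live(r))
    merged = []
    for start, end in live:
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


discarded = {(0x0, 0x10)}  # a function removed by --gc-sections
cu_ranges = [(0x2000, 0x2040), (0x0, 0x10), (0x2040, 0x2080), (0x3000, 0x3010)]
result = rewrite_ranges(cu_ranges, lambda r: r not in discarded)
# result == [(0x2000, 0x2080), (0x3000, 0x3010)]
```

Only the range list is touched; nothing in .debug_info needs to be read as long as the lists are referenced by index.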
>>>>>>>>>
>>>>>>>>> I imagine this could be implemented with
very little overhead to linking, especially compared
>>>>>>>>> to the overhead of full DWARF-aware
linking.
>>>>>>>>>
>>>>>>>>> Though none of this fixes Split DWARF,
where the linker doesn't get a chance to see the
>>>>>>>>> addresses being used - but if you only
want/need the CU-level ranges to be correct, this
>>>>>>>>> might be a viable fix, and quite efficient.
>>>>>>>> Yes, we think about that alternative. This
would resolve our problem of invalid debug info
>>>>>>>> and would work much faster. Thus, if we would
not have good results for D74169 then we
>>>>>>>> will implement it. Do you think it could be
useful to have this solution in upstream?
>>>>>>> A pure rnglist rewriting - I think it'd be OK
to have in upstream -
>>>>>>> again, cost/benefit/etc would have to be weighed.
I'm not sure it
>>>>>>> would save enough space to be particularly valuable
beyond the
>>>>>>> correctness issue - and it doesn't completely
solve the correctness
>>>>>>> issue for zero-address usage or low-address usage
(because you could
>>>>>>> still have overlapping subprograms inside a CU - so
if you were
>>>>>>> symbolizing you could use the correct rnglist to
filter, but then go
>>>>>>> look inside the CU only to find two subprograms that had that address
>>>>>>> & not know which one was the correct one and which one was the
>>>>>>> discarded one).
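
The remaining ambiguity can be shown with a tiny sketch. It assumes a target where .text starts at address 0 and a discarded function whose addresses were resolved to 0. The subprogram names and the `subprograms_at` helper are made up for the example.

```python
def subprograms_at(addr, subprograms):
    """Return the names of all subprograms whose [low, high) covers addr."""
    return [name for name, low, high in subprograms if low <= addr < high]


# Both entries live in the same CU; only "start" has real code behind it.
subprograms = [("start", 0x0, 0x20),
               ("discarded_fn", 0x0, 0x18)]
hits = subprograms_at(0x4, subprograms)
# hits == ["start", "discarded_fn"]
```

A corrected CU-level range list still selects this CU for address 0x4, but the lookup inside the CU returns two candidates with nothing to tell them apart.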
>>>>>>>
>>>>>>> rnglist rewriting might be easy enough to prototype
- but depends what
>>>>>>> you want to spend your time on, I know this whole
issue has been a
>>>>>>> huge investment of your time already - but maybe
this recent
>>>>>>> revitalization of the conversation around having an
explicit value in
>>>>>>> the linker might be sufficient to address
everyone's needs... *fingers
>>>>>>> crossed*)
>>>>>>>
>>>>>>>
>>>>>>>>>> 2) Optimize the DWARF size.
>>>>>>>>
>>>>>>>>> Do your users care much about this? I
imagine if they had significant DWARF size issues,
>>>>>>>>> they'd have significant link time
issues and the kind of cost to link time this feature has would
>>>>>>>>> be prohibitive - but perhaps they're
sharing linked binaries much more often than they're
>>>>>>>>> actually performing linking.
>>>>>>>> Yes, they do. They also have significant
link-time issues.
>>>>>>>> So current performance results of D74169 are
not very acceptable.
>>>>>>>> We hope to improve it.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> The specifics which our users have:
>>>>>>>>>>   - embedded platform which uses 0 as
start of .text section.
>>>>>>>>>>   - custom toolset which does not
support all features yet(f.e. split dwarf).
>>>>>>>>>>   - tolerant of the link-time increase.
>>>>>>>>>>   - need a useful way to share debug
builds.
>>>>>>>>
>>>>>>>>> Sharing two files (executable and dwp) is
significantly less useful than sharing one file?
>>>>>>>> Probably not significantly, but yes, it looks less useful compared to D74169.
>>>>>>>> Having only two files (executable and .dwp) looks significantly better than having an executable and multiple .dwo files.
>>>>>>>> Having only one file (executable) with minimal size looks better than two files with a bigger size.
>>>>>>>>
>>>>>>>> clang built with -gsplit-dwarf takes 0.9G for the executable and 0.9G for the .dwp.
>>>>>>>> clang built with --gc-debuginfo takes only 0.76G for a single executable.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> For the first point: we have a problem
"Overlapping address ranges starting from 0"(D59553).
>>>>>>>>>> We use custom solution, but the general
solution like D74169 would be better here.
>>>>>>>>
>>>>>>>>> If CU ranges are the only ones that need
fixing, then I think the above solution might be as
>>>>>>>>> good/better - if more than CU ranges need
fixing, then I think we might want to start talking about
>>>>>>>>> how to fix DWARF itself (split and
non-split) to signal certain addresses point to dead code with a
>>>>>>>>> specific blessed value that linkers would
need to implement - because with Split DWARF there's
>>>>>>>>> no way to solve the non-CU addresses at the
linker.
>>>>>>>> I think a worthwhile solution for that signal value would be LowPC > HighPC.
>>>>>>>> That does not require additional bits in DWARF.
>>>>>>>> It would be natural to skip such address ranges since they are explicitly marked as invalid.
>>>>>>>> It could be implemented in a linker very easily. Probably, it would make sense to describe that
>>>>>>>> usage in the DWARF standard.
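
A minimal sketch of the LowPC > HighPC convention proposed above, from the consumer's side. It assumes both bounds are available as absolute addresses (not the offset form of high_pc), and the helper names are hypothetical.

```python
def is_marked_invalid(low_pc, high_pc):
    """Under the proposed convention, LowPC > HighPC marks dead code."""
    return low_pc > high_pc


def valid_ranges(ranges):
    """Filter out ranges that the linker marked as invalid."""
    return [(lo, hi) for lo, hi in ranges if not is_marked_invalid(lo, hi)]


ranges = [(0x1000, 0x1040),
          (0x1, 0x0),        # marked invalid by the linker
          (0x2000, 0x2010)]
kept = valid_ranges(ranges)
# kept == [(0x1000, 0x1040), (0x2000, 0x2010)]
```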
>>>>>>>>
>>>>>>>> As to the addresses which are not seen by the linker (since they are in .dwo files) - yes,
>>>>>>>> they need another solution. Could you show an example of such a case, please?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>>> 2. Support of type units.
>>>>>>>>>>>>   That could be implemented
further.
>>>>>>>>>>> Enabling type units increases
object size to make it easier to deduplicate at link time by a DWARF-unaware
>>>>>>>>>>> linker. With a DWARF aware linker
it'd be generally desirable not to have to add that object size overhead to
>>>>>>>>>>> get the linking improvements.
>>>>>>>>>> But, DWARFLinker should adequately work
with type units since they are already implemented.
>>>>>>>>
>>>>>>>>> Maybe - it'd be nice & all, but I don't think it's an outright necessity - if someone knows they're using
>>>>>>>>> a DWARF-aware linker, they'd probably not use type units in their object files. It's possible someone
>>>>>>>>> doesn't know for sure & maybe they have pre-canned debug object files from someone else, etc.
>>>>>>>> I see.
>>>>>>>>
>>>>>>>>>> Another thing is that the idea behind
type units has the potential to help Dwarf-aware linker to work faster.
>>>>>>>>>> Currently, DWARFLinker analyzes context
to understand whether types are the same or not.
>>>>>>>>
>>>>>>>>> When you say "analyzes context"
what do you mean? Usually I'd take that to mean
>>>>>>>>> "looks at things outside the type
itself - like what namespace it's in, etc" - which, yes,
>>>>>>>>> it should do that, but it doesn't seem
very expensive to do. But I guess you actually
>>>>>>>>> mean something about doing structural
equivalence in some way, looking at things inside the type?
>>>>>>>> I think it could be useful for both cases. Currently, dsymutil does only the first thing
>>>>>>>> (looking at the type name, namespace name, etc.) and does not do the second thing
>>>>>>>> (checking structural equivalence). Analyzing type names is currently quite expensive
>>>>>>>> (searching the string pool alone takes ~10 sec of the 70 sec overall time).
>>>>>>>> It is expensive because many things must be done to work with strings:
>>>>>>>> parse DWARF, search for and resolve relocations, compute a hash for strings,
>>>>>>>> put data into a string pool, create a fully qualified name (like namespace::function::name).
>>>>>>>> It looks like it could be optimized to require less time, but it would still be a noticeable
>>>>>>>> part of the overall time.
>>>>>>>>
>>>>>>>> If dsymutil starts to check for structural equivalence, then the process would be even slower.
>>>>>>>> So, if a single hash-id were checked instead of comparing type structure, this process
>>>>>>>> would also be faster.
>>>>>>>>
>>>>>>>> Thus I think using a hash-id to compare types would make the current implementation faster and would
>>>>>>>> also allow DWARFLinker to handle incomplete types without massive performance degradation.
>>>>>>>>
>>>>>>>>>> But the context is known when types are generated. So, there is no need to spend time analyzing it.
>>>>>>>>>> If types could be compared without analyzing context, then a DWARF-aware linker would work faster.
>>>>>>>>>> That is just an idea (not for immediate implementation): if types were stored in some "type table"
>>>>>>>>>> (instead of a COMDAT section group) and could be accessed through a hash-id (like type units),
>>>>>>>>>> then it would be a solution requiring fewer bits to store while allowing types to be compared
>>>>>>>>>> by hash-id (without analysing context).
>>>>>>>>>> In this case, the size increase would be small, and processing could be faster.
>>>>>>>>>>
>>>>>>>>>> This is just an idea and could be discussed separately from the problem of integrating D74169.
>>>>>>>>>>>> 6. -flto=thin
>>>>>>>>>>>>     That problem was described
in this review https://reviews.llvm.org/D54747#1503720. It also exists in
>>>>>>>>>>>> current DWARFLinker/dsymutil
implementation. I think that problem should be discussed more: it could
>>>>>>>>>>>> probably be fixed by avoiding
generation of such incomplete declaration during thinlto,
>>>>>>>>>>>> That would be costly to produce
extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
>>>>>>>>>>>> more to reduce that redundancy
early on (actually removing definitions from some llvm Modules if the type
>>>>>>>>>>>> definition is known to exist in
another Module, etc)
>>>>>>>>>>> I don't know if it's a
problem since that patch was reverted.
>>>>>>>>>> Yes. That patch was reverted, but this
patch(D74169) has the same problem.
>>>>>>>>>> if D74169 would be applied and
--gc-debuginfo used then structure type
>>>>>>>>>> definition would be removed.
>>>>>>>>>> DWARFLinker could handle that case -
"removing definitions from some llvm Modules if the type
>>>>>>>>>> definition is known to exist in another
Module".
>>>>>>>>>> i.e. DWARFLinker could replace the
declaration with the definition.
>>>>>>>>>> But that problem could be more easily
resolved when debug info is generated(probably without
>>>>>>>>>> significant increase of debug info
size):
>>>>>>>>>> Here we have:
>>>>>>>>>> DW_TAG_compile_unit(0x0000000b) -
compile unit containing concrete instance for function "f".
>>>>>>>>>> DW_TAG_compile_unit(0x00000073) -
compile unit containing abstract instance root for function "f".
>>>>>>>>>> DW_TAG_compile_unit(0x000000c1) -
compile unit containing function "f" definition.
>>>>>>>>>> Code for function "f" was deleted. gc-debuginfo deletes compile unit DW_TAG_compile_unit(0x000000c1)
>>>>>>>>>> containing the "f" definition (since there is no corresponding code). But it has the structure "Foo" definition
>>>>>>>>>> DW_TAG_structure_type(0x0000011e), referenced from DW_TAG_compile_unit(0x00000073)
>>>>>>>>>> by declaration DW_TAG_structure_type(0x000000ae). That declaration is exactly the case where the definition
>>>>>>>>>> was removed by thinlto and replaced with a declaration.
>>>>>>>>>> Would it cost too much if the type definition were not replaced with a declaration for the "abstract instance root"?
>>>>>>>>>> The number of concrete instances is bigger than the number of abstract instance roots.
>>>>>>>>>> Probably, it would not be too costly to leave the definition in the abstract instance root?
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Alternatively, would it cost too much if the type definition were not replaced with a declaration when
>>>>>>>>>> the declaration references the type from an unused function? (LTO could understand that a concrete function is not used.)
>>>>>>>>
>>>>>>>>> I don't follow this example - could you
provide a small concrete test case I could reproduce?
>>>>>>>> I would provide a test case if necessary. But
it looks like this issue is finally clear, and you already commented on that.
>>>>>>>>
>>>>>>>>> Oh, I guess this is happening perhaps
because ThinLTO can't know for sure that a standalone
>>>>>>>>> definition of 'f' won't be
needed - so it produces one in case one of the inlining opportunities
>>>>>>>>> doesn't end up inlining. Then it turns
out all calls got inlined, so the external definition wasn't needed.
>>>>>>>>> Oh, you're suggesting that these 3 CUs
got emitted into one object file during LTO, but that DWARFLinker
>>>>>>>>> drops a CU without any code in it - even
though... So far as I know, in LTO, LLVM directly references
>>>>>>>>> types across units if the CUs are all
emitted in the same object file. (and if they weren't in the same
>>>>>>>>> object file - then the abstract_origin
couldn't be pointing cross-CU).
>>>>>>>>> I guess some basic things to say:
>>>>>>>>> With ThinLTO, the concrete/standalone
function definition is emitted in case some call sites don't end up
>>>>>>>>> being inlined. So we know it'll be
emitted (but might not be needed by the actual linker)
>>>>>>>>> Any number of inline calls might exist - but we shouldn't put the type information into those, because
>>>>>>>>> they aren't guaranteed to emit it (if the inline function gets optimized away, there would be nothing to
>>>>>>>>> enforce the type being emitted) - and even if we forced the type information to be emitted into one
>>>>>>>>> object file that has an inline copy of the function - there's no guarantee that object file will get linked in either.
>>>>>>>>> So, no, I don't think there's much
we can do to keep the size of object files down, while guaranteeing
>>>>>>>>> the type information will be emitted with
the usual linker semantics.
>>>>>>>> Then dsymutil/DWARFLinker could be changed to handle that (though it would probably not be very efficient).
>>>>>>>> If thinlto understood that the function is not used in the end (and thus must not contain the referenced type definition),
>>>>>>>> then this situation could be handled more effectively.
>>>>>>>>
>>>>>>>> Thank you, Alexey.
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
_______________________________________________
>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
