thr3ads.net - llvm dev - [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld. [Jul 2020]

Alexey Lapshin via llvm-dev

2020-Jul-31 11:01 UTC

[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

On 28.07.2020 19:28, David Blaikie wrote:
> On Tue, Jul 28, 2020 at 8:55 AM Alexey Lapshin <avl.lapshin at gmail.com> wrote:
>>
>> On 28.07.2020 10:29, David Blaikie via llvm-dev wrote:
>>> On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
>>> <alapshin at accesssoftek.com> wrote:
>>>>>>>>>>>> This idea goes in another direction than fragmenting dwarf
>>>>>>>>>>>> using elf sections&tricks. It seems to me that the cost of fragmenting is too high.
>>>>>>>>>>> I tend to agree - but I'm sort of leaning towards trying to use object
>>>>>>>>>>> features as much as possible, then implementing just enough custom
>>>>>>>>>>> handling in the linker to recoup overhead, etc. (eg: add some kind of
>>>>>>>>>>> small header/brief description that makes it easy for the linker to
>>>>>>>>>>> slice-and-dice - but hopefully a domain-specific such header can be a
>>>>>>>>>>> bit more compact than the fully general ELF form)
>>>>>>>>>> I think this indeed should be implemented and evaluated.
>>>>>>>>>> So that various approaches could be compared.
>>>>>>>>>>
>>>>>>>>>>>> It is not only the sizes of structures describing fragments but also the complexity
>>>>>>>>>>>> of tools that should be taught to work with fragmented DWARF.
>>>>>>>>>>>> (f.e. llvm-dwarfdump applied to object file should be able to read fragmented DWARF,
>>>>>>>>>>>> but applied to linked executable it should work with non-fragmented DWARF).
>>>>>>>>>>>> That idea is for the tool which works the same way as dsymutil ODR.
>>>>>>>>>>>>
>>>>>>>>>>>> I will shortly describe the idea of making DWARF be easier processed by dsymutil/DWARFLinker:
>>>>>>>>>>>>
>>>>>>>>>>>> The idea is to have only one "type table" per object file(special section .debug_types_table).
>>>>>>>>>>>> This "type table" would contain all types.
>>>>>>>>>>>> There could be a special type of reference - type_offset - that offset points into the type table.
>>>>>>>>>>>> Basic types could always be placed into the start of "type table" thus, offsets to basic types
>>>>>>>>>>>> most often would be 1 byte. There also would be a special kind of reference - reference inside the type.
>>>>>>>>>>>> Type units sig8 system - would not be used to reference types.
>>>>>>>>>>>>
>>>>>>>>>>>> Types deduplication is assumed to be done, not by linker mechanism for COMDAT,
>>>>>>>>>>>> but by a tool like dsymutil. This tool would create resulting .debug_types_table by putting there
>>>>>>>>>>>> types from source .debug_types_table-s. Only one copy of the type would be placed into the
>>>>>>>>>>>> resulting table. All references pointing to the deleted copy would be corrected to point
>>>>>>>>>>>> to the single copy inside "type table". (that is how dsymutil works currently)
>>>>>>>>>>> ^ that's the step that's probably a bit expensive for a general-use
>>>>>>>>>>> tool - it implies parsing all the DWARF to find those references and
>>>>>>>>>>> rewrite them, I think. For a high-performance solution that could be
>>>>>>>>>>> run by the linker I think it'd be necessary to have a solution that
>>>>>>>>>>> doesn't involve parsing all the DIEs.
>>>>>>>>>> According to the current dsymutil processing,
>>>>>>>>>> exactly this process is not the most time-consuming.
>>>>>>>>>> That could be done relatively fast.
>>>>>>>>> Fair enough - though I'd still imagine any solution that involves
>>>>>>>>> parsing all the DIEs still wouldn't be fast enough (maybe an order of
>>>>>>>>> magnitude faster than the current solution even - but that's still,
>>>>>>>>> what, 6 or 7x slower than linking without the feature?) for most users
>>>>>>>>> to consider it a good trade-off.
>>>>>>>> It seems to me that even the current 6x-7x slowdown could be useful.
>>>>>>>> Users who already use dsymutil or llvm-dwp(assuming DWARFLinker
>>>>>>>> would be taught to work with a split dwarf) tools spend this time and,
>>>>>>>> in some scenarios, waste disk space by intermediate files.
>>>>>>> FWIW, dwp (llvm-dwp hasn't really been optimized compared to binutils
>>>>>>> dwp) is designed to be very quick - by not needing to do a lot of
>>>>>>> parsing/fixups. Which, yes, means larger output files than would be
>>>>>>> possible with more parsing/etc. It also doesn't take any input from
>>>>>>> the linker (so it can run in parallel with the linker) - so it can't
>>>>>>> remove dead subprograms. Given Google's the major (perhaps only
>>>>>>> significant?) user of Split DWARF - I can say that the needs don't
>>>>>>> necessarily overlap well with something that would take significantly
>>>>>>> longer to run or use significantly more memory. Faster/cheaper/with
>>>>>>> somewhat bigger output files is probably the right tradeoff for
>>>>>>> Google's use case, at least.
>>>>>>>
>>>>>>> I imagine Apple's use for dsymutil is somewhat similar - it's not used
>>>>>>> in the iterative development cycle, only in final releases - well,
>>>>>>> maybe their situation is more "neutral" - not a major pain point in
>>>>>>> any case I'd guess.
>>>>>>>
>>>>>>>
>>>>>> I see. FWIW, Comparison splitdwarf+dwp and DWARFLinker from lld:
>>>>>>
>>>>>> 1. split-dwarf+llvm-dwp = linking time for clang 6 sec,
>>>>>>       generating time for .dwp 53 sec, clang=997M clang.dwp=1.1G.
>>>>> FWIW, llvm-dwp is not very well optimized (which is to say: it is not
>>>>> optimized), binutils dwp might be a better comparison (& even that
>>>>> doesn't have the parallelism & some potential further memory savings
>>>>> that lld has that we could take advantage of in a dwp-like tool)
>>>>>
>>>>> What build mode was the clang binary built in? Optimized or unoptimized?
>>>> right, that is unoptimized build with -ffunction-sections.
>>>>
>>>>>> 2. DWARFLinker from lld = linking time for clang 72 sec, clang=760M.
>>> And this is without Split DWARF? Without linker DWARF compression? -
>>> that seems quite a bit surprising, that the deduplication of DWARF
>>> could fit into less space than the wasted/reclaimed space in ranges (&
>>> line)?
>> that was without split dwarf, without linker compression.
>>
>>> Could you double check these numbers & provide a clearer summary?
>> sure, I would re-check it.
>>
>>> Here's my attempt at numbers (all with function-sections+gc-sections)...
>>>
>>> Split DWARF tests didn't seem meaningful - gc-debuginfo + split DWARF
>>> seemed to drop all the debug info (except gdb_index) so wasn't
>>> working/comparison wasn't meaningful for Apples to Apples, but
>>> included it for comparing gc'd non-split to non-gc'd split (disabled
>>> gnu-pubnames/gdb-index (-gsplit-dwarf -gno-gnu-pubnames) (which turns
>>> on by default with Split DWARF because gdb needs it - but a bit of an
>>> unfair comparison without turning on gnu-pubnames/gdb-index in other
>>> build modes too, since it... /shouldn't/ be necessary) which might've
>>> been a factor in the data you were looking at)
>> that might be the case. i.e. clang=997M for split dwarf(from my previous
>> measurement) might include gnu-pubnames.
>>
>> would recheck it and if that is the case then it is an unfair comparison.
>>
>>
>> My point was that "DWARFLinker from lld" takes less space than singleton
>> split dwarf file+.dwp file.
>>
>> for -O0 uncompressed:
>>
>> - .dwp took 1.1G(if I built it correctly), singleton clang(from your
>> measurements) 566 MB
>>
>>      overall 1.6G.
> Oh, yeah, even if there are some measurement issues, linked executable
> + .dwp is going to be larger than a linked executable using non-split
> DWARF (in v5), since v5 uses all the same representations as non-split
> DWARF, and split DWARF adds the indirection overhead of a split file,
> etc.
>
> Even without DWARF linking, it's true that split DWARF has overhead
> (dwp+executable will be larger than executable non-split).
>
> But maybe we've ended up down a bit of a tangent in any case.
>
> Trying to bring this back to "should this be committed to lld" seems
> valuable, and I'm not sure what the right criteria are for that.

I think it would be useful to do "removing obsolete debug info"
in the linker. First thing is that it would be the fastest way(no need
to copy data/create temp files/build address map...) Second thing
is that it would be a good separation of concepts. All debug info
processing, currently done in the linker(gdb_index, upcoming
debug_names), could be moved into separate library processing
debug info. When gdb_index/debug_names should be built without
"removing of obsolete debug info" it would have the same
performance results as it currently has.

We decided to give the idea of "removing of obsolete debug info"
another try and are going to implement it as a separate utility
working with a built binary. Making it multi-threaded would
probably show better performance results, and then it could
probably be considered acceptable to use from the linker.

Alexey.
>
> Ray's the best person to weigh in on that. My 2c is that I think it
> probably is worthwhile, even just as an experiment, assuming it's not
> too intrusive to lld.
>
>> - The "DWARFLinker from lld" 820 MB(from your measurements).
>>
>>
>> So "DWARFLinker from lld" looks two times better.
>>
>>
>> Anyway, thank you for pointing me to possible mistake. I would recheck
>> it and update results.
>>
>>
>> Alexey.
>>
>>
>>> * -O0: (baseline, just using strip -g: 356 MB)
>>>     * compressed: 25% smaller with gc-debuginfo (481 MB / 641 MB) (407 MB split/non-gc)
>>>     * uncompressed: 30% smaller (820 MB / 1.2 GB) (566 MB split/non-gc)
>>> * -O3: (baseline: 116 MB)
>>>     * compressed: 16% smaller (361 MB / 462 MB) (283 MB split/non-gc)
>>>     * uncompressed: 22% smaller (1022 MB / 1.2 GB) (156 MB split/non-gc)
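
The percentages in the size list above can be re-derived from the quoted sizes. The following is only a rough check, not part of the original measurements: it assumes "1.2 GB" means 1200 MB and ".dwp took 1.1G" means 1100 MB, and the helper name `percent_smaller` is made up for illustration.

```python
def percent_smaller(after_mb: float, before_mb: float) -> float:
    """Return how much smaller `after_mb` is than `before_mb`, in percent."""
    return (1.0 - after_mb / before_mb) * 100.0

# (gc-debuginfo size, non-gc size) in MB, copied from the list above.
# Assumption: "1.2 GB" is treated as 1200 MB, so results are approximate.
sizes = {
    "O0 compressed": (481, 641),
    "O0 uncompressed": (820, 1200),
    "O3 compressed": (361, 462),
    "O3 uncompressed": (1022, 1200),
}
reductions = {name: round(percent_smaller(a, b), 1) for name, (a, b) in sizes.items()}

# -O0 uncompressed: split executable (566 MB) plus .dwp (assumed 1100 MB)
# compared with the 820 MB gc-debuginfo executable.
split_vs_linked = (1100 + 566) / 820

for name, pct in reductions.items():
    print(f"{name}: {pct}% smaller")
print(f"split+dwp is {split_vs_linked:.2f}x the gc-debuginfo size")
```

Under these rounding assumptions the -O0 figures land close to the quoted 25% and 30%, and the "two times better" ratio holds. The -O3 figures come out at roughly 22% for compressed and 15% for uncompressed, which is the reverse of the order listed; the thread does not say whether that comes from rounding of the 1.2 GB figure or from the labels.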
>>>
>>>
>>>
>>>
>>> On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
>>> <alapshin at accesssoftek.com> wrote:
>>>>>> 2. DWARFLinker from lld = linking time for clang 72 sec, clang=760M.
>>>>> It does seem a tad strange that the clang binary would be smaller
>>>>> non-split with DWARF linking than it was split. Though I could imagine
>>>>> this might be possible in an optimized build (where debug_ranges
>>>>> become quite relatively expensive in the .o file contribution with
>>>>> Split DWARF)
>>>>> Could you compare the section sizes between these two clang binaries, perhaps?
>>>> .debug_ranges is three times bigger and .debug_line is twice bigger.
>>>>
>>>>>>>> Thus if they would use this LLD feature in its current state
>>>>>>>> - they would still receive benefits.
>>>>>>>>
>>>>>>>> Speaking of performance results - LLD is a multi-thread linker;
>>>>>>>> it handles sections in parallel. DWARFLinker generates DWARF using
>>>>>>>> AsmPrinter which is a stream - so it could make resulting DWARF only
>>>>>>>> continuously. It is not surprising that the parallel solution works faster.
>>>>>>>> Making DWARFLinker truly multi-threaded would probably allow us
>>>>>>>> to make slowdown to be at 2x-4x range.
>>>>>>> *nod* that's still a really expensive link - but I understand that's a
>>>>>>> suitable tradeoff for your users
>>>>>>>
>>>>>> Btw, 2x or 7x is for pure linking time. Overall compilation slowdown
>>>>>> is not so significant. Building LLVM codebase has only 20% slowdown.
>>>>> Understood - that's still quite significant to most users, I'd imagine.
>>>> I see.
>>>>
>>>>>>>>>> Anyway, I think the dsymutil approach is still valuable, and it
>>>>>>>>>> would be useful to optimize it.
>>>>>>>>>> Do you think it would be useful to make dsymutil/DWARFLinker truly multi-thread?
>>>>>>>>>> (To make dsymutil/DWARFLinker able to process each object file in a separate thread)
>>>>>>>>> Perhaps - that I'd probably leave up to the folks who are more
>>>>>>>>> invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get it
>>>>>>>>> integrated into llvm-dwp and then I'll be interested in getting as
>>>>>>>>> much performance out of it as lld - so multithreading and things would
>>>>>>>>> be on the books.
>>>>>>>> I think improving dsymutil is a valuable thing.
>>>>>>>> Though there are several directions which might be considered
>>>>>>>> to make it more robust:
>>>>>>>>
>>>>>>>> 1. support of latest DWARF - DWARF5/DWARF64...
>>>>>>> I expect/thought some of the Apple folks had already worked on DWARF5 support?
>>>>>>> DWARF64 - that's been around for a while, and just hasn't been needed
>>>>>>> by LLVM users thus far, it seems (until recently - where some
>>>>>>> developers have started working on that)
>>>>>> There already implemented debug_names table, but debug_rnglists,
>>>>>> debug_loclists, type units - are not implemented yet.
>>>>> Superficially, type units wouldn't be on the list of features (like
>>>>> DWARF64 - it's optional) I'd try to support in dsymutil - since their
>>>>> size overhead is more justified for a DWARF-agnostic linker that's
>>>>> using comdat groups. With a DWARF-aware linker I'd be specifically
>>>>> hoping to avoid using type units to help
>>>>>> The thing which
>>>>>> should probably be changed is that dsymutil should not have its version
>>>>>> of code generating DWARF tables. It should call already existed
>>>>>> DWARF5/DWARF64 implementations. Then dsymutil would always
>>>>>> use last DWARF generators.
>>>>> Possibly - I don't know what the architectural tradeoffs for that look
>>>>> like - I'd imagine DWARFLinker has sufficiently different
>>>>> needs/tradeoffs than LLVM's DWARF generation code (rewriting existing
>>>>> DIEs compared to building new ones from scratch, etc) that it might be
>>>>> hard for them to share a lot of their implementation.
>>>> It is not easy, and would require some additions, but it would benefit
>>>> in that all format implementation is in one place. Thus changing that place
>>>> would reflect in other places. There are at least three implementations for
>>>> .debug_ranges, .debug_aranges currently...
>>>>
>>>>
>>>>>>>> 2. implement multi-threaded execution.
>>>>>>>> 3. support of split DWARF.
>>>>>>> Maybe, though I'm still not sure it'd be the right tradeoff -
>>>>>>> especially if it involved having to wait to run the .dwo merger (call
>>>>>>> it DWARF-aware dwp, or dsymutil with dwp support) until after the
>>>>>>> linker ran.
>>>>>>>
>>>>>>>> 4. implement dsymutil for non-darwin platform.
>>>>>>> That's probably, essentially (3), more-or-less. Split DWARF is
>>>>>>> somewhat of a formalization of Apple's/MachO DWARF distribution model
>>>>>>> (leave DWARF in files that aren't linked/use them from a debugger,
>>>>>>> but also be able to merge them into some final file (dsym or dwp) for
>>>>>>> archival purposes)
>>>>>>>
>>>>>>>> All of this is a massive piece of work.
>>>>>>>> Our original investment was to solve two problems:
>>>>>>>>
>>>>>>>> 1. Overlapped address ranges, which is currently close to being solved. Thank you for helping with that!
>>>>>>> Yeah, again, sorry that's taken quite so long/somewhat circuitous route.
>>>>>>>
>>>>>>>> 2. Size of debug info. That still becomes an issue, but we are unsure whether we are ready to
>>>>>>>>      invest in solving all the above 1-4 problems and how much community interested in it.
>>>>>>> Fair, for sure - I don't think you'd need to sign up to solve all of
>>>>>>> them (don't think they necessarily need solving). Potentially moving
>>>>>>> the logic out into a separate tool as Fangrui's considering - a
>>>>>>> post-link DWARF optimizer, rather than in-linker DWARF optimization.
>>>>>>>
>>>>>>> I really don't want to give you the runaround like this - but multiple
>>>>>>> times slower links is something that seems pretty problematic for most
>>>>>>> users, to the point of weighing the maintainability of lld against the
>>>>>>> convenience of having this functionality in-linker rather than in a
>>>>>>> post-link optimizer.
>>>>>>>
>>>>>>> (I know you've spoken a bit before about your users needs - but if
>>>>>>> it's possible, could you explain (again :/) why they have such a
>>>>>>> strong need for smaller DWARF? While DWARF size is an ongoing concern
>>>>>>> for many users (Google certainly - hence the invention of Split DWARF,
>>>>>>> use of type units and compressed DWARF, etc) - usually it's in rather
>>>>>>> large programs, but it sounds like you're dealing with relatively
>>>>>>> small ones (otherwise the increase in link time, I'd imagine, would be
>>>>>>> prohibitive for your users?)?
>>>>>> We have many large programs and keep Daily/Nightly debug builds,
>>>>>> which takes a lot of disk space. Compilation time for these programs is big.
>>>>>> The scenario is "compile once".(not compile-debug-compile-debug).
>>>>>> So we think that solution(like dsymutil/DWARFLinker) would not slowdown
>>>>>> the compilation time of overall build significantly(see above numbers for
>>>>>> llvm codebase) and would allow us to reduce disk space required to keep
>>>>>> all of these builds.
>>>>> Ah, OK - for archival purposes. So the interactive developers wouldn't
>>>>> necessarily be using this feature. Makes sense - similar to dsymutil
>>>>> and dwp, mostly used for archival purposes & you can debug straight
>>>>> from .o/.dwos for interactive/iterative development.
>>>>
>>>>> In that case, it seems more likely that a separate tool might suffice.
>>>> agreed: if to continue the work on this then it makes sense to
>>>> do it as separate tool. Make it fast enough. And if there would be interest
>>>> in it - then it would probably be possible to return to idea calling it from linker.
>>>>
>>>>> Also, out of curiosity - have you tried just compressing the output
>>>>> (-gz (I think that does the right thing for the linker level
>>>>> compression too, otherwise -Wl,-compress-debug-sections might do it))
>>>>> or are you already doing that in addition?
>>>> sure. we use  -Wl,-compress-debug-sections.
>>>>
>>>> Thank you, Alexey.
>>>>
>>>>>>> You mentioned that the usability cost of
>>>>>>> Split DWARF for your users was too high (or high enough to justify
>>>>>>> this alternative work of DWARF-aware linking)? That all seems a bit
>>>>>>> surprising to me - though I understand the deployment issues of Split
>>>>>>> DWARF do present some challenges to users in more heterogeneous
>>>>>>> environments than Google's... still, I'd have thought there was some
>>>>>>> hope there)
>>>>>> Our tools do not support split dwarf yet. Though we plan to implement it.
>>>>>> When we would have support of split dwarf then it would be
>>>>>> convenient to have easy way to share built debug binaries. llvm-dwp is the
>>>>>> answer to this. DWARFLinker could probably be another answer.
>>>>> Ah, fair enough - thanks for the context!
>>>>>>>>> One way to do that would be to have a CU-local type indirection table.
>>>>>>>>> DIEs reference local type numbers (like local address/string numbers -
>>>>>>>>> addrx/strx/rnglistx) and that table contains either sig8 (no linker
>>>>>>>>> fixups required) or the local type offsets you describe - the linker
>>>>>>>>> would then only need to read this type number indirection table and
>>>>>>>>> rewrite them to the final type numbers.
>>>>>>>> Yes, that could be additionally done if this process would be time-consuming.
>>>>>>>>
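
The CU-local type indirection table suggested above could look roughly like the sketch below. This is not an existing DWARF or LLVM structure: `TypeEntry`, `rewrite_indirection_table` and the `final_offset_of` mapping are hypothetical names, and the only point shown is that the linker touches the small table while the DIEs keep their local type numbers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TypeEntry:
    """One slot of a hypothetical CU-local type indirection table."""
    kind: str   # "sig8": needs no linker fixup; "offset": local type offset
    value: int


def rewrite_indirection_table(
    table: List[TypeEntry], final_offset_of: Dict[int, int]
) -> List[TypeEntry]:
    """Rewrite only the table entries; DIEs are never parsed or modified."""
    rewritten = []
    for entry in table:
        if entry.kind == "offset":
            # Map the object-local offset to its place in the linked output.
            rewritten.append(TypeEntry("offset", final_offset_of[entry.value]))
        else:
            rewritten.append(entry)  # sig8 entries pass through unchanged
    return rewritten


# A DIE would refer to "type number 1"; after linking, slot 1 points elsewhere.
table = [TypeEntry("sig8", 0xABCD), TypeEntry("offset", 0x10)]
linked = rewrite_indirection_table(table, {0x10: 0x400})
```

The work done by the linker is then proportional to the number of distinct types a CU references, not to the number of DIEs.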
>>>>>>>> David, thank you for all your comments and explanations. They are extremely helpful.
>>>>>>> Sure thing - really appreciate your patience with all this - it's... a
>>>>>>> lot of moving parts.
>>>>>>> - Dave
>>>>>>> Thank you, Alexey.
>>>>>>>
>>>>>>>> sig8 hash-id would be used to compare types and to deduplicate them.
>>>>>>>> It would speed up the current dsymutil context analysis.
>>>>>>>> Types having the same hash-id could be deduplicated.
>>>>>>>> This would allow deduplicating a larger number of types than current dsymutil.
>>>>>>>> Incomplete type definitions having a similar set of members are not deduplicated by dsymutil currently.
>>>>>>>> In this case they would have the same hash-id.
>>>>>>>>
>>>>>>>> This "type table" would take less space than current "type units" and current ODR solution.
>>>>>>>>
>>>>>>>> Above is just an idea on how to help DWARF-aware linker(based on idea removing obsolete debug info)
>>>>>>>> to work faster(if that is interesting).
>>>>>>>>
>>>>>>>> Alexey.
>>>>>>>>
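
A minimal sketch of the "type table" merge described above, assuming each per-object table is a list of (hash-id, encoded type) pairs laid out back to back. The function name and data layout are illustrative only; a real .debug_types_table, if it were ever specified, would need a proper encoding.

```python
from typing import Dict, List, Tuple

TypeTable = List[Tuple[int, bytes]]  # (hash-id, encoded type), in layout order


def merge_type_tables(tables: List[TypeTable]):
    """Keep one copy per hash-id and record old-offset -> new-offset maps."""
    merged: TypeTable = []
    offset_of_hash: Dict[int, int] = {}
    remaps: List[Dict[int, int]] = []
    next_offset = 0
    for table in tables:
        remap: Dict[int, int] = {}
        old_offset = 0
        for hash_id, payload in table:
            if hash_id not in offset_of_hash:
                # First time this type is seen: it gets a slot in the output.
                offset_of_hash[hash_id] = next_offset
                merged.append((hash_id, payload))
                next_offset += len(payload)
            # Every reference to the old copy is redirected to the single copy.
            remap[old_offset] = offset_of_hash[hash_id]
            old_offset += len(payload)
        remaps.append(remap)
    return merged, remaps


obj_a = [(1, b"int"), (2, b"struct A")]
obj_b = [(1, b"int"), (3, b"struct B")]
merged, remaps = merge_type_tables([obj_a, obj_b])
```

Here `int` appears in both objects but is stored once, and the second object's references to it are remapped to offset 0. Placing basic types first, as the proposal suggests, keeps such offsets small.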
>>>>>>>>> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of James Henderson via llvm-dev
>>>>>>>>> Sent: Wednesday, June 3, 2020 3:48 AM
>>>>>>>>> To: David Blaikie <dblaikie at gmail.com>
>>>>>>>>> Cc: llvm-dev at lists.llvm.org
>>>>>>>>> Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It makes me sad that the linker (via a library or otherwise) has to be "DWARF-aware" to be able to effectively handle --gc-sections, COMDATs, --icf etc for debug info, without leaving large blocks of data kicking around.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The patching to -1 (or equivalent) is probably a good lightweight solution (though I'd love it if it could be done based on section type in the future rather than section name, but that's probably outside the realm of DWARF), as it requires only minimal understanding in the linker, but anything beyond that seems to be complicated logic that is mostly due to the structure of DWARF. Patching to -1 does feel a bit like a sticking plaster/band aid to patch over the issue rather than properly solving it too - there will still be debug data (potentially significant amounts in COMDAT-heavy objects) that the linker has to write and the debugger has to somehow know how to skip (even if it knows that -1 is special-case due to the standard being updated, it needs to get as far as the -1), which is all wasted effort.
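
To make the "patching to -1" trade-off concrete, here is a small sketch under simplifying assumptions: addresses are 64-bit, the linker writes all-ones wherever a debug relocation pointed into a discarded section, and a consumer sees (low address, length) pairs. `resolve_address` and `live_ranges` are hypothetical helpers, not lld or debugger APIs.

```python
TOMBSTONE_64 = 0xFFFFFFFFFFFFFFFF  # "-1" viewed as an unsigned 64-bit address


def resolve_address(section_is_live: bool, section_base: int, addend: int) -> int:
    """What a linker might write for a relocation inside a debug section."""
    if not section_is_live:
        return TOMBSTONE_64  # the target was removed by --gc-sections or COMDAT
    return section_base + addend


def live_ranges(ranges):
    """Consumer side: every entry is still read, dead ones are then skipped."""
    return [(low, length) for low, length in ranges if low != TOMBSTONE_64]


entries = [
    (resolve_address(True, 0x1000, 0x4), 0x10),
    (resolve_address(False, 0x2000, 0x0), 0x20),  # discarded function
]
kept = live_ranges(entries)
```

The dead entry still occupies space in the output and still has to be visited before it can be ignored, which is the wasted effort the paragraph above describes.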
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We've already seen from Alexey's prototyping, and from our own experiences with the Sony proprietary linker (which tried to rewrite .debug_line only) that deconstructing the DWARF so that it can be more optimally reassembled at link time is slow going, and will probably inevitably be however much effort is put into optimising it. For a start, given the current standards, it's impossible to know how to deconstruct it without having to parse vast amounts of DWARF, which is typically going to mean a lot more parsing work than the linker would normally have to deal with. Additionally, much of this parsing work is wasted effort, since it seems unlikely in many links that large amounts of the DWARF will be redundant. Having an option to opt-in doesn't help much there, since it just means the logic exists without most people using it, due to it not being good enough, or potentially they don't even know it exists.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't have particularly concrete suggestions as to how to solve the structural problems with DWARF at this point. The only thing that seems obvious to me is a more "blessed" approach to fragmentation of sections, similar to what I tried with my prototype mentioned earlier in the thread, although we'd need to figure out the previously stated performance issues. Other ideas might tie into this, like somehow sharing the various table headers a bit like CIEs in .eh_frame that could be merged by the linker - each object could have separate table header sections, which are referenced by the individual .debug_* blocks, which in turn are one per function/data piece and easily discardable/merged by the linker.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Just some thoughts.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> James
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>
>>>>>>>>> On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
>>>>>>>>> <alapshin at accesssoftek.com> wrote:
>>>>>>>>>> Hi David, please find my comments inside:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> Broad question: Do you have any specific motivation/users/etc in implementing this (if you can speak about it)?
>>>>>>>>>>>>> - it might help motivate the work, understand what tradeoffs might be suitable for you/your users, etc.
>>>>>>>>>>>> There are two general requirements:
>>>>>>>>>>>> 1) Remove (or clean) invalid debug info.
>>>>>>>>>>> Perhaps a simpler direct solution for your immediate needs might be a much narrower,
>>>>>>>>>>> and more efficient linker-DWARF-awareness feature:
>>>>>>>>>>>
>>>>>>>>>>> With DWARFv5, rnglists present an opportunity for a DWARF linker to rewrite the ranges
>>>>>>>>>>> without parsing the rest of the DWARF. /technically/ this isn't guaranteed - rnglist entries
>>>>>>>>>>> can be referenced either directly, or by index. If all rnglists are referenced by index, then
>>>>>>>>>>> a linker could parse only the debug_rnglists section and rewrite ranges to remove any
>>>>>>>>>>> address ranges that refer to optimized-out code.
>>>>>>>>>>>
>>>>>>>>>>> This would only be correct for rnglists that had no direct references to them (that only were
>>>>>>>>>>> referenced via the indexes) - but we could either implement it with that assumption, or could
>>>>>>>>>>> add an LLVM extension attribute on the CU that would say "I promise I only referenced rnglists
>>>>>>>>>>> via rnglistx forms/indexes". If this DWARF-aware linking would have to read the CU DIE (not
>>>>>>>>>>> all the other DIEs) it /could/ also then rewrite high/low_pc if the CU wasn't using ranges...
>>>>>>>>>>> but that wouldn't come up in the function-removal case, because then you'd have ranges anyway,
>>>>>>>>>>> so no need for that.
>>>>>>>>>>>
>>>>>>>>>>> Such a DWARF-aware rnglist linking could also simplify rnglists, in cases where functions
>>>>>>>>>>> ended up being laid out next to each other, the linker could coalesce their ranges together.
>>>>>>>>>>>
>>>>>>>>>>> I imagine this could be implemented with very little overhead to linking, especially compared
>>>>>>>>>>> to the overhead of full DWARF-aware linking.
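
The rnglist rewriting described above (drop ranges for removed code, coalesce neighbours) can be sketched as follows. The input format is an assumption made for illustration: each range is given as (input section name, size), and `final_address` is a hypothetical map from section name to its output address, or None when the section was discarded. Real .debug_rnglists encodings are more involved.

```python
from typing import Dict, List, Optional, Tuple


def rewrite_ranges(
    entries: List[Tuple[str, int]],
    final_address: Dict[str, Optional[int]],
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) ranges for live code, merged when adjacent."""
    kept = sorted(
        (final_address[sec], final_address[sec] + size)
        for sec, size in entries
        # "is not None" matters: address 0 is valid on some embedded targets.
        if final_address.get(sec) is not None
    )
    merged: List[Tuple[int, int]] = []
    for start, end in kept:
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))  # coalesce neighbours
        else:
            merged.append((start, end))
    return merged


entries = [(".text.f", 0x10), (".text.g", 0x20), (".text.h", 0x8)]
layout = {".text.f": 0x1000, ".text.g": None, ".text.h": 0x1010}
cu_ranges = rewrite_ranges(entries, layout)
```

In this example `.text.g` was discarded, and `.text.f` and `.text.h` ended up next to each other, so the CU is left with a single range. Only the range list is processed; no DIEs are parsed.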
>>>>>>>>>>>
>>>>>>>>>>> Though none of this fixes Split
DWARF, where the linker doesn't get a chance to see the
>>>>>>>>>>> addresses being used - but if you
only want/need the CU-level ranges to be correct, this
>>>>>>>>>>> might be a viable fix, and quite
efficient.
>>>>>>>>>> Yes, we think about that alternative.
This would resolve our problem of invalid debug info
>>>>>>>>>> and would work much faster. Thus, if we
would not have good results for D74169 then we
>>>>>>>>>> will implement it. Do you think it
could be useful to have this solution in upstream?
>>>>>>>>> A pure rnglist rewriting - I think it'd
be OK to have in upstream -
>>>>>>>>> again, cost/benefit/etc would have to be
weighed. I'm not sure it
>>>>>>>>> would save enough space to be particularly
valuable beyond the
>>>>>>>>> correctness issue - and it doesn't
completely solve the correctness
>>>>>>>>> issue for zero-address usage or low-address
usage (because you could
>>>>>>>>> still have overlapping subprograms inside a
CU - so if you were
>>>>>>>>> symbolizing you could use the correct
rnglist to filter, but then go
>>>>>>>>> look inside the CU only to find two
subprograms that had that address
>>>>>>>>> & not know which one was the correct
one and which one was the
>>>>>>>>> discarded one).
>>>>>>>>>
>>>>>>>>> rnglist rewriting might be easy enough to
prototype - but depends what
>>>>>>>>> you want to spend your time on, I know this
whole issue has been a
>>>>>>>>> huge investment of your time already - but
maybe this recent
>>>>>>>>> revitalization of the conversation around
having an explicit value in
>>>>>>>>> the linker might be sufficient to address
everyone's needs... *fingers
>>>>>>>>> crossed*)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>> 2) Optimize the DWARF size.
>>>>>>>>>>> Do your users care much about this?
I imagine if they had significant DWARF size issues,
>>>>>>>>>>> they'd have significant link
time issues and the kind of cost to link time this feature has would
>>>>>>>>>>> be prohibitive - but perhaps
they're sharing linked binaries much more often than they're
>>>>>>>>>>> actually performing linking.
>>>>>>>>>> Yes, they do. They also have
significant link-time issues.
>>>>>>>>>> So current performance results of
D74169 are not very acceptable.
>>>>>>>>>> We hope to improve it.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> The specifics which our users
have:
>>>>>>>>>>>>    - embedded platform which
uses 0 as start of .text section.
>>>>>>>>>>>>    - custom toolset which does
not support all features yet(f.e. split dwarf).
>>>>>>>>>>>>    - tolerant of the link-time
increase.
>>>>>>>>>>>>    - need a useful way to share
debug builds.
>>>>>>>>>>> Sharing two files (executable and
dwp) is significantly less useful than sharing one file?
>>>>>>>>>> Probably not significantly, but yes, it
looks less useful comparing to D74169.
>>>>>>>>>> Having only two files (executable and
.dwp) looks significantly better than having executable and multiple .dwo files.
>>>>>>>>>> Having only one file(executable) with
minimal size looks better than the two files with a bigger size.
>>>>>>>>>>
>>>>>>>>>> clang compiled with -gsplit-dwarf takes
0.9G for executable and 0.9G for .dwp.
>>>>>>>>>> clang linked with --gc-debuginfo takes
only 0.76G for single executable.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> For the first point: we have a
problem "Overlapping address ranges starting from 0"(D59553).
>>>>>>>>>>>> We use custom solution, but the
general solution like D74169 would be better here.
>>>>>>>>>>> If CU ranges are the only ones that
need fixing, then I think the above solution might be as
>>>>>>>>>>> good/better - if more than CU
ranges need fixing, then I think we might want to start talking about
>>>>>>>>>>> how to fix DWARF itself (split and
non-split) to signal certain addresses point to dead code with a
>>>>>>>>>>> specific blessed value that linkers
would need to implement - because with Split DWARF there's
>>>>>>>>>>> no way to solve the non-CU
addresses at the linker.
>>>>>>>>>> I think a worthwhile choice for that
signal value would be LowPC > HighPC.
>>>>>>>>>> That does not require additional bits
in DWARF.
>>>>>>>>>> It would be natural to skip such
address ranges since they are explicitly marked as invalid.
>>>>>>>>>> It could be implemented in a linker
very easily. Probably, it would make sense to describe that
>>>>>>>>>> usage in DWARF standard.
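
A minimal sketch of how a consumer might honor the LowPC > HighPC marker proposed above. This is only an illustration of the proposal in the thread, not behavior defined by the DWARF standard; the `is_dead_range` and `symbolize` helpers are hypothetical, and both bounds are treated as absolute addresses.

```python
from typing import List, Optional, Tuple

# (name, low_pc, high_pc), with high_pc as an absolute address (assumption).
Subprogram = Tuple[str, int, int]


def is_dead_range(low_pc: int, high_pc: int) -> bool:
    """Proposed convention: low_pc > high_pc marks discarded code."""
    return low_pc > high_pc


def symbolize(addr: int, subprograms: List[Subprogram]) -> Optional[str]:
    """Return the live subprogram covering addr, skipping marked-dead ones."""
    for name, low_pc, high_pc in subprograms:
        if is_dead_range(low_pc, high_pc):
            continue  # explicitly invalid, never matches any address
        if low_pc <= addr < high_pc:
            return name
    return None
```

On a platform where .text starts at 0, a discarded function `g` written as (1, 0) can no longer be confused with a live function `f` at [0, 0x10), which is the overlap problem described in D59553.
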
>>>>>>>>>>
>>>>>>>>>> As to the addresses which are not seen
by the linker(since they are in .dwo files) - yes,
>>>>>>>>>> they need to have another solution.
Could you show an example of such a case, please?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> 2. Support of type units.
>>>>>>>>>>>>>>    That could be
implemented further.
>>>>>>>>>>>>> Enabling type units
increases object size to make it easier to deduplicate at link time by a
DWARF-unaware
>>>>>>>>>>>>> linker. With a DWARF aware
linker it'd be generally desirable not to have to add that object size
overhead to
>>>>>>>>>>>>> get the linking
improvements.
>>>>>>>>>>>> But, DWARFLinker should
adequately work with type units since they are already implemented.
>>>>>>>>>>> Maybe - it'd be nice & all,
but I don't think it's an outright necessity - if someone knows
they're using
>>>>>>>>>>> a DWARF-aware linker, they'd
probably not use type units in their object files. It's possible someone
>>>>>>>>>>> doesn't know for sure &
maybe they have pre-canned debug object files from someone else, etc.
>>>>>>>>>> I see.
>>>>>>>>>>
>>>>>>>>>>>> Another thing is that the idea
behind type units has the potential to help Dwarf-aware linker to work faster.
>>>>>>>>>>>> Currently, DWARFLinker analyzes
context to understand whether types are the same or not.
>>>>>>>>>>> When you say "analyzes
context" what do you mean? Usually I'd take that to mean
>>>>>>>>>>> "looks at things outside the
type itself - like what namespace it's in, etc" - which, yes,
>>>>>>>>>>> it should do that, but it
doesn't seem very expensive to do. But I guess you actually
>>>>>>>>>>> mean something about doing
structural equivalence in some way, looking at things inside the type?
>>>>>>>>>> I think it could be useful for both
cases. Currently, dsymutil does only first thing
>>>>>>>>>> (look at type name, namespace name,
etc..) and does not do the second thing
>>>>>>>>>> (doing structural equivalence).
Analyzing type names is currently quite expensive
>>>>>>>>>> (the string pool search alone takes
~10 sec out of 70 sec of overall time).
>>>>>>>>>> That is expensive because many
things must be done to work with strings:
>>>>>>>>>> parse DWARF, search and resolve
relocations, compute a hash for strings,
>>>>>>>>>> put data into a string pool, create a
fully qualified name(like namespace::function::name).
>>>>>>>>>> It looks like it could be optimized and
finally require less time, but it still would be a noticeable
>>>>>>>>>> part of the overall time.
>>>>>>>>>>
>>>>>>>>>> If dsymutil starts to check for
structural equivalence, then the process would be even slower.
>>>>>>>>>> So, if a single hash-id were checked
instead of comparing type structure, then this process
>>>>>>>>>> would also be faster.
>>>>>>>>>>
>>>>>>>>>> Thus I think using hash-id to compare
types would allow to make current implementation faster and would
>>>>>>>>>> allow handling incomplete types by
DWARFLinker without massive performance degradation also.
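
A minimal sketch of the hash-id idea above, assuming the producer computes an 8-byte id per type when the type is generated. The `type_hash_id` and `merge_type_tables` helpers, the hashed fields, and the use of SHA-256 are all assumptions for illustration; they are not dsymutil or DWARFLinker code.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def type_hash_id(qualified_name: str, members: List[Tuple[str, str]]) -> bytes:
    """Producer side: hash the fully qualified name and the member list once."""
    h = hashlib.sha256()
    h.update(qualified_name.encode())
    for member_name, member_type in members:
        h.update(b"\0" + member_name.encode() + b"\0" + member_type.encode())
    return h.digest()[:8]  # 8 bytes, the same width as a type unit signature


def merge_type_tables(tables: Iterable[Dict[bytes, str]]) -> Dict[bytes, str]:
    """Linker side: keep the first copy of each type, comparing only ids."""
    merged: Dict[bytes, str] = {}
    for table in tables:
        for hash_id, type_die in table.items():
            merged.setdefault(hash_id, type_die)
    return merged
```

The linker side never parses names, resolves relocations, or builds qualified names to decide whether two types match; it does one dictionary lookup per type, which is where the time saving would come from.
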
>>>>>>>>>>
>>>>>>>>>>>> But the context is known when
types are generated. So, no need to spend the time analyzing it.
>>>>>>>>>>>> If types could be compared
without analyzing context, then Dwarf-aware linker would work faster.
>>>>>>>>>>>> That is just an idea(not for
immediate implementation): If types would be stored in some "type
table"
>>>>>>>>>>>> (instead of COMDAT section
group) and could be accessed through hash-id (like type units)
>>>>>>>>>>>> - then it would be the solution
requiring fewer bits to store but allowing to compare types
>>>>>>>>>>>> by hash-id(not analysing
context).
>>>>>>>>>>>> In this case, size increasing
would be small. And processing time could be done faster.
>>>>>>>>>>>>
>>>>>>>>>>>> this is just an idea and could
be discussed separately from the problem of integrating of D74169.
>>>>>>>>>>>>>> 6. -flto=thin
>>>>>>>>>>>>>>      That problem was
described in this review https://reviews.llvm.org/D54747#1503720. It also exists
in
>>>>>>>>>>>>>> current
DWARFLinker/dsymutil implementation. I think that problem should be discussed
more: it could
>>>>>>>>>>>>>> probably be fixed by
avoiding generation of such incomplete declaration during thinlto,
>>>>>>>>>>>>>> That would be costly to
produce extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
>>>>>>>>>>>>>> more to reduce that
redundancy early on (actually removing definitions from some llvm Modules if the
type
>>>>>>>>>>>>>> definition is known to
exist in another Module, etc)
>>>>>>>>>>>>> I don't know if
it's a problem since that patch was reverted.
>>>>>>>>>>>> Yes. That patch was reverted,
but this patch(D74169) has the same problem.
>>>>>>>>>>>> if D74169 would be applied and
--gc-debuginfo used then structure type
>>>>>>>>>>>> definition would be removed.
>>>>>>>>>>>> DWARFLinker could handle that
case - "removing definitions from some llvm Modules if the type
>>>>>>>>>>>> definition is known to exist in
another Module".
>>>>>>>>>>>> i.e. DWARFLinker could replace
the declaration with the definition.
>>>>>>>>>>>> But that problem could be more
easily resolved when debug info is generated(probably without
>>>>>>>>>>>> significant increase of debug
info size):
>>>>>>>>>>>> Here we have:
>>>>>>>>>>>> DW_TAG_compile_unit(0x0000000b)
- compile unit containing concrete instance for function "f".
>>>>>>>>>>>> DW_TAG_compile_unit(0x00000073)
- compile unit containing abstract instance root for function "f".
>>>>>>>>>>>> DW_TAG_compile_unit(0x000000c1)
- compile unit containing function "f" definition.
>>>>>>>>>>>> Code for function "f"
was deleted. gc-debuginfo deletes compile unit DW_TAG_compile_unit(0x000000c1)
>>>>>>>>>>>> containing "f"
definition (since there is no corresponding code). But it has structure
"Foo" definition
>>>>>>>>>>>>
DW_TAG_structure_type(0x0000011e) referenced from
DW_TAG_compile_unit(0x00000073)
>>>>>>>>>>>> by declaration
DW_TAG_structure_type(0x000000ae). That declaration is exactly the case when
definition
>>>>>>>>>>>> was removed by thinlto and
replaced with declaration.
>>>>>>>>>>>> Would it cost too much if type
definition would not be replaced with declaration for "abstract instance
root"?
>>>>>>>>>>>> The number of concrete
instances is bigger than number of abstract instance roots.
>>>>>>>>>>>> Probably, it would not be too
costly to leave definition in abstract instance root?
>>>>>>>>>>
>>>>>>>>>>>> Alternatively, Would it cost
too much if type definition would not be replaced with declaration when
>>>>>>>>>>>> declaration references type
from not used function? (lto could understand that concrete function is not
used).
>>>>>>>>>>> I don't follow this example -
could you provide a small concrete test case I could reproduce?
>>>>>>>>>> I would provide a test case if
necessary. But it looks like this issue is finally clear, and you already
commented on that.
>>>>>>>>>>
>>>>>>>>>>> Oh, I guess this is happening
perhaps because ThinLTO can't know for sure that a standalone
>>>>>>>>>>> definition of 'f' won't
be needed - so it produces one in case one of the inlining opportunities
>>>>>>>>>>> doesn't end up inlining. Then
it turns out all calls got inlined, so the external definition wasn't
needed.
>>>>>>>>>>> Oh, you're suggesting that
these 3 CUs got emitted into one object file during LTO, but that DWARFLinker
>>>>>>>>>>> drops a CU without any code in it -
even though... So far as I know, in LTO, LLVM directly references
>>>>>>>>>>> types across units if the CUs are
all emitted in the same object file. (and if they weren't in the same
>>>>>>>>>>> object file - then the
abstract_origin couldn't be pointing cross-CU).
>>>>>>>>>>> I guess some basic things to say:
>>>>>>>>>>> With ThinLTO, the
concrete/standalone function definition is emitted in case some call sites
don't end up
>>>>>>>>>>> being inlined. So we know it'll
be emitted (but might not be needed by the actual linker)
>>>>>>>>>>> Any number of inline calls might
exist - but we shouldn't put the type information into those, because
>>>>>>>>>>> they aren't guaranteed to emit
it (if the inline function gets optimized away, there would be nothing to
>>>>>>>>>>> enforce the type being emitted) -
and even if we forced the type information to be emitted into one
>>>>>>>>>>> object file that has an inline copy
of the function - there's no guarantee that object file will get linked in
either.
>>>>>>>>>>> So, no, I don't think
there's much we can do to keep the size of object files down, while
guaranteeing
>>>>>>>>>>> the type information will be
emitted with the usual linker semantics.
>>>>>>>>>> Then dsymutil/DWARFLinker could be
changed to handle that(though it would probably be not very efficient).
>>>>>>>>>> If thinlto would understand that
function is not used finally(and then must not contain referenced type
definition),
>>>>>>>>>> then this situation could be handled
more effectively.
>>>>>>>>>>
>>>>>>>>>> Thank you, Alexey.
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>

Eric Christopher via llvm-dev

2020-Jul-31 19:02 UTC


[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

Hi Alexey,

On Fri, Jul 31, 2020 at 4:02 AM Alexey Lapshin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
> On 28.07.2020 19:28, David Blaikie wrote:
> > On Tue, Jul 28, 2020 at 8:55 AM Alexey Lapshin <avl.lapshin at gmail.com> wrote:
> >>
> >> On 28.07.2020 10:29, David Blaikie via llvm-dev wrote:
> >>> On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
> >>> <alapshin at accesssoftek.com> wrote:
> >>>>>>>>>>>> This idea goes in another
direction than fragmenting dwarf
> >>>>>>>>>>>> using elf
sections&tricks. It seems to me that the cost of
> fragmenting is too high.
> >>>>>>>>>>> I tend to agree - but I'm
sort of leaning towards trying to
> use object
> >>>>>>>>>>> features as much as possible,
then implementing just enough
> custom
> >>>>>>>>>>> handling in the linker to
recoup overhead, etc. (eg: add some
> kind of
> >>>>>>>>>>> small header/brief description
that makes it easy for the
> linker to
> >>>>>>>>>>> slice-and-dice - but hopefully
a domain-specific such header
> can be a
> >>>>>>>>>>> bit more compact than the
fully general ELF form)
> >>>>>>>>>> I think this indeed should be
implemented and evaluated.
> >>>>>>>>>> So that various approaches could
be compared.
> >>>>>>>>>>
> >>>>>>>>>>>> It is not only the sizes
of structures describing fragments
> but also the complexity
> >>>>>>>>>>>> of tools that should be
taught to work with fragmented DWARF.
> >>>>>>>>>>>> (f.e. llvm-dwarfdump
applied to object file should be able to
> read fragmented DWARF,
> >>>>>>>>>>>> but applied to linked
executable it should work with
> non-fragmented DWARF).
> >>>>>>>>>>>> That idea is for the tool
which works the same way as
> dsymutil ODR.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I will shortly describe
the idea of making DWARF be easier
> processed by dsymutil/DWARFLinker:
> >>>>>>>>>>>>
> >>>>>>>>>>>> The idea is to have only
one "type table" per object
> file(special section .debug_types_table).
> >>>>>>>>>>>> This "type
table" would contain all types.
> >>>>>>>>>>>> There could be a special
type of reference - type_offset -
> that offset points into the type table.
> >>>>>>>>>>>> Basic types could always
be placed into the start of "type
> table" thus, offsets to basic types
> >>>>>>>>>>>> most often would be 1
byte. There also would be a special
> kind of reference - reference inside the type.
> >>>>>>>>>>>> Type units sig8 system -
would not be used to reference types.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Types deduplication is
assumed to be done, not by linker
> mechanism for COMDAT,
> >>>>>>>>>>>> but by a tool like
dsymutil. This tool would create resulting
> .debug_types_table by putting there
> >>>>>>>>>>>> types from source
.debug_types_table-s. Only one copy of the
> type would be placed into the
> >>>>>>>>>>>> resulting table. All
references pointing to the deleted copy
> would be corrected to point
> >>>>>>>>>>>> to the single copy inside
"type table". (that is how dsymutil
> works currently)
> >>>>>>>>>>> ^ that's the step
that's probably a bit expensive for a
> general-use
> >>>>>>>>>>> tool - it implies parsing all
the DWARF to find those
> references and
> >>>>>>>>>>> rewrite them, I think. For a
high-performance solution that
> could be
> >>>>>>>>>>> run by the linker I think
it'd be necessary to have a solution
> that
> >>>>>>>>>>> doesn't involve parsing
all the DIEs.
> >>>>>>>>>> According to the current dsymutil
processing,
> >>>>>>>>>> exactly this process is not the
most time-consuming.
> >>>>>>>>>> That could be done relatively
fast.
> >>>>>>>>> Fair enough - though I'd still
imagine any solution that involves
> >>>>>>>>> parsing all the DIEs still
wouldn't be fast enough (maybe an
> order of
> >>>>>>>>> magnitude faster than the current
solution even - but that's
> still,
> >>>>>>>>> what, 6 or 7x slower than linking
without the feature?) for most
> users
> >>>>>>>>> to consider it a good trade-off.
> >>>>>>>> It seems to me that even the current 6x-7x
slowdown could be
> useful.
> >>>>>>>> Users who already use dsymutil or
llvm-dwp(assuming DWARFLinker
> >>>>>>>> would be taught to work with a split
dwarf) tools spend this time
> and,
> >>>>>>>> in some scenarios, waste disk space by
intermediate files.
> >>>>>>> FWIW, dwp (llvm-dwp hasn't really been
optimized compared to
> binutils
> >>>>>>> dwp) is designed to be very quick - by not
needing to do a lot of
> >>>>>>> parsing/fixups. Which, yes, means larger
output files than would be
> >>>>>>> possible with more parsing/etc. It also
doesn't take any input from
> >>>>>>> the linker (so it can run in parallel with the
linker) - so it
> can't
> >>>>>>> remove dead subprograms. Given Google's
the major (perhaps only
> >>>>>>> significant?) user of Split DWARF - I can say
that the needs don't
> >>>>>>> necessarily overlap well with something that
would take
> significantly
> >>>>>>> longer to run or use significantly more
memory. Faster/cheaper/with
> >>>>>>> somewhat bigger output files is probably the
right tradeoff for
> >>>>>>> Google's use case, at least.
> >>>>>>>
> >>>>>>> I imagine Apple's use for dsymutil is
somewhat similar - it's not
> used
> >>>>>>> in the iterative development cycle, only in
final releases - well,
> >>>>>>> maybe their situation is more
"neutral" - not a major pain point in
> >>>>>>> any case I'd guess.
> >>>>>>>
> >>>>>>>
> >>>>>> I see. FWIW, Comparison splitdwarf+dwp and
DWARFLinker from lld:
> >>>>>>
> >>>>>> 1. split-dwarf+llvm-dwp = linking time for clang 6
sec,
> >>>>>>       generating time for .dwp 53 sec, clang=997M
clang.dwp=1.1G.
> >>>>> FWIW, llvm-dwp is not very well optimized (which is to
say: it is not
> >>>>> optimized), binutils dwp might be a better comparison
(& even that
> >>>>> doesn't have the parallelism & some potential
further memory savings
> >>>>> that lld has that we could take advantage of in a
dwp-like tool)
> >>>>>
> >>>>> What build mode was the clang binary built in?
Optimized or
> unoptimized?
> >>>> right, that is unoptimized build with -ffunction-sections.
> >>>>
> >>>>>> 2. DWARFLinker from lld = linking time for clang
72 sec, clang=760M.
> >>> And this is without Split DWARF? Without linker DWARF
compression? -
> >>> that seems quite a bit surprising, that the deduplication of
DWARF
> >>> could fit into less space than the wasted/reclaimed space in
ranges (&
> >>> line)?
> >> that was without split dwarf, without linker compression.
> >>
> >>> Could you double check these numbers & provide a clearer
summary?
> >> sure, I would re-check it.
> >>
> >>> Here's my attempt at numbers (all with
> function-sections+gc-sections)...
> >>>
> >>> Split DWARF tests didn't seem meaningful - gc-debuginfo +
split DWARF
> >>> seemed to drop all the debug info (except gdb_index) so
wasn't
> >>> working/comparison wasn't meaningful for Apples to Apples,
but
> >>> included it for comparing gc'd non-split to non-gc'd
split (disabled
> >>> gnu-pubnames/gdb-index (-gsplit-dwarf -gno-gnu-pubnames)
(which turns
> >>> on by default with Split DWARF because gdb needs it - but a
bit of an
> >>> unfair comparison without turning on gnu-pubnames/gdb-index in
other
> >>> build modes too, since it... /shouldn't/ be necessary)
which might've
> >>> been a factor in the data you were looking at)
> >> that might be the case. i.e. clang=997M for split dwarf(from my
previous
> >> measurement) might include gnu-pubnames.
> >>
> >> would recheck it and if that is the case then it is an unfair
comparison.
> >>
> >>
> >> My point was that "DWARFLinker from lld" takes less
space than singleton
> >> split dwarf file+.dwp file.
> >>
> >> for -O0 uncompressed:
> >>
> >> - .dwp took 1.1G(if I built it correctly), singleton clang(from
your
> >> measurements) 566 MB
> >>
> >>      overall 1.6G.
> > Oh, yeah, even if there are some measurement issues, linked executable
> > + .dwp is going to be larger than a linked executable using non-split
> > DWARF (in v5), since v5 uses all the same representations as non-split
> > DWARF, and split DWARF adds the indirection overhead of a split file,
> > etc.
> >
> > Even without DWARF linking, it's true that split DWARF has
overhead
> > (dwp+executable will be larger than executable non-split).
> >
> > But maybe we've ended up down a bit of a tangent in any case.
> >
> > Trying to bring this back to "should this be committed to
lld" seems
> > valuable, and I'm not sure what the right criteria are for that.
> I think it would be useful to do "removing obsolete debug info"
> in the linker. First thing is that it would be the fastest way(no need
> to copy data/create temp files/build an address map...). Second thing
> is that it would be a good separation of concepts. All debug info
> processing, currently done in the linker(gdb_index, upcoming
> debug_names), could be moved into separate library processing
> debug info. When gdb_index/debug_names should be built without
> "removing of obsolete debug info" it would have the same
> performance results as it currently has.
>
> We decided to give the idea of "removing of obsolete debug info"
> another try and are going to implement it as a separate utility
> working with the built binary. Making it multi-threaded would
> probably show better performance results and then it could
> probably be considered as acceptable to use from the linker.
>
I'm quite interested in this direction. One thought I had was to
incorporate such a library into dsymutil but with support for ELF. If you
get a proposal written up I'd love to take a look and comment.

Thanks!

-eric

> Alexey.
>
> >
> > Ray's the best person to weigh in on that. My 2c is that I think
it
> > probably is worthwhile, even just as an experiment, assuming it's
not
> > too intrusive to lld.
> >
> >> - The "DWARFLinker from lld" 820 MB(from your
measurements).
> >>
> >>
> >> So "DWARFLinker from lld" looks two times better.
> >>
> >>
> >> Anyway, thank you for pointing me to possible mistake. I would
recheck
> >> it and update results.
> >>
> >>
> >> Alexey.
> >>
> >>
> >>> * -O0: (baseline, just using strip -g: 356 MB)
> >>>     * compressed: 25% smaller with gc-debuginfo (481 MB / 641 MB) (407 MB split/non-gc)
> >>>     * uncompressed: 30% smaller (820 MB / 1.2 GB) (566 MB split/non-gc)
> >>> * -O3: (baseline: 116 MB)
> >>>     * compressed: 16% smaller (361 MB / 462 MB) (283 MB split/non-gc)
> >>>     * uncompressed: 22% smaller (1022 MB / 1.2 GB) (156 MB split/non-gc)
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
> >>> <alapshin at accesssoftek.com> wrote:
> >>>>>>>>>>>> This idea goes in another
direction than fragmenting dwarf
> >>>>>>>>>>>> using elf
sections&tricks. It seems to me that the cost of
> fragmenting is too high.
> >>>>>>>>>>> I tend to agree - but I'm
sort of leaning towards trying to
> use object
> >>>>>>>>>>> features as much as possible,
then implementing just enough
> custom
> >>>>>>>>>>> handling in the linker to
recoup overhead, etc. (eg: add some
> kind of
> >>>>>>>>>>> small header/brief description
that makes it easy for the
> linker to
> >>>>>>>>>>> slice-and-dice - but hopefully
a domain-specific such header
> can be a
> >>>>>>>>>>> bit more compact than the
fully general ELF form)
> >>>>>>>>>> I think this indeed should be
implemented and evaluated.
> >>>>>>>>>> So that various approaches could
be compared.
> >>>>>>>>>>
> >>>>>>>>>>>> It is not only the sizes
of structures describing fragments
> but also the complexity
> >>>>>>>>>>>> of tools that should be
taught to work with fragmented DWARF.
> >>>>>>>>>>>> (f.e. llvm-dwarfdump
applied to object file should be able to
> read fragmented DWARF,
> >>>>>>>>>>>> but applied to linked
executable it should work with
> non-fragmented DWARF).
> >>>>>>>>>>>> That idea is for the tool
which works the same way as
> dsymutil ODR.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I will shortly describe
the idea of making DWARF be easier
> processed by dsymutil/DWARFLinker:
> >>>>>>>>>>>>
> >>>>>>>>>>>> The idea is to have only
one "type table" per object
> file(special section .debug_types_table).
> >>>>>>>>>>>> This "type
table" would contain all types.
> >>>>>>>>>>>> There could be a special
type of reference - type_offset -
> that offset points into the type table.
> >>>>>>>>>>>> Basic types could always
be placed into the start of "type
> table" thus, offsets to basic types
> >>>>>>>>>>>> most often would be 1
byte. There also would be a special
> kind of reference - reference inside the type.
> >>>>>>>>>>>> Type units sig8 system -
would not be used to reference types.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Types deduplication is
assumed to be done, not by linker
> mechanism for COMDAT,
> >>>>>>>>>>>> but by a tool like
dsymutil. This tool would create resulting
> .debug_types_table by putting there
> >>>>>>>>>>>> types from source
.debug_types_table-s. Only one copy of the
> type would be placed into the
> >>>>>>>>>>>> resulting table. All
references pointing to the deleted copy
> would be corrected to point
> >>>>>>>>>>>> to the single copy inside
"type table". (that is how dsymutil
> works currently)
> >>>>>>>>>>> ^ that's the step
that's probably a bit expensive for a
> general-use
> >>>>>>>>>>> tool - it implies parsing all
the DWARF to find those
> references and
> >>>>>>>>>>> rewrite them, I think. For a
high-performance solution that
> could be
> >>>>>>>>>>> run by the linker I think
it'd be necessary to have a solution
> that
> >>>>>>>>>>> doesn't involve parsing
all the DIEs.
> >>>>>>>>>> According to the current dsymutil
processing,
> >>>>>>>>>> exactly this process is not the
most time-consuming.
> >>>>>>>>>> That could be done relatively
fast.
> >>>>>>>>> Fair enough - though I'd still
imagine any solution that involves
> >>>>>>>>> parsing all the DIEs still
wouldn't be fast enough (maybe an
> order of
> >>>>>>>>> magnitude faster than the current
solution even - but that's
> still,
> >>>>>>>>> what, 6 or 7x slower than linking
without the feature?) for most
> users
> >>>>>>>>> to consider it a good trade-off.
> >>>>>>>> It seems to me that even the current 6x-7x
slowdown could be
> useful.
> >>>>>>>> Users who already use dsymutil or
llvm-dwp(assuming DWARFLinker
> >>>>>>>> would be taught to work with a split
dwarf) tools spend this time
> and,
> >>>>>>>> in some scenarios, waste disk space by
intermediate files.
> >>>>>>> FWIW, dwp (llvm-dwp hasn't really been
optimized compared to
> binutils
> >>>>>>> dwp) is designed to be very quick - by not
needing to do a lot of
> >>>>>>> parsing/fixups. Which, yes, means larger
output files than would be
> >>>>>>> possible with more parsing/etc. It also
doesn't take any input from
> >>>>>>> the linker (so it can run in parallel with the
linker) - so it
> can't
> >>>>>>> remove dead subprograms. Given Google's
the major (perhaps only
> >>>>>>> significant?) user of Split DWARF - I can say
that the needs don't
> >>>>>>> necessarily overlap well with something that
would take
> significantly
> >>>>>>> longer to run or use significantly more
memory. Faster/cheaper/with
> >>>>>>> somewhat bigger output files is probably the
right tradeoff for
> >>>>>>> Google's use case, at least.
> >>>>>>>
> >>>>>>> I imagine Apple's use for dsymutil is
somewhat similar - it's not
> used
> >>>>>>> in the iterative development cycle, only in
final releases - well,
> >>>>>>> maybe their situation is more
"neutral" - not a major pain point in
> >>>>>>> any case I'd guess.
> >>>>>>>
> >>>>>>>
> >>>>>> I see. FWIW, Comparison splitdwarf+dwp and
DWARFLinker from lld:
> >>>>>>
> >>>>>> 1. split-dwarf+llvm-dwp = linking time for clang 6
sec,
> >>>>>>       generating time for .dwp 53 sec, clang=997M
clang.dwp=1.1G.
> >>>>> FWIW, llvm-dwp is not very well optimized (which is to
say: it is not
> >>>>> optimized), binutils dwp might be a better comparison
(& even that
> >>>>> doesn't have the parallelism & some potential
further memory savings
> >>>>> that lld has that we could take advantage of in a
dwp-like tool)
> >>>>>
> >>>>> What build mode was the clang binary built in?
Optimized or
> unoptimized?
> >>>> right, that is unoptimized build with -ffunction-sections.
> >>>>
> >>>>>> 2. DWARFLinker from lld = linking time for clang
72 sec, clang=760M.
> >>>>> It does seem a tad strange that the clang binary would
be smaller
> >>>>> non-split with DWARF linking than it was split. Though
I could
> imagine
> >>>>> this might be possible in an optimized build (where
debug_ranges
> >>>>> become quite relatively expensive in the .o file
contribution with
> >>>>> Split DWARF)
> >>>>> Could you compare the section sizes between these two
clang
> binaries, perhaps?
> >>>> .debug_ranges is three times bigger and .debug_line is
twice bigger.
> >>>>
> >>>>>>>> Thus if they would use this LLD feature in
its current state
> >>>>>>>> - they would still receive benefits.
> >>>>>>>>
> >>>>>>>> Speaking of performance results - LLD is a
multi-thread linker;
> >>>>>>>> it handles sections in parallel.
DWARFLinker generates DWARF using
> >>>>>>>> AsmPrinter which is a stream - so it could
make resulting DWARF
> only
> >>>>>>>> continuously. It is not surprising that
the parallel solution
> works faster.
> >>>>>>>> Making DWARFLinker truly multi-threaded
would probably allow us
> >>>>>>>> to make slowdown to be at 2x-4x range.
> >>>>>>> *nod* that's still a really expensive link
- but I understand
> that's a
> >>>>>>> suitable tradeoff for your users
> >>>>>>>
> >>>>>> Btw, 2x or 7x is for pure linking time. Overall
compilation slowdown
> >>>>>> is not so significant. Building LLVM codebase has
only 20% slowdown.
> >>>>> Understood - that's still quite significant to
most users, I'd
> imagine.
> >>>> I see.
> >>>>
> >>>>>>>>>> Anyway, I think the dsymutil
approach is still valuable, and it
> >>>>>>>>>> would be useful to optimize it.
> >>>>>>>>>> Do you think it would be useful to
make dsymutil/DWARFLinker
> truly multi-thread?
> >>>>>>>>>> (To make dsymutil/DWARFLinker able
to process each object file
> in a separate thread)
> >>>>>>>>> Perhaps - that I'd probably leave
up to the folks who are more
> >>>>>>>>> invested in dsymutil (Adrian Prantl et
al). Maybe one day we'll
> get it
> >>>>>>>>> integrated into llvm-dwp and then
I'll be interested in getting
> as
> >>>>>>>>> much performance out of it as lld - so
multithreading and things
> would
> >>>>>>>>> be on the books.
> >>>>>>>> I think improving dsymutil is a valuable thing.
> >>>>>>>> Though there are several directions which might be considered
> >>>>>>>> to make it more robust:
> >>>>>>>>
> >>>>>>>> 1. support of latest DWARF - DWARF5/DWARF64...
> >>>>>>> I expect/thought some of the Apple folks had already worked on
> >>>>>>> DWARF5 support?
> >>>>>>> DWARF64 - that's been around for a while, and just hasn't been needed
> >>>>>>> by LLVM users thus far, it seems (until recently - where some
> >>>>>>> developers have started working on that)
> >>>>>> The debug_names table is already implemented, but debug_rnglists,
> >>>>>> debug_loclists, type units - are not implemented yet.
> >>>>> Superficially, type units wouldn't be on the list of features (like
> >>>>> DWARF64 - it's optional) I'd try to support in dsymutil - since their
> >>>>> size overhead is more justified for a DWARF-agnostic linker that's
> >>>>> using comdat groups. With a DWARF-aware linker I'd be specifically
> >>>>> hoping to avoid using type units to help
> >>>>>> The thing which
> >>>>>> should probably be changed is that dsymutil should not have its own version
> >>>>>> of code generating DWARF tables. It should call the already existing
> >>>>>> DWARF5/DWARF64 implementations. Then dsymutil would always
> >>>>>> use the latest DWARF generators.
> >>>>> Possibly - I don't know what the architectural tradeoffs for that look
> >>>>> like - I'd imagine DWARFLinker has sufficiently different
> >>>>> needs/tradeoffs than LLVM's DWARF generation code (rewriting existing
> >>>>> DIEs compared to building new ones from scratch, etc) that it might be
> >>>>> hard for them to share a lot of their implementation.
> >>>> It is not easy, and would require some additions, but the benefit would be
> >>>> that all format implementation is in one place. Thus changing that place
> >>>> would be reflected in other places. There are at least three implementations for
> >>>> .debug_ranges, .debug_aranges currently...
> >>>>
> >>>>
> >>>>>>>> 2. implement multi-threaded execution.
> >>>>>>>> 3. support of split DWARF.
> >>>>>>> Maybe, though I'm still not sure it'd be the right tradeoff -
> >>>>>>> especially if it involved having to wait to run the .dwo merger (call
> >>>>>>> it DWARF-aware dwp, or dsymutil with dwp support) until after the
> >>>>>>> linker ran.
> >>>>>>>
> >>>>>>>> 4. implement dsymutil for non-darwin platform.
> >>>>>>> That's probably, essentially (3), more-or-less. Split DWARF is
> >>>>>>> somewhat of a formalization of Apple's/MachO DWARF distribution model
> >>>>>>> (leave DWARF in files that aren't linked/use them from a debugger,
> >>>>>>> but also be able to merge them into some final file (dsym or dwp) for
> >>>>>>> archival purposes)
> >>>>>>>
> >>>>>>>> All of this is a massive piece of work.
> >>>>>>>> Our original investment was to solve two problems:
> >>>>>>>>
> >>>>>>>> 1. Overlapped address ranges, which is currently close to being
> >>>>>>>> solved. Thank you for helping with that!
> >>>>>>> Yeah, again, sorry that's taken quite so long/somewhat circuitous route.
> >>>>>>>
> >>>>>>>> 2. Size of debug info. That still remains an issue, but we are
> >>>>>>>> unsure whether we are ready to
> >>>>>>>>      invest in solving all the above 1-4 problems and how much
> >>>>>>>> the community is interested in it.
> >>>>>>> Fair, for sure - I don't think you'd need to sign up to solve all of
> >>>>>>> them (don't think they necessarily need solving). Potentially moving
> >>>>>>> the logic out into a separate tool as Fangrui's considering - a
> >>>>>>> post-link DWARF optimizer, rather than in-linker DWARF optimization.
> >>>>>>>
> >>>>>>> I really don't want to give you the runaround like this - but multiple
> >>>>>>> times slower links is something that seems pretty problematic for most
> >>>>>>> users, to the point of weighing the maintainability of lld against the
> >>>>>>> convenience of having this functionality in-linker rather than in a
> >>>>>>> post-link optimizer.
> >>>>>>>
> >>>>>>> (I know you've spoken a bit before about your users needs - but if
> >>>>>>> it's possible, could you explain (again :/) why they have such a
> >>>>>>> strong need for smaller DWARF? While DWARF size is an ongoing concern
> >>>>>>> for many users (Google certainly - hence the invention of Split DWARF,
> >>>>>>> use of type units and compressed DWARF, etc) - usually it's in rather
> >>>>>>> large programs, but it sounds like you're dealing with relatively
> >>>>>>> small ones (otherwise the increase in link time, I'd imagine, would be
> >>>>>>> prohibitive for your users?)?
> >>>>>> We have many large programs and keep Daily/Nightly debug builds,
> >>>>>> which take a lot of disk space. Compilation time for these programs is big.
> >>>>>> The scenario is "compile once" (not compile-debug-compile-debug).
> >>>>>> So we think that a solution (like dsymutil/DWARFLinker) would not slow down
> >>>>>> the compilation time of the overall build significantly (see above numbers for
> >>>>>> the llvm codebase) and would allow us to reduce the disk space required to keep
> >>>>>> all of these builds.
> >>>>> Ah, OK - for archival purposes. So the interactive developers wouldn't
> >>>>> necessarily be using this feature. Makes sense - similar to dsymutil
> >>>>> and dwp, mostly used for archival purposes & you can debug straight
> >>>>> from .o/.dwos for interactive/iterative development.
> >>>>
> >>>>> In that case, it seems more likely that a separate tool might suffice.
> >>>> Agreed: if the work on this continues, then it makes sense to
> >>>> do it as a separate tool. Make it fast enough. And if there would be interest
> >>>> in it - then it would probably be possible to return to the idea of calling
> >>>> it from the linker.
> >>>>
> >>>>> Also, out of curiosity - have you tried just compressing the output
> >>>>> (-gz (I think that does the right thing for the linker level
> >>>>> compression too, otherwise -Wl,-compress-debug-sections might do it))
> >>>>> or are you already doing that in addition?
> >>>> sure. we use  -Wl,-compress-debug-sections.
> >>>>
> >>>> Thank you, Alexey.
> >>>>
> >>>>>>> You mentioned that the usability cost of
> >>>>>>> Split DWARF for your users was too high (or high enough to justify
> >>>>>>> this alternative work of DWARF-aware linking)? That all seems a bit
> >>>>>>> surprising to me - though I understand the deployment issues of Split
> >>>>>>> DWARF do present some challenges to users in more heterogenous
> >>>>>>> environments than Google's... still, I'd have thought there was some
> >>>>>>> hope there)
> >>>>>> Our tools do not support split dwarf yet. Though we plan to implement it.
> >>>>>> When we have support for split dwarf, it would be
> >>>>>> convenient to have an easy way to share built debug binaries. llvm-dwp is the
> >>>>>> answer to this. DWARFLinker could probably be another answer.
> >>>>> Ah, fair enough - thanks for the context!
> >>>>>>>>> One way to do that would be to have a CU-local type indirection table.
> >>>>>>>>> DIEs reference local type numbers (like local address/string numbers -
> >>>>>>>>> addrx/strx/rnglistx) and that table contains either sig8 (no linker
> >>>>>>>>> fixups required) or the local type offsets you describe - the linker
> >>>>>>>>> would then only need to read this type number indirection table and
> >>>>>>>>> rewrite them to the final type numbers.
> >>>>>>>> Yes, that could be additionally done if this process would be
> >>>>>>>> time-consuming.
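
[Editor's sketch] The CU-local type indirection table described above could be modelled roughly as follows. This is only an illustration in Python under stated assumptions: the entry tags ("sig8", "offset", "final"), the function name rewrite_type_table, and the final_index_of map are all hypothetical and are not an existing DWARF form or LLVM API. The point it tries to show is that DIEs keep their small local type numbers, and only the table is rewritten at link time.

```python
# Hypothetical model of a CU-local type indirection table; not a real DWARF form.

def rewrite_type_table(local_table, final_index_of):
    """Rewrite one CU's type table to final type numbers.

    local_table: list where position i is the entry for local type number i.
        Each entry is ("sig8", signature) or ("offset", local_offset).
    final_index_of: dict mapping ("offset", local_offset) to the final type
        number the linker assigned after merging (assumed to be known).
    sig8 entries need no fixup and are passed through unchanged.
    """
    rewritten = []
    for kind, value in local_table:
        if kind == "sig8":
            rewritten.append((kind, value))
        else:
            rewritten.append(("final", final_index_of[(kind, value)]))
    return rewritten


# DIEs keep referring to local numbers 0, 1, 2; only the table changes.
cu_table = [("offset", 0x10), ("sig8", 0xDEADBEEF), ("offset", 0x40)]
merged = {("offset", 0x10): 7, ("offset", 0x40): 3}
new_table = rewrite_type_table(cu_table, merged)
```

Under this model the linker never has to walk the DIE tree to fix up type references, which is the cost the indirection is meant to avoid.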
> >>>>>>>>
> >>>>>>>> David, thank you for all your comments and explanations. They are
> >>>>>>>> extremely helpful.
> >>>>>>> Sure thing - really appreciate your patience with all this - it's... a
> >>>>>>> lot of moving parts.
> >>>>>>> - Dave
> >>>>>>> Thank you, Alexey.
> >>>>>>>
> >>>>>>>> sig8 hash-id would be used to compare types and to deduplicate them.
> >>>>>>>> It would speed up the current dsymutil context analysis.
> >>>>>>>> Types having the same hash-id could be deduplicated.
> >>>>>>>> This would allow deduplicating more types than the current dsymutil does.
> >>>>>>>> Incomplete type definitions having a similar set of members are
> >>>>>>>> not deduplicated by dsymutil currently.
> >>>>>>>> In this case they would have the same hash-id.
> >>>>>>>>
> >>>>>>>> This "type table" would take less space than current "type units"
> >>>>>>>> and current ODR solution.
> >>>>>>>>
> >>>>>>>> Above is just an idea on how to help a DWARF-aware linker (based on
> >>>>>>>> the idea of removing obsolete debug info)
> >>>>>>>> to work faster (if that is interesting).
> >>>>>>>>
> >>>>>>>> Alexey.
> >>>>>>>>
> >>>>>>>>> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of
> >>>>>>>>> James Henderson via llvm-dev
> >>>>>>>>> Sent: Wednesday, June 3, 2020 3:48 AM
> >>>>>>>>> To: David Blaikie <dblaikie at gmail.com>
> >>>>>>>>> Cc: llvm-dev at lists.llvm.org
> >>>>>>>>> Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete
> >>>>>>>>> debug info in lld.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> It makes me sad that the linker (via a library or otherwise) has
> >>>>>>>>> to be "DWARF-aware" to be able to effectively handle --gc-sections,
> >>>>>>>>> COMDATs, --icf etc for debug info, without leaving large blocks of data
> >>>>>>>>> kicking around.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> The patching to -1 (or equivalent) is probably a good
> >>>>>>>>> lightweight solution (though I'd love it if it could be done based on
> >>>>>>>>> section type in the future rather than section name, but that's probably
> >>>>>>>>> outside the realm of DWARF), as it requires only minimal understanding in
> >>>>>>>>> the linker, but anything beyond that seems to be complicated logic that is
> >>>>>>>>> mostly due to the structure of DWARF. Patching to -1 does feel a bit like a
> >>>>>>>>> sticking plaster/band aid to patch over the issue rather than properly
> >>>>>>>>> solving it too - there will still be debug data (potentially significant
> >>>>>>>>> amounts in COMDAT-heavy objects) that the linker has to write and the
> >>>>>>>>> debugger has to somehow know how to skip (even if it knows that -1 is
> >>>>>>>>> special-case due to the standard being updated, it needs to get as far as
> >>>>>>>>> the -1), which is all wasted effort.
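
[Editor's sketch] A minimal model of the "patch to -1" approach discussed above, assuming a 64-bit target. The names resolve_address and live_entries are hypothetical and only stand in for what a linker and a DWARF consumer would each do; no real lld or debugger API is implied.

```python
ADDRESS_SIZE = 8  # bytes; assumed 64-bit target
TOMBSTONE = (1 << (8 * ADDRESS_SIZE)) - 1  # the "-1" value as an unsigned address


def resolve_address(symbol, live_symbols):
    """Linker side: return the symbol's address, or the tombstone if it was discarded."""
    return live_symbols.get(symbol, TOMBSTONE)


def live_entries(entries):
    """Consumer side: drop entries whose start address is the tombstone."""
    return [(name, addr) for name, addr in entries if addr != TOMBSTONE]


symbols = {"main": 0x1000, "helper": 0x1040}  # "unused" was garbage-collected
patched = [(n, resolve_address(n, symbols)) for n in ("main", "unused", "helper")]
kept = live_entries(patched)
```

Note that the entry for the discarded function is still written and still has to be scanned before it can be skipped, which is the wasted effort the paragraph above points out.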
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> We've already seen from Alexey's prototyping, and from our own
> >>>>>>>>> experiences with the Sony proprietary linker (which tried to rewrite
> >>>>>>>>> .debug_line only) that deconstructing the DWARF so that it can be more
> >>>>>>>>> optimally reassembled at link time is slow going, and will probably
> >>>>>>>>> inevitably be so however much effort is put into optimising it. For a start,
> >>>>>>>>> given the current standards, it's impossible to know how to deconstruct it
> >>>>>>>>> without having to parse vast amounts of DWARF, which is typically going to
> >>>>>>>>> mean a lot more parsing work than the linker would normally have to deal
> >>>>>>>>> with. Additionally, much of this parsing work is wasted effort, since it
> >>>>>>>>> seems unlikely in many links that large amounts of the DWARF will be
> >>>>>>>>> redundant. Having an option to opt-in doesn't help much there, since it
> >>>>>>>>> just means the logic exists without most people using it, due to it not
> >>>>>>>>> being good enough, or potentially they don't even know it exists.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I don't have particularly concrete suggestions as to how to
> >>>>>>>>> solve the structural problems with DWARF at this point. The only thing that
> >>>>>>>>> seems obvious to me is a more "blessed" approach to fragmentation of
> >>>>>>>>> sections, similar to what I tried with my prototype mentioned earlier in
> >>>>>>>>> the thread, although we'd need to figure out the previously stated
> >>>>>>>>> performance issues. Other ideas might tie into this, like somehow sharing
> >>>>>>>>> the various table headers a bit like CIEs in .eh_frame that could be merged
> >>>>>>>>> by the linker - each object could have separate table header sections,
> >>>>>>>>> which are referenced by the individual .debug_* blocks, which in turn are
> >>>>>>>>> one per function/data piece and easily discardable/merged by the linker.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Just some thoughts.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> James
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev
> >>>>>>>>> <llvm-dev at lists.llvm.org> wrote:
> >>>>>>>>>
> >>>>>>>>> On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
> >>>>>>>>> <alapshin at accesssoftek.com> wrote:
> >>>>>>>>>> Hi David, please find my comments inside:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>>> Broad question: Do you have any specific motivation/users/etc in implementing this (if you can speak about it)?
> >>>>>>>>>>>>> - it might help motivate the work, understand what tradeoffs might be suitable for you/your users, etc.
> >>>>>>>>>>>> There are two general requirements:
> >>>>>>>>>>>> 1) Remove (or clean) invalid debug info.
> >>>>>>>>>>> Perhaps a simpler direct solution for your immediate needs might be a much narrower,
> >>>>>>>>>>> and more efficient linker-DWARF-awareness feature:
> >>>>>>>>>>>
> >>>>>>>>>>> With DWARFv5, rnglists present an opportunity for a DWARF linker to rewrite the ranges
> >>>>>>>>>>> without parsing the rest of the DWARF. /technically/ this isn't guaranteed - rnglist entries
> >>>>>>>>>>> can be referenced either directly, or by index. If all rnglists are referenced by index, then
> >>>>>>>>>>> a linker could parse only the debug_rnglists section and rewrite ranges to remove any
> >>>>>>>>>>> address ranges that refer to optimized-out code.
> >>>>>>>>>>>
> >>>>>>>>>>> This would only be correct for rnglists that had no direct references to them (that only were
> >>>>>>>>>>> referenced via the indexes) - but we could either implement it with that assumption, or could
> >>>>>>>>>>> add an LLVM extension attribute on the CU that would say "I promise I only referenced rnglists
> >>>>>>>>>>> via rnglistx forms/indexes". If this DWARF-aware linking would have to read the CU DIE (not
> >>>>>>>>>>> all the other DIEs) it /could/ also then rewrite high/low_pc if the CU wasn't using ranges...
> >>>>>>>>>>> but that wouldn't come up in the function-removal case, because then you'd have ranges anyway,
> >>>>>>>>>>> so no need for that.
> >>>>>>>>>>>
> >>>>>>>>>>> Such a DWARF-aware rnglist linking could also simplify rnglists, in cases where functions
> >>>>>>>>>>> ended up being laid out next to each other, the linker could coalesce their ranges together.
> >>>>>>>>>>>
> >>>>>>>>>>> I imagine this could be implemented with very little overhead to linking, especially compared
> >>>>>>>>>>> to the overhead of full DWARF-aware linking.
> >>>>>>>>>>>
> >>>>>>>>>>> Though none of this fixes Split DWARF, where the linker doesn't get a chance to see the
> >>>>>>>>>>> addresses being used - but if you only want/need the CU-level ranges to be correct, this
> >>>>>>>>>>> might be a viable fix, and quite efficient.
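
[Editor's sketch] The rnglist rewriting idea above (drop ranges that point at discarded code, then coalesce ranges that ended up adjacent) can be illustrated on plain address pairs. This is a simplified model, not a .debug_rnglists parser: the ranges are assumed to be already decoded into half-open (low, high) pairs, and is_live is a hypothetical hook standing in for whatever the linker knows about discarded sections.

```python
def rewrite_ranges(ranges, is_live):
    """Drop ranges for discarded code and merge ranges that touch.

    ranges: list of (low, high) half-open address pairs, as one range list
        might describe them after relocation.
    is_live: callable taking a (low, high) pair and returning False when the
        range refers to code the linker discarded (hypothetical hook).
    """
    kept = sorted(r for r in ranges if is_live(r))
    merged = []
    for low, high in kept:
        if merged and low <= merged[-1][1]:
            # Adjacent to or overlapping the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


discarded = {(0x0, 0x20)}  # a function that was garbage-collected
cu_ranges = [(0x1000, 0x1040), (0x0, 0x20), (0x1040, 0x1080), (0x2000, 0x2010)]
result = rewrite_ranges(cu_ranges, lambda r: r not in discarded)
```

Here the dead range at address 0 disappears and the two functions laid out back to back collapse into a single range, which is the simplification the message describes.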
> >>>>>>>>>> Yes, we think about that alternative. This would resolve our problem of invalid debug info
> >>>>>>>>>> and would work much faster. Thus, if we would not have good results for D74169 then we
> >>>>>>>>>> will implement it. Do you think it could be useful to have this solution in upstream?
> >>>>>>>>> A pure rnglist rewriting - I think it'd be OK to have in upstream -
> >>>>>>>>> again, cost/benefit/etc would have to be weighed. I'm not sure it
> >>>>>>>>> would save enough space to be particularly valuable beyond the
> >>>>>>>>> correctness issue - and it doesn't completely solve the correctness
> >>>>>>>>> issue for zero-address usage or low-address usage (because you could
> >>>>>>>>> still have overlapping subprograms inside a CU - so if you were
> >>>>>>>>> symbolizing you could use the correct rnglist to filter, but then go
> >>>>>>>>> look inside the CU only to find two subprograms that had that address
> >>>>>>>>> & not know which one was the correct one and which one was the
> >>>>>>>>> discarded one).
> >>>>>>>>>
> >>>>>>>>> rnglist rewriting might be easy enough to prototype - but depends what
> >>>>>>>>> you want to spend your time on, I know this whole issue has been a
> >>>>>>>>> huge investment of your time already - but maybe this recent
> >>>>>>>>> revitalization of the conversation around having an explicit value in
> >>>>>>>>> the linker might be sufficient to address everyone's needs... *fingers
> >>>>>>>>> crossed*)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>> 2) Optimize the DWARF size.
> >>>>>>>>>>> Do your users care much about this? I imagine if they had significant DWARF size issues,
> >>>>>>>>>>> they'd have significant link time issues and the kind of cost to link time this feature has would
> >>>>>>>>>>> be prohibitive - but perhaps they're sharing linked binaries much more often than they're
> >>>>>>>>>>> actually performing linking.
> >>>>>>>>>> Yes, they do. They also have significant link-time issues.
> >>>>>>>>>> So current performance results of D74169 are not very acceptable.
> >>>>>>>>>> We hope to improve it.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>> The specifics which our users have:
> >>>>>>>>>>>>    - embedded platform which uses 0 as start of .text section.
> >>>>>>>>>>>>    - custom toolset which does not support all features yet (f.e. split dwarf).
> >>>>>>>>>>>>    - tolerant of the link-time increase.
> >>>>>>>>>>>>    - need a useful way to share debug builds.
> >>>>>>>>>>> Sharing two files (executable and dwp) is significantly less useful than sharing one file?
> >>>>>>>>>> Probably not significantly, but yes, it looks less useful compared to D74169.
> >>>>>>>>>> Having only two files (executable and .dwp) looks significantly better than having executable and multiple .dwo files.
> >>>>>>>>>> Having only one file (executable) with minimal size looks better than the two files with a bigger size.
> >>>>>>>>>>
> >>>>>>>>>> clang compiled with -gsplit-dwarf takes 0.9G for executable and 0.9G for .dwp.
> >>>>>>>>>> clang compiled with --gc-debuginfo takes only 0.76G for single executable.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>> For the first point: we have a problem "Overlapping address ranges starting from 0" (D59553).
> >>>>>>>>>>>> We use custom solution, but the general solution like D74169 would be better here.
> >>>>>>>>>>> If CU ranges are the only ones that need fixing, then I think the above solution might be as
> >>>>>>>>>>> good/better - if more than CU ranges need fixing, then I think we might want to start talking about
> >>>>>>>>>>> how to fix DWARF itself (split and non-split) to signal certain addresses point to dead code with a
> >>>>>>>>>>> specific blessed value that linkers would need to implement - because with Split DWARF there's
> >>>>>>>>>>> no way to solve the non-CU addresses at the linker.
> >>>>>>>>>> I think a worthwhile solution for that signal value would be LowPC > HighPC.
> >>>>>>>>>> That does not require additional bits in DWARF.
> >>>>>>>>>> It would be natural to skip such address ranges since they are explicitly marked as invalid.
> >>>>>>>>>> It could be implemented in a linker very easily. Probably, it would make sense to describe that
> >>>>>>>>>> usage in DWARF standard.
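
[Editor's sketch] The LowPC > HighPC marker proposed above could look like this from both sides. It is only an illustration: the helper names and the particular values written for a dead range, (1, 0), are assumptions, and the sketch assumes both bounds are stored as addresses (when DW_AT_high_pc is encoded as an offset from low_pc, this marker cannot be expressed this way).

```python
def mark_dead():
    """Linker side: a (low_pc, high_pc) pair with low > high marks a discarded range."""
    return (1, 0)  # hypothetical choice; any pair with low > high would do


def valid_ranges(ranges):
    """Consumer side: keep only ranges where low_pc does not exceed high_pc."""
    return [(low, high) for low, high in ranges if low <= high]


function_ranges = [(0x1000, 0x1040), mark_dead(), (0x2000, 0x2080)]
usable = valid_ranges(function_ranges)
```

No extra bits are needed in the DWARF encoding, as the message says, but every consumer has to agree to treat such a pair as "skip this".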
> >>>>>>>>>>
> >>>>>>>>>> As to the addresses which are not seen by the linker (since they are in .dwo files) - yes,
> >>>>>>>>>> they need to have another solution. Could you show an example of such a case, please?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>>> 2. Support of type units.
> >>>>>>>>>>>>>>    That could be implemented further.
> >>>>>>>>>>>>> Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware
> >>>>>>>>>>>>> linker. With a DWARF aware linker it'd be generally desirable not to have to add that object size overhead to
> >>>>>>>>>>>>> get the linking improvements.
> >>>>>>>>>>>> But, DWARFLinker should adequately work with type units since they are already implemented.
> >>>>>>>>>>> Maybe - it'd be nice & all, but I don't think it's an outright necessity - if someone knows they're using
> >>>>>>>>>>> a DWARF-aware linker, they'd probably not use type units in their object files. It's possible someone
> >>>>>>>>>>> doesn't know for sure & maybe they have pre-canned debug object files from someone else, etc.
> >>>>>>>>>> I see.
> >>>>>>>>>>
> >>>>>>>>>>>> Another thing is that the idea behind type units has the potential to help Dwarf-aware linker to work faster.
> >>>>>>>>>>>> Currently, DWARFLinker analyzes context to understand whether types are the same or not.
> >>>>>>>>>>> When you say "analyzes context" what do you mean? Usually I'd take that to mean
> >>>>>>>>>>> "looks at things outside the type itself - like what namespace it's in, etc" - which, yes,
> >>>>>>>>>>> it should do that, but it doesn't seem very expensive to do. But I guess you actually
> >>>>>>>>>>> mean something about doing structural equivalence in some way, looking at things inside the type?
> >>>>>>>>>> I think it could be useful for both cases. Currently, dsymutil does only the first thing
> >>>>>>>>>> (look at type name, namespace name, etc..) and does not do the second thing
> >>>>>>>>>> (doing structural equivalence). Analyzing type names is currently quite expensive
> >>>>>>>>>> (the search in the string pool alone takes ~10 sec from 70 sec of overall time).
> >>>>>>>>>> That is expensive because many things should be done to work with strings:
> >>>>>>>>>> parse DWARF, search and resolve relocations, compute a hash for strings,
> >>>>>>>>>> put data into a string pool, create a fully qualified name (like namespace::function::name).
> >>>>>>>>>> It looks like it could be optimized and finally require less time, but it still would be a noticeable
> >>>>>>>>>> part of the overall time.
> >>>>>>>>>>
> >>>>>>>>>> If dsymutil starts to check for the structural equivalence, then the process would be even slower.
> >>>>>>>>>> So, if instead of comparing type structure, a single hash-id were checked - then this process
> >>>>>>>>>> would also be faster.
> >>>>>>>>>>
> >>>>>>>>>> Thus I think using hash-id to compare types would allow making the current implementation faster and would
> >>>>>>>>>> also allow DWARFLinker to handle incomplete types without massive performance degradation.
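
[Editor's sketch] Deduplicating types by a single precomputed hash-id, as suggested above, reduces the comparison to one dictionary lookup per type. The data layout below is hypothetical (lists of (hash_id, description) pairs per object file); it is not how dsymutil or DWARFLinker represent types, and it assumes the hash-id is produced by the compiler so that no name or context analysis is needed at link time.

```python
def deduplicate_types(units):
    """Keep one copy of each type, keyed by its precomputed hash-id.

    units: list of lists of (hash_id, description) pairs, one list per
        object file. The hash-id is assumed to be computed by the compiler.
    Returns the merged type list and, per unit, a map from the unit's
    local type index to the index in the merged list.
    """
    merged = []
    index_of = {}
    remaps = []
    for types in units:
        remap = {}
        for local_index, (hash_id, description) in enumerate(types):
            if hash_id not in index_of:
                index_of[hash_id] = len(merged)
                merged.append((hash_id, description))
            remap[local_index] = index_of[hash_id]
        remaps.append(remap)
    return merged, remaps


a_o = [(0xA1, "int"), (0xB2, "struct Foo")]
b_o = [(0xB2, "struct Foo"), (0xC3, "struct Bar")]
merged_types, remaps = deduplicate_types([a_o, b_o])
```

The second copy of "struct Foo" is dropped and references from the second object file are remapped to the surviving copy, without building fully qualified names or touching a string pool.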
> >>>>>>>>>>
> >>>>>>>>>>>> But the context is known when types are generated. So, no need to spend the time analyzing it.
> >>>>>>>>>>>> If types could be compared without analyzing context, then Dwarf-aware linker would work faster.
> >>>>>>>>>>>> That is just an idea (not for immediate implementation): If types would be stored in some "type table"
> >>>>>>>>>>>> (instead of COMDAT section group) and could be accessed through hash-id (like type units)
> >>>>>>>>>>>> - then it would be the solution requiring fewer bits to store but allowing to compare types
> >>>>>>>>>>>> by hash-id (not analysing context).
> >>>>>>>>>>>> In this case, the size increase would be small. And processing could be done faster.
> >>>>>>>>>>>>
> >>>>>>>>>>>> this is just an idea and could be discussed separately from the problem of integrating of D74169.
> >>>>>>>>>>>>>> 6. -flto=thin
> >>>>>>>>>>>>>>      That problem was described in this review https://reviews.llvm.org/D54747#1503720. It also exists in
> >>>>>>>>>>>>>> current DWARFLinker/dsymutil implementation. I think that problem should be discussed more: it could
> >>>>>>>>>>>>>> probably be fixed by avoiding generation of such incomplete declaration during thinlto,
> >>>>>>>>>>>>>> That would be costly to produce extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
> >>>>>>>>>>>>>> more to reduce that redundancy early on (actually removing definitions from some llvm Modules if the type
> >>>>>>>>>>>>>> definition is known to exist in another Module, etc)
> >>>>>>>>>>>>> I don't know if it's a problem since that patch was reverted.
> >>>>>>>>>>>> Yes. That patch was reverted, but this patch (D74169) has the same problem.
> >>>>>>>>>>>> if D74169 would be applied and --gc-debuginfo used then structure type
> >>>>>>>>>>>> definition would be removed.
> >>>>>>>>>>>> DWARFLinker could handle that case - "removing definitions from some llvm Modules if the type
> >>>>>>>>>>>> definition is known to exist in another Module".
> >>>>>>>>>>>> i.e. DWARFLinker could replace the declaration with the definition.
> >>>>>>>>>>>> But that problem could be more easily resolved when debug info is generated (probably without
> >>>>>>>>>>>> significant increase of debug info size):
> >>>>>>>>>>>> Here we have:
> >>>>>>>>>>>> DW_TAG_compile_unit(0x0000000b) - compile unit containing concrete instance for function "f".
> >>>>>>>>>>>> DW_TAG_compile_unit(0x00000073) - compile unit containing abstract instance root for function "f".
> >>>>>>>>>>>> DW_TAG_compile_unit(0x000000c1) - compile unit containing function "f" definition.
> >>>>>>>>>>>> Code for function "f" was deleted. gc-debuginfo deletes compile unit DW_TAG_compile_unit(0x000000c1)
> >>>>>>>>>>>> containing "f" definition (since there is no corresponding code). But it has structure "Foo" definition
> >>>>>>>>>>>> DW_TAG_structure_type(0x0000011e) referenced from DW_TAG_compile_unit(0x00000073)
> >>>>>>>>>>>> by declaration DW_TAG_structure_type(0x000000ae). That declaration is exactly the case when definition
> >>>>>>>>>>>> was removed by thinlto and replaced with declaration.
> >>>>>>>>>>>> Would it cost too much if type definition would not be replaced with declaration for "abstract instance root"?
> >>>>>>>>>>>> The number of concrete instances is bigger than number of abstract instance roots.
> >>>>>>>>>>>> Probably, it would not be too costly to leave definition in abstract instance root?
> >>>>>>>>>>
> >>>>>>>>>>>> Alternatively, Would it cost too much if type definition would not be replaced with declaration when
> >>>>>>>>>>>> declaration references type from not used function? (lto could understand that concrete function is not used).
> >>>>>>>>>>> I don't follow this example - could you provide a small concrete test case I could reproduce?
> >>>>>>>>>> I would provide a test case if necessary. But it looks like this issue is finally clear, and you already commented on that.
> >>>>>>>>>>
> >>>>>>>>>>> Oh, I guess this is happening perhaps because ThinLTO can't know for sure that a standalone
> >>>>>>>>>>> definition of 'f' won't be needed - so it produces one in case one of the inlining opportunities
> >>>>>>>>>>> doesn't end up inlining. Then it turns out all calls got inlined, so the external definition wasn't needed.
> >>>>>>>>>>> Oh, you're suggesting that these 3 CUs got emitted into one object file during LTO, but that DWARFLinker
> >>>>>>>>>>> drops a CU without any code in it - even though... So far as I know, in LTO, LLVM directly references
> >>>>>>>>>>> types across units if the CUs are all emitted in the same object file. (and if they weren't in the same
> >>>>>>>>>>> object file - then the abstract_origin couldn't be pointing cross-CU).
> >>>>>>>>>>> I guess some basic things to say:
> >>>>>>>>>>> With ThinLTO, the concrete/standalone function definition is emitted in case some call sites don't end up
> >>>>>>>>>>> being inlined. So we know it'll be emitted (but might not be needed by the actual linker)
> >>>>>>>>>>> Any number of inline calls might exist - but we shouldn't put the type information into those, because
> >>>>>>>>>>> they aren't guaranteed to emit it (if the inline function gets optimized away, there would be nothing to
> >>>>>>>>>>> enforce the type being emitted) - and even if we forced the type information to be emitted into one
> >>>>>>>>>>> object file that has an inline copy of the function - there's no guarantee that object file will get linked in either.
> >>>>>>>>>>> So, no, I don't think there's much we can do to keep the size of object files down, while guaranteeing
> >>>>>>>>>>> the type information will be emitted with the usual linker semantics.
> >>>>>>>>>> Then dsymutil/DWARFLinker could be changed to handle that (though it would probably be not very efficient).
> >>>>>>>>>> If thinlto would understand that the function is not used in the end (and then must not contain the referenced type definition),
> >>>>>>>>>> then this situation could be handled more effectively.
> >>>>>>>>>>
> >>>>>>>>>> Thank you, Alexey.
> >>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>> LLVM Developers mailing list
> >>>>>>>>>>>> llvm-dev at lists.llvm.org
> >>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Alexey Lapshin via llvm-dev

2020-Aug-03 15:32 UTC


[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

Hi Eric, please

On 31.07.2020 22:02, Eric Christopher wrote:
> Hi Alexey,
>
> On Fri, Jul 31, 2020 at 4:02 AM Alexey Lapshin via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>
>     On 28.07.2020 19:28, David Blaikie wrote:
>     > On Tue, Jul 28, 2020 at 8:55 AM Alexey Lapshin
>     <avl.lapshin at gmail.com> wrote:
>     >>
>     >> On 28.07.2020 10:29, David Blaikie via llvm-dev wrote:
>     >>> On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
>     >>> <alapshin at accesssoftek.com> wrote:
>     >>>>>>>>>>>> This idea goes in another direction than fragmenting dwarf
>     >>>>>>>>>>>> using elf sections&tricks. It seems to me that the cost of fragmenting is too high.
>     >>>>>>>>>>> I tend to agree - but I'm sort of leaning towards trying to use object
>     >>>>>>>>>>> features as much as possible, then implementing just enough custom
>     >>>>>>>>>>> handling in the linker to recoup overhead, etc. (eg: add some kind of
>     >>>>>>>>>>> small header/brief description that makes it easy for the linker to
>     >>>>>>>>>>> slice-and-dice - but hopefully a domain-specific such header can be a
>     >>>>>>>>>>> bit more compact than the fully general ELF form)
>     >>>>>>>>>> I think this indeed should be implemented and evaluated.
>     >>>>>>>>>> So that various approaches could be compared.
>     >>>>>>>>>>
>     >>>>>>>>>>>> It is not only the sizes of structures describing fragments but also the complexity
>     >>>>>>>>>>>> of tools that should be taught to work with fragmented DWARF.
>     >>>>>>>>>>>> (f.e. llvm-dwarfdump applied to object file should be able to read fragmented DWARF,
>     >>>>>>>>>>>> but applied to linked executable it should work with non-fragmented DWARF).
>     >>>>>>>>>>>> That idea is for the tool which works the same way as dsymutil ODR.
>     >>>>>>>>>>>>
>     >>>>>>>>>>>> I will shortly describe the idea of making DWARF be easier processed by dsymutil/DWARFLinker:
>     >>>>>>>>>>>>
>     >>>>>>>>>>>> The idea is to have only one "type table" per object file (special section .debug_types_table).
>     >>>>>>>>>>>> This "type table" would contain all types.
>     >>>>>>>>>>>> There could be a special type of reference - type_offset - that offset points into the type table.
>     >>>>>>>>>>>> Basic types could always be placed into the start of "type table" thus, offsets to basic types
>     >>>>>>>>>>>> most often would be 1 byte. There also would be a special kind of reference - reference inside the type.
>     >>>>>>>>>>>> Type units sig8 system - would not be used to reference types.
>     >>>>>>>>>>>>
>     >>>>>>>>>>>> Types deduplication is assumed to be done, not by linker mechanism for COMDAT,
>     >>>>>>>>>>>> but by a tool like dsymutil. This tool would create resulting .debug_types_table by putting there
>     >>>>>>>>>>>> types from source .debug_types_table-s. Only one copy of the type would be placed into the
>     >>>>>>>>>>>> resulting table. All references pointing to the deleted copy would be corrected to point
>     >>>>>>>>>>>> to the single copy inside "type table". (that is how dsymutil works currently)
>     >>>>>>>>>>> ^ that's the step
that's probably a bit expensive for
>     a general-use
>     >>>>>>>>>>> tool - it implies parsing
all the DWARF to find those
>     references and
>     >>>>>>>>>>> rewrite them, I think. For
a high-performance solution
>     that could be
>     >>>>>>>>>>> run by the linker I think
it'd be necessary to have a
>     solution that
>     >>>>>>>>>>> doesn't involve
parsing all the DIEs.
>     >>>>>>>>>> According to the current
dsymutil processing,
>     >>>>>>>>>> exactly this process is not
the most time-consuming.
>     >>>>>>>>>> That could be done relatively
fast.
>     >>>>>>>>> Fair enough - though I'd still
imagine any solution that
>     involves
>     >>>>>>>>> parsing all the DIEs still
wouldn't be fast enough
>     (maybe an order of
>     >>>>>>>>> magnitude faster than the current
solution even - but
>     that's still,
>     >>>>>>>>> what, 6 or 7x slower than linking
without the feature?)
>     for most users
>     >>>>>>>>> to consider it a good trade-off.
>     >>>>>>>> It seems to me that even the current
6x-7x slowdown could
>     be useful.
>     >>>>>>>> Users who already use dsymutil or
llvm-dwp(assuming
>     DWARFLinker
>     >>>>>>>> would be taught to work with a split
dwarf) tools spend
>     this time and,
>     >>>>>>>> in some scenarios, waste disk space by
intermediate files.
>     >>>>>>> FWIW, dwp (llvm-dwp hasn't really been
optimized compared
>     to binutils
>     >>>>>>> dwp) is designed to be very quick - by not
needing to do a
>     lot of
>     >>>>>>> parsing/fixups. Which, yes, means larger
output files than
>     would be
>     >>>>>>> possible with more parsing/etc. It also
doesn't take any
>     input from
>     >>>>>>> the linker (so it can run in parallel with
the linker) -
>     so it can't
>     >>>>>>> remove dead subprograms. Given
Google's the major (perhaps
>     only
>     >>>>>>> significant?) user of Split DWARF - I can
say that the
>     needs don't
>     >>>>>>> necessarily overlap well with something
that would take
>     significantly
>     >>>>>>> longer to run or use significantly more
memory.
>     Faster/cheaper/with
>     >>>>>>> somewhat bigger output files is probably
the right
>     tradeoff for
>     >>>>>>> Google's use case, at least.
>     >>>>>>>
>     >>>>>>> I imagine Apple's use for dsymutil is
somewhat similar -
>     it's not used
>     >>>>>>> in the iterative development cycle, only
in final releases
>     - well,
>     >>>>>>> maybe their situation is more
"neutral" - not a major pain
>     point in
>     >>>>>>> any case I'd guess.
>     >>>>>>>
>     >>>>>>>
>     >>>>>> I see. FWIW, Comparison splitdwarf+dwp and
DWARFLinker from
>     lld:
>     >>>>>>
>     >>>>>> 1. split-dwarf+llvm-dwp = linking time for
clang 6 sec,
>     >>>>>>       generating time for .dwp 53 sec,
clang=997M
>     clang.dwp=1.1G.
>     >>>>> FWIW, llvm-dwp is not very well optimized (which
is to say:
>     it is not
>     >>>>> optimized), binutils dwp might be a better
comparison (&
>     even that
>     >>>>> doesn't have the parallelism & some
potential further memory
>     savings
>     >>>>> that lld has that we could take advantage of in a
dwp-like tool)
>     >>>>>
>     >>>>> What build mode was the clang binary built in?
Optimized or
>     unoptimized?
>     >>>> right, that is unoptimized build with
-ffunction-sections.
>     >>>>
>     >>>>>> 2. DWARFLinker from lld = linking time for
clang 72 sec,
>     clang=760M.
>     >>> And this is without Split DWARF? Without linker DWARF
>     compression? -
>     >>> that seems quite a bit surprising, that the deduplication
of DWARF
>     >>> could fit into less space than the wasted/reclaimed space
in
>     ranges (&
>     >>> line)?
>     >> that was without split dwarf, without linker compression.
>     >>
>     >>> Could you double check these numbers & provide a
clearer summary?
>     >> sure, I would re-check it.
>     >>
>     >>> Here's my attempt at numbers (all with
>     function-sections+gc-sections)...
>     >>>
>     >>> Split DWARF tests didn't seem meaningful -
gc-debuginfo +
>     split DWARF
>     >>> seemed to drop all the debug info (except gdb_index) so
wasn't
>     >>> working/comparison wasn't meaningful for Apples to
Apples, but
>     >>> included it for comparing gc'd non-split to
non-gc'd split
>     (disabled
>     >>> gnu-pubnames/gdb-index (-gsplit-dwarf -gno-gnu-pubnames)
>     (which turns
>     >>> on by default with Split DWARF because gdb needs it - but
a
>     bit of an
>     >>> unfair comparison without turning on
gnu-pubnames/gdb-index in
>     other
>     >>> build modes too, since it... /shouldn't/ be necessary)
which
>     might've
>     >>> been a factor in the data you were looking at)
>     >> that might be the case. i.e. clang=997M for split dwarf(from
my
>     previous
>     >> measurement) might include gnu-pubnames.
>     >>
>     >> would recheck it and if that is the case then it is an unfair
>     comparison.
>     >>
>     >>
>     >> My point was that "DWARFLinker from lld" takes less
space than
>     singleton
>     >> split dwarf file+.dwp file.
>     >>
>     >> for -O0 uncompressed:
>     >>
>     >> - .dwp took 1.1G(if I built it correctly), singleton
clang(from
>     your
>     >> measurements) 566 MB
>     >>
>     >>      overall 1.6G.
>     > Oh, yeah, even if there are some measurement issues, linked
>     executable
>     > + .dwp is going to be larger than a linked executable using
>     non-split
>     > DWARF (in v5), since v5 uses all the same representations as
>     non-split
>     > DWARF, and split DWARF adds the indirection overhead of a split
>     file,
>     > etc.
>     >
>     > Even without DWARF linking, it's true that split DWARF has
overhead
>     > (dwp+executable will be larger than executable non-split).
>     >
>     > But maybe we've ended up down a bit of a tangent in any case.
>     >
>     > Trying to bring this back to "should this be committed to
lld" seems
>     > valuable, and I'm not sure what the right criteria are for
that.
>     I think it would be useful to do "removing obsolete debug info"
>     in the linker. First, it would be the fastest way (no need
>     to copy data, create temp files, or build an address map).
>     Second, it would be a good separation of concerns: all debug info
>     processing currently done in the linker (gdb_index, upcoming
>     debug_names) could be moved into a separate library for processing
>     debug info. When gdb_index/debug_names are built without
>     "removing obsolete debug info", performance would be the same
>     as it is now.
>
>     We decided to give the idea of "removing obsolete debug info"
>     another try and are going to implement it as a separate utility
>     working on the built binary. Making it multi-threaded would
>     probably give better performance, and then it could
>     probably be considered acceptable to use from the linker.
>
>
> I'm quite interested in this direction. One thought I had was to 
> incorporate such a library into dsymutil but with support for ELF. If 
> you get a proposal written up I'd love to take a look and comment.
>
Yes, I will share the proposal in a separate thread within a week or two.

In short: we decided to move in a slightly different direction than
adding this functionality to dsymutil. If there is a preference to
implement it as part of dsymutil, though, we are OK with doing it
that way.

In its first version, the new utility is supposed to take a built binary
with debug info as input (with the new marking, -1/-2, for references to
removed code sections: https://reviews.llvm.org/D84825) and create a new
binary with the obsolete debug info removed according to that marking.
In later versions it could be extended with other debug info
optimization tasks, e.g. generating new index tables, optimizing debug
info, etc.
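
To make the "remove according to the marking" step concrete, here is a minimal sketch in Python. It is not the planned implementation. The Subprogram class and both function names are hypothetical stand-ins, the addresses are assumed to be 64-bit, and the split of the two marker values (-1 for most debug sections, -2 for range lists) is an assumption based on the "-1/-2" mentioned above.

```python
from dataclasses import dataclass

ADDR_MASK = (1 << 64) - 1            # assumption: 64-bit target addresses
TOMBSTONE_DEFAULT = -1 & ADDR_MASK   # assumed marker for most .debug_* sections
TOMBSTONE_RANGES = -2 & ADDR_MASK    # assumed marker for range-list entries


@dataclass
class Subprogram:
    """Hypothetical stand-in for a function-level debug info entry."""
    name: str
    low_pc: int


def drop_dead_subprograms(dies):
    """Keep only entries whose start address was not marked as removed."""
    return [d for d in dies if d.low_pc != TOMBSTONE_DEFAULT]


def drop_dead_ranges(ranges):
    """Keep only (start, end) pairs whose start was not marked as removed."""
    return [(lo, hi) for lo, hi in ranges if lo != TOMBSTONE_RANGES]


if __name__ == "__main__":
    dies = [Subprogram("main", 0x401000),
            Subprogram("unused", TOMBSTONE_DEFAULT)]
    print([d.name for d in drop_dead_subprograms(dies)])
```

The point of the marking is that the utility does not need any input from the linker beyond the binary itself: an entry whose address equals the marker refers to code that was discarded.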

We considered three options:

1. Add the new functionality to dsymutil, so that dsymutil behaves
   differently on a non-Darwin platform and supports another set of
   command-line options.

2. Add the new functionality to llvm-objcopy. llvm-objcopy already
   supports various binary object formats (MachO, ELF, COFF, wasm).
   It also has several options for working with debug info.

3. Create a new utility, llvm-dwarfutil, which would implement the
   above functionality and reuse the DWARFLinker library (extracted
   from dsymutil) and a new ObjectCopy library (extracted from
   llvm-objcopy).

So far our preference is option three. A separate utility that works
specifically with debug info looks like a good separation of concerns.
Adding another behavior to dsymutil does not look good, and neither does
extending the already rich interface of llvm-objcopy. Bearing in mind
that the actual implementation would be shared through libraries, a
separate utility working specifically with debug info looks like the
right choice. That is our current idea.

I will publish the proposal shortly so that it can be discussed.


Thank you, Alexey.
> Thanks!
>
> -eric
>
>     Alexey.
>
>     >
>     > Ray's the best person to weigh in on that. My 2c is that I
think it
>     > probably is worthwhile, even just as an experiment, assuming
>     it's not
>     > too intrusive to lld.
>     >
>     >> - The "DWARFLinker from lld" 820 MB(from your
measurements).
>     >>
>     >>
>     >> So "DWARFLinker from lld" looks two times better.
>     >>
>     >>
>     >> Anyway, thank you for pointing me to possible mistake. I would
>     recheck
>     >> it and update results.
>     >>
>     >>
>     >> Alexey.
>     >>
>     >>
>     >>> * -O0: (baseline, just using strip -g: 356 MB)
>     >>>     * compressed: 25% smaller with gc-debuginfo (481 MB / 641 MB) (407 MB split/non-gc)
>     >>>     * uncompressed: 30% smaller (820 MB / 1.2 GB) (566 MB split/non-gc)
>     >>> * -O3: (baseline: 116 MB)
>     >>>     * compressed: 16% smaller (361 MB / 462 MB) (283 MB split/non-gc)
>     >>>     * uncompressed: 22% smaller (1022 MB / 1.2 GB) (156 MB split/non-gc)
>     >>>
>     >>>
>     >>>
>     >>>
>     >>> On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
>     >>> <alapshin at accesssoftek.com <mailto:alapshin at
accesssoftek.com>>
>     wrote:
>     >>>>>>>>>>>> This idea goes in
another direction than fragmenting
>     dwarf
>     >>>>>>>>>>>> using elf
sections&tricks. It seems to me that the
>     cost of fragmenting is too high.
>     >>>>>>>>>>> I tend to agree - but
I'm sort of leaning towards
>     trying to use object
>     >>>>>>>>>>> features as much as
possible, then implementing just
>     enough custom
>     >>>>>>>>>>> handling in the linker to
recoup overhead, etc. (eg:
>     add some kind of
>     >>>>>>>>>>> small header/brief
description that makes it easy for
>     the linker to
>     >>>>>>>>>>> slice-and-dice - but
hopefully a domain-specific such
>     header can be a
>     >>>>>>>>>>> bit more compact than the
fully general ELF form)
>     >>>>>>>>>> I think this indeed should be
implemented and evaluated.
>     >>>>>>>>>> So that various approaches
could be compared.
>     >>>>>>>>>>
>     >>>>>>>>>>>> It is not only the
sizes of structures describing
>     fragments but also the complexity
>     >>>>>>>>>>>> of tools that should
be taught to work with
>     fragmented DWARF.
>     >>>>>>>>>>>> (f.e. llvm-dwarfdump
applied to object file should be
>     able to read fragmented DWARF,
>     >>>>>>>>>>>> but applied to linked
executable it should work with
>     non-fragmented DWARF).
>     >>>>>>>>>>>> That idea is for the
tool which works the same way as
>     dsymutil ODR.
>     >>>>>>>>>>>>
>     >>>>>>>>>>>> I will shortly
describe the idea of making DWARF be
>     easier processed by dsymutil/DWARFLinker:
>     >>>>>>>>>>>>
>     >>>>>>>>>>>> The idea is to have
only one "type table" per object
>     file(special section .debug_types_table).
>     >>>>>>>>>>>> This "type
table" would contain all types.
>     >>>>>>>>>>>> There could be a
special type of reference -
>     type_offset - that offset points into the type table.
>     >>>>>>>>>>>> Basic types could
always be placed into the start of
>     "type table" thus, offsets to basic types
>     >>>>>>>>>>>> most often would be 1
byte. There also would be a
>     special kind of reference - reference inside the type.
>     >>>>>>>>>>>> Type units sig8 system
- would not be used to
>     reference types.
>     >>>>>>>>>>>>
>     >>>>>>>>>>>> Types deduplication is
assumed to be done, not by
>     linker mechanism for COMDAT,
>     >>>>>>>>>>>> but by a tool like
dsymutil. This tool would create
>     resulting .debug_types_table by putting there
>     >>>>>>>>>>>> types from source
.debug_types_table-s. Only one copy
>     of the type would be placed into the
>     >>>>>>>>>>>> resulting table. All
references pointing to the
>     deleted copy would be corrected to point
>     >>>>>>>>>>>> to the single copy
inside "type table". (that is how
>     dsymutil works currently)
>     >>>>>>>>>>> ^ that's the step
that's probably a bit expensive for
>     a general-use
>     >>>>>>>>>>> tool - it implies parsing
all the DWARF to find those
>     references and
>     >>>>>>>>>>> rewrite them, I think. For
a high-performance solution
>     that could be
>     >>>>>>>>>>> run by the linker I think
it'd be necessary to have a
>     solution that
>     >>>>>>>>>>> doesn't involve
parsing all the DIEs.
>     >>>>>>>>>> According to the current
dsymutil processing,
>     >>>>>>>>>> exactly this process is not
the most time-consuming.
>     >>>>>>>>>> That could be done relatively
fast.
>     >>>>>>>>> Fair enough - though I'd still
imagine any solution that
>     involves
>     >>>>>>>>> parsing all the DIEs still
wouldn't be fast enough
>     (maybe an order of
>     >>>>>>>>> magnitude faster than the current
solution even - but
>     that's still,
>     >>>>>>>>> what, 6 or 7x slower than linking
without the feature?)
>     for most users
>     >>>>>>>>> to consider it a good trade-off.
>     >>>>>>>> It seems to me that even the current
6x-7x slowdown could
>     be useful.
>     >>>>>>>> Users who already use dsymutil or
llvm-dwp(assuming
>     DWARFLinker
>     >>>>>>>> would be taught to work with a split
dwarf) tools spend
>     this time and,
>     >>>>>>>> in some scenarios, waste disk space by
intermediate files.
>     >>>>>>> FWIW, dwp (llvm-dwp hasn't really been
optimized compared
>     to binutils
>     >>>>>>> dwp) is designed to be very quick - by not
needing to do a
>     lot of
>     >>>>>>> parsing/fixups. Which, yes, means larger
output files than
>     would be
>     >>>>>>> possible with more parsing/etc. It also
doesn't take any
>     input from
>     >>>>>>> the linker (so it can run in parallel with
the linker) -
>     so it can't
>     >>>>>>> remove dead subprograms. Given
Google's the major (perhaps
>     only
>     >>>>>>> significant?) user of Split DWARF - I can
say that the
>     needs don't
>     >>>>>>> necessarily overlap well with something
that would take
>     significantly
>     >>>>>>> longer to run or use significantly more
memory.
>     Faster/cheaper/with
>     >>>>>>> somewhat bigger output files is probably
the right
>     tradeoff for
>     >>>>>>> Google's use case, at least.
>     >>>>>>>
>     >>>>>>> I imagine Apple's use for dsymutil is
somewhat similar -
>     it's not used
>     >>>>>>> in the iterative development cycle, only
in final releases
>     - well,
>     >>>>>>> maybe their situation is more
"neutral" - not a major pain
>     point in
>     >>>>>>> any case I'd guess.
>     >>>>>>>
>     >>>>>>>
>     >>>>>> I see. FWIW, Comparison splitdwarf+dwp and
DWARFLinker from
>     lld:
>     >>>>>>
>     >>>>>> 1. split-dwarf+llvm-dwp = linking time for
clang 6 sec,
>     >>>>>>       generating time for .dwp 53 sec,
clang=997M
>     clang.dwp=1.1G.
>     >>>>> FWIW, llvm-dwp is not very well optimized (which
is to say:
>     it is not
>     >>>>> optimized), binutils dwp might be a better
comparison (&
>     even that
>     >>>>> doesn't have the parallelism & some
potential further memory
>     savings
>     >>>>> that lld has that we could take advantage of in a
dwp-like tool)
>     >>>>>
>     >>>>> What build mode was the clang binary built in?
Optimized or
>     unoptimized?
>     >>>> right, that is unoptimized build with
-ffunction-sections.
>     >>>>
>     >>>>>> 2. DWARFLinker from lld = linking time for
clang 72 sec,
>     clang=760M.
>     >>>>> It does seem a tad strange that the clang binary
would be
>     smaller
>     >>>>> non-split with DWARF linking than it was split.
Though I
>     could imagine
>     >>>>> this might be possible in an optimized build
(where debug_ranges
>     >>>>> become quite relatively expensive in the .o file
>     contribution with
>     >>>>> Split DWARF)
>     >>>>> Could you compare the section sizes between these
two clang
>     binaries, perhaps?
>     >>>> .debug_ranges is three times bigger and .debug_line is
twice
>     bigger.
>     >>>>
>     >>>>>>>> Thus if they would use this LLD
feature in its current state
>     >>>>>>>> - they would still receive benefits.
>     >>>>>>>>
>     >>>>>>>> Speaking of performance results - LLD
is a multi-thread
>     linker;
>     >>>>>>>> it handles sections in parallel.
DWARFLinker generates
>     DWARF using
>     >>>>>>>> AsmPrinter which is a stream - so it
could make resulting
>     DWARF only
>     >>>>>>>> continuously. It is not surprising
that the parallel
>     solution works faster.
>     >>>>>>>> Making DWARFLinker truly
multi-threaded would probably
>     allow us
>     >>>>>>>> to make slowdown to be at 2x-4x range.
>     >>>>>>> *nod* that's still a really expensive
link - but I
>     understand that's a
>     >>>>>>> suitable tradeoff for your users
>     >>>>>>>
>     >>>>>> Btw, 2x or 7x is for pure linking time.
Overall compilation
>     slowdown
>     >>>>>> is not so significant. Building LLVM codebase
has only 20%
>     slowdown.
>     >>>>> Understood - that's still quite significant to
most users,
>     I'd imagine.
>     >>>> I see.
>     >>>>
>     >>>>>>>>>> Anyway, I think the dsymutil
approach is still
>     valuable, and it
>     >>>>>>>>>> would be useful to optimize
it.
>     >>>>>>>>>> Do you think it would be
useful to make
>     dsymutil/DWARFLinker truly multi-thread?
>     >>>>>>>>>> (To make dsymutil/DWARFLinker
able to process each
>     object file in a separate thread)
>     >>>>>>>>> Perhaps - that I'd probably
leave up to the folks who
>     are more
>     >>>>>>>>> invested in dsymutil (Adrian
Prantl et al). Maybe one
>     day we'll get it
>     >>>>>>>>> integrated into llvm-dwp and then
I'll be interested in
>     getting as
>     >>>>>>>>> much performance out of it as lld
- so multithreading
>     and things would
>     >>>>>>>>> be on the books.
>     >>>>>>>> I think improving dsymutil is a
valuable thing.
>     >>>>>>>> Though there are several directions
which might be considered
>     >>>>>>>> to make it more robust:
>     >>>>>>>>
>     >>>>>>>> 1. support of latest DWARF -
DWARF5/DWARF64...
>     >>>>>>> I expect/though some of the Apple folks
had already worked
>     on DWARF5 support?
>     >>>>>>> DWARF64 - that's been around for a
while, and just hasn't
>     been needed
>     >>>>>>> by LLVM users thus far, it seems (until
recently - where some
>     >>>>>>> developers have started working on that)
>     >>>>>> The debug_names table is already implemented,
but
>     debug_rnglists,
>     >>>>>> debug_loclists, type units - are not
implemented yet.
>     >>>>> Superficially, type units wouldn't be on the
list of
>     features (like
>     >>>>> DWARF64 - it's optional) I'd try to
support in dsymutil -
>     since their
>     >>>>> size overhead is more justified for a
DWARF-agnostic linker
>     that's
>     >>>>> using comdat groups. With a DWARF-aware linker
I'd be
>     specifically
>     >>>>> hoping to avoid using type units to help
>     >>>>>> The thing which
>     >>>>>> should probably be changed is that dsymutil
should not have
>     its version
>     >>>>>> of code generating DWARF tables. It should
call the already existing
>     >>>>>> DWARF5/DWARF64 implementations. Then dsymutil
would always
>     >>>>>> use the latest DWARF generators.
>     >>>>> Possibly - I don't know what the architectural
tradeoffs for
>     that look
>     >>>>> like - I'd imagine DWARFLinker has
sufficiently different
>     >>>>> needs/tradeoffs than LLVM's DWARF generation
code (rewriting
>     existing
>     >>>>> DIEs compared to building new ones from scratch,
etc) that
>     it might be
>     >>>>> hard for them to share a lot of their
implementation.
>     >>>> It is not easy, and would require some additions, but
it
>     would benefit
>     >>>> in that all format implementation is in one place.
Thus
>     changing that place
>     >>>> would reflect in other places. There are at least
three
>     implementations for
>     >>>> .debug_ranges, .debug_aranges currently...
>     >>>>
>     >>>>
>     >>>>>>>> 2. implement multi-threaded execution.
>     >>>>>>>> 3. support of split DWARF.
>     >>>>>>> Maybe, though I'm still not sure
it'd be the right tradeoff -
>     >>>>>>> especially if it involved having to wait
to run the .dwo
>     merger (call
>     >>>>>>> it DWARF-aware dwp, or dsymutil with dwp
support) until
>     after the
>     >>>>>>> linker ran.
>     >>>>>>>
>     >>>>>>>> 4. implement dsymutil for non-darwin
platform.
>     >>>>>>> That's probably, essentially (3),
more-or-less. Split DWARF is
>     >>>>>>> somewhat of a formalization of
Apple's/MachO DWARF
>     distribution model
>     >>>>>>> (leave DWARF in files that aren't
linked/use them from
>     a debugger,
>     >>>>>>> but also be able to merge them into some
final file (dsym
>     or dwp) for
>     >>>>>>> archival purposes)
>     >>>>>>>
>     >>>>>>>> All of this is a massive piece of
work.
>     >>>>>>>> Our original investment was to solve
two problems:
>     >>>>>>>>
>     >>>>>>>> 1. Overlapped address ranges, which is
currently close to
>     being solved. Thank you for helping with that!
>     >>>>>>> Yeah, again, sorry that's taken quite
so long/somewhat
>     circuitous route.
>     >>>>>>>
>     >>>>>>>> 2. Size of debug info. That still
becomes an issue, but
>     we are unsure whether we are ready to
>     >>>>>>>>      invest in solving all the above
1-4 problems and how
>     much the community is interested in it.
>     >>>>>>> Fair, for sure - I don't think
you'd need to sign up to
>     solve all of
>     >>>>>>> them (don't think they necessarily
need solving).
>     Potentially moving
>     >>>>>>> the logic out into a separate tool as
Fangrui's
>     considering - a
>     >>>>>>> post-link DWARF optimizer, rather than
in-linker DWARF
>     optimization.
>     >>>>>>>
>     >>>>>>> I really don't want to give you the
runaround like this -
>     but multiple
>     >>>>>>> times slower links is something that seems
pretty
>     problematic for most
>     >>>>>>> users, to the point of weighing the
maintainability of lld
>     against the
>     >>>>>>> convenience of having this functionality
in-linker rather
>     than in a
>     >>>>>>> post-link optimizer.
>     >>>>>>>
>     >>>>>>> (I know you've spoken a bit before
about your users needs
>     - but if
>     >>>>>>> it's possible, could you explain
(again :/) why they have
>     such a
>     >>>>>>> strong need for smaller DWARF? While DWARF
size is an
>     ongoing concern
>     >>>>>>> for many users (Google certainly - hence
the invention of
>     Split DWARF,
>     >>>>>>> use of type units and compressed DWARF,
etc) - usually
>     it's in rather
>     >>>>>>> large programs, but it sounds like
you're dealing with
>     relatively
>     >>>>>>> small ones (otherwise the increase in link
time, I'd
>     imagine, would be
>     >>>>>>> prohibitive for your users?)?
>     >>>>>> We have many large programs and keep
Daily/Nightly debug
>     builds,
>     >>>>>> which takes a lot of disk space. Compilation
time for these
>     programs is big.
>     >>>>>> The scenario is "compile once".(not
>     compile-debug-compile-debug).
>     >>>>>> So we think that solution(like
dsymutil/DWARFLinker) would
>     not slowdown
>     >>>>>> the compilation time of overall build
significantly(see
>     above numbers for
>     >>>>>> llvm codebase) and would allow us to reduce
disk space
>     required to keep
>     >>>>>> all of these builds.
>     >>>>> Ah, OK - for archival purposes. So the interactive
>     developers wouldn't
>     >>>>> necessarily be using this feature. Makes sense -
similar to
>     dsymutil
>     >>>>> and dwp, mostly used for archival purposes &
you can debug
>     straight
>     >>>>> from .o/.dwos for interactive/iterative
development.
>     >>>>
>     >>>>> In that case, it seems more likely that a separate
tool
>     might suffice.
>     >>>> agreed: if we continue the work on this, then it makes
sense to
>     >>>> do it as separate tool. Make it fast enough. And if
there
>     would be interest
>     >>>> in it - then it would probably be possible to return
to idea
>     calling it from linker.
>     >>>>
>     >>>>> Also, out of curiosity - have you tried just
compressing the
>     output
>     >>>>> (-gz (I think that does the right thing for the
linker level
>     >>>>> compression too, otherwise
-Wl,-compress-debug-sections
>     might do it))
>     >>>>> or are you already doing that in addition?
>     >>>> sure. we use  -Wl,-compress-debug-sections.
>     >>>>
>     >>>> Thank you, Alexey.
>     >>>>
>     >>>>>>> You mentioned that the usability cost of
>     >>>>>>> Split DWARF for your users was too high
(or high enough to
>     justify
>     >>>>>>> this alternative work of DWARF-aware
linking)? That all
>     seems a bit
>     >>>>>>> surprising to me - though I understand the
deployment
>     issues of Split
>     >>>>>>> DWARF do present some challenges to users
in more heterogeneous
>     >>>>>>> environments than Google's... still,
I'd have thought
>     there was some
>     >>>>>>> hope there)
>     >>>>>> Our tools do not support split dwarf yet.
Though we plan
>     to implement it.
>     >>>>>> When we would have support of split dwarf then
it would be
>     >>>>>> convenient to have easy way to share built
debug binaries.
>     llvm-dwp is the
>     >>>>>> answer to this. DWARFLinker could probably be
another answer.
>     >>>>> Ah, fair enough - thanks for the context!
>     >>>>>>>>> One way to do that would be to
have a CU-local type
>     indirection table.
>     >>>>>>>>> DIEs reference local type numbers
(like local
>     address/string numbers -
>     >>>>>>>>> addrx/strx/rnglistx) and that
table contains either sig8
>     (no linker
>     >>>>>>>>> fixups required) or the local type
offsets you describe
>     - the linker
>     >>>>>>>>> would then only need to read this
type number
>     indirection table and
>     >>>>>>>>> rewrite them to the final type
numbers.
>     >>>>>>>> Yes, that could be additionally done
if this process
>     would be time-consuming.
>     >>>>>>>>
>     >>>>>>>> David, thank you for all your comments
and explanations.
>     They are extremely helpful.
>     >>>>>>> Sure thing - really appreciate your
patience with all this
>     - it's... a
>     >>>>>>> lot of moving parts.
>     >>>>>>> - Dave
>     >>>>>>> Thank you, Alexey.
>     >>>>>>>
>     >>>>>>>> sig8 hash-id would be used to compare
types and to
>     deduplicate them.
>     >>>>>>>> It would speed up the current dsymutil
context analysis.
>     >>>>>>>> Types having the same hash-id could be
deduplicated.
>     >>>>>>>> This would allow deduplicating a larger
number of types
>     than current dsymutil.
>     >>>>>>>> Incomplete type definitions having a
similar set of
>     members are not deduplicated by dsymutil currently.
>     >>>>>>>> In this case they would have the same
hash-id.
>     >>>>>>>>
>     >>>>>>>> This "type table" would take
less space than current
>     "type units" and current ODR solution.
>     >>>>>>>>
>     >>>>>>>> Above is just an idea on how to help
DWARF-aware
>     linker(based on idea removing obsolete debug info)
>     >>>>>>>> to work faster(if that is
interesting).
>     >>>>>>>>
>     >>>>>>>> Alexey.
>     >>>>>>>>
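
A minimal sketch of the "type table" merge described in the quoted message, under stated assumptions: each object file contributes a list of (hash_id, payload) entries, equal hash-ids mean equal types, and references are modelled as (object index, local index) pairs. The function name and the data layout are hypothetical. The thread does not define a concrete format for .debug_types_table.

```python
def merge_type_tables(tables):
    """Merge per-object type tables, keeping one copy per hash-id.

    tables: list (one item per object file) of lists of (hash_id, payload).
    Returns (merged, remap), where merged is the resulting table and remap
    maps (object_index, local_index) to an index in merged.
    """
    merged = []     # resulting table, first occurrence of each type wins
    index_of = {}   # hash_id -> index in merged
    remap = {}      # old reference -> new reference
    for obj_idx, table in enumerate(tables):
        for local_idx, (hash_id, payload) in enumerate(table):
            if hash_id not in index_of:
                index_of[hash_id] = len(merged)
                merged.append((hash_id, payload))
            remap[(obj_idx, local_idx)] = index_of[hash_id]
    return merged, remap


if __name__ == "__main__":
    a = [(0x01, "int"), (0x02, "struct A")]
    b = [(0x02, "struct A"), (0x03, "struct B")]
    merged, remap = merge_type_tables([a, b])
    print(len(merged), remap[(1, 0)])
```

The merge itself is one pass over the tables. The cost discussed in the thread is in the second half of the job: finding every reference in the debug info so that remap can be applied to it.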
>     >>>>>>>>> From: llvm-dev
<llvm-dev-bounces at lists.llvm.org
>     <mailto:llvm-dev-bounces at lists.llvm.org>> On Behalf Of
James
>     Henderson via llvm-dev
>     >>>>>>>>> Sent: Wednesday, June 3, 2020 3:48
AM
>     >>>>>>>>> To: David Blaikie <dblaikie at
gmail.com
>     <mailto:dblaikie at gmail.com>>
>     >>>>>>>>> Cc: llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>
>     >>>>>>>>> Subject: Re: [llvm-dev]
[Debuginfo][DWARF][LLD] Remove
>     obsolete debug info in lld.
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>> It makes me sad that the linker
(via a library or
>     otherwise) has to be "DWARF-aware" to be able to effectively
>     handle --gc-sections, COMDATs, --icf etc for debug info, without
>     leaving large blocks of data kicking around.
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>> The patching to -1 (or equivalent)
is probably a good
>     lightweight solution (though I'd love it if it could be done based
>     on section type in the future rather than section name, but that's
>     probably outside the realm of DWARF), as it requires only minimal
>     understanding in the linker, but anything beyond that seems to be
>     complicated logic that is mostly due to the structure of DWARF.
>     Patching to -1 does feel a bit like a sticking plaster/band aid to
>     patch over the issue rather than properly solving it too - there
>     will still be debug data (potentially significant amounts in
>     COMDAT-heavy objects) that the linker has to write and the
>     debugger has to somehow know how to skip (even if it knows that -1
>     is special-case due to the standard being updated, it needs to get
>     as far as the -1), which is all wasted effort.
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>> We've already seen from
Alexey's prototyping, and from
>     our own experiences with the Sony proprietary linker (which tried
>     to rewrite .debug_line only) that deconstructing the DWARF so that
>     it can be more optimally reassembled at link time is slow going,
>     and will probably inevitably be however much effort is put into
>     optimising it. For a start, given the current standards, it's
>     impossible to know how to deconstruct it without having to parse
>     vast amounts of DWARF, which is typically going to mean a lot more
>     parsing work than the linker would normally have to deal with.
>     Additionally, much of this parsing work is wasted effort, since it
>     seems unlikely in many links that large amounts of the DWARF will
>     be redundant. Having an option to opt-in doesn't help much there,
>     since it just means the logic exists without most people using it,
>     due to it not being good enough, or potentially they don't even
>     know it exists.
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>> I don't have particularly concrete suggestions as to how
>     to solve the structural problems with DWARF at this point. The
>     only thing that seems obvious to me is a more "blessed" approach
>     to fragmentation of sections, similar to what I tried with my
>     prototype mentioned earlier in the thread, although we'd need to
>     figure out the previously stated performance issues. Other ideas
>     might tie into this, like somehow sharing the various table
>     headers a bit like CIEs in .eh_frame that could be merged by the
>     linker - each object could have separate table header sections,
>     which are referenced by the individual .debug_* blocks, which in
>     turn are one per function/data piece and easily discardable/merged
>     by the linker.
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>> Just some thoughts.
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>> James
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>> On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev
>     <llvm-dev at lists.llvm.org> wrote:
>     >>>>>>>>>
>     >>>>>>>>> On Tue, May 19, 2020 at 7:17 AM
Alexey Lapshin
>     >>>>>>>>> <alapshin at accesssoftek.com
>     <mailto:alapshin at accesssoftek.com>> wrote:
>     >>>>>>>>>> Hi David, please find my
comments inside:
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>>>> Broad question: Do
you have any specific
>     motivation/users/etc in implementing this (if you can speak about it)?
>     >>>>>>>>>>>>> - it might help
motivate the work, understand what
>     tradeoffs might be suitable for you/your users, etc.
>     >>>>>>>>>>>> There are two general
requirements:
>     >>>>>>>>>>>> 1) Remove (or clean)
invalid debug info.
>     >>>>>>>>>>> Perhaps a simpler direct
solution for your immediate
>     needs might be a much narrower,
>     >>>>>>>>>>> and more efficient
linker-DWARF-awareness feature:
>     >>>>>>>>>>>
>     >>>>>>>>>>> With DWARFv5, rnglists
present an opportunity for a
>     DWARF linker to rewrite the ranges
>     >>>>>>>>>>> without parsing the rest
of the DWARF. /technically/
>     this isn't guaranteed - rnglist entries
>     >>>>>>>>>>> can be referenced either
directly, or by index. If all
>     rnglists are referenced by index, then
>     >>>>>>>>>>> a linker could parse only
the debug_rnglists section
>     and rewrite ranges to remove any
>     >>>>>>>>>>> address ranges that refer
to optimized-out code.
>     >>>>>>>>>>>
>     >>>>>>>>>>> This would only be correct
for rnglists that had no
>     direct references to them (that only were
>     >>>>>>>>>>> referenced via the
indexes) - but we could either
>     implement it with that assumption, or could
>     >>>>>>>>>>> add an LLVM extension attribute on the CU that would
>     say "I promise I only referenced rnglists
>     >>>>>>>>>>> via rnglistx forms/indexes"). If this DWARF-aware
>     linking would have to read the CU DIE (not
>     >>>>>>>>>>> all the other DIEs) it
/could/ also then rewrite
>     high/low_pc if the CU wasn't using ranges...
>     >>>>>>>>>>> but that wouldn't come
up in the function-removal
>     case, because then you'd have ranges anyway,
>     >>>>>>>>>>> so no need for that.
>     >>>>>>>>>>>
>     >>>>>>>>>>> Such a DWARF-aware rnglist
linking could also simplify
>     rnglists, in cases where functions
>     >>>>>>>>>>> ended up being laid out
next to each other, the linker
>     could coalesce their ranges together.
>     >>>>>>>>>>>
>     >>>>>>>>>>> I imagine this could be
implemented with very little
>     overhead to linking, especially compared
>     >>>>>>>>>>> to the overhead of full
DWARF-aware linking.
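
The rnglist rewriting described above (drop ranges for optimized-out code, then coalesce functions laid out next to each other) can be sketched as follows. This is a minimal illustration, not lld code: it assumes ranges are already decoded into absolute [low, high) pairs, and `rewrite_ranges` and the `is_live` predicate are hypothetical names standing in for whatever the linker knows about discarded sections.

```python
def rewrite_ranges(ranges, is_live):
    """Drop dead [low, high) ranges, then coalesce ranges that touch or overlap."""
    kept = sorted((low, high) for low, high in ranges if low < high and is_live(low))
    merged = []
    for low, high in kept:
        if merged and low <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


# Hypothetical input: the function at 0x3000 was garbage-collected, and the
# functions at 0x1000 and 0x1020 ended up adjacent in the output.
discarded = {0x3000}
ranges = [(0x2000, 0x2010), (0x1000, 0x1020), (0x3000, 0x3008), (0x1020, 0x1040)]
print(rewrite_ranges(ranges, lambda low: low not in discarded))
```

Only the range list is touched here; no DIEs are parsed, which is why this is only safe when every rnglist is referenced by index, as noted above.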
>     >>>>>>>>>>>
>     >>>>>>>>>>> Though none of this fixes
Split DWARF, where the
>     linker doesn't get a chance to see the
>     >>>>>>>>>>> addresses being used - but
if you only want/need the
>     CU-level ranges to be correct, this
>     >>>>>>>>>>> might be a viable fix, and
quite efficient.
>     >>>>>>>>>> Yes, we are thinking about that alternative. This would
>     resolve our problem of invalid debug info
>     >>>>>>>>>> and would work much faster. Thus, if we do not get
>     good results for D74169 then we
>     >>>>>>>>>> will implement it. Do you think it could be useful to
>     have this solution upstream?
>     >>>>>>>>> A pure rnglist rewriting - I think
it'd be OK to have in
>     upstream -
>     >>>>>>>>> again, cost/benefit/etc would have
to be weighed. I'm
>     not sure it
>     >>>>>>>>> would save enough space to be
particularly valuable
>     beyond the
>     >>>>>>>>> correctness issue - and it
doesn't completely solve the
>     correctness
>     >>>>>>>>> issue for zero-address usage or
low-address usage
>     (because you could
>     >>>>>>>>> still have overlapping subprograms
inside a CU - so if
>     you were
>     >>>>>>>>> symbolizing you could use the
correct rnglist to filter,
>     but then go
>     >>>>>>>>> look inside the CU only to find
two subprograms that had
>     that address
>     >>>>>>>>> & not know which one was the correct one and which one
>     was the
>     >>>>>>>>> discarded one).
>     >>>>>>>>>
>     >>>>>>>>> rnglist rewriting might be easy
enough to prototype -
>     but depends what
>     >>>>>>>>> you want to spend your time on, I
know this whole issue
>     has been a
>     >>>>>>>>> huge investment of your time
already - but maybe this recent
>     >>>>>>>>> revitalization of the conversation
around having an
>     explicit value in
>     >>>>>>>>> the linker might be sufficient to
address everyone's
>     needs... *fingers
>     >>>>>>>>> crossed*)
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>>>>> 2) Optimize the DWARF
size.
>     >>>>>>>>>>> Do your users care much
about this? I imagine if they
>     had significant DWARF size issues,
>     >>>>>>>>>>> they'd have
significant link time issues and the kind
>     of cost to link time this feature has would
>     >>>>>>>>>>> be prohibitive - but
perhaps they're sharing linked
>     binaries much more often than they're
>     >>>>>>>>>>> actually performing
linking.
>     >>>>>>>>>> Yes, they do. They also have
significant link-time issues.
>     >>>>>>>>>> So current performance results
of D74169 are not very
>     acceptable.
>     >>>>>>>>>> We hope to improve it.
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>>> The specifics which
our users have:
>     >>>>>>>>>>>>    - embedded platform
which uses 0 as start of .text
>     section.
>     >>>>>>>>>>>>    - custom toolset
which does not support all
>     features yet(f.e. split dwarf).
>     >>>>>>>>>>>>    - tolerant of the
link-time increase.
>     >>>>>>>>>>>>    - need a useful way
to share debug builds.
>     >>>>>>>>>>> Sharing two files
(executable and dwp) is
>     significantly less useful than sharing one file?
>     >>>>>>>>>> Probably not significantly,
but yes, it looks less
>     useful comparing to D74169.
>     >>>>>>>>>> Having only two files
(executable and .dwp) looks
>     significantly better than having executable and multiple .dwo files.
>     >>>>>>>>>> Having only one
file(executable) with minimal size
>     looks better than the two files with a bigger size.
>     >>>>>>>>>>
>     >>>>>>>>>> clang compiled with -gsplit-dwarf takes 0.9G for
>     executable and 0.9G for .dwp.
>     >>>>>>>>>> clang compiled with --gc-debuginfo takes only 0.76G for
>     single executable.
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>>> For the first point:
we have a problem "Overlapping
>     address ranges starting from 0"(D59553).
>     >>>>>>>>>>>> We use custom
solution, but the general solution like
>     D74169 would be better here.
>     >>>>>>>>>>> If CU ranges are the only
ones that need fixing, then
>     I think the above solution might be as
>     >>>>>>>>>>> good/better - if more than
CU ranges need fixing, then
>     I think we might want to start talking about
>     >>>>>>>>>>> how to fix DWARF itself
(split and non-split) to
>     signal certain addresses point to dead code with a
>     >>>>>>>>>>> specific blessed value
that linkers would need to
>     implement - because with Split DWARF there's
>     >>>>>>>>>>> no way to solve the non-CU
addresses at the linker.
>     >>>>>>>>>> I think a worthwhile choice for that signal value
>     would be LowPC > HighPC.
>     >>>>>>>>>> That does not require additional bits in DWARF.
>     >>>>>>>>>> It would be natural to skip such address ranges since
>     they are explicitly marked as invalid.
>     >>>>>>>>>> It could be implemented in a
linker very easily.
>     Probably, it would make sense to describe that
>     >>>>>>>>>> usage in DWARF standard.
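
A minimal sketch of how a consumer could honour the LowPC > HighPC convention proposed above. It assumes high_pc is stored as an absolute address (when high_pc is encoded as an offset from low_pc the marker would need a different representation), and `Subprogram`, `is_dead` and `find_subprogram` are hypothetical names, not part of any LLVM API.

```python
from typing import NamedTuple, Optional


class Subprogram(NamedTuple):
    name: str
    low_pc: int
    high_pc: int  # assumed to be an absolute address, not an offset


def is_dead(sub: Subprogram) -> bool:
    """A range with low_pc > high_pc is treated as explicitly invalid."""
    return sub.low_pc > sub.high_pc


def find_subprogram(subprograms, address: int) -> Optional[str]:
    """Return the name of the live subprogram covering address, if any."""
    for sub in subprograms:
        if is_dead(sub):
            continue  # marked dead by the linker: never a match
        if sub.low_pc <= address < sub.high_pc:
            return sub.name
    return None


# Hypothetical embedded layout where .text starts at address 0.
subprograms = [
    Subprogram("discarded_fn", 1, 0),  # marked invalid by the linker
    Subprogram("startup", 0x0, 0x40),  # live code at address 0
]
print(find_subprogram(subprograms, 0x10))
```

With the discarded function marked this way it can no longer overlap live code that starts at address 0, which is the ambiguity the zero-address case runs into.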
>     >>>>>>>>>>
>     >>>>>>>>>> As to the addresses which are
not seen by the
>     linker(since they are in .dwo files) - yes,
>     >>>>>>>>>> they need to have another
solution. Could you show an
>     example of such a case, please?
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>>>> 2. Support of type
units.
>     >>>>>>>>>>>>>> That could be
implemented further.
>     >>>>>>>>>>>>> Enabling type
units increases object size to make it
>     easier to deduplicate at link time by a DWARF-unaware
>     >>>>>>>>>>>>> linker. With a
DWARF aware linker it'd be generally
>     desirable not to have to add that object size overhead to
>     >>>>>>>>>>>>> get the linking
improvements.
>     >>>>>>>>>>>> But, DWARFLinker
should adequately work with type
>     units since they are already implemented.
>     >>>>>>>>>>> Maybe - it'd be nice
& all, but I don't think it's an
>     outright necessity - if someone knows they're using
>     >>>>>>>>>>> a DWARF-aware linker,
they'd probably not use type
>     units in their object files. It's possible someone
>     >>>>>>>>>>> doesn't know for sure
& maybe they have pre-canned
>     debug object files from someone else, etc.
>     >>>>>>>>>> I see.
>     >>>>>>>>>>
>     >>>>>>>>>>>> Another thing is that
the idea behind type units has
>     the potential to help Dwarf-aware linker to work faster.
>     >>>>>>>>>>>> Currently, DWARFLinker
analyzes context to understand
>     whether types are the same or not.
>     >>>>>>>>>>> When you say
"analyzes context" what do you mean?
>     Usually I'd take that to mean
>     >>>>>>>>>>> "looks at things
outside the type itself - like what
>     namespace it's in, etc" - which, yes,
>     >>>>>>>>>>> it should do that, but it
doesn't seem very expensive
>     to do. But I guess you actually
>     >>>>>>>>>>> mean something about doing
structural equivalence in
>     some way, looking at things inside the type?
>     >>>>>>>>>> I think it could be useful for
both cases. Currently,
>     dsymutil does only first thing
>     >>>>>>>>>> (look at type name, namespace
name, etc..) and does not
>     do the second thing
>     >>>>>>>>>> (doing structural
equivalence). Analyzing type names is
>     currently quite expensive
>     >>>>>>>>>> (the string pool search alone takes ~10 sec of the 70
>     sec of overall time).
>     >>>>>>>>>> That is expensive because many things have to be done
>     to work with strings:
>     >>>>>>>>>> parse DWARF, search and
resolve relocations, compute a
>     hash for strings,
>     >>>>>>>>>> put data into a string pool,
create a fully qualified
>     name(like namespace::function::name).
>     >>>>>>>>>> It looks like it could be
optimized and finally require
>     less time, but it still would be a noticeable
>     >>>>>>>>>> part of the overall time.
>     >>>>>>>>>>
>     >>>>>>>>>> If dsymutil starts to check for structural
>     equivalence, then the process would be even slower.
>     >>>>>>>>>> So, if a single hash-id were checked instead of comparing
>     type structure, then this process
>     >>>>>>>>>> would also be faster.
>     >>>>>>>>>>
>     >>>>>>>>>> Thus I think using a hash-id to compare types would
>     make the current implementation faster and would
>     >>>>>>>>>> allow DWARFLinker to handle incomplete types without
>     massive performance degradation also.
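
The hash-id comparison suggested above can be sketched as a single dictionary pass. This is only an illustration under assumptions: each type is taken to arrive as a (DIE offset, hash-id) pair with the hash computed by the producer, and `dedup_by_hash_id` is a hypothetical helper, not an existing DWARFLinker function.

```python
def dedup_by_hash_id(type_entries):
    """Map each type's DIE offset to the first offset seen with the same hash-id.

    No names are resolved and no string pool is consulted: equality is
    decided by the precomputed hash-id alone.
    """
    canonical = {}  # hash-id -> offset of the first type with that id
    remap = {}      # offset -> canonical offset
    for offset, hash_id in type_entries:
        remap[offset] = canonical.setdefault(hash_id, offset)
    return remap


# Hypothetical entries: the types at 0x2a and 0x11e share a hash-id.
type_entries = [(0x2A, 0xAB), (0x90, 0xCD), (0x11E, 0xAB)]
print(dedup_by_hash_id(type_entries))
```

Compared with building fully qualified names, this skips the relocation lookup, string hashing and string-pool insertion listed above, at the cost of trusting the producer's hash.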
>     >>>>>>>>>>
>     >>>>>>>>>>>> But the context is known when types are generated.
>     So, no need to spend the time analyzing it.
>     >>>>>>>>>>>> If types could be compared without analyzing context,
>     then Dwarf-aware linker would work faster.
>     >>>>>>>>>>>> That is just an idea (not for immediate
>     implementation): if types were stored in some "type table"
>     >>>>>>>>>>>> (instead of a COMDAT section group) and could be
>     accessed through a hash-id (like type units),
>     >>>>>>>>>>>> then it would be a solution requiring fewer bits
>     to store but allowing types to be compared
>     >>>>>>>>>>>> by hash-id (without analysing context).
>     >>>>>>>>>>>> In this case, the size increase would be small, and
>     processing could be done faster.
>     >>>>>>>>>>>>
>     >>>>>>>>>>>> this is just an idea
and could be discussed
>     separately from the problem of integrating of D74169.
>     >>>>>>>>>>>>>> 6. -flto=thin
>     >>>>>>>>>>>>>>   That problem
was described in this review
>     https://reviews.llvm.org/D54747#1503720. It also exists in
>     >>>>>>>>>>>>>> current
DWARFLinker/dsymutil implementation. I
>     think that problem should be discussed more: it could
>     >>>>>>>>>>>>>> probably be
fixed by avoiding generation of such
>     incomplete declaration during thinlto,
>     >>>>>>>>>>>>>> That would be
costly to produce extra/redundant
>     debug info in ThinLTO - actually ThinLTO could be doing
>     >>>>>>>>>>>>>> more to reduce
that redundancy early on (actually
>     removing definitions from some llvm Modules if the type
>     >>>>>>>>>>>>>> definition is
known to exist in another Module, etc)
>     >>>>>>>>>>>>> I don't know
if it's a problem since that patch was
>     reverted.
>     >>>>>>>>>>>> Yes. That patch was
reverted, but this patch(D74169)
>     has the same problem.
>     >>>>>>>>>>>> if D74169 would be
applied and --gc-debuginfo used
>     then structure type
>     >>>>>>>>>>>> definition would be
removed.
>     >>>>>>>>>>>> DWARFLinker could
handle that case - "removing
>     definitions from some llvm Modules if the type
>     >>>>>>>>>>>> definition is known to
exist in another Module".
>     >>>>>>>>>>>> i.e. DWARFLinker could
replace the declaration with
>     the definition.
>     >>>>>>>>>>>> But that problem could
be more easily resolved when
>     debug info is generated(probably without
>     >>>>>>>>>>>> significant increase
of debug info size):
>     >>>>>>>>>>>> Here we have:
>     >>>>>>>>>>>>
DW_TAG_compile_unit(0x0000000b) - compile unit
>     containing concrete instance for function "f".
>     >>>>>>>>>>>>
DW_TAG_compile_unit(0x00000073) - compile unit
>     containing abstract instance root for function "f".
>     >>>>>>>>>>>>
DW_TAG_compile_unit(0x000000c1) - compile unit
>     containing function "f" definition.
>     >>>>>>>>>>>> Code for function
"f" was deleted. gc-debuginfo
>     deletes compile unit DW_TAG_compile_unit(0x000000c1)
>     >>>>>>>>>>>> containing
"f" definition (since there is no
>     corresponding code). But it has structure "Foo" definition
>     >>>>>>>>>>>>
DW_TAG_structure_type(0x0000011e) referenced from
>     DW_TAG_compile_unit(0x00000073)
>     >>>>>>>>>>>> by declaration
DW_TAG_structure_type(0x000000ae).
>     That declaration is exactly the case when definition
>     >>>>>>>>>>>> was removed by thinlto
and replaced with declaration.
>     >>>>>>>>>>>> Would it cost too much
if type definition would not
>     be replaced with declaration for "abstract instance root"?
>     >>>>>>>>>>>> The number of concrete
instances is bigger than
>     number of abstract instance roots.
>     >>>>>>>>>>>> Probably, it would not
be too costly to leave
>     definition in abstract instance root?
>     >>>>>>>>>>
>     >>>>>>>>>>>> Alternatively, Would
it cost too much if type
>     definition would not be replaced with declaration when
>     >>>>>>>>>>>> declaration references
type from not used function?
>     (lto could understand that concrete function is not used).
>     >>>>>>>>>>> I don't follow this
example - could you provide a
>     small concrete test case I could reproduce?
>     >>>>>>>>>> I would provide a test case if
necessary. But it looks
>     like this issue is finally clear, and you already commented on that.
>     >>>>>>>>>>
>     >>>>>>>>>>> Oh, I guess this is
happening perhaps because ThinLTO
>     can't know for sure that a standalone
>     >>>>>>>>>>> definition of 'f'
won't be needed - so it produces one
>     in case one of the inlining opportunities
>     >>>>>>>>>>> doesn't end up
inlining. Then it turns out all calls
>     got inlined, so the external definition wasn't needed.
>     >>>>>>>>>>> Oh, you're suggesting
that these 3 CUs got emitted
>     into one object file during LTO, but that DWARFLinker
>     >>>>>>>>>>> drops a CU without any
code in it - even though... So
>     far as I know, in LTO, LLVM directly references
>     >>>>>>>>>>> types across units if the
CUs are all emitted in the
>     same object file. (and if they weren't in the same
>     >>>>>>>>>>> object file - then the
abstract_origin couldn't be
>     pointing cross-CU).
>     >>>>>>>>>>> I guess some basic things
to say:
>     >>>>>>>>>>> With ThinLTO, the
concrete/standalone function
>     definition is emitted in case some call sites don't end up
>     >>>>>>>>>>> being inlined. So we know
it'll be emitted (but might
>     not be needed by the actual linker)
>     >>>>>>>>>>> Any number of inline calls might exist - but we
>     shouldn't put the type information into those, because
>     >>>>>>>>>>> they aren't guaranteed
to emit it (if the inline
>     function gets optimized away, there would be nothing to
>     >>>>>>>>>>> enforce the type being
emitted) - and even if we
>     forced the type information to be emitted into one
>     >>>>>>>>>>> object file that has an
inline copy of the function -
>     there's no guarantee that object file will get linked in either.
>     >>>>>>>>>>> So, no, I don't think
there's much we can do to keep
>     the size of object files down, while guaranteeing
>     >>>>>>>>>>> the type information will
be emitted with the usual
>     linker semantics.
>     >>>>>>>>>> Then dsymutil/DWARFLinker
could be changed to handle
>     that(though it would probably be not very efficient).
>     >>>>>>>>>> If ThinLTO could determine that a function ends up
>     unused (and so must not contain the referenced type definition),
>     >>>>>>>>>> then this situation could be handled more effectively.
>     >>>>>>>>>>
>     >>>>>>>>>> Thank you, Alexey.
>     >>>>>>>>>>
>     >>>>>>>>>>>>
>     >>>>>>>>>>>>
>     >>>>>>>>>>>>
_______________________________________________
>     >>>>>>>>>>>> LLVM Developers
mailing list
>     >>>>>>>>>>>> llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>     >>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev