thr3ads.net - llvm dev - [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld. [Jun 2020]

Alexey Lapshin via llvm-dev

2020-Jun-22 14:20 UTC

[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

>> >> >Alexey> Probably we could try to make DWARF easy to parse, trim, and rewrite,
>> >> >Alexey> so that a full DWARF parsing solution would not take too much time?
>> >> >Alexey>
>> >> >Alexey> For example, the -fdebug-types-section solution uses COMDAT sections
>> >> >Alexey> to split and deduplicate types. That solution works quite fast. It has
>> >> >Alexey> the already-mentioned drawback of a large size overhead (because of the
>> >> >Alexey> sizes of section headers and type unit headers). But the fact that type
>> >> >Alexey> units can be identified just by hash-id (without parsing type names and
>> >> >Alexey> type hierarchies) allows the linker to reject duplicates quickly. Another
>> >> >Alexey> thing is that the linker drops duplicated COMDAT sections without any
>> >> >Alexey> additional check. After the duplicates are deleted, the debug info is
>> >> >Alexey> still consistent.
>> >> >Alexey> A DWARF-aware solution could work using the same two principles:
>> >> >Alexey> 1. compare types by hash-id.
>> >> >Alexey> 2. drop duplicates without analyzing their contents.
>> >> >Alexey>
>> >> >Alexey> If all types are put into a separate type table and have a hash-id, then
>> >> >Alexey> it would be much easier to deduplicate them. The idea is demonstrated
>> >> >Alexey> here: https://reviews.llvm.org/P8164. (It still has open questions:
>> >> >Alexey> whether base types should be put into the type table, whether references
>> >> >Alexey> into the type table should be made by DW_AT_signature or just by offset,
>> >> >Alexey> etc.) While handling that separate type table, the DWARF-aware linker
>> >> >Alexey> would check only the hash_id and put only one type description with a
>> >> >Alexey> given id into the final type table. It would also allow us to solve that
>> >> >Alexey> -flto=thin problem:
>> >> >Alexey> http://lists.llvm.org/pipermail/llvm-dev/2020-May/141938.html
>> >> >Alexey> (there is a dsymutil example there),
>> >> >Alexey> i.e., the case where the type definition is removed would not occur.
>> >>
>> >> David>I think there is scope for lower-overhead type deduplication,
>> >> David>especially now with type units being merged into the debug_info
>> >> David>section. Perhaps we could drop dwo_ids and use section references to
>> >> David>refer to types & rely on the linker to keep those referenced sections
>> >> David>alive - though section references are longer than CU-relative
>> >> David>references. (but we need the extra length - because if the linker
>> >> David>deduplicates a type definition - one CU may be referencing a type very
>> >> David>far away, so the shorter reference might be inadequate) I don't think
>> >> David>the indirection through the type hash is /super/ significant to the
>> >> David>cost - I think it's more in the duplication of many DIEs especially
>> >> David>for function definitions (since the type unit sig8 system only
>> >> David>provides a way to reference the type - not its member functions, their
>> >> David>parameters, etc - so all those DIEs get duplicated in any CU that
>> >> David>needs to provide a definition of a member function). We could
>> >> David>prototype cross-unit DIE references to lower the cost of that
>> >> David>duplication, though rumor has it that constructor based type homing
>> >> David>might provide enough value to obviate the need for type units (or at
>> >> David>least make the overhead not worthwhile - so revisiting the overhead to
>> >> David>reduce it might make it worthwhile again... ).
>> >>
>> >> David>Probably wouldn't be super hard to use LLVM's existing cross-unit DIE
>> >> David>referencing machinery (implemented for LTO) to refer directly to DIEs
>> >> David>in a type unit without using the signature system... - hmm, that'd
>> >> David>only work if your type unit DIEs were identical? /maybe/ ? Not sure
>> >> David>how that'd work if you wanted to refer into a type unit, but the type
>> >> David>unit got deduplicated. Might be able to rely on the linker to preserve
>> >> David>every unique copy of the type unit that's referenced if we phrase
>> >> David>things carefully - so if your compiler does produce exactly identical
>> >> David>type units they get deduplicated and sec_refs refer to the uniquely
>> >> David>preserved copy - but otherwise it preserves as many distinct copies as
>> >> David>needed. (I don't know enough about how that works to be sure - but I
>> >> David>know that this linkonce/inline function deduplication does seem to
>> >> David>cause the DWARF to refer to the singular function if that function is
>> >> David>identical, and if it isn't, then you get 0 - so there's /something/ in
>> >> David>the linker that can adjust for deduplicating identical duplicates... )
>> >>
>> >> Probably I was a bit unclear: the above idea is not for types
>> >> (placed in COMDAT sections) deduplicated by the linker.
>>
>> >Yeah, I was just wondering whether it could be useful for that too.
>>
>> >> This idea goes in a different direction from fragmenting DWARF
>> >> using ELF sections and tricks. It seems to me that the cost of
>> >> fragmenting is too high.
>>
>> >I tend to agree - but I'm sort of leaning towards trying to use object
>> >features as much as possible, then implementing just enough custom
>> >handling in the linker to recoup overhead, etc. (eg: add some kind of
>> >small header/brief description that makes it easy for the linker to
>> >slice-and-dice - but hopefully a domain-specific such header can be a
>> >bit more compact than the fully general ELF form)
>>
>> I think this indeed should be implemented and evaluated.
>> So that various approaches could be compared.
>>
>> >> It is not only the sizes of the structures describing fragments but also the
>> >> complexity of the tools that must be taught to work with fragmented DWARF
>> >> (e.g., llvm-dwarfdump applied to an object file should be able to read
>> >> fragmented DWARF, but applied to a linked executable it should work with
>> >> non-fragmented DWARF).
>> >> This idea is for a tool that works the same way as dsymutil ODR.
>> >>
>> >> I will briefly describe the idea of making DWARF easier for
>> >> dsymutil/DWARFLinker to process:
>> >>
>> >> The idea is to have only one "type table" per object file (a special
>> >> section, .debug_types_table). This "type table" would contain all types.
>> >> There could be a special kind of reference, type_offset, whose offset
>> >> points into the type table. Basic types could always be placed at the
>> >> start of the "type table"; thus, offsets to basic types would most often
>> >> be 1 byte. There would also be a special kind of reference for references
>> >> inside a type. The type unit sig8 system would not be used to reference types.
>> >>
>> >> Type deduplication is assumed to be done not by the linker's COMDAT
>> >> mechanism but by a tool like dsymutil. This tool would create the
>> >> resulting .debug_types_table by putting into it the types from the source
>> >> .debug_types_table sections. Only one copy of each type would be placed
>> >> into the resulting table. All references pointing to a deleted copy would
>> >> be corrected to point to the single copy inside the "type table" (that is
>> >> how dsymutil works currently).
>>
>> >^ that's the step that's probably a bit expensive for a general-use
>> >tool - it implies parsing all the DWARF to find those references and
>> >rewrite them, I think. For a high-performance solution that could be
>> >run by the linker I think it'd be necessary to have a solution that
>> >doesn't involve parsing all the DIEs.
>>
>> Judging by the current dsymutil processing,
>> this particular step is not the most time-consuming one.
>> It could be done relatively fast.
>Fair enough - though I'd still imagine any solution that involves
>parsing all the DIEs still wouldn't be fast enough (maybe an order of
>magnitude faster than the current solution even - but that's still,
>what, 6 or 7x slower than linking without the feature?) for most users
>to consider it a good trade-off.
It seems to me that even the current 6x-7x slowdown could be acceptable.
Users who already use dsymutil or llvm-dwp (assuming DWARFLinker
is taught to work with split DWARF) already spend this time and,
in some scenarios, waste disk space on intermediate files.
So if they used this LLD feature in its current state,
they would still benefit.

Speaking of performance results: LLD is a multi-threaded linker;
it handles sections in parallel. DWARFLinker generates DWARF using
AsmPrinter, which is a stream, so it can only produce the resulting
DWARF sequentially. It is not surprising that the parallel solution works faster.
Making DWARFLinker truly multi-threaded would probably allow us
to bring the slowdown into the 2x-4x range.
>> Anyway, I think the dsymutil approach is still valuable, and it
>> would be useful to optimize it.
>> Do you think it would be useful to make dsymutil/DWARFLinker truly
>> multi-threaded?
>> (To make dsymutil/DWARFLinker able to process each object file in a
>> separate thread.)
>Perhaps - that I'd probably leave up to the folks who are more
>invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get it
>integrated into llvm-dwp and then I'll be interested in getting as
>much performance out of it as lld - so multithreading and things would
>be on the books.
I think improving dsymutil is a valuable thing.
There are several directions that might be considered
to make it more robust:

1. Support the latest DWARF (DWARF5/DWARF64...).
2. Implement multi-threaded execution.
3. Support split DWARF.
4. Implement dsymutil for non-Darwin platforms.

All of this is a massive piece of work.
Our original investment was to solve two problems:

1. Overlapping address ranges, which is currently close to being solved. Thank
you for helping with that!

2. Size of debug info. That is still an issue, but we are unsure whether we
are ready to invest in solving all of problems 1-4 above, and how much the
community is interested in it.

Thank you, Alexey.
>> >One way to do that would be to have a CU-local type indirection table.
>> >DIEs reference local type numbers (like local address/string numbers -
>> >addrx/strx/rnglistx) and that table contains either sig8 (no linker
>> >fixups required) or the local type offsets you describe - the linker
>> >would then only need to read this type number indirection table and
>> >rewrite them to the final type numbers.
>>
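A minimal sketch of the CU-local type indirection table described above, to make the linker's work concrete. Everything here is illustrative: the function names, the use of plain integer offsets, and the `local_to_final` map are assumptions, not an existing DWARF form or LLVM API. The point it tries to show is that only the small per-CU table is rewritten; the DIEs keep their local type numbers and are never parsed.

```python
from typing import Dict, List


def relink_type_table(cu_table: List[int],
                      local_to_final: Dict[int, int]) -> List[int]:
    """Rewrite one CU's indirection table from object-local type offsets
    to offsets in the final, deduplicated type table."""
    return [local_to_final[offset] for offset in cu_table]


def resolve(die_type_number: int, cu_table: List[int]) -> int:
    """What a consumer does: map a DIE's local type number to a type offset."""
    return cu_table[die_type_number]


# Hypothetical input: the CU uses two types, at local offsets 0x00 and 0x40.
cu_table = [0x00, 0x40]
# DIEs refer to types by local number only (compare addrx/strx/rnglistx).
die_type_numbers = [0, 1, 1]
# Hypothetical result of deduplication: local offset 0x40 now lives at 0x10.
local_to_final = {0x00: 0x00, 0x40: 0x10}

linked_table = relink_type_table(cu_table, local_to_final)
resolved = [resolve(n, linked_table) for n in die_type_numbers]
```

With this layout the cost at link time is proportional to the number of distinct types a CU uses, not to the number of DIEs that reference them.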
>> Yes, that could be done additionally if this process turns out to be
>> time-consuming.
>>
>> David, thank you for all your comments and explanations. They are
>> extremely helpful.
>Sure thing - really appreciate your patience with all this - it's... a
>lot of moving parts.
>- Dave
>
> Thank you, Alexey.
>
> > The sig8 hash-id would be used to compare types and to deduplicate them.
> > It would speed up the current dsymutil context analysis.
> > Types having the same hash-id could be deduplicated.
> > This would allow deduplicating more types than the current dsymutil does.
> > Incomplete type definitions having a similar set of members are not
> > currently deduplicated by dsymutil.
> > In this case they would have the same hash-id.
> >
> > This "type table" would take less space than the current "type units"
> > and the current ODR solution.
> >
> > The above is just an idea for how to help a DWARF-aware linker (based on
> > the idea of removing obsolete debug info) work faster (if that is
> > interesting).
> >
> > Alexey.
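A minimal sketch of the hash-id deduplication described above, under simplifying assumptions: each object file's type table is modeled as a list of (hash-id, description) pairs, references are list indexes rather than byte offsets, and the type description is an opaque string. None of these names correspond to a real section format or tool; the sketch only illustrates "compare by hash-id, keep one copy, remap references".

```python
from typing import Dict, List, Tuple

TypeEntry = Tuple[int, str]  # (hash-id, opaque type description)


def merge_type_tables(tables: List[List[TypeEntry]]):
    """Merge per-object type tables, keeping one entry per hash-id.

    Returns the final table and, for each input table, a remap list that
    translates a local index into an index in the final table.
    """
    final: List[TypeEntry] = []
    index_of: Dict[int, int] = {}   # hash-id -> index in the final table
    remaps: List[List[int]] = []
    for table in tables:
        remap = []
        for hash_id, description in table:
            if hash_id not in index_of:
                # First time this hash-id is seen: keep this copy.
                index_of[hash_id] = len(final)
                final.append((hash_id, description))
            # Duplicates are dropped without looking at their contents.
            remap.append(index_of[hash_id])
        remaps.append(remap)
    return final, remaps


# Two hypothetical object files that share "int" and "struct Foo".
table_a = [(0x11, "int"), (0xA1, "struct Foo")]
table_b = [(0x11, "int"), (0xB2, "struct Bar"), (0xA1, "struct Foo")]
final, remaps = merge_type_tables([table_a, table_b])
```

Note that the type names are never compared; equality is decided by the hash-id alone, which is the property the proposal relies on to avoid the string-pool work.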
> >
> > > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On
Behalf Of James Henderson via llvm-dev
> > > Sent: Wednesday, June 3, 2020 3:48 AM
> > > To: David Blaikie <dblaikie at gmail.com>
> > > Cc: llvm-dev at lists.llvm.org
> > > Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete
debug info in lld.
> > >
> > >
> > >
> > > It makes me sad that the linker (via a library or otherwise) has
to be "DWARF-aware" to be able to effectively handle --gc-sections,
COMDATs, --icf etc for debug info, without leaving large blocks of data kicking
around.
> > >
> > >
> > >
> > > The patching to -1 (or equivalent) is probably a good lightweight
solution (though I'd love it if it could be done based on section type in
the future rather than section name, but that's probably outside the realm
of DWARF), as it requires only minimal understanding in the linker, but anything
beyond that seems to be complicated logic that is mostly due to the structure of
DWARF. Patching to -1 does feel a bit like a sticking plaster/band aid to patch
over the issue rather than properly solving it too - there will still be debug
data (potentially significant amounts in COMDAT-heavy objects) that the linker
has to write and the debugger has to somehow know how to skip (even if it knows
that -1 is special-case due to the standard being updated, it needs to get as
far as the -1), which is all wasted effort.
> > >
> > >
> > >
> > > We've already seen from Alexey's prototyping, and from
our own experiences with the Sony proprietary linker (which tried to rewrite
.debug_line only) that deconstructing the DWARF so that it can be more optimally
reassembled at link time is slow going, and will probably inevitably be however
much effort is put into optimising it. For a start, given the current standards,
it's impossible to know how to deconstruct it without having to parse vast
amounts of DWARF, which is typically going to mean a lot more parsing work than
the linker would normally have to deal with. Additionally, much of this parsing
work is wasted effort, since it seems unlikely in many links that large amounts
of the DWARF will be redundant. Having an option to opt-in doesn't help much
there, since it just means the logic exists without most people using it, due to
it not being good enough, or potentially they don't even know it exists.
> > >
> > >
> > >
> > > I don't have particularly concrete suggestions as to how to
solve the structural problems with DWARF at this point. The only thing that
seems obvious to me is a more "blessed" approach to fragmentation of
sections, similar to what I tried with my prototype mentioned earlier in the
thread, although we'd need to figure out the previously stated performance
issues. Other ideas might tie into this, like somehow sharing the various table
headers a bit like CIEs in .eh_frame that could be merged by the linker - each
object could have separate table header sections, which are referenced by the
individual .debug_* blocks, which in turn are one per function/data piece and
easily discardable/merged by the linker.
> > >
> > >
> > >
> > > Just some thoughts.
> > >
> > >
> > >
> > > James
> > >
> > >
> > >
> > > On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> > >
> > > On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
> > > <alapshin at accesssoftek.com> wrote:
> > > >
> > > > Hi David, please find my comments inside:
> > > >
> > > >
> > > > >>>Broad question: Do you have any specific
motivation/users/etc in implementing this (if you can speak about it)?
> > > >
> > > > >>> - it might help motivate the work, understand
what tradeoffs might be suitable for you/your users, etc.
> > > >
> > > > >>There are two general requirements:
> > > > >> 1) Remove (or clean) invalid debug info.
> > > >
> > > > >
> > > > >Perhaps a simpler direct solution for your immediate needs might be a much narrower,
> > > > >and more efficient linker-DWARF-awareness feature:
> > > > >
> > > > > With DWARFv5, rnglists present an opportunity for a DWARF linker to rewrite the ranges
> > > > > without parsing the rest of the DWARF. /technically/ this isn't guaranteed - rnglist entries
> > > > > can be referenced either directly, or by index. If all rnglists are referenced by index, then
> > > > > a linker could parse only the debug_rnglists section and rewrite ranges to remove any
> > > > > address ranges that refer to optimized-out code.
> > > > >
> > > > > This would only be correct for rnglists that had no direct references to them (that only were
> > > > > referenced via the indexes) - but we could either implement it with that assumption, or could
> > > > > add an LLVM extension attribute on the CU that would say "I promise I only referenced rnglists
> > > > > via rnglistx forms/indexes"). If this DWARF-aware linking would have to read the CU DIE (not
> > > > > all the other DIEs) it /could/ also then rewrite high/low_pc if the CU wasn't using ranges...
> > > > > but that wouldn't come up in the function-removal case, because then you'd have ranges anyway,
> > > > > so no need for that.
> > > > >
> > > > > Such a DWARF-aware rnglist linking could also simplify rnglists, in cases where functions
> > > > > ended up being laid out next to each other, the linker could coalesce their ranges together.
> > > > >
> > > > > I imagine this could be implemented with very little overhead to linking, especially compared
> > > > > to the overhead of full DWARF-aware linking.
> > > > >
> > > > >Though none of this fixes Split DWARF, where the linker doesn't get a chance to see the
> > > > > addresses being used - but if you only want/need the CU-level ranges to be correct, this
> > > > > might be a viable fix, and quite efficient.
> > > >
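A minimal sketch of the rnglist rewriting described above: drop ranges that point into discarded code, then coalesce ranges that ended up adjacent. The data model is an assumption made for illustration (a range is a section name, an offset, and a size; `final_addr` holds the addresses of the sections the linker kept); it does not parse the real .debug_rnglists encoding and is not LLD code.

```python
from typing import Dict, List, NamedTuple, Tuple


class RangeEntry(NamedTuple):
    section: str  # input section the range points into (hypothetical naming)
    offset: int   # start offset within that section
    size: int     # length in bytes


def rewrite_rnglist(entries: List[RangeEntry],
                    final_addr: Dict[str, int]) -> List[Tuple[int, int]]:
    """Return half-open [low, high) address ranges covering live code only."""
    live = []
    for entry in entries:
        base = final_addr.get(entry.section)
        if base is None or entry.size == 0:
            continue  # section was discarded, or the range is empty
        low = base + entry.offset
        live.append((low, low + entry.size))
    live.sort()
    merged: List[Tuple[int, int]] = []
    for low, high in live:
        if merged and low <= merged[-1][1]:
            # Adjacent or overlapping: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


entries = [
    RangeEntry(".text.f", 0, 0x20),
    RangeEntry(".text.g", 0, 0x10),  # not in final_addr: discarded
    RangeEntry(".text.h", 0, 0x30),
]
final_addr = {".text.f": 0x1000, ".text.h": 0x1020}
result = rewrite_rnglist(entries, final_addr)
```

In this example the range for the discarded function disappears and the two surviving functions, laid out back to back, collapse into a single range starting at 0x1000.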
> > > > Yes, we think about that alternative. This would resolve our
problem of invalid debug info
> > > > and would work much faster. Thus, if we would not have good
results for D74169 then we
> > > > will implement it. Do you think it could be useful to have
this solution in upstream?
> > >
> > > A pure rnglist rewriting - I think it'd be OK to have in
upstream -
> > > again, cost/benefit/etc would have to be weighed. I'm not
sure it
> > > would save enough space to be particularly valuable beyond the
> > > correctness issue - and it doesn't completely solve the
correctness
> > > issue for zero-address usage or low-address usage (because you
could
> > > still have overlapping subprograms inside a CU - so if you were
> > > symbolizing you could use the correct rnglist to filter, but then
go
> > > look inside the CU only to find two subprograms that had that
address
> > > & not know which one was the correct one and which one was the
> > > discarded one).
> > >
> > > rnglist rewriting might be easy enough to prototype - but depends
what
> > > you want to spend your time on, I know this whole issue has been
a
> > > huge investment of your time already - but maybe this recent
> > > revitalization of the conversation around having an explicit
value in
> > > the linker might be sufficient to address everyone's needs...
*fingers
> > > crossed*)
> > >
> > >
> > > > >> 2) Optimize the DWARF size.
> > > >
> > > >
> > > > > Do your users care much about this? I imagine if they
had significant DWARF size issues,
> > > > > they'd have significant link time issues and the
kind of cost to link time this feature has would
> > > > > be prohibitive - but perhaps they're sharing linked
binaries much more often than they're
> > > > > actually performing linking.
> > > >
> > > > Yes, they do. They also have significant link-time issues.
> > > > So current performance results of D74169 are not very
acceptable.
> > > > We hope to improve it.
> > > >
> > > >
> > > >
> > > > >>The specifics which our users have:
> > > > >>  - embedded platform which uses 0 as start of .text
section.
> > > > >>  - custom toolset which does not support all
features yet(f.e. split dwarf).
> > > > >>  - tolerant of the link-time increase.
> > > > >>  - need a useful way to share debug builds.
> > > >
> > > >
> > > > > Sharing two files (executable and dwp) is significantly
less useful than sharing one file?
> > > >
> > > > Probably not significantly, but yes, it looks less useful compared to D74169.
> > > > Having only two files (executable and .dwp) looks significantly better than
> > > > having an executable and multiple .dwo files.
> > > > Having only one file (the executable) with minimal size looks better than
> > > > two files with a bigger size.
> > > >
> > > > clang compiled with -gsplit-dwarf takes 0.9G for the executable and 0.9G for the .dwp.
> > > > clang compiled with --gc-debuginfo takes only 0.76G for a single executable.
> > > >
> > > >
> > > >
> > > > >>For the first point: we have a problem
"Overlapping address ranges starting from 0"(D59553).
> > > >
> > > > >>We use custom solution, but the general solution
like D74169 would be better here.
> > > >
> > > >
> > > > > If CU ranges are the only ones that need fixing, then I
think the above solution might be as
> > > > > good/better - if more than CU ranges need fixing, then
I think we might want to start talking about
> > > > > how to fix DWARF itself (split and non-split) to signal
certain addresses point to dead code with a
> > > > > specific blessed value that linkers would need to
implement - because with Split DWARF there's
> > > > > no way to solve the non-CU addresses at the linker.
> > > >
> > > > I think a worthwhile choice for that signal value would be LowPC > HighPC.
> > > > That does not require additional bits in DWARF.
> > > > It would be natural to skip such address ranges since they are explicitly
> > > > marked as invalid.
> > > > It could be implemented in a linker very easily. Probably, it would make
> > > > sense to describe that usage in the DWARF standard.
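A minimal sketch of the LowPC > HighPC convention proposed above, showing both sides: the linker marking a discarded range and a consumer skipping it during address lookup. It assumes both values are stored as absolute addresses; the `DEAD_RANGE` encoding and the function names are hypothetical and only illustrate the idea.

```python
from typing import List, Optional, Tuple

# Hypothetical marker: any pair with low > high would do.
DEAD_RANGE = (1, 0)


def patch_range(low: int, high: int, section_is_live: bool) -> Tuple[int, int]:
    """Linker side: keep live ranges, mark discarded ones as invalid."""
    return (low, high) if section_is_live else DEAD_RANGE


def lookup(addr: int,
           subprograms: List[Tuple[str, Tuple[int, int]]]) -> Optional[str]:
    """Consumer side: find the subprogram covering addr, skipping dead ranges."""
    for name, (low, high) in subprograms:
        if low > high:
            continue  # explicitly marked as invalid
        if low <= addr < high:
            return name
    return None


# On a platform where .text starts at 0, an unpatched discarded function
# would resolve to address 0 and overlap real code.
subprograms = [
    ("discarded_f", patch_range(0x0, 0x10, section_is_live=False)),
    ("main", patch_range(0x0, 0x40, section_is_live=True)),
]
found = lookup(0x8, subprograms)
```

Here the lookup for address 0x8 returns "main" because the discarded entry is skipped instead of matching first.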
> > > >
> > > > As to the addresses which are not seen by the linker(since
they are in .dwo files) - yes,
> > > > they need to have another solution. Could you show an
example of such a case, please?
> > > >
> > > >
> > > >
> > > > >>>2. Support of type units.
> > > >
> > > > >>>
> > > >
> > > > >>>>  That could be implemented further.
> > > >
> > > > >>>Enabling type units increases object size to
make it easier to deduplicate at link time by a DWARF-unaware
> > > >
> > > > >>>linker. With a DWARF aware linker it'd be
generally desirable not to have to add that object size overhead to
> > > >
> > > > >>>get the linking improvements.
> > > >
> > > > >>
> > > >
> > > > >>But, DWARFLinker should adequately work with type
units since they are already implemented.
> > > >
> > > >
> > > > > Maybe - it'd be nice & all, but I don't
think it's an outright necessity - if someone knows they're using
> > > > > a DWARF-aware linker, they'd probably not use type
units in their object files. It's possible someone
> > > > > doesn't know for sure & maybe they have
pre-canned debug object files from someone else, etc.
> > > >
> > > > I see.
> > > >
> > > > >>Another thing is that the idea behind type units has
the potential to help Dwarf-aware linker to work faster.
> > > >
> > > > >>Currently, DWARFLinker analyzes context to
understand whether types are the same or not.
> > > >
> > > >
> > > > >When you say "analyzes context" what do you
mean? Usually I'd take that to mean
> > > > > "looks at things outside the type itself - like
what namespace it's in, etc" - which, yes,
> > > > > it should do that, but it doesn't seem very
expensive to do. But I guess you actually
> > > > > mean something about doing structural equivalence in
some way, looking at things inside the type?
> > > >
> > > > I think it could be useful for both cases. Currently, dsymutil does only the
> > > > first thing (looking at the type name, namespace name, etc.) and does not do
> > > > the second thing (checking structural equivalence). Analyzing type names is
> > > > currently quite expensive (the search in the string pool alone takes ~10 sec
> > > > out of 70 sec of overall time). It is expensive because many things must be
> > > > done to work with strings: parse DWARF, search for and resolve relocations,
> > > > compute a hash for strings, put data into a string pool, and create a fully
> > > > qualified name (like namespace::function::name). It looks like this could be
> > > > optimized to require less time, but it would still be a noticeable part of
> > > > the overall time.
> > > >
> > > > If dsymutil starts to check for structural equivalence, then the process
> > > > would be even slower. So, if a single hash-id were checked instead of
> > > > comparing type structure, this process would also be faster.
> > > >
> > > > Thus I think using a hash-id to compare types would make the current
> > > > implementation faster and would also allow DWARFLinker to handle incomplete
> > > > types without massive performance degradation.
> > > >
> > > > >> But the context is known when types are generated.
So, no need to spent the time analyzing it.
> > > >
> > > > >> If types could be compared without analyzing
context, then Dwarf-aware linker would work faster.
> > > >
> > > > >> That is just an idea(not for immediate
implementation): If types would be stored in some "type table"
> > > >
> > > > >> (instead of COMDAT section group) and could be
accessed through hash-id(like type units
> > > >
> > > > >> - then it would be the solution requiring fewer
bits to store but allowing to compare types
> > > >
> > > > >> by hash-id(not analysing context).
> > > > >> In this case, size increasing would be small. And
processing time could be done faster.
> > > > >>
> > > > >> this is just an idea and could be discussed
separately from the problem of integrating of D74169.
> > > >
> > > > >> >> 6. -flto=thin
> > > >
> > > > >> >>    That problem was described in this
review https://reviews.llvm.org/D54747#1503720. It also exists in
> > > >
> > > > >> >> current DWARFLinker/dsymutil
implementation. I think that problem should be discussed more: it could
> > > >
> > > > >> >> probably be fixed by avoiding generation
of such incomplete declaration during thinlto,
> > > >
> > > > >> >> That would be costly to produce
extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
> > > >
> > > > >> >> more to reduce that redundancy early on
(actually removing definitions from some llvm Modules if the type
> > > >
> > > > >> >> definition is known to exist in another
Module, etc)
> > > > >> >I don't know if it's a problem since
that patch was reverted.
> > > >
> > > > >>
> > > >
> > > > >> Yes. That patch was reverted, but this
patch(D74169) has the same problem.
> > > >
> > > > >> if D74169 would be applied and --gc-debuginfo used
then structure type
> > > > >> definition would be removed.
> > > >
> > > > >> DWARFLinker could handle that case - "removing
definitions from some llvm Modules if the type
> > > > >> definition is known to exist in another
Module".
> > > > >> i.e. DWARFLinker could replace the declaration with
the definition.
> > > >
> > > > >> But that problem could be more easily resolved when
debug info is generated(probably without
> > > > >> significant increase of debug info size):
> > > >
> > > > >> Here we have:
> > > >
> > > > >> DW_TAG_compile_unit(0x0000000b) - compile unit
containing concrete instance for function "f".
> > > > >> DW_TAG_compile_unit(0x00000073) - compile unit
containing abstract instance root for function "f".
> > > > >> DW_TAG_compile_unit(0x000000c1) - compile unit
containing function "f" definition.
> > > >
> > > > >> Code for function "f" was deleted.
gc-debuginfo deletes compile unit DW_TAG_compile_unit(0x000000c1)
> > > > >> containing "f" definition (since there is
no corresponding code). But it has structure "Foo" definition
> > > > >> DW_TAG_structure_type(0x0000011e) referenced from
DW_TAG_compile_unit(0x00000073)
> > > > >> by declaration DW_TAG_structure_type(0x000000ae).
That declaration is exactly the case when definition
> > > > >> was removed by thinlto and replaced with
declaration.
> > > >
> > > > >> Would it cost too much if type definition would not
be replaced with declaration for "abstract instance root"?
> > > > >> The number of concrete instances is bigger than
number of abstract instance roots.
> > > > >> Probably, it would not be too costly to leave
definition in abstract instance root?
> > > >
> > > >
> > > >
> > > > >> Alternatively, Would it cost too much if type
definition would not be replaced with declaration when
> > > > >> declaration references type from not used function?
(lto could understand that concrete function is not used).
> > > >
> > > >
> > > > >I don't follow this example - could you provide a
small concrete test case I could reproduce?
> > > >
> > > > I would provide a test case if necessary. But it looks like
this issue is finally clear, and you already commented on that.
> > > >
> > > > > Oh, I guess this is happening perhaps because ThinLTO
can't know for sure that a standalone
> > > > > definition of 'f' won't be needed - so it
produces one in case one of the inlining opportunities
> > > > > doesn't end up inlining. Then it turns out all
calls got inlined, so the external definition wasn't needed.
> > > >
> > > > > Oh, you're suggesting that these 3 CUs got emitted
into one object file during LTO, but that DWARFLinker
> > > > > drops a CU without any code in it - even though... So
far as I know, in LTO, LLVM directly references
> > > > > types across units if the CUs are all emitted in the
same object file. (and if they weren't in the same
> > > > > object file - then the abstract_origin couldn't be
pointing cross-CU).
> > > >
> > > > > I guess some basic things to say:
> > > >
> > > > > With ThinLTO, the concrete/standalone function
definition is emitted in case some call sites don't end up
> > > > > being inlined. So we know it'll be emitted (but
might not be needed by the actual linker)
> > > > > Any number of inline calls might exist - but we
shouldn't put the type information into those, because
> > > > > they aren't guaranteed to emit it (if the inline
function gets optimized away, there would be nothing to
> > > > > enforce the type being emitted) - and even if we forced
the type information to be emitted into one
> > > > > object file that has an inline copy of the function -
there's no guarantee that object file will get linked in either.
> > > >
> > > > > So, no, I don't think there's much we can do to
keep the size of object files down, while guaranteeing
> > > > > the type information will be emitted with the usual
linker semantics.
> > > >
> > > > Then dsymutil/DWARFLinker could be changed to handle
that(though it would probably be not very efficient).
> > > > If thinlto would understand that function is not used
finally(and then must not contain referenced type definition),
> > > > then this situation could be handled more effectively.
> > > >
> > > > Thank you, Alexey.
> > > >
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> _______________________________________________
> > > >>> LLVM Developers mailing list
> > > >>> llvm-dev at lists.llvm.org
> > > >>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Fangrui Song via llvm-dev

2020-Jun-22 21:04 UTC

[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

On 2020-06-22, Alexey Lapshin via llvm-dev wrote:
>>> >> >Alexey> Probably we could try to make DWARF easy to
parsing, trimming, rewriting so that full DWARF
>>> >> >Alexey> parsing solution would not take too much
time?
>>> >> >Alexey>
>>> >> >Alexey> f.e. -debug-types-section solution uses
COMDAT sections to split and deduplicate types.
>>> >> >Alexey> That solution works quite fast. It has
already mentioned drawback with a big size
>>> >> >Alexey> overhead(because of section headers/type
unit headers sizes). But, the fact that type units
>>> >> >Alexey> could be identified just by hash-id(without
parsing type names and types hierarchies)
>>> >> >Alexey> allows the linker to reject duplications
quickly. Another thing is that the linker drops
>>> >> >Alexey> duplicated COMDAT sections without any
additional check. After duplications are deleted,
>>> >> >Alexey> the debug info is still consistent.
>>> >> >Alexey> There could be done DWARF aware solution
working using the same two principles:
>>> >> >Alexey> 1. compare types by hash-id.
>>> >> >Alexey> 2. drop duplications without analyzing
contents.
>>> >> >Alexey>
>>> >> >Alexey> If all types are put into a separate type
table and have hash-id, then it would be much easier to
>>> >> >Alexey> deduplicate them. The idea demonstrated
here - https://reviews.llvm.org/P8164. (It still has a
>>> >> >Alexey> questions: whether base types should be put
into type table, whether references into type table
>>> >> >Alexey> should be done by DW_AT_signature or just
by offset, etc.. ) While handling that separate type table
>>> >> >Alexey> the DWARF aware linker would check the only
hash_id and put only one type description
>>> >> >Alexey> with the same id in the final type table.
It also would allow us to solve that -flto=thin problem -
>>> >> >Alexey> 
http://lists.llvm.org/pipermail/llvm-dev/2020-May/141938.html (there is dsymutil
example there).
>>> >> >Alexey> i.e., the case when type definition would
be removed will not occur.
>>> >>
>>> >> David>I think there is scope for lower-overhead type
deduplication,
>>> >> David>especially now with type units being merged into
the debug_info
>>> >> David>section. Perhaps we could drop dwo_ids and use
section references to
>>> >> David>refer to types & rely on the linker to keep
those referenced sections
>>> >> David>alive - though section references are longer than
CU-relative
>>> >> David>references. (but we need the extra length -
because if the linker
>>> >> David>deduplicates a type definition - one CU may be
referencing a type very
>>> >> David>far away, so the shorter reference might be
inadequate) I don't think
>>> >> David>the indirection through the type hash is /super/
significant to the
>>> >> David>cost - I think it's more in the duplication
of many DIEs especially
>>> >> David>for function definitions (since the type unit
sig8 system only
>>> >> David>provides a way to reference the type - not its
member functions, their
>>> >> David>parameters, etc - so all those DIEs get
duplicated in any CU that
>>> >> David>needs to provide a definition of a member
function). We could
>>> >> David>prototype cross-unit DIE references to lower the
cost of that
>>> >> David>duplication, though rumor has it that constructor
based type homing
>>> >> David>might provide enough value to obviate the need
for type units (or at
>>> >> David>least make the overhead not worthwhile - so
revisiting the overhead to
>>> >> David>reduce it might make it worthwhile again... ).
>>> >>
>>> >> David>Probably wouldn't be super hard to use
LLVM's existing cross-unit DIE
>>> >> David>Referencing machinery (implemented for LTO) to
refer directly to DIEs
>>> >> David>in a type unit without using the signature
system... - hmm, that'd
>>> >> David>only work if your type unit DIEs were identical?
/maybe/ ? Not sure
>>> >> David>how that'd work if you wanted to refer into a
type unit, but the type
>>> >> David>unit got deduplicated. Might be able to rely on
the linker to preserve
>>> >> David>every unique copy of the type unit that's
referenced if we phrase
>>> >> David>things carefully - so if your compiler does
produce exactly identical
>>> >> David>type units they get deduplicated and sec_refs
refer to the uniquely
>>> >> David>preserved copy - but otherwise it preserves as
many distinct copies as
>>> >> David>needed. (I don't know enough about how that
works to be sure - but I
>>> >> David>know that these linkonce/inline function
deduplication does seem to
>>> >> David>cause the DWARF to refer to the singular function
if that function is
>>> >> David>identical, and if it isn't, then you get 0 -
so there's /something/ in
>>> >> David>the linker that can adjust for deduplicating
identical duplicates... )
>>> >>
>>> >> Probably I was a bit unclear: the above idea is not for
types
>>> >> (placed in COMDAT sections) deduplicated by the linker.
>>>
>>> >Yeah, I was just wondering whether it could be useful for that
too.
>>>
>>> >> This idea goes in another direction than fragmenting dwarf
>>> >> using elf sections&tricks. It seems to me that the
cost of fragmenting is too high.
>>>
>>> >I tend to agree - but I'm sort of leaning towards trying to
use object
>>> >features as much as possible, then implementing just enough
custom
>>> >handling in the linker to recoup overhead, etc. (eg: add some
kind of
>>> >small header/brief description that makes it easy for the
linker to
>>> >slice-and-dice - but hopefully a domain-specific such header
can be a
>>> >bit more compact than the fully general ELF form)
>>>
>>> I think this indeed should be implemented and evaluated.
>>> So that various approaches could be compared.
>>>
>>> >> It is not only the sizes of structures describing
fragments but also the complexity
>>> >> of tools that should be taught to work with fragmented
DWARF.
>>> >> (f.e. llvm-dwarfdump applied to object file should be able
to read fragmented DWARF,
>>> >> but applied to linked executable it should work with
non-fragmented DWARF).
>>> >> That idea is for the tool which works the same way as
dsymutil ODR.
>>> >>
>>> >> I will shortly describe the idea of making DWARF be easier
processed by dsymutil/DWARFLinker:
>>> >>
>>> >> The idea is to have only one "type table" per
object file(special section .debug_types_table).
>>> >> This "type table" would contain all types.
>>> >> There could be a special type of reference - type_offset -
that offset points into the type table.
>>> >> Basic types could always be placed into the start of
"type table" thus, offsets to basic types
>>> >> most often would be 1 byte. There also would be a special
kind of reference - reference inside the type.
>>> >> Type units sig8 system - would not be used to reference
types.
>>> >>
>>> >> Types deduplication is assumed to be done, not by linker
mechanism for COMDAT,
>>> >> but by a tool like dsymutil. This tool would create
resulting .debug_types_table by putting there
>>> >> types from source .debug_types_table-s. Only one copy of
the type would be placed into the
>>> >> resulting table. All references pointing to the deleted
copy would be corrected to point
>>> >> to the single copy inside "type table". (that is
how dsymutil works currently)
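
A minimal sketch of the merge step described above, written in Python rather than against any real DWARF library. It assumes each input type table is a list of (hash_id, payload) entries and that references are plain indexes into the local table. The names `merge_type_tables` and `rewrite_refs`, and the data layout, are hypothetical and only illustrate the idea of comparing by hash-id and dropping duplicates without looking at their contents.

```python
def merge_type_tables(tables):
    """Merge per-object type tables, keeping one entry per hash id.

    tables: list of type tables; each table is a list of (hash_id, payload).
    Returns (merged, remaps), where remaps[i][j] is the index in `merged`
    of entry j from input table i.
    """
    merged = []
    index_by_hash = {}
    remaps = []
    for table in tables:
        remap = []
        for hash_id, payload in table:
            # Only the hash id is compared; the payload is never inspected.
            if hash_id not in index_by_hash:
                index_by_hash[hash_id] = len(merged)
                merged.append((hash_id, payload))
            remap.append(index_by_hash[hash_id])
        remaps.append(remap)
    return merged, remaps


def rewrite_refs(refs, remap):
    """Rewrite local type indexes so they point into the merged table."""
    return [remap[r] for r in refs]
```

In this toy model the reference rewrite is a table lookup per reference; in real DWARF the cost discussed in the thread comes from having to find those references in the first place.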
>>>
>>> >^ that's the step that's probably a bit expensive for a
general-use
>>> >tool - it implies parsing all the DWARF to find those
references and
>>> >rewrite them, I think. For a high-performance solution that
could be
>>> >run by the linker I think it'd be necessary to have a
solution that
>>> >doesn't involve parsing all the DIEs.
>>>
>>> According to the current dsymutil processing,
>>> exactly this process is not the most time-consuming.
>>> That could be done relatively fast.
>
>>Fair enough - though I'd still imagine any solution that involves
>>parsing all the DIEs still wouldn't be fast enough (maybe an order of
>>magnitude faster than the current solution even - but that's still,
>>what, 6 or 7x slower than linking without the feature?) for most users
>>to consider it a good trade-off.
>
>It seems to me that even the current 6x-7x slowdown could be useful.
>Users who already use dsymutil or llvm-dwp (assuming DWARFLinker
>would be taught to work with split DWARF) spend this time anyway and,
>in some scenarios, waste disk space on intermediate files.
>Thus if they used this LLD feature in its current state,
>they would still benefit.
>
>Speaking of performance results - LLD is a multi-threaded linker;
>it handles sections in parallel. DWARFLinker generates DWARF using
>AsmPrinter, which is a stream, so it can only produce the resulting DWARF
>sequentially. It is not surprising that the parallel solution works faster.
>Making DWARFLinker truly multi-threaded would probably allow us
>to bring the slowdown into the 2x-4x range.
If it is a 6x-7x slowdown (which may be optimized to 2x-4x), I wonder
whether keeping it as an in-linker pass is a good trade-off, or
whether we should rather use a separate utility that compresses the output.

If the slowdown is such a pain, I might not consider --gc-debuginfo a
readily usable feature like --gdb-index or a future --debug-names (the DWARF
v5 accelerator table - I plan to add it but I am always distracted
by other priorities at hand). Considering that this breaks GNU linkers,
I will add the following lines to the build system

   if LINKER_IS_LLD && ENABLE_GC_DEBUGINFO
     add -Wl,--gc-debuginfo

I don't think this is more complex than:

   if ENABLE_GC_DEBUGINFO
     set linker to a wrapper which optimizes the output like dwz

We probably should add another -f option for specifying the linker path,
like -fld-path= https://lists.llvm.org/pipermail/cfe-dev/2020-June/065710.html
>>> Anyway, I think the dsymutil approach is still valuable, and it
>>> would be useful to optimize it.
>>> Do you think it would be useful to make dsymutil/DWARFLinker truly multi-threaded?
>>> (To make dsymutil/DWARFLinker able to process each object file in a separate thread)
>
>>Perhaps - that I'd probably leave up to the folks who are more
>>invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get it
>>integrated into llvm-dwp and then I'll be interested in getting as
>>much performance out of it as lld - so multithreading and things would
>>be on the books.
>
>I think improving dsymutil is a valuable thing.
>There are several directions which might be considered
>to make it more robust:
>
>1. support the latest DWARF - DWARF5/DWARF64...
>2. implement multi-threaded execution.
>3. support split DWARF.
>4. implement dsymutil for non-Darwin platforms.
>
>All of this is a massive piece of work.
>Our original investment was to solve two problems:
>
>1. Overlapped address ranges, which is currently close to being solved. Thank you for helping with that!
>
>2. Size of debug info. That is still an issue, but we are unsure whether we are ready to
>   invest in solving all of the above 1-4 problems and how much the community is interested in it.
>
>Thank you, Alexey.
>
>>> >One way to do that would be to have a CU-local type indirection table.
>>> >DIEs reference local type numbers (like local address/string numbers -
>>> >addrx/strx/rnglistx) and that table contains either sig8 (no linker
>>> >fixups required) or the local type offsets you describe - the linker
>>> >would then only need to read this type number indirection table and
>>> >rewrite them to the final type numbers.
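
A rough Python sketch of that indirection idea, under stated assumptions: the table is modelled as a list of ("sig8", value) or ("offset", local_offset) entries, and the linker is assumed to already know where each local type ended up (`final_offset_of`). None of these names exist in DWARF or LLVM; they are made up for illustration.

```python
def rewrite_indirection_table(table, final_offset_of):
    """Rewrite a CU-local type indirection table.

    table: list of ("sig8", signature) or ("offset", local_offset) entries.
    final_offset_of: dict mapping a local type offset to its final offset.
    DIEs keep referring to entries by index, so only this table is touched.
    """
    rewritten = []
    for kind, value in table:
        if kind == "sig8":
            # A signature identifies the type globally; no fixup required.
            rewritten.append((kind, value))
        elif kind == "offset":
            rewritten.append((kind, final_offset_of[value]))
        else:
            raise ValueError(f"unknown entry kind: {kind!r}")
    return rewritten
```

The point of the design is that the DIEs themselves are never parsed: they hold small local indexes, and the only data the linker reads and patches is this table.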
>>>
>>> Yes, that could be additionally done if this process would be
time-consuming.
>>>
>>> David, thank you for all your comments and explanations. They are
extremely helpful.
>
>>Sure thing - really appreciate your patience with all this - it's... a
>>lot of moving parts.
>
>>- Dave
>
>>
>> Thank you, Alexey.
>>
>> > sig8 hash-id would be used to compare types and to deduplicate
them.
>> > It would speed up the current dsymutil context analysis.
>> > Types having the same hash-id could be deduplicated.
>> > This would allow deduplicating a more number of types than current
dsymutil.
>> > Incomplete type definitions having a similar set of members are
not deduplicated by dsymutil currently.
>> > In this case they would have the same hash-id.
>> >
>> > This "type table" would take less space than current
"type units" and current ODR solution.
>> >
>> > Above is just an idea on how to help DWARF-aware linker(based on
idea removing obsolete debug info)
>> > to work faster(if that is interesting).
>> >
>> > Alexey.
>> >
>> > > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On
Behalf Of James Henderson via llvm-dev
>> > > Sent: Wednesday, June 3, 2020 3:48 AM
>> > > To: David Blaikie <dblaikie at gmail.com>
>> > > Cc: llvm-dev at lists.llvm.org
>> > > Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove
obsolete debug info in lld.
>> > >
>> > >
>> > >
>> > > It makes me sad that the linker (via a library or otherwise)
has to be "DWARF-aware" to be able to effectively handle
--gc-sections, COMDATs, --icf etc for debug info, without leaving large blocks
of data kicking around.
>> > >
>> > >
>> > >
>> > > The patching to -1 (or equivalent) is probably a good
lightweight solution (though I'd love it if it could be done based on
section type in the future rather than section name, but that's probably
outside the realm of DWARF), as it requires only minimal understanding in the
linker, but anything beyond that seems to be complicated logic that is mostly
due to the structure of DWARF. Patching to -1 does feel a bit like a sticking
plaster/band aid to patch over the issue rather than properly solving it too -
there will still be debug data (potentially significant amounts in COMDAT-heavy
objects) that the linker has to write and the debugger has to somehow know how
to skip (even if it knows that -1 is special-case due to the standard being
updated, it needs to get as far as the -1), which is all wasted effort.
>> > >
>> > >
>> > >
>> > > We've already seen from Alexey's prototyping, and
from our own experiences with the Sony proprietary linker (which tried to
rewrite .debug_line only) that deconstructing the DWARF so that it can be more
optimally reassembled at link time is slow going, and will probably inevitably
be however much effort is put into optimising it. For a start, given the current
standards, it's impossible to know how to deconstruct it without having to
parse vast amounts of DWARF, which is typically going to mean a lot more parsing
work than the linker would normally have to deal with. Additionally, much of
this parsing work is wasted effort, since it seems unlikely in many links that
large amounts of the DWARF will be redundant. Having an option to opt-in
doesn't help much there, since it just means the logic exists without most
people using it, due to it not being good enough, or potentially they don't
even know it exists.
>> > >
>> > >
>> > >
>> > > I don't have particularly concrete suggestions as to how
to solve the structural problems with DWARF at this point. The only thing that
seems obvious to me is a more "blessed" approach to fragmentation of
sections, similar to what I tried with my prototype mentioned earlier in the
thread, although we'd need to figure out the previously stated performance
issues. Other ideas might tie into this, like somehow sharing the various table
headers a bit like CIEs in .eh_frame that could be merged by the linker - each
object could have separate table header sections, which are referenced by the
individual .debug_* blocks, which in turn are one per function/data piece and
easily discardable/merged by the linker.
>> > >
>> > >
>> > >
>> > > Just some thoughts.
>> > >
>> > >
>> > >
>> > > James
>> > >
>> > >
>> > >
>> > > On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>> > >
>> > > On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
>> > > <alapshin at accesssoftek.com> wrote:
>> > > >
>> > > > Hi David, please find my comments inside:
>> > > >
>> > > >
>> > > > >>>Broad question: Do you have any specific
motivation/users/etc in implementing this (if you can speak about it)?
>> > > >
>> > > > >>> - it might help motivate the work,
understand what tradeoffs might be suitable for you/your users, etc.
>> > > >
>> > > > >>There are two general requirements:
>> > > > >> 1) Remove (or clean) invalid debug info.
>> > > >
>> > > > >
>> > > > >Perhaps a simpler direct solution for your immediate
needs might be a much narrower,
>> > > > >and more efficient linker-DWARF-awareness feature:
>> > > > >
>> > > > > With DWARFv5, rnglists present an opportunity for a
DWARF linker to rewrite the ranges
>> > > > > without parsing the rest of the DWARF.
/technically/ this isn't guaranteed - rnglist entries
>> > > > > can be referenced either directly, or by index. If
all rnglists are referenced by index, then
>> > > > > a linker could parse only the debug_rnglists
section and rewrite ranges to remove any
>> > > > > address ranges that refer to optimized-out code.
>> > > > >
>> > > > > This would only be correct for rnglists that had no direct references to them (that only were
>> > > > > referenced via the indexes) - but we could either implement it with that assumption, or could
>> > > > > add an LLVM extension attribute on the CU that would say "I promise I only referenced rnglists
>> > > > > via rnglistx forms/indexes". If this DWARF-aware linking would have to read the CU DIE (not
>> > > > > all the other DIEs) it /could/ also then rewrite high/low_pc if the CU wasn't using ranges...
>> > > > > but that wouldn't come up in the function-removal case, because then you'd have ranges anyway,
>> > > > > so no need for that.
>> > > > >
>> > > > > Such a DWARF-aware rnglist linking could also
simplify rnglists, in cases where functions
>> > > > > ended up being laid out next to each other, the
linker could coalesce their ranges together.
>> > > > >
>> > > > > I imagine this could be implemented with very
little overhead to linking, especially compared
>> > > > > to the overhead of full DWARF-aware linking.
>> > > > >
>> > > > >Though none of this fixes Split DWARF, where the
linker doesn't get a chance to see the
>> > > > > addresses being used - but if you only want/need
the CU-level ranges to be correct, this
>> > > > > might be a viable fix, and quite efficient.
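
The range rewriting suggested above (drop ranges that refer to discarded code, then coalesce ranges that end up adjacent) could look roughly like the following Python sketch. It works on plain (low, high) half-open address pairs rather than on an encoded .debug_rnglists section, and the `is_live` callback stands in for whatever the linker knows about discarded sections; both are assumptions for illustration.

```python
def rewrite_ranges(ranges, is_live):
    """Drop dead ranges and coalesce adjacent or overlapping live ones.

    ranges: iterable of (low, high) half-open address ranges.
    is_live: callable (low, high) -> bool, True if the code was kept.
    """
    live = sorted(r for r in ranges if is_live(*r))
    result = []
    for low, high in live:
        if result and low <= result[-1][1]:
            # Touches or overlaps the previous range: extend it.
            prev_low, prev_high = result[-1]
            result[-1] = (prev_low, max(prev_high, high))
        else:
            result.append((low, high))
    return result
```

For example, if a discarded function was resolved to address 0 and two kept functions were laid out back to back, the rewritten list has a single entry for the two kept functions and nothing at 0.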
>> > > >
>> > > > Yes, we think about that alternative. This would resolve
our problem of invalid debug info
>> > > > and would work much faster. Thus, if we would not have
good results for D74169 then we
>> > > > will implement it. Do you think it could be useful to
have this solution in upstream?
>> > >
>> > > A pure rnglist rewriting - I think it'd be OK to have in upstream -
>> > > again, cost/benefit/etc would have to be weighed. I'm not sure it
>> > > would save enough space to be particularly valuable beyond the
>> > > correctness issue - and it doesn't completely solve the correctness
>> > > issue for zero-address usage or low-address usage (because you could
>> > > still have overlapping subprograms inside a CU - so if you were
>> > > symbolizing you could use the correct rnglist to filter, but then go
>> > > look inside the CU only to find two subprograms that had that address
>> > > & not know which one was the correct one and which one was the
>> > > discarded one).
>> > >
>> > > rnglist rewriting might be easy enough to prototype - but
depends what
>> > > you want to spend your time on, I know this whole issue has
been a
>> > > huge investment of your time already - but maybe this recent
>> > > revitalization of the conversation around having an explicit
value in
>> > > the linker might be sufficient to address everyone's
needs... *fingers
>> > > crossed*)
>> > >
>> > >
>> > > > >> 2) Optimize the DWARF size.
>> > > >
>> > > >
>> > > > > Do your users care much about this? I imagine if
they had significant DWARF size issues,
>> > > > > they'd have significant link time issues and
the kind of cost to link time this feature has would
>> > > > > be prohibitive - but perhaps they're sharing
linked binaries much more often than they're
>> > > > > actually performing linking.
>> > > >
>> > > > Yes, they do. They also have significant link-time
issues.
>> > > > So current performance results of D74169 are not very
acceptable.
>> > > > We hope to improve it.
>> > > >
>> > > >
>> > > >
>> > > > >>The specifics which our users have:
>> > > > >>  - embedded platform which uses 0 as start of
.text section.
>> > > > >>  - custom toolset which does not support all
features yet(f.e. split dwarf).
>> > > > >>  - tolerant of the link-time increase.
>> > > > >>  - need a useful way to share debug builds.
>> > > >
>> > > >
>> > > > > Sharing two files (executable and dwp) is
significantly less useful than sharing one file?
>> > > >
>> > > > Probably not significantly, but yes, it looks less
useful comparing to D74169.
>> > > > Having only two files (executable and .dwp) looks
significantly better than having executable and multiple .dwo files.
>> > > > Having only one file(executable) with minimal size looks
better than the two files with a bigger size.
>> > > >
>> > > > clang compiled with -gsplit-dwarf takes 0.9G for the executable and 0.9G for the .dwp.
>> > > > clang built with --gc-debuginfo takes only 0.76G for a single executable.
>> > > >
>> > > >
>> > > >
>> > > > >>For the first point: we have a problem
"Overlapping address ranges starting from 0"(D59553).
>> > > >
>> > > > >>We use custom solution, but the general solution
like D74169 would be better here.
>> > > >
>> > > >
>> > > > > If CU ranges are the only ones that need fixing,
then I think the above solution might be as
>> > > > > good/better - if more than CU ranges need fixing,
then I think we might want to start talking about
>> > > > > how to fix DWARF itself (split and non-split) to
signal certain addresses point to dead code with a
>> > > > > specific blessed value that linkers would need to
implement - because with Split DWARF there's
>> > > > > no way to solve the non-CU addresses at the linker.
>> > > >
>> > > > I think a worthwhile solution for that signal value would be LowPC > HighPC.
>> > > > That does not require additional bits in DWARF.
>> > > > It would be natural to skip such address ranges since they are explicitly marked as invalid.
>> > > > It could be implemented in a linker very easily. Probably, it would make sense to describe that
>> > > > usage in the DWARF standard.
>> > > >
>> > > > As to the addresses which are not seen by the
linker(since they are in .dwo files) - yes,
>> > > > they need to have another solution. Could you show an
example of such a case, please?
>> > > >
>> > > >
>> > > >
>> > > > >>>2. Support of type units.
>> > > >
>> > > > >>>
>> > > >
>> > > > >>>>  That could be implemented further.
>> > > >
>> > > > >>>Enabling type units increases object size to
make it easier to deduplicate at link time by a DWARF-unaware
>> > > >
>> > > > >>>linker. With a DWARF aware linker it'd
be generally desirable not to have to add that object size overhead to
>> > > >
>> > > > >>>get the linking improvements.
>> > > >
>> > > > >>
>> > > >
>> > > > >>But, DWARFLinker should adequately work with
type units since they are already implemented.
>> > > >
>> > > >
>> > > > > Maybe - it'd be nice & all, but I don't
think it's an outright necessity - if someone knows they're using
>> > > > > a DWARF-aware linker, they'd probably not use
type units in their object files. It's possible someone
>> > > > > doesn't know for sure & maybe they have
pre-canned debug object files from someone else, etc.
>> > > >
>> > > > I see.
>> > > >
>> > > > >>Another thing is that the idea behind type units
has the potential to help Dwarf-aware linker to work faster.
>> > > >
>> > > > >>Currently, DWARFLinker analyzes context to
understand whether types are the same or not.
>> > > >
>> > > >
>> > > > >When you say "analyzes context" what do
you mean? Usually I'd take that to mean
>> > > > > "looks at things outside the type itself -
like what namespace it's in, etc" - which, yes,
>> > > > > it should do that, but it doesn't seem very
expensive to do. But I guess you actually
>> > > > > mean something about doing structural equivalence
in some way, looking at things inside the type?
>> > > >
>> > > > I think it could be useful for both cases. Currently,
dsymutil does only first thing
>> > > > (look at type name, namespace name, etc..) and does not
do the second thing
>> > > > (doing structural equivalence). Analyzing type names is
currently quite expensive
>> > > > (the only search in string pool takes ~10 sec from 70
sec of overall time).
>> > > > That is expensive because of many things should be done
to work with strings:
>> > > > parse DWARF, search and resolve relocations, compute a
hash for strings,
>> > > > put data into a string pool, create a fully qualified
name(like namespace::function::name).
>> > > > It looks like it could be optimized and finally require
less time, but it still would be a noticeable
>> > > > part of the overall time.
>> > > >
>> > > > If dsymutil starts to check for the structural
equivalence, then the process would be even more slowly.
>> > > > So, If instead of comparing types structure, there would
be checked single hash-id - then this process
>> > > > would also be faster.
>> > > >
>> > > > Thus I think using hash-id to compare types would allow
to make current implementation faster and would
>> > > > allow handling incomplete types by DWARFLinker without
massive performance degradation also.
>> > > >
>> > > > >> But the context is known when types are
generated. So, no need to spent the time analyzing it.
>> > > >
>> > > > >> If types could be compared without analyzing
context, then Dwarf-aware linker would work faster.
>> > > >
>> > > > >> That is just an idea(not for immediate
implementation): If types would be stored in some "type table"
>> > > >
>> > > > >> (instead of COMDAT section group) and could be
accessed through hash-id(like type units
>> > > >
>> > > > >> - then it would be the solution requiring fewer
bits to store but allowing to compare types
>> > > >
>> > > > >> by hash-id(not analysing context).
>> > > > >> In this case, size increasing would be small.
And processing time could be done faster.
>> > > > >>
>> > > > >> this is just an idea and could be discussed
separately from the problem of integrating of D74169.
>> > > >
>> > > > >> >> 6. -flto=thin
>> > > >
>> > > > >> >>    That problem was described in this
review https://reviews.llvm.org/D54747#1503720. It also exists in
>> > > >
>> > > > >> >> current DWARFLinker/dsymutil
implementation. I think that problem should be discussed more: it could
>> > > >
>> > > > >> >> probably be fixed by avoiding
generation of such incomplete declaration during thinlto,
>> > > >
>> > > > >> >> That would be costly to produce
extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
>> > > >
>> > > > >> >> more to reduce that redundancy early
on (actually removing definitions from some llvm Modules if the type
>> > > >
>> > > > >> >> definition is known to exist in
another Module, etc)
>> > > > >> >I don't know if it's a problem
since that patch was reverted.
>> > > >
>> > > > >>
>> > > >
>> > > > >> Yes. That patch was reverted, but this
patch(D74169) has the same problem.
>> > > >
>> > > > >> if D74169 would be applied and --gc-debuginfo
used then structure type
>> > > > >> definition would be removed.
>> > > >
>> > > > >> DWARFLinker could handle that case -
"removing definitions from some llvm Modules if the type
>> > > > >> definition is known to exist in another
Module".
>> > > > >> i.e. DWARFLinker could replace the declaration
with the definition.
>> > > >
>> > > > >> But that problem could be more easily resolved
when debug info is generated(probably without
>> > > > >> significant increase of debug info size):
>> > > >
>> > > > >> Here we have:
>> > > >
>> > > > >> DW_TAG_compile_unit(0x0000000b) - compile unit
containing concrete instance for function "f".
>> > > > >> DW_TAG_compile_unit(0x00000073) - compile unit
containing abstract instance root for function "f".
>> > > > >> DW_TAG_compile_unit(0x000000c1) - compile unit
containing function "f" definition.
>> > > >
>> > > > >> Code for function "f" was deleted.
gc-debuginfo deletes compile unit DW_TAG_compile_unit(0x000000c1)
>> > > > >> containing "f" definition (since
there is no corresponding code). But it has structure "Foo" definition
>> > > > >> DW_TAG_structure_type(0x0000011e) referenced
from DW_TAG_compile_unit(0x00000073)
>> > > > >> by declaration
DW_TAG_structure_type(0x000000ae). That declaration is exactly the case when
definition
>> > > > >> was removed by thinlto and replaced with
declaration.
>> > > >
>> > > > >> Would it cost too much if type definition would
not be replaced with declaration for "abstract instance root"?
>> > > > >> The number of concrete instances is bigger than
number of abstract instance roots.
>> > > > >> Probably, it would not be too costly to leave
definition in abstract instance root?
>> > > >
>> > > >
>> > > >
>> > > > >> Alternatively, Would it cost too much if type
definition would not be replaced with declaration when
>> > > > >> declaration references type from not used
function? (lto could understand that concrete function is not used).
>> > > >
>> > > >
>> > > > >I don't follow this example - could you provide
a small concrete test case I could reproduce?
>> > > >
>> > > > I would provide a test case if necessary. But it looks
like this issue is finally clear, and you already commented on that.
>> > > >
>> > > > > Oh, I guess this is happening perhaps because
ThinLTO can't know for sure that a standalone
>> > > > > definition of 'f' won't be needed - so
it produces one in case one of the inlining opportunities
>> > > > > doesn't end up inlining. Then it turns out all
calls got inlined, so the external definition wasn't needed.
>> > > >
>> > > > > Oh, you're suggesting that these 3 CUs got
emitted into one object file during LTO, but that DWARFLinker
>> > > > > drops a CU without any code in it - even though...
So far as I know, in LTO, LLVM directly references
>> > > > > types across units if the CUs are all emitted in
the same object file. (and if they weren't in the same
>> > > > > object file - then the abstract_origin couldn't
be pointing cross-CU).
>> > > >
>> > > > > I guess some basic things to say:
>> > > >
>> > > > > With ThinLTO, the concrete/standalone function definition is emitted in case some call sites don't end up
>> > > > > being inlined. So we know it'll be emitted (but might not be needed by the actual linker).
>> > > > > Any number of inline calls might exist - but we shouldn't put the type information into those, because
>> > > > > they aren't guaranteed to emit it (if the inline function gets optimized away, there would be nothing to
>> > > > > enforce the type being emitted) - and even if we forced the type information to be emitted into one
>> > > > > object file that has an inline copy of the function - there's no guarantee that object file will get linked in either.
>> > > >
>> > > > > So, no, I don't think there's much we can
do to keep the size of object files down, while guaranteeing
>> > > > > the type information will be emitted with the usual
linker semantics.
>> > > >
>> > > > Then dsymutil/DWARFLinker could be changed to handle
that(though it would probably be not very efficient).
>> > > > If thinlto would understand that function is not used
finally(and then must not contain referenced type definition),
>> > > > then this situation could be handled more effectively.
>> > > >
>> > > > Thank you, Alexey.
>> > > >

David Blaikie via llvm-dev

2020-Jun-23 01:19 UTC


[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

On Mon, Jun 22, 2020 at 7:21 AM Alexey Lapshin
<alapshin at accesssoftek.com> wrote:
>
> >> >> >Alexey> Probably we could try to make DWARF easy to parse, trim,
> >> >> >Alexey> and rewrite, so that a full DWARF parsing solution would
> >> >> >Alexey> not take too much time?
> >> >> >Alexey>
> >> >> >Alexey> E.g. the -debug-types-section solution uses COMDAT sections
> >> >> >Alexey> to split and deduplicate types. That solution works quite
> >> >> >Alexey> fast. It has the already mentioned drawback of a big size
> >> >> >Alexey> overhead (because of section header/type unit header sizes).
> >> >> >Alexey> But the fact that type units can be identified just by
> >> >> >Alexey> hash-id (without parsing type names and type hierarchies)
> >> >> >Alexey> allows the linker to reject duplicates quickly. Another
> >> >> >Alexey> thing is that the linker drops duplicated COMDAT sections
> >> >> >Alexey> without any additional check. After duplicates are deleted,
> >> >> >Alexey> the debug info is still consistent.
> >> >> >Alexey> A DWARF-aware solution could work using the same two
> >> >> >Alexey> principles:
> >> >> >Alexey> 1. compare types by hash-id.
> >> >> >Alexey> 2. drop duplicates without analyzing contents.
> >> >> >Alexey>
> >> >> >Alexey> If all types are put into a separate type table and have a
> >> >> >Alexey> hash-id, then it would be much easier to deduplicate them.
> >> >> >Alexey> The idea is demonstrated here - https://reviews.llvm.org/P8164.
> >> >> >Alexey> (It still has open questions: whether base types should be
> >> >> >Alexey> put into the type table, whether references into the type
> >> >> >Alexey> table should be done by DW_AT_signature or just by offset,
> >> >> >Alexey> etc.) While handling that separate type table, the
> >> >> >Alexey> DWARF-aware linker would check only the hash_id and put only
> >> >> >Alexey> one type description with a given id in the final type
> >> >> >Alexey> table. It would also allow us to solve that -flto=thin
> >> >> >Alexey> problem -
> >> >> >Alexey> http://lists.llvm.org/pipermail/llvm-dev/2020-May/141938.html
> >> >> >Alexey> (there is a dsymutil example there),
> >> >> >Alexey> i.e., the case when a type definition would be removed will
> >> >> >Alexey> not occur.
> >> >>
> >> >> David>I think there is scope for lower-overhead type deduplication,
> >> >> David>especially now with type units being merged into the debug_info
> >> >> David>section. Perhaps we could drop dwo_ids and use section references to
> >> >> David>refer to types & rely on the linker to keep those referenced sections
> >> >> David>alive - though section references are longer than CU-relative
> >> >> David>references. (but we need the extra length - because if the linker
> >> >> David>deduplicates a type definition - one CU may be referencing a type very
> >> >> David>far away, so the shorter reference might be inadequate) I don't think
> >> >> David>the indirection through the type hash is /super/ significant to the
> >> >> David>cost - I think it's more in the duplication of many DIEs especially
> >> >> David>for function definitions (since the type unit sig8 system only
> >> >> David>provides a way to reference the type - not its member functions, their
> >> >> David>parameters, etc - so all those DIEs get duplicated in any CU that
> >> >> David>needs to provide a definition of a member function). We could
> >> >> David>prototype cross-unit DIE references to lower the cost of that
> >> >> David>duplication, though rumor has it that constructor based type homing
> >> >> David>might provide enough value to obviate the need for type units (or at
> >> >> David>least make the overhead not worthwhile - so revisiting the overhead to
> >> >> David>reduce it might make it worthwhile again... ).
> >> >>
> >> >> David>Probably wouldn't be super hard to use LLVM's existing cross-unit DIE
> >> >> David>Referencing machinery (implemented for LTO) to refer directly to DIEs
> >> >> David>in a type unit without using the signature system... - hmm, that'd
> >> >> David>only work if your type unit DIEs were identical? /maybe/ ? Not sure
> >> >> David>how that'd work if you wanted to refer into a type unit, but the type
> >> >> David>unit got deduplicated. Might be able to rely on the linker to preserve
> >> >> David>every unique copy of the type unit that's referenced if we phrase
> >> >> David>things carefully - so if your compiler does produce exactly identical
> >> >> David>type units they get deduplicated and sec_refs refer to the uniquely
> >> >> David>preserved copy - but otherwise it preserves as many distinct copies as
> >> >> David>needed. (I don't know enough about how that works to be sure - but I
> >> >> David>know that this linkonce/inline function deduplication does seem to
> >> >> David>cause the DWARF to refer to the singular function if that function is
> >> >> David>identical, and if it isn't, then you get 0 - so there's /something/ in
> >> >> David>the linker that can adjust for deduplicating identical duplicates... )
> >> >>
> >> >> Probably I was a bit unclear: the above idea is not for types
> >> >> (placed in COMDAT sections) deduplicated by the linker.
> >>
> >> >Yeah, I was just wondering whether it could be useful for that too.
> >>
> >> >> This idea goes in another direction than fragmenting DWARF
> >> >> using ELF sections & tricks. It seems to me that the cost of
> >> >> fragmenting is too high.
> >>
> >> >I tend to agree - but I'm sort of leaning towards trying to use object
> >> >features as much as possible, then implementing just enough custom
> >> >handling in the linker to recoup overhead, etc. (eg: add some kind of
> >> >small header/brief description that makes it easy for the linker to
> >> >slice-and-dice - but hopefully a domain-specific such header can be a
> >> >bit more compact than the fully general ELF form)
> >>
> >> I think this indeed should be implemented and evaluated.
> >> So that various approaches could be compared.
> >>
> >> >> It is not only the sizes of structures describing fragments but
> >> >> also the complexity of tools that should be taught to work with
> >> >> fragmented DWARF.
> >> >> (e.g. llvm-dwarfdump applied to an object file should be able to
> >> >> read fragmented DWARF, but applied to a linked executable it should
> >> >> work with non-fragmented DWARF).
> >> >> That idea is for a tool which works the same way as dsymutil ODR.
> >> >>
> >> >> I will briefly describe the idea of making DWARF easier to process
> >> >> by dsymutil/DWARFLinker:
> >> >>
> >> >> The idea is to have only one "type table" per object file (a special
> >> >> section .debug_types_table).
> >> >> This "type table" would contain all types.
> >> >> There could be a special type of reference - type_offset - whose
> >> >> offset points into the type table.
> >> >> Basic types could always be placed at the start of the "type table";
> >> >> thus, offsets to basic types would most often be 1 byte. There would
> >> >> also be a special kind of reference - a reference inside the type.
> >> >> The type units sig8 system would not be used to reference types.
> >> >>
> >> >> Type deduplication is assumed to be done not by the linker mechanism
> >> >> for COMDAT, but by a tool like dsymutil. This tool would create the
> >> >> resulting .debug_types_table by putting there types from the source
> >> >> .debug_types_table-s. Only one copy of a type would be placed into
> >> >> the resulting table. All references pointing to a deleted copy
> >> >> would be corrected to point to the single copy inside the "type
> >> >> table". (that is how dsymutil works currently)
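
The merge step described above (one copy per hash-id, with references redirected to the surviving copy) can be sketched roughly as follows. This is only an illustration under stated assumptions: `TypeEntry`, `merge_type_tables` and `rewrite_type_refs` are hypothetical names, table indices stand in for the proposed `type_offset` byte offsets, and nothing here corresponds to an existing dsymutil or DWARFLinker API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TypeEntry:
    """One type description in a hypothetical per-object type table."""
    hash_id: int  # stands in for a sig8-style 64-bit type hash
    body: bytes   # opaque encoded description; never parsed while merging


def merge_type_tables(
    tables: List[List[TypeEntry]],
) -> Tuple[List[TypeEntry], List[Dict[int, int]]]:
    """Merge per-object type tables, keeping one entry per hash_id.

    Returns the merged table plus, for each input table, a map from the
    old index to the index of the surviving copy in the merged table.
    """
    merged: List[TypeEntry] = []
    index_of_hash: Dict[int, int] = {}
    remaps: List[Dict[int, int]] = []
    for table in tables:
        remap: Dict[int, int] = {}
        for old_index, entry in enumerate(table):
            new_index = index_of_hash.get(entry.hash_id)
            if new_index is None:
                # First time this hash is seen: keep this copy.
                new_index = len(merged)
                index_of_hash[entry.hash_id] = new_index
                merged.append(entry)
            # Duplicates are dropped without looking at entry.body.
            remap[old_index] = new_index
        remaps.append(remap)
    return merged, remaps


def rewrite_type_refs(refs: List[int], remap: Dict[int, int]) -> List[int]:
    """Point each type reference of one object at the merged table."""
    return [remap[r] for r in refs]
```

Note that `rewrite_type_refs` still needs the list of references for each object, which is the part that implies walking the DIEs.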
> >>
> >> >^ that's the step that's probably a bit expensive for a general-use
> >> >tool - it implies parsing all the DWARF to find those references and
> >> >rewrite them, I think. For a high-performance solution that could be
> >> >run by the linker I think it'd be necessary to have a solution that
> >> >doesn't involve parsing all the DIEs.
> >>
> >> According to the current dsymutil processing,
> >> exactly this process is not the most time-consuming.
> >> That could be done relatively fast.
>
> >Fair enough - though I'd still imagine any solution that involves
> >parsing all the DIEs still wouldn't be fast enough (maybe an order of
> >magnitude faster than the current solution even - but that's still,
> >what, 6 or 7x slower than linking without the feature?) for most users
> >to consider it a good trade-off.
>
> It seems to me that the feature could be useful even with the current
> 6x-7x slowdown. Users who already use dsymutil or llvm-dwp (assuming
> DWARFLinker would be taught to work with split DWARF) spend this time
> and, in some scenarios, waste disk space on intermediate files.

FWIW, dwp (llvm-dwp hasn't really been optimized compared to binutils
dwp) is designed to be very quick - by not needing to do a lot of
parsing/fixups. Which, yes, means larger output files than would be
possible with more parsing/etc. It also doesn't take any input from
the linker (so it can run in parallel with the linker) - so it can't
remove dead subprograms. Given Google's the major (perhaps only
significant?) user of Split DWARF - I can say that the needs don't
necessarily overlap well with something that would take significantly
longer to run or use significantly more memory. Faster/cheaper/with
somewhat bigger output files is probably the right tradeoff for
Google's use case, at least.

I imagine Apple's use for dsymutil is somewhat similar - it's not used
in the iterative development cycle, only in final releases - well,
maybe their situation is more "neutral" - not a major pain point in
any case I'd guess.
> Thus if they would use this LLD feature in its current state
> - they would still receive benefits.
>
> Speaking of performance results - LLD is a multi-threaded linker;
> it handles sections in parallel. DWARFLinker generates DWARF using
> AsmPrinter, which is a stream - so it can only produce the resulting
> DWARF sequentially. It is not surprising that the parallel solution
> works faster. Making DWARFLinker truly multi-threaded would probably
> allow us to bring the slowdown into the 2x-4x range.
*nod* that's still a really expensive link - but I understand that's a
suitable tradeoff for your users
> >> Anyway, I think the dsymutil approach is still valuable, and it
> >> would be useful to optimize it.
> >> Do you think it would be useful to make dsymutil/DWARFLinker truly
> >> multi-threaded?
> >> (To make dsymutil/DWARFLinker able to process each object file in
> >> a separate thread)
>
> >Perhaps - that I'd probably leave up to the folks who are more
> >invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get it
> >integrated into llvm-dwp and then I'll be interested in getting as
> >much performance out of it as lld - so multithreading and things would
> >be on the books.
>
> I think improving dsymutil is a valuable thing.
> Though there are several directions which might be considered
> to make it more robust:
>
> 1. support of latest DWARF - DWARF5/DWARF64...
I expect/thought some of the Apple folks had already worked on DWARF5 support?
DWARF64 - that's been around for a while, and just hasn't been needed
by LLVM users thus far, it seems (until recently - where some
developers have started working on that)
> 2. implement multi-threaded execution.
> 3. support of split DWARF.
Maybe, though I'm still not sure it'd be the right tradeoff -
especially if it involved having to wait to run the .dwo merger (call
it DWARF-aware dwp, or dsymutil with dwp support) until after the
linker ran.
> 4. implement dsymutil for non-darwin platform.
That's probably, essentially (3), more-or-less. Split DWARF is
somewhat of a formalization of Apple's/MachO DWARF distribution model
(leave the DWARF in files that aren't linked/use them from a debugger,
but also be able to merge them into some final file (dsym or dwp) for
archival purposes)
> All of this is a massive piece of work.
> Our original investment was to solve two problems:
>
> 1. Overlapped address ranges, which is currently close to being solved.
Thank you for helping with that!
Yeah, again, sorry that's taken quite so long/somewhat circuitous route.
> 2. Size of debug info. That remains an issue, but we are unsure whether
>    we are ready to invest in solving all of the above problems 1-4, or
>    how much the community is interested in it.
Fair, for sure - I don't think you'd need to sign up to solve all of
them (don't think they necessarily need solving). Potentially moving
the logic out into a separate tool as Fangrui's considering - a
post-link DWARF optimizer, rather than in-linker DWARF optimization.

I really don't want to give you the runaround like this - but multiple
times slower links is something that seems pretty problematic for most
users, to the point of weighing the maintainability of lld against the
convenience of having this functionality in-linker rather than in a
post-link optimizer.

(I know you've spoken a bit before about your users' needs - but if
it's possible, could you explain (again :/) why they have such a
strong need for smaller DWARF? While DWARF size is an ongoing concern
for many users (Google certainly - hence the invention of Split DWARF,
use of type units and compressed DWARF, etc) - usually it's in rather
large programs, but it sounds like you're dealing with relatively
small ones (otherwise the increase in link time, I'd imagine, would be
prohibitive for your users?)? You mentioned that the usability cost of
Split DWARF for your users was too high (or high enough to justify
this alternative work of DWARF-aware linking)? That all seems a bit
surprising to me - though I understand the deployment issues of Split
DWARF do present some challenges to users in more heterogeneous
environments than Google's... still, I'd have thought there was some
hope there.)
> Thank you, Alexey.
>
> >> >One way to do that would be to have a CU-local type indirection table.
> >> >DIEs reference local type numbers (like local address/string numbers -
> >> >addrx/strx/rnglistx) and that table contains either sig8 (no linker
> >> >fixups required) or the local type offsets you describe - the linker
> >> >would then only need to read this type number indirection table and
> >> >rewrite them to the final type numbers.
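
A rough sketch of that indirection idea, under loudly stated assumptions: the slot encoding (`"sig8"` / `"offset"` tags), the function names and the `offset_remap` input are all hypothetical and only illustrate that the DIEs themselves would not need to be parsed or rewritten, because they hold only small local type numbers.

```python
from typing import Dict, List, Tuple

# A table slot is either ("sig8", hash), which needs no linker fixup, or
# ("offset", n), an offset into this object's own type table.
Slot = Tuple[str, int]


def fixup_indirection_table(
    table: List[Slot], offset_remap: Dict[int, int]
) -> List[Slot]:
    """Rewrite only the per-CU indirection table after type merging.

    offset_remap maps an offset in the object's original type table to
    the offset of the surviving copy in the final type table.
    """
    fixed: List[Slot] = []
    for kind, value in table:
        if kind == "offset":
            fixed.append((kind, offset_remap[value]))
        elif kind == "sig8":
            fixed.append((kind, value))  # hashes are position-independent
        else:
            raise ValueError(f"unknown slot kind: {kind!r}")
    return fixed


def resolve_type(die_type_number: int, table: List[Slot]) -> Slot:
    """What a consumer would do: look a DIE's local type number up."""
    return table[die_type_number]
```

The cost for the linker is then proportional to the number of distinct types a CU references rather than to the number of DIEs.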
> >>
> >> Yes, that could be additionally done if this process would be
time-consuming.
> >>
> >> David, thank you for all your comments and explanations. They are
extremely helpful.
>
> >Sure thing - really appreciate your patience with all this -
it's... a
> >lot of moving parts.
>
> >- Dave
>
> >
> > Thank you, Alexey.
> >
> > > sig8 hash-id would be used to compare types and to deduplicate them.
> > > It would speed up the current dsymutil context analysis.
> > > Types having the same hash-id could be deduplicated.
> > > This would allow deduplicating more types than the current dsymutil does.
> > > Incomplete type definitions having a similar set of members are
> > > not deduplicated by dsymutil currently.
> > > In this case they would have the same hash-id.
> > >
> > > This "type table" would take less space than the current "type units"
> > > and the current ODR solution.
> > >
> > > Above is just an idea on how to help a DWARF-aware linker (based on
> > > the idea of removing obsolete debug info) to work faster (if that is
> > > interesting).
> > >
> > > Alexey.
> > >
> > > > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On
Behalf Of James Henderson via llvm-dev
> > > > Sent: Wednesday, June 3, 2020 3:48 AM
> > > > To: David Blaikie <dblaikie at gmail.com>
> > > > Cc: llvm-dev at lists.llvm.org
> > > > Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove
obsolete debug info in lld.
> > > >
> > > >
> > > >
> > > > It makes me sad that the linker (via a library or otherwise) has to
> > > > be "DWARF-aware" to be able to effectively handle --gc-sections,
> > > > COMDATs, --icf etc for debug info, without leaving large blocks of
> > > > data kicking around.
> > > >
> > > >
> > > >
> > > > The patching to -1 (or equivalent) is probably a good lightweight
> > > > solution (though I'd love it if it could be done based on section
> > > > type in the future rather than section name, but that's probably
> > > > outside the realm of DWARF), as it requires only minimal
> > > > understanding in the linker, but anything beyond that seems to be
> > > > complicated logic that is mostly due to the structure of DWARF.
> > > > Patching to -1 does feel a bit like a sticking plaster/band aid to
> > > > patch over the issue rather than properly solving it too - there
> > > > will still be debug data (potentially significant amounts in
> > > > COMDAT-heavy objects) that the linker has to write and the debugger
> > > > has to somehow know how to skip (even if it knows that -1 is a
> > > > special case due to the standard being updated, it needs to get as
> > > > far as the -1), which is all wasted effort.
> > > >
> > > >
> > > >
> > > > We've already seen from Alexey's prototyping, and from our own
> > > > experiences with the Sony proprietary linker (which tried to rewrite
> > > > .debug_line only) that deconstructing the DWARF so that it can be
> > > > more optimally reassembled at link time is slow going, and will
> > > > probably inevitably remain so however much effort is put into
> > > > optimising it. For a start, given the current standards, it's
> > > > impossible to know how to deconstruct it without having to parse
> > > > vast amounts of DWARF, which is typically going to mean a lot more
> > > > parsing work than the linker would normally have to deal with.
> > > > Additionally, much of this parsing work is wasted effort, since it
> > > > seems unlikely in many links that large amounts of the DWARF will be
> > > > redundant. Having an option to opt-in doesn't help much there, since
> > > > it just means the logic exists without most people using it, due to
> > > > it not being good enough, or potentially they don't even know it
> > > > exists.
> > > >
> > > >
> > > >
> > > > I don't have particularly concrete suggestions as to how to solve
> > > > the structural problems with DWARF at this point. The only thing
> > > > that seems obvious to me is a more "blessed" approach to
> > > > fragmentation of sections, similar to what I tried with my prototype
> > > > mentioned earlier in the thread, although we'd need to figure out
> > > > the previously stated performance issues. Other ideas might tie into
> > > > this, like somehow sharing the various table headers a bit like CIEs
> > > > in .eh_frame that could be merged by the linker - each object could
> > > > have separate table header sections, which are referenced by the
> > > > individual .debug_* blocks, which in turn are one per function/data
> > > > piece and easily discardable/merged by the linker.
> > > >
> > > >
> > > >
> > > > Just some thoughts.
> > > >
> > > >
> > > >
> > > > James
> > > >
> > > >
> > > >
> > > > On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> > > >
> > > > On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
> > > > <alapshin at accesssoftek.com> wrote:
> > > > >
> > > > > Hi David, please find my comments inside:
> > > > >
> > > > >
> > > > > >>>Broad question: Do you have any specific
motivation/users/etc in implementing this (if you can speak about it)?
> > > > >
> > > > > >>> - it might help motivate the work,
understand what tradeoffs might be suitable for you/your users, etc.
> > > > >
> > > > > >>There are two general requirements:
> > > > > >> 1) Remove (or clean) invalid debug info.
> > > > >
> > > > > >
> > > > > > >Perhaps a simpler direct solution for your immediate needs might be
> > > > > > >a much narrower, and more efficient linker-DWARF-awareness feature:
> > > > > > >
> > > > > > > With DWARFv5, rnglists present an opportunity for a DWARF linker to
> > > > > > > rewrite the ranges without parsing the rest of the DWARF.
> > > > > > > /technically/ this isn't guaranteed - rnglist entries can be
> > > > > > > referenced either directly, or by index. If all rnglists are
> > > > > > > referenced by index, then a linker could parse only the
> > > > > > > debug_rnglists section and rewrite ranges to remove any address
> > > > > > > ranges that refer to optimized-out code.
> > > > > > >
> > > > > > > This would only be correct for rnglists that had no direct
> > > > > > > references to them (that were only referenced via the indexes) -
> > > > > > > but we could either implement it with that assumption, or could add
> > > > > > > an LLVM extension attribute on the CU that would say "I promise I
> > > > > > > only referenced rnglists via rnglistx forms/indexes". If this
> > > > > > > DWARF-aware linking would have to read the CU DIE (not all the
> > > > > > > other DIEs) it /could/ also then rewrite high/low_pc if the CU
> > > > > > > wasn't using ranges... but that wouldn't come up in the
> > > > > > > function-removal case, because then you'd have ranges anyway, so no
> > > > > > > need for that.
> > > > > > >
> > > > > > > Such a DWARF-aware rnglist linking could also simplify rnglists: in
> > > > > > > cases where functions ended up being laid out next to each other,
> > > > > > > the linker could coalesce their ranges together.
> > > > > > >
> > > > > > > I imagine this could be implemented with very little overhead to
> > > > > > > linking, especially compared to the overhead of full DWARF-aware
> > > > > > > linking.
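
The rnglist rewriting described above (drop ranges for discarded code, coalesce ranges of functions laid out next to each other) could be sketched as below. This is a minimal illustration, not lld code: ranges are modelled as already-decoded half-open `(low, high)` pairs, and the `is_live` predicate is a hypothetical stand-in for what the linker knows from section garbage collection. Liveness has to come from the linker rather than from the address value, since on a platform where .text starts at 0 a dead range resolved to 0 overlaps real code.

```python
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # half-open address range [low, high)


def rewrite_range_list(
    ranges: List[Range], is_live: Callable[[Range], bool]
) -> List[Range]:
    """Drop dead or empty ranges, then coalesce touching ranges."""
    live = sorted(r for r in ranges if r[0] < r[1] and is_live(r))
    out: List[Range] = []
    for low, high in live:
        if out and low <= out[-1][1]:
            # Adjacent to (or overlapping) the previous range: extend it.
            out[-1] = (out[-1][0], max(out[-1][1], high))
        else:
            out.append((low, high))
    return out
```

For example, with one discarded function whose range was resolved to address 0 and two live functions placed back to back, the rewritten list contains a single merged range for the two live functions plus any other live ranges.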
> > > > > >
> > > > > > >Though none of this fixes Split DWARF, where the linker doesn't get
> > > > > > > a chance to see the addresses being used - but if you only
> > > > > > > want/need the CU-level ranges to be correct, this might be a viable
> > > > > > > fix, and quite efficient.
> > > > >
> > > > > Yes, we think about that alternative. This would
resolve our problem of invalid debug info
> > > > > and would work much faster. Thus, if we would not have
good results for D74169 then we
> > > > > will implement it. Do you think it could be useful to
have this solution in upstream?
> > > >
> > > > A pure rnglist rewriting - I think it'd be OK to have in upstream -
> > > > again, cost/benefit/etc would have to be weighed. I'm not sure it
> > > > would save enough space to be particularly valuable beyond the
> > > > correctness issue - and it doesn't completely solve the correctness
> > > > issue for zero-address usage or low-address usage (because you could
> > > > still have overlapping subprograms inside a CU - so if you were
> > > > symbolizing you could use the correct rnglist to filter, but then go
> > > > look inside the CU only to find two subprograms that had that address
> > > > & not know which one was the correct one and which one was the
> > > > discarded one).
> > > >
> > > > rnglist rewriting might be easy enough to prototype - but depends what
> > > > you want to spend your time on, I know this whole issue has been a
> > > > huge investment of your time already - but maybe this recent
> > > > revitalization of the conversation around having an explicit value in
> > > > the linker might be sufficient to address everyone's needs... *fingers
> > > > crossed*)
> > > >
> > > >
> > > > > >> 2) Optimize the DWARF size.
> > > > >
> > > > >
> > > > > > Do your users care much about this? I imagine if
they had significant DWARF size issues,
> > > > > > they'd have significant link time issues and
the kind of cost to link time this feature has would
> > > > > > be prohibitive - but perhaps they're sharing
linked binaries much more often than they're
> > > > > > actually performing linking.
> > > > >
> > > > > Yes, they do. They also have significant link-time
issues.
> > > > > So current performance results of D74169 are not very
acceptable.
> > > > > We hope to improve it.
> > > > >
> > > > >
> > > > >
> > > > > >>The specifics which our users have:
> > > > > >>  - embedded platform which uses 0 as start of
.text section.
> > > > > >>  - custom toolset which does not support all
features yet(f.e. split dwarf).
> > > > > >>  - tolerant of the link-time increase.
> > > > > >>  - need a useful way to share debug builds.
> > > > >
> > > > >
> > > > > > Sharing two files (executable and dwp) is
significantly less useful than sharing one file?
> > > > >
> > > > > Probably not significantly, but yes, it looks less
useful comparing to D74169.
> > > > > Having only two files (executable and .dwp) looks
significantly better than having executable and multiple .dwo files.
> > > > > Having only one file(executable) with minimal size
looks better than the two files with a bigger size.
> > > > >
> > > > > clang built with -gsplit-dwarf takes 0.9G for the executable and
> > > > > 0.9G for the .dwp.
> > > > > clang built with --gc-debuginfo takes only 0.76G for a single
> > > > > executable.
> > > > >
> > > > >
> > > > >
> > > > > >>For the first point: we have a problem
"Overlapping address ranges starting from 0"(D59553).
> > > > >
> > > > > >>We use custom solution, but the general
solution like D74169 would be better here.
> > > > >
> > > > >
> > > > > > If CU ranges are the only ones that need fixing,
then I think the above solution might be as
> > > > > > good/better - if more than CU ranges need fixing,
then I think we might want to start talking about
> > > > > > how to fix DWARF itself (split and non-split) to
signal certain addresses point to dead code with a
> > > > > > specific blessed value that linkers would need to
implement - because with Split DWARF there's
> > > > > > no way to solve the non-CU addresses at the
linker.
> > > > >
> > > > > I think a worthwhile choice for that signal value would be
> > > > > LowPC > HighPC.
> > > > > That does not require additional bits in DWARF.
> > > > > It would be natural to skip such address ranges since they are
> > > > > explicitly marked as invalid.
> > > > > It could be implemented in a linker very easily. Probably, it would
> > > > > make sense to describe that usage in the DWARF standard.
> > > > >
> > > > > As to the addresses which are not seen by the
linker(since they are in .dwo files) - yes,
> > > > > they need to have another solution. Could you show an
example of such a case, please?
> > > > >
> > > > >
> > > > >
> > > > > >>>2. Support of type units.
> > > > >
> > > > > >>>
> > > > >
> > > > > >>>>  That could be implemented further.
> > > > >
> > > > > >>>Enabling type units increases object size
to make it easier to deduplicate at link time by a DWARF-unaware
> > > > >
> > > > > >>>linker. With a DWARF aware linker it'd
be generally desirable not to have to add that object size overhead to
> > > > >
> > > > > >>>get the linking improvements.
> > > > >
> > > > > >>
> > > > >
> > > > > >>But, DWARFLinker should adequately work with
type units since they are already implemented.
> > > > >
> > > > >
> > > > > > Maybe - it'd be nice & all, but I
don't think it's an outright necessity - if someone knows they're
using
> > > > > > a DWARF-aware linker, they'd probably not use
type units in their object files. It's possible someone
> > > > > > doesn't know for sure & maybe they have
pre-canned debug object files from someone else, etc.
> > > > >
> > > > > I see.
> > > > >
> > > > > >>Another thing is that the idea behind type
units has the potential to help Dwarf-aware linker to work faster.
> > > > >
> > > > > >>Currently, DWARFLinker analyzes context to
understand whether types are the same or not.
> > > > >
> > > > >
> > > > > >When you say "analyzes context" what do
you mean? Usually I'd take that to mean
> > > > > > "looks at things outside the type itself -
like what namespace it's in, etc" - which, yes,
> > > > > > it should do that, but it doesn't seem very
expensive to do. But I guess you actually
> > > > > > mean something about doing structural equivalence
in some way, looking at things inside the type?
> > > > >
> > > > > I think it could be useful for both cases. Currently, dsymutil
> > > > > does only the first thing (looking at type name, namespace name,
> > > > > etc.) and does not do the second thing (checking structural
> > > > > equivalence). Analyzing type names is currently quite expensive
> > > > > (the string pool search alone takes ~10 sec out of 70 sec of
> > > > > overall time). That is expensive because many things have to be
> > > > > done to work with strings: parse DWARF, search and resolve
> > > > > relocations, compute a hash for strings, put data into a string
> > > > > pool, create a fully qualified name (like
> > > > > namespace::function::name). It looks like it could be optimized
> > > > > and would finally require less time, but it would still be a
> > > > > noticeable part of the overall time.
> > > > >
> > > > > If dsymutil starts to check for the structural
equivalence, then the process would be even more slowly.
> > > > > So, If instead of comparing types structure, there
would be checked single hash-id - then this process
> > > > > would also be faster.
> > > > >
> > > > > Thus I think using hash-id to compare types would allow
to make current implementation faster and would
> > > > > allow handling incomplete types by DWARFLinker without
massive performance degradation also.
> > > > >
> > > > > >> But the context is known when types are
generated. So, no need to spent the time analyzing it.
> > > > >
> > > > > >> If types could be compared without analyzing
context, then Dwarf-aware linker would work faster.
> > > > >
> > > > > >> That is just an idea(not for immediate
implementation): If types would be stored in some "type table"
> > > > >
> > > > > >> (instead of COMDAT section group) and could be
accessed through hash-id(like type units
> > > > >
> > > > > >> - then it would be the solution requiring
fewer bits to store but allowing to compare types
> > > > >
> > > > > >> by hash-id(not analysing context).
> > > > > >> In this case, size increasing would be small.
And processing time could be done faster.
> > > > > >>
> > > > > >> this is just an idea and could be discussed
separately from the problem of integrating of D74169.
> > > > >
> > > > > >> >> 6. -flto=thin
> > > > >
> > > > > >> >>    That problem was described in this
review https://reviews.llvm.org/D54747#1503720. It also exists in
> > > > >
> > > > > >> >> current DWARFLinker/dsymutil
implementation. I think that problem should be discussed more: it could
> > > > >
> > > > > >> >> probably be fixed by avoiding
generation of such incomplete declaration during thinlto,
> > > > >
> > > > > >> >> That would be costly to produce
extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
> > > > >
> > > > > >> >> more to reduce that redundancy early
on (actually removing definitions from some llvm Modules if the type
> > > > >
> > > > > >> >> definition is known to exist in
another Module, etc)
> > > > > >> >I don't know if it's a problem
since that patch was reverted.
> > > > >
> > > > > >>
> > > > >
> > > > > >> Yes. That patch was reverted, but this
patch(D74169) has the same problem.
> > > > >
> > > > > >> if D74169 would be applied and --gc-debuginfo
used then structure type
> > > > > >> definition would be removed.
> > > > >
> > > > > >> DWARFLinker could handle that case -
"removing definitions from some llvm Modules if the type
> > > > > >> definition is known to exist in another
Module".
> > > > > >> i.e. DWARFLinker could replace the declaration
with the definition.
> > > > >
> > > > > >> But that problem could be more easily resolved
when debug info is generated(probably without
> > > > > >> significant increase of debug info size):
> > > > >
> > > > > >> Here we have:
> > > > >
> > > > > >> DW_TAG_compile_unit(0x0000000b) - compile unit
containing concrete instance for function "f".
> > > > > >> DW_TAG_compile_unit(0x00000073) - compile unit
containing abstract instance root for function "f".
> > > > > >> DW_TAG_compile_unit(0x000000c1) - compile unit
containing function "f" definition.
> > > > >
> > > > > >> Code for function "f" was deleted.
gc-debuginfo deletes compile unit DW_TAG_compile_unit(0x000000c1)
> > > > > >> containing "f" definition (since
there is no corresponding code). But it has structure "Foo" definition
> > > > > >> DW_TAG_structure_type(0x0000011e) referenced
from DW_TAG_compile_unit(0x00000073)
> > > > > >> by declaration
DW_TAG_structure_type(0x000000ae). That declaration is exactly the case when
definition
> > > > > >> was removed by thinlto and replaced with
declaration.
> > > > >
> > > > > >> Would it cost too much if type definition
would not be replaced with declaration for "abstract instance root"?
> > > > > >> The number of concrete instances is bigger
than number of abstract instance roots.
> > > > > >> Probably, it would not be too costly to leave
definition in abstract instance root?
> > > > >
> > > > >
> > > > >
> > > > > >> Alternatively, Would it cost too much if type
definition would not be replaced with declaration when
> > > > > >> declaration references type from not used
function? (lto could understand that concrete function is not used).
> > > > >
> > > > >
> > > > > >I don't follow this example - could you provide
a small concrete test case I could reproduce?
> > > > >
> > > > > I would provide a test case if necessary. But it looks
like this issue is finally clear, and you already commented on that.
> > > > >
> > > > > > Oh, I guess this is happening perhaps because
ThinLTO can't know for sure that a standalone
> > > > > > definition of 'f' won't be needed - so
it produces one in case one of the inlining opportunities
> > > > > > doesn't end up inlining. Then it turns out all
calls got inlined, so the external definition wasn't needed.
> > > > >
> > > > > > Oh, you're suggesting that these 3 CUs got
emitted into one object file during LTO, but that DWARFLinker
> > > > > > drops a CU without any code in it - even though...
So far as I know, in LTO, LLVM directly references
> > > > > > types across units if the CUs are all emitted in
the same object file. (and if they weren't in the same
> > > > > > object file - then the abstract_origin
couldn't be pointing cross-CU).
> > > > >
> > > > > > I guess some basic things to say:
> > > > >
> > > > > > With ThinLTO, the concrete/standalone function
definition is emitted in case some call sites don't end up
> > > > > > being inlined. So we know it'll be emitted
(but might not be needed by the actual linker)
> > > > > > > Any number of inline calls might exist - but we
shouldn't put the type information into those, because
> > > > > > they aren't guaranteed to emit it (if the
inline function gets optimized away, there would be nothing to
> > > > > > enforce the type being emitted) - and even if we
forced the type information to be emitted into one
> > > > > > object file that has an inline copy of the
function - there's no guarantee that object file will get linked in either.
> > > > >
> > > > > > So, no, I don't think there's much we can
do to keep the size of object files down, while guaranteeing
> > > > > > the type information will be emitted with the
usual linker semantics.
> > > > >
> > > > > Then dsymutil/DWARFLinker could be changed to handle
that(though it would probably be not very efficient).
> > > > > If thinlto would understand that function is not used
finally(and then must not contain referenced type definition),
> > > > > then this situation could be handled more effectively.
> > > > >
> > > > > Thank you, Alexey.
> > > > >
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> _______________________________________________
> > > > >>> LLVM Developers mailing list
> > > > >>> llvm-dev at lists.llvm.org
> > > > >>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Alexey Lapshin via llvm-dev

2020-Jun-23 21:19 UTC

[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

>On 2020-06-22, Alexey Lapshin via llvm-dev wrote:
>>>> >> >Alexey> Probably we could try to make DWARF
easy to parsing, trimming, rewriting so that full DWARF
>>>> >> >Alexey> parsing solution would not take too
much time?
>>>> >> >Alexey>
>>>> >> >Alexey> e.g. the -fdebug-types-section solution uses
COMDAT sections to split and deduplicate types.
>>>> >> >Alexey> That solution works quite fast. It has
already mentioned drawback with a big size
>>>> >> >Alexey> overhead(because of section
headers/type unit headers sizes). But, the fact that type units
>>>> >> >Alexey> could be identified just by
hash-id(without parsing type names and types hierarchies)
>>>> >> >Alexey> allows the linker to reject
duplications quickly. Another thing is that the linker drops
>>>> >> >Alexey> duplicated COMDAT sections without any
additional check. After duplications are deleted,
>>>> >> >Alexey> the debug info is still consistent.
>>>> >> >Alexey> There could be done DWARF aware
solution working using the same two principles:
>>>> >> >Alexey> 1. compare types by hash-id.
>>>> >> >Alexey> 2. drop duplications without analyzing
contents.
>>>> >> >Alexey>
>>>> >> >Alexey> If all types are put into a separate
type table and have hash-id, then it would be much easier to
>>>> >> >Alexey> deduplicate them. The idea demonstrated
here - https://reviews.llvm.org/P8164. (It still has a
>>>> >> >Alexey> questions: whether base types should be
put into type table, whether references into type table
>>>> >> >Alexey> should be done by DW_AT_signature or
just by offset, etc.. ) While handling that separate type table
>>>> >> >Alexey> the DWARF aware linker would check the
only hash_id and put only one type description
>>>> >> >Alexey> with the same id in the final type
table. It also would allow us to solve that -flto=thin problem -
>>>> >> >Alexey> 
http://lists.llvm.org/pipermail/llvm-dev/2020-May/141938.html (there is dsymutil
example there).
>>>> >> >Alexey> i.e., the case when type definition
would be removed will not occur.
>>>> >>
>>>> >> David>I think there is scope for lower-overhead
type deduplication,
>>>> >> David>especially now with type units being merged
into the debug_info
>>>> >> David>section. Perhaps we could drop dwo_ids and
use section references to
>>>> >> David>refer to types & rely on the linker to
keep those referenced sections
>>>> >> David>alive - though section references are longer
than CU-relative
>>>> >> David>references. (but we need the extra length -
because if the linker
>>>> >> David>deduplicates a type definition - one CU may
be referencing a type very
>>>> >> David>far away, so the shorter reference might be
inadequate) I don't think
>>>> >> David>the indirection through the type hash is
/super/ significant to the
>>>> >> David>cost - I think it's more in the
duplication of many DIEs especially
>>>> >> David>for function definitions (since the type unit
sig8 system only
>>>> >> David>provides a way to reference the type - not
its member functions, their
>>>> >> David>parameters, etc - so all those DIEs get
duplicated in any CU that
>>>> >> David>needs to provide a definition of a member
function). We could
>>>> >> David>prototype cross-unit DIE references to lower
the cost of that
>>>> >> David>duplication, though rumor has it that
constructor based type homing
>>>> >> David>might provide enough value to obviate the
need for type units (or at
>>>> >> David>least make the overhead not worthwhile - so
revisiting the overhead to
>>>> >> David>reduce it might make it worthwhile again...
).
>>>> >>
>>>> >> David>Probably wouldn't be super hard to use
LLVM's existing cross-unit DIE
>>>> >> David>Referencing machinery (implemented for LTO)
to refer directly to DIEs
>>>> >> David>in a type unit without using the signature
system... - hmm, that'd
>>>> >> David>only work if your type unit DIEs were
identical? /maybe/ ? Not sure
>>>> >> David>how that'd work if you wanted to refer
into a type unit, but the type
>>>> >> David>unit got deduplicated. Might be able to rely
on the linker to preserve
>>>> >> David>every unique copy of the type unit that's
referenced if we phrase
>>>> >> David>things carefully - so if your compiler does
produce exactly identical
>>>> >> David>type units they get deduplicated and sec_refs
refer to the uniquely
>>>> >> David>preserved copy - but otherwise it preserves
as many distinct copies as
>>>> >> David>needed. (I don't know enough about how
that works to be sure - but I
>>>> >> David>know that these linkonce/inline function
deduplication does seem to
>>>> >> David>cause the DWARF to refer to the singular
function if that function is
>>>> >> David>identical, and if it isn't, then you get
0 - so there's /something/ in
>>>> >> David>the linker that can adjust for deduplicating
identical duplicates... )
>>>> >>
>>>> >> Probably I was a bit unclear: the above idea is not
for types
>>>> >> (placed in COMDAT sections) deduplicated by the
linker.
>>>>
>>>> >Yeah, I was just wondering whether it could be useful for
that too.
>>>>
>>>> >> This idea goes in another direction than fragmenting
dwarf
>>>> >> using elf sections&tricks. It seems to me that the
cost of fragmenting is too high.
>>>>
>>>> >I tend to agree - but I'm sort of leaning towards
trying to use object
>>>> >features as much as possible, then implementing just enough
custom
>>>> >handling in the linker to recoup overhead, etc. (eg: add
some kind of
>>>> >small header/brief description that makes it easy for the
linker to
>>>> >slice-and-dice - but hopefully a domain-specific such
header can be a
>>>> >bit more compact than the fully general ELF form)
>>>>
>>>> I think this indeed should be implemented and evaluated.
>>>> So that various approaches could be compared.
>>>>
>>>> >> It is not only the sizes of structures describing
fragments but also the complexity
>>>> >> of tools that should be taught to work with fragmented
DWARF.
>>>> >> (f.e. llvm-dwarfdump applied to object file should be
able to read fragmented DWARF,
>>>> >> but applied to linked executable it should work with
non-fragmented DWARF).
>>>> >> That idea is for the tool which works the same way as
dsymutil ODR.
>>>> >>
>>>> >> I will shortly describe the idea of making DWARF be
easier processed by dsymutil/DWARFLinker:
>>>> >>
>>>> >> The idea is to have only one "type table"
per object file(special section .debug_types_table).
>>>> >> This "type table" would contain all types.
>>>> >> There could be a special type of reference -
type_offset - that offset points into the type table.
>>>> >> Basic types could always be placed into the start of
"type table" thus, offsets to basic types
>>>> >> most often would be 1 byte. There also would be a
special kind of reference - reference inside the type.
>>>> >> Type units sig8 system - would not be used to
reference types.
>>>> >>
>>>> >> Types deduplication is assumed to be done, not by
linker mechanism for COMDAT,
>>>> >> but by a tool like dsymutil. This tool would create
resulting .debug_types_table by putting there
>>>> >> types from source .debug_types_table-s. Only one copy
of the type would be placed into the
>>>> >> resulting table. All references pointing to the
deleted copy would be corrected to point
>>>> >> to the single copy inside "type table".
(that is how dsymutil works currently)
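
The merge step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names TypeEntry, merge_type_tables and rewrite_refs are hypothetical, types are addressed by table index rather than by byte offset, and the hash-id is taken as given (how it is computed is not decided in this thread).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeEntry:
    hash_id: int    # hypothetical per-type hash, in the spirit of a sig8
    payload: bytes  # encoded type description, treated as opaque here


def merge_type_tables(tables):
    """Merge per-object type tables, keeping one entry per hash_id.

    Returns (merged, remaps). merged is the output table; remaps[i] maps
    an index in input table i to the index of the surviving copy.
    """
    merged = []
    index_by_hash = {}
    remaps = []
    for table in tables:
        remap = {}
        for old_index, entry in enumerate(table):
            new_index = index_by_hash.get(entry.hash_id)
            if new_index is None:
                # First time this hash is seen: keep this copy.
                new_index = len(merged)
                index_by_hash[entry.hash_id] = new_index
                merged.append(entry)
            # Duplicates are dropped without looking at the payload.
            remap[old_index] = new_index
        remaps.append(remap)
    return merged, remaps


def rewrite_refs(refs, remap):
    """Redirect type references of one object file to the merged table."""
    return [remap[ref] for ref in refs]
```

The expensive part that the following reply points at is not the merge itself but finding every reference that has to go through rewrite_refs, which in real DWARF means walking the DIEs.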
>>>>
>>>> >^ that's the step that's probably a bit expensive
for a general-use
>>>> >tool - it implies parsing all the DWARF to find those
references and
>>>> >rewrite them, I think. For a high-performance solution that
could be
>>>> >run by the linker I think it'd be necessary to have a
solution that
>>>> >doesn't involve parsing all the DIEs.
>>>>
>>>> According to the current dsymutil processing,
>>>> exactly this process is not the most time-consuming.
>>>> That could be done relatively fast.
>>
>>>Fair enough - though I'd still imagine any solution that
involves
>>>parsing all the DIEs still wouldn't be fast enough (maybe an
order of
>>>magnitude faster than the current solution even - but that's
still,
>>>what, 6 or 7x slower than linking without the feature?) for most
users
>>>to consider it a good trade-off.
>>
>>It seems to me that even the current 6x-7x slowdown could be useful.
>>Users who already use dsymutil or llvm-dwp(assuming DWARFLinker
>>would be taught to work with a split dwarf) tools spend this time and,
>>in some scenarios, waste disk space on intermediate files.
>>Thus if they would use this LLD feature in its current state
>>- they would still receive benefits.
>>
>>Speaking of performance results - LLD is a multi-thread linker;
>>it handles sections in parallel. DWARFLinker generates DWARF using
>>AsmPrinter which is a stream - so it could make resulting DWARF only
>>continuously. It is not surprising that the parallel solution works
faster.
>>Making DWARFLinker truly multi-threaded would probably allow us
>>to make slowdown to be at 2x-4x range.
>
>If it is 6x-7x slowdown (which may be optimized to 2x-4x), I wonder
>whether it is a good trade-off keeping it as an in-linker pass, or
>rather we should just use another utility compressing the output separately.
>
>If the slowdown is such a pain, I might not consider --gc-debuginfo a
>readily usable feature like --gdb-index or future --debug-names (DWARF
>v5 accelerator table - I have a plan to add it but I am always distracted
>by other priorities at hand). Considering that this breaks GNU linkers,
>I will add the following lines to the build system
>
>   if LINKER_IS_LLD && ENABLE_GC_DEBUGINFO
>     add -Wl,--gc-debuginfo
>
>I don't think this is more complex than:
>
>   if ENABLE_GC_DEBUGINFO
>     set linker to a wrapper which optimizes the output like dwz
>
>We probably should add another -f option for specifying the linker path,
>like -fld-path=
https://lists.llvm.org/pipermail/cfe-dev/2020-June/065710.html
gc-debuginfo could be done not in the linker but as a standalone
tool (like dsymutil or llvm-dwp), as you said. The reasons it was
suggested to do it from the linker:

1. The linker already has liveness information built and object files loaded.
   Thus, calling it from the linker would be the fastest implementation.
   Otherwise, an address map would have to be created and written while
   linking, and intermediate debug-info files would have to be generated.
   The separate tool would then read that map and load the object files
   or intermediate debug-info files again, so the processing time
   would become even longer.

2. The linker already processes debug info: error reporting, --gdb-index,
   the upcoming --debug-names. From the design point of view, it would be
   good to have a separate module - DWARFLinker - which implements
   all that functionality, so that the linker would not need its own
   separate implementation of each. Instead, the existing
   implementation would be called from the linker, i.e. depending on the
   task, the linker would call DWARFLinker.generate-gdb-index(),
   DWARFLinker.generate-debug-names(), or DWARFLinker.gc-debuginfo().

The idea behind gc-debuginfo was not to slow down the linking process for
everybody, but to allow generating optimized debug info for those who need it.
That is the same idea as LTO: LTO slows down the usual compilation significantly,
but it produces highly optimized code.

Thank you, Alexey.
>>> Anyway, I think the dsymutil approach is still valuable, and it
>>> would be useful to optimize it.
>>> Do you think it would be useful to make dsymutil/DWARFLinker truly
multi-thread?
>>> (To make dsymutil/DWARFLinker able to process each object file in a
separate thread)
>
>>Perhaps - that I'd probably leave up to the folks who are more
>>invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get
it
>>integrated into llvm-dwp and then I'll be interested in getting as
>>much performance out of it as lld - so multithreading and things would
>>be on the books.
>
>I think improving dsymutil is a valuable thing.
>Though there are several directions which might be considered
>to make it more robust:
>
>1. support of latest DWARF - DWARF5/DWARF64...
>2. implement multi-threaded execution.
>3. support of split DWARF.
>4. implement dsymutil for non-darwin platform.
>
>All of this is a massive piece of work.
>Our original investment was to solve two problems:
>
>1. Overlapped address ranges, which is currently close to being solved.
Thank you for helping with that!
>
>2. Size of debug info. That is still an issue, but we are unsure
whether we are ready to
>   invest in solving all the above 1-4 problems and how much the community
is interested in it.
>
>Thank you, Alexey.
>
>>> >One way to do that would be to have a CU-local type indirection
table.
>>> >DIEs reference local type numbers (like local address/string
numbers -
>>> >addrx/strx/rnglistx) and that table contains either sig8 (no
linker
>>> >fixups required) or the local type offsets you describe - the
linker
>>> >would then only need to read this type number indirection table
and
>>> >rewrite them to the final type numbers.
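
A rough sketch of the indirection-table idea, assuming a hypothetical encoding where each table entry is either ("sig8", hash) or ("offset", local_offset). The function name and the tuple layout are made up for illustration; no such table exists in DWARF today.

```python
def rewrite_type_indirection_table(table, offset_remap):
    """Rewrite one CU-local type indirection table.

    table        -- list of ("sig8", value) or ("offset", value) entries;
                    DIEs refer to types by their index in this list.
    offset_remap -- maps an object-local type offset to the final offset
                    chosen by the linker (hypothetical input).
    """
    rewritten = []
    for kind, value in table:
        if kind == "sig8":
            # Signature entries need no fixup.
            rewritten.append((kind, value))
        elif kind == "offset":
            # Only these entries are touched; the DIEs themselves are not parsed.
            rewritten.append((kind, offset_remap[value]))
        else:
            raise ValueError(f"unknown entry kind: {kind!r}")
    return rewritten
```

Since the DIEs only carry indices into this table, the indices stay valid after the rewrite, which is what would let the linker avoid reading the rest of .debug_info.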
>>>
>>> Yes, that could be additionally done if this process would be
time-consuming.
>>>
>>> David, thank you for all your comments and explanations. They are
extremely helpful.
>
>>Sure thing - really appreciate your patience with all this - it's...
a
>>lot of moving parts.
>
>>- Dave
>
>>
>> Thank you, Alexey.
>>
>> > sig8 hash-id would be used to compare types and to deduplicate
them.
>> > It would speed up the current dsymutil context analysis.
>> > Types having the same hash-id could be deduplicated.
>> > This would allow deduplicating a more number of types than current
dsymutil.
>> > Incomplete type definitions having a similar set of members are
not deduplicated by dsymutil currently.
>> > In this case they would have the same hash-id.
>> >
>> > This "type table" would take less space than current
"type units" and current ODR solution.
>> >
>> > Above is just an idea on how to help DWARF-aware linker(based on
idea removing obsolete debug info)
>> > to work faster(if that is interesting).
>> >
>> > Alexey.
>> >
>> > > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On
Behalf Of James Henderson via llvm-dev
>> > > Sent: Wednesday, June 3, 2020 3:48 AM
>> > > To: David Blaikie <dblaikie at gmail.com>
>> > > Cc: llvm-dev at lists.llvm.org
>> > > Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove
obsolete debug info in lld.
>> > >
>> > >
>> > >
>> > > It makes me sad that the linker (via a library or otherwise)
has to be "DWARF-aware" to be able to effectively handle
--gc-sections, COMDATs, --icf etc for debug info, without leaving large blocks
of data kicking around.
>> > >
>> > >
>> > >
>> > > The patching to -1 (or equivalent) is probably a good
lightweight solution (though I'd love it if it could be done based on
section type in the future rather than section name, but that's probably
outside the realm of DWARF), as it requires only minimal understanding in the
linker, but anything beyond that seems to be complicated logic that is mostly
due to the structure of DWARF. Patching to -1 does feel a bit like a sticking
plaster/band aid to patch over the issue rather than properly solving it too -
there will still be debug data (potentially significant amounts in COMDAT-heavy
objects) that the linker has to write and the debugger has to somehow know how
to skip (even if it knows that -1 is special-case due to the standard being
updated, it needs to get as far as the -1), which is all wasted effort.
>> > >
>> > >
>> > >
>> > > We've already seen from Alexey's prototyping, and
from our own experiences with the Sony proprietary linker (which tried to
rewrite .debug_line only) that deconstructing the DWARF so that it can be more
optimally reassembled at link time is slow going, and will probably inevitably
be however much effort is put into optimising it. For a start, given the current
standards, it's impossible to know how to deconstruct it without having to
parse vast amounts of DWARF, which is typically going to mean a lot more parsing
work than the linker would normally have to deal with. Additionally, much of
this parsing work is wasted effort, since it seems unlikely in many links that
large amounts of the DWARF will be redundant. Having an option to opt-in
doesn't help much there, since it just means the logic exists without most
people using it, due to it not being good enough, or potentially they don't
even know it exists.
>> > >
>> > >
>> > >
>> > > I don't have particularly concrete suggestions as to how
to solve the structural problems with DWARF at this point. The only thing that
seems obvious to me is a more "blessed" approach to fragmentation of
sections, similar to what I tried with my prototype mentioned earlier in the
thread, although we'd need to figure out the previously stated performance
issues. Other ideas might tie into this, like somehow sharing the various table
headers a bit like CIEs in .eh_frame that could be merged by the linker - each
object could have separate table header sections, which are referenced by the
individual .debug_* blocks, which in turn are one per function/data piece and
easily discardable/merged by the linker.
>> > >
>> > >
>> > >
>> > > Just some thoughts.
>> > >
>> > >
>> > >
>> > > James
>> > >
>> > >
>> > >
>> > > On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>> > >
>> > > On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
>> > > <alapshin at accesssoftek.com> wrote:
>> > > >
>> > > > Hi David, please find my comments inside:
>> > > >
>> > > >
>> > > > >>>Broad question: Do you have any specific
motivation/users/etc in implementing this (if you can speak about it)?
>> > > >
>> > > > >>> - it might help motivate the work,
understand what tradeoffs might be suitable for you/your users, etc.
>> > > >
>> > > > >>There are two general requirements:
>> > > > >> 1) Remove (or clean) invalid debug info.
>> > > >
>> > > > >
>> > > > >Perhaps a simpler direct solution for your immediate
needs might be a much narrower,
>> > > > >and more efficient linker-DWARF-awareness feature:
>> > > > >
>> > > > > With DWARFv5, rnglists present an opportunity for a
DWARF linker to rewrite the ranges
>> > > > > without parsing the rest of the DWARF.
/technically/ this isn't guaranteed - rnglist entries
>> > > > > can be referenced either directly, or by index. If
all rnglists are referenced by index, then
>> > > > > a linker could parse only the debug_rnglists
section and rewrite ranges to remove any
>> > > > > address ranges that refer to optimized-out code.
>> > > > >
>> > > > > This would only be correct for rnglists that had no
direct references to them (that only were
>> > > > > referenced via the indexes) - but we could either
implement it with that assumption, or could
>> > > > > add an LLVM extension attribute on the CU that
would say "I promise I only referenced rnglists
>> > > > > via rnglistx forms/indexes"). If this DWARF-aware
linking would have to read the CU DIE (not
>> > > > > all the other DIEs) it /could/ also then rewrite
high/low_pc if the CU wasn't using ranges...
>> > > > > but that wouldn't come up in the
function-removal case, because then you'd have ranges anyway,
>> > > > > so no need for that.
>> > > > >
>> > > > > Such a DWARF-aware rnglist linking could also
simplify rnglists, in cases where functions
>> > > > > ended up being laid out next to each other, the
linker could coalesce their ranges together.
>> > > > >
>> > > > > I imagine this could be implemented with very
little overhead to linking, especially compared
>> > > > > to the overhead of full DWARF-aware linking.
>> > > > >
>> > > > >Though none of this fixes Split DWARF, where the
linker doesn't get a chance to see the
>> > > > > addresses being used - but if you only want/need
the CU-level ranges to be correct, this
>> > > > > might be a viable fix, and quite efficient.
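
The rnglist rewriting described above (drop ranges for discarded code, coalesce ranges that end up adjacent) could look roughly like this. It is a sketch only: ranges are modelled as half-open (low, high) pairs of final addresses, and is_live is a hypothetical callback standing in for the linker's section liveness information.

```python
def rewrite_ranges(ranges, is_live):
    """Drop dead or empty ranges and coalesce touching or overlapping ones.

    ranges  -- iterable of (low, high) half-open address pairs
    is_live -- hypothetical predicate telling whether the code behind a
               range survived garbage collection
    """
    live = sorted(r for r in ranges if is_live(r) and r[0] < r[1])
    result = []
    for low, high in live:
        if result and low <= result[-1][1]:
            # Laid out next to (or inside) the previous range: merge them.
            prev_low, prev_high = result[-1]
            result[-1] = (prev_low, max(prev_high, high))
        else:
            result.append((low, high))
    return result
```

Note that this works on the range list alone; as said above, it does nothing for addresses stored elsewhere, such as subprogram low_pc/high_pc values or anything in .dwo files.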
>> > > >
>> > > > Yes, we think about that alternative. This would resolve
our problem of invalid debug info
>> > > > and would work much faster. Thus, if we would not have
good results for D74169 then we
>> > > > will implement it. Do you think it could be useful to
have this solution in upstream?
>> > >
>> > > A pure rnglist rewriting - I think it'd be OK to have in
upstream -
>> > > again, cost/benefit/etc would have to be weighed. I'm not
sure it
>> > > would save enough space to be particularly valuable beyond
the
>> > > correctness issue - and it doesn't completely solve the
correctness
>> > > issue for zero-address usage or low-address usage (because
you could
>> > > still have overlapping subprograms inside a CU - so if you
were
>> > > symbolizing you could use the correct rnglist to filter, but
then go
>> > > look inside the CU only to find two subprograms that had that
address
>> > > & not know which one was the correct one and which one was
the
>> > > discarded one).
>> > >
>> > > rnglist rewriting might be easy enough to prototype - but
depends what
>> > > you want to spend your time on, I know this whole issue has
been a
>> > > huge investment of your time already - but maybe this recent
>> > > revitalization of the conversation around having an explicit
value in
>> > > the linker might be sufficient to address everyone's
needs... *fingers
>> > > crossed*)
>> > >
>> > >
>> > > > >> 2) Optimize the DWARF size.
>> > > >
>> > > >
>> > > > > Do your users care much about this? I imagine if
they had significant DWARF size issues,
>> > > > > they'd have significant link time issues and
the kind of cost to link time this feature has would
>> > > > > be prohibitive - but perhaps they're sharing
linked binaries much more often than they're
>> > > > > actually performing linking.
>> > > >
>> > > > Yes, they do. They also have significant link-time
issues.
>> > > > So current performance results of D74169 are not very
acceptable.
>> > > > We hope to improve it.
>> > > >
>> > > >
>> > > >
>> > > > >>The specifics which our users have:
>> > > > >>  - embedded platform which uses 0 as start of
.text section.
>> > > > >>  - custom toolset which does not support all
features yet(f.e. split dwarf).
>> > > > >>  - tolerant of the link-time increase.
>> > > > >>  - need a useful way to share debug builds.
>> > > >
>> > > >
>> > > > > Sharing two files (executable and dwp) is
significantly less useful than sharing one file?
>> > > >
>> > > > Probably not significantly, but yes, it looks less
useful comparing to D74169.
>> > > > Having only two files (executable and .dwp) looks
significantly better than having executable and multiple .dwo files.
>> > > > Having only one file(executable) with minimal size looks
better than the two files with a bigger size.
>> > > >
>> > > > clang compiled with -gsplit-dwarf takes 0.9G for the
executable and 0.9G for the .dwp.
>> > > > clang built with --gc-debuginfo takes only 0.76G for a
single executable.
>> > > >
>> > > >
>> > > >
>> > > > >>For the first point: we have a problem
"Overlapping address ranges starting from 0"(D59553).
>> > > >
>> > > > >>We use custom solution, but the general solution
like D74169 would be better here.
>> > > >
>> > > >
>> > > > > If CU ranges are the only ones that need fixing,
then I think the above solution might be as
>> > > > > good/better - if more than CU ranges need fixing,
then I think we might want to start talking about
>> > > > > how to fix DWARF itself (split and non-split) to
signal certain addresses point to dead code with a
>> > > > > specific blessed value that linkers would need to
implement - because with Split DWARF there's
>> > > > > no way to solve the non-CU addresses at the linker.
>> > > >
>> > > > I think the worthful solution for that signal value
would be LowPC > HighPC.
>> > > > That does not require additional bits in DWARF.
>> > > > It would be natural to skip such address ranges since
they explicitly marked as invalid.
>> > > > It could be implemented in a linker very easily.
Probably, it would make sense to describe that
>> > > > usage in DWARF standard.
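
How a consumer might honour the LowPC > HighPC convention can be sketched as below. This assumes both values are absolute addresses (DWARF also allows high_pc as an offset from low_pc, which this sketch does not cover), and the function names are hypothetical.

```python
def is_marked_dead(low_pc, high_pc):
    """A range with low_pc > high_pc is treated as explicitly invalid."""
    return low_pc > high_pc


def find_subprogram(address, subprograms):
    """Return the name of the live subprogram covering address, if any.

    subprograms -- iterable of (name, low_pc, high_pc), half-open ranges
    """
    for name, low_pc, high_pc in subprograms:
        if is_marked_dead(low_pc, high_pc):
            # Discarded code: never a match, even near address 0.
            continue
        if low_pc <= address < high_pc:
            return name
    return None
```

A plain low_pc <= address < high_pc test already rejects such ranges, so the explicit check mostly documents the intent and lets a tool report or drop the dead entries.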
>> > > >
>> > > > As to the addresses which are not seen by the
linker(since they are in .dwo files) - yes,
>> > > > they need to have another solution. Could you show an
example of such a case, please?
>> > > >
>> > > >
>> > > >
>> > > > >>>2. Support of type units.
>> > > >
>> > > > >>>
>> > > >
>> > > > >>>>  That could be implemented further.
>> > > >
>> > > > >>>Enabling type units increases object size to
make it easier to deduplicate at link time by a DWARF-unaware
>> > > >
>> > > > >>>linker. With a DWARF aware linker it'd
be generally desirable not to have to add that object size overhead to
>> > > >
>> > > > >>>get the linking improvements.
>> > > >
>> > > > >>
>> > > >
>> > > > >>But, DWARFLinker should adequately work with
type units since they are already implemented.
>> > > >
>> > > >
>> > > > > Maybe - it'd be nice & all, but I don't
think it's an outright necessity - if someone knows they're using
>> > > > > a DWARF-aware linker, they'd probably not use
type units in their object files. It's possible someone
>> > > > > doesn't know for sure & maybe they have
pre-canned debug object files from someone else, etc.
>> > > >
>> > > > I see.
>> > > >
>> > > > >>Another thing is that the idea behind type units
has the potential to help Dwarf-aware linker to work faster.
>> > > >
>> > > > >>Currently, DWARFLinker analyzes context to
understand whether types are the same or not.
>> > > >
>> > > >
>> > > > >When you say "analyzes context" what do
you mean? Usually I'd take that to mean
>> > > > > "looks at things outside the type itself -
like what namespace it's in, etc" - which, yes,
>> > > > > it should do that, but it doesn't seem very
expensive to do. But I guess you actually
>> > > > > mean something about doing structural equivalence
in some way, looking at things inside the type?
>> > > >
>> > > > I think it could be useful for both cases. Currently,
dsymutil does only first thing
>> > > > (look at type name, namespace name, etc..) and does not
do the second thing
>> > > > (doing structural equivalence). Analyzing type names is
currently quite expensive
>> > > > (the search in the string pool alone takes ~10 sec of the 70
sec overall time).
>> > > > That is expensive because of many things should be done
to work with strings:
>> > > > parse DWARF, search and resolve relocations, compute a
hash for strings,
>> > > > put data into a string pool, create a fully qualified
name(like namespace::function::name).
>> > > > It looks like it could be optimized and finally require
less time, but it still would be a noticeable
>> > > > part of the overall time.
>> > > >
>> > > > If dsymutil started to check for structural
equivalence, then the process would be even slower.
>> > > > So, If instead of comparing types structure, there would
be checked single hash-id - then this process
>> > > > would also be faster.
>> > > >
>> > > > Thus I think using hash-id to compare types would allow
to make current implementation faster and would
>> > > > allow handling incomplete types by DWARFLinker without
massive performance degradation also.
>> > > >
>> > > > >> But the context is known when types are
generated. So, no need to spent the time analyzing it.
>> > > >
>> > > > >> If types could be compared without analyzing
context, then Dwarf-aware linker would work faster.
>> > > >
>> > > > >> That is just an idea(not for immediate
implementation): If types would be stored in some "type table"
>> > > >
>> > > > >> (instead of COMDAT section group) and could be
accessed through hash-id(like type units
>> > > >
>> > > > >> - then it would be the solution requiring fewer
bits to store but allowing to compare types
>> > > >
>> > > > >> by hash-id(not analysing context).
>> > > > >> In this case, size increasing would be small.
And processing time could be done faster.
>> > > > >>
>> > > > >> this is just an idea and could be discussed
separately from the problem of integrating of D74169.
>> > > >
>> > > > >> >> 6. -flto=thin
>> > > >
>> > > > >> >>    That problem was described in this
review https://reviews.llvm.org/D54747#1503720. It also exists in
>> > > >
>> > > > >> >> current DWARFLinker/dsymutil
implementation. I think that problem should be discussed more: it could
>> > > >
>> > > > >> >> probably be fixed by avoiding
generation of such incomplete declaration during thinlto,
>> > > >
>> > > > >> >> That would be costly to produce
extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
>> > > >
>> > > > >> >> more to reduce that redundancy early
on (actually removing definitions from some llvm Modules if the type
>> > > >
>> > > > >> >> definition is known to exist in
another Module, etc)
>> > > > >> >I don't know if it's a problem
since that patch was reverted.
>> > > >
>> > > > >>
>> > > >
>> > > > >> Yes. That patch was reverted, but this
patch(D74169) has the same problem.
>> > > >
>> > > > >> if D74169 would be applied and --gc-debuginfo
used then structure type
>> > > > >> definition would be removed.
>> > > >
>> > > > >> DWARFLinker could handle that case -
"removing definitions from some llvm Modules if the type
>> > > > >> definition is known to exist in another
Module".
>> > > > >> i.e. DWARFLinker could replace the declaration
with the definition.
>> > > >
>> > > > >> But that problem could be more easily resolved
when debug info is generated(probably without
>> > > > >> significant increase of debug info size):
>> > > >
>> > > > >> Here we have:
>> > > >
>> > > > >> DW_TAG_compile_unit(0x0000000b) - compile unit
containing concrete instance for function "f".
>> > > > >> DW_TAG_compile_unit(0x00000073) - compile unit
containing abstract instance root for function "f".
>> > > > >> DW_TAG_compile_unit(0x000000c1) - compile unit
containing function "f" definition.
>> > > >
>> > > > >> Code for function "f" was deleted.
gc-debuginfo deletes compile unit DW_TAG_compile_unit(0x000000c1)
>> > > > >> containing "f" definition (since
there is no corresponding code). But it has structure "Foo" definition
>> > > > >> DW_TAG_structure_type(0x0000011e) referenced
from DW_TAG_compile_unit(0x00000073)
>> > > > >> by declaration
DW_TAG_structure_type(0x000000ae). That declaration is exactly the case when
definition
>> > > > >> was removed by thinlto and replaced with
declaration.
>> > > >
>> > > > >> Would it cost too much if type definition would
not be replaced with declaration for "abstract instance root"?
>> > > > >> The number of concrete instances is bigger than
number of abstract instance roots.
>> > > > >> Probably, it would not be too costly to leave
definition in abstract instance root?
>> > > >
>> > > >
>> > > >
>> > > > >> Alternatively, Would it cost too much if type
definition would not be replaced with declaration when
>> > > > >> declaration references type from not used
function? (lto could understand that concrete function is not used).
>> > > >
>> > > >
>> > > > >I don't follow this example - could you provide
a small concrete test case I could reproduce?
>> > > >
>> > > > I would provide a test case if necessary. But it looks
like this issue is finally clear, and you already commented on that.
>> > > >
>> > > > > Oh, I guess this is happening perhaps because
ThinLTO can't know for sure that a standalone
>> > > > > definition of 'f' won't be needed - so
it produces one in case one of the inlining opportunities
>> > > > > doesn't end up inlining. Then it turns out all
calls got inlined, so the external definition wasn't needed.
>> > > >
>> > > > > Oh, you're suggesting that these 3 CUs got
emitted into one object file during LTO, but that DWARFLinker
>> > > > > drops a CU without any code in it - even though...
So far as I know, in LTO, LLVM directly references
>> > > > > types across units if the CUs are all emitted in
the same object file. (and if they weren't in the same
>> > > > > object file - then the abstract_origin couldn't
be pointing cross-CU).
>> > > >
>> > > > > I guess some basic things to say:
>> > > >
>> > > > > With ThinLTO, the concrete/standalone function
definition is emitted in case some call sites don't end up
>> > > > > being inlined. So we know it'll be emitted (but
might not be needed by the actual linker)
>> > > > > Any number of inline calls might exist - but we
shouldn't put the type information into those, because
>> > > > > they aren't guaranteed to emit it (if the
inline function gets optimized away, there would be nothing to
>> > > > > enforce the type being emitted) - and even if we
forced the type information to be emitted into one
>> > > > > object file that has an inline copy of the function
- there's no guarantee that object file will get linked in either.
>> > > >
>> > > > > So, no, I don't think there's much we can
do to keep the size of object files down, while guaranteeing
>> > > > > the type information will be emitted with the usual
linker semantics.
>> > > >
>> > > > Then dsymutil/DWARFLinker could be changed to handle
that(though it would probably be not very efficient).
>> > > > If thinlto would understand that function is not used
finally(and then must not contain referenced type definition),
>> > > > then this situation could be handled more effectively.
>> > > >
>> > > > Thank you, Alexey.
>> > > >
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>> _______________________________________________
>> > > >>> LLVM Developers mailing list
>> > > >>> llvm-dev at lists.llvm.org
>> > > >>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Alexey Lapshin via llvm-dev

2020-Jun-25 21:23 UTC

[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

>On Mon, Jun 22, 2020 at 7:21 AM Alexey Lapshin
><alapshin at accesssoftek.com> wrote:
>>
>> >>
>> >> >> This idea goes in another direction than fragmenting
dwarf
>> >> >> using elf sections&tricks. It seems to me that
the cost of fragmenting is too high.
>> >>
>> >> >I tend to agree - but I'm sort of leaning towards
trying to use object
>> >> >features as much as possible, then implementing just
enough custom
>> >> >handling in the linker to recoup overhead, etc. (eg: add
some kind of
>> >> >small header/brief description that makes it easy for the
linker to
>> >> >slice-and-dice - but hopefully a domain-specific such
header can be a
>> >> >bit more compact than the fully general ELF form)
>> >>
>> >> I think this indeed should be implemented and evaluated.
>> >> So that various approaches could be compared.
>> >>
>> >> >> It is not only the sizes of structures describing
fragments but also the complexity
>> >> >> of tools that should be taught to work with
fragmented DWARF.
>> >> >> (f.e. llvm-dwarfdump applied to object file should be
able to read fragmented DWARF,
>> >> >> but applied to linked executable it should work with
non-fragmented DWARF).
>> >> >> That idea is for the tool which works the same way as
dsymutil ODR.
>> >> >>
>> >> >> I will briefly describe the idea of making DWARF
easier to process with dsymutil/DWARFLinker:
>> >> >>
>> >> >> The idea is to have only one "type table"
per object file(special section .debug_types_table).
>> >> >> This "type table" would contain all types.
>> >> >> There could be a special type of reference -
type_offset - that offset points into the type table.
>> >> >> Basic types could always be placed into the start of
"type table" thus, offsets to basic types
>> >> >> most often would be 1 byte. There also would be a
special kind of reference - reference inside the type.
>> >> >> Type units sig8 system - would not be used to
reference types.
>> >> >>
>> >> >> Types deduplication is assumed to be done, not by
linker mechanism for COMDAT,
>> >> >> but by a tool like dsymutil. This tool would create
resulting .debug_types_table by putting there
>> >> >> types from source .debug_types_table-s. Only one copy
of the type would be placed into the
>> >> >> resulting table. All references pointing to the
deleted copy would be corrected to point
>> >> >> to the single copy inside "type table".
(that is how dsymutil works currently)
>> >>
>> >> >^ that's the step that's probably a bit expensive
for a general-use
>> >> >tool - it implies parsing all the DWARF to find those
references and
>> >> >rewrite them, I think. For a high-performance solution
that could be
>> >> >run by the linker I think it'd be necessary to have a
solution that
>> >> >doesn't involve parsing all the DIEs.
>> >>
>> >> According to the current dsymutil processing,
>> >> exactly this process is not the most time-consuming.
>> >> That could be done relatively fast.
>>
>> >Fair enough - though I'd still imagine any solution that
involves
>> >parsing all the DIEs still wouldn't be fast enough (maybe an
order of
>> >magnitude faster than the current solution even - but that's
still,
>> >what, 6 or 7x slower than linking without the feature?) for most
users
>> >to consider it a good trade-off.
>>
>> It seems to me that even the current 6x-7x slowdown could be useful.
>> Users who already use dsymutil or llvm-dwp(assuming DWARFLinker
>> would be taught to work with a split dwarf) tools spend this time and,
>> in some scenarios, waste disk space on intermediate files.
>
>FWIW, dwp (llvm-dwp hasn't really been optimized compared to binutils
>dwp) is designed to be very quick - by not needing to do a lot of
>parsing/fixups. Which, yes, means larger output files than would be
>possible with more parsing/etc. It also doesn't take any input from
>the linker (so it can run in parallel with the linker) - so it can't
>remove dead subprograms. Given Google's the major (perhaps only
>significant?) user of Split DWARF - I can say that the needs don't
>necessarily overlap well with something that would take significantly
>longer to run or use significantly more memory. Faster/cheaper/with
>somewhat bigger output files is probably the right tradeoff for
>Google's use case, at least.
>
>I imagine Apple's use for dsymutil is somewhat similar - it's not
used
>in the iterative development cycle, only in final releases - well,
>maybe their situation is more "neutral" - not a major pain point
in
>any case I'd guess.
>
I see. FWIW, here is a comparison of split DWARF + llvm-dwp against DWARFLinker from lld:

1. split-dwarf + llvm-dwp: linking time for clang 6 sec,
    generating time for .dwp 53 sec, clang=997M, clang.dwp=1.1G.
2. DWARFLinker from lld: linking time for clang 72 sec, clang=760M.
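
A minimal sketch of the arithmetic behind these two data points, using only the figures quoted above. Two things are assumptions, not measurements: "1.1G" is read as 1.1 * 1024 MB, and the link and llvm-dwp steps are summed as if they ran one after the other (dwp can run in parallel with the linker, so the sum is an upper bound). The `Flow` class is a hypothetical helper for illustration.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    link_sec: int       # time spent in the linker
    post_link_sec: int  # time spent in a separate post-link step (llvm-dwp)
    size_mb: float      # total size of the files that have to be kept

    @property
    def total_sec(self) -> int:
        # Sequential upper bound: link step plus post-link step.
        return self.link_sec + self.post_link_sec


GB = 1024  # assumption: "1.1G" means 1.1 * 1024 MB

split_dwarf = Flow("split-dwarf + llvm-dwp", 6, 53, 997 + 1.1 * GB)
dwarf_linker = Flow("DWARFLinker from lld", 72, 0, 760)

for flow in (split_dwarf, dwarf_linker):
    print(f"{flow.name}: {flow.total_sec} sec, {flow.size_mb:.0f} MB")
```

Read this way, the two flows take a similar amount of wall-clock time when run sequentially, while the single DWARFLinker output is considerably smaller than the executable and .dwp combined.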

>> Thus if they would use this LLD feature in its current state
>> - they would still receive benefits.
>>
>> Speaking of performance results - LLD is a multi-thread linker;
>> it handles sections in parallel. DWARFLinker generates DWARF using
>> AsmPrinter, which is a stream - so it can only produce the resulting DWARF
>> sequentially. It is not surprising that the parallel solution works
faster.
>> Making DWARFLinker truly multi-threaded would probably allow us
>> to bring the slowdown into the 2x-4x range.
>
>*nod* that's still a really expensive link - but I understand that's
a
>suitable tradeoff for your users
>
Btw, 2x or 7x is for pure linking time. The overall build slowdown
is not so significant: building the LLVM codebase is only 20% slower.
>> >> Anyway, I think the dsymutil approach is still valuable, and
it
>> >> would be useful to optimize it.
>> >> Do you think it would be useful to make dsymutil/DWARFLinker
truly multi-thread?
>> >> (To make dsymutil/DWARFLinker able to process each object file
in a separate thread)
>>
>> >Perhaps - that I'd probably leave up to the folks who are more
>> >invested in dsymutil (Adrian Prantl et al). Maybe one day we'll
get it
>> >integrated into llvm-dwp and then I'll be interested in getting
as
>> >much performance out of it as lld - so multithreading and things
would
>> >be on the books.
>>
>> I think improving dsymutil is a valuable thing.
>> Though there are several directions which might be considered
>> to make it more robust:
>>
>> 1. support of latest DWARF - DWARF5/DWARF64...
>
>I expect/thought some of the Apple folks had already worked on DWARF5
support?
>DWARF64 - that's been around for a while, and just hasn't been
needed
>by LLVM users thus far, it seems (until recently - where some
>developers have started working on that)
The debug_names table is already implemented, but debug_rnglists,
debug_loclists, and type units are not implemented yet. What
should probably change is that dsymutil should not have its own version
of the code generating DWARF tables. It should call the already existing
DWARF5/DWARF64 implementations. Then dsymutil would always
use the latest DWARF generators.
>
>> 2. implement multi-threaded execution.
>> 3. support of split DWARF.
>
>Maybe, though I'm still not sure it'd be the right tradeoff -
>especially if it involved having to wait to run the .dwo merger (call
>it DWARF-aware dwp, or dsymutil with dwp support) until after the
>linker ran.
>
>> 4. implement dsymutil for non-darwin platform.
>
>That's probably, essentially (3), more-or-less. Split DWARF is
>somewhat of a formalization of Apple's/MachO DWARF distribution model
>(leave DWARF in files that aren't linked/use them from a debugger,
>but also be able to merge them into some final file (dsym or dwp) for
>archival purposes)
>
>> All of this is a massive piece of work.
>> Our original investment was to solve two problems:
>>
>> 1. Overlapped address ranges, which is currently close to being solved.
Thank you for helping with that!
>
>Yeah, again, sorry that's taken quite so long/somewhat circuitous route.
>
>> 2. Size of debug info. That still remains an issue, but we are unsure
whether we are ready to
>>    invest in solving all the above 1-4 problems and how much the community
is interested in it.
>
>Fair, for sure - I don't think you'd need to sign up to solve all of
>them (don't think they necessarily need solving). Potentially moving
>the logic out into a separate tool as Fangrui's considering - a
>post-link DWARF optimizer, rather than in-linker DWARF optimization.
>
>I really don't want to give you the runaround like this - but multiple
>times slower links is something that seems pretty problematic for most
>users, to the point of weighing the maintainability of lld against the
>convenience of having this functionality in-linker rather than in a
>post-link optimizer.
>
>(I know you've spoken a bit before about your users needs - but if
>it's possible, could you explain (again :/) why they have such a
>strong need for smaller DWARF? While DWARF size is an ongoing concern
>for many users (Google certainly - hence the invention of Split DWARF,
>use of type units and compressed DWARF, etc) - usually it's in rather
>large programs, but it sounds like you're dealing with relatively
>small ones (otherwise the increase in link time, I'd imagine, would be
>prohibitive for your users?)? 
We have many large programs and keep Daily/Nightly debug builds,
which take a lot of disk space. Compilation time for these programs is long.
The scenario is "compile once" (not compile-debug-compile-debug).
So we think that a solution like dsymutil/DWARFLinker would not significantly
slow down the overall build (see the numbers above for the
LLVM codebase) and would allow us to reduce the disk space required to keep
all of these builds.
>You mentioned that the usability cost of
>Split DWARF for your users was too high (or high enough to justify
>this alternative work of DWARF-aware linking)? That all seems a bit
>surprising to me - though I understand the deployment issues of Split
>DWARF do present some challenges to users in more heterogenous
>environments than Google's... still, I'd have thought there was some
>hope there)
Our tools do not support split DWARF yet, though we plan to implement it.
Once we have split DWARF support, it would be
convenient to have an easy way to share built debug binaries. llvm-dwp is the
answer to this. DWARFLinker could probably be another answer.

Thank you, Alexey.
> >> >One way to do that would be to have a CU-local type
indirection table.
> >> >DIEs reference local type numbers (like local address/string
numbers -
> >> >addrx/strx/rnglistx) and that table contains either sig8 (no
linker
> >> >fixups required) or the local type offsets you describe - the
linker
> >> >would then only need to read this type number indirection
table and
> >> >rewrite them to the final type numbers.
> >>
> >> Yes, that could be additionally done if this process would be
time-consuming.
> >>
> >> David, thank you for all your comments and explanations. They are
extremely helpful.
>
> >Sure thing - really appreciate your patience with all this -
it's... a
> >lot of moving parts.
>
> >- Dave
>
> >
> > Thank you, Alexey.
> >
> > > sig8 hash-id would be used to compare types and to deduplicate
them.
> > > It would speed up the current dsymutil context analysis.
> > > Types having the same hash-id could be deduplicated.
> > > This would allow deduplicating more types than the
current dsymutil does.
> > > Incomplete type definitions having a similar set of members are
not deduplicated by dsymutil currently.
> > > In this case they would have the same hash-id.
> > >
> > > This "type table" would take less space than current
"type units" and current ODR solution.
> > >
> > > Above is just an idea on how to help DWARF-aware linker(based on
idea removing obsolete debug info)
> > > to work faster(if that is interesting).
> > >
> > > Alexey.
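
The "type table" idea discussed above (one table per object file, one copy per hash-id in the output, references rewritten through a small indirection table) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `(hash_id, payload)` entry layout, the function name `merge_type_tables`, and the returned per-object remap lists are hypothetical and are not part of DWARF, dsymutil, or D74169.

```python
from typing import Dict, List, Tuple

# Hypothetical entry layout: (sig8-like hash-id, opaque encoded type description).
TypeEntry = Tuple[int, bytes]


def merge_type_tables(
    tables: List[List[TypeEntry]],
) -> Tuple[List[TypeEntry], List[List[int]]]:
    """Merge per-object type tables, keeping one copy of each hash-id.

    Returns the merged table plus, for every input table, a list that maps
    a local type index to its index in the merged table. Only the hash-id
    is compared; the payload is never parsed.
    """
    merged: List[TypeEntry] = []
    index_of: Dict[int, int] = {}
    remaps: List[List[int]] = []
    for table in tables:
        remap: List[int] = []
        for hash_id, payload in table:
            idx = index_of.get(hash_id)
            if idx is None:
                # First time this hash-id is seen: keep this copy.
                idx = len(merged)
                index_of[hash_id] = idx
                merged.append((hash_id, payload))
            remap.append(idx)
        remaps.append(remap)
    return merged, remaps


# Two object files that both describe type 0x22.
obj_a = [(0x11, b"int"), (0x22, b"struct Foo")]
obj_b = [(0x22, b"struct Foo"), (0x33, b"struct Bar")]
merged_table, remap_tables = merge_type_tables([obj_a, obj_b])
```

With a CU-local indirection table, a linker would only need to apply `remap_tables` to that table instead of visiting every DIE that references a type.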
> > >
> > > > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On
Behalf Of James Henderson via llvm-dev
> > > > Sent: Wednesday, June 3, 2020 3:48 AM
> > > > To: David Blaikie <dblaikie at gmail.com>
> > > > Cc: llvm-dev at lists.llvm.org
> > > > Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove
obsolete debug info in lld.
> > > >
> > > >
> > > >
> > > > It makes me sad that the linker (via a library or otherwise)
has to be "DWARF-aware" to be able to effectively handle
--gc-sections, COMDATs, --icf etc for debug info, without leaving large blocks
of data kicking around.
> > > >
> > > >
> > > >
> > > > The patching to -1 (or equivalent) is probably a good
lightweight solution (though I'd love it if it could be done based on
section type in the future rather than section name, but that's probably
outside the realm of DWARF), as it requires only minimal understanding in the
linker, but anything beyond that seems to be complicated logic that is mostly
due to the structure of DWARF. Patching to -1 does feel a bit like a sticking
plaster/band aid to patch over the issue rather than properly solving it too -
there will still be debug data (potentially significant amounts in COMDAT-heavy
objects) that the linker has to write and the debugger has to somehow know how
to skip (even if it knows that -1 is special-case due to the standard being
updated, it needs to get as far as the -1), which is all wasted effort.
> > > >
> > > >
> > > >
> > > > We've already seen from Alexey's prototyping, and
from our own experiences with the Sony proprietary linker (which tried to
rewrite .debug_line only) that deconstructing the DWARF so that it can be more
optimally reassembled at link time is slow going, and will probably inevitably
be however much effort is put into optimising it. For a start, given the current
standards, it's impossible to know how to deconstruct it without having to
parse vast amounts of DWARF, which is typically going to mean a lot more parsing
work than the linker would normally have to deal with. Additionally, much of
this parsing work is wasted effort, since it seems unlikely in many links that
large amounts of the DWARF will be redundant. Having an option to opt-in
doesn't help much there, since it just means the logic exists without most
people using it, due to it not being good enough, or potentially they don't
even know it exists.
> > > >
> > > >
> > > >
> > > > I don't have particularly concrete suggestions as to how
to solve the structural problems with DWARF at this point. The only thing that
seems obvious to me is a more "blessed" approach to fragmentation of
sections, similar to what I tried with my prototype mentioned earlier in the
thread, although we'd need to figure out the previously stated performance
issues. Other ideas might tie into this, like somehow sharing the various table
headers a bit like CIEs in .eh_frame that could be merged by the linker - each
object could have separate table header sections, which are referenced by the
individual .debug_* blocks, which in turn are one per function/data piece and
easily discardable/merged by the linker.
> > > >
> > > >
> > > >
> > > > Just some thoughts.
> > > >
> > > >
> > > >
> > > > James
> > > >
> > > >
> > > >
> > > > On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> > > >
> > > > On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
> > > > <alapshin at accesssoftek.com> wrote:
> > > > >
> > > > > Hi David, please find my comments inside:
> > > > >
> > > > >
> > > > > >>>Broad question: Do you have any specific
motivation/users/etc in implementing this (if you can speak about it)?
> > > > >
> > > > > >>> - it might help motivate the work,
understand what tradeoffs might be suitable for you/your users, etc.
> > > > >
> > > > > >>There are two general requirements:
> > > > > >> 1) Remove (or clean) invalid debug info.
> > > > >
> > > > > >
> > > > > >Perhaps a simpler direct solution for your
immediate needs might be a much narrower,
> > > > > >and more efficient linker-DWARF-awareness feature:
> > > > > >
> > > > > > With DWARFv5, rnglists present an opportunity for
a DWARF linker to rewrite the ranges
> > > > > > without parsing the rest of the DWARF.
/technically/ this isn't guaranteed - rnglist entries
> > > > > > can be referenced either directly, or by index. If
all rnglists are referenced by index, then
> > > > > > a linker could parse only the debug_rnglists
section and rewrite ranges to remove any
> > > > > > address ranges that refer to optimized-out code.
> > > > > >
> > > > > > This would only be correct for rnglists that had
no direct references to them (that only were
> > > > > > referenced via the indexes) - but we could either
implement it with that assumption, or could
> > > > > > add an LLVM extension attribute on the CU that
would say "I promise I only referenced rnglists
> > > > > > > via rnglistx forms/indexes"). If this DWARF-aware
linking would have to read the CU DIE (not
> > > > > > all the other DIEs) it /could/ also then rewrite
high/low_pc if the CU wasn't using ranges...
> > > > > > but that wouldn't come up in the
function-removal case, because then you'd have ranges anyway,
> > > > > > so no need for that.
> > > > > >
> > > > > > Such a DWARF-aware rnglist linking could also
simplify rnglists, in cases where functions
> > > > > > ended up being laid out next to each other, the
linker could coalesce their ranges together.
> > > > > >
> > > > > > I imagine this could be implemented with very
little overhead to linking, especially compared
> > > > > > to the overhead of full DWARF-aware linking.
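
A rough sketch of the rnglist rewriting described above: drop ranges that refer to discarded code, then coalesce ranges that ended up adjacent. It works on already decoded `(low, high)` address pairs and does not model the real `.debug_rnglists` encoding; the function name `rewrite_ranges` and the `is_live` callback (standing in for the linker's knowledge of which code survived) are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Range = Tuple[int, int]  # half-open address range [low, high)


def rewrite_ranges(
    ranges: Iterable[Range], is_live: Callable[[Range], bool]
) -> List[Range]:
    """Remove dead or empty ranges and merge touching/overlapping ones."""
    live = sorted(r for r in ranges if r[0] < r[1] and is_live(r))
    out: List[Range] = []
    for low, high in live:
        if out and low <= out[-1][1]:
            # Adjacent to (or overlapping) the previous range: coalesce.
            out[-1] = (out[-1][0], max(out[-1][1], high))
        else:
            out.append((low, high))
    return out


# The range at address 0 belongs to a function the linker discarded.
discarded = {(0x0, 0x20)}
cu_ranges = [(0x1000, 0x1040), (0x0, 0x20), (0x1040, 0x1080), (0x2000, 0x2010)]
rewritten = rewrite_ranges(cu_ranges, lambda r: r not in discarded)
```

Because only the range list is touched, this kind of pass would not need to parse the DIE tree, which is what keeps it cheap compared with full DWARF-aware linking.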
> > > > > >
> > > > > >Though none of this fixes Split DWARF, where the
linker doesn't get a chance to see the
> > > > > > addresses being used - but if you only want/need
the CU-level ranges to be correct, this
> > > > > > might be a viable fix, and quite efficient.
> > > > >
> > > > > Yes, we think about that alternative. This would
resolve our problem of invalid debug info
> > > > > and would work much faster. Thus, if we would not have
good results for D74169 then we
> > > > > will implement it. Do you think it could be useful to
have this solution in upstream?
> > > >
> > > > A pure rnglist rewriting - I think it'd be OK to have in
upstream -
> > > > again, cost/benefit/etc would have to be weighed. I'm
not sure it
> > > > would save enough space to be particularly valuable beyond
the
> > > > correctness issue - and it doesn't completely solve the
correctness
> > > > issue for zero-address usage or low-address usage (because
you could
> > > > still have overlapping subprograms inside a CU - so if you
were
> > > > symbolizing you could use the correct rnglist to filter, but
then go
> > > > look inside the CU only to find two subprograms that had
that address
> > > > & not know which one was the correct one and which one
was the
> > > > discarded one).
> > > >
> > > > rnglist rewriting might be easy enough to prototype - but
depends what
> > > > you want to spend your time on, I know this whole issue has
been a
> > > > huge investment of your time already - but maybe this recent
> > > > revitalization of the conversation around having an explicit
value in
> > > > the linker might be sufficient to address everyone's
needs... *fingers
> > > > crossed*)
> > > >
> > > >
> > > > > >> 2) Optimize the DWARF size.
> > > > >
> > > > >
> > > > > > Do your users care much about this? I imagine if
they had significant DWARF size issues,
> > > > > > they'd have significant link time issues and
the kind of cost to link time this feature has would
> > > > > > be prohibitive - but perhaps they're sharing
linked binaries much more often than they're
> > > > > > actually performing linking.
> > > > >
> > > > > Yes, they do. They also have significant link-time
issues.
> > > > > So current performance results of D74169 are not very
acceptable.
> > > > > We hope to improve it.
> > > > >
> > > > >
> > > > >
> > > > > >>The specifics which our users have:
> > > > > >>  - embedded platform which uses 0 as start of
.text section.
> > > > > >>  - custom toolset which does not support all
features yet(f.e. split dwarf).
> > > > > >>  - tolerant of the link-time increase.
> > > > > >>  - need a useful way to share debug builds.
> > > > >
> > > > >
> > > > > > Sharing two files (executable and dwp) is
significantly less useful than sharing one file?
> > > > >
> > > > > Probably not significantly, but yes, it looks less
useful compared to D74169.
> > > > > Having only two files (executable and .dwp) looks
significantly better than having executable and multiple .dwo files.
> > > > > Having only one file(executable) with minimal size
looks better than the two files with a bigger size.
> > > > >
> > > > > clang compiled with -gsplit-dwarf takes 0.9G for
executable and 0.9G for .dwp.
> > > > > clang built with --gc-debuginfo takes only 0.76G for
single executable.
> > > > >
> > > > >
> > > > >
> > > > > >>For the first point: we have a problem
"Overlapping address ranges starting from 0"(D59553).
> > > > >
> > > > > >>We use custom solution, but the general
solution like D74169 would be better here.
> > > > >
> > > > >
> > > > > > If CU ranges are the only ones that need fixing,
then I think the above solution might be as
> > > > > > good/better - if more than CU ranges need fixing,
then I think we might want to start talking about
> > > > > > how to fix DWARF itself (split and non-split) to
signal certain addresses point to dead code with a
> > > > > > specific blessed value that linkers would need to
implement - because with Split DWARF there's
> > > > > > no way to solve the non-CU addresses at the
linker.
> > > > >
> > > > > I think a worthwhile choice for that signal value
would be LowPC > HighPC.
> > > > > That does not require additional bits in DWARF.
> > > > > It would be natural to skip such address ranges since
they are explicitly marked as invalid.
> > > > > It could be implemented in a linker very easily.
Probably, it would make sense to describe that
> > > > > usage in DWARF standard.
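
To illustrate the LowPC > HighPC proposal above, here is a minimal sketch of how a consumer such as a symbolizer might skip ranges marked that way. It assumes both values have already been resolved to absolute addresses (in real DWARF, DW_AT_high_pc may be stored as an offset from DW_AT_low_pc). The names `is_marked_dead` and `find_subprogram` are hypothetical, and this convention is only a suggestion made in the thread, not something the DWARF standard defines.

```python
from typing import List, Optional, Tuple

# Hypothetical record: (subprogram name, low_pc, high_pc) as absolute addresses.
Subprogram = Tuple[str, int, int]


def is_marked_dead(low_pc: int, high_pc: int) -> bool:
    """A range with low_pc > high_pc is treated as 'points to removed code'."""
    return low_pc > high_pc


def find_subprogram(subprograms: List[Subprogram], addr: int) -> Optional[str]:
    """Return the name of the live subprogram covering addr, if any."""
    for name, low_pc, high_pc in subprograms:
        if is_marked_dead(low_pc, high_pc):
            continue  # explicitly invalid range: never matches any address
        if low_pc <= addr < high_pc:
            return name
    return None


# "dead_f" was discarded and would otherwise have resolved to address 0,
# overlapping "start" on a platform whose .text begins at 0.
subprograms = [("dead_f", 0x1, 0x0), ("start", 0x0, 0x40)]
hit = find_subprogram(subprograms, 0x10)
```

The point of the marker is that the dead entry is rejected by a single comparison, so an address near zero resolves unambiguously to the live function.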
> > > > >
> > > > > As to the addresses which are not seen by the
linker(since they are in .dwo files) - yes,
> > > > > they need to have another solution. Could you show an
example of such a case, please?
> > > > >
> > > > >
> > > > >
> > > > > >>>2. Support of type units.
> > > > >
> > > > > >>>
> > > > >
> > > > > >>>>  That could be implemented further.
> > > > >
> > > > > >>>Enabling type units increases object size
to make it easier to deduplicate at link time by a DWARF-unaware
> > > > >
> > > > > >>>linker. With a DWARF aware linker it'd
be generally desirable not to have to add that object size overhead to
> > > > >
> > > > > >>>get the linking improvements.
> > > > >
> > > > > >>
> > > > >
> > > > > >>But, DWARFLinker should adequately work with
type units since they are already implemented.
> > > > >
> > > > >
> > > > > > Maybe - it'd be nice & all, but I
don't think it's an outright necessity - if someone knows they're
using
> > > > > > a DWARF-aware linker, they'd probably not use
type units in their object files. It's possible someone
> > > > > > doesn't know for sure & maybe they have
pre-canned debug object files from someone else, etc.
> > > > >
> > > > > I see.
> > > > >
> > > > > >>Another thing is that the idea behind type
units has the potential to help Dwarf-aware linker to work faster.
> > > > >
> > > > > >>Currently, DWARFLinker analyzes context to
understand whether types are the same or not.
> > > > >
> > > > >
> > > > > >When you say "analyzes context" what do
you mean? Usually I'd take that to mean
> > > > > > "looks at things outside the type itself -
like what namespace it's in, etc" - which, yes,
> > > > > > it should do that, but it doesn't seem very
expensive to do. But I guess you actually
> > > > > > mean something about doing structural equivalence
in some way, looking at things inside the type?
> > > > >
> > > > > I think it could be useful for both cases. Currently,
dsymutil does only first thing
> > > > > (look at type name, namespace name, etc..) and does not
do the second thing
> > > > > (doing structural equivalence). Analyzing type names is
currently quite expensive
> > > > > (the string pool search alone takes ~10 sec out of 70
sec of overall time).
> > > > > That is expensive because many things have to be done
to work with strings:
> > > > > parse DWARF, search and resolve relocations, compute a
hash for strings,
> > > > > put data into a string pool, create a fully qualified
name(like namespace::function::name).
> > > > > It looks like it could be optimized and finally require
less time, but it still would be a noticeable
> > > > > part of the overall time.
> > > > >
> > > > > If dsymutil starts to check for the structural
equivalence, then the process would be even slower.
> > > > > So, If instead of comparing types structure, there
would be checked single hash-id - then this process
> > > > > would also be faster.
> > > > >
> > > > > Thus I think using hash-id to compare types would allow
to make current implementation faster and would
> > > > > allow handling incomplete types by DWARFLinker without
massive performance degradation also.
> > > > >
> > > > > >> But the context is known when types are
generated. So, no need to spend the time analyzing it.
> > > > >
> > > > > >> If types could be compared without analyzing
context, then Dwarf-aware linker would work faster.
> > > > >
> > > > > >> That is just an idea(not for immediate
implementation): If types would be stored in some "type table"
> > > > >
> > > > > >> (instead of COMDAT section group) and could be
accessed through hash-id(like type units
> > > > >
> > > > > >> - then it would be the solution requiring
fewer bits to store but allowing to compare types
> > > > >
> > > > > >> by hash-id(not analysing context).
> > > > > >> In this case, size increasing would be small.
And processing time could be done faster.
> > > > > >>
> > > > > >> this is just an idea and could be discussed
separately from the problem of integrating of D74169.
> > > > >
> > > > > >> >> 6. -flto=thin
> > > > >
> > > > > >> >>    That problem was described in this
review https://reviews.llvm.org/D54747#1503720. It also exists in
> > > > >
> > > > > >> >> current DWARFLinker/dsymutil
implementation. I think that problem should be discussed more: it could
> > > > >
> > > > > >> >> probably be fixed by avoiding
generation of such incomplete declaration during thinlto,
> > > > >
> > > > > >> >> That would be costly to produce
extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
> > > > >
> > > > > >> >> more to reduce that redundancy early
on (actually removing definitions from some llvm Modules if the type
> > > > >
> > > > > >> >> definition is known to exist in
another Module, etc)
> > > > > >> >I don't know if it's a problem
since that patch was reverted.
> > > > >
> > > > > >>
> > > > >
> > > > > >> Yes. That patch was reverted, but this
patch(D74169) has the same problem.
> > > > >
> > > > > >> if D74169 would be applied and --gc-debuginfo
used then structure type
> > > > > >> definition would be removed.
> > > > >
> > > > > >> DWARFLinker could handle that case -
"removing definitions from some llvm Modules if the type
> > > > > >> definition is known to exist in another
Module".
> > > > > >> i.e. DWARFLinker could replace the declaration
with the definition.
> > > > >
> > > > > >> But that problem could be more easily resolved
when debug info is generated(probably without
> > > > > >> significant increase of debug info size):
> > > > >
> > > > > >> Here we have:
> > > > > >>
> > > > > >> DW_TAG_compile_unit(0x0000000b) - compile unit containing concrete
> > > > > >> instance for function "f".
> > > > > >> DW_TAG_compile_unit(0x00000073) - compile unit containing abstract
> > > > > >> instance root for function "f".
> > > > > >> DW_TAG_compile_unit(0x000000c1) - compile unit containing function
> > > > > >> "f" definition.
> > > > > >>
> > > > > >> Code for function "f" was deleted. gc-debuginfo deletes compile
> > > > > >> unit DW_TAG_compile_unit(0x000000c1) containing the "f" definition
> > > > > >> (since there is no corresponding code). But that unit holds the
> > > > > >> structure "Foo" definition DW_TAG_structure_type(0x0000011e),
> > > > > >> which is referenced from DW_TAG_compile_unit(0x00000073) by the
> > > > > >> declaration DW_TAG_structure_type(0x000000ae). That declaration is
> > > > > >> exactly the case where the definition was removed by ThinLTO and
> > > > > >> replaced with a declaration.
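
The three-unit situation described above can be modelled in a few lines to show why dropping the code-less unit leaves a declaration without a definition, and what "replace the declaration with the definition" would mean. This is a toy sketch only: `TypeDie`, `CompileUnit`, the `live` flag, and both helper functions are hypothetical and are not DWARFLinker APIs. How liveness is decided is assumed, not modelled; the offsets are the ones quoted in the message.

```python
from dataclasses import dataclass, field


@dataclass
class TypeDie:
    offset: int
    name: str
    is_declaration: bool


@dataclass
class CompileUnit:
    offset: int
    live: bool  # assumption: set by whatever liveness analysis the linker runs
    types: list = field(default_factory=list)


def unresolved_declarations(units):
    """Names that are declared in the given units but defined in none of them."""
    defined = {t.name for u in units for t in u.types if not t.is_declaration}
    return sorted({t.name for u in units for t in u.types
                   if t.is_declaration and t.name not in defined})


def promote_declarations(kept, dropped):
    """Swap each kept declaration for a definition found in a dropped unit."""
    definitions = {t.name: t for u in dropped for t in u.types
                   if not t.is_declaration}
    for unit in kept:
        unit.types = [definitions.get(t.name, t) if t.is_declaration else t
                      for t in unit.types]


cu_0b = CompileUnit(0x0B, live=True)                       # concrete instance of f
cu_73 = CompileUnit(0x73, live=True,                       # abstract instance root
                    types=[TypeDie(0xAE, "Foo", True)])    # declaration of Foo
cu_c1 = CompileUnit(0xC1, live=False,                      # f definition, no code
                    types=[TypeDie(0x11E, "Foo", False)])  # definition of Foo

units = [cu_0b, cu_73, cu_c1]
kept = [u for u in units if u.live]
dropped = [u for u in units if not u.live]

dangling_before = unresolved_declarations(kept)  # Foo has lost its definition
promote_declarations(kept, dropped)
dangling_after = unresolved_declarations(kept)   # definition moved into 0x73
```

The sketch matches declarations to definitions by name purely for brevity; a real implementation would need a more robust identity for types, which is part of what the thread is discussing.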
> > > > >
> > > > > >> Would it cost too much if the type definition were not replaced
> > > > > >> with a declaration for the "abstract instance root"? The number of
> > > > > >> concrete instances is bigger than the number of abstract instance
> > > > > >> roots, so it would probably not be too costly to leave the
> > > > > >> definition in the abstract instance root?
> > > > > >>
> > > > > >> Alternatively, would it cost too much if the type definition were
> > > > > >> not replaced with a declaration when the declaration references a
> > > > > >> type from an unused function? (LTO could understand that the
> > > > > >> concrete function is not used.)
> > > > >
> > > > >
> > > > > > >I don't follow this example - could you provide a small concrete
> > > > > > >test case I could reproduce?
> > > > > >
> > > > > > I would provide a test case if necessary. But it looks like this
> > > > > > issue is finally clear, and you already commented on that.
> > > > >
> > > > > > > Oh, I guess this is happening perhaps because ThinLTO can't know
> > > > > > > for sure that a standalone definition of 'f' won't be needed - so
> > > > > > > it produces one in case one of the inlining opportunities doesn't
> > > > > > > end up inlining. Then it turns out all calls got inlined, so the
> > > > > > > external definition wasn't needed.
> > > > > > >
> > > > > > > Oh, you're suggesting that these 3 CUs got emitted into one object
> > > > > > > file during LTO, but that DWARFLinker drops a CU without any code
> > > > > > > in it - even though... So far as I know, in LTO, LLVM directly
> > > > > > > references types across units if the CUs are all emitted in the
> > > > > > > same object file. (And if they weren't in the same object file,
> > > > > > > then the abstract_origin couldn't be pointing cross-CU.)
> > > > > > >
> > > > > > > I guess some basic things to say:
> > > > > > >
> > > > > > > With ThinLTO, the concrete/standalone function definition is
> > > > > > > emitted in case some call sites don't end up being inlined. So we
> > > > > > > know it'll be emitted (but might not be needed by the actual
> > > > > > > linker).
> > > > > > > Any number of inline calls might exist - but we shouldn't put the
> > > > > > > type information into those, because they aren't guaranteed to
> > > > > > > emit it (if the inline function gets optimized away, there would
> > > > > > > be nothing to enforce the type being emitted) - and even if we
> > > > > > > forced the type information to be emitted into one object file
> > > > > > > that has an inline copy of the function - there's no guarantee
> > > > > > > that object file will get linked in either.
> > > > > > >
> > > > > > > So, no, I don't think there's much we can do to keep the size of
> > > > > > > object files down, while guaranteeing the type information will be
> > > > > > > emitted with the usual linker semantics.
> > > > >
> > > > > > Then dsymutil/DWARFLinker could be changed to handle that (though
> > > > > > it would probably not be very efficient). If ThinLTO could
> > > > > > determine that the function ends up unused (and then must not
> > > > > > contain the referenced type definition), this situation could be
> > > > > > handled more effectively.
> > > > > >
> > > > > > Thank you, Alexey.
> > > > >
> > > > >>> _______________________________________________
> > > > >>> LLVM Developers mailing list
> > > > >>> llvm-dev at lists.llvm.org
> > > > >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
