thr3ads.net - llvm dev - [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld. [Aug 2020]

Jonas Devlieghere via llvm-dev

2020-Aug-06 17:39 UTC

[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

Hi Alexey,

I should've looked at this earlier. I went through the thread again and
I've made some comments, mostly from the dsymutil point of view.
> Current DWARFEmitter/DWARFStreamer has an implementation for DWARF
> generation, which does not support DWARF5(only debug_names table). At the
> same time, there already exists code in CodeGen/AsmPrinter/DwarfDebug.h,
> which implements most of DWARF5. It seems that DWARFEmitter/DWARFStreamer
> should be rewritten using DwarfDebug/DwarfFile. Though I am not sure
> whether it would be easy to re-use DwarfDebug/DwarfFile. It would probably
> be necessary to separate some intermediate level of DwarfDebug/DwarfFile.
These classes serve very different purposes. Last time I looked at them
there was very little overlap in functionality. In the compiler we're mostly
concerned with generating the DWARF, while in dsymutil we try to copy
everything we don't need to parse, and fix up what we have to. I don't want
to say it's not possible, but I think supporting DWARF5 in those classes is
going to be a lot less work than trying to reuse the CodeGen variants.
> Measurements show that it is spent ~10 sec in
> llvm::StringMapImpl::LookupBucketFor(). The problem is that the same
> strings, again and again, are added to the string pool. Two attributes
> having the same string value would be analyzed (hash calculated) and
> searched inside the string pool. Even if these strings are already in
> string table(DW_FORM_strp, DW_FORM_strx). The process could be optimized
> for string tables. So that if some string from the string table were
> accessed previously then, it would keep a reference into the string pool.
> This would eliminate a lot of string pool searches.
I'm not sure I fully understand the optimization, but I'd love to speed this
up, if only for dsymutil's sake. I'd love to talk about this in a separate
thread or offline.
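
For readers following along, here is one possible reading of the quoted string-pool idea, as a minimal Python sketch rather than the DWARFLinker code. It assumes strings arrive as offsets into an input .debug_str section (the DW_FORM_strp case) and remembers, per input offset, the output offset already handed out by the pool, so a repeated offset skips hashing the string again. `StringPool` and `CachedStrpResolver` are hypothetical names used only for illustration.

```python
class StringPool:
    """Toy output string pool: one offset per distinct string."""

    def __init__(self):
        self._offsets = {}
        self._size = 0
        self.lookups = 0  # number of by-string hash lookups performed

    def get_offset(self, s):
        self.lookups += 1
        off = self._offsets.get(s)
        if off is None:
            off = self._size
            self._offsets[s] = off
            self._size += len(s.encode()) + 1  # string bytes plus NUL
        return off


class CachedStrpResolver:
    """Maps an input .debug_str offset to an output pool offset, memoized."""

    def __init__(self, input_str_section, pool):
        self._data = input_str_section
        self._pool = pool
        self._cache = {}  # input offset -> output offset

    def resolve(self, in_off):
        out = self._cache.get(in_off)
        if out is None:
            # First time this input offset is seen: read the NUL-terminated
            # string and do the (expensive) by-string pool lookup once.
            end = self._data.index(b"\0", in_off)
            out = self._pool.get_offset(self._data[in_off:end].decode())
            self._cache[in_off] = out
        return out
```

The cache is still a dictionary lookup, but it is keyed by a small integer instead of the string contents, which is where the saving in the quoted proposal would come from under this reading.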
> Currently, all object files are analyzed sequentially and cloned
> sequentially. Cloning is started in parallel with analyzing. That scheme
> could be changed: analyzing and cloning could be done in parallel for each
> object file. That requires refactoring of DWARFLinker and making string
> pools and DeclContextTree thread-safe.
I'm less familiar with the way that LLD uses the DWARFOptimizer but this is
not possible for dsymutil as it is trying to deduplicate DIEs from different
compile units.
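
To illustrate the coupling, here is a deliberately simplified Python model (not dsymutil's actual data structures): each type is kept only in the first unit that defines it, and later units get a reference to that copy. The function `link_units` and the "keep"/"ref" actions are assumptions made up for this sketch.

```python
def link_units(units):
    """units: list of (unit_name, [type_name, ...]) in processing order.

    Returns, per unit, whether each type is kept in that unit or replaced
    by a reference to the unit holding the canonical copy.
    """
    canonical = {}  # type name -> unit that owns the single kept copy
    result = {}
    for unit_name, types in units:
        actions = []
        for type_name in types:
            # The first unit to mention a type becomes its owner.
            owner = canonical.setdefault(type_name, unit_name)
            if owner == unit_name:
                actions.append(("keep", type_name))
            else:
                actions.append(("ref", type_name, owner))
        result[unit_name] = actions
    return result


in_order = link_units([("a.o", ["Foo"]), ("b.o", ["Foo", "Bar"])])
reversed_order = link_units([("b.o", ["Foo", "Bar"]), ("a.o", ["Foo"])])
```

In this model the output for a.o differs between the two calls, because it depends on whether b.o was seen first. That shared, order-dependent table is the kind of state that makes fully independent per-object processing hard.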
> I think improving dsymutil is a valuable thing. Though there are several
> directions which might be considered to make it more robust:
>
> 1. support of latest DWARF - DWARF5/DWARF64...
Strong +1 on DWARF5. I haven't had the bandwidth yet to really look at this.
Right now we can't find (at least some) relocations so we bail out. I'd need
to fix that to assess the current state of things and figure out how much
work would be needed.

I don't think anything in LLVM supports generating DWARF64 though.
> 2. implement multi-threaded execution.
See my earlier comment. At least for the dsymutil case, the current approach
is the best we can do, but I'd love to be proven wrong. :-)
> 3. support of split DWARF.
> 4. implement dsymutil for non-darwin platform.
These two seem to go together. Given the work you did to split off the DWARF
optimization part I think we're closer to this than ever. Thanks again for
doing that.
> We considered three options:
>
> 1. add new functionality into dsymutil. So that dsymutil behaves
> differently on a non-darwin platform and supports another set of
> command-line options.
>
> 2. add new functionality into llvm-objcopy. llvm-objcopy already supports
> various binary objects formats(MachO,ELF,COFF,wasm). It also has several
> options to work with debug-info.
>
> 3. create new utility llvm-dwarfutil which would implement the above
> functionality and reuse DWARFLinker(extracted from dsymutil) library and
> new library ObjectCopy(extracted from llvm-objcopy).
>
> So far our preference is number three. The reason for this is that separate
> utility specifically working with debug info looks as good separation of
> concepts. Adding another behavior to dsymutil looks not very good.
In its current state dsymutil itself is a pretty small tool on top of the
DWARFOptimizer/Linker. I'm curious what the benefits of another tool are
compared to a different frontend (like objcopy) for MachO and ELF. It seems
like that would allow for separation of concerns, while still being able to
share common code without having to push it all the way up into LLVM.
> Extending the already rich interface of llvm-objcopy looks also not very
> good. Having in mind that actual implementation would be shared by
> libraries, the separate utility, working specifically with debug info,
> looks like the right choice. That is our current idea.
> My personal thought would be that extending dsymutil should be ok as the
> functionality goes well with everything else dsymutil does (other than not
> support ELF which the dsymutil maintainers are on board with last I
> checked). That said, I definitely think a write-up will be helpful. No
> matter what I support extracting all of the behavior into libraries and
> using that somewhere :)
Ha, so basically what I was trying to say above.

I look forward to seeing the proposal!

Cheers,
Jonas


On Tue, Aug 4, 2020 at 11:33 PM Eric Christopher <echristo at gmail.com> wrote:
> Hi Alexey,
>
> On Mon, Aug 3, 2020 at 8:32 AM Alexey Lapshin <avl.lapshin at gmail.com>
> wrote:
>
>> Hi Eric, please
>> On 31.07.2020 22:02, Eric Christopher wrote:
>>
>> Hi Alexey,
>>
>> On Fri, Jul 31, 2020 at 4:02 AM Alexey Lapshin via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>> On 28.07.2020 19:28, David Blaikie wrote:
>>> > On Tue, Jul 28, 2020 at 8:55 AM Alexey Lapshin <avl.lapshin at gmail.com>
>>> > wrote:
>>> >>
>>> >> On 28.07.2020 10:29, David Blaikie via llvm-dev wrote:
>>> >>> On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
>>> >>> <alapshin at accesssoftek.com> wrote:
>>> >>>>>>>>>>>> This idea goes in
another direction than fragmenting dwarf
>>> >>>>>>>>>>>> using elf
sections&tricks. It seems to me that the cost of
>>> fragmenting is too high.
>>> >>>>>>>>>>> I tend to agree - but
I'm sort of leaning towards trying to
>>> use object
>>> >>>>>>>>>>> features as much as
possible, then implementing just enough
>>> custom
>>> >>>>>>>>>>> handling in the linker
to recoup overhead, etc. (eg: add
>>> some kind of
>>> >>>>>>>>>>> small header/brief
description that makes it easy for the
>>> linker to
>>> >>>>>>>>>>> slice-and-dice - but
hopefully a domain-specific such header
>>> can be a
>>> >>>>>>>>>>> bit more compact than
the fully general ELF form)
>>> >>>>>>>>>> I think this indeed should
be implemented and evaluated.
>>> >>>>>>>>>> So that various approaches
could be compared.
>>> >>>>>>>>>>
>>> >>>>>>>>>>>> It is not only the
sizes of structures describing fragments
>>> but also the complexity
>>> >>>>>>>>>>>> of tools that
should be taught to work with fragmented
>>> DWARF.
>>> >>>>>>>>>>>> (f.e.
llvm-dwarfdump applied to object file should be able
>>> to read fragmented DWARF,
>>> >>>>>>>>>>>> but applied to
linked executable it should work with
>>> non-fragmented DWARF).
>>> >>>>>>>>>>>> That idea is for
the tool which works the same way as
>>> dsymutil ODR.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I will shortly
describe the idea of making DWARF be easier
>>> processed by dsymutil/DWARFLinker:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> The idea is to
have only one "type table" per object
>>> file(special section .debug_types_table).
>>> >>>>>>>>>>>> This "type
table" would contain all types.
>>> >>>>>>>>>>>> There could be a
special type of reference - type_offset -
>>> that offset points into the type table.
>>> >>>>>>>>>>>> Basic types could
always be placed into the start of "type
>>> table" thus, offsets to basic types
>>> >>>>>>>>>>>> most often would
be 1 byte. There also would be a special
>>> kind of reference - reference inside the type.
>>> >>>>>>>>>>>> Type units sig8
system - would not be used to reference
>>> types.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Types
deduplication is assumed to be done, not by linker
>>> mechanism for COMDAT,
>>> >>>>>>>>>>>> but by a tool like
dsymutil. This tool would create
>>> resulting .debug_types_table by putting there
>>> >>>>>>>>>>>> types from source
.debug_types_table-s. Only one copy of
>>> the type would be placed into the
>>> >>>>>>>>>>>> resulting table.
All references pointing to the deleted
>>> copy would be corrected to point
>>> >>>>>>>>>>>> to the single copy
inside "type table". (that is how
>>> dsymutil works currently)
>>> >>>>>>>>>>> ^ that's the step
that's probably a bit expensive for a
>>> general-use
>>> >>>>>>>>>>> tool - it implies
parsing all the DWARF to find those
>>> references and
>>> >>>>>>>>>>> rewrite them, I think.
For a high-performance solution that
>>> could be
>>> >>>>>>>>>>> run by the linker I
think it'd be necessary to have a
>>> solution that
>>> >>>>>>>>>>> doesn't involve
parsing all the DIEs.
>>> >>>>>>>>>> According to the current
dsymutil processing,
>>> >>>>>>>>>> exactly this process is
not the most time-consuming.
>>> >>>>>>>>>> That could be done
relatively fast.
>>> >>>>>>>>> Fair enough - though I'd still imagine any solution that involves
>>> >>>>>>>>> parsing all the DIEs still wouldn't be fast enough (maybe an order of
>>> >>>>>>>>> magnitude faster than the current solution even - but that's still,
>>> >>>>>>>>> what, 6 or 7x slower than linking without the feature?) for most users
>>> >>>>>>>>> to consider it a good trade-off.
>>> >>>>>>>> It seems to me that even the current 6x-7x slowdown could be useful.
>>> >>>>>>>> Users who already use dsymutil or llvm-dwp(assuming DWARFLinker
>>> >>>>>>>> would be taught to work with a split dwarf) tools spend this time and,
>>> >>>>>>>> in some scenarios, waste disk space by intermediate files.
>>> >>>>>>> FWIW, dwp (llvm-dwp hasn't really
been optimized compared to
>>> binutils
>>> >>>>>>> dwp) is designed to be very quick - by
not needing to do a lot of
>>> >>>>>>> parsing/fixups. Which, yes, means
larger output files than would
>>> be
>>> >>>>>>> possible with more parsing/etc. It
also doesn't take any input
>>> from
>>> >>>>>>> the linker (so it can run in parallel
with the linker) - so it
>>> can't
>>> >>>>>>> remove dead subprograms. Given
Google's the major (perhaps only
>>> >>>>>>> significant?) user of Split DWARF - I
can say that the needs
>>> don't
>>> >>>>>>> necessarily overlap well with
something that would take
>>> significantly
>>> >>>>>>> longer to run or use significantly
more memory.
>>> Faster/cheaper/with
>>> >>>>>>> somewhat bigger output files is
probably the right tradeoff for
>>> >>>>>>> Google's use case, at least.
>>> >>>>>>>
>>> >>>>>>> I imagine Apple's use for dsymutil
is somewhat similar - it's
>>> not used
>>> >>>>>>> in the iterative development cycle,
only in final releases -
>>> well,
>>> >>>>>>> maybe their situation is more
"neutral" - not a major pain point
>>> in
>>> >>>>>>> any case I'd guess.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>> I see. FWIW, Comparison splitdwarf+dwp and
DWARFLinker from lld:
>>> >>>>>>
>>> >>>>>> 1. split-dwarf+llvm-dwp = linking time for
clang 6 sec,
>>> >>>>>>       generating time for .dwp 53 sec,
clang=997M clang.dwp=1.1G.
>>> >>>>> FWIW, llvm-dwp is not very well optimized
(which is to say: it is
>>> not
>>> >>>>> optimized), binutils dwp might be a better
comparison (& even that
>>> >>>>> doesn't have the parallelism & some
potential further memory
>>> savings
>>> >>>>> that lld has that we could take advantage of
in a dwp-like tool)
>>> >>>>>
>>> >>>>> What build mode was the clang binary built in?
Optimized or
>>> unoptimized?
>>> >>>> right, that is unoptimized build with
-ffunction-sections.
>>> >>>>
>>> >>>>>> 2. DWARFLinker from lld = linking time for
clang 72 sec,
>>> clang=760M.
>>> >>> And this is without Split DWARF? Without linker DWARF compression? -
>>> >>> that seems quite a bit surprising, that the deduplication of DWARF
>>> >>> could fit into less space than the wasted/reclaimed space in ranges (&
>>> >>> line)?
>>> >> that was without split dwarf, without linker compression.
>>> >>
>>> >>> Could you double check these numbers & provide a clearer summary?
>>> >> sure, I would re-check it.
>>> >>
>>> >>> Here's my attempt at numbers (all with function-sections+gc-sections)...
>>> >>>
>>> >>> Split DWARF tests didn't seem meaningful - gc-debuginfo + split DWARF
>>> >>> seemed to drop all the debug info (except gdb_index) so wasn't
>>> >>> working/comparison wasn't meaningful for Apples to Apples, but
>>> >>> included it for comparing gc'd non-split to non-gc'd split (disabled
>>> >>> gnu-pubnames/gdb-index (-gsplit-dwarf -gno-gnu-pubnames) (which turns
>>> >>> on by default with Split DWARF because gdb needs it - but a bit of an
>>> >>> unfair comparison without turning on gnu-pubnames/gdb-index in other
>>> >>> build modes too, since it... /shouldn't/ be necessary) which might've
>>> >>> been a factor in the data you were looking at)
>>> >> that might be the case. i.e. clang=997M for split dwarf(from my previous
>>> >> measurement) might include gnu-pubnames.
>>> >>
>>> >> would recheck it and if that is the case then it is an unfair comparison.
>>> >>
>>> >>
>>> >> My point was that "DWARFLinker from lld" takes less space than singleton
>>> >> split dwarf file+.dwp file.
>>> >>
>>> >> for -O0 uncompressed:
>>> >>
>>> >> - .dwp took 1.1G(if I built it correctly), singleton clang(from your
>>> >> measurements) 566 MB
>>> >>
>>> >>      overall 1.6G.
>>> > Oh, yeah, even if there are some measurement issues, linked executable
>>> > + .dwp is going to be larger than a linked executable using non-split
>>> > DWARF (in v5), since v5 uses all the same representations as non-split
>>> > DWARF, and split DWARF adds the indirection overhead of a split file,
>>> > etc.
>>> >
>>> > Even without DWARF linking, it's true that split DWARF has overhead
>>> > (dwp+executable will be larger than executable non-split).
>>> >
>>> > But maybe we've ended up down a bit of a tangent in any case.
>>> >
>>> > Trying to bring this back to "should this be committed to lld" seems
>>> > valuable, and I'm not sure what the right criteria are for that.
>>> I think it would be useful to do "removing obsolete debug info"
>>> in the linker. First thing is that it would be the fastest way(no need
>>> to copy data/create temp files/built address map...) Second thing
>>> is that it would be a good separation of concepts. All debug info
>>> processing, currently done in the linker(gdb_index, upcoming
>>> debug_names), could be moved into separate library processing
>>> debug info. When gdb_index/debug_names should be built without
>>> "removing of obsolete debug info" it would have the same
>>> performance results as it currently has.
>>>
>>> We decided to give the idea of "removing of obsolete debug info"
>>> another try and are going to implement it as a separate utility
>>> working with built binary. Making it to be multi-thread would
>>> probably show better performance results and then it could
>>> probably be considered as acceptable to use from the linker.
>>>
>>>
>> I'm quite interested in this direction. One thought I had was to
>> incorporate such a library into dsymutil but with support for ELF. If you
>> get a proposal written up I'd love to take a look and comment.
>>
>>
>> yes, I would share the proposal in a separate thread within a week or two.
>>
>>
> Excellent, thanks :)
>
>
>> Shortly: we decided to move in slightly other direction than adding this
>> functionality into dsymutil. Though if there is a preference to implement
>> it as part of dsymutil we are OK to do this way.
>>
>>
> I have a vague preference since a lot of functionality already exists
> there on one platform and extending that seems straight forward, however...
>
>
>> In its first version, this new utility supposed to receive built binary
>> with debug info as input(with the new marking for references to removed
>> code sections -1/-2 - https://reviews.llvm.org/D84825) and create a new
>> binary with removed obsolete debug info according to the above marking.
>> In the next versions, it could be extended with other debug info
>> optimizations tasks. F.e. generation new index tables, debug info
>> optimizing... etc...
>>
>> We considered three options:
>>
>> 1. add new functionality into dsymutil. So that dsymutil behaves differently
>>    on a non-darwin platform and supports another set of command-line options.
>>
>> 2. add new functionality into llvm-objcopy. llvm-objcopy already supports
>>    various binary objects formats(MachO,ELF,COFF,wasm). It also has several
>>    options to work with debug-info.
>>
>> 3. create new utility llvm-dwarfutil which would implement the above
>>    functionality and reuse DWARFLinker(extracted from dsymutil) library and
>>    new library ObjectCopy(extracted from llvm-objcopy).
>>
>> So far our preference is number three. The reason for this is that separate
>> utility specifically working with debug info looks as good separation of
>> concepts. Adding another behavior to dsymutil looks not very good. Extending
>> the already rich interface of llvm-objcopy looks also not very good. Having
>> in mind that actual implementation would be shared by libraries, the separate
>> utility, working specifically with debug info, looks like the right choice.
>> That is our current idea.
>>
>> I would publish the proposal shortly to discuss it.
>>
>>
> These are solid arguments - in particular, I agree with not extending
> llvm-objcopy :)
>
> +Jonas Devlieghere <jonas at devlieghere.com> and +Adrian Prantl
> <aprantl at apple.com> for dsymutil comments.
>
> My personal thought would be that extending dsymutil should be ok as the
> functionality goes well with everything else dsymutil does (other than not
> support ELF which the dsymutil maintainers are on board with last I
> checked). That said, I definitely think a write-up will be helpful. No
> matter what I support extracting all of the behavior into libraries and
> using that somewhere :)
>
> Thanks!
>
> -eric
>
>
>> Thank you, Alexey.
>>
>> Thanks!
>>
>> -eric
>>
>>
>>> Alexey.
>>>
>>> >
>>> > Ray's the best person to weigh in on that. My 2c is that I think it
>>> > probably is worthwhile, even just as an experiment, assuming it's not
>>> > too intrusive to lld.
>>> >
>>> >> - The "DWARFLinker from lld" 820 MB(from your measurements).
>>> >>
>>> >>
>>> >> So "DWARFLinker from lld" looks two times better.
>>> >>
>>> >>
>>> >> Anyway, thank you for pointing me to possible mistake. I would recheck
>>> >> it and update results.
>>> >>
>>> >>
>>> >> Alexey.
>>> >>
>>> >>
>>> >>> * -O0: (baseline, just using strip -g: 356 MB)
>>> >>>     * compressed: 25% smaller with gc-debuginfo (481 MB / 641 MB) (407 MB split/non-gc)
>>> >>>     * uncompressed: 30% smaller (820 MB / 1.2 GB) (566 MB split/non-gc)
>>> >>> * -O3: (baseline: 116 MB)
>>> >>>     * compressed: 16% smaller (361 MB / 462 MB) (283 MB split/non-gc)
>>> >>>     * uncompressed: 22% smaller (1022 MB / 1.2 GB) (156 MB split/non-gc)
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
>>> >>> <alapshin at accesssoftek.com> wrote:
>>> >>>>>>>>>>>> This idea goes in
another direction than fragmenting dwarf
>>> >>>>>>>>>>>> using elf
sections&tricks. It seems to me that the cost of
>>> fragmenting is too high.
>>> >>>>>>>>>>> I tend to agree - but
I'm sort of leaning towards trying to
>>> use object
>>> >>>>>>>>>>> features as much as
possible, then implementing just enough
>>> custom
>>> >>>>>>>>>>> handling in the linker
to recoup overhead, etc. (eg: add
>>> some kind of
>>> >>>>>>>>>>> small header/brief
description that makes it easy for the
>>> linker to
>>> >>>>>>>>>>> slice-and-dice - but
hopefully a domain-specific such header
>>> can be a
>>> >>>>>>>>>>> bit more compact than
the fully general ELF form)
>>> >>>>>>>>>> I think this indeed should
be implemented and evaluated.
>>> >>>>>>>>>> So that various approaches
could be compared.
>>> >>>>>>>>>>
>>> >>>>>>>>>>>> It is not only the
sizes of structures describing fragments
>>> but also the complexity
>>> >>>>>>>>>>>> of tools that
should be taught to work with fragmented
>>> DWARF.
>>> >>>>>>>>>>>> (f.e.
llvm-dwarfdump applied to object file should be able
>>> to read fragmented DWARF,
>>> >>>>>>>>>>>> but applied to
linked executable it should work with
>>> non-fragmented DWARF).
>>> >>>>>>>>>>>> That idea is for
the tool which works the same way as
>>> dsymutil ODR.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I will shortly
describe the idea of making DWARF be easier
>>> processed by dsymutil/DWARFLinker:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> The idea is to
have only one "type table" per object
>>> file(special section .debug_types_table).
>>> >>>>>>>>>>>> This "type
table" would contain all types.
>>> >>>>>>>>>>>> There could be a
special type of reference - type_offset -
>>> that offset points into the type table.
>>> >>>>>>>>>>>> Basic types could
always be placed into the start of "type
>>> table" thus, offsets to basic types
>>> >>>>>>>>>>>> most often would
be 1 byte. There also would be a special
>>> kind of reference - reference inside the type.
>>> >>>>>>>>>>>> Type units sig8
system - would not be used to reference
>>> types.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Types
deduplication is assumed to be done, not by linker
>>> mechanism for COMDAT,
>>> >>>>>>>>>>>> but by a tool like
dsymutil. This tool would create
>>> resulting .debug_types_table by putting there
>>> >>>>>>>>>>>> types from source
.debug_types_table-s. Only one copy of
>>> the type would be placed into the
>>> >>>>>>>>>>>> resulting table.
All references pointing to the deleted
>>> copy would be corrected to point
>>> >>>>>>>>>>>> to the single copy
inside "type table". (that is how
>>> dsymutil works currently)
>>> >>>>>>>>>>> ^ that's the step
that's probably a bit expensive for a
>>> general-use
>>> >>>>>>>>>>> tool - it implies
parsing all the DWARF to find those
>>> references and
>>> >>>>>>>>>>> rewrite them, I think.
For a high-performance solution that
>>> could be
>>> >>>>>>>>>>> run by the linker I
think it'd be necessary to have a
>>> solution that
>>> >>>>>>>>>>> doesn't involve
parsing all the DIEs.
>>> >>>>>>>>>> According to the current
dsymutil processing,
>>> >>>>>>>>>> exactly this process is
not the most time-consuming.
>>> >>>>>>>>>> That could be done
relatively fast.
>>> >>>>>>>>> Fair enough - though I'd still imagine any solution that involves
>>> >>>>>>>>> parsing all the DIEs still wouldn't be fast enough (maybe an order of
>>> >>>>>>>>> magnitude faster than the current solution even - but that's still,
>>> >>>>>>>>> what, 6 or 7x slower than linking without the feature?) for most users
>>> >>>>>>>>> to consider it a good trade-off.
>>> >>>>>>>> It seems to me that even the current 6x-7x slowdown could be useful.
>>> >>>>>>>> Users who already use dsymutil or llvm-dwp(assuming DWARFLinker
>>> >>>>>>>> would be taught to work with a split dwarf) tools spend this time and,
>>> >>>>>>>> in some scenarios, waste disk space by intermediate files.
>>> >>>>>>> FWIW, dwp (llvm-dwp hasn't really
been optimized compared to
>>> binutils
>>> >>>>>>> dwp) is designed to be very quick - by
not needing to do a lot of
>>> >>>>>>> parsing/fixups. Which, yes, means
larger output files than would
>>> be
>>> >>>>>>> possible with more parsing/etc. It
also doesn't take any input
>>> from
>>> >>>>>>> the linker (so it can run in parallel
with the linker) - so it
>>> can't
>>> >>>>>>> remove dead subprograms. Given
Google's the major (perhaps only
>>> >>>>>>> significant?) user of Split DWARF - I
can say that the needs
>>> don't
>>> >>>>>>> necessarily overlap well with
something that would take
>>> significantly
>>> >>>>>>> longer to run or use significantly
more memory.
>>> Faster/cheaper/with
>>> >>>>>>> somewhat bigger output files is
probably the right tradeoff for
>>> >>>>>>> Google's use case, at least.
>>> >>>>>>>
>>> >>>>>>> I imagine Apple's use for dsymutil
is somewhat similar - it's
>>> not used
>>> >>>>>>> in the iterative development cycle,
only in final releases -
>>> well,
>>> >>>>>>> maybe their situation is more
"neutral" - not a major pain point
>>> in
>>> >>>>>>> any case I'd guess.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>> I see. FWIW, Comparison splitdwarf+dwp and
DWARFLinker from lld:
>>> >>>>>>
>>> >>>>>> 1. split-dwarf+llvm-dwp = linking time for
clang 6 sec,
>>> >>>>>>       generating time for .dwp 53 sec,
clang=997M clang.dwp=1.1G.
>>> >>>>> FWIW, llvm-dwp is not very well optimized
(which is to say: it is
>>> not
>>> >>>>> optimized), binutils dwp might be a better
comparison (& even that
>>> >>>>> doesn't have the parallelism & some
potential further memory
>>> savings
>>> >>>>> that lld has that we could take advantage of
in a dwp-like tool)
>>> >>>>>
>>> >>>>> What build mode was the clang binary built in?
Optimized or
>>> unoptimized?
>>> >>>> right, that is unoptimized build with
-ffunction-sections.
>>> >>>>
>>> >>>>>> 2. DWARFLinker from lld = linking time for
clang 72 sec,
>>> clang=760M.
>>> >>>>> It does seem a tad strange that the clang binary would be smaller
>>> >>>>> non-split with DWARF linking than it was split. Though I could imagine
>>> >>>>> this might be possible in an optimized build (where debug_ranges
>>> >>>>> become quite relatively expensive in the .o file contribution with
>>> >>>>> Split DWARF)
>>> >>>>> Could you compare the section sizes between these two clang binaries, perhaps?
>>> >>>> .debug_ranges is three times bigger and .debug_line is twice bigger.
>>> >>>>>>>> Thus if they would use this LLD
feature in its current state
>>> >>>>>>>> - they would still receive
benefits.
>>> >>>>>>>>
>>> >>>>>>>> Speaking of performance results -
LLD is a multi-thread linker;
>>> >>>>>>>> it handles sections in parallel.
DWARFLinker generates DWARF
>>> using
>>> >>>>>>>> AsmPrinter which is a stream - so
it could make resulting DWARF
>>> only
>>> >>>>>>>> continuously. It is not surprising
that the parallel solution
>>> works faster.
>>> >>>>>>>> Making DWARFLinker truly
multi-threaded would probably allow us
>>> >>>>>>>> to make slowdown to be at 2x-4x
range.
>>> >>>>>>> *nod* that's still a really
expensive link - but I understand
>>> that's a
>>> >>>>>>> suitable tradeoff for your users
>>> >>>>>>>
>>> >>>>>> Btw, 2x or 7x is for pure linking time.
Overall compilation
>>> slowdown
>>> >>>>>> is not so significant. Building LLVM
codebase has only 20%
>>> slowdown.
>>> >>>>> Understood - that's still quite
significant to most users, I'd
>>> imagine.
>>> >>>> I see.
>>> >>>>
>>> >>>>>>>>>> Anyway, I think the
dsymutil approach is still valuable, and
>>> it
>>> >>>>>>>>>> would be useful to
optimize it.
>>> >>>>>>>>>> Do you think it would be
useful to make dsymutil/DWARFLinker
>>> truly multi-thread?
>>> >>>>>>>>>> (To make
dsymutil/DWARFLinker able to process each object
>>> file in a separate thread)
>>> >>>>>>>>> Perhaps - that I'd
probably leave up to the folks who are more
>>> >>>>>>>>> invested in dsymutil (Adrian
Prantl et al). Maybe one day
>>> we'll get it
>>> >>>>>>>>> integrated into llvm-dwp and
then I'll be interested in
>>> getting as
>>> >>>>>>>>> much performance out of it as
lld - so multithreading and
>>> things would
>>> >>>>>>>>> be on the books.
>>> >>>>>>>> I think improving dsymutil is a
valuable thing.
>>> >>>>>>>> Though there are several
directions which might be considered
>>> >>>>>>>> to make it more robust:
>>> >>>>>>>>
>>> >>>>>>>> 1. support of latest DWARF -
DWARF5/DWARF64...
>>> >>>>>>> I expect/though some of the Apple
folks had already worked on
>>> DWARF5 support?
>>> >>>>>>> DWARF64 - that's been around for a
while, and just hasn't been
>>> needed
>>> >>>>>>> by LLVM users thus far, it seems
(until recently - where some
>>> >>>>>>> developers have started working on
that)
>>> >>>>>> There already implemented debug_names
table, but debug_rnglists,
>>> >>>>>> debug_loclists, type units - are not
implemented yet.
>>> >>>>> Superficially, type units wouldn't be on
the list of features (like
>>> >>>>> DWARF64 - it's optional) I'd try to
support in dsymutil - since
>>> their
>>> >>>>> size overhead is more justified for a
DWARF-agnostic linker that's
>>> >>>>> using comdat groups. With a DWARF-aware linker
I'd be specifically
>>> >>>>> hoping to avoid using type units to help
>>> >>>>>> The thing which
>>> >>>>>> should probably be changed is that
dsymutil should not have its
>>> version
>>> >>>>>> of code generating DWARF tables. It should
call already existed
>>> >>>>>> DWARF5/DWARF64 implementations. Then
dsymutil would always
>>> >>>>>> use last DWARF generators.
>>> >>>>> Possibly - I don't know what the
architectural tradeoffs for that
>>> look
>>> >>>>> like - I'd imagine DWARFLinker has
sufficiently different
>>> >>>>> needs/tradeoffs than LLVM's DWARF
generation code (rewriting
>>> existing
>>> >>>>> DIEs compared to building new ones from
scratch, etc) that it
>>> might be
>>> >>>>> hard for them to share a lot of their
implementation.
>>> >>>> It is not easy, and would require some additions,
but it would
>>> benefit
>>> >>>> in that all format implementation is in one place.
Thus changing
>>> that place
>>> >>>> would reflect in other places. There are at least
three
>>> implementations for
>>> >>>> .debug_ranges, .debug_aranges currently...
>>> >>>>
>>> >>>>
>>> >>>>>>>> 2. implement multi-threaded
execution.
>>> >>>>>>>> 3. support of split DWARF.
>>> >>>>>>> Maybe, though I'm still not sure
it'd be the right tradeoff -
>>> >>>>>>> especially if it involved having to
wait to run the .dwo merger
>>> (call
>>> >>>>>>> it DWARF-aware dwp, or dsymutil with
dwp support) until after the
>>> >>>>>>> linker ran.
>>> >>>>>>>
>>> >>>>>>>> 4. implement dsymutil for
non-darwin platform.
>>> >>>>>>> That's probably, essentially (3),
more-or-less. Split DWARF is
>>> >>>>>>> somewhat of a formalization of
Apple's/MachO DWARF distribution
>>> model
>>> >>>>>>> (leave DWARF it in files that
aren't linked/use them from a
>>> debugger,
>>> >>>>>>> but also be able to merge them into
some final file (dsym or
>>> dwp) for
>>> >>>>>>> archival purposes)
>>> >>>>>>>
>>> >>>>>>>> All of this is a massive piece of
work.
>>> >>>>>>>> Our original investment was to
solve two problems:
>>> >>>>>>>>
>>> >>>>>>>> 1. Overlapped address ranges,
which is currently close to being
>>> solved. Thank you for helping with that!
>>> >>>>>>> Yeah, again, sorry that's taken
quite so long/somewhat
>>> circuitous route.
>>> >>>>>>>
>>> >>>>>>>> 2. Size of debug info. That still
becomes an issue, but we are
>>> unsure whether we are ready to
>>> >>>>>>>>      invest in solving all the
above 1-4 problems and how much
>>> community interested in it.
>>> >>>>>>> Fair, for sure - I don't think
you'd need to sign up to solve
>>> all of
>>> >>>>>>> them (don't think they necessarily
need solving). Potentially
>>> moving
>>> >>>>>>> the logic out into a separate tool as
Fangrui's considering - a
>>> >>>>>>> post-link DWARF optimizer, rather than
in-linker DWARF
>>> optimization.
>>> >>>>>>>
>>> >>>>>>> I really don't want to give you
the runaround like this - but
>>> multiple
>>> >>>>>>> times slower links is something that
seems pretty problematic
>>> for most
>>> >>>>>>> users, to the point of weighing the
maintainability of lld
>>> against the
>>> >>>>>>> convenience of having this
functionality in-linker rather than
>>> in a
>>> >>>>>>> post-link optimizer.
>>> >>>>>>>
>>> >>>>>>> (I know you've spoken a bit before
about your users needs - but
>>> if
>>> >>>>>>> it's possible, could you explain
(again :/) why they have such a
>>> >>>>>>> strong need for smaller DWARF? While
DWARF size is an ongoing
>>> concern
>>> >>>>>>> for many users (Google certainly -
hence the invention of Split
>>> DWARF,
>>> >>>>>>> use of type units and compressed
DWARF, etc) - usually it's in
>>> rather
>>> >>>>>>> large programs, but it sounds like
you're dealing with relatively
>>> >>>>>>> small ones (otherwise the increase in
link time, I'd imagine,
>>> would be
>>> >>>>>>> prohibitive for your users?)?
>>> >>>>>> We have many large programs and keep daily/nightly debug builds,
>>> >>>>>> which take a lot of disk space. Compilation time for these
>>> >>>>>> programs is long. The scenario is "compile once" (not
>>> >>>>>> compile-debug-compile-debug). So we think that a solution like
>>> >>>>>> dsymutil/DWARFLinker would not significantly slow down the
>>> >>>>>> overall build (see the numbers above for the llvm codebase) and
>>> >>>>>> would allow us to reduce the disk space required to keep all of
>>> >>>>>> these builds.
>>> >>>>> Ah, OK - for archival purposes. So the interactive developers
>>> >>>>> wouldn't necessarily be using this feature. Makes sense - similar
>>> >>>>> to dsymutil and dwp, mostly used for archival purposes & you can
>>> >>>>> debug straight from .o/.dwos for interactive/iterative
>>> >>>>> development.
>>> >>>>
>>> >>>>> In that case, it seems more likely that a separate tool might
>>> >>>>> suffice.
>>> >>>> Agreed: if the work on this continues, then it makes sense to do
>>> >>>> it as a separate tool and make it fast enough. If there is
>>> >>>> interest in it later, it would probably be possible to return to
>>> >>>> the idea of calling it from the linker.
>>> >>>>
>>> >>>>> Also, out of curiosity - have you tried just compressing the
>>> >>>>> output (-gz (I think that does the right thing for the linker
>>> >>>>> level compression too, otherwise -Wl,-compress-debug-sections
>>> >>>>> might do it)) or are you already doing that in addition?
>>> >>>> sure. we use  -Wl,-compress-debug-sections.
>>> >>>>
>>> >>>> Thank you, Alexey.
>>> >>>>
>>> >>>>>>> You mentioned that the usability cost
of
>>> >>>>>>> Split DWARF for your users was too
high (or high enough to
>>> justify
>>> >>>>>>> this alternative work of DWARF-aware
linking)? That all seems a
>>> bit
>>> >>>>>>> surprising to me - though I understand
the deployment issues of
>>> Split
>>> >>>>>>> DWARF do present some challenges to
users in more heterogenous
>>> >>>>>>> environments than Google's...
still, I'd have thought there was
>>> some
>>> >>>>>>> hope there)
>>> >>>>>> Our tools do not support split dwarf yet, though we plan to
>>> >>>>>> implement it. Once we have split dwarf support, it would be
>>> >>>>>> convenient to have an easy way to share built debug binaries.
>>> >>>>>> llvm-dwp is the answer to this. DWARFLinker could probably be
>>> >>>>>> another answer.
>>> >>>>> Ah, fair enough - thanks for the context!
>>> >>>>>>>>> One way to do that would be to
have a CU-local type
>>> indirection table.
>>> >>>>>>>>> DIEs reference local type
numbers (like local address/string
>>> numbers -
>>> >>>>>>>>> addrx/strx/rnglistx) and that
table contains either sig8 (no
>>> linker
>>> >>>>>>>>> fixups required) or the local
type offsets you describe - the
>>> linker
>>> >>>>>>>>> would then only need to read
this type number indirection
>>> table and
>>> >>>>>>>>> rewrite them to the final type
numbers.
>>> >>>>>>>> Yes, that could be additionally
done if this process would be
>>> time-consuming.
>>> >>>>>>>>
>>> >>>>>>>> David, thank you for all your
comments and explanations. They
>>> are extremely helpful.
>>> >>>>>>> Sure thing - really appreciate your
patience with all this -
>>> it's... a
>>> >>>>>>> lot of moving parts.
>>> >>>>>>> - Dave
>>> >>>>>>> Thank you, Alexey.
>>> >>>>>>>
>>> >>>>>>>> sig8 hash-id would be used to
compare types and to deduplicate
>>> them.
>>> >>>>>>>> It would speed up the current
dsymutil context analysis.
>>> >>>>>>>> Types having the same hash-id
could be deduplicated.
>>> >>>>>>>> This would allow deduplicating a
more number of types than
>>> current dsymutil.
>>> >>>>>>>> Incomplete type definitions having
a similar set of members are
>>> not deduplicated by dsymutil currently.
>>> >>>>>>>> In this case they would have the
same hash-id.
>>> >>>>>>>>
>>> >>>>>>>> This "type table" would
take less space than current "type
>>> units" and current ODR solution.
>>> >>>>>>>>
>>> >>>>>>>> Above is just an idea on how to
help DWARF-aware linker(based
>>> on idea removing obsolete debug info)
>>> >>>>>>>> to work faster(if that is
interesting).
>>> >>>>>>>>
>>> >>>>>>>> Alexey.
>>> >>>>>>>>
>>> >>>>>>>>> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf
>>> >>>>>>>>> Of James Henderson via llvm-dev
>>> >>>>>>>>> Sent: Wednesday, June 3, 2020 3:48 AM
>>> >>>>>>>>> To: David Blaikie <dblaikie at gmail.com>
>>> >>>>>>>>> Cc: llvm-dev at lists.llvm.org
>>> >>>>>>>>> Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete
>>> >>>>>>>>> debug info in lld.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> It makes me sad that the linker (via a library or otherwise)
>>> >>>>>>>>> has to be "DWARF-aware" to be able to effectively handle
>>> >>>>>>>>> --gc-sections, COMDATs, --icf etc for debug info, without
>>> >>>>>>>>> leaving large blocks of data kicking around.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> The patching to -1 (or equivalent) is probably a good
>>> >>>>>>>>> lightweight solution (though I'd love it if it could be done
>>> >>>>>>>>> based on section type in the future rather than section name,
>>> >>>>>>>>> but that's probably outside the realm of DWARF), as it
>>> >>>>>>>>> requires only minimal understanding in the linker, but
>>> >>>>>>>>> anything beyond that seems to be complicated logic that is
>>> >>>>>>>>> mostly due to the structure of DWARF. Patching to -1 does
>>> >>>>>>>>> feel a bit like a sticking plaster/band aid to patch over the
>>> >>>>>>>>> issue rather than properly solving it too - there will still
>>> >>>>>>>>> be debug data (potentially significant amounts in
>>> >>>>>>>>> COMDAT-heavy objects) that the linker has to write and the
>>> >>>>>>>>> debugger has to somehow know how to skip (even if it knows
>>> >>>>>>>>> that -1 is special-case due to the standard being updated, it
>>> >>>>>>>>> needs to get as far as the -1), which is all wasted effort.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> We've already seen from Alexey's prototyping, and from our
>>> >>>>>>>>> own experiences with the Sony proprietary linker (which tried
>>> >>>>>>>>> to rewrite .debug_line only) that deconstructing the DWARF so
>>> >>>>>>>>> that it can be more optimally reassembled at link time is
>>> >>>>>>>>> slow going, and will probably inevitably be however much
>>> >>>>>>>>> effort is put into optimising it. For a start, given the
>>> >>>>>>>>> current standards, it's impossible to know how to deconstruct
>>> >>>>>>>>> it without having to parse vast amounts of DWARF, which is
>>> >>>>>>>>> typically going to mean a lot more parsing work than the
>>> >>>>>>>>> linker would normally have to deal with. Additionally, much
>>> >>>>>>>>> of this parsing work is wasted effort, since it seems
>>> >>>>>>>>> unlikely in many links that large amounts of the DWARF will
>>> >>>>>>>>> be redundant. Having an option to opt-in doesn't help much
>>> >>>>>>>>> there, since it just means the logic exists without most
>>> >>>>>>>>> people using it, due to it not being good enough, or
>>> >>>>>>>>> potentially they don't even know it exists.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> I don't have particularly concrete suggestions as to how to
>>> >>>>>>>>> solve the structural problems with DWARF at this point. The
>>> >>>>>>>>> only thing that seems obvious to me is a more "blessed"
>>> >>>>>>>>> approach to fragmentation of sections, similar to what I
>>> >>>>>>>>> tried with my prototype mentioned earlier in the thread,
>>> >>>>>>>>> although we'd need to figure out the previously stated
>>> >>>>>>>>> performance issues. Other ideas might tie into this, like
>>> >>>>>>>>> somehow sharing the various table headers a bit like CIEs in
>>> >>>>>>>>> .eh_frame that could be merged by the linker - each object
>>> >>>>>>>>> could have separate table header sections, which are
>>> >>>>>>>>> referenced by the individual .debug_* blocks, which in turn
>>> >>>>>>>>> are one per function/data piece and easily
>>> >>>>>>>>> discardable/merged by the linker.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> Just some thoughts.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> James
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev
>>> >>>>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
>>> >>>>>>>>> <alapshin at accesssoftek.com> wrote:
>>> >>>>>>>>>> Hi David, please find my
comments inside:
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>>>> Broad
question: Do you have any specific
>>> motivation/users/etc in implementing this (if you can speak about
it)?
>>> >>>>>>>>>>>>> - it might
help motivate the work, understand what
>>> tradeoffs might be suitable for you/your users, etc.
>>> >>>>>>>>>>>> There are two
general requirements:
>>> >>>>>>>>>>>> 1) Remove (or
clean) invalid debug info.
>>> >>>>>>>>>>> Perhaps a simpler
direct solution for your immediate needs
>>> might be a much narrower,
>>> >>>>>>>>>>> and more efficient
linker-DWARF-awareness feature:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> With DWARFv5, rnglists
present an opportunity for a DWARF
>>> linker to rewrite the ranges
>>> >>>>>>>>>>> without parsing the
rest of the DWARF. /technically/ this
>>> isn't guaranteed - rnglist entries
>>> >>>>>>>>>>> can be referenced
either directly, or by index. If all
>>> rnglists are referenced by index, then
>>> >>>>>>>>>>> a linker could parse
only the debug_rnglists section and
>>> rewrite ranges to remove any
>>> >>>>>>>>>>> address ranges that
refer to optimized-out code.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> This would only be
correct for rnglists that had no direct
>>> references to them (that only were
>>> >>>>>>>>>>> referenced via the
indexes) - but we could either implement
>>> it with that assumption, or could
>>> >>>>>>>>>>> add an LLVM extension
attribute on the CU that would say "I
>>> promise I only referenced rnglists
>>> >>>>>>>>>>> via rnglistx
forms/indexes). If this DWARF-aware linking
>>> would have to read the CU DIE (not
>>> >>>>>>>>>>> all the other DIEs) it
/could/ also then rewrite high/low_pc
>>> if the CU wasn't using ranges...
>>> >>>>>>>>>>> but that wouldn't
come up in the function-removal case,
>>> because then you'd have ranges anyway,
>>> >>>>>>>>>>> so no need for that.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Such a DWARF-aware
rnglist linking could also simplify
>>> rnglists, in cases where functions
>>> >>>>>>>>>>> ended up being laid
out next to each other, the linker could
>>> coalesce their ranges together.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> I imagine this could
be implemented with very little
>>> overhead to linking, especially compared
>>> >>>>>>>>>>> to the overhead of
full DWARF-aware linking.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Though none of this
fixes Split DWARF, where the linker
>>> doesn't get a chance to see the
>>> >>>>>>>>>>> addresses being used -
but if you only want/need the
>>> CU-level ranges to be correct, this
>>> >>>>>>>>>>> might be a viable fix,
and quite efficient.
>>> >>>>>>>>>> Yes, we think about that
alternative. This would resolve our
>>> problem of invalid debug info
>>> >>>>>>>>>> and would work much
faster. Thus, if we would not have good
>>> results for D74169 then we
>>> >>>>>>>>>> will implement it. Do you
think it could be useful to have
>>> this solution in upstream?
>>> >>>>>>>>> A pure rnglist rewriting - I
think it'd be OK to have in
>>> upstream -
>>> >>>>>>>>> again, cost/benefit/etc would
have to be weighed. I'm not sure
>>> it
>>> >>>>>>>>> would save enough space to be
particularly valuable beyond the
>>> >>>>>>>>> correctness issue - and it
doesn't completely solve the
>>> correctness
>>> >>>>>>>>> issue for zero-address usage
or low-address usage (because you
>>> could
>>> >>>>>>>>> still have overlapping
subprograms inside a CU - so if you were
>>> >>>>>>>>> symbolizing you could use the
correct rnglist to filter, but
>>> then go
>>> >>>>>>>>> look inside the CU only to
find two subprograms that had that
>>> address
>>> >>>>>>>>> & not know which one was
the correct one an which one was the
>>> >>>>>>>>> discarded one).
>>> >>>>>>>>>
>>> >>>>>>>>> rnglist rewriting might be
easy enough to prototype - but
>>> depends what
>>> >>>>>>>>> you want to spend your time
on, I know this whole issue has
>>> been a
>>> >>>>>>>>> huge investment of your time
already - but maybe this recent
>>> >>>>>>>>> revitalization of the
conversation around having an explicit
>>> value in
>>> >>>>>>>>> the linker might be sufficient
to address everyone's needs...
>>> *fingers
>>> >>>>>>>>> crossed*)
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>>>> 2) Optimize the
DWARF size.
>>> >>>>>>>>>>> Do your users care
much about this? I imagine if they had
>>> significant DWARF size issues,
>>> >>>>>>>>>>> they'd have
significant link time issues and the kind of
>>> cost to link time this feature has would
>>> >>>>>>>>>>> be prohibitive - but
perhaps they're sharing linked binaries
>>> much more often than they're
>>> >>>>>>>>>>> actually performing
linking.
>>> >>>>>>>>>> Yes, they do. They also
have significant link-time issues.
>>> >>>>>>>>>> So current performance
results of D74169 are not very
>>> acceptable.
>>> >>>>>>>>>> We hope to improve it.
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>>> The specifics
which our users have:
>>> >>>>>>>>>>>>    - embedded
platform which uses 0 as start of .text
>>> section.
>>> >>>>>>>>>>>>    - custom
toolset which does not support all features
>>> yet(f.e. split dwarf).
>>> >>>>>>>>>>>>    - tolerant of
the link-time increase.
>>> >>>>>>>>>>>>    - need a useful
way to share debug builds.
>>> >>>>>>>>>>> Sharing two files
(executable and dwp) is significantly less
>>> useful than sharing one file?
>>> >>>>>>>>>> Probably not
significantly, but yes, it looks less useful
>>> comparing to D74169.
>>> >>>>>>>>>> Having only two files
(executable and .dwp) looks
>>> significantly better than having executable and multiple .dwo
files.
>>> >>>>>>>>>> Having only one
file(executable) with minimal size looks
>>> better than the two files with a bigger size.
>>> >>>>>>>>>>
>>> >>>>>>>>>> clang compiled with
-gsplitdwarf takes 0.9G for executable
>>> and 0.9G for .dwp.
>>> >>>>>>>>>> clang compiled with
-gc-debuginfo takes only 0.76G for single
>>> executable.
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>>> For the first
point: we have a problem "Overlapping address
>>> ranges starting from 0"(D59553).
>>> >>>>>>>>>>>> We use custom
solution, but the general solution like
>>> D74169 would be better here.
>>> >>>>>>>>>>> If CU ranges are the
only ones that need fixing, then I
>>> think the above solution might be as
>>> >>>>>>>>>>> good/better - if more
than CU ranges need fixing, then I
>>> think we might want to start talking about
>>> >>>>>>>>>>> how to fix DWARF
itself (split and non-split) to signal
>>> certain addresses point to dead code with a
>>> >>>>>>>>>>> specific blessed value
that linkers would need to implement
>>> - because with Split DWARF there's
>>> >>>>>>>>>>> no way to solve the
non-CU addresses at the linker.
>>> >>>>>>>>>> I think the worthful
solution for that signal value would be
>>> LowPC > HighPC.
>>> >>>>>>>>>> That does not require
additional bits in DWARF.
>>> >>>>>>>>>> It would be natural to
skip such address ranges since they
>>> explicitly marked as invalid.
>>> >>>>>>>>>> It could be implemented in
a linker very easily. Probably, it
>>> would make sense to describe that
>>> >>>>>>>>>> usage in DWARF standard.
>>> >>>>>>>>>>
>>> >>>>>>>>>> As to the addresses which
are not seen by the linker(since
>>> they are in .dwo files) - yes,
>>> >>>>>>>>>> they need to have another
solution. Could you show an example
>>> of such a case, please?
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>>>> 2. Support of
type units.
>>> >>>>>>>>>>>>>>    That
could be implemented further.
>>> >>>>>>>>>>>>> Enabling type
units increases object size to make it
>>> easier to deduplicate at link time by a DWARF-unaware
>>> >>>>>>>>>>>>> linker. With a
DWARF aware linker it'd be generally
>>> desirable not to have to add that object size overhead to
>>> >>>>>>>>>>>>> get the
linking improvements.
>>> >>>>>>>>>>>> But, DWARFLinker
should adequately work with type units
>>> since they are already implemented.
>>> >>>>>>>>>>> Maybe - it'd be
nice & all, but I don't think it's an
>>> outright necessity - if someone knows they're using
>>> >>>>>>>>>>> a DWARF-aware linker,
they'd probably not use type units in
>>> their object files. It's possible someone
>>> >>>>>>>>>>> doesn't know for
sure & maybe they have pre-canned debug
>>> object files from someone else, etc.
>>> >>>>>>>>>> I see.
>>> >>>>>>>>>>
>>> >>>>>>>>>>>> Another thing is
that the idea behind type units has the
>>> potential to help Dwarf-aware linker to work faster.
>>> >>>>>>>>>>>> Currently,
DWARFLinker analyzes context to understand
>>> whether types are the same or not.
>>> >>>>>>>>>>> When you say
"analyzes context" what do you mean? Usually
>>> I'd take that to mean
>>> >>>>>>>>>>> "looks at things
outside the type itself - like what
>>> namespace it's in, etc" - which, yes,
>>> >>>>>>>>>>> it should do that, but
it doesn't seem very expensive to do.
>>> But I guess you actually
>>> >>>>>>>>>>> mean something about
doing structural equivalence in some
>>> way, looking at things inside the type?
>>> >>>>>>>>>> I think it could be useful
for both cases. Currently,
>>> dsymutil does only first thing
>>> >>>>>>>>>> (look at type name,
namespace name, etc..) and does not do
>>> the second thing
>>> >>>>>>>>>> (doing structural
equivalence). Analyzing type names is
>>> currently quite expensive
>>> >>>>>>>>>> (the only search in string
pool takes ~10 sec from 70 sec of
>>> overall time).
>>> >>>>>>>>>> That is expensive because
of many things should be done to
>>> work with strings:
>>> >>>>>>>>>> parse DWARF, search and
resolve relocations, compute a hash
>>> for strings,
>>> >>>>>>>>>> put data into a string
pool, create a fully qualified
>>> name(like namespace::function::name).
>>> >>>>>>>>>> It looks like it could be
optimized and finally require less
>>> time, but it still would be a noticeable
>>> >>>>>>>>>> part of the overall time.
>>> >>>>>>>>>>
>>> >>>>>>>>>> If dsymutil starts to
check for the structural equivalence,
>>> then the process would be even more slowly.
>>> >>>>>>>>>> So, If instead of
comparing types structure, there would be
>>> checked single hash-id - then this process
>>> >>>>>>>>>> would also be faster.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Thus I think using hash-id
to compare types would allow to
>>> make current implementation faster and would
>>> >>>>>>>>>> allow handling incomplete
types by DWARFLinker without
>>> massive performance degradation also.
>>> >>>>>>>>>>
>>> >>>>>>>>>>>> But the context is
known when types are generated. So, no
>>> need to spent the time analyzing it.
>>> >>>>>>>>>>>> If types could be
compared without analyzing context, then
>>> Dwarf-aware linker would work faster.
>>> >>>>>>>>>>>> That is just an
idea(not for immediate implementation): If
>>> types would be stored in some "type table"
>>> >>>>>>>>>>>> (instead of COMDAT
section group) and could be accessed
>>> through hash-id(like type units
>>> >>>>>>>>>>>> - then it would be
the solution requiring fewer bits to
>>> store but allowing to compare types
>>> >>>>>>>>>>>> by hash-id(not
analysing context).
>>> >>>>>>>>>>>> In this case, size
increasing would be small. And
>>> processing time could be done faster.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> this is just an
idea and could be discussed separately from
>>> the problem of integrating of D74169.
>>> >>>>>>>>>>>>>> 6.
-flto=thin
>>> >>>>>>>>>>>>>>      That
problem was described in this review
>>> https://reviews.llvm.org/D54747#1503720. It also exists in
>>> >>>>>>>>>>>>>> current
DWARFLinker/dsymutil implementation. I think that
>>> problem should be discussed more: it could
>>> >>>>>>>>>>>>>> probably
be fixed by avoiding generation of such
>>> incomplete declaration during thinlto,
>>> >>>>>>>>>>>>>> That would
be costly to produce extra/redundant debug
>>> info in ThinLTO - actually ThinLTO could be doing
>>> >>>>>>>>>>>>>> more to
reduce that redundancy early on (actually
>>> removing definitions from some llvm Modules if the type
>>> >>>>>>>>>>>>>> definition
is known to exist in another Module, etc)
>>> >>>>>>>>>>>>> I don't
know if it's a problem since that patch was
>>> reverted.
>>> >>>>>>>>>>>> Yes. That patch
was reverted, but this patch(D74169) has
>>> the same problem.
>>> >>>>>>>>>>>> if D74169 would be
applied and --gc-debuginfo used then
>>> structure type
>>> >>>>>>>>>>>> definition would
be removed.
>>> >>>>>>>>>>>> DWARFLinker could
handle that case - "removing definitions
>>> from some llvm Modules if the type
>>> >>>>>>>>>>>> definition is
known to exist in another Module".
>>> >>>>>>>>>>>> i.e. DWARFLinker
could replace the declaration with the
>>> definition.
>>> >>>>>>>>>>>> But that problem
could be more easily resolved when debug
>>> info is generated(probably without
>>> >>>>>>>>>>>> significant
increase of debug info size):
>>> >>>>>>>>>>>> Here we have:
>>> >>>>>>>>>>>>
DW_TAG_compile_unit(0x0000000b) - compile unit containing
>>> concrete instance for function "f".
>>> >>>>>>>>>>>>
DW_TAG_compile_unit(0x00000073) - compile unit containing
>>> abstract instance root for function "f".
>>> >>>>>>>>>>>>
DW_TAG_compile_unit(0x000000c1) - compile unit containing
>>> function "f" definition.
>>> >>>>>>>>>>>> Code for function
"f" was deleted. gc-debuginfo deletes
>>> compile unit DW_TAG_compile_unit(0x000000c1)
>>> >>>>>>>>>>>> containing
"f" definition (since there is no corresponding
>>> code). But it has structure "Foo" definition
>>> >>>>>>>>>>>>
DW_TAG_structure_type(0x0000011e) referenced from
>>> DW_TAG_compile_unit(0x00000073)
>>> >>>>>>>>>>>> by declaration
DW_TAG_structure_type(0x000000ae). That
>>> declaration is exactly the case when definition
>>> >>>>>>>>>>>> was removed by
thinlto and replaced with declaration.
>>> >>>>>>>>>>>> Would it cost too
much if type definition would not be
>>> replaced with declaration for "abstract instance root"?
>>> >>>>>>>>>>>> The number of
concrete instances is bigger than number of
>>> abstract instance roots.
>>> >>>>>>>>>>>> Probably, it would
not be too costly to leave definition in
>>> abstract instance root?
>>> >>>>>>>>>>
>>> >>>>>>>>>>>> Alternatively,
Would it cost too much if type definition
>>> would not be replaced with declaration when
>>> >>>>>>>>>>>> declaration
references type from not used function? (lto
>>> could understand that concrete function is not used).
>>> >>>>>>>>>>> I don't follow
this example - could you provide a small
>>> concrete test case I could reproduce?
>>> >>>>>>>>>> I would provide a test
case if necessary. But it looks like
>>> this issue is finally clear, and you already commented on that.
>>> >>>>>>>>>>
>>> >>>>>>>>>>> Oh, I guess this is
happening perhaps because ThinLTO can't
>>> know for sure that a standalone
>>> >>>>>>>>>>> definition of
'f' won't be needed - so it produces one in
>>> case one of the inlining opportunities
>>> >>>>>>>>>>> doesn't end up
inlining. Then it turns out all calls got
>>> inlined, so the external definition wasn't needed.
>>> >>>>>>>>>>> Oh, you're
suggesting that these 3 CUs got emitted into one
>>> object file during LTO, but that DWARFLinker
>>> >>>>>>>>>>> drops a CU without any
code in it - even though... So far as
>>> I know, in LTO, LLVM directly references
>>> >>>>>>>>>>> types across units if
the CUs are all emitted in the same
>>> object file. (and if they weren't in the same
>>> >>>>>>>>>>> object file - then the
abstract_origin couldn't be pointing
>>> cross-CU).
>>> >>>>>>>>>>> I guess some basic
things to say:
>>> >>>>>>>>>>> With ThinLTO, the
concrete/standalone function definition is
>>> emitted in case some call sites don't end up
>>> >>>>>>>>>>> being inlined. So we
know it'll be emitted (but might not be
>>> needed by the actual linker)
>>> >>>>>>>>>>> ANy number of inline
calls might exist - but we shouldn't
>>> put the type information into those, because
>>> >>>>>>>>>>> they aren't
guaranteed to emit it (if the inline function
>>> gets optimized away, there would be nothing to
>>> >>>>>>>>>>> enforce the type being
emitted) - and even if we forced the
>>> type information to be emitted into one
>>> >>>>>>>>>>> object file that has
an inline copy of the function -
>>> there's no guarantee that object file will get linked in
either.
>>> >>>>>>>>>>> So, no, I don't
think there's much we can do to keep the
>>> size of object files down, while guaranteeing
>>> >>>>>>>>>>> the type information
will be emitted with the usual linker
>>> semantics.
>>> >>>>>>>>>> Then dsymutil/DWARFLinker
could be changed to handle
>>> that(though it would probably be not very efficient).
>>> >>>>>>>>>> If thinlto would
understand that function is not used
>>> finally(and then must not contain referenced type definition),
>>> >>>>>>>>>> then this situation could
be handled more effectively.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Thank you, Alexey.
>>> >>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
_______________________________________________
>>> >>>>>>>>>>>> LLVM Developers
mailing list
>>> >>>>>>>>>>>> llvm-dev at
lists.llvm.org
>>> >>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Alexey Lapshin via llvm-dev

2020-Aug-10 12:15 UTC

[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

Hi Jonas,

Thank you for the comments, please find my answers below...

On 06.08.2020 20:39, Jonas Devlieghere wrote:
> Hi Alexey,
>
> I should've looked at this earlier. I went through the thread again and
> I've made some comments, mostly from the dsymutil point of view.
>
> > Current DWARFEmitter/DWARFStreamer has an implementation for DWARF
> > generation, which does not support DWARF5(only debug_names table). At
> > the same time, there already exists code in
> > CodeGen/AsmPrinter/DwarfDebug.h, which implements most of DWARF5. It
> > seems that DWARFEmitter/DWARFStreamer should be rewritten using
> > DwarfDebug/DwarfFile. Though I am not sure whether it would be easy to
> > re-use DwarfDebug/DwarfFile. It would probably be necessary to separate
> > some intermediate level of DwarfDebug/DwarfFile.
>
> These classes serve very different purposes. Last time I looked at them
> there was very little overlap in functionality. In the compiler we're
> mostly concerned with generating the DWARF, while in dsymutil we try to
> copy everything we don't need to parse, and fix up what we have to. I
> don't want to say it's not possible, but I think supporting DWARF5 in
> those classes is going to be a lot less work than trying to reuse the
> CodeGen variants.

I agree, in its current state it would be less work to write a separate
implementation than to reuse the CodeGen variants. The bad thing is that
in such a case there is a lot of code duplication:

DwarfStreamer::emitUnitRangesEntries
DwarfDebug::emitDebugARanges
EmitGenDwarfAranges
DWARFYAML::emitDebugAranges

Supporting a new standard would require rewriting or modifying all of
these places. In an ideal world, having a single implementation for DWARF
generation would allow changing one place and getting the benefit
everywhere. Probably the CodeGen classes could be rewritten, and then it
would be useful to write them with two use cases in mind: generation from
scratch and copying/updating existing data. In the end there would be a
single implementation that could be reused in many places. Though it is
indeed a lot of work.
>
> > Measurements show that it is spent ~10 sec in
> > llvm::StringMapImpl::LookupBucketFor(). The problem is that the same
> > strings, again and again, are added to the string pool. Two attributes
> > having the same string value would be analyzed (hash calculated) and
> > searched inside the string pool. Even if these strings are already in
> > string table(DW_FORM_strp, DW_FORM_strx). The process could be
> > optimized for string tables. So that if some string from the string
> > table were accessed previously then, it would keep a reference into the
> > string pool. This would eliminate a lot of string pool searches.
>
> I'm not sure I fully understand the optimization, but I'd love to speed
> this up, if only for dsymutil's sake. I'd love to talk about this in a
> separate thread or offline.

The measurements show that quite a lot of time is spent in
llvm::StringMapImpl::LookupBucketFor(), i.e. searching inside the string
pool takes a significant amount of time. The idea of the optimization was
to reduce the number of string pool searches by remembering previous
results. The DW_FORM_strp and DW_FORM_strx forms do not hold the string
itself but reference a string in a separate table (by offset or by
index). Currently, if the same DW_FORM_strp/DW_FORM_strx string is
referenced several times, there is one string pool search (one call to
llvm::StringMapImpl::LookupBucketFor()) per reference. If the position in
the pool were remembered for the index at the first reference, it would
not be necessary to call llvm::StringMapImpl::LookupBucketFor() for the
subsequent ones.
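
As a rough illustration of the caching idea above, here is a toy Python
model. It is not the DWARFLinker code: StringPool, CachingResolver and the
lookups counter are hypothetical names invented for this sketch, and the
input string table is assumed to be a plain NUL-separated byte string
addressed by offset (the DW_FORM_strp case).

```python
class StringPool:
    """Toy stand-in for the output string pool: maps string -> output offset."""

    def __init__(self):
        self._offsets = {}       # string -> offset in the output table
        self._next_offset = 0
        self.lookups = 0         # number of hashed searches (the measured cost)

    def get_offset(self, s: str) -> int:
        self.lookups += 1        # every call hashes the string and searches the pool
        if s not in self._offsets:
            self._offsets[s] = self._next_offset
            self._next_offset += len(s.encode()) + 1   # entry is NUL-terminated
        return self._offsets[s]


class CachingResolver:
    """Remembers the pool offset for every input-table offset already resolved."""

    def __init__(self, pool: StringPool, input_table: bytes):
        self._pool = pool
        self._table = input_table
        self._cache = {}         # input offset -> output pool offset

    def resolve(self, input_offset: int) -> int:
        cached = self._cache.get(input_offset)
        if cached is not None:
            return cached        # no string hashing and no pool search
        end = self._table.index(b"\0", input_offset)
        s = self._table[input_offset:end].decode()
        out = self._pool.get_offset(s)
        self._cache[input_offset] = out
        return out


# Three attributes reference "main" (input offset 0), one references "foo" (offset 5).
table = b"main\0foo\0"
pool = StringPool()
resolver = CachingResolver(pool, table)
offsets = [resolver.resolve(o) for o in (0, 0, 5, 0)]
print(offsets, pool.lookups)     # four references, but only two pool searches
```

The cache is keyed by a small integer instead of by string contents, so a
real implementation could use a flat array indexed by string-table index
rather than a hash map.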

But prototyping of that idea did not show any worthwhile performance
improvement.

Some small performance improvement could be achieved if the string pools
used llvm::hash_value(StringRef S) instead of llvm::djbHash().

> > Currently, all object files are analyzed sequentially and cloned
> > sequentially. Cloning is started in parallel with analyzing. That
> > scheme could be changed: analyzing and cloning could be done in
> > parallel for each object file. That requires refactoring of DWARFLinker
> > and making string pools and DeclContextTree thread-safe.
>
> I'm less familiar with the way that LLD uses the DWARFOptimizer but 
> this is
> not possible for dsymutil as it is trying to deduplicate DIEs from 
> different
> compile units.Right. dsymutil is trying to de-duplicate DIEs from different
compile units. That, probably, does not avoid multi-thread implementation:

1. DeclContextTree.getChildDeclContext() should be made thread-safe.
     Thus, even if CUs were processed in parallel, DIEs could still be
     de-duplicated based on DeclContext.
2. UniquingStringPool and OffsetsStringPool should also be made thread-safe.
3. Since compilation units would be processed in parallel,
     the size of a compilation unit would not be known until it is
     fully processed. That means that all compilation unit references
     should be patched after the CU content is generated, in the same manner
     as forward references are currently patched (fixupForwardReferences).
4. DWARFStreamer provides a sequential interface. Instead of a single
     stream as the output, a separate output could be generated for each CU.
     They would be glued together at the end.

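
A rough sketch of points 3 and 4 in Python, under stated assumptions: the
unit layout (a dict with "payload" and "refs"), the function names and the
4-byte little-endian reference encoding are all hypothetical and are not the
DWARFLinker or DWARFStreamer interfaces. Each unit is emitted into its own
buffer with placeholders for cross-unit references. Once all sizes are known,
the placeholders are patched and the buffers are concatenated. The shared,
thread-safe pools from points 1 and 2 are left out.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

PLACEHOLDER = b"\x00\x00\x00\x00"  # 4-byte slot, patched after all sizes are known


def clone_unit(unit):
    """Emit one unit into its own buffer.

    Returns (data, fixups); each fixup is
    (position in data, target unit index, offset inside the target unit).
    """
    out = bytearray(unit["payload"])
    fixups = []
    for target_unit, target_local_offset in unit["refs"]:
        fixups.append((len(out), target_unit, target_local_offset))
        out += PLACEHOLDER
    return bytes(out), fixups


def link_units(units, workers=4):
    # Units are cloned independently; Executor.map keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(clone_unit, units))

    # Unit start offsets can be computed only now that every size is known.
    starts = []
    position = 0
    for data, _ in results:
        starts.append(position)
        position += len(data)

    # Patch cross-unit references, then glue the per-unit outputs together.
    output = bytearray()
    for data, fixups in results:
        buffer = bytearray(data)
        for at, target_unit, local_offset in fixups:
            struct.pack_into("<I", buffer, at, starts[target_unit] + local_offset)
        output += buffer
    return bytes(output), starts
```

Python threads will not give a real speedup here. The sketch is only meant
to show the ordering: independent cloning first, then offset assignment,
then patching and concatenation.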
> > I think improving dsymutil is a valuable thing. Though there are several
> > directions which might be considered to make it more robust:
> >
> > 1. support of latest DWARF - DWARF5/DWARF64...
>
> Strong +1 on DWARF5. I haven't had the bandwidth yet to really look at this.
> Right now we can't find (at least some) relocations so we bail out. I'd need
> to fix that to assess the current state of things and figure out how much
> work would be needed.
>
> I don't think anything in LLVM supports generating DWARF64 though.
>
> > 2. implement multi-threaded execution.
>
> See my earlier comment. At least for the dsymutil case, the current approach
> is the best we can do, but I'd love to be proven wrong. :-)
>
> > 3. support of split DWARF.
> > 4. implement dsymutil for non-darwin platform.
>
> These two seem to go together. Given the work you did to split off the DWARF
> optimization part I think we're closer to this than ever. Thanks again for
> doing that.
>
> > We considered three options:
> >
> > 1. add new functionality into dsymutil. So that dsymutil behaves
> > differently on a non-darwin platform and supports another set of
> > command-line options.
> >
> > 2. add new functionality into llvm-objcopy. llvm-objcopy already supports
> > various binary objects formats(MachO,ELF,COFF,wasm). It also has several
> > options to work with debug-info.
> >
> > 3. create new utility llvm-dwarfutil which would implement the above
> > functionality and reuse DWARFLinker(extracted from dsymutil) library and
> > new library ObjectCopy(extracted from llvm-objcopy).
> >
> > So far our preference is number three. The reason for this is that separate
> > utility specifically working with debug info looks as good separation of
> > concepts. Adding another behavior to dsymutil looks not very good.
>
> In its current state dsymutil itself is a pretty small tool on top of the
> DWARFOptimizer/Linker. I'm curious what the benefits of another tool are
> compared to a different frontend (like objcopy) for MachO and ELF. It seems
> like that would allow for separation of concerns, while still being able to
> share common code without having to push it all the way up into LLVM.

My concern is that this tool would have different source data and a
different set of options. Handling a different set of input data and a
different set of options means writing another frontend anyway, so it would
probably be better not to make dsymutil more complex but to create another
small tool. But if extending dsymutil looks OK, I am OK with it.
Let's discuss this approach within the proposal thread.

>
> > Extending the already rich interface of llvm-objcopy looks also not very
> > good. Having in mind that actual implementation would be shared by
> > libraries, the separate utility, working specifically with debug info,
> > looks like the right choice. That is our current idea.
>
> > My personal thought would be that extending dsymutil should be ok as the
> > functionality goes well with everything else dsymutil does (other than not
> > support ELF which the dsymutil maintainers are on board with last I
> > checked). That said, I definitely think a write-up will be helpful. No
> > matter what I support extracting all of the behavior into libraries and
> > using that somewhere :)
>
> Ha, so basically what I was trying to say above.
>
> I look forward to seeing the proposal!

Yep, I will publish it soon.

Thank you, Alexey.
>
> Cheers,
> Jonas
>
>
> On Tue, Aug 4, 2020 at 11:33 PM Eric Christopher <echristo at gmail.com> wrote:
>
>     Hi Alexey,
>
>     On Mon, Aug 3, 2020 at 8:32 AM Alexey Lapshin
>     <avl.lapshin at gmail.com> wrote:
>
>         Hi Eric, please
>
>         On 31.07.2020 22:02, Eric Christopher wrote:
>>         Hi Alexey,
>>
>>         On Fri, Jul 31, 2020 at 4:02 AM Alexey Lapshin via llvm-dev
>>         <llvm-dev at lists.llvm.org> wrote:
>>
>>
>>             On 28.07.2020 19:28, David Blaikie wrote:
>>             > On Tue, Jul 28, 2020 at 8:55 AM Alexey Lapshin
>>             > <avl.lapshin at gmail.com> wrote:
>>             >>
>>             >> On 28.07.2020 10:29, David Blaikie via llvm-dev wrote:
>>             >>> On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
>>             >>> <alapshin at accesssoftek.com> wrote:
>>             >>>>>>>>>>>> This idea goes in another direction than fragmenting dwarf
>>             >>>>>>>>>>>> using elf sections&tricks. It seems to me that the cost of fragmenting is too high.
>>             >>>>>>>>>>> I tend to agree - but I'm sort of leaning towards trying to use object
>>             >>>>>>>>>>> features as much as possible, then implementing just enough custom
>>             >>>>>>>>>>> handling in the linker to recoup overhead, etc. (eg: add some kind of
>>             >>>>>>>>>>> small header/brief description that makes it easy for the linker to
>>             >>>>>>>>>>> slice-and-dice - but hopefully a domain-specific such header can be a
>>             >>>>>>>>>>> bit more compact than the fully general ELF form)
>>             >>>>>>>>>> I think this indeed should be implemented and evaluated.
>>             >>>>>>>>>> So that various approaches could be compared.
>>             >>>>>>>>>>
>>             >>>>>>>>>>>> It is not only the sizes of structures describing fragments but also the complexity
>>             >>>>>>>>>>>> of tools that should be taught to work with fragmented DWARF.
>>             >>>>>>>>>>>> (f.e. llvm-dwarfdump applied to object file should be able to read fragmented DWARF,
>>             >>>>>>>>>>>> but applied to linked executable it should work with non-fragmented DWARF).
>>             >>>>>>>>>>>> That idea is for the tool which works the same way as dsymutil ODR.
>>             >>>>>>>>>>>>
>>             >>>>>>>>>>>> I will shortly describe the idea of making DWARF be easier processed by dsymutil/DWARFLinker:
>>             >>>>>>>>>>>>
>>             >>>>>>>>>>>> The idea is to have only one "type table" per object file(special section .debug_types_table).
>>             >>>>>>>>>>>> This "type table" would contain all types.
>>             >>>>>>>>>>>> There could be a special type of reference - type_offset - that offset points into the type table.
>>             >>>>>>>>>>>> Basic types could always be placed into the start of "type table" thus, offsets to basic types
>>             >>>>>>>>>>>> most often would be 1 byte. There also would be a special kind of reference - reference inside the type.
>>             >>>>>>>>>>>> Type units sig8 system - would not be used to reference types.
>>             >>>>>>>>>>>>
>>             >>>>>>>>>>>> Types deduplication is assumed to be done, not by linker mechanism for COMDAT,
>>             >>>>>>>>>>>> but by a tool like dsymutil. This tool would create resulting .debug_types_table by putting there
>>             >>>>>>>>>>>> types from source .debug_types_table-s. Only one copy of the type would be placed into the
>>             >>>>>>>>>>>> resulting table. All references pointing to the deleted copy would be corrected to point
>>             >>>>>>>>>>>> to the single copy inside "type table". (that is how dsymutil works currently)
>>             >>>>>>>>>>> ^ that's the step that's probably a bit expensive for a general-use
>>             >>>>>>>>>>> tool - it implies parsing all the DWARF to find those references and
>>             >>>>>>>>>>> rewrite them, I think. For a high-performance solution that could be
>>             >>>>>>>>>>> run by the linker I think it'd be necessary to have a solution that
>>             >>>>>>>>>>> doesn't involve parsing all the DIEs.
>>             >>>>>>>>>> According to the current dsymutil processing,
>>             >>>>>>>>>> exactly this process is not the most time-consuming.
>>             >>>>>>>>>> That could be done relatively fast.
>>             >>>>>>>>> Fair enough - though I'd still imagine any solution that involves
>>             >>>>>>>>> parsing all the DIEs still wouldn't be fast enough (maybe an order of
>>             >>>>>>>>> magnitude faster than the current solution even - but that's still,
>>             >>>>>>>>> what, 6 or 7x slower than linking without the feature?) for most users
>>             >>>>>>>>> to consider it a good trade-off.
>>             >>>>>>>> It seems to me that even the current 6x-7x slowdown could be useful.
>>             >>>>>>>> Users who already use dsymutil or llvm-dwp(assuming DWARFLinker
>>             >>>>>>>> would be taught to work with a split dwarf) tools spend this time and,
>>             >>>>>>>> in some scenarios, waste disk space by intermediate files.
>>             >>>>>>> FWIW, dwp (llvm-dwp hasn't really been optimized compared to binutils
>>             >>>>>>> dwp) is designed to be very quick - by not needing to do a lot of
>>             >>>>>>> parsing/fixups. Which, yes, means larger output files than would be
>>             >>>>>>> possible with more parsing/etc. It also doesn't take any input from
>>             >>>>>>> the linker (so it can run in parallel with the linker) - so it can't
>>             >>>>>>> remove dead subprograms. Given Google's the major (perhaps only
>>             >>>>>>> significant?) user of Split DWARF - I can say that the needs don't
>>             >>>>>>> necessarily overlap well with something that would take significantly
>>             >>>>>>> longer to run or use significantly more memory. Faster/cheaper/with
>>             >>>>>>> somewhat bigger output files is probably the right tradeoff for
>>             >>>>>>> Google's use case, at least.
>>             >>>>>>>
>>             >>>>>>> I imagine Apple's use for dsymutil is somewhat similar - it's not used
>>             >>>>>>> in the iterative development cycle, only in final releases - well,
>>             >>>>>>> maybe their situation is more "neutral" - not a major pain point in
>>             >>>>>>> any case I'd guess.
>>             >>>>>>>
>>             >>>>>>>
>>             >>>>>> I see. FWIW, Comparison splitdwarf+dwp and DWARFLinker from lld:
>>             >>>>>>
>>             >>>>>> 1. split-dwarf+llvm-dwp = linking time for clang 6 sec,
>>             >>>>>>       generating time for .dwp 53 sec, clang=997M clang.dwp=1.1G.
>>             >>>>> FWIW, llvm-dwp is not very well optimized (which is to say: it is not
>>             >>>>> optimized), binutils dwp might be a better comparison (& even that
>>             >>>>> doesn't have the parallelism & some potential further memory savings
>>             >>>>> that lld has that we could take advantage of in a dwp-like tool)
>>             >>>>>
>>             >>>>> What build mode was the clang binary built in? Optimized or unoptimized?
>>             >>>> right, that is unoptimized build with -ffunction-sections.
>>             >>>>
>>             >>>>>> 2. DWARFLinker from lld = linking time for clang 72 sec, clang=760M.
>>             >>> And this is without Split DWARF? Without linker DWARF compression? -
>>             >>> that seems quite a bit surprising, that the deduplication of DWARF
>>             >>> could fit into less space than the wasted/reclaimed space in ranges (&
>>             >>> line)?
>>             >> that was without split dwarf, without linker compression.
>>             >>
>>             >>> Could you double check these numbers & provide a clearer summary?
>>             >> sure, I would re-check it.
>>             >>
>>             >>> Here's my attempt at numbers (all with function-sections+gc-sections)...
>>             >>>
>>             >>> Split DWARF tests didn't seem meaningful - gc-debuginfo + split DWARF
>>             >>> seemed to drop all the debug info (except gdb_index) so wasn't
>>             >>> working/comparison wasn't meaningful for Apples to Apples, but
>>             >>> included it for comparing gc'd non-split to non-gc'd split (disabled
>>             >>> gnu-pubnames/gdb-index (-gsplit-dwarf -gno-gnu-pubnames) (which turns
>>             >>> on by default with Split DWARF because gdb needs it - but a bit of an
>>             >>> unfair comparison without turning on gnu-pubnames/gdb-index in other
>>             >>> build modes too, since it... /shouldn't/ be necessary) which might've
>>             >>> been a factor in the data you were looking at)
>>             >> that might be the case. i.e. clang=997M for split dwarf(from my previous
>>             >> measurement) might include gnu-pubnames.
>>             >>
>>             >> would recheck it and if that is the case then it is an unfair comparison.
>>             >>
>>             >>
>>             >> My point was that "DWARFLinker from lld" takes less space than singleton
>>             >> split dwarf file+.dwp file.
>>             >>
>>             >> for -O0 uncompressed:
>>             >>
>>             >> - .dwp took 1.1G(if I built it correctly), singleton clang(from your
>>             >> measurements) 566 MB
>>             >>
>>             >>      overall 1.6G.
>>             > Oh, yeah, even if there are some measurement issues, linked executable
>>             > + .dwp is going to be larger than a linked executable using non-split
>>             > DWARF (in v5), since v5 uses all the same representations as non-split
>>             > DWARF, and split DWARF adds the indirection overhead of a split file,
>>             > etc.
>>             >
>>             > Even without DWARF linking, it's true that split DWARF has overhead
>>             > (dwp+executable will be larger than executable non-split).
>>             >
>>             > But maybe we've ended up down a bit of a tangent in any case.
>>             >
>>             > Trying to bring this back to "should this be committed to lld" seems
>>             > valuable, and I'm not sure what the right criteria are for that.
>>             I think it would be useful to do "removing obsolete debug info"
>>             in the linker. First thing is that it would be the fastest way(no need
>>             to copy data/create temp files/build address map...) Second thing
>>             is that it would be a good separation of concepts. All debug info
>>             processing, currently done in the linker(gdb_index, upcoming
>>             debug_names), could be moved into separate library processing
>>             debug info. When gdb_index/debug_names should be built without
>>             "removing of obsolete debug info" it would have the same
>>             performance results as it currently has.
>>
>>             We decided to give the idea of "removing of obsolete debug info"
>>             another try and are going to implement it as a separate utility
>>             working with built binary. Making it to be multi-thread would
>>             probably show better performance results and then it could
>>             probably be considered as acceptable to use from the linker.
>>
>>
>>         I'm quite interested in this direction. One thought I had was
>>         to incorporate such a library into dsymutil but with support
>>         for ELF. If you get a proposal written up I'd love to take a
>>         look and comment.
>>
>
>         yes, I would share the proposal in a separate thread within a
>         week or two.
>
>
>     Excellent, thanks :)
>
>         Shortly: we decided to move in slightly other direction than
>         adding this functionality
>         into dsymutil. Though if there is a preference to implement it
>         as part of dsymutil
>         we are OK to do this way.
>
>
>     I have a vague preference since a lot of functionality already
>     exists there on one platform and extending that seems straight
>     forward, however...
>
>         In its first version, this new utility supposed to receive
>         built binary with debug info
>         as input(with the new marking for references to removed code
>         sections -1/-2
>         -https://reviews.llvm.org/D84825) and create a new binary with
>         removed obsolete
>         debug info according to the above marking. In the next
>         versions, it could be extended
>         with other debug info optimizations tasks. F.e. generation new
>         index tables, debug info
>         optimizing... etc...
>
>         We considered three options:
>
>         1. add new functionality into dsymutil. So that dsymutil
>         behaves differently
>             on a non-darwin platform and supports another set of
>         command-line options.
>
>         2. add new functionality into llvm-objcopy. llvm-objcopy
>         already supports various
>              binary objects formats(MachO,ELF,COFF,wasm). It also has
>         several options
>              to work with debug-info.
>
>         3. create new utility llvm-dwarfutil which would implement the
>         above functionality
>              and reuse DWARFLinker(extracted from dsymutil) library
>         and new library
>              ObjectCopy(extracted from llvm-objcopy).
>
>         So far our preference is number three. The reason for this is
>         that separate
>         utility specifically working with debug info looks as good
>         separation of concepts.
>         Adding another behavior to dsymutil looks not very good.
>         Extending the already
>         rich interface of llvm-objcopy looks also not very good.
>         Having in mind that actual
>         implementation would be shared by libraries, the separate
>         utility, working specifically
>         with debug info, looks like the right choice. That is our
>         current idea.
>
>         I would publish the proposal shortly to discuss it.
>
>
>
>     These are solid arguments - in particular, I agree with not
>     extending llvm-objcopy :)
>
>     +Jonas Devlieghere <mailto:jonas at devlieghere.com> and +Adrian
>     Prantl <mailto:aprantl at apple.com> for dsymutil comments.
>
>     My personal thought would be that extending dsymutil should be ok
>     as the functionality goes well with everything else dsymutil does
>     (other than not support ELF which the dsymutil maintainers are on
>     board with last I checked). That said, I definitely think a
>     write-up will be helpful. No matter what I support extracting all
>     of the behavior into libraries and using that somewhere :)
>
>     Thanks!
>
>     -eric
>
>         Thank you, Alexey.
>>         Thanks!
>>
>>         -eric
>>
>>             Alexey.
>>
>>             >
>>             > Ray's the best person to weigh in on that. My 2c is that I think it
>>             > probably is worthwhile, even just as an experiment, assuming it's not
>>             > too intrusive to lld.
>>             >
>>             >> - The "DWARFLinker from lld" 820 MB(from your measurements).
>>             >>
>>             >>
>>             >> So "DWARFLinker from lld" looks two times better.
>>             >>
>>             >>
>>             >> Anyway, thank you for pointing me to possible mistake. I would recheck
>>             >> it and update results.
>>             >>
>>             >>
>>             >> Alexey.
>>             >>
>>             >>
>>             >>> * -O0: (baseline, just using strip -g: 356 MB)
>>             >>>     * compressed: 25% smaller with gc-debuginfo (481 MB / 641 MB) (407 MB split/non-gc)
>>             >>>     * uncompressed: 30% smaller (820 MB / 1.2 GB) (566 MB split/non-gc)
>>             >>> * -O3: (baseline: 116 MB)
>>             >>>     * compressed: 16% smaller (361 MB / 462 MB) (283 MB split/non-gc)
>>             >>>     * uncompressed: 22% smaller (1022 MB / 1.2 GB) (156 MB split/non-gc)
>>             >>>
>>             >>>
>>             >>>
>>             >>>
>>             >>> On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
>>             >>> <alapshin at accesssoftek.com> wrote:
>>             >>>>>> I see. FWIW, Comparison splitdwarf+dwp and DWARFLinker from lld:
>>             >>>>>>
>>             >>>>>> 1. split-dwarf+llvm-dwp = linking time for clang 6 sec,
>>             >>>>>>       generating time for .dwp 53 sec, clang=997M clang.dwp=1.1G.
>>             >>>>> FWIW, llvm-dwp is not very well optimized (which is to say: it is not
>>             >>>>> optimized), binutils dwp might be a better comparison (& even that
>>             >>>>> doesn't have the parallelism & some potential further memory savings
>>             >>>>> that lld has that we could take advantage of in a dwp-like tool)
>>             >>>>>
>>             >>>>> What build mode was the clang binary built in? Optimized or unoptimized?
>>             >>>> right, that is unoptimized build with -ffunction-sections.
>>             >>>>
>>             >>>>>> 2. DWARFLinker from lld = linking time for clang 72 sec, clang=760M.
>>             >>>>> It does seem a tad strange that the clang binary would be smaller
>>             >>>>> non-split with DWARF linking than it was split. Though I could imagine
>>             >>>>> this might be possible in an optimized build (where debug_ranges
>>             >>>>> become quite relatively expensive in the .o file contribution with
>>             >>>>> Split DWARF)
>>             >>>>> Could you compare the section sizes between these two clang binaries, perhaps?
>>             >>>> .debug_ranges is three times bigger and .debug_line is twice bigger.
>>             >>>>
>>             >>>>>>>> Thus if they would use this LLD feature in its current state
>>             >>>>>>>> - they would still receive benefits.
>>             >>>>>>>>
>>             >>>>>>>> Speaking of performance results - LLD is a multi-thread linker;
>>             >>>>>>>> it handles sections in parallel. DWARFLinker generates DWARF using
>>             >>>>>>>> AsmPrinter which is a stream - so it could make resulting DWARF only
>>             >>>>>>>> continuously. It is not surprising that the parallel solution works faster.
>>             >>>>>>>> Making DWARFLinker truly multi-threaded would probably allow us
>>             >>>>>>>> to make slowdown to be at 2x-4x range.
>>             >>>>>>> *nod* that's still a really expensive link - but I understand that's a
>>             >>>>>>> suitable tradeoff for your users
>>             >>>>>>>
>>             >>>>>> Btw, 2x or 7x is for pure linking time. Overall compilation slowdown
>>             >>>>>> is not so significant. Building LLVM codebase has only 20% slowdown.
>>             >>>>> Understood - that's still quite significant to most users, I'd imagine.
>>             >>>> I see.
>>             >>>>
>>             >>>>>>>>>> Anyway, I think the dsymutil approach is still valuable, and it
>>             >>>>>>>>>> would be useful to optimize it.
>>             >>>>>>>>>> Do you think it would be useful to make dsymutil/DWARFLinker truly multi-thread?
>>             >>>>>>>>>> (To make dsymutil/DWARFLinker able to process each object file in a separate thread)
>>             >>>>>>>>> Perhaps - that I'd probably leave up to the folks who are more
>>             >>>>>>>>> invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get it
>>             >>>>>>>>> integrated into llvm-dwp and then I'll be interested in getting as
>>             >>>>>>>>> much performance out of it as lld - so multithreading and things would
>>             >>>>>>>>> be on the books.
>>             >>>>>>>> I think improving dsymutil
is a valuable thing.
>>             >>>>>>>> Though there are several
directions which might
>>             be considered
>>             >>>>>>>> to make it more robust:
>>             >>>>>>>>
>>             >>>>>>>> 1. support of latest DWARF
- DWARF5/DWARF64...
>>             >>>>>>> I expect/thought some of the Apple folks had
>>             >>>>>>> already worked on DWARF5 support?
>>             >>>>>>> DWARF64 - that's been
around for a while, and
>>             just hasn't been needed
>>             >>>>>>> by LLVM users thus far, it
seems (until recently
>>             - where some
>>             >>>>>>> developers have started
working on that)
>>             >>>>>> The debug_names table is already implemented, but
>>             >>>>>> debug_rnglists, debug_loclists and type units are not implemented yet.
>>             >>>>> Superficially, type units wouldn't
be on the list
>>             of features (like
>>             >>>>> DWARF64 - it's optional) I'd
try to support in
>>             dsymutil - since their
>>             >>>>> size overhead is more justified for a
>>             DWARF-agnostic linker that's
>>             >>>>> using comdat groups. With a
DWARF-aware linker I'd
>>             be specifically
>>             >>>>> hoping to avoid using type units to
help
>>             >>>>>> The thing which should probably be changed is that dsymutil
>>             >>>>>> should not have its own version of the code generating DWARF tables.
>>             >>>>>> It should call the already existing DWARF5/DWARF64 implementations.
>>             >>>>>> Then dsymutil would always use the latest DWARF generators.
>>             >>>>> Possibly - I don't know what the
architectural
>>             tradeoffs for that look
>>             >>>>> like - I'd imagine DWARFLinker has
sufficiently
>>             different
>>             >>>>> needs/tradeoffs than LLVM's DWARF
generation code
>>             (rewriting existing
>>             >>>>> DIEs compared to building new ones
from scratch,
>>             etc) that it might be
>>             >>>>> hard for them to share a lot of their
implementation.
>>             >>>> It is not easy, and would require some additions, but the benefit
>>             >>>> would be that the whole format implementation is in one place,
>>             >>>> so a change there would be reflected everywhere else. There are
>>             >>>> currently at least three implementations for
>>             >>>> .debug_ranges, .debug_aranges...
>>             >>>>
>>             >>>>
>>             >>>>>>>> 2. implement
multi-threaded execution.
>>             >>>>>>>> 3. support of split DWARF.
>>             >>>>>>> Maybe, though I'm still
not sure it'd be the
>>             right tradeoff -
>>             >>>>>>> especially if it involved
having to wait to run
>>             the .dwo merger (call
>>             >>>>>>> it DWARF-aware dwp, or
dsymutil with dwp support)
>>             until after the
>>             >>>>>>> linker ran.
>>             >>>>>>>
>>             >>>>>>>> 4. implement dsymutil for
non-darwin platform.
>>             >>>>>>> That's probably,
essentially (3), more-or-less.
>>             Split DWARF is
>>             >>>>>>> somewhat of a formalization of Apple's/MachO DWARF distribution model
>>             >>>>>>> (leave DWARF in files that aren't linked/use them from a debugger,
>>             >>>>>>> but also be able to merge them into some final file (dsym or dwp) for
>>             >>>>>>> archival purposes)
>>             >>>>>>>
>>             >>>>>>>> All of this is a massive
piece of work.
>>             >>>>>>>> Our original investment
was to solve two problems:
>>             >>>>>>>>
>>             >>>>>>>> 1. Overlapped address
ranges, which is currently
>>             close to being solved. Thank you for helping with that!
>>             >>>>>>> Yeah, again, sorry that's
taken quite so
>>             long/somewhat circuitous route.
>>             >>>>>>>
>>             >>>>>>>> 2. Size of debug info. That is still an issue, but we are
>>             >>>>>>>>      unsure whether we are ready to invest in solving all of problems
>>             >>>>>>>>      1-4 above, and how much the community is interested in it.
>>             >>>>>>> Fair, for sure - I don't
think you'd need to sign
>>             up to solve all of
>>             >>>>>>> them (don't think they
necessarily need solving).
>>             Potentially moving
>>             >>>>>>> the logic out into a separate
tool as Fangrui's
>>             considering - a
>>             >>>>>>> post-link DWARF optimizer,
rather than in-linker
>>             DWARF optimization.
>>             >>>>>>>
>>             >>>>>>> I really don't want to
give you the runaround
>>             like this - but multiple
>>             >>>>>>> times slower links is
something that seems pretty
>>             problematic for most
>>             >>>>>>> users, to the point of
weighing the
>>             maintainability of lld against the
>>             >>>>>>> convenience of having this
functionality
>>             in-linker rather than in a
>>             >>>>>>> post-link optimizer.
>>             >>>>>>>
>>             >>>>>>> (I know you've spoken a
bit before about your
>>             users needs - but if
>>             >>>>>>> it's possible, could you
explain (again :/) why
>>             they have such a
>>             >>>>>>> strong need for smaller DWARF?
While DWARF size
>>             is an ongoing concern
>>             >>>>>>> for many users (Google
certainly - hence the
>>             invention of Split DWARF,
>>             >>>>>>> use of type units and
compressed DWARF, etc) -
>>             usually it's in rather
>>             >>>>>>> large programs, but it sounds
like you're dealing
>>             with relatively
>>             >>>>>>> small ones (otherwise the
increase in link time,
>>             I'd imagine, would be
>>             >>>>>>> prohibitive for your users?)?
>>             >>>>>> We have many large programs and keep daily/nightly debug builds,
>>             >>>>>> which take a lot of disk space. Compilation time for these programs is long.
>>             >>>>>> The scenario is "compile once" (not compile-debug-compile-debug).
>>             >>>>>> So we think that a solution like dsymutil/DWARFLinker would not slow down
>>             >>>>>> the overall build significantly (see the numbers above for the
>>             >>>>>> LLVM codebase) and would allow us to reduce the disk space required to keep
>>             >>>>>> all of these builds.
>>             >>>>> Ah, OK - for archival purposes. So the
interactive
>>             developers wouldn't
>>             >>>>> necessarily be using this feature.
Makes sense -
>>             similar to dsymutil
>>             >>>>> and dwp, mostly used for archival purposes & you can debug straight
>>             >>>>> from .o/.dwos for interactive/iterative development.
>>             >>>>
>>             >>>>> In that case, it seems more likely
that a separate
>>             tool might suffice.
>>             >>>> Agreed: if the work on this continues, then it makes sense to
>>             >>>> do it as a separate tool and make it fast enough. If there is interest
>>             >>>> in it afterwards, it would probably be possible to return
>>             >>>> to the idea of calling it from the linker.
>>             >>>>
>>             >>>>> Also, out of curiosity - have you
tried just
>>             compressing the output
>>             >>>>> (-gz (I think that does the right
thing for the
>>             linker level
>>             >>>>> compression too, otherwise
>>             -Wl,-compress-debug-sections might do it))
>>             >>>>> or are you already doing that in
addition?
>>             >>>> sure. we use -Wl,-compress-debug-sections.
>>             >>>>
>>             >>>> Thank you, Alexey.
>>             >>>>
>>             >>>>>>> You mentioned that the
usability cost of
>>             >>>>>>> Split DWARF for your users was
too high (or high
>>             enough to justify
>>             >>>>>>> this alternative work of
DWARF-aware linking)?
>>             That all seems a bit
>>             >>>>>>> surprising to me - though I
understand the
>>             deployment issues of Split
>>             >>>>>>> DWARF do present some challenges to users in more heterogeneous
>>             >>>>>>> environments than
Google's... still, I'd have
>>             thought there was some
>>             >>>>>>> hope there)
>>             >>>>>> Our tools do not support Split DWARF yet, though we plan to implement it.
>>             >>>>>> Once we have Split DWARF support, it would be
>>             >>>>>> convenient to have an easy way to share built debug binaries. llvm-dwp is the
>>             >>>>>> answer to this. DWARFLinker could probably be another answer.
>>             >>>>> Ah, fair enough - thanks for the
context!
>>             >>>>>>>>> One way to do that
would be to have a CU-local
>>             type indirection table.
>>             >>>>>>>>> DIEs reference local
type numbers (like local
>>             address/string numbers -
>>             >>>>>>>>> addrx/strx/rnglistx)
and that table contains
>>             either sig8 (no linker
>>             >>>>>>>>> fixups required) or
the local type offsets you
>>             describe - the linker
>>             >>>>>>>>> would then only need
to read this type number
>>             indirection table and
>>             >>>>>>>>> rewrite them to the
final type numbers.
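
To make the indirection-table idea concrete, here is a minimal sketch. It assumes each CU carries a table whose entry i holds the sig8 of local type number i; the function name and the list/dict representation are hypothetical and are not part of DWARF or LLVM. The point it illustrates is that the linker touches only the table and never the DIEs, which keep their local type numbers.

```python
def rewrite_type_table(local_table, final_index_of):
    """Map a CU-local type table to final (link-wide) type numbers.

    local_table: list where entry i is the sig8 of local type number i
                 (hypothetical representation).
    final_index_of: dict sig8 -> final type number, shared across all CUs
                    and extended whenever a new signature is seen.
    Returns a list where entry i is the final type number of local type i.
    """
    rewritten = []
    for sig in local_table:
        # Assign the next free final number the first time a signature appears.
        final = final_index_of.setdefault(sig, len(final_index_of))
        rewritten.append(final)
    return rewritten


final_index_of = {}
cu_a = rewrite_type_table([0xAA, 0xBB], final_index_of)
cu_b = rewrite_type_table([0xBB, 0xCC, 0xAA], final_index_of)
```

A DIE in the second CU that refers to local type 0 would resolve through `cu_b[0]` to the same final entry as local type 1 of the first CU.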
>>             >>>>>>>> Yes, that could be
additionally done if this
>>             process would be time-consuming.
>>             >>>>>>>>
>>             >>>>>>>> David, thank you for all
your comments and
>>             explanations. They are extremely helpful.
>>             >>>>>>> Sure thing - really appreciate
your patience with
>>             all this - it's... a
>>             >>>>>>> lot of moving parts.
>>             >>>>>>> - Dave
>>             >>>>>>> Thank you, Alexey.
>>             >>>>>>>
>>             >>>>>>>> sig8 hash-id would be used
to compare types and
>>             to deduplicate them.
>>             >>>>>>>> It would speed up the
current dsymutil context
>>             analysis.
>>             >>>>>>>> Types having the same
hash-id could be deduplicated.
>>             >>>>>>>> This would allow deduplicating more types than the current dsymutil does.
>>             >>>>>>>> Incomplete type definitions having a similar set of members are
>>             >>>>>>>> not deduplicated by dsymutil currently.
>>             >>>>>>>> In this case they would have the same hash-id.
>>             >>>>>>>>
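
A minimal sketch of deduplication by hash-id, assuming every type DIE already carries an 8-byte signature computed by the producer. The `TypeDie` record and its field names are hypothetical; this is not how dsymutil currently works, only an illustration of why a single integer comparison can replace context analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeDie:
    signature: int  # hypothetical sig8-style hash-id of the type
    unit: str       # compile unit that carries this copy
    offset: int     # offset of the type DIE inside that unit


def deduplicate_types(dies):
    """Keep the first DIE seen per signature and map every DIE to its keeper."""
    canonical = {}  # signature -> first TypeDie with that signature
    remap = {}      # (unit, offset) -> (unit, offset) of the canonical copy
    for die in dies:
        keeper = canonical.setdefault(die.signature, die)
        remap[(die.unit, die.offset)] = (keeper.unit, keeper.offset)
    return list(canonical.values()), remap


dies = [
    TypeDie(0xAB, "a.o", 0x2A),
    TypeDie(0xCD, "a.o", 0x40),
    TypeDie(0xAB, "b.o", 0x11E),  # same signature as the first entry
]
kept, remap = deduplicate_types(dies)
```

References to the copy in `b.o` would be redirected to the one in `a.o`, with no string pool lookups or name comparisons involved.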
>>             >>>>>>>> This "type table" would take less space than the
>>             >>>>>>>> current "type units" and the current ODR solution.
>>             >>>>>>>>
>>             >>>>>>>> The above is just an idea on how to help a DWARF-aware
>>             >>>>>>>> linker (based on the idea of removing obsolete debug info)
>>             >>>>>>>> work faster (if that is interesting).
>>             >>>>>>>>
>>             >>>>>>>> Alexey.
>>             >>>>>>>>
>>             >>>>>>>>> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of
>>             >>>>>>>>> James Henderson via llvm-dev
>>             >>>>>>>>> Sent: Wednesday, June 3, 2020 3:48 AM
>>             >>>>>>>>> To: David Blaikie <dblaikie at gmail.com>
>>             >>>>>>>>> Cc: llvm-dev at lists.llvm.org
>>             >>>>>>>>> Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>> It makes me sad that the linker (via a library
>>             or otherwise) has to be "DWARF-aware" to be able to
>>             effectively handle --gc-sections, COMDATs, --icf etc for
>>             debug info, without leaving large blocks of data kicking
>>             around.
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>> The patching to -1 (or
equivalent) is probably
>>             a good lightweight solution (though I'd love it if it
>>             could be done based on section type in the future rather
>>             than section name, but that's probably outside the
realm
>>             of DWARF), as it requires only minimal understanding in
>>             the linker, but anything beyond that seems to be
>>             complicated logic that is mostly due to the structure of
>>             DWARF. Patching to -1 does feel a bit like a sticking
>>             plaster/band aid to patch over the issue rather than
>>             properly solving it too - there will still be debug data
>>             (potentially significant amounts in COMDAT-heavy objects)
>>             that the linker has to write and the debugger has to
>>             somehow know how to skip (even if it knows that -1 is
>>             special-case due to the standard being updated, it needs
>>             to get as far as the -1), which is all wasted effort.
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>> We've already seen
from Alexey's prototyping,
>>             and from our own experiences with the Sony proprietary
>>             linker (which tried to rewrite .debug_line only) that
>>             deconstructing the DWARF so that it can be more optimally
>>             reassembled at link time is slow going, and will probably
>>             inevitably remain so however much effort is put into optimising
>>             it. For a start, given the current standards, it's
>>             impossible to know how to deconstruct it without having
>>             to parse vast amounts of DWARF, which is typically going
>>             to mean a lot more parsing work than the linker would
>>             normally have to deal with. Additionally, much of this
>>             parsing work is wasted effort, since it seems unlikely in
>>             many links that large amounts of the DWARF will be
>>             redundant. Having an option to opt-in doesn't help much
>>             there, since it just means the logic exists without most
>>             people using it, due to it not being good enough, or
>>             potentially they don't even know it exists.
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>> I don't have
particularly concrete suggestions
>>             as to how to solve the structural problems with DWARF at
>>             this point. The only thing that seems obvious to me is a
>>             more "blessed" approach to fragmentation of
sections,
>>             similar to what I tried with my prototype mentioned
>>             earlier in the thread, although we'd need to figure out
>>             the previously stated performance issues. Other ideas
>>             might tie into this, like somehow sharing the various
>>             table headers a bit like CIEs in .eh_frame that could be
>>             merged by the linker - each object could have separate
>>             table header sections, which are referenced by the
>>             individual .debug_* blocks, which in turn are one per
>>             function/data piece and easily discardable/merged by the
>>             linker.
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>> Just some thoughts.
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>> James
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>> On Tue, 2 Jun 2020 at
19:24, David Blaikie via
>>             llvm-dev <llvm-dev at lists.llvm.org
>>             <mailto:llvm-dev at lists.llvm.org>> wrote:
>>             >>>>>>>>>
>>             >>>>>>>>> On Tue, May 19, 2020
at 7:17 AM Alexey Lapshin
>>             >>>>>>>>> <alapshin at
accesssoftek.com
>>             <mailto:alapshin at accesssoftek.com>> wrote:
>>             >>>>>>>>>> Hi David, please
find my comments inside:
>>             >>>>>>>>>>
>>             >>>>>>>>>>
>>             >>>>>>>>>>>>> Broad
question: Do you have any specific
>>             motivation/users/etc in implementing this (if you can
>>             speak about it)?
>>             >>>>>>>>>>>>> - it
might help motivate the work,
>>             understand what tradeoffs might be suitable for you/your
>>             users, etc.
>>             >>>>>>>>>>>> There are
two general requirements:
>>             >>>>>>>>>>>> 1) Remove
(or clean) invalid debug info.
>>             >>>>>>>>>>> Perhaps a
simpler direct solution for your
>>             immediate needs might be a much narrower,
>>             >>>>>>>>>>> and more
efficient linker-DWARF-awareness
>>             feature:
>>             >>>>>>>>>>>
>>             >>>>>>>>>>> With DWARFv5,
rnglists present an opportunity
>>             for a DWARF linker to rewrite the ranges
>>             >>>>>>>>>>> without
parsing the rest of the DWARF.
>>             /technically/ this isn't guaranteed - rnglist entries
>>             >>>>>>>>>>> can be
referenced either directly, or by
>>             index. If all rnglists are referenced by index, then
>>             >>>>>>>>>>> a linker could
parse only the debug_rnglists
>>             section and rewrite ranges to remove any
>>             >>>>>>>>>>> address ranges
that refer to optimized-out code.
>>             >>>>>>>>>>>
>>             >>>>>>>>>>> This would
only be correct for rnglists that
>>             had no direct references to them (that only were
>>             >>>>>>>>>>> referenced via
the indexes) - but we could
>>             either implement it with that assumption, or could
>>             >>>>>>>>>>> add an LLVM extension attribute on the CU that would say "I promise I only referenced rnglists
>>             >>>>>>>>>>> via rnglistx forms/indexes". If this DWARF-aware linking would have to read the CU DIE (not
>>             >>>>>>>>>>> all the other DIEs) it /could/ also then rewrite high/low_pc if the CU wasn't using ranges...
>>             >>>>>>>>>>> but that
wouldn't come up in the
>>             function-removal case, because then you'd have ranges
anyway,
>>             >>>>>>>>>>> so no need for
that.
>>             >>>>>>>>>>>
>>             >>>>>>>>>>> Such a
DWARF-aware rnglist linking could also
>>             simplify rnglists, in cases where functions
>>             >>>>>>>>>>> ended up being
laid out next to each other,
>>             the linker could coalesce their ranges together.
>>             >>>>>>>>>>>
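
A minimal sketch of the rnglist rewriting described above: drop ranges that refer to discarded code, then coalesce ranges that ended up adjacent. It assumes the linker can already tell which ranges are live; the `is_live` predicate and the `(low, high)` tuple representation are hypothetical stand-ins for whatever a real linker knows from section garbage collection.

```python
def rewrite_ranges(ranges, is_live):
    """Return live ranges sorted by address, with touching neighbours merged.

    ranges: list of (low, high) half-open address pairs from one range list.
    is_live: predicate telling whether a range refers to code that was kept
             (hypothetical; a real linker would derive this from its GC pass).
    """
    live = sorted(r for r in ranges if is_live(r))
    merged = []
    for low, high in live:
        if merged and low <= merged[-1][1]:
            # Adjacent or overlapping with the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


# Example: a discarded function was resolved to address 0.
ranges = [(0x1000, 0x1040), (0x0, 0x20), (0x1040, 0x1080), (0x2000, 0x2010)]
result = rewrite_ranges(ranges, is_live=lambda r: r[0] != 0)
```

Only the range list is read and rewritten here, which is why the approach is only safe when DIEs reference range lists by index rather than by direct offset.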
>>             >>>>>>>>>>> I imagine this
could be implemented with very
>>             little overhead to linking, especially compared
>>             >>>>>>>>>>> to the
overhead of full DWARF-aware linking.
>>             >>>>>>>>>>>
>>             >>>>>>>>>>> Though none of
this fixes Split DWARF, where
>>             the linker doesn't get a chance to see the
>>             >>>>>>>>>>> addresses
being used - but if you only
>>             want/need the CU-level ranges to be correct, this
>>             >>>>>>>>>>> might be a
viable fix, and quite efficient.
>>             >>>>>>>>>> Yes, we are thinking about that alternative. This would resolve our problem of invalid debug info
>>             >>>>>>>>>> and would work much faster. Thus, if we do not get good results for D74169, then we
>>             >>>>>>>>>> will implement it. Do you think it could be useful to have this solution upstream?
>>             >>>>>>>>> A pure rnglist
rewriting - I think it'd be OK
>>             to have in upstream -
>>             >>>>>>>>> again,
cost/benefit/etc would have to be
>>             weighed. I'm not sure it
>>             >>>>>>>>> would save enough
space to be particularly
>>             valuable beyond the
>>             >>>>>>>>> correctness issue -
and it doesn't completely
>>             solve the correctness
>>             >>>>>>>>> issue for zero-address
usage or low-address
>>             usage (because you could
>>             >>>>>>>>> still have overlapping
subprograms inside a CU
>>             - so if you were
>>             >>>>>>>>> symbolizing you could
use the correct rnglist
>>             to filter, but then go
>>             >>>>>>>>> look inside the CU
only to find two subprograms
>>             that had that address
>>             >>>>>>>>> & not know which one was the correct one and which one was the
>>             >>>>>>>>> discarded one).
>>             >>>>>>>>>
>>             >>>>>>>>> rnglist rewriting
might be easy enough to
>>             prototype - but depends what
>>             >>>>>>>>> you want to spend your
time on, I know this
>>             whole issue has been a
>>             >>>>>>>>> huge investment of
your time already - but
>>             maybe this recent
>>             >>>>>>>>> revitalization of the
conversation around
>>             having an explicit value in
>>             >>>>>>>>> the linker might be
sufficient to address
>>             everyone's needs... *fingers
>>             >>>>>>>>> crossed*)
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>>>>> 2)
Optimize the DWARF size.
>>             >>>>>>>>>>> Do your users
care much about this? I imagine
>>             if they had significant DWARF size issues,
>>             >>>>>>>>>>> they'd
have significant link time issues and
>>             the kind of cost to link time this feature has would
>>             >>>>>>>>>>> be prohibitive
- but perhaps they're sharing
>>             linked binaries much more often than they're
>>             >>>>>>>>>>> actually
performing linking.
>>             >>>>>>>>>> Yes, they do. They
also have significant
>>             link-time issues.
>>             >>>>>>>>>> So current
performance results of D74169 are
>>             not very acceptable.
>>             >>>>>>>>>> We hope to improve
it.
>>             >>>>>>>>>>
>>             >>>>>>>>>>
>>             >>>>>>>>>>
>>             >>>>>>>>>>>> The
specifics which our users have:
>>             >>>>>>>>>>>>    -
embedded platform which uses 0 as start
>>             of .text section.
>>             >>>>>>>>>>>>    -
custom toolset which does not support
>>             all features yet(f.e. split dwarf).
>>             >>>>>>>>>>>>    -
tolerant of the link-time increase.
>>             >>>>>>>>>>>>    - need
a useful way to share debug builds.
>>             >>>>>>>>>>> Sharing two
files (executable and dwp) is
>>             significantly less useful than sharing one file?
>>             >>>>>>>>>> Probably not significantly, but yes, it looks less useful compared to D74169.
>>             >>>>>>>>>> Having only two files (executable and .dwp) looks significantly better than having an executable and
>>             >>>>>>>>>> multiple .dwo files.
>>             >>>>>>>>>> Having only one file (the executable) with minimal size looks better than two files with a bigger size.
>>             >>>>>>>>>>
>>             >>>>>>>>>> clang compiled with -gsplit-dwarf takes 0.9G for the executable and 0.9G for the .dwp.
>>             >>>>>>>>>> clang compiled with --gc-debuginfo takes only 0.76G for a single executable.
>>             >>>>>>>>>>
>>             >>>>>>>>>>
>>             >>>>>>>>>>
>>             >>>>>>>>>>>> For the
first point: we have a problem
>>             "Overlapping address ranges starting from
0"(D59553).
>>             >>>>>>>>>>>> We use
custom solution, but the general
>>             solution like D74169 would be better here.
>>             >>>>>>>>>>> If CU ranges
are the only ones that need
>>             fixing, then I think the above solution might be as
>>             >>>>>>>>>>> good/better -
if more than CU ranges need
>>             fixing, then I think we might want to start talking about
>>             >>>>>>>>>>> how to fix
DWARF itself (split and non-split)
>>             to signal certain addresses point to dead code with a
>>             >>>>>>>>>>> specific
blessed value that linkers would
>>             need to implement - because with Split DWARF there's
>>             >>>>>>>>>>> no way to
solve the non-CU addresses at the
>>             linker.
>>             >>>>>>>>>> I think a worthwhile solution for that signal value would be LowPC > HighPC.
>>             >>>>>>>>>> That does not require additional bits in DWARF.
>>             >>>>>>>>>> It would be natural to skip such address ranges since they are explicitly marked as invalid.
>>             >>>>>>>>>> It could be implemented in a linker very easily. Probably, it would make sense to describe that
>>             >>>>>>>>>> usage in the DWARF standard.
>>             >>>>>>>>>>
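
A minimal sketch of how a consumer could honour the LowPC > HighPC convention proposed above. The convention itself is only a suggestion in this thread, not something the DWARF standard defines, and the `(name, low_pc, high_pc)` tuples are a hypothetical simplification of subprogram DIEs.

```python
def is_dead_range(low_pc, high_pc):
    """Under the proposed convention, low > high marks a discarded range."""
    return low_pc > high_pc


def find_subprograms(address, subprograms):
    """Return names of live subprograms whose [low_pc, high_pc) covers address.

    subprograms: iterable of (name, low_pc, high_pc) tuples (hypothetical).
    """
    return [
        name
        for name, low, high in subprograms
        if not is_dead_range(low, high) and low <= address < high
    ]


# Before marking: a discarded function resolved to 0 overlaps live code at 0.
before = [("live_f", 0x0, 0x20), ("dead_g", 0x0, 0x20)]
# After the linker marks the discarded range with low > high.
after = [("live_f", 0x0, 0x20), ("dead_g", 0x1, 0x0)]
ambiguous = find_subprograms(0x8, before)
resolved = find_subprograms(0x8, after)
```

The first lookup returns both names, which is the overlapping-ranges-from-0 problem on an embedded target whose .text starts at 0; the second returns only the live function.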
>>             >>>>>>>>>> As to the
addresses which are not seen by the
>>             linker(since they are in .dwo files) - yes,
>>             >>>>>>>>>> they need to have
another solution. Could you
>>             show an example of such a case, please?
>>             >>>>>>>>>>
>>             >>>>>>>>>>
>>             >>>>>>>>>>
>>             >>>>>>>>>>>>> 2.
Support of type units.
>>             >>>>>>>>>>>>>>   
That could be implemented further.
>>             >>>>>>>>>>>>>
Enabling type units increases object size
>>             to make it easier to deduplicate at link time by a
>>             DWARF-unaware
>>             >>>>>>>>>>>>>
linker. With a DWARF aware linker it'd be
>>             generally desirable not to have to add that object size
>>             overhead to
>>             >>>>>>>>>>>>> get
the linking improvements.
>>             >>>>>>>>>>>> But,
DWARFLinker should adequately work with
>>             type units since they are already implemented.
>>             >>>>>>>>>>> Maybe -
it'd be nice & all, but I don't think
>>             it's an outright necessity - if someone knows
they're using
>>             >>>>>>>>>>> a DWARF-aware
linker, they'd probably not use
>>             type units in their object files. It's possible someone
>>             >>>>>>>>>>> doesn't
know for sure & maybe they have
>>             pre-canned debug object files from someone else, etc.
>>             >>>>>>>>>> I see.
>>             >>>>>>>>>>
>>             >>>>>>>>>>>> Another
thing is that the idea behind type
>>             units has the potential to help Dwarf-aware linker to
>>             work faster.
>>             >>>>>>>>>>>> Currently,
DWARFLinker analyzes context to
>>             understand whether types are the same or not.
>>             >>>>>>>>>>> When you say
"analyzes context" what do you
>>             mean? Usually I'd take that to mean
>>             >>>>>>>>>>> "looks at
things outside the type itself -
>>             like what namespace it's in, etc" - which, yes,
>>             >>>>>>>>>>> it should do
that, but it doesn't seem very
>>             expensive to do. But I guess you actually
>>             >>>>>>>>>>> mean something
about doing structural
>>             equivalence in some way, looking at things inside the type?
>>             >>>>>>>>>> I think it could be useful for both cases. Currently, dsymutil does only the first thing
>>             >>>>>>>>>> (look at type name, namespace name, etc.) and does not do the second thing
>>             >>>>>>>>>> (structural equivalence). Analyzing type names is currently quite expensive
>>             >>>>>>>>>> (the string pool search alone takes ~10 sec out of 70 sec of overall time).
>>             >>>>>>>>>> That is expensive because many things have to be done to work with strings:
>>             >>>>>>>>>> parse DWARF, search and resolve relocations, compute a hash for strings,
>>             >>>>>>>>>> put data into a string pool, create a fully qualified name (like namespace::function::name).
>>             >>>>>>>>>> It looks like it could be optimized and finally require less time, but it would still be a noticeable
>>             >>>>>>>>>> part of the overall time.
>>             >>>>>>>>>>
>>             >>>>>>>>>> If dsymutil starts to check for structural equivalence, then the process would be even slower.
>>             >>>>>>>>>> So, if a single hash-id were checked instead of comparing type structure, this process
>>             >>>>>>>>>> would also be faster.
>>             >>>>>>>>>>
>>             >>>>>>>>>> Thus I think using a hash-id to compare types would make the current implementation faster and would
>>             >>>>>>>>>> also allow DWARFLinker to handle incomplete types without massive performance degradation.
>>             >>>>>>>>>>
>>             >>>>>>>>>>>> But the context is known when types are generated. So, no need to spend time analyzing it.
>>             >>>>>>>>>>>> If types could be compared without analyzing context, then a DWARF-aware linker would work faster.
>>             >>>>>>>>>>>> That is just an idea (not for immediate implementation): if types were stored in some "type table"
>>             >>>>>>>>>>>> (instead of a COMDAT section group) and could be accessed through a hash-id (like type units),
>>             >>>>>>>>>>>> then it would be a solution requiring fewer bits to store but allowing types to be compared
>>             >>>>>>>>>>>> by hash-id (not analysing context).
>>             >>>>>>>>>>>> In this case, the size increase would be small, and processing could be faster.
>>             >>>>>>>>>>>>
>>             >>>>>>>>>>>> This is just an idea and could be discussed separately from the problem of integrating D74169.
>>             >>>>>>>>>>>>>> 6.
-flto=thin
>>             >>>>>>>>>>>>>>   
  That problem was described in this
>>             review https://reviews.llvm.org/D54747#1503720. It also
>>             exists in
>>             >>>>>>>>>>>>>>
current DWARFLinker/dsymutil
>>             implementation. I think that problem should be discussed
>>             more: it could
>>             >>>>>>>>>>>>>>
probably be fixed by avoiding generation
>>             of such incomplete declaration during thinlto,
>>             >>>>>>>>>>>>>>
That would be costly to produce
>>             extra/redundant debug info in ThinLTO - actually ThinLTO
>>             could be doing
>>             >>>>>>>>>>>>>>
more to reduce that redundancy early on
>>             (actually removing definitions from some llvm Modules if
>>             the type
>>             >>>>>>>>>>>>>>
definition is known to exist in another
>>             Module, etc)
>>             >>>>>>>>>>>>> I
don't know if it's a problem since that
>>             patch was reverted.
>>             >>>>>>>>>>>> Yes. That
patch was reverted, but this
>>             patch(D74169) has the same problem.
>>             >>>>>>>>>>>> if D74169
would be applied and
>>             --gc-debuginfo used then structure type
>>             >>>>>>>>>>>> definition
would be removed.
>>             >>>>>>>>>>>>
DWARFLinker could handle that case -
>>             "removing definitions from some llvm Modules if the
type
>>             >>>>>>>>>>>> definition
is known to exist in another Module".
>>             >>>>>>>>>>>> i.e.
DWARFLinker could replace the
>>             declaration with the definition.
>>             >>>>>>>>>>>> But that
problem could be more easily
>>             resolved when debug info is generated(probably without
>>             >>>>>>>>>>>>
significant increase of debug info size):
>>             >>>>>>>>>>>> Here we
have:
>>             >>>>>>>>>>>>
DW_TAG_compile_unit(0x0000000b) - compile
>>             unit containing concrete instance for function
"f".
>>             >>>>>>>>>>>>
DW_TAG_compile_unit(0x00000073) - compile
>>             unit containing abstract instance root for function
"f".
>>             >>>>>>>>>>>>
DW_TAG_compile_unit(0x000000c1) - compile
>>             unit containing function "f" definition.
>>             >>>>>>>>>>>> Code for
function "f" was deleted.
>>             gc-debuginfo deletes compile unit
>>             DW_TAG_compile_unit(0x000000c1)
>>             >>>>>>>>>>>> containing
"f" definition (since there is no
>>             corresponding code). But it has structure "Foo"
definition
>>             >>>>>>>>>>>>
DW_TAG_structure_type(0x0000011e) referenced
>>             from DW_TAG_compile_unit(0x00000073)
>>             >>>>>>>>>>>> by
declaration
>>             DW_TAG_structure_type(0x000000ae). That declaration is
>>             exactly the case when definition
>>             >>>>>>>>>>>> was
removed by thinlto and replaced with
>>             declaration.
>>             >>>>>>>>>>>> Would it
cost too much if type definition
>>             would not be replaced with declaration for "abstract
>>             instance root"?
>>             >>>>>>>>>>>> The number
of concrete instances is bigger
>>             than number of abstract instance roots.
>>             >>>>>>>>>>>> Probably,
it would not be too costly to
>>             leave definition in abstract instance root?
>>             >>>>>>>>>>
>>             >>>>>>>>>>>>
Alternatively, Would it cost too much if
>>             type definition would not be replaced with declaration when
>>             >>>>>>>>>>>>
declaration references type from not used
>>             function? (lto could understand that concrete function is
>>             not used).
>>             >>>>>>>>>>> I don't
follow this example - could you
>>             provide a small concrete test case I could reproduce?
>>             >>>>>>>>>> I would provide a
test case if necessary. But
>>             it looks like this issue is finally clear, and you
>>             already commented on that.
>>             >>>>>>>>>>
>>             >>>>>>>>>>> Oh, I guess
this is happening perhaps because
>>             ThinLTO can't know for sure that a standalone
>>             >>>>>>>>>>> definition of
'f' won't be needed - so it
>>             produces one in case one of the inlining opportunities
>>             >>>>>>>>>>> doesn't
end up inlining. Then it turns out
>>             all calls got inlined, so the external definition
wasn't
>>             needed.
>>             >>>>>>>>>>> Oh, you're
suggesting that these 3 CUs got
>>             emitted into one object file during LTO, but that
DWARFLinker
>>             >>>>>>>>>>> drops a CU
without any code in it - even
>>             though... So far as I know, in LTO, LLVM directly
references
>>             >>>>>>>>>>> types across
units if the CUs are all emitted
>>             in the same object file. (and if they weren't in the
same
>>             >>>>>>>>>>> object file -
then the abstract_origin
>>             couldn't be pointing cross-CU).
>>             >>>>>>>>>>> I guess some
basic things to say:
>>             >>>>>>>>>>> With ThinLTO,
the concrete/standalone
>>             function definition is emitted in case some call sites
>>             don't end up
>>             >>>>>>>>>>> being inlined.
So we know it'll be emitted
>>             (but might not be needed by the actual linker)
>>             >>>>>>>>>>> Any number of
inline calls might exist - but
>>             we shouldn't put the type information into those,
because
>>             >>>>>>>>>>> they
aren't guaranteed to emit it (if the
>>             inline function gets optimized away, there would be
>>             nothing to
>>             >>>>>>>>>>> enforce the
type being emitted) - and even if
>>             we forced the type information to be emitted into one
>>             >>>>>>>>>>> object file
that has an inline copy of the
>>             function - there's no guarantee that object file will
get
>>             linked in either.
>>             >>>>>>>>>>> So, no, I
don't think there's much we can do
>>             to keep the size of object files down, while guaranteeing
>>             >>>>>>>>>>> the type
information will be emitted with the
>>             usual linker semantics.
>>             >>>>>>>>>> Then
dsymutil/DWARFLinker could be changed to
>>             handle that(though it would probably be not very
efficient).
>>             >>>>>>>>>> If thinlto would
understand that function is
>>             not used finally(and then must not contain referenced
>>             type definition),
>>             >>>>>>>>>> then this
situation could be handled more
>>             effectively.
>>             >>>>>>>>>>
>>             >>>>>>>>>> Thank you, Alexey.
>>             >>>>>>>>>>

David Blaikie via llvm-dev

2020-Aug-10 17:34 UTC

[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

On Mon, Aug 10, 2020 at 5:15 AM Alexey Lapshin <avl.lapshin at gmail.com> wrote:
>
> Hi Jonas,
>
> Thank you for the comments, please find my answers below...
>
> On 06.08.2020 20:39, Jonas Devlieghere wrote:
>
> Hi Alexey,
>
> I should've looked at this earlier. I went through the thread again and I've
> made some comments, mostly from the dsymutil point of view.
>
> > Current DWARFEmitter/DWARFStreamer has an implementation for DWARF
> > generation, which does not support DWARF5(only debug_names table). At the
> > same time, there already exists code in CodeGen/AsmPrinter/DwarfDebug.h,
> > which implements most of DWARF5. It seems that DWARFEmitter/DWARFStreamer
> > should be rewritten using DwarfDebug/DwarfFile. Though I am not sure
> > whether it would be easy to re-use DwarfDebug/DwarfFile. It would probably
> > be necessary to separate some intermediate level of DwarfDebug/DwarfFile.
>
> These classes serve very different purposes. Last time I looked at them there
> was very little overlap in functionality. In the compiler we're mostly
> concerned with generating the DWARF, while in dsymutil we try to copy
> everything we don't need to parse, and fix up what we have to. I don't want
> to say it's not possible, but I think supporting DWARF5 in those classes is
> going to be a lot less work than trying to reuse the CodeGen variants.
>
> I agree, in its current state it would be less work to write a separate implementation
> than to reuse the CodeGen variants. The bad thing is that in such a case there is a lot of
> code duplication:
>
> DwarfStreamer::emitUnitRangesEntries
> DwarfDebug::emitDebugARanges
> EmitGenDwarfAranges
> DWARFYAML::emitDebugAranges
Probably some opportunities to share some code, even if not the whole
generator - might be best to refactor those opportunistically, rather
than a wholesale "change DWARFLinker to use (all) of
lib/CodeGen/AsmPrinter/Dwarf*". Sort of like the approach that's been
taken with lldb's use of libDebugInfoDWARF - picking particular
features that have high overlap and refactoring them to be reusable
between the two different use cases.
> Supporting a new standard would require rewriting or modifying all of these places. In an ideal world,
> having a single implementation of DWARF generation would allow changing one place and getting the
> benefits in the others. Probably, the CodeGen classes could be rewritten, and then it would be useful
> to write them with two use cases in mind: generation from scratch and copying/updating
> existing data. In the end, there would be a single implementation which could be reused in
> many places. Though, it is indeed a lot of work.
>
>
> > Measurements show that it is spent ~10 sec in
> > llvm::StringMapImpl::LookupBucketFor(). The problem is that the same
> > strings, again and again, are added to the string pool. Two attributes
> > having the same string value would be analyzed (hash calculated) and
> > searched inside the string pool. Even if these strings are already in
> > string table(DW_FORM_strp, DW_FORM_strx). The process could be optimized
> > for string tables. So that if some string from the string table were
> > accessed previously then, it would keep a reference into the string pool.
> > This would eliminate a lot of string pool searches.
>
> I'm not sure I fully understand the optimization, but I'd love to speed this
> up, if only for dsymutil's sake. I'd love to talk about this in a separate
> thread or offline.
>
> The measurements show that quite a lot of time is taken
> by llvm::StringMapImpl::LookupBucketFor(), i.e. searching inside a string
> pool takes a significant amount of time. The idea of the optimization was to
> reduce the number of string pool searches by remembering previous
> results. The DW_FORM_strp and DW_FORM_strx forms do not keep the string itself
> but reference a string in a separate table (by offset for DW_FORM_strp, by
> index for DW_FORM_strx). Currently, if there are
> duplicated strings of DW_FORM_strp, DW_FORM_strx, there would be
> two/three/... (one per duplicate) searches in the string pool
> (llvm::StringMapImpl::LookupBucketFor() would be called). If the position
> in the pool were remembered for the index of the first duplicate,
> then it would not be necessary to call
> llvm::StringMapImpl::LookupBucketFor() the next times.
>
> But prototyping of that idea did not show any worthwhile performance improvement.
>
> Some small performance improvement could be achieved if the string pools used
> llvm::hash_value(StringRef S) instead of llvm::djbHash().
>
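For readers following along, here is a minimal sketch of the caching idea described above. It is not the DWARFLinker code: StringPool, CachedStrpResolver and the toy input table are hypothetical names invented for this illustration, and Python dicts stand in for llvm::StringMap. It only shows how remembering "input string-table offset -> output pool offset" avoids repeated string lookups in the pool.

```python
class StringPool:
    """Toy de-duplicating string pool that counts string lookups (hypothetical)."""

    def __init__(self):
        self._offsets = {}      # string -> offset in the output string table
        self._next_offset = 0
        self.lookups = 0        # how many times a string had to be hashed/searched

    def get_offset(self, s):
        self.lookups += 1
        off = self._offsets.get(s)
        if off is None:
            off = self._next_offset
            self._offsets[s] = off
            self._next_offset += len(s.encode()) + 1  # +1 for the NUL terminator
        return off


class CachedStrpResolver:
    """Remembers the pool offset for each input string-table offset already seen."""

    def __init__(self, input_strings, pool):
        self._input = input_strings  # input offset -> string (stand-in for .debug_str)
        self._pool = pool
        self._cache = {}             # input offset -> output pool offset

    def resolve(self, input_offset):
        out = self._cache.get(input_offset)
        if out is None:
            # First reference to this table entry: one real pool search.
            out = self._pool.get_offset(self._input[input_offset])
            self._cache[input_offset] = out
        return out


pool = StringPool()
table = {0: "int", 4: "Foo"}
resolver = CachedStrpResolver(table, pool)
offsets = [resolver.resolve(o) for o in (0, 4, 0, 0, 4)]
```

Five attribute references cause only two pool searches here. Note that the cache is itself a keyed lookup (on an integer rather than a string), which may be part of why the prototype mentioned above did not show a large win.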
>
> > Currently, all object files are analyzed sequentially and cloned
> > sequentially. Cloning is started in parallel with analyzing. That scheme
> > could be changed: analyzing and cloning could be done in parallel for each
> > object file. That requires refactoring of DWARFLinker and making string
> > pools and DeclContextTree thread-safe.
>
> I'm less familiar with the way that LLD uses the DWARFOptimizer but this is
> not possible for dsymutil as it is trying to deduplicate DIEs from different
> compile units.
>
> Right. dsymutil is trying to de-duplicate DIEs from different
> compile units. That, probably, does not rule out a multi-threaded implementation:
>
> 1. DeclContextTree.getChildDeclContext() should be made thread-safe.
>     Thus, even if CUs are processed in parallel, DIEs could be de-duplicated
>     based on DeclContext.
> 2. UniquingStringPool and OffsetsStringPool should also be made thread-safe.
> 3. Since compilation units would be processed in parallel,
>     the size of a compilation unit would not be known until it is fully processed.
>     That means that all of the compilation unit's references should be patched after
>     the CU content is generated, in the same manner as forward references
>     are currently patched (fixupForwardReferences).
> 4. DWARFStreamer provides a sequential interface. Instead of a single stream
>     as the output, a separate output could be generated for each CU.
>     They would be glued together in the end.
>
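Points 3 and 4 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how DWARFLinker or DWARFStreamer work: the unit tuples, clone_unit and link are hypothetical, each "reference" is a 4-byte little-endian absolute offset, and a Python thread pool stands in for the worker threads. Each unit is cloned into its own buffer with placeholder references; once all sizes are known, base offsets are computed, the placeholders are patched, and the buffers are glued together.

```python
from concurrent.futures import ThreadPoolExecutor


def clone_unit(unit):
    """Clone one unit into its own buffer and record cross-unit fixups."""
    name, payload, refs = unit
    body = bytearray(payload)
    fixups = []
    for target_unit, target_local in refs:
        # Remember where the placeholder lives and what it should point at.
        fixups.append((len(body), target_unit, target_local))
        body += b"\x00\x00\x00\x00"  # placeholder for a 4-byte offset
    return name, body, fixups


def link(units):
    # Units are cloned in parallel; map() keeps the results in input order.
    with ThreadPoolExecutor() as ex:
        cloned = list(ex.map(clone_unit, units))

    # Only now are all unit sizes known, so base offsets can be assigned.
    base = {}
    cursor = 0
    for name, body, _ in cloned:
        base[name] = cursor
        cursor += len(body)

    # Patch every recorded reference, then glue the per-unit buffers together.
    out = bytearray()
    for name, body, fixups in cloned:
        for pos, target_unit, target_local in fixups:
            absolute = base[target_unit] + target_local
            body[pos:pos + 4] = absolute.to_bytes(4, "little")
        out += body
    return bytes(out), base


units = [
    ("cu0", b"AAAA", [("cu1", 2)]),    # cu0 refers to offset 2 inside cu1
    ("cu1", b"BBBBBB", [("cu0", 1)]),  # cu1 refers to offset 1 inside cu0
]
image, base = link(units)
```

The sketch leaves out the hard parts named in points 1 and 2 (thread-safe DeclContextTree and string pools); it only shows why reference patching has to wait until every unit has been generated.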
>
> > I think improving dsymutil is a valuable thing. Though there are several
> > directions which might be considered to make it more robust:
> >
> > 1. support of latest DWARF - DWARF5/DWARF64...
>
> Strong +1 on DWARF5. I haven't had the bandwidth yet to really look at this.
> Right now we can't find (at least some) relocations so we bail out. I'd need
> to fix that to assess the current state of things and figure out how much
> work would be needed.
>
> I don't think anything in LLVM supports generating DWARF64 though.
>
> > 2. implement multi-threaded execution.
>
> See my earlier comment. At least for the dsymutil case, the current approach
> is the best we can do, but I'd love to be proven wrong. :-)
>
> > 3. support of split DWARF.
> > 4. implement dsymutil for non-darwin platform.
>
> These two seem to go together. Given the work you did to split off the DWARF
> optimization part I think we're closer to this than ever. Thanks again for
> doing that.
>
> > We considered three options:
> >
> > 1. add new functionality into dsymutil. So that dsymutil behaves
> > differently on a non-darwin platform and supports another set of
> > command-line options.
> >
> > 2. add new functionality into llvm-objcopy. llvm-objcopy already supports
> > various binary objects formats(MachO,ELF,COFF,wasm). It also has several
> > options to work with debug-info.
> >
> > 3. create new utility llvm-dwarfutil which would implement the above
> > functionality and reuse DWARFLinker(extracted from dsymutil) library and
> > new library ObjectCopy(extracted from llvm-objcopy).
> >
> > So far our preference is number three. The reason for this is that separate
> > utility specifically working with debug info looks as good separation of
> > concepts. Adding another behavior to dsymutil looks not very good.
>
> In its current state dsymutil itself is a pretty small tool on top of the
> DWARFOptimizer/Linker. I'm curious what the benefits of another tool are
> compared to a different frontend (like objcopy) for MachO and ELF. It seems
> like that would allow for separation of concerns, while still being able to
> share common code without having to push it all the way up into LLVM.
>
> My concern is that this tool would have different source data and a different set of options.
> Bearing in mind that handling a different set of input data and a different set of options
> means writing another frontend, it would probably be good not to make dsymutil more complex but
> to create another small tool. But if extending dsymutil looks OK, I am OK with it.
> Let's discuss this approach within the proposal thread.
>
>
>
> > Extending the already rich interface of llvm-objcopy looks also not very
> > good. Having in mind that actual implementation would be shared by
> > libraries, the separate utility, working specifically with debug info,
> > looks like the right choice. That is our current idea.
>
> > My personal thought would be that extending dsymutil should be ok as the
> > functionality goes well with everything else dsymutil does (other than not
> > support ELF which the dsymutil maintainers are on board with last I
> > checked). That said, I definitely think a write-up will be helpful. No
> > matter what I support extracting all of the behavior into libraries and
> > using that somewhere :)
>
> Ha, so basically what I was trying to say above.
>
> I look forward to seeing the proposal!
>
> yep, would publish it soon.
>
> Thank you, Alexey.
>
>
> Cheers,
> Jonas
>
>
> On Tue, Aug 4, 2020 at 11:33 PM Eric Christopher <echristo at gmail.com> wrote:
>>
>> Hi Alexey,
>>
>>
>>
>> On Mon, Aug 3, 2020 at 8:32 AM Alexey Lapshin <avl.lapshin at gmail.com> wrote:
>>>
>>> Hi Eric, please
>>>
>>> On 31.07.2020 22:02, Eric Christopher wrote:
>>>
>>> Hi Alexey,
>>>
>>> On Fri, Jul 31, 2020 at 4:02 AM Alexey Lapshin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>
>>>> On 28.07.2020 19:28, David Blaikie wrote:
>>>> > On Tue, Jul 28, 2020 at 8:55 AM Alexey Lapshin
<avl.lapshin at gmail.com> wrote:
>>>> >>
>>>> >> On 28.07.2020 10:29, David Blaikie via llvm-dev wrote:
>>>> >>> On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
>>>> >>> <alapshin at accesssoftek.com> wrote:
>>>> >>>>>>>>>>>> This idea goes
in another direction than fragmenting dwarf
>>>> >>>>>>>>>>>> using elf
sections&tricks. It seems to me that the cost of fragmenting is too high.
>>>> >>>>>>>>>>> I tend to agree -
but I'm sort of leaning towards trying to use object
>>>> >>>>>>>>>>> features as much
as possible, then implementing just enough custom
>>>> >>>>>>>>>>> handling in the
linker to recoup overhead, etc. (eg: add some kind of
>>>> >>>>>>>>>>> small header/brief
description that makes it easy for the linker to
>>>> >>>>>>>>>>> slice-and-dice -
but hopefully a domain-specific such header can be a
>>>> >>>>>>>>>>> bit more compact
than the fully general ELF form)
>>>> >>>>>>>>>> I think this indeed
should be implemented and evaluated.
>>>> >>>>>>>>>> So that various
approaches could be compared.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>>> It is not only
the sizes of structures describing fragments but also the complexity
>>>> >>>>>>>>>>>> of tools that
should be taught to work with fragmented DWARF.
>>>> >>>>>>>>>>>> (f.e.
llvm-dwarfdump applied to object file should be able to read fragmented DWARF,
>>>> >>>>>>>>>>>> but applied to
linked executable it should work with non-fragmented DWARF).
>>>> >>>>>>>>>>>> That idea is
for the tool which works the same way as dsymutil ODR.
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> I will shortly
describe the idea of making DWARF be easier processed by dsymutil/DWARFLinker:
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> The idea is to
have only one "type table" per object file(special section
.debug_types_table).
>>>> >>>>>>>>>>>> This
"type table" would contain all types.
>>>> >>>>>>>>>>>> There could be
a special type of reference - type_offset - that offset points into the type
table.
>>>> >>>>>>>>>>>> Basic types
could always be placed into the start of "type table" thus, offsets to
basic types
>>>> >>>>>>>>>>>> most often
would be 1 byte. There also would be a special kind of reference - reference
inside the type.
>>>> >>>>>>>>>>>> Type units
sig8 system - would not be used to reference types.
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Types
deduplication is assumed to be done, not by linker mechanism for COMDAT,
>>>> >>>>>>>>>>>> but by a tool
like dsymutil. This tool would create resulting .debug_types_table by putting
there
>>>> >>>>>>>>>>>> types from
source .debug_types_table-s. Only one copy of the type would be placed into the
>>>> >>>>>>>>>>>> resulting
table. All references pointing to the deleted copy would be corrected to point
>>>> >>>>>>>>>>>> to the single
copy inside "type table". (that is how dsymutil works currently)
>>>> >>>>>>>>>>> ^ that's the
step that's probably a bit expensive for a general-use
>>>> >>>>>>>>>>> tool - it implies
parsing all the DWARF to find those references and
>>>> >>>>>>>>>>> rewrite them, I
think. For a high-performance solution that could be
>>>> >>>>>>>>>>> run by the linker
I think it'd be necessary to have a solution that
>>>> >>>>>>>>>>> doesn't
involve parsing all the DIEs.
>>>> >>>>>>>>>> According to the
current dsymutil processing,
>>>> >>>>>>>>>> exactly this process
is not the most time-consuming.
>>>> >>>>>>>>>> That could be done
relatively fast.
>>>> >>>>>>>>> Fair enough - though
I'd still imagine any solution that involves
>>>> >>>>>>>>> parsing all the DIEs still
wouldn't be fast enough (maybe an order of
>>>> >>>>>>>>> magnitude faster than the current solution even - but that's still,
>>>> >>>>>>>>> what, 6 or 7x slower than
linking without the feature?) for most users
>>>> >>>>>>>>> to consider it a good
trade-off.
>>>> >>>>>>>> It seems to me that even the
current 6x-7x slowdown could be useful.
>>>> >>>>>>>> Users who already use dsymutil
or llvm-dwp(assuming DWARFLinker
>>>> >>>>>>>> would be taught to work with a
split dwarf) tools spend this time and,
>>>> >>>>>>>> in some scenarios, waste disk
space by inter-mediate files.
>>>> >>>>>>> FWIW, dwp (llvm-dwp hasn't
really been optimized compared to binutils
>>>> >>>>>>> dwp) is designed to be very quick
- by not needing to do a lot of
>>>> >>>>>>> parsing/fixups. Which, yes, means
larger output files than would be
>>>> >>>>>>> possible with more parsing/etc. It
also doesn't take any input from
>>>> >>>>>>> the linker (so it can run in
parallel with the linker) - so it can't
>>>> >>>>>>> remove dead subprograms. Given
Google's the major (perhaps only
>>>> >>>>>>> significant?) user of Split DWARF
- I can say that the needs don't
>>>> >>>>>>> necessarily overlap well with
something that would take significantly
>>>> >>>>>>> longer to run or use significantly
more memory. Faster/cheaper/with
>>>> >>>>>>> somewhat bigger output files is
probably the right tradeoff for
>>>> >>>>>>> Google's use case, at least.
>>>> >>>>>>>
>>>> >>>>>>> I imagine Apple's use for
dsymutil is somewhat similar - it's not used
>>>> >>>>>>> in the iterative development
cycle, only in final releases - well,
>>>> >>>>>>> maybe their situation is more
"neutral" - not a major pain point in
>>>> >>>>>>> any case I'd guess.
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>> I see. FWIW, Comparison splitdwarf+dwp
and DWARFLinker from lld:
>>>> >>>>>>
>>>> >>>>>> 1. split-dwarf+llvm-dwp = linking time
for clang 6 sec,
>>>> >>>>>>       generating time for .dwp 53 sec,
clang=997M clang.dwp=1.1G.
>>>> >>>>> FWIW, llvm-dwp is not very well optimized
(which is to say: it is not
>>>> >>>>> optimized), binutils dwp might be a better
comparison (& even that
>>>> >>>>> doesn't have the parallelism &
some potential further memory savings
>>>> >>>>> that lld has that we could take advantage
of in a dwp-like tool)
>>>> >>>>>
>>>> >>>>> What build mode was the clang binary built
in? Optimized or unoptimized?
>>>> >>>> right, that is unoptimized build with
-ffunction-sections.
>>>> >>>>
>>>> >>>>>> 2. DWARFLinker from lld = linking time
for clang 72 sec, clang=760M.
>>>> >>> And this is without Split DWARF? Without linker
DWARF compression? -
>>>> >>> that seems quite a bit surprising, that the
deduplication of DWARF
>>>> >>> could fit into less space than the
wasted/reclaimed space in ranges (&
>>>> >>> line)?
>>>> >> that was without split dwarf, without linker
compression.
>>>> >>
>>>> >>> Could you double check these numbers & provide
a clearer summary?
>>>> >> sure, I would re-check it.
>>>> >>
>>>> >>> Here's my attempt at numbers (all with
function-sections+gc-sections)...
>>>> >>>
>>>> >>> Split DWARF tests didn't seem meaningful -
gc-debuginfo + split DWARF
>>>> >>> seemed to drop all the debug info (except
gdb_index) so wasn't
>>>> >>> working/comparison wasn't meaningful for
Apples to Apples, but
>>>> >>> included it for comparing gc'd non-split to
non-gc'd split (disabled
>>>> >>> gnu-pubnames/gdb-index (-gsplit-dwarf
-gno-gnu-pubnames) (which turns
>>>> >>> on by default with Split DWARF because gdb needs
it - but a bit of an
>>>> >>> unfair comparison without turning on
gnu-pubnames/gdb-index in other
>>>> >>> build modes too, since it... /shouldn't/ be
necessary) which might've
>>>> >>> been a factor in the data you were looking at)
>>>> >> that might be the case. i.e. clang=997M for split
dwarf(from my previous
>>>> >> measurement) might include gnu-pubnames.
>>>> >>
>>>> >> would recheck it and if that is the case then it is a
unfair comparison.
>>>> >>
>>>> >>
>>>> >> My point was that "DWARFLinker from lld"
takes less space than singleton
>>>> >> split dwarf file+.dwp file.
>>>> >>
>>>> >> for -O0 uncompressed:
>>>> >>
>>>> >> - .dwp took 1.1G(if I built it correctly), singleton
clang(from your
>>>> >> measurements) 566 MB
>>>> >>
>>>> >>      overall 1.6G.
>>>> > Oh, yeah, even if there are some measurement issues,
linked executable
>>>> > + .dwp is going to be larger than a linked executable
using non-split
>>>> > DWARF (in v5), since v5 uses all the same representations
as non-split
>>>> > DWARF, and split DWARF adds the indirection overhead of a
split file,
>>>> > etc.
>>>> >
>>>> > Even without DWARF linking, it's true that split DWARF
has overhead
>>>> > (dwp+executable will be larger than executable non-split).
>>>> >
>>>> > But maybe we've ended up down a bit of a tangent in
any case.
>>>> >
>>>> > Trying to bring this back to "should this be
committed to lld" seems
>>>> > valuable, and I'm not sure what the right criteria are
for that.
>>>> I think it would be useful to do "removing obsolete debug info"
>>>> in the linker. First thing is that it would be the fastest way (no need
>>>> to copy data/create temp files/build address map...). Second thing
>>>> is that it would be a good separation of concepts. All debug info
>>>> processing currently done in the linker (gdb_index, upcoming
>>>> debug_names) could be moved into a separate library processing
>>>> debug info. When gdb_index/debug_names should be built without
>>>> "removing of obsolete debug info" it would have the same
>>>> performance results as it currently has.
>>>>
>>>> We decided to give the idea of "removing of obsolete debug info"
>>>> another try and are going to implement it as a separate utility
>>>> working with a built binary. Making it multi-threaded would
>>>> probably show better performance results and then it could
>>>> probably be considered acceptable to use from the linker.
>>>>
>>>
>>> I'm quite interested in this direction. One thought I had was to
>>> incorporate such a library into dsymutil but with support for ELF. If you get
>>> a proposal written up I'd love to take a look and comment.
>>>
>>>
>>> yes, I would share the proposal in a separate thread within a week or two.
>>>
>>
>> Excellent, thanks :)
>>
>>>
>>> Shortly: we decided to move in slightly other direction than adding this functionality
>>> into dsymutil. Though if there is a preference to implement it as part of dsymutil
>>> we are OK to do this way.
>>>
>>
>> I have a vague preference since a lot of functionality already exists
>> there on one platform and extending that seems straight forward, however...
>>
>>>
>>> In its first version, this new utility is supposed to receive a built binary with debug info
>>> as input (with the new -1/-2 marking for references to removed code sections,
>>> https://reviews.llvm.org/D84825) and create a new binary with obsolete
>>> debug info removed according to the above marking. In the next versions, it could be extended
>>> with other debug info optimization tasks, e.g. generating new index tables, optimizing
>>> debug info, etc.
>>>
>>> We considered three options:
>>>
>>> 1. add new functionality into dsymutil. So that dsymutil behaves differently
>>>     on a non-darwin platform and supports another set of command-line options.
>>>
>>> 2. add new functionality into llvm-objcopy. llvm-objcopy already supports various
>>>      binary objects formats(MachO,ELF,COFF,wasm). It also has several options
>>>      to work with debug-info.
>>>
>>> 3. create new utility llvm-dwarfutil which would implement the above functionality
>>>      and reuse DWARFLinker(extracted from dsymutil) library and new library
>>>      ObjectCopy(extracted from llvm-objcopy).
>>>
>>> So far our preference is number three. The reason for this is that separate
>>> utility specifically working with debug info looks as good separation of concepts.
>>> Adding another behavior to dsymutil looks not very good. Extending the already
>>> rich interface of llvm-objcopy looks also not very good. Having in mind that actual
>>> implementation would be shared by libraries, the separate utility, working specifically
>>> with debug info, looks like the right choice. That is our current idea.
>>>
>>> I would publish the proposal shortly to discuss it.
>>>
>>>
>>
>> These are solid arguments - in particular, I agree with not extending llvm-objcopy :)
>>
>> +Jonas Devlieghere and +Adrian Prantl for dsymutil comments.
>>
>> My personal thought would be that extending dsymutil should be ok as
>> the functionality goes well with everything else dsymutil does (other than not
>> support ELF which the dsymutil maintainers are on board with last I checked).
>> That said, I definitely think a write-up will be helpful. No matter what I
>> support extracting all of the behavior into libraries and using that somewhere :)
>>
>> Thanks!
>>
>> -eric
>>
>>>
>>> Thank you, Alexey.
>>>
>>> Thanks!
>>>
>>> -eric
>>>
>>>>
>>>> Alexey.
>>>>
>>>> >
>>>> > Ray's the best person to weigh in on that. My 2c is
that I think it
>>>> > probably is worthwhile, even just as an experiment,
assuming it's not
>>>> > too intrusive to lld.
>>>> >
>>>> >> - The "DWARFLinker from lld" 820 MB(from
your measurements).
>>>> >>
>>>> >>
>>>> >> So "DWARFLinker from lld" looks two times
better.
>>>> >>
>>>> >>
>>>> >> Anyway, thank you for pointing me to possible mistake.
I would recheck
>>>> >> it and update results.
>>>> >>
>>>> >>
>>>> >> Alexey.
>>>> >>
>>>> >>
>>>> >>> * -O0: (baseline, just using strip -g: 356 MB)
>>>> >>>     * compressed: 25% smaller with gc-debuginfo
(481 MB / 641 MB) (407
>>>> >>> MB split/non-gc)
>>>> >>>     * uncompressed: 30% smaller (820 MB / 1.2 GB)
(566 MB split/non-gc)
>>>> >>> * -O3: (baseline: 116 MB)
>>>> >>>     * compressed: 16% smaller (361 MB / 462 MB)
(283 MB split/non-gc)
>>>> >>>     * uncompressed: 22% smaller (1022 MB / 1.2 GB)
(156 MB split/non-gc)
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin
>>>> >>> <alapshin at accesssoftek.com> wrote:
>>>> >>>>>>>>>>>> This idea goes
in another direction than fragmenting dwarf
>>>> >>>>>>>>>>>> using elf
sections&tricks. It seems to me that the cost of fragmenting is too high.
>>>> >>>>>>>>>>> I tend to agree -
but I'm sort of leaning towards trying to use object
>>>> >>>>>>>>>>> features as much
as possible, then implementing just enough custom
>>>> >>>>>>>>>>> handling in the
linker to recoup overhead, etc. (eg: add some kind of
>>>> >>>>>>>>>>> small header/brief
description that makes it easy for the linker to
>>>> >>>>>>>>>>> slice-and-dice -
but hopefully a domain-specific such header can be a
>>>> >>>>>>>>>>> bit more compact
than the fully general ELF form)
>>>> >>>>>>>>>> I think this indeed
should be implemented and evaluated.
>>>> >>>>>>>>>> So that various
approaches could be compared.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>>> It is not only
the sizes of structures describing fragments but also the complexity
>>>> >>>>>>>>>>>> of tools that
should be taught to work with fragmented DWARF.
>>>> >>>>>>>>>>>> (f.e.
llvm-dwarfdump applied to object file should be able to read fragmented DWARF,
>>>> >>>>>>>>>>>> but applied to
linked executable it should work with non-fragmented DWARF).
>>>> >>>>>>>>>>>> That idea is
for the tool which works the same way as dsymutil ODR.
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> I will shortly
describe the idea of making DWARF be easier processed by dsymutil/DWARFLinker:
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> The idea is to
have only one "type table" per object file(special section
.debug_types_table).
>>>> >>>>>>>>>>>> This
"type table" would contain all types.
>>>> >>>>>>>>>>>> There could be
a special type of reference - type_offset - that offset points into the type
table.
>>>> >>>>>>>>>>>> Basic types
could always be placed into the start of "type table" thus, offsets to
basic types
>>>> >>>>>>>>>>>> most often
would be 1 byte. There also would be a special kind of reference - reference
inside the type.
>>>> >>>>>>>>>>>> Type units
sig8 system - would not be used to reference types.
>>>> >>>>>>>>>>>>
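
A minimal sketch of why offsets to basic types "most often would be 1 byte". It assumes type_offset would use a variable-length encoding such as ULEB128; the message does not name an encoding, so that choice and the helper name uleb128_size are assumptions made only for illustration. Under that assumption, any offset below 128 fits in a single byte, which is why placing basic types at the start of the table would help.

```python
def uleb128_size(value: int) -> int:
    """Return the number of bytes needed to encode value as ULEB128."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    size = 1
    # Each ULEB128 byte carries 7 payload bits.
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


if __name__ == "__main__":
    for offset in (0, 127, 128, 16384):
        print(offset, uleb128_size(offset))
```

With this encoding, only types placed within the first 128 bytes of the table get the 1-byte reference, so the benefit depends on how many basic types fit there.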
>>>> >>>>>>>>>>>> Types
deduplication is assumed to be done, not by linker mechanism for COMDAT,
>>>> >>>>>>>>>>>> but by a tool
like dsymutil. This tool would create resulting .debug_types_table by putting
there
>>>> >>>>>>>>>>>> types from
source .debug_types_table-s. Only one copy of the type would be placed into the
>>>> >>>>>>>>>>>> resulting
table. All references pointing to the deleted copy would be corrected to point
>>>> >>>>>>>>>>>> to the single
copy inside "type table". (that is how dsymutil works currently)
>>>> >>>>>>>>>>> ^ that's the
step that's probably a bit expensive for a general-use
>>>> >>>>>>>>>>> tool - it implies
parsing all the DWARF to find those references and
>>>> >>>>>>>>>>> rewrite them, I
think. For a high-performance solution that could be
>>>> >>>>>>>>>>> run by the linker
I think it'd be necessary to have a solution that
>>>> >>>>>>>>>>> doesn't
involve parsing all the DIEs.
>>>> >>>>>>>>>> According to the
current dsymutil processing,
>>>> >>>>>>>>>> exactly this process
is not the most time-consuming.
>>>> >>>>>>>>>> That could be done
relatively fast.
>>>> >>>>>>>>> Fair enough - though
I'd still imagine any solution that involves
>>>> >>>>>>>>> parsing all the DIEs still
wouldn't be fast enough (maybe an order of
>>>> >>>>>>>>> magnitude faster than the
current solution even - but that's still,
>>>> >>>>>>>>> what, 6 or 7x slower than
linking without the feature?) for most users
>>>> >>>>>>>>> to consider it a good
trade-off.
>>>> >>>>>>>> It seems to me that even the
current 6x-7x slowdown could be useful.
>>>> >>>>>>>> Users who already use dsymutil
or llvm-dwp(assuming DWARFLinker
>>>> >>>>>>>> would be taught to work with a
split dwarf) tools spend this time and,
>>>> >>>>>>>> in some scenarios, waste disk
space on intermediate files.
>>>> >>>>>>> FWIW, dwp (llvm-dwp hasn't
really been optimized compared to binutils
>>>> >>>>>>> dwp) is designed to be very quick
- by not needing to do a lot of
>>>> >>>>>>> parsing/fixups. Which, yes, means
larger output files than would be
>>>> >>>>>>> possible with more parsing/etc. It
also doesn't take any input from
>>>> >>>>>>> the linker (so it can run in
parallel with the linker) - so it can't
>>>> >>>>>>> remove dead subprograms. Given
Google's the major (perhaps only
>>>> >>>>>>> significant?) user of Split DWARF
- I can say that the needs don't
>>>> >>>>>>> necessarily overlap well with
something that would take significantly
>>>> >>>>>>> longer to run or use significantly
more memory. Faster/cheaper/with
>>>> >>>>>>> somewhat bigger output files is
probably the right tradeoff for
>>>> >>>>>>> Google's use case, at least.
>>>> >>>>>>>
>>>> >>>>>>> I imagine Apple's use for
dsymutil is somewhat similar - it's not used
>>>> >>>>>>> in the iterative development
cycle, only in final releases - well,
>>>> >>>>>>> maybe their situation is more
"neutral" - not a major pain point in
>>>> >>>>>>> any case I'd guess.
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>> I see. FWIW, Comparison splitdwarf+dwp
and DWARFLinker from lld:
>>>> >>>>>>
>>>> >>>>>> 1. split-dwarf+llvm-dwp = linking time
for clang 6 sec,
>>>> >>>>>>       generating time for .dwp 53 sec,
clang=997M clang.dwp=1.1G.
>>>> >>>>> FWIW, llvm-dwp is not very well optimized
(which is to say: it is not
>>>> >>>>> optimized), binutils dwp might be a better
comparison (& even that
>>>> >>>>> doesn't have the parallelism &
some potential further memory savings
>>>> >>>>> that lld has that we could take advantage
of in a dwp-like tool)
>>>> >>>>>
>>>> >>>>> What build mode was the clang binary built
in? Optimized or unoptimized?
>>>> >>>> right, that is unoptimized build with
-ffunction-sections.
>>>> >>>>
>>>> >>>>>> 2. DWARFLinker from lld = linking time
for clang 72 sec, clang=760M.
>>>> >>>>> It does seem a tad strange that the clang
binary would be smaller
>>>> >>>>> non-split with DWARF linking than it was
split. Though I could imagine
>>>> >>>>> this might be possible in an optimized
build (where debug_ranges
>>>> >>>>> become quite relatively expensive in the
.o file contribution with
>>>> >>>>> Split DWARF)
>>>> >>>>> Could you compare the section sizes
between these two clang binaries, perhaps?
>>>> >>>> .debug_ranges is three times bigger and
.debug_line is twice bigger.
>>>> >>>>
>>>> >>>>>>>> Thus if they would use this
LLD feature in its current state
>>>> >>>>>>>> - they would still receive
benefits.
>>>> >>>>>>>>
>>>> >>>>>>>> Speaking of performance
results - LLD is a multi-thread linker;
>>>> >>>>>>>> it handles sections in
parallel. DWARFLinker generates DWARF using
>>>> >>>>>>>> AsmPrinter which is a stream -
so it can produce the resulting DWARF only
>>>> >>>>>>>> sequentially. It is not
surprising that the parallel solution works faster.
>>>> >>>>>>>> Making DWARFLinker truly
multi-threaded would probably allow us
>>>> >>>>>>>> to bring the slowdown into the
2x-4x range.
>>>> >>>>>>> *nod* that's still a really
expensive link - but I understand that's a
>>>> >>>>>>> suitable tradeoff for your users
>>>> >>>>>>>
>>>> >>>>>> Btw, 2x or 7x is for pure linking
time. Overall compilation slowdown
>>>> >>>>>> is not so significant. Building LLVM
codebase has only 20% slowdown.
>>>> >>>>> Understood - that's still quite
significant to most users, I'd imagine.
>>>> >>>> I see.
>>>> >>>>
>>>> >>>>>>>>>> Anyway, I think the
dsymutil approach is still valuable, and it
>>>> >>>>>>>>>> would be useful to
optimize it.
>>>> >>>>>>>>>> Do you think it would
be useful to make dsymutil/DWARFLinker truly multi-thread?
>>>> >>>>>>>>>> (To make
dsymutil/DWARFLinker able to process each object file in a separate thread)
>>>> >>>>>>>>> Perhaps - that I'd
probably leave up to the folks who are more
>>>> >>>>>>>>> invested in dsymutil
(Adrian Prantl et al). Maybe one day we'll get it
>>>> >>>>>>>>> integrated into llvm-dwp
and then I'll be interested in getting as
>>>> >>>>>>>>> much performance out of it
as lld - so multithreading and things would
>>>> >>>>>>>>> be on the books.
>>>> >>>>>>>> I think improving dsymutil is
a valuable thing.
>>>> >>>>>>>> Though there are several
directions which might be considered
>>>> >>>>>>>> to make it more robust:
>>>> >>>>>>>>
>>>> >>>>>>>> 1. support of latest DWARF -
DWARF5/DWARF64...
>>>> >>>>>>> I expect/thought some of the Apple
folks had already worked on DWARF5 support?
>>>> >>>>>>> DWARF64 - that's been around
for a while, and just hasn't been needed
>>>> >>>>>>> by LLVM users thus far, it seems
(until recently - where some
>>>> >>>>>>> developers have started working on
that)
>>>> >>>>>> The debug_names table is already implemented,
but debug_rnglists,
>>>> >>>>>> debug_loclists, type units - are not
implemented yet.
>>>> >>>>> Superficially, type units wouldn't be
on the list of features (like
>>>> >>>>> DWARF64 - it's optional) I'd try
to support in dsymutil - since their
>>>> >>>>> size overhead is more justified for a
DWARF-agnostic linker that's
>>>> >>>>> using comdat groups. With a DWARF-aware
linker I'd be specifically
>>>> >>>>> hoping to avoid using type units to help
>>>> >>>>>> The thing which
>>>> >>>>>> should probably be changed is that
dsymutil should not have its version
>>>> >>>>>> of code generating DWARF tables. It
should call the already existing
>>>> >>>>>> DWARF5/DWARF64 implementations. Then
dsymutil would always
>>>> >>>>>> use the latest DWARF generators.
>>>> >>>>> Possibly - I don't know what the
architectural tradeoffs for that look
>>>> >>>>> like - I'd imagine DWARFLinker has
sufficiently different
>>>> >>>>> needs/tradeoffs than LLVM's DWARF
generation code (rewriting existing
>>>> >>>>> DIEs compared to building new ones from
scratch, etc) that it might be
>>>> >>>>> hard for them to share a lot of their
implementation.
>>>> >>>> It is not easy, and would require some
additions, but it would benefit
>>>> >>>> in that all format implementation is in one
place. Thus changing that place
>>>> >>>> would reflect in other places. There are at
least three implementations for
>>>> >>>> .debug_ranges, .debug_aranges currently...
>>>> >>>>
>>>> >>>>
>>>> >>>>>>>> 2. implement multi-threaded
execution.
>>>> >>>>>>>> 3. support of split DWARF.
>>>> >>>>>>> Maybe, though I'm still not
sure it'd be the right tradeoff -
>>>> >>>>>>> especially if it involved having
to wait to run the .dwo merger (call
>>>> >>>>>>> it DWARF-aware dwp, or dsymutil
with dwp support) until after the
>>>> >>>>>>> linker ran.
>>>> >>>>>>>
>>>> >>>>>>>> 4. implement dsymutil for
non-darwin platform.
>>>> >>>>>>> That's probably, essentially
(3), more-or-less. Split DWARF is
>>>> >>>>>>> somewhat of a formalization of
Apple's/MachO DWARF distribution model
>>>> >>>>>>> (leave DWARF in files that
aren't linked/use them from a debugger,
>>>> >>>>>>> but also be able to merge them
into some final file (dsym or dwp) for
>>>> >>>>>>> archival purposes)
>>>> >>>>>>>
>>>> >>>>>>>> All of this is a massive piece
of work.
>>>> >>>>>>>> Our original investment was to
solve two problems:
>>>> >>>>>>>>
>>>> >>>>>>>> 1. Overlapped address ranges,
which is currently close to being solved. Thank you for helping with that!
>>>> >>>>>>> Yeah, again, sorry that's
taken quite so long/somewhat circuitous route.
>>>> >>>>>>>
>>>> >>>>>>>> 2. Size of debug info. That
still becomes an issue, but we are unsure whether we are ready to
>>>> >>>>>>>>      invest in solving all the
above 1-4 problems and how much the community is interested in it.
>>>> >>>>>>> Fair, for sure - I don't think
you'd need to sign up to solve all of
>>>> >>>>>>> them (don't think they
necessarily need solving). Potentially moving
>>>> >>>>>>> the logic out into a separate tool
as Fangrui's considering - a
>>>> >>>>>>> post-link DWARF optimizer, rather
than in-linker DWARF optimization.
>>>> >>>>>>>
>>>> >>>>>>> I really don't want to give
you the runaround like this - but multiple
>>>> >>>>>>> times slower links is something
that seems pretty problematic for most
>>>> >>>>>>> users, to the point of weighing
the maintainability of lld against the
>>>> >>>>>>> convenience of having this
functionality in-linker rather than in a
>>>> >>>>>>> post-link optimizer.
>>>> >>>>>>>
>>>> >>>>>>> (I know you've spoken a bit
before about your users needs - but if
>>>> >>>>>>> it's possible, could you
explain (again :/) why they have such a
>>>> >>>>>>> strong need for smaller DWARF?
While DWARF size is an ongoing concern
>>>> >>>>>>> for many users (Google certainly -
hence the invention of Split DWARF,
>>>> >>>>>>> use of type units and compressed
DWARF, etc) - usually it's in rather
>>>> >>>>>>> large programs, but it sounds like
you're dealing with relatively
>>>> >>>>>>> small ones (otherwise the increase
in link time, I'd imagine, would be
>>>> >>>>>>> prohibitive for your users?)?
>>>> >>>>>> We have many large programs and keep
Daily/Nightly debug builds,
>>>> >>>>>> which takes a lot of disk space.
Compilation time for these programs is big.
>>>> >>>>>> The scenario is "compile
once".(not compile-debug-compile-debug).
>>>> >>>>>> So we think that solution(like
dsymutil/DWARFLinker) would not slow down
>>>> >>>>>> the compilation time of overall build
significantly(see above numbers for
>>>> >>>>>> llvm codebase) and would allow us to
reduce disk space required to keep
>>>> >>>>>> all of these builds.
>>>> >>>>> Ah, OK - for archival purposes. So the
interactive developers wouldn't
>>>> >>>>> necessarily be using this feature. Makes
sense - similar to dsymutil
>>>> >>>>> and dwp, mostly used for archival purposes
& you can debug straight
>>>> >>>>> from .o/.dwos for interactive/iterative
development.
>>>> >>>>
>>>> >>>>> In that case, it seems more likely that a
separate tool might suffice.
>>>> >>>> agreed: if the work on this continues, then
it makes sense to
>>>> >>>> do it as a separate tool. Make it fast enough.
And if there would be interest
>>>> >>>> in it - then it would probably be possible to
return to the idea of calling it from the linker.
>>>> >>>>
>>>> >>>>> Also, out of curiosity - have you tried
just compressing the output
>>>> >>>>> (-gz (I think that does the right thing
for the linker level
>>>> >>>>> compression too, otherwise
-Wl,-compress-debug-sections might do it))
>>>> >>>>> or are you already doing that in addition?
>>>> >>>> sure. we use  -Wl,-compress-debug-sections.
>>>> >>>>
>>>> >>>> Thank you, Alexey.
>>>> >>>>
>>>> >>>>>>> You mentioned that the usability
cost of
>>>> >>>>>>> Split DWARF for your users was too
high (or high enough to justify
>>>> >>>>>>> this alternative work of
DWARF-aware linking)? That all seems a bit
>>>> >>>>>>> surprising to me - though I
understand the deployment issues of Split
>>>> >>>>>>> DWARF do present some challenges
to users in more heterogeneous
>>>> >>>>>>> environments than Google's...
still, I'd have thought there was some
>>>> >>>>>>> hope there)
>>>> >>>>>> Our tools do not support split dwarf
yet. Though we plan to implement it.
>>>> >>>>>> When we would have support of split
dwarf then it would be
>>>> >>>>>> convenient to have easy way to share
built debug binaries. llvm-dwp is the
>>>> >>>>>> answer to this. DWARFLinker could
probably be another answer.
>>>> >>>>> Ah, fair enough - thanks for the context!
>>>> >>>>>>>>> One way to do that would
be to have a CU-local type indirection table.
>>>> >>>>>>>>> DIEs reference local type
numbers (like local address/string numbers -
>>>> >>>>>>>>> addrx/strx/rnglistx) and
that table contains either sig8 (no linker
>>>> >>>>>>>>> fixups required) or the
local type offsets you describe - the linker
>>>> >>>>>>>>> would then only need to
read this type number indirection table and
>>>> >>>>>>>>> rewrite them to the final
type numbers.
>>>> >>>>>>>> Yes, that could be
additionally done if this process would be time-consuming.
>>>> >>>>>>>>
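
A minimal sketch of the CU-local type indirection table described above, under stated assumptions. DIEs would hold only small local type numbers (indexes into the table), and the linker would rewrite the table entries without touching the DIEs. The entry classes Sig8 and LocalOffset, the function rewrite_indirection_table, and the remap dictionary are hypothetical names for illustration; no such format exists in DWARF or LLVM today.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Sig8:
    """Entry that names a type by its 8-byte signature (no fixup needed)."""
    value: int


@dataclass(frozen=True)
class LocalOffset:
    """Entry that names a type by its offset in the object's type table."""
    offset: int


Entry = Union[Sig8, LocalOffset]


def rewrite_indirection_table(table: List[Entry],
                              remap: Dict[int, int]) -> List[Entry]:
    """Map local type offsets to their final offsets; keep sig8 entries."""
    rewritten: List[Entry] = []
    for entry in table:
        if isinstance(entry, LocalOffset):
            rewritten.append(LocalOffset(remap[entry.offset]))
        else:
            rewritten.append(entry)
    return rewritten


if __name__ == "__main__":
    table = [Sig8(0xABC), LocalOffset(16), LocalOffset(40)]
    print(rewrite_indirection_table(table, {16: 4, 40: 4}))
```

The point of the design is that the work is proportional to the number of table entries per CU, not to the number of DIEs that reference types.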
>>>> >>>>>>>> David, thank you for all your
comments and explanations. They are extremely helpful.
>>>> >>>>>>> Sure thing - really appreciate
your patience with all this - it's... a
>>>> >>>>>>> lot of moving parts.
>>>> >>>>>>> - Dave
>>>> >>>>>>> Thank you, Alexey.
>>>> >>>>>>>
>>>> >>>>>>>> sig8 hash-id would be used to
compare types and to deduplicate them.
>>>> >>>>>>>> It would speed up the current
dsymutil context analysis.
>>>> >>>>>>>> Types having the same hash-id
could be deduplicated.
>>>> >>>>>>>> This would allow deduplicating
more types than current dsymutil does.
>>>> >>>>>>>> Incomplete type definitions
having a similar set of members are not deduplicated by dsymutil currently.
>>>> >>>>>>>> In this case they would have
the same hash-id.
>>>> >>>>>>>>
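
A minimal sketch of deduplicating types by hash-id while merging per-object type tables into one resulting table. It assumes each type arrives as an (old offset, hash-id, encoded bytes) triple and that equal hash-ids mean equal types; the function merge_type_tables and its input shape are hypothetical and are not how dsymutil represents types.

```python
from typing import Dict, List, Tuple

# One type as seen in a source table: (old_offset, hash_id, encoded_bytes).
TypeRecord = Tuple[int, int, bytes]


def merge_type_tables(
    tables: Dict[str, List[TypeRecord]],
) -> Tuple[bytes, Dict[Tuple[str, int], int]]:
    """Merge type tables, keeping one copy of each hash-id.

    Returns the merged table and a map from (object, old_offset) to the
    offset of the single surviving copy in the merged table.
    """
    merged = bytearray()
    offset_by_hash: Dict[int, int] = {}
    remap: Dict[Tuple[str, int], int] = {}
    for obj, records in tables.items():
        for old_offset, hash_id, blob in records:
            if hash_id not in offset_by_hash:
                # First time this type is seen: append it to the result.
                offset_by_hash[hash_id] = len(merged)
                merged += blob
            remap[(obj, old_offset)] = offset_by_hash[hash_id]
    return bytes(merged), remap


if __name__ == "__main__":
    tables = {
        "a.o": [(0, 1, b"int"), (3, 2, b"Foo")],
        "b.o": [(0, 2, b"Foo"), (3, 3, b"Bar")],
    }
    print(merge_type_tables(tables))
```

Each type costs one dictionary lookup on its hash-id, with no name or context analysis, which is the speed-up the message is arguing for.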
>>>> >>>>>>>> This "type table"
would take less space than current "type units" and current ODR
solution.
>>>> >>>>>>>>
>>>> >>>>>>>> Above is just an idea on how
to help a DWARF-aware linker (based on the idea of removing obsolete debug info)
>>>> >>>>>>>> to work faster(if that is
interesting).
>>>> >>>>>>>>
>>>> >>>>>>>> Alexey.
>>>> >>>>>>>>
>>>> >>>>>>>>> From: llvm-dev
<llvm-dev-bounces at lists.llvm.org> On Behalf Of James Henderson via
llvm-dev
>>>> >>>>>>>>> Sent: Wednesday, June 3,
2020 3:48 AM
>>>> >>>>>>>>> To: David Blaikie
<dblaikie at gmail.com>
>>>> >>>>>>>>> Cc: llvm-dev at
lists.llvm.org
>>>> >>>>>>>>> Subject: Re: [llvm-dev]
[Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> It makes me sad that the
linker (via a library or otherwise) has to be "DWARF-aware" to be able
to effectively handle --gc-sections, COMDATs, --icf etc for debug info, without
leaving large blocks of data kicking around.
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> The patching to -1 (or
equivalent) is probably a good lightweight solution (though I'd love it if
it could be done based on section type in the future rather than section name,
but that's probably outside the realm of DWARF), as it requires only minimal
understanding in the linker, but anything beyond that seems to be complicated
logic that is mostly due to the structure of DWARF. Patching to -1 does feel a
bit like a sticking plaster/band aid to patch over the issue rather than
properly solving it too - there will still be debug data (potentially
significant amounts in COMDAT-heavy objects) that the linker has to write and
the debugger has to somehow know how to skip (even if it knows that -1 is
special-case due to the standard being updated, it needs to get as far as the
-1), which is all wasted effort.
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> We've already seen
from Alexey's prototyping, and from our own experiences with the Sony
proprietary linker (which tried to rewrite .debug_line only) that deconstructing
the DWARF so that it can be more optimally reassembled at link time is slow
going, and will probably inevitably remain so however much effort is put into
optimising it. For a start, given the current standards, it's impossible to
know how to deconstruct it without having to parse vast amounts of DWARF, which
is typically going to mean a lot more parsing work than the linker would
normally have to deal with. Additionally, much of this parsing work is wasted
effort, since it seems unlikely in many links that large amounts of the DWARF
will be redundant. Having an option to opt-in doesn't help much there, since
it just means the logic exists without most people using it, due to it not being
good enough, or potentially they don't even know it exists.
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> I don't have
particularly concrete suggestions as to how to solve the structural problems
with DWARF at this point. The only thing that seems obvious to me is a more
"blessed" approach to fragmentation of sections, similar to what I
tried with my prototype mentioned earlier in the thread, although we'd need
to figure out the previously stated performance issues. Other ideas might tie
into this, like somehow sharing the various table headers a bit like CIEs in
.eh_frame that could be merged by the linker - each object could have separate
table header sections, which are referenced by the individual .debug_* blocks,
which in turn are one per function/data piece and easily discardable/merged by
the linker.
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> Just some thoughts.
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> James
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Tue, 2 Jun 2020 at
19:24, David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Tue, May 19, 2020 at
7:17 AM Alexey Lapshin
>>>> >>>>>>>>> <alapshin at
accesssoftek.com> wrote:
>>>> >>>>>>>>>> Hi David, please find
my comments inside:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>>>> Broad
question: Do you have any specific motivation/users/etc in implementing this (if
you can speak about it)?
>>>> >>>>>>>>>>>>> - it might
help motivate the work, understand what tradeoffs might be suitable for you/your
users, etc.
>>>> >>>>>>>>>>>> There are two
general requirements:
>>>> >>>>>>>>>>>> 1) Remove (or
clean) invalid debug info.
>>>> >>>>>>>>>>> Perhaps a simpler
direct solution for your immediate needs might be a much narrower,
>>>> >>>>>>>>>>> and more efficient
linker-DWARF-awareness feature:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> With DWARFv5,
rnglists present an opportunity for a DWARF linker to rewrite the ranges
>>>> >>>>>>>>>>> without parsing
the rest of the DWARF. /technically/ this isn't guaranteed - rnglist entries
>>>> >>>>>>>>>>> can be referenced
either directly, or by index. If all rnglists are referenced by index, then
>>>> >>>>>>>>>>> a linker could
parse only the debug_rnglists section and rewrite ranges to remove any
>>>> >>>>>>>>>>> address ranges
that refer to optimized-out code.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> This would only be
correct for rnglists that had no direct references to them (that only were
>>>> >>>>>>>>>>> referenced via the
indexes) - but we could either implement it with that assumption, or could
>>>> >>>>>>>>>>> add an LLVM
extension attribute on the CU that would say "I promise I only referenced
rnglists
>>>> >>>>>>>>>>> via rnglistx
forms/indexes"). If this DWARF-aware linking would have to read the CU DIE (not
>>>> >>>>>>>>>>> all the other
DIEs) it /could/ also then rewrite high/low_pc if the CU wasn't using
ranges...
>>>> >>>>>>>>>>> but that
wouldn't come up in the function-removal case, because then you'd have
ranges anyway,
>>>> >>>>>>>>>>> so no need for
that.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Such a DWARF-aware
rnglist linking could also simplify rnglists, in cases where functions
>>>> >>>>>>>>>>> ended up being
laid out next to each other, the linker could coalesce their ranges together.
>>>> >>>>>>>>>>>
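
A minimal sketch of the range rewriting described above: drop address ranges that refer to discarded code and coalesce ranges that end up adjacent. It works on plain (low, high) pairs rather than the real .debug_rnglists encoding, and it assumes the linker can supply a liveness predicate (here the hypothetical is_live callback); both are simplifications for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Range = Tuple[int, int]  # half-open address range [low, high)


def rewrite_ranges(ranges: Iterable[Range],
                   is_live: Callable[[Range], bool]) -> List[Range]:
    """Drop dead or empty ranges, then merge touching or overlapping ones."""
    live = sorted(r for r in ranges if r[0] < r[1] and is_live(r))
    result: List[Range] = []
    for low, high in live:
        if result and low <= result[-1][1]:
            # Starts where the previous range ends (or inside it): extend it.
            prev_low, prev_high = result[-1]
            result[-1] = (prev_low, max(prev_high, high))
        else:
            result.append((low, high))
    return result


if __name__ == "__main__":
    ranges = [(0x1000, 0x1010), (0x1010, 0x1040), (0x0, 0x20), (0x2000, 0x2008)]
    dead = {(0x0, 0x20)}  # e.g. a function removed by --gc-sections
    print(rewrite_ranges(ranges, lambda r: r not in dead))
```

As the message notes, this is only safe when every range list is reached through an index, since direct references into the section would be invalidated by the rewrite.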
>>>> >>>>>>>>>>> I imagine this
could be implemented with very little overhead to linking, especially compared
>>>> >>>>>>>>>>> to the overhead of
full DWARF-aware linking.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Though none of
this fixes Split DWARF, where the linker doesn't get a chance to see the
>>>> >>>>>>>>>>> addresses being
used - but if you only want/need the CU-level ranges to be correct, this
>>>> >>>>>>>>>>> might be a viable
fix, and quite efficient.
>>>> >>>>>>>>>> Yes, we think about
that alternative. This would resolve our problem of invalid debug info
>>>> >>>>>>>>>> and would work much
faster. Thus, if we would not have good results for D74169 then we
>>>> >>>>>>>>>> will implement it. Do
you think it could be useful to have this solution in upstream?
>>>> >>>>>>>>> A pure rnglist rewriting -
I think it'd be OK to have in upstream -
>>>> >>>>>>>>> again, cost/benefit/etc
would have to be weighed. I'm not sure it
>>>> >>>>>>>>> would save enough space to
be particularly valuable beyond the
>>>> >>>>>>>>> correctness issue - and it
doesn't completely solve the correctness
>>>> >>>>>>>>> issue for zero-address
usage or low-address usage (because you could
>>>> >>>>>>>>> still have overlapping
subprograms inside a CU - so if you were
>>>> >>>>>>>>> symbolizing you could use
the correct rnglist to filter, but then go
>>>> >>>>>>>>> look inside the CU only to
find two subprograms that had that address
>>>> >>>>>>>>> & not know which one
was the correct one and which one was the
>>>> >>>>>>>>> discarded one).
>>>> >>>>>>>>>
>>>> >>>>>>>>> rnglist rewriting might be
easy enough to prototype - but depends what
>>>> >>>>>>>>> you want to spend your
time on, I know this whole issue has been a
>>>> >>>>>>>>> huge investment of your
time already - but maybe this recent
>>>> >>>>>>>>> revitalization of the
conversation around having an explicit value in
>>>> >>>>>>>>> the linker might be
sufficient to address everyone's needs... *fingers
>>>> >>>>>>>>> crossed*)
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>>>> 2) Optimize
the DWARF size.
>>>> >>>>>>>>>>> Do your users care
much about this? I imagine if they had significant DWARF size issues,
>>>> >>>>>>>>>>> they'd have
significant link time issues and the kind of cost to link time this feature has
would
>>>> >>>>>>>>>>> be prohibitive -
but perhaps they're sharing linked binaries much more often than they're
>>>> >>>>>>>>>>> actually
performing linking.
>>>> >>>>>>>>>> Yes, they do. They
also have significant link-time issues.
>>>> >>>>>>>>>> So current performance
results of D74169 are not very acceptable.
>>>> >>>>>>>>>> We hope to improve it.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>>> The specifics
which our users have:
>>>> >>>>>>>>>>>>    - embedded
platform which uses 0 as start of .text section.
>>>> >>>>>>>>>>>>    - custom
toolset which does not support all features yet(f.e. split dwarf).
>>>> >>>>>>>>>>>>    - tolerant
of the link-time increase.
>>>> >>>>>>>>>>>>    - need a
useful way to share debug builds.
>>>> >>>>>>>>>>> Sharing two files
(executable and dwp) is significantly less useful than sharing one file?
>>>> >>>>>>>>>> Probably not
significantly, but yes, it looks less useful comparing to D74169.
>>>> >>>>>>>>>> Having only two files
(executable and .dwp) looks significantly better than having executable and
multiple .dwo files.
>>>> >>>>>>>>>> Having only one
file(executable) with minimal size looks better than the two files with a bigger
size.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> clang compiled with
-gsplit-dwarf takes 0.9G for executable and 0.9G for .dwp.
>>>> >>>>>>>>>> clang compiled with
-gc-debuginfo takes only 0.76G for single executable.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>>> For the first
point: we have a problem "Overlapping address ranges starting from
0"(D59553).
>>>> >>>>>>>>>>>> We use custom
solution, but the general solution like D74169 would be better here.
>>>> >>>>>>>>>>> If CU ranges are
the only ones that need fixing, then I think the above solution might be as
>>>> >>>>>>>>>>> good/better - if
more than CU ranges need fixing, then I think we might want to start talking
about
>>>> >>>>>>>>>>> how to fix DWARF
itself (split and non-split) to signal certain addresses point to dead code with
a
>>>> >>>>>>>>>>> specific blessed
value that linkers would need to implement - because with Split DWARF
there's
>>>> >>>>>>>>>>> no way to solve
the non-CU addresses at the linker.
>>>> >>>>>>>>>> I think a worthwhile
solution for that signal value would be LowPC > HighPC.
>>>> >>>>>>>>>> That does not require
additional bits in DWARF.
>>>> >>>>>>>>>> It would be natural to
skip such address ranges since they are explicitly marked as invalid.
>>>> >>>>>>>>>> It could be
implemented in a linker very easily. Probably, it would make sense to describe
that
>>>> >>>>>>>>>> usage in DWARF
standard.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> As to the addresses
which are not seen by the linker(since they are in .dwo files) - yes,
>>>> >>>>>>>>>> they need to have
another solution. Could you show an example of such a case, please?
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>>>> 2. Support
of type units.
>>>> >>>>>>>>>>>>>>   
That could be implemented further.
>>>> >>>>>>>>>>>>> Enabling
type units increases object size to make it easier to deduplicate at link time
by a DWARF-unaware
>>>> >>>>>>>>>>>>> linker.
With a DWARF aware linker it'd be generally desirable not to have to add
that object size overhead to
>>>> >>>>>>>>>>>>> get the
linking improvements.
>>>> >>>>>>>>>>>> But,
DWARFLinker should adequately work with type units since they are already
implemented.
>>>> >>>>>>>>>>> Maybe - it'd
be nice & all, but I don't think it's an outright necessity - if
someone knows they're using
>>>> >>>>>>>>>>> a DWARF-aware
linker, they'd probably not use type units in their object files. It's
possible someone
>>>> >>>>>>>>>>> doesn't know
for sure & maybe they have pre-canned debug object files from someone else,
etc.
>>>> >>>>>>>>>> I see.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>>> Another thing
is that the idea behind type units has the potential to help Dwarf-aware linker
to work faster.
>>>> >>>>>>>>>>>> Currently,
DWARFLinker analyzes context to understand whether types are the same or not.
>>>> >>>>>>>>>>> When you say
"analyzes context" what do you mean? Usually I'd take that to mean
>>>> >>>>>>>>>>> "looks at
things outside the type itself - like what namespace it's in, etc" -
which, yes,
>>>> >>>>>>>>>>> it should do that,
but it doesn't seem very expensive to do. But I guess you actually
>>>> >>>>>>>>>>> mean something
about doing structural equivalence in some way, looking at things inside the
type?
>>>> >>>>>>>>>> I think it could be
useful for both cases. Currently, dsymutil does only first thing
>>>> >>>>>>>>>> (look at type name,
namespace name, etc..) and does not do the second thing
>>>> >>>>>>>>>> (doing structural
equivalence). Analyzing type names is currently quite expensive
>>>> >>>>>>>>>> (the search in the
string pool alone takes ~10 sec out of 70 sec of overall time).
>>>> >>>>>>>>>> That is expensive
because many things must be done to work with strings:
>>>> >>>>>>>>>> parse DWARF, search
and resolve relocations, compute a hash for strings,
>>>> >>>>>>>>>> put data into a string
pool, create a fully qualified name(like namespace::function::name).
>>>> >>>>>>>>>> It looks like it could
be optimized and finally require less time, but it still would be a noticeable
>>>> >>>>>>>>>> part of the overall
time.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> If dsymutil starts to
check for structural equivalence, then the process would be even
slower.
>>>> >>>>>>>>>> So, if instead of
comparing type structure, a single hash-id were checked - then this
process
>>>> >>>>>>>>>> would also be faster.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Thus I think using
hash-id to compare types would allow to make current implementation faster and
would
>>>> >>>>>>>>>> allow handling
incomplete types by DWARFLinker without massive performance degradation also.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>>> But the
context is known when types are generated. So, no need to spend the time
analyzing it.
>>>> >>>>>>>>>>>> If types could
be compared without analyzing context, then Dwarf-aware linker would work
faster.
>>>> >>>>>>>>>>>> That is just
an idea(not for immediate implementation): If types would be stored in some
"type table"
>>>> >>>>>>>>>>>> (instead of
COMDAT section group) and could be accessed through hash-id (like type units)
>>>> >>>>>>>>>>>> - then it
would be the solution requiring fewer bits to store but allowing to compare
types
>>>> >>>>>>>>>>>> by hash-id(not
analysing context).
>>>> >>>>>>>>>>>> In this case,
the size increase would be small. And processing could be done faster.
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> this is just
an idea and could be discussed separately from the problem of integrating
D74169.
>>>> >>>>>>>>>>>>>> 6.
-flto=thin
>>>> >>>>>>>>>>>>>>     
That problem was described in this review
https://reviews.llvm.org/D54747#1503720. It also exists in
>>>> >>>>>>>>>>>>>>
current DWARFLinker/dsymutil implementation. I think that problem should be
discussed more: it could
>>>> >>>>>>>>>>>>>>
probably be fixed by avoiding generation of such incomplete declaration during
thinlto,
>>>> >>>>>>>>>>>>>> That
would be costly to produce extra/redundant debug info in ThinLTO - actually
ThinLTO could be doing
>>>> >>>>>>>>>>>>>> more
to reduce that redundancy early on (actually removing definitions from some llvm
Modules if the type
>>>> >>>>>>>>>>>>>>
definition is known to exist in another Module, etc)
>>>> >>>>>>>>>>>>> I
don't know if it's a problem since that patch was reverted.
>>>> >>>>>>>>>>>> Yes. That
patch was reverted, but this patch(D74169) has the same problem.
>>>> >>>>>>>>>>>> if D74169
would be applied and --gc-debuginfo used then structure type
>>>> >>>>>>>>>>>> definition
would be removed.
>>>> >>>>>>>>>>>> DWARFLinker
could handle that case - "removing definitions from some llvm Modules if
the type
>>>> >>>>>>>>>>>> definition is
known to exist in another Module".
>>>> >>>>>>>>>>>> i.e.
DWARFLinker could replace the declaration with the definition.
>>>> >>>>>>>>>>>> But that
problem could be more easily resolved when debug info is generated(probably
without
>>>> >>>>>>>>>>>> significant
increase of debug info size):
>>>> >>>>>>>>>>>> Here we have:
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> DW_TAG_compile_unit(0x0000000b) - compile unit containing the concrete
>>>> >>>>>>>>>>>> instance of function "f".
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> DW_TAG_compile_unit(0x00000073) - compile unit containing the abstract
>>>> >>>>>>>>>>>> instance root for function "f".
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> DW_TAG_compile_unit(0x000000c1) - compile unit containing the definition
>>>> >>>>>>>>>>>> of function "f".
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> The code for function "f" was deleted. gc-debuginfo deletes compile unit
>>>> >>>>>>>>>>>> DW_TAG_compile_unit(0x000000c1), which contains the "f" definition (since
>>>> >>>>>>>>>>>> there is no corresponding code). But that unit also holds the structure
>>>> >>>>>>>>>>>> "Foo" definition DW_TAG_structure_type(0x0000011e), referenced from
>>>> >>>>>>>>>>>> DW_TAG_compile_unit(0x00000073) by the declaration
>>>> >>>>>>>>>>>> DW_TAG_structure_type(0x000000ae). That declaration is exactly the case
>>>> >>>>>>>>>>>> where the definition was removed by ThinLTO and replaced with a declaration.
>>>> >>>>>>>>>>>> Would it cost too much if the type definition were not replaced with a
>>>> >>>>>>>>>>>> declaration for the "abstract instance root"?
>>>> >>>>>>>>>>>> The number of concrete instances is bigger than the number of abstract
>>>> >>>>>>>>>>>> instance roots. Probably, it would not be too costly to leave the
>>>> >>>>>>>>>>>> definition in the abstract instance root?
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Alternatively, would it cost too much if the type definition were not
>>>> >>>>>>>>>>>> replaced with a declaration when the declaration references a type from
>>>> >>>>>>>>>>>> an unused function? (LTO could understand that the concrete function is
>>>> >>>>>>>>>>>> not used.)
>>>> >>>>>>>>>>> I don't follow this example - could you provide a small concrete test
>>>> >>>>>>>>>>> case I could reproduce?
>>>> >>>>>>>>>> I would provide a test case if necessary. But it looks like this issue is
>>>> >>>>>>>>>> finally clear, and you already commented on that.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>> Oh, I guess this is happening perhaps because ThinLTO can't know for sure
>>>> >>>>>>>>>>> that a standalone definition of 'f' won't be needed - so it produces one
>>>> >>>>>>>>>>> in case one of the inlining opportunities doesn't end up inlining. Then it
>>>> >>>>>>>>>>> turns out all calls got inlined, so the external definition wasn't needed.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Oh, you're suggesting that these 3 CUs got emitted into one object file
>>>> >>>>>>>>>>> during LTO, but that DWARFLinker drops a CU without any code in it - even
>>>> >>>>>>>>>>> though... So far as I know, in LTO, LLVM directly references types across
>>>> >>>>>>>>>>> units if the CUs are all emitted in the same object file (and if they
>>>> >>>>>>>>>>> weren't in the same object file, then the abstract_origin couldn't be
>>>> >>>>>>>>>>> pointing cross-CU).
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> I guess some basic things to say:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> With ThinLTO, the concrete/standalone function definition is emitted in
>>>> >>>>>>>>>>> case some call sites don't end up being inlined. So we know it'll be
>>>> >>>>>>>>>>> emitted (but might not be needed by the actual linker).
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Any number of inline calls might exist - but we shouldn't put the type
>>>> >>>>>>>>>>> information into those, because they aren't guaranteed to emit it (if the
>>>> >>>>>>>>>>> inline function gets optimized away, there would be nothing to enforce the
>>>> >>>>>>>>>>> type being emitted) - and even if we forced the type information to be
>>>> >>>>>>>>>>> emitted into one object file that has an inline copy of the function,
>>>> >>>>>>>>>>> there's no guarantee that object file will get linked in either.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> So, no, I don't think there's much we can do to keep the size of object
>>>> >>>>>>>>>>> files down, while guaranteeing the type information will be emitted with
>>>> >>>>>>>>>>> the usual linker semantics.
>>>> >>>>>>>>>> Then dsymutil/DWARFLinker could be changed to handle that (though it would
>>>> >>>>>>>>>> probably not be very efficient).
>>>> >>>>>>>>>> If ThinLTO understood that the function is ultimately not used (and so
>>>> >>>>>>>>>> must not contain the referenced type definition), then this situation
>>>> >>>>>>>>>> could be handled more effectively.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Thank you, Alexey.
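
The scenario discussed in this exchange can be illustrated with a toy model. This is only a sketch under stated assumptions: `TypeDie`, `CompileUnit`, `gcDebugInfo`, `hasDefinition`, `buildThreadExample` and the `rescueDefinitions` switch are hypothetical names invented for illustration and are not part of LLVM's DWARFLinker or of D74169. Types are matched by name here, which is a simplification of real type identity. The model shows that dropping a unit with no surviving code also drops the only definition of "Foo", and how promoting the remaining declaration would keep a definition available.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Toy model only: hypothetical types, NOT LLVM's DWARFLinker API.
struct TypeDie {
    std::uint32_t offset;   // DIE offset as quoted in the thread
    std::string name;       // type name, a stand-in for real type identity
    bool isDeclaration;     // true for a declaration-only stub
};

struct CompileUnit {
    std::uint32_t offset;
    bool isLive;                 // false when no code of this unit survived linking
    std::vector<TypeDie> types;
};

// The three units from the thread, reduced to their "Foo" type DIEs.
std::vector<CompileUnit> buildThreadExample() {
    return {
        {0x0000000b, true, {}},                             // concrete instance of "f"
        {0x00000073, true, {{0x000000ae, "Foo", true}}},    // abstract instance root
        {0x000000c1, false, {{0x0000011e, "Foo", false}}},  // standalone "f", code deleted
    };
}

// Drops units without live code. With rescueDefinitions set, a declaration
// whose only definition sat in a dropped unit is promoted to a definition.
std::vector<CompileUnit> gcDebugInfo(std::vector<CompileUnit> units,
                                     bool rescueDefinitions) {
    std::set<std::string> liveDefs, doomedDefs;
    for (const CompileUnit &cu : units)
        for (const TypeDie &t : cu.types)
            if (!t.isDeclaration)
                (cu.isLive ? liveDefs : doomedDefs).insert(t.name);

    std::vector<CompileUnit> kept;
    for (CompileUnit &cu : units) {
        if (!cu.isLive)
            continue;  // the whole unit goes away, type definitions included
        if (rescueDefinitions) {
            for (TypeDie &t : cu.types) {
                bool orphaned = t.isDeclaration && doomedDefs.count(t.name) &&
                                !liveDefs.count(t.name);
                if (orphaned) {
                    t.isDeclaration = false;  // stand-in for cloning the definition
                    liveDefs.insert(t.name);  // promote one declaration per type
                }
            }
        }
        kept.push_back(cu);
    }
    return kept;
}

bool hasDefinition(const std::vector<CompileUnit> &units, const std::string &name) {
    for (const CompileUnit &cu : units)
        for (const TypeDie &t : cu.types)
            if (t.name == name && !t.isDeclaration)
                return true;
    return false;
}

// Usage check: without the rescue step "Foo" loses its only definition.
void checkThreadExample() {
    const auto units = buildThreadExample();
    assert(!hasDefinition(gcDebugInfo(units, false), "Foo"));
    assert(hasDefinition(gcDebugInfo(units, true), "Foo"));
}
```

In this model the rescue pass has to scan every type DIE of every unit, which hints at why handling the case in the linker might be less efficient than avoiding the situation when the debug info is generated.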
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> _______________________________________________
>>>> >>>>>>>>>>>> LLVM Developers mailing list
>>>> >>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>> >>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
