thr3ads.net - llvm dev - [llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF. [Sep 2020]

Alexey via llvm-dev

2020-Sep-03 12:15 UTC

[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

On 03.09.2020 01:36, David Blaikie wrote:
>
> On Wed, Sep 2, 2020 at 3:26 PM Alexey <avl.lapshin at gmail.com 
> <mailto:avl.lapshin at gmail.com>> wrote:
>
>
>     On 02.09.2020 21:44, David Blaikie wrote:
>>
>>
>>     On Wed, Sep 2, 2020 at 9:56 AM Alexey <avl.lapshin at gmail.com
>>     <mailto:avl.lapshin at gmail.com>> wrote:
>>
>>
>>         On 01.09.2020 20:07, David Blaikie wrote:
>>>         Fair enough - thanks for clarifying the differences! (I'd
>>>         still lean a bit towards this being dwz-esque, as you say
>>>         "an extension of classic dwz"
>>         I doubt a little about "llvm-dwz" since it might confuse
>>         people who would expect exactly the same behavior.
>>         But if we think of it as "an extension of classic dwz" and
>>         the possible confusion is not a big deal then
>>         I would be fine with "llvm-dwz".
>>>         using a bit more domain knowledge (of terminators and C++
>>>         odr - though I'm not sure dsymutil does rely on the ODR,
>>>         does it? It relies on it to know that two names represent
>>>         the same type, I suppose, but doesn't assume they're already
>>>         identical, instead it merges their members))
>>
>>         if dsymutil is able to find a full definition then it would
>>         remove all other definitions(which matched by name) and set
>>         all references to that found definition. If it is not able to
>>         find a full definition then it would do nothing. i.e. if
>>         there are two incomplete definitions(DW_AT_declaration
>>         (true)) with the same name then they would not be merged.
>>         That is a possible improvement - to teach dsymutil to merge
>>         incomplete types.
>>
>>     Huh, what does it do with extra member function definitions found
>>     in later definitions? (eg: struct x { template<typename T> void
>>     f(); }; - in one translation unit x::f<int> is instantiated, in
>>     another x::f<float> is instantiated - how are the two represented
>>     with dsymutil?)
>
>     They would be considered as two not matched types. dsymutil would
>     not merge them somehow and thus would not use single type
>     description. There would be two separate types called "x" which
>     would have mostly matched members but differ with x::f<int> and
>     x::f<float>. No any de-duplication in that case.
>
> Oh, that's unfortunate. It'd be nice for C++ at least, to implement a
> potentially faster dsymutil mode that could get this right and not
> have to actually check for type equivalence, instead relying on the
> name of the type to determine that it must be identical.

Right. That would result in even more size reduction.
>
> The first instance of the type that's encountered has its fully
> qualified name or mangled name recorded in a map pointing to the DIE.
> Any future instance gets downgraded to a declaration, and /certain/
> members get dropped, but other members get stuck on the declaration
> (same sort of DWARF you see with "struct foo { virtual void f1();
> template<typename T> void f2() { } }; void test(foo& f) { f.f2<int>();
> }"). Recording all the member functions of the type/static member
> variable types might be needed in cases where some member functions
> are defined in one translation unit and some defined in another -
> though I guess that infrastructure is already in place/that just works
> today.

My understanding is that there is no such infrastructure currently.
The current infrastructure allows a single existing (canonical) type
declaration to be referenced from other units. It does not allow
referencing different parts (in different units) of an incomplete type.
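
As a rough illustration of the name-keyed scheme quoted above, here is a
minimal sketch. It assumes a heavily simplified DIE model: TypeDie,
CanonicalTypeMap and recordOrDowngrade are hypothetical names and are not
part of DWARFLinker. Member handling (which members get dropped and which
stay on the declaration) is deliberately left out.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, simplified stand-in for a type DIE; not an LLVM class.
struct TypeDie {
  std::string QualifiedName;     // fully qualified or mangled type name
  std::uint64_t Offset;          // offset of this DIE in the debug info
  bool IsDeclaration;            // true once downgraded to a declaration
  std::uint64_t CanonicalOffset; // meaningful only when IsDeclaration
};

// Maps a type name to the offset of the first definition encountered.
class CanonicalTypeMap {
public:
  // Returns true if Die became the canonical definition for its name.
  // Otherwise Die is downgraded to a declaration that refers to the
  // canonical definition recorded earlier, and false is returned.
  bool recordOrDowngrade(TypeDie &Die) {
    std::pair<MapTy::iterator, bool> Result =
        Map.emplace(Die.QualifiedName, Die.Offset);
    if (Result.second)
      return true;
    Die.IsDeclaration = true;
    Die.CanonicalOffset = Result.first->second;
    return false;
  }

private:
  typedef std::unordered_map<std::string, std::uint64_t> MapTy;
  MapTy Map;
};
```

Note that this never compares the two type descriptions; it relies only on
the name, which is exactly the ODR assumption being discussed.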

I think it would be necessary to change the order in which compilation
units are processed to implement such type merging. Currently, once a
compilation unit has been analyzed (scanned for types and dead info), it
starts being emitted right away.
It looks like, to support merging, it would be necessary to analyze all
CUs first (to create the canonical representation) and only then start to
emit them.
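
To make the ordering change concrete, here is a toy sketch of an
"analyze everything, then emit" pipeline. CompileUnit, TypeDesc, analyzeAll
and emitTypes are made-up names for illustration only; a real
implementation would work on DIEs rather than on sets of member names.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified model: a type is a name plus the names of the
// members that one compile unit happens to describe.
struct TypeDesc {
  std::string Name;
  std::set<std::string> Members;
};

struct CompileUnit {
  std::string Name;
  std::vector<TypeDesc> Types;
};

typedef std::map<std::string, std::set<std::string>> CanonicalTypes;

// Phase 1: scan every unit and union the members of same-named types.
CanonicalTypes analyzeAll(const std::vector<CompileUnit> &Units) {
  CanonicalTypes Canonical;
  for (const CompileUnit &CU : Units)
    for (const TypeDesc &T : CU.Types)
      Canonical[T.Name].insert(T.Members.begin(), T.Members.end());
  return Canonical;
}

// Phase 2: emit each canonical type once. This can only start after
// phase 1 has seen all units, otherwise members would be missing.
std::vector<TypeDesc> emitTypes(const CanonicalTypes &Canonical) {
  std::vector<TypeDesc> Out;
  for (const auto &Entry : Canonical)
    Out.push_back(TypeDesc{Entry.first, Entry.second});
  return Out;
}
```

With the "foo" example from this thread, mod1.cpp contributes fff<int> and
mod2.cpp contributes fff<float>, and the single emitted "foo" carries both.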

I am going to start working on a prototype of a parallel,
per-compilation-unit implementation of DWARFLinker
(based on the scenario which Jonas described in another message in this
thread).
Type merging could be the next step...

The following is the result of compiling your example on Darwin (showing
that dsymutil does not merge such types):

$ cat struct.h

#ifndef MY_H
#define MY_H

struct foo {
   template <class T> int fff () { return sizeof(T); }
};

#endif // MY_H

$ cat mod1.cpp

#include "struct.h"
int test1 ( ) {
   foo var;
   return var.fff<int>();
}

$ cat mod2.cpp

#include "struct.h"
int test2 ( ) {
   foo var;
   return var.fff<float>();
}

$ cat main.cpp

#include "struct.h"
int test1();
int test2();
int main ( void ) {
   test1();
   test2();
   return 0;
}

$ clang++ main.cpp mod1.cpp mod2.cpp -O -g -fno-inline

$ llvm-dwarfdump -a a.out.dSYM/Contents/Resources/DWARF/a.out | less

0x00000056: DW_TAG_compile_unit

               DW_AT_language    (DW_LANG_C_plus_plus)
               DW_AT_name        ("mod1.cpp")

0x000000ae:   DW_TAG_structure_type
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

                 DW_AT_name      ("foo")
                 DW_AT_byte_size (0x01)

0x000000b7:     DW_TAG_subprogram

                   DW_AT_linkage_name    ("_ZN3foo3fffIiEEiv")
                   DW_AT_name    ("fff<int>")


0x0000011f: DW_TAG_compile_unit

               DW_AT_language    (DW_LANG_C_plus_plus)
               DW_AT_name        ("mod2.cpp")

0x00000177:   DW_TAG_structure_type
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

                 DW_AT_name      ("foo")
                 DW_AT_byte_size (0x01)

0x00000180:     DW_TAG_subprogram

                   DW_AT_linkage_name    ("_ZN3foo3fffIfEEiv")
                   DW_AT_name    ("fff<float>")

>
> - Dave
>
>
>>         Alexey.
>>
>>>
>>>         But I don't have super strong feelings about the naming.
>>>
>>>         On Tue, Sep 1, 2020 at 6:36 AM Alexey <avl.lapshin at gmail.com
>>>         <mailto:avl.lapshin at gmail.com>> wrote:
>>>
>>>
>>>             On 01.09.2020 06:27, David Blaikie wrote:
>>>>             A quick note: The feature as currently proposed sounds
>>>>             like it's an exact match for 'dwz'? Is there any
>>>>             benefit to this over the existing dwz project? Is it
>>>>             different in some ways I'm not aware of? (I haven't
>>>>             actually used dwz, so I might have some mistaken ideas
>>>>             about how it should work)
>>>>
>>>>             If it's going to solve the same general problem, but be
>>>>             in the llvm project instead, then maybe it should be
>>>>             called llvm-dwz.
>>>             It looks like dwz and llvm-dwarfutil are not exactly
>>>             matched in functionality.
>>>
>>>             dwz is a program that attempts to optimize DWARF
>>>             debugging information contained in ELF shared libraries
>>>             and ELF executables for *size*.
>>>
>>>             llvm-dwarfutil is a tool that is used for processing
>>>             debug info(DWARF) located in built binary files to
>>>             improve debug info *quality*, reduce debug info *size*
>>>             and accelerate debug info *processing*.
>>>
>>>             Things which are supposed to be done by llvm-dwarfutil
>>>             and which are not done by dwz: removing obsolete debug
>>>             info, building indexes, stripping unneeded debug
>>>             sections, compress/decompress debug sections.
>>>
>>>             Common thing is that both of these tools do debug info
>>>             size reduction.
>>>             But they do this using different approaches:
>>>
>>>             1. dwz reduces the size of debug info by creating
>>>                partial compilation units for duplicated parts. So
>>>                that these partial compilation units could be
>>>                imported in every duplicated place. AFAIU, That
>>>                optimization gives the most size saving effect.
>>>
>>>                another size saving optimization is ODR types
>>>                deduplication.
>>>
>>>             2. llvm-dwarfutil reduces the size of debug info by ODR
>>>                types deduplication which gives the most size saving
>>>                effect in llvm-dwarfutil case.
>>>
>>>                another size saving optimization is removing obsolete
>>>                debug info.
>>>                (which actually is not only about size but about
>>>                correctness also)
>>>
>>>             So, it looks like these tools are not equal. If we would
>>>             consider that llvm-dwz is an extension of classic dwz
>>>             then we could probably name it as llvm-dwz.
>>>
>>>>
>>>>             Though I understand the desire for this to grow other
>>>>             functionality, like DWARF-aware dwp-ing. Might be
>>>>             better for this to busybox and provide that
>>>>             functionality under llvm-dwp instead, or more likely I
>>>>             Suspect, that the existing llvm-dwp will be rewritten
>>>>             (probably by me) to use more of lld's infrastructure to
>>>>             be more efficient (it's current object reading/writing
>>>>             logic is using LLVM's libObject and MCStreamer, which
>>>>             is a bit inefficient for a very content-unaware linking
>>>>             process) and then maybe that could be taught to use
>>>>             DwarfLinker as a library to optionally do DWARF-aware
>>>>             linking depending on the users time/space tradeoff
>>>>             desires. Still benefiting from any improvements to the
>>>>             underlying DwarfLinker library (at which point that
>>>>             would be shared between llvm-dsymutil, llvm-dwz, and
>>>>             llvm-dwp).
>>>>
>>>>             On Tue, Aug 25, 2020 at 7:29 AM Alexey
>>>>             <avl.lapshin at gmail.com <mailto:avl.lapshin at gmail.com>>
>>>>             wrote:
>>>>
>>>>                 Hi,
>>>>
>>>>                    We propose llvm-dwarfutil - a dsymutil-like tool
>>>>                 for ELF.
>>>>                    Any thoughts on this?
>>>>                    Thanks in advance, Alexey.
>>>>
>>>>                 =====================================================================
>>>>
>>>>                 llvm-dwarfutil(Apndx A) - is a tool that is used
>>>>                 for processing debug info(DWARF) located in built
>>>>                 binary files to improve debug info quality,
>>>>                 reduce debug info size and accelerate debug info
>>>>                 processing.
>>>>                 Supported object files formats: ELF, MachO(Apndx B),
>>>>                 COFF(Apndx C), WASM(Apndx C).
>>>>
>>>>                 =====================================================================
>>>>
>>>>                 Specifically, the tool would do:
>>>>
>>>>                    - Remove obsolete debug info which refers to
>>>>                      code deleted by the linker doing the garbage
>>>>                      collection (gc-sections).
>>>>
>>>>                    - Deduplicate debug type definitions for
>>>>                      reducing resulting size of binary.
>>>>
>>>>                    - Build accelerator/index tables.
>>>>                      = .debug_aranges, .debug_names, .gdb_index,
>>>>                        .debug_pubnames, .debug_pubtypes.
>>>>
>>>>                    - Strip unneeded tables.
>>>>                      = .debug_aranges, .debug_names, .gdb_index,
>>>>                        .debug_pubnames, .debug_pubtypes.
>>>>
>>>>                    - Compress or decompress debug info as requested.
>>>>
>>>>                 Possible feature:
>>>>
>>>>                    - Join split dwarf .dwo files in a single file
>>>>                      containing all debug info
>>>>                      (convert split DWARF into monolithic DWARF).
>>>>
>>>>                 =====================================================================
>>>>
>>>>                 User interface:
>>>>
>>>>                    OVERVIEW: A tool for optimizing debug info
>>>>                    located in the built binary.
>>>>
>>>>                    USAGE: llvm-dwarfutil [options] input output
>>>>
>>>>                    OPTIONS: (Apndx E)
>>>>
>>>>                 =====================================================================
>>>>
>>>>                 Implementation notes:
>>>>
>>>>                 1. Removing obsolete debug info would be done using
>>>>                    DWARFLinker llvm library.
>>>>
>>>>                 2. Data types deduplication would be done using
>>>>                    DWARFLinker llvm library.
>>>>
>>>>                 3. Accelerator/index tables would be generated
>>>>                    using DWARFLinker llvm library.
>>>>
>>>>                 4. Interface of DWARFLinker library would be
>>>>                    changed in such way that it would be possible
>>>>                    to switch on/off various stages:
>>>>
>>>>                    class DWARFLinker {
>>>>                      setDoRemoveObsoleteInfo ( bool DoRemoveObsoleteInfo = false);
>>>>
>>>>                      setDoAppleNames ( bool DoAppleNames = false );
>>>>                      setDoAppleNamespaces ( bool DoAppleNamespaces = false );
>>>>                      setDoAppleTypes ( bool DoAppleTypes = false );
>>>>                      setDoObjC ( bool DoObjC = false );
>>>>                      setDoDebugPubNames ( bool DoDebugPubNames = false );
>>>>                      setDoDebugPubTypes ( bool DoDebugPubTypes = false );
>>>>
>>>>                      setDoDebugNames (bool DoDebugNames = false);
>>>>                      setDoGDBIndex (bool DoGDBIndex = false);
>>>>                    }
>>>>
>>>>                 5. Copying source file contents, stripping tables,
>>>>                    compressing/decompressing tables would be done
>>>>                    by ObjCopy llvm library(extracted from
>>>>                    llvm-objcopy):
>>>>
>>>>                    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>                        object::COFFObjectFile &In, Buffer &Out);
>>>>                    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>                        object::ELFObjectFileBase &In, Buffer &Out);
>>>>                    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>                        object::MachOObjectFile &In, Buffer &Out);
>>>>                    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>                        object::WasmObjectFile &In, Buffer &Out);
>>>>
>>>>                 6. Address ranges and single addresses pointing to
>>>>                    removed code should be marked with tombstone
>>>>                    value in the input file:
>>>>
>>>>                     -2 for .debug_ranges and .debug_loc.
>>>>                     -1 for other .debug* tables.
>>>>
>>>>                 7. Prototype implementation -
>>>>                    https://reviews.llvm.org/D86539.
>>>>
>>>>                 =====================================================================
>>>>
>>>>                 Roadmap:
>>>>
>>>>                 1. Refactor llvm-objcopy to extract its
>>>>                    implementation into separate library
>>>>                    ObjCopy(in LLVM tree).
>>>>
>>>>                 2. Create a command line utility using existed
>>>>                    DWARFLinker and ObjCopy implementation. First
>>>>                    version is supposed to work with only ELF
>>>>                    input object files.
>>>>                    It would take input ELF file with unoptimized
>>>>                    debug info and create output ELF file with
>>>>                    optimized debug info. That version would be
>>>>                    done out of the llvm tree.
>>>>
>>>>                 3. Make a tool to be able to work in multi-thread mode.
>>>>
>>>>                 4. Consider it to be included into LLVM tree.
>>>>
>>>>                 5. Support DWARF5 tables.
>>>>
>>>>                 =====================================================================
>>>>
>>>>                 Appendix A. Should this tool be implemented as a
>>>>                             new tool or as an extension
>>>>                             to dsymutil/llvm-objcopy?
>>>>
>>>>                     There already exists a tool which removes
>>>>                     obsolete debug info on darwin - dsymutil.
>>>>                     Why create another tool instead of extending
>>>>                     the already existed dsymutil/llvm-objcopy?
>>>>
>>>>                     The main functionality of dsymutil is located
>>>>                     in a separate library - DWARFLinker.
>>>>                     Thus, dsymutil utility is a command-line
>>>>                     interface for DWARFLinker. dsymutil has
>>>>                     another type of input/output data: it takes
>>>>                     several object files and address map as input
>>>>                     and creates a .dSYM bundle with linked debug
>>>>                     info as output. llvm-dwarfutil would take a
>>>>                     built executable as input and create an
>>>>                     optimized executable as output.
>>>>                     Additionally, there would be many command-line
>>>>                     options specific for only one utility.
>>>>                     This means that these utilities(implementing
>>>>                     command line interface) would significantly
>>>>                     differ. It makes sense not to put another
>>>>                     command-line utility inside existing dsymutil,
>>>>                     but make it as a separate utility. That is the
>>>>                     reason why llvm-dwarfutil suggested to be
>>>>                     implemented not as sub-part of dsymutil but as
>>>>                     a separate tool.
>>>>
>>>>                     Please share your preference: whether
>>>>                     llvm-dwarfutil should be separate utility, or a
>>>>                     variant of dsymutil compiled for ELF?
>>>>
>>>>                 =====================================================================
>>>>
>>>>                 Appendix B. The machO object file format is already
>>>>                     supported by dsymutil.
>>>>                     Depending on the decision whether
>>>>                     llvm-dwarfutil would be done as a subproject
>>>>                     of dsymutil or as a separate utility - machO
>>>>                     would be supported or not.
>>>>
>>>>                 =====================================================================
>>>>
>>>>                 Appendix C. Support for the COFF and WASM object
>>>>                     file formats presented as possible future
>>>>                     improvement. It would be quite easy to add them
>>>>                     assuming that llvm-objcopy already supports
>>>>                     these formats. It also would require
>>>>                     supporting DWARF6-suggested tombstone
>>>>                     values(-1/-2).
>>>>
>>>>                 =====================================================================
>>>>
>>>>                 Appendix D. Documentation.
>>>>
>>>>                    - proposal for DWARF6 which suggested -1/-2
>>>>                      values for marking bad addresses
>>>>                      http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>>>>                    - dsymutil tool
>>>>                      https://llvm.org/docs/CommandGuide/dsymutil.html.
>>>>                    - proposal "Remove obsolete debug info in lld."
>>>>                      http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>>>>
>>>>                 =====================================================================
>>>>
>>>>                 Appendix E. Possible command line options:
>>>>
>>>>                 DwarfUtil Options:
>>>>
>>>>                    --build-aranges           - generate .debug_aranges table.
>>>>                    --build-debug-names       - generate .debug_names table.
>>>>                    --build-debug-pubnames    - generate .debug_pubnames table.
>>>>                    --build-debug-pubtypes    - generate .debug_pubtypes table.
>>>>                    --build-gdb-index         - generate .gdb_index table.
>>>>                    --compress                - Compress debug tables.
>>>>                    --decompress              - Decompress debug tables.
>>>>                    --deduplicate-types       - Do ODR deduplication for debug types.
>>>>                    --garbage-collect         - Do garbage collecting for debug info.
>>>>                    --num-threads=<n>         - Specify the maximum number (n) of
>>>>                                                simultaneous threads to use when
>>>>                                                optimizing input file. Defaults to
>>>>                                                the number of cores on the current
>>>>                                                machine.
>>>>                    --strip-all               - Strip all debug tables.
>>>>                    --strip=<name1,name2>     - Strip specified debug info tables.
>>>>                    --strip-unoptimized-debug - Strip all unoptimized debug tables.
>>>>                    --tombstone=<value>       - Tombstone value used as a marker of
>>>>                                                invalid address.
>>>>                      =bfd                    - BFD default value
>>>>                      =dwarf6                 - Dwarf v6.
>>>>                    --verbose                 - Enable verbose logging and encoding details.
>>>>
>>>>                 Generic Options:
>>>>
>>>>                    --help                    - Display available options (--help-hidden for more)
>>>>                    --version                 - Display the version of this program

David Blaikie via llvm-dev

2020-Sep-03 17:56 UTC

[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

On Thu, Sep 3, 2020 at 5:15 AM Alexey <avl.lapshin at gmail.com> wrote:
>
> On 03.09.2020 01:36, David Blaikie wrote:
>
>
>
> On Wed, Sep 2, 2020 at 3:26 PM Alexey <avl.lapshin at gmail.com> wrote:
>
>>
>> On 02.09.2020 21:44, David Blaikie wrote:
>>
>>
>>
>> On Wed, Sep 2, 2020 at 9:56 AM Alexey <avl.lapshin at gmail.com> wrote:
>>
>>>
>>> On 01.09.2020 20:07, David Blaikie wrote:
>>>
>>> Fair enough - thanks for clarifying the differences! (I'd still lean a
>>> bit towards this being dwz-esque, as you say "an extension of classic dwz"
>>>
>>> I doubt a little about "llvm-dwz" since it might confuse people who
>>> would expect exactly the same behavior.
>>> But if we think of it as "an extension of classic dwz" and the possible
>>> confusion is not a big deal then
>>> I would be fine with "llvm-dwz".
>>>
>>> using a bit more domain knowledge (of terminators and C++ odr - though
>>> I'm not sure dsymutil does rely on the ODR, does it? It relies on it to
>>> know that two names represent the same type, I suppose, but doesn't assume
>>> they're already identical, instead it merges their members))
>>>
>>> if dsymutil is able to find a full definition then it would remove all
>>> other definitions(which matched by name) and set all references to that
>>> found definition. If it is not able to find a full definition then it would
>>> do nothing. i.e. if there are two incomplete
>>> definitions(DW_AT_declaration   (true)) with the same name then they would
>>> not be merged. That is a possible improvement - to teach dsymutil to merge
>>> incomplete types.
>>>
>> Huh, what does it do with extra member function definitions found in
>> later definitions? (eg: struct x { template<typename T> void f(); }; - in
>> one translation unit x::f<int> is instantiated, in another x::f<float> is
>> instantiated - how are the two represented with dsymutil?)
>>
>> They would be considered as two not matched types. dsymutil would not
>> merge them somehow and thus would not use single type description. There
>> would be two separate types called "x" which would have mostly matched
>> members but differ with x::f<int> and x::f<float>. No any de-duplication in
>> that case.
>>
> Oh, that's unfortunate. It'd be nice for C++ at least, to implement a
> potentially faster dsymutil mode that could get this right and not have to
> actually check for type equivalence, instead relying on the name of the
> type to determine that it must be identical.
>
> Right. That would result in even more size reduction.
>
>
> The first instance of the type that's encountered has its fully qualified
> name or mangled name recorded in a map pointing to the DIE. Any future
> instance gets downgraded to a declaration, and /certain/ members get
> dropped, but other members get stuck on the declaration (same sort of DWARF
> you see with "struct foo { virtual void f1(); template<typename T> void
> f2() { } }; void test(foo& f) { f.f2<int>(); }"). Recording all the member
> functions of the type/static member variable types might be needed in cases
> where some member functions are defined in one translation unit and some
> defined in another - though I guess that infrastructure is already in
> place/that just works today.
>
> My understanding, is that there is not such infrastructure currently.
> Current infrastructure allows to reference single existing type
> declaration(canonical) from other units. It does not allow to reference
> different parts(in different units) of incomplete type.
>
Huh, so what does the DWARF look like when you define one member function
in one file, and another member function (common with inline functions) in
another file?

> I think it would be necessary to change the order of how compilation units
> are processed to implement such types merging.
>
Oh, I wasn't suggesting merging them - or didn't mean to suggest that. I
meant doing something like what we do in LLVM for type homed
(no-standalone) DWARF, where we attach function declarations to type
declarations, eg:

struct x {
  void f1();
  void f2();
  template<typename T>
  static void f3();
};
#ifdef HOME
void x::f1() {
}
#endif
#ifdef AWAY
void x::f2() {
}
#endif
#ifdef TEMPL
template<typename T>
void x::f3() {
}
template void x::f3<int>();
#endif

Building "HOME" would show the DWARF I'd expect to see the first time a
type definition is encountered during dsym.
Building "AWAY" raises the question of - what does dsymutil do with this
DWARF? Does it deduplicate the type, and make the definition of 'f2' point
to the 'f2' declaration in the original type described in the prior CU
defined in "HOME"? If it doesn't do that, it could/that would be good to
reduce the DWARF size.
Building "TEMPL" would show the DWARF I'd expect to see if a future use of
that type definition was encountered but the original/home definition had
no declaration of this function: we should then emit maybe an "extension"
to the type (could be a straight declaration, or maybe some newer/weirder
hybrid that points to the definition with some attribute) & then inject the
declaration of the template/other new member into this extension
definition, etc.

> Currently, after the compilation unit is analyzed(scanned for types and
> dead info) it started to be emitted.
> It looks like, to support merging, it would be necessary to analyze all
> CUs first(to create canonical representation) and then start to emit them.
>
> I am going to start to work on a prototype of parallel per-compilation
> unit implementation of DWARFLinker.
> (basing on the scenario which Jonas described in other letter in that
> thread).
> The types merging could be the next step...
>
> Following is the result of compilation of your example on darwin(showing
> that dsymutil does not merge such types):
>
Ah, yeah, that is unfortunate - so if there were other members of "x" they
would be duplicated in this case, right?

This is a pretty common issue in C++ - there are 3 reasons I know of where
LLVM would produce distinct descriptions:
1) member function templates, like this
2) member/nested types
3) implicit special members (not present unless instantiated - so if you
copy construct an object in one file and not in another, two different
types)
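
A small source sketch of those three cases, assuming the two functions
below live in separate translation units that include the same header
(they are shown in one file only so the snippet compiles on its own). All
names are illustrative, and exactly which DIEs a given clang version emits
for each unit is not verified here.

```cpp
#include <cstddef>

// Imagine this struct in a shared header.
struct x {
  int a;
  // 1) Member function template: a unit only describes the
  //    specializations it instantiates.
  template <typename T> std::size_t f() { return sizeof(T); }
  // 2) Member/nested type: may only be described where it is used.
  struct nested;
};

struct x::nested {
  int n;
};

// Stands in for a first translation unit: instantiates x::f<int> only.
std::size_t use_in_unit1() {
  x v{1};
  return v.f<int>();
}

// Stands in for a second translation unit: instantiates x::f<char>,
// uses x::nested, and copy constructs an x.
std::size_t use_in_unit2() {
  x v{2};
  x copy = v; // 3) implicit copy constructor is needed only here
  x::nested n{3};
  return copy.f<char>() + static_cast<std::size_t>(n.n);
}
```

If the two functions really were in separate files, the two descriptions
of "x" could differ in all three ways at once, so a byte-for-byte
comparison of the type DIEs would not treat them as the same type.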

>
> $ cat struct.h
>
> #ifndef MY_H
> #define MY_H
>
> struct foo {
>   template <class T> int fff () { return sizeof(T); }
> };
>
> #endif // MY_H
>
> $ cat mod1.cpp
>
> #include "struct.h"
> int test1 ( ) {
>   foo var;
>   return var.fff<int>();
> }
>
> $ cat mod2.cpp
>
> #include "struct.h"
> int test2 ( ) {
>   foo var;
>   return var.fff<float>();
> }
>
> $ cat main.cpp
>
> #include "struct.h"
> int test1();
> int test2();
> int main ( void ) {
>   test1();
>   test2();
>   return 0;
> }
>
> $ clang++ main.cpp mod1.cpp mod2.cpp -O -g -fno-inline
>
> $ llvm-dwarfdump -a a.out.dSYM/Contents/Resources/DWARF/a.out | less
>
> 0x00000056: DW_TAG_compile_unit
>
>               DW_AT_language    (DW_LANG_C_plus_plus)
>               DW_AT_name        ("mod1.cpp")
>
> 0x000000ae:   DW_TAG_structure_type  
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
>                 DW_AT_name      ("foo")
>                 DW_AT_byte_size (0x01)
>
> 0x000000b7:     DW_TAG_subprogram
>
>                   DW_AT_linkage_name    ("_ZN3foo3fffIiEEiv")
>                   DW_AT_name    ("fff<int>")
>
>
> 0x0000011f: DW_TAG_compile_unit
>
>               DW_AT_language    (DW_LANG_C_plus_plus)
>               DW_AT_name        ("mod2.cpp")
>
> 0x00000177:   DW_TAG_structure_type  
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
>                 DW_AT_name      ("foo")
>                 DW_AT_byte_size (0x01)
>
> 0x00000180:     DW_TAG_subprogram
>
>                   DW_AT_linkage_name    ("_ZN3foo3fffIfEEiv")
>                   DW_AT_name    ("fff<float>")
>
>
>
> - Dave
>
>>
>> Alexey.
>>>
>>>
>>> But I don't have super strong feelings about the naming.
>>>
>>> On Tue, Sep 1, 2020 at 6:36 AM Alexey <avl.lapshin at gmail.com> wrote:
>>>
>>>>
>>>> On 01.09.2020 06:27, David Blaikie wrote:
>>>>
>>>> A quick note: The feature as currently proposed sounds like it's an
>>>> exact match for 'dwz'? Is there any benefit to this over the existing dwz
>>>> project? Is it different in some ways I'm not aware of? (I haven't actually
>>>> used dwz, so I might have some mistaken ideas about how it should work)
>>>>
>>>> If it's going to solve the same general problem, but be in the llvm
>>>> project instead, then maybe it should be called llvm-dwz.
>>>>
>>>> It looks like dwz and llvm-dwarfutil are not exactly matched in
>>>> functionality.
>>>>
>>>> dwz is a program that attempts to optimize DWARF debugging information
>>>> contained in ELF shared libraries and ELF executables for *size*.
>>>>
>>>> llvm-dwarfutil is a tool that is used for processing debug
>>>> info(DWARF) located in built binary files to improve debug info
>>>> *quality*, reduce debug info *size* and accelerate debug info
>>>> *processing*.
>>>>
>>>> Things which are supposed to be done by llvm-dwarfutil and which are
>>>> not done by dwz: removing obsolete debug info, building indexes,
>>>> stripping unneeded debug sections, compress/decompress debug sections.
>>>>
>>>> Common thing is that both of these tools do debug info size reduction.
>>>> But they do this using different approaches:
>>>>
>>>> 1. dwz reduces the size of debug info by creating partial compilation
>>>>    units for duplicated parts. So that these partial compilation units
>>>>    could be imported in every duplicated place. AFAIU, that optimization
>>>>    gives the most size saving effect.
>>>>
>>>>    another size saving optimization is ODR types deduplication.
>>>>
>>>> 2. llvm-dwarfutil reduces the size of debug info by ODR types
>>>>    deduplication which gives the most size saving effect in
>>>>    llvm-dwarfutil case.
>>>>
>>>>    another size saving optimization is removing obsolete debug info.
>>>>    (which actually is not only about size but about correctness also)
>>>>
>>>> So, it looks like these tools are not equal. If we would consider that
>>>> llvm-dwz is an extension of classic dwz then we could probably
>>>> name it as llvm-dwz.
>>>>
>>>>
>>>> Though I understand the desire for this to grow other functionality,
>>>> like DWARF-aware dwp-ing. Might be better for this to busybox and provide
>>>> that functionality under llvm-dwp instead, or more likely I Suspect, that
>>>> the existing llvm-dwp will be rewritten (probably by me) to use more of
>>>> lld's infrastructure to be more efficient (it's current object
>>>> reading/writing logic is using LLVM's libObject and MCStreamer, which is a
>>>> bit inefficient for a very content-unaware linking process) and then maybe
>>>> that could be taught to use DwarfLinker as a library to optionally do
>>>> DWARF-aware linking depending on the users time/space tradeoff desires.
>>>> Still benefiting from any improvements to the underlying DwarfLinker
>>>> library (at which point that would be shared between llvm-dsymutil,
>>>> llvm-dwz, and llvm-dwp).
>>>>
>>>> On Tue, Aug 25, 2020 at 7:29 AM Alexey <avl.lapshin at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>    We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>>>>>    Any thoughts on this?
>>>>>    Thanks in advance, Alexey.
>>>>>
>>>>>
>>>>> =====================================================================
>>>>>
>>>>> llvm-dwarfutil(Apndx A) - is a tool that is used for processing debug
>>>>> info(DWARF) located in built binary files to improve debug info quality,
>>>>> reduce debug info size and accelerate debug info processing.
>>>>> Supported object files formats: ELF, MachO(Apndx B), COFF(Apndx C),
>>>>> WASM(Apndx C).
>>>>>
>>>>> =====================================================================
>>>>>
>>>>> Specifically, the tool would do:
>>>>>
>>>>>    - Remove obsolete debug info which refers to code deleted by the
>>>>>      linker doing the garbage collection (gc-sections).
>>>>>
>>>>>    - Deduplicate debug type definitions for reducing resulting size of
>>>>>      binary.
>>>>>
>>>>>    - Build accelerator/index tables.
>>>>>      = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>>>>        .debug_pubtypes.
>>>>>
>>>>>    - Strip unneeded tables.
>>>>>      = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>>>>        .debug_pubtypes.
>>>>>
>>>>>    - Compress or decompress debug info as requested.
>>>>>
>>>>> Possible feature:
>>>>>
>>>>>    - Join split dwarf .dwo files in a single file containing all debug
>>>>>      info (convert split DWARF into monolithic DWARF).
>>>>>
>>>>> =====================================================================
>>>>>
>>>>> User interface:
>>>>>
>>>>>    OVERVIEW: A tool for optimizing debug info located in the built
>>>>>    binary.
>>>>>
>>>>>    USAGE: llvm-dwarfutil [options] input output
>>>>>
>>>>>    OPTIONS: (Apndx E)
>>>>>
>>>>> =====================================================================
>>>>>
>>>>> Implementation notes:
>>>>>
>>>>> 1. Removing obsolete debug info would be done using DWARFLinker llvm
>>>>>    library.
>>>>>
>>>>> 2. Data types deduplication would be done using DWARFLinker llvm
>>>>>    library.
>>>>>
>>>>> 3. Accelerator/index tables would be generated using DWARFLinker llvm
>>>>>    library.
>>>>>
>>>>> 4. Interface of DWARFLinker library would be changed in such way that
>>>>>    it would be possible to switch on/off various stages:
>>>>>
>>>>>    class DWARFLinker {
>>>>>      setDoRemoveObsoleteInfo ( bool DoRemoveObsoleteInfo = false);
>>>>>
>>>>>      setDoAppleNames ( bool DoAppleNames = false );
>>>>>      setDoAppleNamespaces ( bool DoAppleNamespaces = false );
>>>>>      setDoAppleTypes ( bool DoAppleTypes = false );
>>>>>      setDoObjC ( bool DoObjC = false );
>>>>>      setDoDebugPubNames ( bool DoDebugPubNames = false );
>>>>>      setDoDebugPubTypes ( bool DoDebugPubTypes = false );
>>>>>
>>>>>      setDoDebugNames (bool DoDebugNames = false);
>>>>>      setDoGDBIndex (bool DoGDBIndex = false);
>>>>>    }
>>>>>
>>>>> 5. Copying source file contents, stripping tables,
>>>>>    compressing/decompressing tables would be done by ObjCopy llvm
>>>>>    library(extracted from llvm-objcopy):
>>>>>
>>>>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>>                                 object::COFFObjectFile &In, Buffer &Out);
>>>>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>>                                 object::ELFObjectFileBase &In, Buffer &Out);
>>>>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>>                                 object::MachOObjectFile &In, Buffer &Out);
>>>>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>>                                 object::WasmObjectFile &In, Buffer &Out);
>>>>>
>>>>> 6. Address ranges and single addresses pointing to removed code should
>>>>>    be marked with tombstone value in the input file:
>>>>>
>>>>>     -2 for .debug_ranges and .debug_loc.
>>>>>     -1 for other .debug* tables.
>>>>>
>>>>> 7. Prototype implementation - https://reviews.llvm.org/D86539.
>>>>>
>>>>>
>>>>> =====================================================================
>>>>>
>>>>> Roadmap:
>>>>>
>>>>> 1. Refactor llvm-objcopy to extract it`s implementation into separate
>>>>>    library ObjCopy(in LLVM tree).
>>>>>
>>>>> 2. Create a command line utility using existed DWARFLinker and ObjCopy
>>>>>    implementation. First version is supposed to work with only ELF
>>>>>    input object files.
>>>>>    It would take input ELF file with unoptimized debug info and create
>>>>>    output ELF file with optimized debug info. That version would be
>>>>>    done out of the llvm tree.
>>>>>
>>>>> 3. Make a tool to be able to work in multi-thread mode.
>>>>>
>>>>> 4. Consider it to be included into LLVM tree.
>>>>>
>>>>> 5. Support DWARF5 tables.
>>>>>
>>>>>
>>>>> =====================================================================
>>>>>
>>>>> Appendix A. Should this tool be implemented as a new tool or as an
>>>>>             extension to dsymutil/llvm-objcopy?
>>>>>
>>>>>     There already exists a tool which removes obsolete debug info on
>>>>>     darwin - dsymutil.
>>>>>     Why create another tool instead of extending the already existed
>>>>>     dsymutil/llvm-objcopy?
>>>>>
>>>>>     The main functionality of dsymutil is located in a separate
>>>>>     library - DWARFLinker.
>>>>>     Thus, dsymutil utility is a command-line interface for
>>>>>     DWARFLinker. dsymutil has another type of input/output data: it
>>>>>     takes several object files and address map as input and creates
>>>>>     a .dSYM bundle with linked debug info as output. llvm-dwarfutil
>>>>>     would take a built executable as input and create an optimized
>>>>>     executable as output.
>>>>>     Additionally, there would be many command-line options specific
>>>>>     for only one utility.
>>>>>     This means that these utilities(implementing command line
>>>>>     interface) would significantly differ. It makes sense not to put
>>>>>     another command-line utility inside existing dsymutil, but make
>>>>>     it as a separate utility. That is the reason why llvm-dwarfutil
>>>>>     suggested to be implemented not as sub-part of dsymutil but as a
>>>>>     separate tool.
>>>>>
>>>>>     Please share your preference: whether llvm-dwarfutil should be
>>>>>     separate utility, or a variant of dsymutil compiled for ELF?
>>>>>
>>>>>
>>>>> =====================================================================
>>>>>
>>>>> Appendix B. The machO object file format is already supported by
>>>>>     dsymutil.
>>>>>     Depending on the decision whether llvm-dwarfutil would be done as
>>>>>     a subproject of dsymutil or as a separate utility - machO would be
>>>>>     supported or not.
>>>>>
>>>>>
>>>>> =====================================================================
>>>>>
>>>>> Appendix C. Support for the COFF and WASM object file formats
>>>>>      presented as possible future improvement. It would be quite easy
>>>>>      to add them assuming that llvm-objcopy already supports these
>>>>>      formats. It also would require supporting DWARF6-suggested
>>>>>      tombstone values(-1/-2).
>>>>>
>>>>>
>>>>> =====================================================================
>>>>>
>>>>> Appendix D. Documentation.
>>>>>
>>>>>    - proposal for DWARF6 which suggested -1/-2 values for marking bad
>>>>>      addresses
>>>>>      http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>>>>>    - dsymutil tool https://llvm.org/docs/CommandGuide/dsymutil.html.
>>>>>    - proposal "Remove obsolete debug info in lld."
>>>>>      http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>>>>>
>>>>>
>>>>> =====================================================================
>>>>>
>>>>> Appendix E. Possible command line options:
>>>>>
>>>>> DwarfUtil Options:
>>>>>
>>>>>    --build-aranges           - generate .debug_aranges table.
>>>>>    --build-debug-names       - generate .debug_names table.
>>>>>    --build-debug-pubnames    - generate .debug_pubnames table.
>>>>>    --build-debug-pubtypes    - generate .debug_pubtypes table.
>>>>>    --build-gdb-index         - generate .gdb_index table.
>>>>>    --compress                - Compress debug tables.
>>>>>    --decompress              - Decompress debug tables.
>>>>>    --deduplicate-types       - Do ODR deduplication for debug types.
>>>>>    --garbage-collect         - Do garbage collecting for debug info.
>>>>>    --num-threads=<n>         - Specify the maximum number (n) of
>>>>>                                simultaneous threads to use when
>>>>>                                optimizing input file.
>>>>>                                Defaults to the number of cores on the
>>>>>                                current machine.
>>>>>    --strip-all               - Strip all debug tables.
>>>>>    --strip=<name1,name2>     - Strip specified debug info tables.
>>>>>    --strip-unoptimized-debug - Strip all unoptimized debug tables.
>>>>>    --tombstone=<value>       - Tombstone value used as a marker of
>>>>>                                invalid address.
>>>>>      =bfd                    -   BFD default value
>>>>>      =dwarf6                 -   Dwarf v6.
>>>>>    --verbose                 - Enable verbose logging and encoding
>>>>>                                details.
>>>>>
>>>>> Generic Options:
>>>>>
>>>>>    --help                    - Display available options
>>>>>                                (--help-hidden for more)
>>>>>    --version                 - Display the version of this program

Alexey via llvm-dev

2020-Sep-04 10:42 UTC


[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

On 03.09.2020 20:56, David Blaikie wrote:
>
>
> On Thu, Sep 3, 2020 at 5:15 AM Alexey <avl.lapshin at gmail.com 
> <mailto:avl.lapshin at gmail.com>> wrote:
>
>
>     On 03.09.2020 01:36, David Blaikie wrote:
>>
>>
>>     On Wed, Sep 2, 2020 at 3:26 PM Alexey <avl.lapshin at gmail.com
>>     <mailto:avl.lapshin at gmail.com>> wrote:
>>
>>
>>         On 02.09.2020 21:44, David Blaikie wrote:
>>>
>>>
>>>         On Wed, Sep 2, 2020 at 9:56 AM Alexey <avl.lapshin at gmail.com
>>>         <mailto:avl.lapshin at gmail.com>> wrote:
>>>
>>>
>>>             On 01.09.2020 20:07, David Blaikie wrote:
>>>>             Fair enough - thanks for clarifying the differences!
>>>>             (I'd still lean a bit towards this being dwz-esque, as
>>>>             you say "an extension of classic dwz"
>>>             I doubt a little about "llvm-dwz" since it might confuse
>>>             people who would expect exactly the same behavior.
>>>             But if we think of it as "an extension of classic dwz"
>>>             and the possible confusion is not a big deal then
>>>             I would be fine with "llvm-dwz".
>>>>             using a bit more domain knowledge (of terminators and
>>>>             C++ odr - though I'm not sure dsymutil does rely on the
>>>>             ODR, does it? It relies on it to know that two names
>>>>             represent the same type, I suppose, but doesn't assume
>>>>             they're already identical, instead it merges their
>>>>             members))
>>>
>>>             if dsymutil is able to find a full definition then it
>>>             would remove all other definitions(which matched by
>>>             name) and set all references to that found definition.
>>>             If it is not able to find a full definition then it
>>>             would do nothing. i.e. if there are two incomplete
>>>             definitions(DW_AT_declaration (true)) with the same name
>>>             then they would not be merged. That is a possible
>>>             improvement - to teach dsymutil to merge incomplete types.
>>>
>>>         Huh, what does it do with extra member function definitions
>>>         found in later definitions? (eg: struct x {
>>>         template<typename T> void f(); }; - in one translation unit
>>>         x::f<int> is instantiated, in another x::f<float> is
>>>         instantiated - how are the two represented with dsymutil?)
>>
>>         They would be considered as two not matched types. dsymutil
>>         would not merge them somehow and thus would not use single
>>         type description. There would be two separate types called
>>         "x" which would have mostly matched members but differ with
>>         x::f<int> and x::f<float>. No any de-duplication in that case.
>>
>>     Oh, that's unfortunate. It'd be nice for C++ at least, to
>>     implement a potentially faster dsymutil mode that could get this
>>     right and not have to actually check for type equivalence,
>>     instead relying on the name of the type to determine that it must
>>     be identical.
>
>     Right. That would result in even more size reduction.
>
>>
>>     The first instance of the type that's encountered has its fully
>>     qualified name or mangled name recorded in a map pointing to the
>>     DIE. Any future instance gets downgraded to a declaration, and
>>     /certain/ members get dropped, but other members get stuck on the
>>     declaration (same sort of DWARF you see with "struct foo {
>>     virtual void f1(); template<typename T> void f2() { } }; void
>>     test(foo& f) { f.f2<int>(); }"). Recording all the member
>>     functions of the type/static member variable types might be
>>     needed in cases where some member functions are defined in one
>>     translation unit and some defined in another - though I guess
>>     that infrastructure is already in place/that just works today.
>     My understanding, is that there is not such infrastructure
>     currently. Current infrastructure allows to reference single
>     existing type declaration(canonical) from other units. It does not
>     allow to reference different parts(in different units) of
>     incomplete type.
>
>
> Huh, so what does the DWARF look like when you define one member 
> function in one file, and another member function (common with inline 
> functions) in another file?
>
>     I think it would be necessary to change the order of how
>     compilation units are processed to implement such types merging.
>
>
> Oh, I wasn't suggesting merging them - or didn't mean to suggest that.
> I meant doing something like what we do in LLVM for type homed 
> (no-standalone) DWARF, where we attach function declarations to type 
> declarations, eg:
>
> struct x {
>
> void f1();
>
> void f2();
>
> template<typename T>
>
> static void f3();
>
> };
>
> #ifdef HOME
>
> void x::f1() {
>
> }
>
> #endif
>
> #ifdef AWAY
>
> void x::f2() {
>
> }
>
> #endif
>
> #ifdef TEMPL
>
> template<typename T>
>
> void x::f3() {
>
> }
>
> template void x::f3<int>();
>
> #endif
>
> Building "HOME" would show the DWARF I'd expect to see the first time
> a type definition is encountered during dsym.
> Building "AWAY" raises the question of - what does dsymutil do with
> this DWARF? Does it deduplicate the type, and make the definition of
> 'f2' point to the 'f2' declaration in the original type described in
> the prior CU defined in "HOME"? If it doesn't do that, it could/that
> would be good to reduce the DWARF size.
> Building "TEMPL" would show the DWARF I'd expect to see if a future
> use of that type definition was encountered but the original/home
> definition had no declaration of this function: we should then emit
> maybe an "extension" to the type (could be a straight declaration, or
> maybe some newer/weirder hybrid that points to the definition with
> some attribute) & then inject the declaration of the template/other
> new member into this extension definition, etc.
>
Please check the reduced DWARF generated by the current dsymutil for the
above example:

0x0000000b: DW_TAG_compile_unit
               DW_AT_language    (DW_LANG_C_plus_plus)
               DW_AT_name        ("home.cpp")
               DW_AT_stmt_list   (0x00000000)
               DW_AT_low_pc      (0x0000000100000f80)
               DW_AT_high_pc     (0x0000000100000f8b)

0x0000002a:   DW_TAG_structure_type
                 DW_AT_name      ("x")
                 DW_AT_byte_size (0x01)

0x00000033:     DW_TAG_subprogram
                   DW_AT_linkage_name    ("_ZN1x2f1Ev")
                   DW_AT_name    ("f1")
                   DW_AT_type    (0x000000000000005e "int")
                   DW_AT_declaration     (true)
                   DW_AT_external        (true)
                   DW_AT_APPLE_optimized (true)

0x00000047:       NULL

0x00000048:     DW_TAG_subprogram
                   DW_AT_linkage_name    ("_ZN1x2f2Ev")
                   DW_AT_name    ("f2")
                   DW_AT_type    (0x000000000000005e "int")
                   DW_AT_declaration     (true)
                   DW_AT_external        (true)
                   DW_AT_APPLE_optimized (true)

0x0000005c:       NULL
0x0000005d:     NULL

0x0000006a:   DW_TAG_subprogram
                 DW_AT_low_pc    (0x0000000100000f80)
                 DW_AT_high_pc   (0x0000000100000f8b)
                 DW_AT_specification     (0x0000000000000033 "_ZN1x2f1Ev")


0x000000a0: DW_TAG_compile_unit
               DW_AT_language    (DW_LANG_C_plus_plus)
               DW_AT_name        ("away.cpp")
               DW_AT_stmt_list   (0x00000048)
               DW_AT_low_pc      (0x0000000100000f90)
               DW_AT_high_pc     (0x0000000100000f9b)

0x000000c6:   DW_TAG_subprogram
                 DW_AT_low_pc    (0x0000000100000f90)
                 DW_AT_high_pc   (0x0000000100000f9b)
                 DW_AT_specification     (0x0000000000000048 "_ZN1x2f2Ev")

0x000000fc: DW_TAG_compile_unit
               DW_AT_language    (DW_LANG_C_plus_plus)
               DW_AT_name        ("templ.cpp")
               DW_AT_stmt_list   (0x00000090)
               DW_AT_low_pc      (0x0000000100000fa0)
               DW_AT_high_pc     (0x0000000100000fab)

0x0000011b:   DW_TAG_structure_type
                 DW_AT_name      ("x")
                 DW_AT_byte_size (0x01)

0x00000124:     DW_TAG_subprogram
                   DW_AT_linkage_name    ("_ZN1x2f1Ev")
                   DW_AT_name    ("f1")
                   DW_AT_type    (0x0000000000000168 "int")
                   DW_AT_declaration     (true)
                   DW_AT_external        (true)
                   DW_AT_APPLE_optimized (true)
0x00000138:       NULL

0x00000139:     DW_TAG_subprogram
                   DW_AT_linkage_name    ("_ZN1x2f2Ev")
                   DW_AT_name    ("f2")
                   DW_AT_type    (0x0000000000000168 "int")
                   DW_AT_declaration     (true)
                   DW_AT_external        (true)
                   DW_AT_APPLE_optimized (true)
0x0000014d:       NULL

0x0000014e:     DW_TAG_subprogram
                   DW_AT_linkage_name    ("_ZN1x2f3IiEEiv")
                   DW_AT_name    ("f3<int>")
                   DW_AT_type    (0x0000000000000168 "int")
                   DW_AT_declaration     (true)
                   DW_AT_external        (true)
                   DW_AT_APPLE_optimized (true)
0x00000166:       NULL
0x00000167:     NULL

0x00000174:   DW_TAG_subprogram
                 DW_AT_low_pc    (0x0000000100000fa0)
                 DW_AT_high_pc   (0x0000000100000fab)
                 DW_AT_specification     (0x000000000000014e "_ZN1x2f3IiEEiv")
0x00000190:     NULL


> Building "HOME" would show the DWARF I'd expect to see the first time
> a type definition is encountered during dsym.

Compile unit "home.cpp" contains the type definition (0x0000002a) and a
reference to its member f1
(DW_AT_specification (0x0000000000000033 "_ZN1x2f1Ev")).

> Building "AWAY" raises the question of - what does dsymutil do with
> this DWARF? Does it deduplicate the type, and make the definition of
> 'f2' point to the 'f2' declaration in the original type described in the
> prior CU defined in "HOME"? If it doesn't do that, it could/that would
> be good to reduce the DWARF size.

Compile unit "away.cpp" does not contain a type definition. Instead it
references the type definition from compile unit "home.cpp"
(DW_AT_specification (0x0000000000000048 "_ZN1x2f2Ev")).
I.e. dsymutil deduplicates the type and makes the definition of 'f2'
point to the 'f2' declaration in the original type described in the
prior CU "home.cpp".
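
To illustrate the "away.cpp" case, here is a minimal sketch of name-based
type deduplication. It is a simplified model and not the DWARFLinker code:
TypeDef, MemberDecl and CanonicalTypes are hypothetical names, and types
are matched by plain name only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a member declaration DIE inside a type.
struct MemberDecl {
  std::string LinkageName; // e.g. "_ZN1x2f2Ev"
  uint64_t Offset;         // DIE offset of the declaration
};

// Hypothetical stand-in for a type definition DIE.
struct TypeDef {
  std::string Name;
  std::vector<MemberDecl> Members;
};

// Remembers the first definition seen for each type name (assumption:
// the name alone identifies the type, as the ODR allows for C++).
class CanonicalTypes {
  std::map<std::string, TypeDef> Seen;

public:
  // Returns true if T became the canonical definition for its name,
  // false if an earlier definition with that name already exists.
  bool record(const TypeDef &T) { return Seen.emplace(T.Name, T).second; }

  // Looks up the canonical declaration of a member. On success stores
  // its offset in Out, which a later unit can use as the target of
  // DW_AT_specification. Returns false if the member is unknown.
  bool resolve(const std::string &TypeName, const std::string &LinkageName,
               uint64_t &Out) const {
    auto It = Seen.find(TypeName);
    if (It == Seen.end())
      return false;
    for (const MemberDecl &M : It->second.Members)
      if (M.LinkageName == LinkageName) {
        Out = M.Offset;
        return true;
      }
    return false;
  }
};
```

With "x" from "home.cpp" recorded, resolving "_ZN1x2f2Ev" yields 0x48,
matching the dump above. Resolving "_ZN1x2f3IiEEiv" fails, which is the
"templ.cpp" situation.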

> Building "TEMPL" would show the DWARF I'd expect to see if a future
> use of that type definition was encountered but the original/home
> definition had no declaration of this function: we should then emit
> maybe an "extension" to the type (could be a straight declaration, or
> maybe some newer/weirder hybrid that points to the definition with some
> attribute) & then inject the declaration of the template/other new
> member into this extension definition, etc.

Compile unit "templ.cpp" contains the type definition (0x0000011b), which
matches (0x0000002a) but also declares the new member 0x0000014e.
It also references this new member through DW_AT_specification
(0x000000000000014e "_ZN1x2f3IiEEiv"). In this case the type description
is not de-duplicated.

Do you suggest that 0x0000011b should be transformed into something like
this:

0x000000fc: DW_TAG_compile_unit
               DW_AT_language    (DW_LANG_C_plus_plus)
               DW_AT_name        ("templ.cpp")
               DW_AT_stmt_list   (0x00000090)
               DW_AT_low_pc      (0x0000000100000fa0)
               DW_AT_high_pc     (0x0000000100000fab)

0x0000011b:   DW_TAG_structure_type
                 DW_AT_specification (0x0000002a "x")

0x00000124:     DW_TAG_subprogram
                   DW_AT_linkage_name    ("_ZN1x2f3IiEEiv")
                   DW_AT_name    ("f3<int>")
                   DW_AT_type    (0x000000000000005e "int")
                   DW_AT_declaration     (true)
                   DW_AT_external        (true)
                   DW_AT_APPLE_optimized (true)
0x00000138:       NULL
0x00000139:     NULL

0x00000140:   DW_TAG_subprogram
                 DW_AT_low_pc    (0x0000000100000fa0)
                 DW_AT_high_pc   (0x0000000100000fab)
                 DW_AT_specification     (0x0000000000000124 "_ZN1x2f3IiEEiv")
0x00000155:     NULL

Did I get the idea correctly?
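
As a rough sketch of that transformation (only a model of the idea under
discussion, not existing dsymutil behavior), the "extension" would keep
just the members that the canonical definition does not declare. The
function name extensionMembers is hypothetical, and members are compared
by linkage name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns the members of Later (by linkage name) that Canonical lacks.
// In the proposed scheme only these would be emitted in the extension
// DIE that refers back to the canonical type.
std::vector<std::string>
extensionMembers(const std::vector<std::string> &Canonical,
                 const std::vector<std::string> &Later) {
  std::vector<std::string> Extra;
  for (const std::string &Name : Later)
    if (std::find(Canonical.begin(), Canonical.end(), Name) ==
        Canonical.end())
      Extra.push_back(Name);
  return Extra;
}
```

For the example above, the canonical "x" declares f1 and f2, and the
"templ.cpp" copy declares f1, f2 and f3<int>, so only "_ZN1x2f3IiEEiv"
would remain in the extension.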


>     Currently, after the compilation unit is analyzed(scanned for
>     types and dead info) it started to be emitted.
>     It looks like, to support merging, it would be necessary to
>     analyze all CUs first(to create canonical representation) and then
>     start to emit them.
>
>     I am going to start to work on a prototype of parallel
>     per-compilation unit implementation of DWARFLinker.
>     (basing on the scenario which Jonas described in other letter in
>     that thread).
>     The types merging could be the next step...
>
>     Following is the result of compilation of your example on
>     darwin(showing that dsymutil does not merge such types):
>
>
> Ah, yeah, that is unfortunate - so if there were other members of "x"
> they would be duplicated in this case, right?
>
> This is a pretty common issue in C++ - there are 3 reasons I know of 
> where LLVM would produce distinct descriptions:
> 1) member function templates, like this
> 2) member/nested types
> 3) implicit special members (not present unless instantiated - so if 
> you copy construct an object in one file and not in another, two 
> different types)
>
>
>     $ cat struct.h
>
>     #ifndef MY_H
>     #define MY_H
>
>     struct foo {
>       template <class T> int fff () { return sizeof(T); }
>     };
>
>     #endif // MY_H
>
>     $ cat mod1.cpp
>
>     #include "struct.h"
>     int test1 ( ) {
>       foo var;
>       return var.fff<int>();
>     }
>
>     $ cat mod2.cpp
>
>     #include "struct.h"
>     int test2 ( ) {
>       foo var;
>       return var.fff<float>();
>     }
>
>     $ cat main.cpp
>
>     #include "struct.h"
>     int test1();
>     int test2();
>     int main ( void ) {
>       test1();
>       test2();
>       return 0;
>     }
>
>     $ clang++ main.cpp mod1.cpp mod2.cpp -O -g -fno-inline
>
>     $ llvm-dwarfdump -a a.out.dSYM/Contents/Resources/DWARF/a.out | less
>
>     0x00000056: DW_TAG_compile_unit
>
>                   DW_AT_language    (DW_LANG_C_plus_plus)
>                   DW_AT_name        ("mod1.cpp")
>
>     0x000000ae:   DW_TAG_structure_type <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
>                     DW_AT_name      ("foo")
>                     DW_AT_byte_size (0x01)
>
>     0x000000b7:     DW_TAG_subprogram
>
>                       DW_AT_linkage_name ("_ZN3foo3fffIiEEiv")
>                       DW_AT_name    ("fff<int>")
>
>
>     0x0000011f: DW_TAG_compile_unit
>
>                   DW_AT_language    (DW_LANG_C_plus_plus)
>                   DW_AT_name        ("mod2.cpp")
>
>     0x00000177:   DW_TAG_structure_type <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
>                     DW_AT_name      ("foo")
>                     DW_AT_byte_size (0x01)
>
>     0x00000180:     DW_TAG_subprogram
>
>                       DW_AT_linkage_name ("_ZN3foo3fffIfEEiv")
>                       DW_AT_name    ("fff<float>")
>
>
>>
>>     - Dave
>>
>>
>>>             Alexey.
>>>
>>>>
>>>>             But I don't have super strong feelings about the naming.
>>>>
>>>>             On Tue, Sep 1, 2020 at 6:36 AM Alexey
>>>>             <avl.lapshin at gmail.com <mailto:avl.lapshin at gmail.com>>
>>>>             wrote:
>>>>
>>>>
>>>>                 On 01.09.2020 06:27, David Blaikie wrote:
>>>>>                 A quick note: The feature as currently proposed
>>>>>                 sounds like it's an exact match for 'dwz'? Is
>>>>>                 there any benefit to this over the existing dwz
>>>>>                 project? Is it different in some ways I'm not
>>>>>                 aware of? (I haven't actually used dwz, so I might
>>>>>                 have some mistaken ideas about how it should work)
>>>>>
>>>>>                 If it's going to solve the same general problem,
>>>>>                 but be in the llvm project instead, then maybe it
>>>>>                 should be called llvm-dwz.
>>>>                 It looks like dwz and llvm-dwarfutil are not
>>>>                 exactly matched in functionality.
>>>>
>>>>                 dwz is a program that attempts to optimize DWARF
>>>>                 debugging information contained in ELF shared
>>>>                 libraries and ELF executables for *size*.
>>>>
>>>>                 llvm-dwarfutil is a tool that is used for
>>>>                 processing debug info(DWARF) located in built
>>>>                 binary files to improve debug info *quality*,
>>>>                 reduce debug info *size* and accelerate debug info
>>>>                 *processing*.
>>>>
>>>>                 Things which are supposed to be done by
>>>>                 llvm-dwarfutil and which are not done by dwz:
>>>>                 removing obsolete debug info, building indexes,
>>>>                 stripping unneeded debug sections,
>>>>                 compress/decompress debug sections.
>>>>
>>>>                 Common thing is that both of these tools do debug
>>>>                 info size reduction.
>>>>                 But they do this using different approaches:
>>>>
>>>>                 1. dwz reduces the size of debug info by creating
>>>>                    partial compilation units for duplicated parts.
>>>>                    So that these partial compilation units could be
>>>>                    imported in every duplicated place. AFAIU, that
>>>>                    optimization gives the most size saving effect.
>>>>
>>>>                    another size saving optimization is ODR types
>>>>                    deduplication.
>>>>
>>>>                 2. llvm-dwarfutil reduces the size of debug info by
>>>>                    ODR types deduplication which gives the most
>>>>                    size saving effect in llvm-dwarfutil case.
>>>>
>>>>                    another size saving optimization is removing
>>>>                    obsolete debug info.
>>>>                    (which actually is not only about size but about
>>>>                    correctness also)
>>>>
>>>>                 So, it looks like these tools are not equal. If we
>>>>                 would consider that llvm-dwz is an extension of
>>>>                 classic dwz then we could probably name it as
>>>>                 llvm-dwz.
>>>>
>>>>>
>>>>>                 Though I understand the desire for this to grow
>>>>>                 other functionality, like DWARF-aware dwp-ing.
>>>>>                 Might be better for this to busybox and provide
>>>>>                 that functionality under llvm-dwp instead, or more
>>>>>                 likely I Suspect, that the existing llvm-dwp will
>>>>>                 be rewritten (probably by me) to use more of lld's
>>>>>                 infrastructure to be more efficient (it's current
>>>>>                 object reading/writing logic is using LLVM's
>>>>>                 libObject and MCStreamer, which is a bit
>>>>>                 inefficient for a very content-unaware linking
>>>>>                 process) and then maybe that could be taught to
>>>>>                 use DwarfLinker as a library to optionally do
>>>>>                 DWARF-aware linking depending on the users
>>>>>                 time/space tradeoff desires. Still benefiting from
>>>>>                 any improvements to the underlying DwarfLinker
>>>>>                 library (at which point that would be shared
>>>>>                 between llvm-dsymutil, llvm-dwz, and llvm-dwp).
>>>>>
>>>>>                 On Tue, Aug 25, 2020 at 7:29 AM Alexey
>>>>>                 <avl.lapshin at gmail.com> wrote:
>>>>>
>>>>>                     Hi,
>>>>>
>>>>>                        We propose llvm-dwarfutil - a dsymutil-like tool
>>>>>                     for ELF.
>>>>>                        Any thoughts on this?
>>>>>                        Thanks in advance, Alexey.
>>>>>
>>>>>                     =====================================================================
>>>>>
>>>>>                     llvm-dwarfutil (Apndx A) is a tool for processing debug
>>>>>                     info (DWARF) located in built binary files to improve
>>>>>                     debug info quality, reduce debug info size, and
>>>>>                     accelerate debug info processing.
>>>>>                     Supported object file formats: ELF, MachO (Apndx B),
>>>>>                     COFF (Apndx C), WASM (Apndx C).
>>>>>
>>>>>                     =====================================================================
>>>>>
>>>>>                     Specifically, the tool would do:
>>>>>
>>>>>                        - Remove obsolete debug info which refers to code
>>>>>                          deleted by the linker doing garbage collection
>>>>>                          (gc-sections).
>>>>>
>>>>>                        - Deduplicate debug type definitions to reduce the
>>>>>                          resulting size of the binary.
>>>>>
>>>>>                        - Build accelerator/index tables.
>>>>>                          = .debug_aranges, .debug_names, .gdb_index,
>>>>>                            .debug_pubnames, .debug_pubtypes.
>>>>>
>>>>>                        - Strip unneeded tables.
>>>>>                          = .debug_aranges, .debug_names, .gdb_index,
>>>>>                            .debug_pubnames, .debug_pubtypes.
>>>>>
>>>>>                        - Compress or decompress debug info as requested.
>>>>>
>>>>>                     Possible feature:
>>>>>
>>>>>                        - Join split DWARF .dwo files into a single file
>>>>>                          containing all debug info (convert split DWARF
>>>>>                          into monolithic DWARF).
>>>>>
>>>>>                     =====================================================================
>>>>>
>>>>>                     User interface:
>>>>>
>>>>>                        OVERVIEW: A tool for optimizing debug info located
>>>>>                        in the built binary.
>>>>>
>>>>>                        USAGE: llvm-dwarfutil [options] input output
>>>>>
>>>>>                        OPTIONS: (Apndx E)
>>>>>
>>>>>                     =====================================================================
>>>>>
>>>>>                     Implementation notes:
>>>>>
>>>>>                     1. Removing obsolete debug info would be done using
>>>>>                        the DWARFLinker llvm library.
>>>>>
>>>>>                     2. Data type deduplication would be done using the
>>>>>                        DWARFLinker llvm library.
>>>>>
>>>>>                     3. Accelerator/index tables would be generated using
>>>>>                        the DWARFLinker llvm library.
>>>>>
>>>>>                     4. The interface of the DWARFLinker library would be
>>>>>                        changed in such a way that it would be possible to
>>>>>                        switch various stages on/off:
>>>>>
>>>>>                        class DWARFLinker {
>>>>>                          setDoRemoveObsoleteInfo ( bool DoRemoveObsoleteInfo = false );
>>>>>
>>>>>                          setDoAppleNames ( bool DoAppleNames = false );
>>>>>                          setDoAppleNamespaces ( bool DoAppleNamespaces = false );
>>>>>                          setDoAppleTypes ( bool DoAppleTypes = false );
>>>>>                          setDoObjC ( bool DoObjC = false );
>>>>>                          setDoDebugPubNames ( bool DoDebugPubNames = false );
>>>>>                          setDoDebugPubTypes ( bool DoDebugPubTypes = false );
>>>>>
>>>>>                          setDoDebugNames ( bool DoDebugNames = false );
>>>>>                          setDoGDBIndex ( bool DoGDBIndex = false );
>>>>>                        }
>>>>>
>>>>>                     5. Copying source file contents, stripping tables,
>>>>>                        and compressing/decompressing tables would be done
>>>>>                        by the ObjCopy llvm library (extracted from
>>>>>                        llvm-objcopy):
>>>>>
>>>>>                        Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>>                                                     object::COFFObjectFile &In, Buffer &Out);
>>>>>                        Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>>                                                     object::ELFObjectFileBase &In, Buffer &Out);
>>>>>                        Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>>                                                     object::MachOObjectFile &In, Buffer &Out);
>>>>>                        Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>>                                                     object::WasmObjectFile &In, Buffer &Out);
>>>>>
>>>>>                     6. Address ranges and single addresses pointing to
>>>>>                        removed code should be marked with a tombstone
>>>>>                        value in the input file:
>>>>>
>>>>>                         -2 for .debug_ranges and .debug_loc.
>>>>>                         -1 for other .debug* tables.
>>>>>
>>>>>                     7. Prototype implementation -
>>>>>                        https://reviews.llvm.org/D86539.
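
The tombstone convention in note 6 can be sketched as follows. This is only an illustration: `TableKind`, `maxAddress`, `tombstoneFor` and `isTombstone` are hypothetical helpers, not part of DWARFLinker or the prototype. The sketch reads -1 and -2 as the two largest values representable in the target address size. As far as I understand, .debug_ranges and .debug_loc need a different marker because in DWARF v4 and earlier the all-ones value already introduces a base address selection entry there.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not an LLVM API.
// AddressSize is the target address size in bytes (typically 4 or 8).
enum class TableKind { RangesOrLoc, Other };

// Largest address representable in AddressSize bytes, i.e. "-1".
constexpr std::uint64_t maxAddress(unsigned AddressSize) {
  return AddressSize >= 8 ? UINT64_MAX
                          : (std::uint64_t{1} << (AddressSize * 8)) - 1;
}

// -2 for .debug_ranges/.debug_loc, -1 for the other .debug* tables.
constexpr std::uint64_t tombstoneFor(TableKind Kind, unsigned AddressSize) {
  return Kind == TableKind::RangesOrLoc ? maxAddress(AddressSize) - 1
                                        : maxAddress(AddressSize);
}

// True when Address marks code that the linker removed.
constexpr bool isTombstone(std::uint64_t Address, TableKind Kind,
                           unsigned AddressSize) {
  return Address == tombstoneFor(Kind, AddressSize);
}
```

A tool consuming such input could then drop any entry whose start address satisfies `isTombstone` without needing an address map from the linker.
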
>>>>>
>>>>>                     =====================================================================
>>>>>
>>>>>                     Roadmap:
>>>>>
>>>>>                     1. Refactor llvm-objcopy to extract its implementation
>>>>>                        into a separate library, ObjCopy (in the LLVM tree).
>>>>>
>>>>>                     2. Create a command line utility using the existing
>>>>>                        DWARFLinker and ObjCopy implementations. The first
>>>>>                        version is supposed to work with only ELF input
>>>>>                        object files. It would take an input ELF file with
>>>>>                        unoptimized debug info and create an output ELF file
>>>>>                        with optimized debug info. That version would be
>>>>>                        done out of the llvm tree.
>>>>>
>>>>>                     3. Make the tool able to work in multi-thread mode.
>>>>>
>>>>>                     4. Consider including it in the LLVM tree.
>>>>>
>>>>>                     5. Support DWARF5 tables.
>>>>>
>>>>>                     =====================================================================
>>>>>
>>>>>                     Appendix A. Should this tool be implemented as a new
>>>>>                                 tool or as an extension to
>>>>>                                 dsymutil/llvm-objcopy?
>>>>>
>>>>>                         There already exists a tool which removes obsolete
>>>>>                         debug info on darwin - dsymutil. Why create another
>>>>>                         tool instead of extending the already existing
>>>>>                         dsymutil/llvm-objcopy?
>>>>>
>>>>>                         The main functionality of dsymutil is located in a
>>>>>                         separate library - DWARFLinker. Thus, the dsymutil
>>>>>                         utility is a command-line interface for DWARFLinker.
>>>>>                         dsymutil has another type of input/output data: it
>>>>>                         takes several object files and an address map as
>>>>>                         input and creates a .dSYM bundle with linked debug
>>>>>                         info as output. llvm-dwarfutil would take a built
>>>>>                         executable as input and create an optimized
>>>>>                         executable as output. Additionally, there would be
>>>>>                         many command-line options specific to only one
>>>>>                         utility. This means that these utilities
>>>>>                         (implementing the command line interface) would
>>>>>                         differ significantly. It makes sense not to put
>>>>>                         another command-line utility inside the existing
>>>>>                         dsymutil, but to make it a separate utility. That is
>>>>>                         the reason why llvm-dwarfutil is suggested to be
>>>>>                         implemented not as a sub-part of dsymutil but as a
>>>>>                         separate tool.
>>>>>
>>>>>                         Please share your preference: should
>>>>>                         llvm-dwarfutil be a separate utility, or a variant
>>>>>                         of dsymutil compiled for ELF?
>>>>>
>>>>>                     =====================================================================
>>>>>
>>>>>                     Appendix B. The MachO object file format is already
>>>>>                         supported by dsymutil. Depending on the decision
>>>>>                         whether llvm-dwarfutil would be done as a
>>>>>                         subproject of dsymutil or as a separate utility,
>>>>>                         MachO would be supported or not.
>>>>>
>>>>>                     =====================================================================
>>>>>
>>>>>                     Appendix C. Support for the COFF and WASM object file
>>>>>                          formats is presented as a possible future
>>>>>                          improvement. It would be quite easy to add them,
>>>>>                          assuming that llvm-objcopy already supports these
>>>>>                          formats. It would also require supporting the
>>>>>                          DWARF6-suggested tombstone values (-1/-2).
>>>>>
>>>>>                     =====================================================================
>>>>>
>>>>>                     Appendix D. Documentation.
>>>>>
>>>>>                        - proposal for DWARF6 which suggested -1/-2 values
>>>>>                          for marking bad addresses
>>>>>                          http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>>>>>                        - dsymutil tool
>>>>>                          https://llvm.org/docs/CommandGuide/dsymutil.html
>>>>>                        - proposal "Remove obsolete debug info in lld."
>>>>>                          http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>>>>>
>>>>>                     =====================================================================
>>>>>
>>>>>                     Appendix E. Possible command line options:
>>>>>
>>>>>                     DwarfUtil Options:
>>>>>
>>>>>                     --build-aranges           - generate .debug_aranges table.
>>>>>                     --build-debug-names       - generate .debug_names table.
>>>>>                     --build-debug-pubnames    - generate .debug_pubnames table.
>>>>>                     --build-debug-pubtypes    - generate .debug_pubtypes table.
>>>>>                     --build-gdb-index         - generate .gdb_index table.
>>>>>                     --compress                - Compress debug tables.
>>>>>                     --decompress              - Decompress debug tables.
>>>>>                     --deduplicate-types       - Do ODR deduplication for debug types.
>>>>>                     --garbage-collect         - Do garbage collecting for debug info.
>>>>>                     --num-threads=<n>         - Specify the maximum number (n) of
>>>>>                                                 simultaneous threads to use when
>>>>>                                                 optimizing input file. Defaults to the
>>>>>                                                 number of cores on the current machine.
>>>>>                     --strip-all               - Strip all debug tables.
>>>>>                     --strip=<name1,name2>     - Strip specified debug info tables.
>>>>>                     --strip-unoptimized-debug - Strip all unoptimized debug tables.
>>>>>                     --tombstone=<value>       - Tombstone value used as a marker of
>>>>>                                                 invalid address.
>>>>>                         =bfd                  -   BFD default value
>>>>>                         =dwarf6               -   Dwarf v6.
>>>>>                     --verbose                 - Enable verbose logging and encoding
>>>>>                                                 details.
>>>>>
>>>>>                     Generic Options:
>>>>>
>>>>>                     --help                    - Display available options
>>>>>                                                 (--help-hidden for more)
>>>>>                     --version                 - Display the version of this program
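
As a rough illustration of how the options in Appendix E could drive the stage switches from implementation note 4, here is a minimal sketch. `StageConfig` and `applyOption` are hypothetical names; the real DWARFLinker interface is not modelled, and the option-to-stage mapping (including a dedicated switch for `--deduplicate-types`, which the proposed class does not list) is an assumption on my side.

```cpp
#include <string>

// Hypothetical stand-in for the proposed stage switches; not the real
// DWARFLinker class. Every stage is off by default, as in the proposal.
struct StageConfig {
  bool RemoveObsoleteInfo = false;
  bool DeduplicateTypes = false; // assumed switch, not in the proposed class
  bool DebugNames = false;
  bool GDBIndex = false;
  bool DebugPubNames = false;
  bool DebugPubTypes = false;
};

// Enables the stage selected by one command line argument.
// Returns false when the argument is not one of the options handled here.
bool applyOption(const std::string &Arg, StageConfig &Config) {
  if (Arg == "--garbage-collect")
    Config.RemoveObsoleteInfo = true;
  else if (Arg == "--deduplicate-types")
    Config.DeduplicateTypes = true;
  else if (Arg == "--build-debug-names")
    Config.DebugNames = true;
  else if (Arg == "--build-gdb-index")
    Config.GDBIndex = true;
  else if (Arg == "--build-debug-pubnames")
    Config.DebugPubNames = true;
  else if (Arg == "--build-debug-pubtypes")
    Config.DebugPubTypes = true;
  else
    return false;
  return true;
}
```

Keeping every stage off by default means the tool would only do the work that was asked for on the command line.
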