thr3ads.net - llvm dev - [llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF. [Aug 2020]


Alexey via llvm-dev

2020-Aug-31 15:06 UTC

[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

Hi James,

Thank you for the comments.

> I think we're not terribly far from that ideal, now, for ELF. Maybe
> only these three things need to be done? --
>   1. Teach lld how to emit a separated debuginfo output file directly,
> without requiring an objcopy step.
>   2. Integrate DWARFLinker into lld.
>   3. Create a new tool which takes the separated debuginfo and DWO/DWP
> files and uses DWARFLinker library to create a new (dwarf-linked)
> separated-debug file, that doesn't depend on DWO/DWP files.

The three goals you've described are our long-term goals.
Indeed, the best solution would be to create valid, optimized debug info
without additional stages and without further modifications of the
resulting binaries.

There was an attempt to use DWARFLinker from lld
(https://reviews.llvm.org/D74169). It has not yet received enough support
to be integrated, and there are fair reasons for that:

1. Execution time. Processing the clang binary with DWARFLinker takes 8x
longer than a normal link: linking clang with DWARFLinker takes 72 sec,
while linking with lld alone takes 9 sec.

2. "Removing obsolete debug info" cannot be switched off. Thus, lld could
not use DWARFLinker for other tasks (like generating the .gdb_index and
.debug_names index tables) without significant performance degradation.

3. DWARFLinker does not support split DWARF at the moment.

None of these reasons is a blocker, and I believe the implementation from
D74169 could be integrated and incrementally improved if there is agreement
on that.

Using DWARFLinker from llvm-dwarfutil is another way to use and improve it.
Once fully implemented, llvm-dwarfutil should solve the three issues above,
and there would probably be more reasons to include DWARFLinker in lld.

Even once we have the best solution, it is still useful to have a tool like
llvm-dwarfutil for cases where already-built binaries need to be processed.

So, in short, the suggested tool, llvm-dwarfutil, is a step towards the
ideal solution. Its benefit is that it can be used until the best solution
exists, and in cases where "the best solution" is not applicable.

Thank you, Alexey.


On 29.08.2020 00:23, James Y Knight wrote:
> If we're designing a new tool and process, it would be wonderful if it 
> did not require multiple stages of copying and slightly modifying the 
> binary, in order to create final output with separate debug info. It 
> seems to me that the variants of this sort of thing which exist today 
> are somewhat suboptimal.
>
> With Mach-O and dsymutil:
>   1. Given a collection of object files (which contain debuginfo), 
> link a binary with ld. The binary then includes special references to 
> the object files that were actually used as part of the link.
>   2. Given the linked binary, and all of the same object files, link 
> the debuginfo with dsymutil.
>   3. Strip the references to the object file paths from the binary.
>   Finally, you have a binary without debug info, and a dsym debuginfo 
> file. But it would be better if the binary created in step 1 didn't 
> need to include the extraneous object-file path info, and that was 
> instead emitted in a second file. Then we wouldn't need step 3.
>
> With "normal" ELF:
>   1. Given a collection of object files (which contain debuginfo), 
> link a binary with ld, which includes linking all the debug info into 
> the binary.
>   2. Given the linked binary, objcopy --only-keep-debug to create a 
> new separated debug file.
>   3. Given the linked binary, objcopy --strip-debug to create a copy 
> of the binary without debug info.
>   Finally you have a binary without debug info, and a separate debug 
> file. But it would be better if the linker could just write the debug 
> info into a separate file in the first place, then we'd only have the 
> one step. (But, downside, the linker needs to manage all the debug 
> info, which can be excessively large.)
>
> With "split-dwarf" ELF support:
>   1. Given object files (which exclude /most/ but not all of the 
> debuginfo), link a binary. The binary will include that smaller set of 
> debug info.
>   2. Given the collection of dwo files corresponding to the object 
> files, run the "dwp" tool to create a dwp file.
>   3. objcopy --only-keep-debug
>   4. --strip-debug
>   And then you need to keep both a debug file /and/ a dwp file, which 
> is weird.
>
>
> I think, ideally, users would have the following three /good/ options:
>   Easy option: store debuginfo in the object files, and have the 
> linker create a pair of {binary, separated dwarf-optimized debuginfo} 
> files directly from the object files.
>   More scalable option: emit (most of the) debuginfo in separate *.dwo 
> files using -gsplit-dwarf, and then,
>     1. run the linker on the object files to create a pair of {binary, 
> separated debuginfo} files. In this case the latter file contains the 
> minimal debuginfo which was in the object files.
>     2. run a second tool, which reads the minimal debuginfo from 
> above, and all the DWO files, and creates a full 
> optimized/deduplicated debuginfo output file.
>   Faster developer builds: Like previous, but omit step 2 -- running 
> the debugger directly after step 1 can use the dwo files on-disk.
>
> I think we're not terribly far from that ideal, now, for ELF. Maybe 
> only these three things need to be done? --
>   1. Teach lld how to emit a separated debuginfo output file directly, 
> without requiring an objcopy step.
>   2. Integrate DWARFLinker into lld.
>   3. Create a new tool which takes the separated debuginfo and DWO/DWP 
> files and uses DWARFLinker library to create a new (dwarf-linked) 
> separated-debug file, that doesn't depend on DWO/DWP files.
>
> My hope is that the tool you're creating will be the implementation of 
> #3, but I'm afraid the intent is for this tool to be an additional 
> stage that non-split-dwarf users would need to run post-link, /instead 
> of/ integrating DWARFLinker into lld.
>
> On Tue, Aug 25, 2020 at 10:29 AM Alexey via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>     Hi,
>
>        We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>        Any thoughts on this?
>        Thanks in advance, Alexey.
>
>     =====================================================================
>
>     llvm-dwarfutil (Apndx A) is a tool for processing debug info (DWARF)
>     located in built binary files to improve debug info quality,
>     reduce debug info size and accelerate debug info processing.
>     Supported object file formats: ELF, MachO (Apndx B), COFF (Apndx C),
>     WASM (Apndx C).
>
>     =====================================================================
>
>     Specifically, the tool would:
>
>        - Remove obsolete debug info which refers to code deleted by the
>          linker doing garbage collection (gc-sections).
>
>        - Deduplicate debug type definitions to reduce the size of the
>          resulting binary.
>
>        - Build accelerator/index tables:
>          .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>          .debug_pubtypes.
>
>        - Strip unneeded tables:
>          .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>          .debug_pubtypes.
>
>        - Compress or decompress debug info as requested.
>
>     Possible feature:
>
>        - Join split DWARF .dwo files into a single file containing all
>          debug info (convert split DWARF into monolithic DWARF).
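The type deduplication item relies on C++'s One Definition Rule (ODR): a type with a given fully qualified name must have the same definition in every translation unit, so a debug-info linker may keep one copy and point all other references at it. Below is a minimal sketch of that idea over a hypothetical flattened `TypeDIE` record rather than real DWARF DIEs; DWARFLinker's actual ODR uniquing works on the DIE tree and is considerably more involved. Anonymous types are kept as-is because there is no name to match on.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, flattened stand-in for a type DIE.
struct TypeDIE {
  unsigned CU;               // index of the compile unit it came from
  std::string QualifiedName; // e.g. "ns::Foo"; empty for anonymous types
  uint64_t Offset;           // offset of the DIE in .debug_info
};

struct DedupResult {
  std::vector<TypeDIE> Kept;                       // surviving definitions
  std::unordered_map<uint64_t, uint64_t> Redirect; // dropped -> kept offset
};

// Keep the first definition of each qualified name; under the ODR, later
// definitions with the same name are dropped and references to them are
// redirected to the kept copy.
DedupResult deduplicateTypes(const std::vector<TypeDIE> &Types) {
  DedupResult R;
  std::unordered_map<std::string, uint64_t> Canonical;
  for (const TypeDIE &T : Types) {
    if (T.QualifiedName.empty()) { // anonymous: nothing to match on
      R.Kept.push_back(T);
      continue;
    }
    auto [It, Inserted] = Canonical.emplace(T.QualifiedName, T.Offset);
    if (Inserted)
      R.Kept.push_back(T);
    else
      R.Redirect[T.Offset] = It->second;
  }
  return R;
}
```

Keeping the first occurrence makes the result deterministic for a given input order, which matters if the output is to be reproducible.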
>
>     =====================================================================
>
>     User interface:
>
>        OVERVIEW: A tool for optimizing debug info located in the built binary.
>
>        USAGE: llvm-dwarfutil [options] input output
>
>        OPTIONS: (Apndx E)
>
>     =====================================================================
>
>     Implementation notes:
>
>     1. Removing obsolete debug info would be done using the DWARFLinker
>        LLVM library.
>
>     2. Data type deduplication would be done using the DWARFLinker LLVM
>        library.
>
>     3. Accelerator/index tables would be generated using the DWARFLinker
>        LLVM library.
>
>     4. The interface of the DWARFLinker library would be changed in such a
>        way that it would be possible to switch various stages on/off:
>
>        class DWARFLinker {
>        public:
>          void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>
>          void setDoAppleNames(bool DoAppleNames = false);
>          void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>          void setDoAppleTypes(bool DoAppleTypes = false);
>          void setDoObjC(bool DoObjC = false);
>          void setDoDebugPubNames(bool DoDebugPubNames = false);
>          void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>
>          void setDoDebugNames(bool DoDebugNames = false);
>          void setDoGDBIndex(bool DoGDBIndex = false);
>        };
>
>     5. Copying source file contents, stripping tables, and
>        compressing/decompressing tables would be done by the ObjCopy LLVM
>        library (extracted from llvm-objcopy):
>
>        Error executeObjcopyOnBinary(const CopyConfig &Config,
>                                     object::COFFObjectFile &In, Buffer &Out);
>        Error executeObjcopyOnBinary(const CopyConfig &Config,
>                                     object::ELFObjectFileBase &In, Buffer &Out);
>        Error executeObjcopyOnBinary(const CopyConfig &Config,
>                                     object::MachOObjectFile &In, Buffer &Out);
>        Error executeObjcopyOnBinary(const CopyConfig &Config,
>                                     object::WasmObjectFile &In, Buffer &Out);
>
>     6. Address ranges and single addresses pointing to removed code
>        should be marked with a tombstone value in the input file:
>
>        -2 for .debug_ranges and .debug_loc.
>        -1 for other .debug* tables.
>
>     7. Prototype implementation: https://reviews.llvm.org/D86539
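Notes 1 and 6 fit together: once the linker has written a tombstone into every address that refers to discarded code, a post-link tool can recognise obsolete debug info by value alone. The -2 choice for .debug_ranges and .debug_loc follows from pre-DWARF-5 list encoding, where an entry whose first address is the largest representable value (-1) is a base address selection entry. Below is a minimal sketch under those rules; `tombstoneFor`, `Subprogram` and `removeObsolete` are hypothetical names, and a real tool would inspect DW_AT_low_pc in actual DIEs rather than a flattened list.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Tombstone values from the proposal, truncated to the target address size:
// -2 for .debug_ranges/.debug_loc, -1 for every other .debug_* section.
uint64_t tombstoneFor(const std::string &Section, unsigned AddrSize) {
  assert((AddrSize == 4 || AddrSize == 8) && "unsupported address size");
  uint64_t Mask = AddrSize == 8 ? ~uint64_t(0) : (uint64_t(1) << 32) - 1;
  // Pre-DWARF-5 range/location lists reserve a first address of -1 for base
  // address selection entries, so -1 cannot mark dead code there.
  bool IsListSection = Section == ".debug_ranges" || Section == ".debug_loc";
  return (IsListSection ? ~uint64_t(1) : ~uint64_t(0)) & Mask;
}

// Hypothetical, flattened view of a DW_TAG_subprogram: name and DW_AT_low_pc.
struct Subprogram {
  std::string Name;
  uint64_t LowPC;
};

// Keep only subprograms whose code survived linking. DW_AT_low_pc lives in
// .debug_info, so the -1 tombstone applies.
std::vector<Subprogram> removeObsolete(const std::vector<Subprogram> &In,
                                       unsigned AddrSize) {
  const uint64_t Dead = tombstoneFor(".debug_info", AddrSize);
  std::vector<Subprogram> Out;
  for (const Subprogram &S : In)
    if (S.LowPC != Dead)
      Out.push_back(S);
  return Out;
}
```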
>
>     =====================================================================
>
>     Roadmap:
>
>     1. Refactor llvm-objcopy to extract its implementation into a separate
>        library, ObjCopy (in the LLVM tree).
>
>     2. Create a command-line utility using the existing DWARFLinker and
>        ObjCopy implementations. The first version is supposed to work only
>        with ELF input object files. It would take an input ELF file with
>        unoptimized debug info and create an output ELF file with optimized
>        debug info. That version would be developed outside the LLVM tree.
>
>     3. Make the tool able to work in multi-threaded mode.
>
>     4. Consider including it in the LLVM tree.
>
>     5. Support DWARF5 tables.
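Among the DWARF 5 tables, .debug_names is the accelerator table that supersedes .debug_pubnames/.debug_pubtypes. Its lookup structure is a hash table built on the DJB string hash (h = h * 33 + c, starting from 5381, in 32-bit arithmetic), the same hash used by Apple's accelerator tables, with each name placed in bucket `hash % bucket_count`. Below is a minimal sketch of just the hashing and bucketing step; the `bucketize` helper and the bucket count are illustrative assumptions, and case folding of names, string offsets, abbreviations and entry attributes are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// DJB hash over the bytes of the name, starting at 5381, in 32-bit unsigned
// arithmetic (wraps on overflow).
uint32_t djbHash(const std::string &Name) {
  uint32_t H = 5381;
  for (unsigned char C : Name)
    H = H * 33 + C;
  return H;
}

// Hypothetical helper: group names into BucketCount buckets by hash, as a
// hash-table-based name index does before emitting its bucket/hash arrays.
std::vector<std::vector<std::string>>
bucketize(const std::vector<std::string> &Names, uint32_t BucketCount) {
  assert(BucketCount > 0 && "need at least one bucket");
  std::vector<std::vector<std::string>> Buckets(BucketCount);
  for (const std::string &N : Names)
    Buckets[djbHash(N) % BucketCount].push_back(N);
  return Buckets;
}
```

For example, `djbHash("a")` is 5381 * 33 + 97 = 177670, so with four buckets "a" lands in bucket 177670 % 4 = 2.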
>
>     =====================================================================
>
>     Appendix A. Should this tool be implemented as a new tool or as an
>                 extension to dsymutil/llvm-objcopy?
>
>        There already exists a tool which removes obsolete debug info on
>        Darwin: dsymutil. Why create another tool instead of extending the
>        existing dsymutil/llvm-objcopy?
>
>        The main functionality of dsymutil is located in a separate
>        library, DWARFLinker. Thus, the dsymutil utility is a command-line
>        interface for DWARFLinker. dsymutil has a different kind of
>        input/output data: it takes several object files and an address map
>        as input and creates a .dSYM bundle with linked debug info as
>        output. llvm-dwarfutil would take a built executable as input and
>        create an optimized executable as output. Additionally, there would
>        be many command-line options specific to only one of the utilities.
>        This means that these utilities (implementing the command-line
>        interface) would differ significantly. It makes sense not to put
>        another command-line utility inside the existing dsymutil, but to
>        make it a separate utility. That is why llvm-dwarfutil is suggested
>        to be implemented as a separate tool rather than as a sub-part of
>        dsymutil.
>
>        Please share your preference: should llvm-dwarfutil be a separate
>        utility, or a variant of dsymutil compiled for ELF?
>
>     =====================================================================
>
>     Appendix B. The Mach-O object file format is already supported by
>        dsymutil. Depending on whether llvm-dwarfutil is done as a
>        subproject of dsymutil or as a separate utility, Mach-O would be
>        supported or not.
>
>     =====================================================================
>
>     Appendix C. Support for the COFF and WASM object file formats is
>        presented as a possible future improvement. It would be quite easy
>        to add them, given that llvm-objcopy already supports these
>        formats. It would also require supporting the DWARF6-suggested
>        tombstone values (-1/-2).
>
>     =====================================================================
>
>     Appendix D. Documentation.
>
>        - Proposal for DWARF6 which suggested -1/-2 values for marking bad
>          addresses: http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>        - dsymutil tool: https://llvm.org/docs/CommandGuide/dsymutil.html
>        - Proposal "Remove obsolete debug info in lld.":
>          http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>
>     =====================================================================
>
>     Appendix E. Possible command line options:
>
>     DwarfUtil Options:
>
>        --build-aranges           - Generate .debug_aranges table.
>        --build-debug-names       - Generate .debug_names table.
>        --build-debug-pubnames    - Generate .debug_pubnames table.
>        --build-debug-pubtypes    - Generate .debug_pubtypes table.
>        --build-gdb-index         - Generate .gdb_index table.
>        --compress                - Compress debug tables.
>        --decompress              - Decompress debug tables.
>        --deduplicate-types       - Do ODR deduplication for debug types.
>        --garbage-collect         - Do garbage collection for debug info.
>        --num-threads=<n>         - Specify the maximum number (n) of
>                                    simultaneous threads to use when
>                                    optimizing the input file. Defaults to
>                                    the number of cores on the current
>                                    machine.
>        --strip-all               - Strip all debug tables.
>        --strip=<name1,name2>     - Strip specified debug info tables.
>        --strip-unoptimized-debug - Strip all unoptimized debug tables.
>        --tombstone=<value>       - Tombstone value used as a marker of an
>                                    invalid address.
>          =bfd                    -   BFD default value.
>          =dwarf6                 -   DWARF v6.
>        --verbose                 - Enable verbose logging and encoding
>                                    details.
>
>     Generic Options:
>
>        --help                    - Display available options
>                                    (--help-hidden for more)
>

Fangrui Song via llvm-dev

2020-Aug-31 21:54 UTC


On 2020-08-31, Alexey via llvm-dev wrote:
>On 29.08.2020 00:23, James Y Knight wrote:
>>I think we're not terribly far from that ideal, now, for ELF. Maybe 
>>only these three things need to be done? --
>>  1. Teach lld how to emit a separated debuginfo output file 
>>directly, without requiring an objcopy step.
This is very similar to Solaris's ancillary objects (ET_SUNW_ANCILLARY).
There are more details on
http://www.linker-aliens.org/blogs/ali/entry/ancillary_objects_separate_debug_elf/
In short, Solaris's `ld -z ancillary[=outfile]` writes non-SHF_ALLOC sections
to the ancillary object. Perhaps we will need some coordination with GNU. Some
GNU folks are interested in a new object file type:
https://groups.google.com/forum/#!topic/generic-abi/tJq7anc6WKs

A debug file created by {,llvm-}objcopy --only-keep-debug has different
contents (see https://reviews.llvm.org/D67137 for details):
non-SHF_ALLOC sections and SHT_NOTE sections.
http://www.linker-aliens.org/blogs/ali/entry/ancillary_objects_separate_debug_elf/
does not say whether program headers are retained in the debug file, but
{,llvm-}objcopy --only-keep-debug keeps one copy (neither gdb nor lldb needs
the program headers).
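The difference described above can be stated as two predicates over ELF section headers. Below is a minimal sketch that models only which sections keep their contents; the `Section` struct and function names are hypothetical, the constants are the standard ELF values of SHF_ALLOC (0x2) and SHT_NOTE (7), and details such as program headers and the content-less section headers objcopy retains are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Standard ELF values, named to avoid clashing with <elf.h> macros.
constexpr uint64_t kShfAlloc = 0x2; // SHF_ALLOC: occupies memory at run time
constexpr uint32_t kShtNote = 7;    // SHT_NOTE

// Hypothetical, minimal view of an ELF section header.
struct Section {
  std::string Name;
  uint32_t Type;
  uint64_t Flags;
};

// Solaris `ld -z ancillary`, as described above: non-SHF_ALLOC sections.
bool inAncillaryObject(const Section &S) { return (S.Flags & kShfAlloc) == 0; }

// {,llvm-}objcopy --only-keep-debug: non-SHF_ALLOC sections plus SHT_NOTE.
bool keepsContentsWithOnlyKeepDebug(const Section &S) {
  return (S.Flags & kShfAlloc) == 0 || S.Type == kShtNote;
}
```

An allocated note such as .note.gnu.build-id is where the two differ: objcopy keeps it in the debug file, while the ancillary rule as stated would not.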
>>  2. Integrate DWARFLinker into lld.
>>  3. Create a new tool which takes the separated debuginfo and 
>>DWO/DWP files and uses DWARFLinker library to create a new 
>>(dwarf-linked) separated-debug file, that doesn't depend on DWO/DWP 
>>files.
>>
>>My hope is that the tool you're creating will be the implementation 
>>of #3, but I'm afraid the intent is for this tool to be an 
>>additional stage that non-split-dwarf users would need to run 
>>post-link, /instead of/ integrating DWARFLinker into lld.
>>On Tue, Aug 25, 2020 at 10:29 AM Alexey via llvm-dev 
>><llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
>>
>>    Hi,
>>
>>       We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>>       Any thoughts on this?
>>       Thanks in advance, Alexey.
>>
>>   
=====================================================================>>
>>    llvm-dwarfutil(Apndx A) - is a tool that is used for processing
debug
>>    info(DWARF)
>>    located in built binary files to improve debug info quality,
>>    reduce debug info size and accelerate debug info processing.
>>    Supported object files formats: ELF, MachO(Apndx B), COFF(Apndx C),
>>    WASM(Apndx C).
>>
>>   
=====================================================================>>
>>    Specifically, the tool would do:
>>
>>       - Remove obsolete debug info which refers to code deleted by
>>    the linker
>>         doing the garbage collection (gc-sections).
>>
>>       - Deduplicate debug type definitions for reducing resulting
>>    size of
>>    binary.
>>
>>       - Build accelerator/index tables.
>>         = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>    .debug_pubtypes.
>>
>>       - Strip unneeded tables.
>>         = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>    .debug_pubtypes.
>>
>>       - Compress or decompress debug info as requested.
>>
>>    Possible feature:
>>
>>       - Join split dwarf .dwo files in a single file containing all
>>    debug info
>>         (convert split DWARF into monolithic DWARF).
>>
>>   
=====================================================================>>
>>    User interface:
>>
>>       OVERVIEW: A tool for optimizing debug info located in the built
>>    binary.
>>
>>       USAGE: llvm-dwarfutil [options] input output
>>
>>       OPTIONS: (Apndx E)
>>
>>   
=====================================================================>>
>>    Implementation notes:
>>
>>    1. Removing obsolete debug info would be done using DWARFLinker llvm
>>    library.
>>
>>    2. Data types deduplication would be done using DWARFLinker llvm
>>    library.
>>
>>    3. Accelerator/index tables would be generated using DWARFLinker
llvm
>>    library.
>>
>>    4. Interface of DWARFLinker library would be changed in such way
>>    that it
>>        would be possible to switch on/off various stages:
>>
>>       class DWARFLinker {
>>         setDoRemoveObsoleteInfo ( bool DoRemoveObsoleteInfo = false);
>>
>>         setDoAppleNames ( bool DoAppleNames = false );
>>         setDoAppleNamespaces ( bool DoAppleNamespaces = false );
>>         setDoAppleTypes ( bool DoAppleTypes = false );
>>         setDoObjC ( bool DoObjC = false );
>>         setDoDebugPubNames ( bool DoDebugPubNames = false );
>>         setDoDebugPubTypes ( bool DoDebugPubTypes = false );
>>
>>         setDoDebugNames (bool DoDebugNames = false);
>>         setDoGDBIndex (bool DoGDBIndex = false);
>>       }
>>
>>    5. Copying source file contents, stripping tables,
>>    compressing/decompressing tables
>>        would be done by ObjCopy llvm library(extracted from
>>    llvm-objcopy):
>>
>>       Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                  object::COFFObjectFile &In, Buffer
>>    &Out);
>>       Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                  object::ELFObjectFileBase &In,
>>    Buffer &Out);
>>       Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                  object::MachOObjectFile &In,
Buffer
>>    &Out);
>>       Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                  object::WasmObjectFile &In, Buffer
>>    &Out);
>>
>>    6. Address ranges and single addresses pointing to removed code
>>    should
>>    be marked
>>        with tombstone value in the input file:
>>
>>        -2 for .debug_ranges and .debug_loc.
>>        -1 for other .debug* tables.
>>
>>    7. Prototype implementation - https://reviews.llvm.org/D86539.
>>
>>   
=====================================================================>>
>>    Roadmap:
>>
>>    1. Refactor llvm-objcopy to extract it`s implementation into
separate
>>    library
>>        ObjCopy(in LLVM tree).
>>
>>    2. Create a command line utility using existed DWARFLinker and
ObjCopy
>>        implementation. First version is supposed to work with only ELF
>>    input object files.
>>        It would take input ELF file with unoptimized debug info and
>>    create
>>    output
>>        ELF file with optimized debug info. That version would be done
>>    out
>>    of the llvm tree.
>>
>>    3. Make a tool to be able to work in multi-thread mode.
>>
>>    4. Consider it to be included into LLVM tree.
>>
>>    5. Support DWARF5 tables.
>>
>>   
=====================================================================>>
>>    Appendix A. Should this tool be implemented as a new tool or as an
>>    extension
>>                 to dsymutil/llvm-objcopy?
>>
>>        There already exists a tool which removes obsolete debug info on
>>    darwin - dsymutil.
>>        Why create another tool instead of extending the already existed
>>    dsymutil/llvm-objcopy?
>>
>>        The main functionality of dsymutil is located in a separate
>>    library
>>    - DWARFLinker.
>>        Thus, dsymutil utility is a command-line interface for
>>    DWARFLinker.
>>    dsymutil has
>>        another type of input/output data: it takes several object
>>    files and
>>    address map
>>        as input and creates a .dSYM bundle with linked debug info as
>>    output. llvm-dwarfutil
>>        would take a built executable as input and create an optimized
>>    executable as output.
>>        Additionally, there would be many command-line options
>>    specific for
>>    only one utility.
>>        This means that these utilities(implementing command line
>>    interface)
>>    would significantly
>>        differ. It makes sense not to put another command-line utility
>>    inside existing dsymutil,
>>        but make it as a separate utility. That is the reason why
>>    llvm-dwarfutil suggested to be
>>        implemented not as sub-part of dsymutil but as a separate tool.
>>
>>        Please share your preference: whether llvm-dwarfutil should be
>>        separate utility, or a variant of dsymutil compiled for ELF?
>>
>>   
=====================================================================>>
>>    Appendix B. The machO object file format is already supported by
>>    dsymutil.
>>        Depending on the decision whether llvm-dwarfutil would be done
>>    as a
>>    subproject
>>        of dsymutil or as a separate utility - machO would be
>>    supported or not.
>>
>>   
=====================================================================>>
>>    Appendix C. Support for the COFF and WASM object file formats
>>    presented as
>>         possible future improvement. It would be quite easy to add them
>>    assuming
>>         that llvm-objcopy already supports these formats. It also
>>    would require
>>         supporting DWARF6-suggested tombstone values(-1/-2).
>>
>>   
=====================================================================>>
>>    Appendix D. Documentation.
>>
>>       - proposal for DWARF6 which suggested -1/-2 values for marking
bad
>>    addresses
>>    http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>>       - dsymutil tool https://llvm.org/docs/CommandGuide/dsymutil.html.
>>       - proposal "Remove obsolete debug info in lld."
>>    http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>>
>>   
=====================================================================>>
>>    Appendix E. Possible command line options:
>>
>>    DwarfUtil Options:
>>
>>       --build-aranges           - Generate .debug_aranges table.
>>       --build-debug-names       - Generate .debug_names table.
>>       --build-debug-pubnames    - Generate .debug_pubnames table.
>>       --build-debug-pubtypes    - Generate .debug_pubtypes table.
>>       --build-gdb-index         - Generate .gdb_index table.
>>       --compress                - Compress debug tables.
>>       --decompress              - Decompress debug tables.
>>       --deduplicate-types       - Do ODR deduplication for debug types.
>>       --garbage-collect         - Do garbage collection for debug info.
>>       --num-threads=<n>         - Specify the maximum number (n) of
>>                                   simultaneous threads to use when
>>                                   optimizing the input file. Defaults to
>>                                   the number of cores on the current
>>                                   machine.
>>       --strip-all               - Strip all debug tables.
>>       --strip=<name1,name2>     - Strip specified debug info tables.
>>       --strip-unoptimized-debug - Strip all unoptimized debug tables.
>>       --tombstone=<value>       - Tombstone value used as a marker of an
>>                                   invalid address.
>>         =bfd                    -   BFD default value
>>         =dwarf6                 -   DWARF v6
>>       --verbose                 - Enable verbose logging and encoding
>>                                   details.
>>
>>    Generic Options:
>>
>>       --help                    - Display available options
>>                                   (--help-hidden for more)
>>       --version                 - Display the version of this program
>>
>>    _______________________________________________
>>    LLVM Developers mailing list
>>    llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>    https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
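To make the Appendix E option list concrete, here is a hypothetical mock-up of the proposed command line using Python's argparse. Only the option names come from the appendix. The following are all assumptions of this sketch: the parser itself, leaving --tombstone without a default, making --compress/--decompress mutually exclusive, and omitting input/output arguments (which the appendix does not list).

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mock-up of the llvm-dwarfutil options from Appendix E."""
    p = argparse.ArgumentParser(prog="llvm-dwarfutil")
    # Index/accelerator tables the tool could (re)generate.
    for table in ("aranges", "debug-names", "debug-pubnames",
                  "debug-pubtypes", "gdb-index"):
        p.add_argument(f"--build-{table}", action="store_true")
    # Assumption: compressing and decompressing in one run makes no sense.
    comp = p.add_mutually_exclusive_group()
    comp.add_argument("--compress", action="store_true")
    comp.add_argument("--decompress", action="store_true")
    p.add_argument("--deduplicate-types", action="store_true")
    p.add_argument("--garbage-collect", action="store_true")
    # Defaults to the number of cores, as the appendix describes.
    p.add_argument("--num-threads", type=int, default=os.cpu_count())
    p.add_argument("--strip-all", action="store_true")
    # Comma-separated list of table names, e.g. --strip=.debug_pubnames
    p.add_argument("--strip", type=lambda s: s.split(","), default=[])
    p.add_argument("--strip-unoptimized-debug", action="store_true")
    p.add_argument("--tombstone", choices=("bfd", "dwarf6"))
    p.add_argument("--verbose", action="store_true")
    p.add_argument("--version", action="version", version="%(prog)s (sketch)")
    return p


args = build_parser().parse_args(
    ["--build-gdb-index", "--strip=.debug_pubnames,.debug_pubtypes",
     "--tombstone=dwarf6"])
print(args.build_gdb_index, args.strip, args.tombstone)
```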

James Y Knight via llvm-dev

2020-Sep-01 04:30 UTC


[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

On Mon, Aug 31, 2020 at 5:54 PM Fangrui Song <maskray at google.com> wrote:
> >>  1. Teach lld how to emit a separated debuginfo output file
> >>directly, without requiring an objcopy step.
>
> This is very similar to Solaris's ancillary objects (ET_SUNW_ANCILLARY).
> There are more details on
> http://www.linker-aliens.org/blogs/ali/entry/ancillary_objects_separate_debug_elf/
> In short, Solaris's `ld -z ancillary[=outfile]` writes non-SHF_ALLOC
> sections to the ancillary object. Perhaps we will need some coordination
> with GNU. Some GNU folks are interested in a new object file type:
> https://groups.google.com/forum/#!topic/generic-abi/tJq7anc6WKs
>
> A debug file created by {,llvm-}objcopy --only-keep-debug has different
> contents (see https://reviews.llvm.org/D67137 for details):
> non-SHF_ALLOC sections and SHT_NOTE sections.
> http://www.linker-aliens.org/blogs/ali/entry/ancillary_objects_separate_debug_elf/
> does not say whether program headers are retained in the debug file, but
> {,llvm-}objcopy --only-keep-debug keeps one copy (neither gdb nor lldb
> needs the program headers).
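Here is a simplified Python model of the --only-keep-debug rule described above. SHF_ALLOC sections other than SHT_NOTE keep their headers but lose their contents (modelled as becoming SHT_NOBITS). Non-SHF_ALLOC sections and notes are copied. The Section class and function name are hypothetical. The model ignores details the real tools handle, such as program headers, offsets and symbol tables; D67137 remains the authority on the actual rules.

```python
from dataclasses import dataclass, replace

# ELF constants from the generic ABI.
SHT_PROGBITS = 1
SHT_NOTE = 7
SHT_NOBITS = 8
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4


@dataclass(frozen=True)
class Section:  # hypothetical, minimal view of a section header
    name: str
    sh_type: int
    sh_flags: int


def only_keep_debug(sections):
    """Model which sections keep their contents in the separate debug file."""
    out = []
    for sec in sections:
        if sec.sh_flags & SHF_ALLOC and sec.sh_type != SHT_NOTE:
            # In the real tools the header (address, size) survives so the
            # debug file still describes the same layout; the bytes are dropped.
            out.append(replace(sec, sh_type=SHT_NOBITS))
        else:
            # Non-SHF_ALLOC sections (.debug_*, ...) and notes such as
            # .note.gnu.build-id are copied as-is.
            out.append(sec)
    return out


for sec in only_keep_debug([
        Section(".text", SHT_PROGBITS, SHF_ALLOC | SHF_EXECINSTR),
        Section(".note.gnu.build-id", SHT_NOTE, SHF_ALLOC),
        Section(".debug_info", SHT_PROGBITS, 0)]):
    print(sec.name, sec.sh_type)
```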

What I meant is that lld should emit the same files you'd get via `objcopy
--strip-debug; objcopy --only-keep-debug; objcopy --add-gnu-debuglink` (or
`eu-strip -f foo.debug foo`). The only difference is that they're output
directly by the linker, instead of via a post-processing step. It could be
invoked like `ld.lld -o foo -s --debug-output=foo.debug`, or with `-S`
instead, if you want to keep the symtab in the binary instead of the
debuginfo.
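For reference, the post-processing that a linker option like the suggested `--debug-output` (a proposed, not existing, lld flag) would replace can be written as three objcopy invocations. The small Python helper below builds them. The debug copy has to be extracted before the binary is stripped, so the order differs from the list above. The helper names and the default of llvm-objcopy are assumptions; the three flags themselves are standard GNU objcopy/llvm-objcopy options.

```python
import subprocess
from typing import List


def split_debug_commands(binary: str, debug_file: str,
                         objcopy: str = "llvm-objcopy") -> List[List[str]]:
    """Commands that split debug info out of `binary` into `debug_file`."""
    return [
        # 1. Copy the debug sections (plus notes) into a separate file.
        [objcopy, "--only-keep-debug", binary, debug_file],
        # 2. Remove debug sections from the binary, editing it in place.
        [objcopy, "--strip-debug", binary],
        # 3. Record the debug file's name and CRC in .gnu_debuglink.
        [objcopy, f"--add-gnu-debuglink={debug_file}", binary],
    ]


def split_debug(binary: str, debug_file: str) -> None:
    for cmd in split_debug_commands(binary, debug_file):
        subprocess.run(cmd, check=True)  # stop on the first failing step


for cmd in split_debug_commands("foo", "foo.debug"):
    print(" ".join(cmd))
```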

The original GNU proposal for the new object type flag in that thread was
just a tiny modification of the existing formats, to enable identifying a
debuginfo file. We can easily implement that extra flag, if it happens.
It's not clear to me that introducing some other new behavior here would be
particularly interesting or useful -- even having seen that thread.

James Y Knight via llvm-dev

2020-Sep-01 17:18 UTC


[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

On Mon, Aug 31, 2020 at 11:06 AM Alexey <avl.lapshin at gmail.com> wrote:
> Hi James,
>
> Thank you for the comments.
>
> >I think we're not terribly far from that ideal, now, for ELF. Maybe
> >only these three things need to be done? --
> >  1. Teach lld how to emit a separated debuginfo output file directly,
> >without requiring an objcopy step.
> >  2. Integrate DWARFLinker into lld.
> >  3. Create a new tool which takes the separated debuginfo and DWO/DWP
> >files and uses DWARFLinker library to create a new (dwarf-linked)
> >separated-debug file, that doesn't depend on DWO/DWP files.
>
> The three goals which you've described are our long-term goals.
> Indeed, the best solution would be to create valid optimized debug info
> without additional stages and additional modifications of resulting
> binaries.
>
> There was an attempt to use DWARFLinker from lld -
> https://reviews.llvm.org/D74169
> It has not received enough support to be integrated yet. There are fair
> reasons for that:
>
> 1. Execution time. The time required by DWARFLinker for processing the
> clang binary is 8x bigger than the usual linking time. Linking the clang
> binary with DWARFLinker takes 72 sec; linking with lld alone takes 9 sec.
>
> 2. "Removing obsolete debug info" could not be switched off. Thus, lld
> could not use DWARFLinker for other tasks (like generation of index
> tables - .gdb_index, .debug_names) without significant performance
> degradation.
>
> 3. DWARFLinker does not support split dwarf at the moment.
>
> All these reasons are not blockers. And I believe the implementation from
> D74169 might be integrated and incrementally improved if there is
> agreement on that.
>
Those do sound like absolutely critical issues for deploying this for real
-- whether as a separate tool or integrated with lld. But possibly not
critical enough to prevent adding this behind an experimental flag, and
working on the code incrementally in-tree. However (without having looked
at the code in question), I wonder if the reported 8x regression in
link-time is even going to be salvageable just by incremental
optimizations, or if it might require a complete re-architecting of the
DwarfLinker code.

> Using DWARFLinker from llvm-dwarfutil is another possibility to use and
> improve it.
> When finally implemented - llvm-dwarfutil should solve the above three
> issues and there would probably be more reasons to include DWARFLinker
> into lld.
>
Is it the case that if the code is built to support the "read an
executable, output a new better executable" use-case, it will actually be
what's needed for the "output an optimized executable while linking object
files" use-case? I worry that those could have enough different
requirements that you really need to be developing the linker-integrated
version from the very beginning in order to get a good result, rather than
trying to shoehorn it in as an afterthought.

> Even if we had the best solution, it would still be useful to have a
> tool like llvm-dwarfutil for cases when it is necessary to process
> already created binaries.
>
Sure -- I just think that should be considered as a secondary use-case, and
not the primary goal.

David Blaikie via llvm-dev

2020-Sep-01 17:25 UTC


[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

On Tue, Sep 1, 2020 at 10:18 AM James Y Knight <jyknight at google.com> wrote:
>
>
> On Mon, Aug 31, 2020 at 11:06 AM Alexey <avl.lapshin at gmail.com>
wrote:
>
>> Hi James,
>>
>> Thank you for the comments.
>>
>> >I think we're not terribly far from that ideal, now, for ELF. Maybe
>> >only these three things need to be done? --
>> >  1. Teach lld how to emit a separated debuginfo output file directly,
>> >without requiring an objcopy step.
>> >  2. Integrate DWARFLinker into lld.
>> >  3. Create a new tool which takes the separated debuginfo and DWO/DWP
>> >files and uses DWARFLinker library to create a new (dwarf-linked)
>> >separated-debug file, that doesn't depend on DWO/DWP files.
>>
>> The three goals which you've described are our long-term goals.
>> Indeed, the best solution would be to create valid optimized debug info
>> without additional stages and additional modifications of resulting
>> binaries.
>>
>> There was an attempt to use DWARFLinker from lld -
>> https://reviews.llvm.org/D74169
>> It has not received enough support to be integrated yet. There are fair
>> reasons for that:
>>
>> 1. Execution time. The time required by DWARFLinker for processing the
>> clang binary is 8x bigger than the usual linking time. Linking the clang
>> binary with DWARFLinker takes 72 sec; linking with lld alone takes 9 sec.
>>
>> 2. "Removing obsolete debug info" could not be switched off. Thus, lld
>> could not use DWARFLinker for other tasks (like generation of index
>> tables - .gdb_index, .debug_names) without significant performance
>> degradation.
>>
>> 3. DWARFLinker does not support split dwarf at the moment.
>>
>> All these reasons are not blockers. And I believe the implementation from
>> D74169 might be integrated and incrementally improved if there is
>> agreement on that.
>>
>
> Those do sound like absolutely critical issues for deploying this for real
> -- whether as a separate tool or integrated with lld. But possibly not
> critical enough to prevent adding this behind an experimental flag, and
> working on the code incrementally in-tree. However (without having looked
> at the code in question),
>
Yep, that's my feeling too.

> I wonder if the reported 8x regression in link-time is even going to be
> salvageable just by incremental optimizations, or if it might require a
> complete re-architecting of the DwarfLinker code.
>
Jonas, who has looked at llvm-dsymutil performance for its own sake
(motivated to improve llvm-dsymutil runtime, etc.), has mentioned on
this and related threads that there might be minimal headroom to improve
things there - so, yes, if there are greater opportunities, they may
require a fairly large/broad investment (though a second or third set of
eyes on the current code to look for hidden opportunities isn't a bad
thing).

>> Using DWARFLinker from llvm-dwarfutil is another possibility to use and
>> improve it.
>>
>> When finally implemented - llvm-dwarfutil should solve the above three
>> issues and there would probably be more reasons to include DWARFLinker
>> into lld.
>>
>
> Is it the case that if the code is built to support the "read an
> executable, output a new better executable" use-case, it will actually be
> what's needed for the "output an optimized executable while linking
> object files" use-case? I worry that those could have enough different
> requirements that you really need to be developing the linker-integrated
> version from the very beginning in order to get a good result, rather than
> trying to shoehorn it in as an afterthought.
>
Fair concern. I think there's probably a good chance of a lot of overlap in
functionality/benefits - but, yes, likely some unspecified amount that
would be context-dependent/different between lld/dwz/dwp/dsymutil use cases
that are all slightly different.

>> Even if we had the best solution, it would still be useful to have a
>> tool like llvm-dwarfutil for cases when it is necessary to process
>> already created binaries.
>>
>
> Sure -- I just think that should be considered as a secondary use-case,
> and not the primary goal.
>

Alexey via llvm-dev

2020-Sep-01 18:55 UTC


[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

On 01.09.2020 20:18, James Y Knight wrote:
>
> On Mon, Aug 31, 2020 at 11:06 AM Alexey <avl.lapshin at gmail.com 
> <mailto:avl.lapshin at gmail.com>> wrote:
>
>     Hi James,
>
>     Thank you for the comments.
>
>     >I think we're not terribly far from that ideal, now, for ELF.
>     Maybe only these three things need to be done? --
>     >  1. Teach lld how to emit a separated debuginfo output file
>     directly, without requiring an objcopy step.
>     >  2. Integrate DWARFLinker into lld.
>     >  3. Create a new tool which takes the separated debuginfo and
>     DWO/DWP files and uses DWARFLinker library
>     > to create a new (dwarf-linked) separated-debug file, that
>     doesn't depend on DWO/DWP files.
>
>     The three goals which you've described are our long-term goals.
>     Indeed, the best solution would be to create valid optimized debug
>     info without additional stages and additional modifications of
>     resulting binaries.
>
>     There was an attempt to use DWARFLinker from lld -
>     https://reviews.llvm.org/D74169
>     It has not received enough support to be integrated yet. There are
>     fair reasons for that:
>
>     1. Execution time. The time required by DWARFLinker for processing
>     the clang binary is 8x bigger than the usual linking time. Linking
>     the clang binary with DWARFLinker takes 72 sec; linking with lld
>     alone takes 9 sec.
>
>     2. "Removing obsolete debug info" could not be switched off. Thus,
>     lld could not use DWARFLinker for other tasks (like generation of
>     index tables - .gdb_index, .debug_names) without significant
>     performance degradation.
>
>     3. DWARFLinker does not support split dwarf at the moment.
>
>     All these reasons are not blockers. And I believe the implementation
>     from D74169 might be integrated and incrementally improved if there
>     is agreement on that.
>
>
> Those do sound like absolutely critical issues for deploying this for 
> real -- whether as a separate tool or integrated with lld. But 
> possibly not critical enough to prevent adding this behind an 
> experimental flag, and working on the code incrementally in-tree. 
> However (without having looked at the code in question), I wonder if 
> the reported 8x regression in link-time is even going to be 
> salvageable just by incremental optimizations, or if it might require 
> a complete re-architecting of the DwarfLinker code.

That would be more like a complete re-architecting of the DWARFLinker code.
The current dsymutil implementation runs the "analyzing" and "cloning"
stages in parallel, i.e. it sequentially analyzes all object files and
sequentially clones them, with the two stages overlapping. The speedup
from this parallelization is 2x. Changing this scheme to process
compilation units in parallel might speed up execution. Supporting that
scheme would require a huge refactoring. The advantages could be faster
execution and reduced memory usage. I am planning to make a prototype to
prove that such a refactoring will have these benefits.
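The two schemes described above can be contrasted in a toy Python sketch. `analyze` and `clone` are hypothetical stand-ins for DWARFLinker's liveness analysis and DIE cloning. The first function overlaps the two stages with one thread each, so at most two pieces of work run at once, which matches a 2x ceiling. The second runs whole compilation units through a thread pool. Because of Python's GIL this shows structure rather than real speedup. A real per-unit scheme would also still need to handle cross-unit references such as ODR type deduplication.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def analyze(unit: str) -> dict:
    """Hypothetical stand-in: decide which DIEs of `unit` are live."""
    return {"unit": unit, "live": True}


def clone(analysis: dict) -> str:
    """Hypothetical stand-in: emit the live DIEs of one analyzed unit."""
    return "cloned:" + analysis["unit"]


def two_stage_pipeline(object_files):
    """Analyze file N+1 while file N is being cloned (two threads)."""
    handoff = queue.Queue()
    results = []

    def analyzer():
        for obj in object_files:
            handoff.put(analyze(obj))
        handoff.put(None)  # sentinel: nothing left to clone

    def cloner():
        while (item := handoff.get()) is not None:
            results.append(clone(item))

    threads = [threading.Thread(target=analyzer),
               threading.Thread(target=cloner)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def per_unit_parallel(units, max_workers: int = 4):
    """Analyze and clone each compilation unit independently in a pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, keeping the output deterministic.
        return list(pool.map(lambda u: clone(analyze(u)), units))


print(two_stage_pipeline(["a.o", "b.o"]))
print(per_unit_parallel(["cu1", "cu2", "cu3"]))
```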

>
>     Using DWARFLinker from llvm-dwarfutil is another possibility to
>     use and improve it.
>
>     When finally implemented - llvm-dwarfutil should solve the above
>     three issues and there
>     would probably be more reasons to include DWARFLinker into lld.
>
>
> Is it the case that if the code is built to support the "read an
> executable, output a new better executable" use-case, it will actually
> be what's needed for the "output an optimized executable while linking
> object files" use-case? I worry that those could have enough different
> requirements that you really need to be developing the
> linker-integrated version from the very beginning in order to get a
> good result, rather than trying to shoehorn it in as an afterthought.

There already exist working prototypes of both the "read an executable,
output a new better executable" use case (https://reviews.llvm.org/D86539)
and the "output an optimized executable while linking object files" use
case (https://reviews.llvm.org/D74169). They share most of DWARFLinker.
The differences exist, but they could be managed.

The major problem of D86539 is that it loads all DIEs into memory (since
there is only one source file). For clang, it requires approximately 30 GB
of memory. D74169 does not have such a problem since it loads and frees
DIEs per object file. Changing processing from "per source file" to "per
compilation unit" should help with this problem.
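Here is a toy Python model of why per-unit processing bounds memory. With one input file, every DIE is resident at once. Loading, cloning and freeing one compilation unit at a time keeps only the largest unit's DIEs alive. The `load_dies`/`clone_dies` callbacks are hypothetical, the DIE counts are made up for illustration, and counting DIEs stands in for real memory usage.

```python
from typing import Callable, Iterable, List

LoadFn = Callable[[str], List[str]]
CloneFn = Callable[[List[str]], None]


def process_whole_file(units: Iterable[str], load_dies: LoadFn,
                       clone_dies: CloneFn) -> int:
    """Load every unit's DIEs first, then clone; return peak DIEs held."""
    all_dies = [die for unit in units for die in load_dies(unit)]
    clone_dies(all_dies)
    return len(all_dies)


def process_per_unit(units: Iterable[str], load_dies: LoadFn,
                     clone_dies: CloneFn) -> int:
    """Load, clone and release one unit at a time; return peak DIEs held."""
    peak = 0
    for unit in units:
        dies = load_dies(unit)  # only this unit's DIEs are resident
        peak = max(peak, len(dies))
        clone_dies(dies)
        del dies  # release before loading the next unit
    return peak


sizes = {"cu1": 3, "cu2": 5, "cu3": 2}  # made-up DIE counts per unit


def load(unit: str) -> List[str]:
    return [f"{unit}:{i}" for i in range(sizes[unit])]


print(process_whole_file(sizes, load, lambda dies: None))  # 10
print(process_per_unit(sizes, load, lambda dies: None))    # 5
```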

So it looks like a DWARFLinker refactored for the parallel "per compilation
unit" scenario would fit both of these tasks quite well.

At the moment, I do not see other problems that would prevent using the
same DWARFLinker library for both of these tasks. There could be some
unforeseen ones, of course.

>
>     Even if we had the best solution, it would still be useful to have
>     a tool like llvm-dwarfutil for cases when it is necessary to process
>     already created binaries.
>
>
> Sure -- I just think that should be considered as a secondary 
> use-case, and not the primary goal.

Alexey via llvm-dev

2020-Sep-14 11:17 UTC


[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

Debuginfo folks,

What is your opinion on this proposal?

Do we need to work on a better DWARFLinker library for now, or can we
start integrating llvm-dwarfutil as a series of small patches?

If it is OK to start integrating llvm-dwarfutil, is it OK to move the
llvm-objcopy implementation into a separate library, ObjCopy?

Thank you, Alexey.

On 01.09.2020 20:18, James Y Knight wrote:
>
> On Mon, Aug 31, 2020 at 11:06 AM Alexey <avl.lapshin at gmail.com 
> <mailto:avl.lapshin at gmail.com>> wrote:
>
>     Hi James,
>
>     Thank you for the comments.
>
>     >I think we're not terribly far from that ideal, now, for ELF.
>     Maybe only these three things need to be done? --
>     >  1. Teach lld how to emit a separated debuginfo output file
>     directly, without requiring an objcopy step.
>     >  2. Integrate DWARFLinker into lld.
>     >  3. Create a new tool which takes the separated debuginfo and
>     DWO/DWP files and uses DWARFLinker library
>     > to create a new (dwarf-linked) separated-debug file, that
>     doesn't depend on DWO/DWP files.
>
>     The three goals which you've described are our long-term goals.
>     Indeed, the best solution would be to create valid optimized debug
>     info without additional stages and additional modifications of
>     resulting binaries.
>
>     There was an attempt to use DWARFLinker from lld -
>     https://reviews.llvm.org/D74169
>     It has not received enough support to be integrated yet. There are
>     fair reasons for that:
>
>     1. Execution time. The time required by DWARFLinker for processing
>     the clang binary is 8x bigger than the usual linking time. Linking
>     the clang binary with DWARFLinker takes 72 sec; linking with lld
>     alone takes 9 sec.
>
>     2. "Removing obsolete debug info" could not be switched off. Thus,
>     lld could not use DWARFLinker for other tasks (like generation of
>     index tables - .gdb_index, .debug_names) without significant
>     performance degradation.
>
>     3. DWARFLinker does not support split dwarf at the moment.
>
>     All these reasons are not blockers. And I believe the implementation
>     from D74169 might be integrated and incrementally improved if there
>     is agreement on that.
>
>
> Those do sound like absolutely critical issues for deploying this for 
> real -- whether as a separate tool or integrated with lld. But 
> possibly not critical enough to prevent adding this behind an 
> experimental flag, and working on the code incrementally in-tree. 
> However (without having looked at the code in question), I wonder if 
> the reported 8x regression in link-time is even going to be 
> salvageable just by incremental optimizations, or if it might require 
> a complete re-architecting of the DwarfLinker code.
>
>     Using DWARFLinker from llvm-dwarfutil is another possibility to
>     use and improve it.
>
>     When finally implemented - llvm-dwarfutil should solve the above
>     three issues and there
>     would probably be more reasons to include DWARFLinker into lld.
>
>
> Is it the case that if the code is built to support the "read an 
> executable, output a new better executable" use-case, it will actually
> be what's needed for the "output an optimized executable while linking
> object files" use-case? I worry that those could have enough different
> requirements that you really need to be developing the 
> linker-integrated version from the very beginning in order to get a 
> good result, rather than trying to shoehorn it in as an afterthought.
>
>     Even if we had the best solution, it would still be useful to have
>     a tool like llvm-dwarfutil for cases when it is necessary to process
>     already created binaries.
>
>
> Sure -- I just think that should be considered as a secondary 
> use-case, and not the primary goal.
