Alexey via llvm-dev
2020-Aug-25 14:29 UTC
[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
Hi,

We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
Any thoughts on this?
Thanks in advance, Alexey.

=====================================================================

llvm-dwarfutil (Apndx A) is a tool for processing debug info (DWARF)
located in built binary files to improve debug info quality,
reduce debug info size and accelerate debug info processing.
Supported object file formats: ELF, MachO (Apndx B), COFF (Apndx C),
WASM (Apndx C).

=====================================================================

Specifically, the tool would:

  - Remove obsolete debug info which refers to code deleted by the linker
    during garbage collection (gc-sections).

  - Deduplicate debug type definitions to reduce the size of the
    resulting binary.

  - Build accelerator/index tables:
    .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
    .debug_pubtypes.

  - Strip unneeded tables:
    .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
    .debug_pubtypes.

  - Compress or decompress debug info as requested.

Possible feature:

  - Join split DWARF .dwo files into a single file containing all debug
    info (convert split DWARF into monolithic DWARF).

=====================================================================

User interface:

  OVERVIEW: A tool for optimizing debug info located in the built binary.

  USAGE: llvm-dwarfutil [options] input output

  OPTIONS: (Apndx E)

=====================================================================

Implementation notes:

1. Removing obsolete debug info would be done using the DWARFLinker llvm
   library.

2. Data type deduplication would be done using the DWARFLinker llvm
   library.

3. Accelerator/index tables would be generated using the DWARFLinker llvm
   library.

4. The interface of the DWARFLinker library would be changed so that the
   various stages can be switched on/off:

    class DWARFLinker {
      setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);

      setDoAppleNames(bool DoAppleNames = false);
      setDoAppleNamespaces(bool DoAppleNamespaces = false);
      setDoAppleTypes(bool DoAppleTypes = false);
      setDoObjC(bool DoObjC = false);
      setDoDebugPubNames(bool DoDebugPubNames = false);
      setDoDebugPubTypes(bool DoDebugPubTypes = false);

      setDoDebugNames(bool DoDebugNames = false);
      setDoGDBIndex(bool DoGDBIndex = false);
    }

5. Copying source file contents, stripping tables and
   compressing/decompressing tables would be done by the ObjCopy llvm
   library (extracted from llvm-objcopy):

    Error executeObjcopyOnBinary(const CopyConfig &Config,
                                 object::COFFObjectFile &In, Buffer &Out);
    Error executeObjcopyOnBinary(const CopyConfig &Config,
                                 object::ELFObjectFileBase &In, Buffer &Out);
    Error executeObjcopyOnBinary(const CopyConfig &Config,
                                 object::MachOObjectFile &In, Buffer &Out);
    Error executeObjcopyOnBinary(const CopyConfig &Config,
                                 object::WasmObjectFile &In, Buffer &Out);

6. Address ranges and single addresses pointing to removed code should
   be marked with a tombstone value in the input file:

    -2 for .debug_ranges and .debug_loc.
    -1 for other .debug* tables.

7. Prototype implementation - https://reviews.llvm.org/D86539.

=====================================================================

Roadmap:

1. Refactor llvm-objcopy to extract its implementation into a separate
   library, ObjCopy (in the LLVM tree).

2. Create a command line utility using the existing DWARFLinker and
   ObjCopy implementations. The first version is supposed to work with
   ELF input object files only. It would take an input ELF file with
   unoptimized debug info and create an output ELF file with optimized
   debug info. That version would be done out of the llvm tree.

3. Make the tool able to work in multi-thread mode.

4. Consider it for inclusion into the LLVM tree.

5. Support DWARF5 tables.

=====================================================================

Appendix A. Should this tool be implemented as a new tool or as an
extension to dsymutil/llvm-objcopy?

    There already exists a tool which removes obsolete debug info on
    darwin - dsymutil. Why create another tool instead of extending the
    existing dsymutil/llvm-objcopy?

    The main functionality of dsymutil is located in a separate library
    - DWARFLinker. Thus, the dsymutil utility is a command-line interface
    for DWARFLinker. dsymutil has another type of input/output data: it
    takes several object files and an address map as input and creates a
    .dSYM bundle with linked debug info as output. llvm-dwarfutil would
    take a built executable as input and create an optimized executable
    as output. Additionally, there would be many command-line options
    specific to only one utility. This means that these utilities
    (implementing the command line interface) would differ significantly.
    It makes sense not to put another command-line utility inside the
    existing dsymutil, but to make it a separate utility. That is the
    reason why llvm-dwarfutil is suggested to be implemented not as a
    sub-part of dsymutil but as a separate tool.

    Please share your preference: should llvm-dwarfutil be a separate
    utility, or a variant of dsymutil compiled for ELF?

=====================================================================

Appendix B. The MachO object file format is already supported by
dsymutil. Depending on the decision whether llvm-dwarfutil would be done
as a subproject of dsymutil or as a separate utility, MachO would be
supported or not.

=====================================================================

Appendix C. Support for the COFF and WASM object file formats is
presented as a possible future improvement. It would be quite easy to add
them, assuming that llvm-objcopy already supports these formats. It would
also require supporting the DWARF6-suggested tombstone values (-1/-2).

=====================================================================

Appendix D. Documentation.

  - proposal for DWARF6 which suggested -1/-2 values for marking bad
    addresses
    http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
  - dsymutil tool
    https://llvm.org/docs/CommandGuide/dsymutil.html
  - proposal "Remove obsolete debug info in lld."
    http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html

=====================================================================

Appendix E. Possible command line options:

DwarfUtil Options:

  --build-aranges           - Generate .debug_aranges table.
  --build-debug-names       - Generate .debug_names table.
  --build-debug-pubnames    - Generate .debug_pubnames table.
  --build-debug-pubtypes    - Generate .debug_pubtypes table.
  --build-gdb-index         - Generate .gdb_index table.
  --compress                - Compress debug tables.
  --decompress              - Decompress debug tables.
  --deduplicate-types       - Do ODR deduplication for debug types.
  --garbage-collect         - Do garbage collecting for debug info.
  --num-threads=<n>         - Specify the maximum number (n) of
                              simultaneous threads to use when optimizing
                              the input file. Defaults to the number of
                              cores on the current machine.
  --strip-all               - Strip all debug tables.
  --strip=<name1,name2>     - Strip specified debug info tables.
  --strip-unoptimized-debug - Strip all unoptimized debug tables.
  --tombstone=<value>       - Tombstone value used as a marker of an
                              invalid address.
    =bfd                    -   BFD default value
    =dwarf6                 -   Dwarf v6.
  --verbose                 - Enable verbose logging and encoding details.

Generic Options:

  --help                    - Display available options (--help-hidden
                              for more)
  --version                 - Display the version of this program
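The tombstone rule in implementation note 6 can be sketched as a small helper. This is only an illustration under stated assumptions, not code from the prototype: `maxAddress`, `tombstoneFor` and `isDeadAddress` are hypothetical names, and the values are truncated to the target address size so that -1 on a 32-bit target is 0xFFFFFFFF. In DWARF v4 the all-ones value already acts as the base address selection marker in .debug_ranges and .debug_loc, which is presumably why those two sections get -2 instead.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not an LLVM API): the all-ones address for a target
// whose addresses are AddrSize bytes wide, i.e. "-1" as an unsigned address.
inline uint64_t maxAddress(unsigned AddrSize) {
  assert(AddrSize >= 1 && AddrSize <= 8 && "unsupported address size");
  return AddrSize == 8 ? UINT64_MAX : (uint64_t{1} << (AddrSize * 8)) - 1;
}

// Tombstone suggested by the proposal for a dead address found in Section:
// -2 in .debug_ranges and .debug_loc, -1 in every other .debug* section.
inline uint64_t tombstoneFor(const std::string &Section, unsigned AddrSize) {
  const bool IsRangeOrLoc =
      Section == ".debug_ranges" || Section == ".debug_loc";
  return IsRangeOrLoc ? maxAddress(AddrSize) - 1 : maxAddress(AddrSize);
}

// True if Addr, as read from Section, marks code that the linker removed.
inline bool isDeadAddress(uint64_t Addr, const std::string &Section,
                          unsigned AddrSize) {
  return Addr == tombstoneFor(Section, AddrSize);
}
```

A consumer would call `isDeadAddress` on each address it reads and drop the enclosing range or DIE when it returns true.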
James Henderson via llvm-dev
2020-Aug-26 07:58 UTC
[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
In principle, this sounds reasonable to me. I don't know enough about
dsymutil's interface to know whether it makes sense to try to make it
multi-format compatible or not. If it doesn't, I'm perfectly happy for
a new tool to be added using the DWARFLinker library.

Some more general thoughts:

1) Assuming the proposal is accepted, this should be introduced
   piecemeal into LLVM from the beginning as it is developed, rather than
   having a separate step 4 in the roadmap.
2) The default tombstone values used for dead debug data should be
   those produced by LLD, in my opinion. In an ideal world, we'd factor
   them into some shared constant. Note that at the time of writing, I
   believe LLD is currently using BFD-style tombstones, not the new -1/-2.
3) Does the DWARFLinker library already support multi-threading? If
   not, it might be a lot of work making things thread-safe.
4) Given that DWARF v6 doesn't exist yet, I wouldn't include that as
   an option name just yet...!

Thanks for looking at this! Please keep me involved in any related
reviews etc.

James
Alexey via llvm-dev
2020-Aug-26 14:01 UTC
[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
On 26.08.2020 10:58, James Henderson wrote:
> In principle, this sounds reasonable to me. I don't know enough about
> dsymutil's interface to know whether it makes sense to try to make it
> multi-format compatible or not. If it doesn't I'm perfectly happy for
> a new tool to be added using the DWARFLinker library.
>
> Some more general thoughts:
> 1) Assuming the proposal is accepted, this should be introduced
> piecemeal into LLVM from the beginning as it is developed, rather than
> having a separate step 4 in the roadmap.
> 2) The default tombstone values used for dead debug data should be
> those produced by LLD, in my opinion. In an ideal world, we'd factor
> them into some shared constant. Note that at the time of writing, I
> believe LLD is currently using BFD-style tombstones, not the new -1/-2.

Agreed.

> 3) Does the DWARFLinker library already support multi-threading? If
> not, it might be a lot of work making things thread-safe.

It does, but in a limited way. It can run the analyzing and cloning
stages in parallel, i.e. the maximal speedup is two times. To have a
greater performance impact it could probably be parallelized on a
per-compilation-unit basis.

Another thing is that dsymutil currently loads all DIEs from a source
object file into memory and releases them after the object file is
processed. For a non-linked binary this works OK (big binaries are
usually compiled from several object files). For a linked binary it
means that all DIEs are loaded into memory at once, which requires a lot
of memory. The solution to this problem could be to split the source
data per compilation unit instead of per file.

Yes, making dsymutil/dwarfutil work on a compilation unit basis with
multi-threading support is quite a big piece of work. It looks like it
would be good for both dsymutil and dwarfutil.

> 4) Given that DWARF v6 doesn't exist yet, I wouldn't include that as
> an option name just yet...!

Would "maxpc" be OK? --tombstone=maxpc ?

> Thanks for looking at this! Please keep me involved in any related
> reviews etc.

Sure. Thank you for the comments.

Alexey.
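The per-compilation-unit idea from the reply above can be sketched as follows. This is a rough illustration under stated assumptions, not how DWARFLinker works today: `CompileUnit`, `processUnit` and `processAllUnits` are hypothetical names, the per-unit work is reduced to summing a DIE count, and references between units (which type deduplication would need) are ignored entirely.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for one compilation unit; a real tool would hold
// parsed DIEs here rather than just a DIE count.
struct CompileUnit {
  uint64_t NumDIEs;
};

// Placeholder for "load DIEs, analyze, clone, release" on a single unit.
inline uint64_t processUnit(const CompileUnit &CU) { return CU.NumDIEs; }

// Processes the units on up to NumThreads workers. Each worker claims the
// next unprocessed unit, so at most NumThreads units are in flight at once
// instead of every DIE of a linked binary being resident at the same time.
inline uint64_t processAllUnits(const std::vector<CompileUnit> &Units,
                                unsigned NumThreads) {
  if (NumThreads == 0)
    NumThreads = 1;
  std::atomic<size_t> Next{0};
  std::atomic<uint64_t> Total{0};
  auto Worker = [&] {
    for (size_t I = Next.fetch_add(1); I < Units.size();
         I = Next.fetch_add(1))
      Total.fetch_add(processUnit(Units[I]));
  };
  std::vector<std::thread> Pool;
  for (unsigned T = 0; T < NumThreads; ++T)
    Pool.emplace_back(Worker);
  for (std::thread &T : Pool)
    T.join();
  return Total.load();
}
```

The shared atomic index keeps the workers balanced when units differ a lot in size, which is the usual case for debug info.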
Jonas Devlieghere via llvm-dev
2020-Aug-26 23:05 UTC
[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
Hey Alexey,

I haven't had time to look at the corresponding patch yet, but I hope to
do that soon. Here are my initial thoughts on the proposal.

On Tue, Aug 25, 2020 at 7:29 AM Alexey <avl.lapshin at gmail.com> wrote:
> User interface:
>
> OVERVIEW: A tool for optimizing debug info located in the built binary.
>
> USAGE: llvm-dwarfutil [options] input output

Nit: I would make the output a separate flag with `-o` for consistency
with other similar tools.

> Implementation notes:
>
> 1. Removing obsolete debug info would be done using DWARFLinker llvm
> library.
>
> 2. Data types deduplication would be done using DWARFLinker llvm library.
>
> 3. Accelerator/index tables would be generated using DWARFLinker llvm
> library.

This sounds reasonable to me. I think there is value in having all this
in LLVM because LLD wants to use a subset of this functionality. If it
weren't for that I'd probably prefer to have this isolated to just the
tool.

> 4. Interface of DWARFLinker library would be changed in such way that it
> would be possible to switch on/off various stages:
>
> class DWARFLinker {
> setDoRemoveObsoleteInfo ( bool DoRemoveObsoleteInfo = false);
>
> setDoAppleNames ( bool DoAppleNames = false );
> setDoAppleNamespaces ( bool DoAppleNamespaces = false );
> setDoAppleTypes ( bool DoAppleTypes = false );
> setDoObjC ( bool DoObjC = false );
> setDoDebugPubNames ( bool DoDebugPubNames = false );
> setDoDebugPubTypes ( bool DoDebugPubTypes = false );
>
> setDoDebugNames (bool DoDebugNames = false);
> setDoGDBIndex (bool DoGDBIndex = false);
> }

We can discuss this in the patch, but in dsymutil we pass LinkOption to
the linker. I think that would work great for enabling certain
functionality.

> 5. Copying source file contents, stripping tables,
> compressing/decompressing tables
> would be done by ObjCopy llvm library(extracted from llvm-objcopy):
>
> Error executeObjcopyOnBinary(const CopyConfig &Config,
> object::COFFObjectFile &In, Buffer &Out);
> Error executeObjcopyOnBinary(const CopyConfig &Config,
> object::ELFObjectFileBase &In, Buffer &Out);
> Error executeObjcopyOnBinary(const CopyConfig &Config,
> object::MachOObjectFile &In, Buffer &Out);
> Error executeObjcopyOnBinary(const CopyConfig &Config,
> object::WasmObjectFile &In, Buffer &Out);

Just to make sure I understand this correctly: the current method names
suggest that you'd be running objcopy as an external tool, but when
implemented as a library you'd call the code in-process, right?

> Roadmap:
>
> 1. Refactor llvm-objcopy to extract it`s implementation into separate
> library
> ObjCopy(in LLVM tree).

What exactly needs to be copied? In dsymutil we create a Mach-O
companion file, which is really just a regular Mach-O with only the
debug info sections in it. I think we do copy over a few segments, but
we have to rewrite the load commands and obviously the DWARF sections.
Which part of that would be handled by the objcopy library? It seems
like this could be a first, standalone patch. Or do you only plan to use
this for the ELF parts?

> 2. Create a command line utility using existed DWARFLinker and ObjCopy
> implementation. First version is supposed to work with only ELF
> input object files.
> It would take input ELF file with unoptimized debug info and create
> output
> ELF file with optimized debug info. That version would be done out
> of the llvm tree.

I would prefer doing this incrementally in-tree. It will make reviewing
these patches much easier and hopefully allow us to identify
opportunities where we can improve both the ELF and the Mach-O variant.

> 3. Make a tool to be able to work in multi-thread mode.

I'm a bit confused by what you mean here. The current DwarfLinker
already does the analysis and cloning in parallel. As I've mentioned in
the original thread, when I implemented this, there was no way to do
better if you want to deduplicate across compilation units, which is
what gives the biggest size reduction.

> 4. Consider it to be included into LLVM tree.

As I said before, I'd rather see this developed incrementally in-tree.

> 5. Support DWARF5 tables.

I assume you mean the line tables (and not the accelerator tables, i.e.
debug names)?

> Appendix A. Should this tool be implemented as a new tool or as an
> extension
> to dsymutil/llvm-objcopy?
>
> Please share your preference: whether llvm-dwarfutil should be
> separate utility, or a variant of dsymutil compiled for ELF?

As the majority of the code has already been hoisted to LLVM for use in
LLD, I think two separate tools are fine. I would prefer trying to share
a common interface; I'm thinking mostly of the command line options. I'm
not saying they should be a drop-in replacement for each other, but it'd
be nice if we didn't diverge on common functionality.

> Appendix B. The machO object file format is already supported by dsymutil.
> Depending on the decision whether llvm-dwarfutil would be done as a
> subproject
> of dsymutil or as a separate utility - machO would be supported or not.

I don't think there's any value in having the new tool support Mach-O.
Things that could be shared should be hoisted into L

> Appendix E. Possible command line options:
>
> DwarfUtil Options:
>
> --build-aranges - generate .debug_aranges table.
> --build-debug-names - generate .debug_names table.
> --build-debug-pubnames - generate .debug_pubnames table.
> --build-debug-pubtypes - generate .debug_pubtypes table.
> --build-gdb-index - generate .gdb_index table.
> --compress - Compress debug tables.
> --decompress - Decompress debug tables.
> --deduplicate-types - Do ODR deduplication for debug types.
> --garbage-collect - Do garbage collecting for debug info.

This is of course up to you to decide, but as a potential user I might
be worried about making all the functionality opt-in. For dsymutil you
don't have to pass any options most of the time. Maybe it would be nice
to have a set of defaults and the ability to -fenable or -fdisable them?
Or having something like -debugger-tuning in clang?> --num-threads=<n> - Specify the maximum number (n) of > simultaneous threads > to use when optimizing input file. > Defaults to the number of cores on the > current machine. >We can make `j` the default alias for this option. It's supported by dsymutil but we kept the long option in the help output but I'm happy to change that.> --strip-all - Strip all debug tables. > --strip=<name1,name2> - Strip specified debug info tables. > --strip-unoptimized-debug - Strip all unoptimized debug tables. > --tombstone=<value> - Tombstone value used as a marker of > invalid address. > =bfd - BFD default value > =dwarf6 - Dwarf v6. > --verbose - Enable verbose logging and encoding details. > > Generic Options: > > --help - Display available options (--help-hidden > for more) > --version - Display the version of this program > >dsymutil also has a --verify option which runs the DWARF verifier on the output (I'm working on a patch to also run it on the input). It might be a nice addition to have this too down the road. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200826/41f33b82/attachment.html>
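Item 6 of the quoted proposal marks addresses that point to removed code with -1, or with -2 in .debug_ranges and .debug_loc. Read as unsigned addresses, these are the two largest values of the target's address size; in DWARF 2 to 4 an all-ones first value in a .debug_ranges or .debug_loc entry already denotes a base address selection entry, which is presumably why those two sections get -2. The following is a minimal sketch of that scheme only. The helper names `tombstoneFor` and `isTombstone` are hypothetical and are not part of LLVM or of the prototype.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not an LLVM API): returns the tombstone for an
// address of `addrSize` bytes stored in the named debug section, following
// the -1 / -2 scheme from item 6 of the proposal.
inline std::uint64_t tombstoneFor(std::string_view section, unsigned addrSize) {
  assert(addrSize >= 1 && addrSize <= 8 && "unsupported address size");
  // All-ones value of the given width, i.e. -1 read as an unsigned address.
  const std::uint64_t allOnes =
      addrSize == 8 ? ~std::uint64_t{0}
                    : (std::uint64_t{1} << (addrSize * 8)) - 1;
  // .debug_ranges and .debug_loc use -2; every other .debug* table uses -1.
  const bool usesMinusTwo =
      section == ".debug_ranges" || section == ".debug_loc";
  return usesMinusTwo ? allOnes - 1 : allOnes;
}

// True if `addr` equals either tombstone value for this address size.
inline bool isTombstone(std::uint64_t addr, unsigned addrSize) {
  const std::uint64_t allOnes = tombstoneFor(".debug_info", addrSize);
  return addr == allOnes || addr == allOnes - 1;
}
```

A consumer of such a file could call `isTombstone` on each address it reads and skip the ones that refer to removed code, instead of guessing from an address of 0.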
Alexey via llvm-dev
2020-Aug-27 20:48 UTC
[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
Hi Jonas,

please find my comments below...

On 27.08.2020 02:05, Jonas Devlieghere wrote:
> Hey Alexey,
>
> I haven't had time to look at the corresponding patch yet, but I hope
> to do that soon. Here are my initial thoughts on the proposal.
>
> On Tue, Aug 25, 2020 at 7:29 AM Alexey <avl.lapshin at gmail.com> wrote:
>
> > Hi,
> >
> > We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
> > Any thoughts on this?
> > Thanks in advance, Alexey.
> >
> > =====================================================================
> >
> > llvm-dwarfutil(Apndx A) - is a tool that is used for processing debug
> > info(DWARF) located in built binary files to improve debug info
> > quality, reduce debug info size and accelerate debug info processing.
> > Supported object files formats: ELF, MachO(Apndx B), COFF(Apndx C),
> > WASM(Apndx C).
> >
> > =====================================================================
> >
> > Specifically, the tool would do:
> >
> >   - Remove obsolete debug info which refers to code deleted by the
> >     linker doing the garbage collection (gc-sections).
> >
> >   - Deduplicate debug type definitions for reducing resulting size of
> >     binary.
> >
> >   - Build accelerator/index tables.
> >     = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
> >       .debug_pubtypes.
> >
> >   - Strip unneeded tables.
> >     = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
> >       .debug_pubtypes.
> >
> >   - Compress or decompress debug info as requested.
> >
> > Possible feature:
> >
> >   - Join split dwarf .dwo files in a single file containing all
> >     debug info (convert split DWARF into monolithic DWARF).
> >
> > =====================================================================
> >
> > User interface:
> >
> >   OVERVIEW: A tool for optimizing debug info located in the built
> >   binary.
> >
> >   USAGE: llvm-dwarfutil [options] input output
>
> Nit: I would make the output a separate flag with `-o` for consistency
> with other similar tools.

Ok.

> >   OPTIONS: (Apndx E)
> >
> > =====================================================================
> >
> > Implementation notes:
> >
> > 1. Removing obsolete debug info would be done using DWARFLinker llvm
> >    library.
> >
> > 2. Data types deduplication would be done using DWARFLinker llvm
> >    library.
> >
> > 3. Accelerator/index tables would be generated using DWARFLinker llvm
> >    library.
>
> This sounds reasonable to me. I think there is value in having all
> this in LLVM because LLD wants to use a subset of this functionality.
> If it weren't for that I'd probably prefer to have this isolated to
> just the tool.
>
> > 4. Interface of DWARFLinker library would be changed in such way
> >    that it would be possible to switch on/off various stages:
> >
> >    class DWARFLinker {
> >      setDoRemoveObsoleteInfo ( bool DoRemoveObsoleteInfo = false);
> >
> >      setDoAppleNames ( bool DoAppleNames = false );
> >      setDoAppleNamespaces ( bool DoAppleNamespaces = false );
> >      setDoAppleTypes ( bool DoAppleTypes = false );
> >      setDoObjC ( bool DoObjC = false );
> >      setDoDebugPubNames ( bool DoDebugPubNames = false );
> >      setDoDebugPubTypes ( bool DoDebugPubTypes = false );
> >
> >      setDoDebugNames (bool DoDebugNames = false);
> >      setDoGDBIndex (bool DoGDBIndex = false);
> >    }
>
> We can discuss this in the patch, but in dsymutil we pass LinkOption
> to the linker. I think that would work great for enabling certain
> functionality.

Ok, let's discuss this in the patch.

> > 5. Copying source file contents, stripping tables,
> >    compressing/decompressing tables would be done by ObjCopy llvm
> >    library(extracted from llvm-objcopy):
> >
> >    Error executeObjcopyOnBinary(const CopyConfig &Config,
> >                                 object::COFFObjectFile &In, Buffer &Out);
> >    Error executeObjcopyOnBinary(const CopyConfig &Config,
> >                                 object::ELFObjectFileBase &In, Buffer &Out);
> >    Error executeObjcopyOnBinary(const CopyConfig &Config,
> >                                 object::MachOObjectFile &In, Buffer &Out);
> >    Error executeObjcopyOnBinary(const CopyConfig &Config,
> >                                 object::WasmObjectFile &In, Buffer &Out);
>
> Just to make sure I understand this correctly. The current method
> names suggest that you'd be running objcopy as an external tool, but
> when implemented as a library you'd call the code in-process, right?

Not exactly. I suggest moving them into the library first and then
calling them from the dwarfutil code. An example of such a call is in
the prototype:

tools/llvm-dwarfutil/llvm-dwarfutil.cpp:

    template <class ELFT>
    Error writeOutputFile(const Options &Options,
                          ELFObjectFile<ELFT> &InputFile, DataBits &OutBits) {
      ........
      objectcopy::FileBuffer FB(Config.OutputFilename);
      return objectcopy::elf::executeObjcopyOnBinary(Config, InputFile, FB);
    }

> > 6. Address ranges and single addresses pointing to removed code
> >    should be marked with tombstone value in the input file:
> >
> >    -2 for .debug_ranges and .debug_loc.
> >    -1 for other .debug* tables.
> >
> > 7. Prototype implementation - https://reviews.llvm.org/D86539.
> >
> > =====================================================================
> >
> > Roadmap:
> >
> > 1. Refactor llvm-objcopy to extract it`s implementation into separate
> >    library ObjCopy(in LLVM tree).
>
> What exactly needs to be copied? In dsymutil we create a Mach-O
> companion file, which is really just a regular Mach-O with only the
> debug info sections in it. I think we do copy over a few segments, but
> we have to rewrite the load commands and obviously the DWARF sections.
> Which part of that would be handled by the objcopy library. It seems
> like this could be a first, standalone patch. Or do you only plan to
> use this for the ELF parts?

objcopy could replace debug info sections. So the idea is to use the
objcopy functionality to copy the original file without modifications,
except for replacing the debug info sections, i.e. specify the new
sections in the objcopy config:

CopyConfig.h:

    StringMap<StringRef> NewDebugSections;

and add code to copy these sections to ELF/ELFObjcopy.cpp:

    for (const auto &Sec : Config.NewDebugSections) {
      ArrayRef<uint8_t> DataBits((const uint8_t *)Sec.getValue().data(),
                                 Sec.getValue().size());
      Section NewSection(DataBits);
      if (Config.CompressionType != DebugCompressionType::None)
        Obj.addSection<CompressedSection>(NewSection, Config.CompressionType);
      else
        Obj.addSection<Section>(NewSection);
    }

Finally, it would be possible to call executeObjcopyOnBinary(), and the
source file would be copied with the debug info sections replaced:

    objectcopy::elf::executeObjcopyOnBinary(Config, InputFile, FB);

As for what should be moved from llvm-objcopy into the ObjCopy library:
it is Buffer.h, CopyConfig.h and the entire ELF, MachO, WASM and COFF
directories. This is done in the prototype (the prototype copied only
the ELF part). The external interface of that library would be
described by:

    ELF/ELFObjcopy.h
    COFF/COFFObjcopy.h
    MachO/MachOObjcopy.h
    wasm/WasmObjcopy.h

> > 2. Create a command line utility using existed DWARFLinker and
> >    ObjCopy implementation. First version is supposed to work with
> >    only ELF input object files.
> >    It would take input ELF file with unoptimized debug info and
> >    create output ELF file with optimized debug info. That version
> >    would be done out of the llvm tree.
>
> I would prefer doing this incrementally in-tree. It will make
> reviewing these patches much easier and hopefully allow us to identify
> opportunities where we can improve both the ELF and the Mach-O variant.

It is OK with me to start doing it in-tree.

> > 3. Make a tool to be able to work in multi-thread mode.
>
> I'm a bit confused by what you mean here. The current DwarfLinker
> already does the analysis and cloning in parallel. As I've mentioned
> in the original thread, when I implemented this, there was no way to
> do better if you want to deduplicate across compilation units which is
> what gives the biggest size reduction.
>
> > 4. Consider it to be included into LLVM tree.
>
> As I said before I'd rather see this developed incrementally in-tree.
>
> > 5. Support DWARF5 tables.
>
> I assume you mean the line tables (and not the accelerator tables,
> i.e. debug names)?

.debug_names is already done in dsymutil/DWARFLinker, so there is no
need to support it. I mean .debug_line/.debug_line_str, .debug_rnglists,
.debug_loclists, DW_OP_addrx.

> > =====================================================================
> >
> > Appendix A. Should this tool be implemented as a new tool or as an
> > extension to dsymutil/llvm-objcopy?
> >
> > There already exists a tool which removes obsolete debug info on
> > darwin - dsymutil.
> > Why create another tool instead of extending the already existed
> > dsymutil/llvm-objcopy?
> >
> > The main functionality of dsymutil is located in a separate library
> > - DWARFLinker.
> > Thus, dsymutil utility is a command-line interface for DWARFLinker.
> > dsymutil has another type of input/output data: it takes several
> > object files and address map as input and creates a .dSYM bundle
> > with linked debug info as output. llvm-dwarfutil would take a built
> > executable as input and create an optimized executable as output.
> > Additionally, there would be many command-line options specific for
> > only one utility.
> > This means that these utilities(implementing command line interface)
> > would significantly differ. It makes sense not to put another
> > command-line utility inside existing dsymutil, but make it as a
> > separate utility. That is the reason why llvm-dwarfutil suggested to
> > be implemented not as sub-part of dsymutil but as a separate tool.
> >
> > Please share your preference: whether llvm-dwarfutil should be
> > separate utility, or a variant of dsymutil compiled for ELF?
>
> As the majority of the code has already been hoisted to LLVM for use
> in LLD, I think two separate tools are fine. I would prefer trying to
> share a common interface, I'm thinking mostly of the command line
> options. I'm not saying they should be a drop-in replacement for each
> other, but I'd be nice if we didn't diverge on common functionality.

Agreed.

> > =====================================================================
> >
> > Appendix B. The machO object file format is already supported by
> > dsymutil.
> > Depending on the decision whether llvm-dwarfutil would be done as a
> > subproject of dsymutil or as a separate utility - machO would be
> > supported or not.
>
> I don't think there's any value in having the new tool support Mach-O.
> Things that could be shared should be hoisted into L
>
> > =====================================================================
> >
> > Appendix C. Support for the COFF and WASM object file formats
> > presented as possible future improvement. It would be quite easy to
> > add them assuming that llvm-objcopy already supports these formats.
> > It also would require supporting DWARF6-suggested tombstone
> > values(-1/-2).
> >
> > =====================================================================
> >
> > Appendix D. Documentation.
> >
> > - proposal for DWARF6 which suggested -1/-2 values for marking bad
> >   addresses http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
> > - dsymutil tool https://llvm.org/docs/CommandGuide/dsymutil.html.
> > - proposal "Remove obsolete debug info in lld."
> >   http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
> >
> > =====================================================================
> >
> > Appendix E. Possible command line options:
> >
> > DwarfUtil Options:
> >
> >   --build-aranges           - generate .debug_aranges table.
> >   --build-debug-names       - generate .debug_names table.
> >   --build-debug-pubnames    - generate .debug_pubnames table.
> >   --build-debug-pubtypes    - generate .debug_pubtypes table.
> >   --build-gdb-index         - generate .gdb_index table.
> >   --compress                - Compress debug tables.
> >   --decompress              - Decompress debug tables.
> >   --deduplicate-types       - Do ODR deduplication for debug types.
> >   --garbage-collect         - Do garbage collecting for debug info.
>
> This is of course up to you to decide, but as a potential user I might
> be worried about making all the functionality opt-in. For dsymutil you
> don't have pass any options most of the time. Maybe it would be nice
> to have a set of defaults and the ability to -fenable or -fdisable
> them? Or having something like -debugger-tuning in clang?

Yes, the idea is to have defaults and be able to switch options on/off.
For the updated prototype:

    llvm-dwarfutil bin/test_clang_in -o bin/test_clang_out

assumes --garbage-collect, --strip-unoptimized-debug, --tombstone=bfd.
Additionally, these options could be explicitly switched on/off:

    llvm-dwarfutil --strip-unoptimized-debug=0 bin/test_clang_in -o bin/test_clang_out

> >   --num-threads=<n>         - Specify the maximum number (n) of
> >                               simultaneous threads to use when
> >                               optimizing input file.
> >                               Defaults to the number of cores on the
> >                               current machine.
>
> We can make `j` the default alias for this option. It's supported by
> dsymutil but we kept the long option in the help output but I'm happy
> to change that.

Added "j" as an alias for --num-threads.

> >   --strip-all               - Strip all debug tables.
> >   --strip=<name1,name2>     - Strip specified debug info tables.
> >   --strip-unoptimized-debug - Strip all unoptimized debug tables.
> >   --tombstone=<value>       - Tombstone value used as a marker of
> >                               invalid address.
> >     =bfd                    - BFD default value
> >     =dwarf6                 - Dwarf v6.
> >   --verbose                 - Enable verbose logging and encoding
> >                               details.
> >
> > Generic Options:
> >
> >   --help                    - Display available options
> >                               (--help-hidden for more)
> >   --version                 - Display the version of this program
>
> dsymutil also has a --verify option which runs the DWARF verifier on
> the output (I'm working on a patch to also run it on the input). It
> might be a nice addition to have this too down the road.

OK, I will add it.

Thank you for the comments!

Alexey.
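The reply above says the updated prototype assumes --garbage-collect, --strip-unoptimized-debug and --tombstone=bfd when no options are given, and that an option can be switched off with `=0`. The sketch below illustrates that behaviour with a hand-written parser. It is an assumption-laden illustration only: `DwarfUtilOptions` and `parseOptions` are hypothetical names, and the prototype presumably uses LLVM's own command-line machinery rather than anything like this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical option set; the defaults are the ones named in the reply
// above for the updated prototype.
struct DwarfUtilOptions {
  bool GarbageCollect = true;
  bool StripUnoptimizedDebug = true;
  std::string Tombstone = "bfd";
};

// Parses "--name", "--name=0", "--name=1" and "--tombstone=<value>".
// Anything that does not start with "--" (file names, "-o") is ignored
// in this sketch.
inline DwarfUtilOptions parseOptions(const std::vector<std::string> &args) {
  DwarfUtilOptions opts;
  for (const std::string &arg : args) {
    if (arg.rfind("--", 0) != 0)
      continue; // not a long option
    const std::size_t eq = arg.find('=');
    // The argument starts with "--", so any '=' sits at index 2 or later.
    assert(eq == std::string::npos || eq >= 2);
    const std::string name =
        arg.substr(2, eq == std::string::npos ? std::string::npos : eq - 2);
    const std::string value =
        eq == std::string::npos ? std::string() : arg.substr(eq + 1);
    // A bare "--name" or "--name=1" switches on; only "=0" switches off.
    const bool enabled = value != "0";
    if (name == "garbage-collect")
      opts.GarbageCollect = enabled;
    else if (name == "strip-unoptimized-debug")
      opts.StripUnoptimizedDebug = enabled;
    else if (name == "tombstone" && !value.empty())
      opts.Tombstone = value;
  }
  return opts;
}
```

With this shape, the two command lines quoted in the reply differ only in `StripUnoptimizedDebug`: the first keeps every default, the second turns that one stage off and leaves the rest untouched.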
James Y Knight via llvm-dev
2020-Aug-28 21:23 UTC
[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
If we're designing a new tool and process, it would be wonderful if it
did not require multiple stages of copying and slightly modifying the
binary, in order to create final output with separate debug info. It
seems to me that the variants of this sort of thing which exist today
are somewhat suboptimal.

With Mach-O and dsymutil:
  1. Given a collection of object files (which contain debuginfo), link
     a binary with ld. The binary then includes special references to
     the object files that were actually used as part of the link.
  2. Given the linked binary, and all of the same object files, link
     the debuginfo with dsymutil.
  3. Strip the references to the object file paths from the binary.
Finally, you have a binary without debug info, and a dsym debuginfo
file. But it would be better if the binary created in step 1 didn't
need to include the extraneous object-file path info, and that was
instead emitted in a second file. Then we wouldn't need step 3.

With "normal" ELF:
  1. Given a collection of object files (which contain debuginfo), link
     a binary with ld, which includes linking all the debug info into
     the binary.
  2. Given the linked binary, objcopy --only-keep-debug to create a new
     separated debug file.
  3. Given the linked binary, objcopy --strip-debug to create a copy of
     the binary without debug info.
Finally you have a binary without debug info, and a separate debug
file. But it would be better if the linker could just write the debug
info into a separate file in the first place, then we'd only have the
one step. (But, downside, the linker needs to manage all the debug
info, which can be excessively large.)

With "split-dwarf" ELF support:
  1. Given object files (which exclude *most* but not all of the
     debuginfo), link a binary. The binary will include that smaller
     set of debug info.
  2. Given the collection of dwo files corresponding to the object
     files, run the "dwp" tool to create a dwp file.
  3. objcopy --only-keep-debug
  4. objcopy --strip-debug
And then you need to keep both a debug file *and* a dwp file, which is
weird.

I think, ideally, users would have the following three *good* options:

Easy option: store debuginfo in the object files, and have the linker
create a pair of {binary, separated dwarf-optimized debuginfo} files
directly from the object files.

More scalable option: emit (most of the) debuginfo in separate *.dwo
files using -gsplit-dwarf, and then,
  1. run the linker on the object files to create a pair of {binary,
     separated debuginfo} files. In this case the latter file contains
     the minimal debuginfo which was in the object files.
  2. run a second tool, which reads the minimal debuginfo from above,
     and all the DWO files, and creates a full optimized/deduplicated
     debuginfo output file.

Faster developer builds: Like previous, but omit step 2 -- running the
debugger directly after step 1 can use the dwo files on-disk.

I think we're not terribly far from that ideal, now, for ELF. Maybe
only these three things need to be done? --
  1. Teach lld how to emit a separated debuginfo output file directly,
     without requiring an objcopy step.
  2. Integrate DWARFLinker into lld.
  3. Create a new tool which takes the separated debuginfo and DWO/DWP
     files and uses DWARFLinker library to create a new (dwarf-linked)
     separated-debug file, that doesn't depend on DWO/DWP files.

My hope is that the tool you're creating will be the implementation of
#3, but I'm afraid the intent is for this tool to be an additional
stage that non-split-dwarf users would need to run post-link, *instead
of* integrating DWARFLinker into lld.
Alexey via llvm-dev
2020-Aug-31 15:06 UTC
[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
Hi James, Thank you for the comments. >I think we're not terribly far from that ideal, now, for ELF. Maybe only these three things need to be done? -- > 1. Teach lld how to emit a separated debuginfo output file directly, without requiring an objcopy step. > 2. Integrate DWARFLinker into lld. > 3. Create a new tool which takes the separated debuginfo and DWO/DWP files and uses DWARFLinker library > to create a new (dwarf-linked) separated-debug file, that doesn't depend on DWO/DWP files. The three goals which you`ve described are our far goals. Indeed, the best solution would be to create valid optimized debug info without additional stages and additional modifications of resulting binaries. There was an attempt to use DWARFLinker from the lld - https://reviews.llvm.org/D74169 It did not receive enough support to be integrated yet. There are fair reasons for that: 1. Execution time. The time required by DWARFLinker for processing clang binary is 8x bigger than the usual linking time. Linking clang binary with DWARFLinker takes 72 sec, linking with the only lld takes 9 sec. 2. "Removing obsolete debug info" could not be switched off. Thus, lld could not use DWARFLinker for other tasks(like generation of index tables - .gdb_index, .debug_names) without significant performance degradation. 3. DWARFLinker does not support split dwarf at the moment. All these reasons are not blockers. And I believe implementation from D74169 might be integrated and incrementally improved if there would be agreement on that. Using DWARFLinker from llvm-dwarfutil is another possibility to use and improve it. When finally implemented - llvm-dwarfutil should solve the above three issues and there would probably be more reasons to include DWARFLinker into lld. Even if we would have the best solution - it is still useful to have a tool like llvm-dwarfutil for cases when it is necessary to process already created binaries. 
So in short, the suggested tool - llvm-dwarfutil - is a step towards the ideal solution. Its benefit is that it could be used until we created the best solution or for cases where "the best solution" is not applicable.

Thank you, Alexey.
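The second reason given in the reply above (the "removing obsolete debug info" stage cannot be switched off) can be illustrated with a small model. This is only a sketch: the class `LinkerStages` and its field names are hypothetical and merely echo the `setDo...` switches from the original proposal; it is not DWARFLinker's API. It shows how independent switches would let a client ask for index generation alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkerStages:
    """Hypothetical on/off switches, loosely modelled on the proposed setDo... setters."""
    remove_obsolete_info: bool = False
    build_debug_pubnames: bool = False
    build_debug_names: bool = False
    build_gdb_index: bool = False

    def enabled(self) -> List[str]:
        """Return the names of the stages that would run, in declaration order."""
        return [name for name, on in vars(self).items() if on]


# A client that only wants index tables leaves the expensive stage switched off.
index_only = LinkerStages(build_debug_names=True, build_gdb_index=True)
print(index_only.enabled())
```

With every switch defaulting to off, a caller such as a linker pays only for the stages it explicitly requests.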
David Blaikie via llvm-dev
2020-Sep-01 03:24 UTC
[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
On Fri, Aug 28, 2020 at 2:24 PM James Y Knight <jyknight at google.com> wrote:

> If we're designing a new tool and process, it would be wonderful if it did
> not require multiple stages of copying and slightly modifying the binary,
> in order to create final output with separate debug info. It seems to me
> that the variants of this sort of thing which exist today are somewhat
> suboptimal.
>
> With Mach-O and dsymutil:
> 1. Given a collection of object files (which contain debuginfo), link a
>    binary with ld. The binary then includes special references to the object
>    files that were actually used as part of the link.
> 2. Given the linked binary, and all of the same object files, link the
>    debuginfo with dsymutil.
> 3. Strip the references to the object file paths from the binary.
> Finally, you have a binary without debug info, and a dsym debuginfo
> file. But it would be better if the binary created in step 1 didn't need to
> include the extraneous object-file path info, and that was instead emitted
> in a second file. Then we wouldn't need step 3.
>
> With "normal" ELF:
> 1. Given a collection of object files (which contain debuginfo), link a
>    binary with ld, which includes linking all the debug info into the binary.
> 2. Given the linked binary, objcopy --only-keep-debug to create a new
>    separated debug file.
> 3. Given the linked binary, objcopy --strip-debug to create a copy of
>    the binary without debug info.
> Finally you have a binary without debug info, and a separate debug file.
> But it would be better if the linker could just write the debug info into a
> separate file in the first place, then we'd only have the one step. (But,
> downside, the linker needs to manage all the debug info, which can be
> excessively large.)
>
> With "split-dwarf" ELF support:
> 1. Given object files (which exclude *most* but not all of the
>    debuginfo), link a binary. The binary will include that smaller set of
>    debug info.
> 2. Given the collection of dwo files corresponding to the object
>    files, run the "dwp" tool to create a dwp file.
> 3. objcopy --only-keep-debug
> 4. --strip-debug
> And then you need to keep both a debug file *and* a dwp file, which is
> weird.
>
> I think, ideally, users would have the following three *good* options:
> Easy option: store debuginfo in the object files, and have the linker
> create a pair of {binary, separated dwarf-optimized debuginfo} files
> directly from the object files.

(as discussed by other replies - that was an early proposal, didn't gain a lot of traction/Eric & Ray weren't super convinced it was worth adding to lld at this stage, given the link time cost & thus the small expected user base)

> More scalable option: emit (most of the) debuginfo in separate *.dwo
> files using -gsplit-dwarf, and then,
> 1. run the linker on the object files to create a pair of {binary,
>    separated debuginfo} files. In this case the latter file contains the
>    minimal debuginfo which was in the object files.

Yeah, that ^ is probably a nice feature regardless. Save folks an extra objcopy, etc. Usable right now for any build that is already running only-keep-debug/strip-debug.

> 2. run a second tool, which reads the minimal debuginfo from above,
>    and all the DWO files, and creates a full optimized/deduplicated debuginfo
>    output file.

Fair - this then looks a lot like the MachO debug info distribution/linking model (with the advantage that the DWARF isn't in the .o files, so doesn't have to be shipped to the machine doing the linking), so far as I know.

> Faster developer builds: Like previous, but omit step 2 -- running the
> debugger directly after step 1 can use the dwo files on-disk.
>
> I think we're not terribly far from that ideal, now, for ELF. Maybe only
> these three things need to be done? --
> 1. Teach lld how to emit a separated debuginfo output file directly,
>    without requiring an objcopy step.
> 2. Integrate DWARFLinker into lld.
> 3. Create a new tool which takes the separated debuginfo and DWO/DWP
>    files and uses DWARFLinker library to create a new (dwarf-linked)
>    separated-debug file, that doesn't depend on DWO/DWP files.
>
> My hope is that the tool you're creating will be the implementation of #3,
> but I'm afraid the intent is for this tool to be an additional stage that
> non-split-dwarf users would need to run post-link, *instead of*
> integrating DWARFLinker into lld.

Yeah, that's the direction lld folks have pushed for - a post-processing, rather than link-time. Mostly due to the current performance of DWARF-aware linking being quite slow, so the idea that not many users would be willing to take that link-time performance hit to use the feature. (whereas as a post-processing step before archiving DWARF (like building a dwp from dwo files) it might be more appealing/interesting - and maybe with sufficient performance improvements, could then be rolled into lld as originally proposed)

Curiously Alexey's needs include not wanting to use fission because a single debuggable binary simplifies his users use-case/makes it easier to distribute than two files. So he's probably not interested in the strip-debug/only-keep-debug kind of debug info distribution model, at least for his own users/use case. So far as I understand it.

I've got mixed feelings about that - and encourage you to express/clarify/discuss your thoughts here, as I think the whole conversation could use some more voices.

- Dave

> On Tue, Aug 25, 2020 at 10:29 AM Alexey via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> Hi,
>>
>> We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>> Any thoughts on this?
>> Thanks in advance, Alexey.
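The "normal" ELF flow quoted in this message (link, then `objcopy --only-keep-debug`, then `objcopy --strip-debug`) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `split_debug_commands` and the `.debug` / `.stripped` output suffixes are illustrative choices, not something the thread specifies; the two objcopy flags are the ones named in the quoted text. The function only builds the command lines and does not run them.

```python
from typing import List


def split_debug_commands(binary: str, objcopy: str = "objcopy") -> List[List[str]]:
    """Build the two objcopy invocations from the quoted ELF workflow.

    The ".debug" and ".stripped" suffixes are illustrative assumptions.
    """
    debug_file = binary + ".debug"
    stripped = binary + ".stripped"
    return [
        # Step 2: keep only the debug sections in a separate file.
        [objcopy, "--only-keep-debug", binary, debug_file],
        # Step 3: write a copy of the binary without debug info.
        [objcopy, "--strip-debug", binary, stripped],
    ]


for cmd in split_debug_commands("a.out"):
    print(" ".join(cmd))
```

Each returned list could be handed to `subprocess.run(cmd, check=True)`. Both steps read the whole linked binary again, which is the extra copying the quoted message would like the linker to avoid.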
David Blaikie via llvm-dev
2020-Sep-01 03:27 UTC
[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
A quick note: The feature as currently proposed sounds like it's an exact match for 'dwz'? Is there any benefit to this over the existing dwz project? Is it different in some ways I'm not aware of? (I haven't actually used dwz, so I might have some mistaken ideas about how it should work)

If it's going to solve the same general problem, but be in the llvm project instead, then maybe it should be called llvm-dwz.

Though I understand the desire for this to grow other functionality, like DWARF-aware dwp-ing. It might be better for this to busybox and provide that functionality under llvm-dwp instead. Or, more likely I suspect, the existing llvm-dwp will be rewritten (probably by me) to use more of lld's infrastructure to be more efficient (its current object reading/writing logic uses LLVM's libObject and MCStreamer, which is a bit inefficient for a very content-unaware linking process), and then maybe that could be taught to use DwarfLinker as a library to optionally do DWARF-aware linking depending on the user's time/space tradeoff desires. It would still benefit from any improvements to the underlying DwarfLinker library (at which point that would be shared between llvm-dsymutil, llvm-dwz, and llvm-dwp).

On Tue, Aug 25, 2020 at 7:29 AM Alexey <avl.lapshin at gmail.com> wrote:

> Hi,
>
> We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
> Any thoughts on this?
> Thanks in advance, Alexey.
>
> =====================================================================
>
> llvm-dwarfutil(Apndx A) - is a tool that is used for processing debug
> info(DWARF) located in built binary files to improve debug info quality,
> reduce debug info size and accelerate debug info processing.
> Supported object files formats: ELF, MachO(Apndx B), COFF(Apndx C),
> WASM(Apndx C).
> > =====================================================================> > Specifically, the tool would do: > > - Remove obsolete debug info which refers to code deleted by the linker > doing the garbage collection (gc-sections). > > - Deduplicate debug type definitions for reducing resulting size of > binary. > > - Build accelerator/index tables. > = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames, > .debug_pubtypes. > > - Strip unneeded tables. > = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames, > .debug_pubtypes. > > - Compress or decompress debug info as requested. > > Possible feature: > > - Join split dwarf .dwo files in a single file containing all debug info > (convert split DWARF into monolithic DWARF). > > =====================================================================> > User interface: > > OVERVIEW: A tool for optimizing debug info located in the built binary. > > USAGE: llvm-dwarfutil [options] input output > > OPTIONS: (Apndx E) > > =====================================================================> > Implementation notes: > > 1. Removing obsolete debug info would be done using DWARFLinker llvm > library. > > 2. Data types deduplication would be done using DWARFLinker llvm library. > > 3. Accelerator/index tables would be generated using DWARFLinker llvm > library. > > 4. Interface of DWARFLinker library would be changed in such way that it > would be possible to switch on/off various stages: > > class DWARFLinker { > setDoRemoveObsoleteInfo ( bool DoRemoveObsoleteInfo = false); > > setDoAppleNames ( bool DoAppleNames = false ); > setDoAppleNamespaces ( bool DoAppleNamespaces = false ); > setDoAppleTypes ( bool DoAppleTypes = false ); > setDoObjC ( bool DoObjC = false ); > setDoDebugPubNames ( bool DoDebugPubNames = false ); > setDoDebugPubTypes ( bool DoDebugPubTypes = false ); > > setDoDebugNames (bool DoDebugNames = false); > setDoGDBIndex (bool DoGDBIndex = false); > } > > 5. 
Copying source file contents, stripping tables, > compressing/decompressing tables > would be done by ObjCopy llvm library(extracted from llvm-objcopy): > > Error executeObjcopyOnBinary(const CopyConfig &Config, > object::COFFObjectFile &In, Buffer &Out); > Error executeObjcopyOnBinary(const CopyConfig &Config, > object::ELFObjectFileBase &In, Buffer &Out); > Error executeObjcopyOnBinary(const CopyConfig &Config, > object::MachOObjectFile &In, Buffer &Out); > Error executeObjcopyOnBinary(const CopyConfig &Config, > object::WasmObjectFile &In, Buffer &Out); > > 6. Address ranges and single addresses pointing to removed code should > be marked > with tombstone value in the input file: > > -2 for .debug_ranges and .debug_loc. > -1 for other .debug* tables. > > 7. Prototype implementation - https://reviews.llvm.org/D86539. > > =====================================================================> > Roadmap: > > 1. Refactor llvm-objcopy to extract it`s implementation into separate > library > ObjCopy(in LLVM tree). > > 2. Create a command line utility using existed DWARFLinker and ObjCopy > implementation. First version is supposed to work with only ELF > input object files. > It would take input ELF file with unoptimized debug info and create > output > ELF file with optimized debug info. That version would be done out > of the llvm tree. > > 3. Make a tool to be able to work in multi-thread mode. > > 4. Consider it to be included into LLVM tree. > > 5. Support DWARF5 tables. > > =====================================================================> > Appendix A. Should this tool be implemented as a new tool or as an > extension > to dsymutil/llvm-objcopy? > > There already exists a tool which removes obsolete debug info on > darwin - dsymutil. > Why create another tool instead of extending the already existed > dsymutil/llvm-objcopy? > > The main functionality of dsymutil is located in a separate library > - DWARFLinker. 
>   Thus, dsymutil utility is a command-line interface for DWARFLinker. dsymutil has
>   another type of input/output data: it takes several object files and an address map
>   as input and creates a .dSYM bundle with linked debug info as output. llvm-dwarfutil
>   would take a built executable as input and create an optimized executable as output.
>   Additionally, there would be many command-line options specific to only one utility.
>   This means that these utilities (implementing command line interface) would significantly
>   differ. It makes sense not to put another command-line utility inside existing dsymutil,
>   but make it as a separate utility. That is the reason why llvm-dwarfutil is suggested to be
>   implemented not as sub-part of dsymutil but as a separate tool.
>
>   Please share your preference: whether llvm-dwarfutil should be
>   separate utility, or a variant of dsymutil compiled for ELF?
>
> =====================================================================
>
> Appendix B. The MachO object file format is already supported by dsymutil.
>   Depending on the decision whether llvm-dwarfutil would be done as a subproject
>   of dsymutil or as a separate utility - MachO would be supported or not.
>
> =====================================================================
>
> Appendix C. Support for the COFF and WASM object file formats is presented as a
>   possible future improvement. It would be quite easy to add them assuming
>   that llvm-objcopy already supports these formats. It also would require
>   supporting DWARF6-suggested tombstone values (-1/-2).
>
> =====================================================================
>
> Appendix D. Documentation.
>
>   - proposal for DWARF6 which suggested -1/-2 values for marking bad addresses
>     http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>   - dsymutil tool https://llvm.org/docs/CommandGuide/dsymutil.html.
>   - proposal "Remove obsolete debug info in lld."
>     http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>
> =====================================================================
>
> Appendix E. Possible command line options:
>
>   DwarfUtil Options:
>
>   --build-aranges           - generate .debug_aranges table.
>   --build-debug-names       - generate .debug_names table.
>   --build-debug-pubnames    - generate .debug_pubnames table.
>   --build-debug-pubtypes    - generate .debug_pubtypes table.
>   --build-gdb-index         - generate .gdb_index table.
>   --compress                - Compress debug tables.
>   --decompress              - Decompress debug tables.
>   --deduplicate-types       - Do ODR deduplication for debug types.
>   --garbage-collect         - Do garbage collecting for debug info.
>   --num-threads=<n>         - Specify the maximum number (n) of simultaneous threads
>                               to use when optimizing input file.
>                               Defaults to the number of cores on the current machine.
>   --strip-all               - Strip all debug tables.
>   --strip=<name1,name2>     - Strip specified debug info tables.
>   --strip-unoptimized-debug - Strip all unoptimized debug tables.
>   --tombstone=<value>       - Tombstone value used as a marker of invalid address.
>     =bfd                    -   BFD default value
>     =dwarf6                 -   Dwarf v6.
>   --verbose                 - Enable verbose logging and encoding details.
>
>   Generic Options:
>
>   --help                    - Display available options (--help-hidden for more)
>   --version                 - Display the version of this program
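To illustrate implementation note 6 of the quoted proposal, here is a minimal sketch of how a DWARF consumer might recognize tombstoned addresses. The names `DebugSection`, `maxAddress`, `tombstoneFor` and `isTombstone` are hypothetical and not part of any LLVM API; the sketch only encodes the -2/-1 rule stated in the note. A separate value is needed for .debug_ranges and .debug_loc because, in those sections, an all-ones address in the first slot of an entry already denotes a base address selection entry.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical classification of the section an address was read from.
enum class DebugSection { Ranges, Loc, Other };

// Largest address representable with AddrSize bytes, i.e. "-1" as unsigned.
constexpr uint64_t maxAddress(unsigned AddrSize) {
  return AddrSize >= 8 ? std::numeric_limits<uint64_t>::max()
                       : (uint64_t{1} << (AddrSize * 8)) - 1;
}

// Tombstone per note 6: -2 for .debug_ranges/.debug_loc, -1 elsewhere.
constexpr uint64_t tombstoneFor(DebugSection S, unsigned AddrSize) {
  return (S == DebugSection::Ranges || S == DebugSection::Loc)
             ? maxAddress(AddrSize) - 1
             : maxAddress(AddrSize);
}

// True if Addr is the marker for "this code was removed by the linker".
constexpr bool isTombstone(DebugSection S, uint64_t Addr, unsigned AddrSize) {
  return Addr == tombstoneFor(S, AddrSize);
}
```

A tool following the proposal could skip any range or DIE whose start address satisfies `isTombstone` instead of trying to match it against the address map.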
Alexey via llvm-dev
2020-Sep-01 13:36 UTC
[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
On 01.09.2020 06:27, David Blaikie wrote:
> A quick note: The feature as currently proposed sounds like it's an
> exact match for 'dwz'? Is there any benefit to this over the existing
> dwz project? Is it different in some ways I'm not aware of? (I haven't
> actually used dwz, so I might have some mistaken ideas about how it
> should work)
>
> If it's going to solve the same general problem, but be in the llvm
> project instead, then maybe it should be called llvm-dwz.

It looks like dwz and llvm-dwarfutil do not match exactly in functionality.

dwz is a program that attempts to optimize DWARF debugging information
contained in ELF shared libraries and ELF executables for *size*.

llvm-dwarfutil is a tool for processing debug info (DWARF) located in
built binary files to improve debug info *quality*, reduce debug info
*size* and accelerate debug info *processing*.

Things that llvm-dwarfutil is supposed to do and that dwz does not do:
removing obsolete debug info, building indexes, stripping unneeded debug
sections, and compressing/decompressing debug sections.

What the two tools have in common is debug info size reduction, but they
achieve it using different approaches:

1. dwz reduces the size of debug info by creating partial compilation
   units for duplicated parts, so that these partial compilation units
   can be imported at every place where the duplicate occurred. AFAIU,
   that optimization gives the largest size saving. Another size-saving
   optimization is ODR type deduplication.

2. llvm-dwarfutil reduces the size of debug info by ODR type
   deduplication, which gives the largest size saving in the
   llvm-dwarfutil case. Another size-saving optimization is removing
   obsolete debug info (which is not only about size but also about
   correctness).

So it looks like these tools are not equivalent.
If we regard llvm-dwz as an extension of classic dwz, then we could
probably use the name llvm-dwz.

> Though I understand the desire for this to grow other functionality,
> like DWARF-aware dwp-ing. Might be better for this to busybox and
> provide that functionality under llvm-dwp instead, or more likely I
> suspect, that the existing llvm-dwp will be rewritten (probably by me)
> to use more of lld's infrastructure to be more efficient (its current
> object reading/writing logic is using LLVM's libObject and MCStreamer,
> which is a bit inefficient for a very content-unaware linking process)
> and then maybe that could be taught to use DwarfLinker as a library to
> optionally do DWARF-aware linking depending on the user's time/space
> tradeoff desires. Still benefiting from any improvements to the
> underlying DwarfLinker library (at which point that would be shared
> between llvm-dsymutil, llvm-dwz, and llvm-dwp).
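The reply above names ODR type deduplication as the main size saving for llvm-dwarfutil. As a rough illustration of the idea only, the sketch below keeps the first definition seen for each fully qualified type name and redirects later copies to it. It is not DWARFLinker's code: `TypeEntry`, `DedupResult` and `deduplicateTypes` are hypothetical names, and keying on the qualified name alone is a simplifying assumption that relies on the One Definition Rule holding across compilation units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for a type definition DIE.
struct TypeEntry {
  std::string QualifiedName; // e.g. "ns::Foo"
  std::size_t SizeInBytes;   // encoded size of the DIE subtree
};

struct DedupResult {
  std::vector<TypeEntry> Kept;    // first definition seen for each name
  std::vector<std::size_t> Remap; // input index -> index into Kept
  std::size_t BytesSaved = 0;     // total size of the dropped duplicates
};

// Keep one canonical definition per qualified name; later definitions with
// the same name are dropped and their references remapped to the canonical one.
DedupResult deduplicateTypes(const std::vector<TypeEntry> &Types) {
  DedupResult R;
  std::unordered_map<std::string, std::size_t> Canonical;
  R.Remap.reserve(Types.size());
  for (const TypeEntry &T : Types) {
    auto Res = Canonical.emplace(T.QualifiedName, R.Kept.size());
    if (Res.second)
      R.Kept.push_back(T);            // first time this name is seen
    else
      R.BytesSaved += T.SizeInBytes;  // duplicate definition, drop it
    R.Remap.push_back(Res.first->second);
  }
  return R;
}
```

By contrast, the dwz approach described in the reply moves duplicated DIEs into partial compilation units and imports them, so it does not depend on type names being unique.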