Frédéric Riss
2014-Nov-07 17:53 UTC
[LLVMdev] Reimplementing Darwin's dsymutil as an lld helper
> On Nov 7, 2014, at 9:20 AM, Shankar Easwaran <shankare at codeaurora.org> wrote:
>
> Hi Fred,
>
> Could this tool be extended to read DWARF information in the final image and then pack it differently for other architectures as well?

I guess it could, depending on what exactly you mean by “pack it differently”. It could certainly strip some parts, or merge it with other files’ debug information (but I’m not sure why you’d do that on a fully linked binary).

> I believe this could be important for Fission as well, when other formats accommodate Fission.
>
> A few OSes, like HP-UX, used to run something called PXDB for this purpose.
>
> <snip from ld man page: http://nixdoc.net/man-pages/hp-ux/man1/ld_pa.1.html>
> The LD_PXDB environment variable defines the full execution path for
> the debug preprocessor pxdb. The default value is
> /opt/langtools/bin/pxdb. ld invokes pxdb on its output file if that
> file is executable and contains debug information. To defer
> invocation of pxdb until the first debug session, set LD_PXDB to
> /bin/true.
> </snip>
>
> A few questions:
>
> a) Will the utility understand that the linker garbage-collected a few functions, and not create a map for them?

Yes. It’s not dsymutil that creates the map, though. The linker emits the map, and the map tells dsymutil that some atoms aren’t present in the linked binary (in fact, the map won’t mention these at all, and that’s how the utility knows that they have been dropped).

> b) How will it work with LTO?

With LTO you have to get access to the object file generated by the LTO link to be able to extract its debug info. ld64 has an option for that (-object_path_lto) that instructs it to write out the object file at the given path rather than /tmp/lto.o, and not to delete it when it has finished the link. It is then the build system’s duty to delete this temporary file once it has run dsymutil on the binary. This is cumbersome and is one of the reasons why the dsymutil link step should really be carried out by lld itself, so that the build system doesn’t need to be aware of that kind of subtlety.

Fred

> Shankar Easwaran
>
> On 11/7/2014 10:09 AM, Frédéric Riss wrote:
>> Hi,
>>
>> [ I Cc'd lld people and debug info people. Apologies if I omitted some stakeholder. ]
>>
>> As stated in the subject, I’d like to start working on an in-tree reimplementation of Darwin’s dsymutil utility. This is an initial step on the path to having lld handle the debug information itself.
>>
>> For those who are not familiar with the debug flow on MacOS, dsymutil is a DWARF linker. Darwin’s linker (ld64) doesn’t link the DWARF debug info found in the object files; instead it writes a “debug-map” in the linked binary. This debug-map describes what objects were linked together and what atoms of each object file are present in the binary, along with their addresses. The debug-map has two uses:
>> 1) During the build->debug cycle, lldb reads the debug-map and uses it to find the .o files and extract the relevant DWARF debug info.
>> 2) For Release builds, dsymutil reads the debug-map, then loads, merges, and optimizes all the DWARF debug info and writes it as a .dSYM.
>>
>> The long-term goal is that DWARF linking functionality be available as a library for LLVM tools. Eventually, we’d like lld to be able to make use of the DWARF linking library and not need a standalone dsymutil tool. The first step is to use the DWARF linking library in a standalone dsymutil replacement tool. We want this tool to be bit-for-bit compatible with the existing Darwin dsymutil.
>>
>> The main reason we want to take the first step of a separate tool is testability. The code committed to the LLVM repository will feature unit tests, but they won’t offer the coverage that real-world usage would. I plan to run the new tool through big internal validation campaigns during which the output of the LLVM-powered dsymutil would be compared to that of the system’s dsymutil. This is also the reason we aim for bit-for-bit compatibility.
>>
>> The current plan is to host the code in the llvm repository. dsymutil will make heavy use of libDebugInfo and won’t share anything with the lld codebase (the underlying concepts are just too different). It’s also not clear yet where most of the implementation logic will end up. I expect most of the core logic to be in tools/dsymutil, but some of it might be better folded directly into libDebugInfo.
>>
>> So how does it work? dsymutil doesn’t simply paste the debug sections together while applying relocations to them. This wouldn’t work for ld64, as it is able (like lld) to split the sections apart and discard/reorder the contents. Thus dsymutil needs some semantic knowledge of the DWARF contents to be able to “patch” the relocatable debug info with accurate values. It is also able to remove parts of the DIE tree that aren’t needed, or to unique types across compilation unit boundaries. In libDebugInfo, we have the needed tooling to read the debug info, but we currently lack the ability to write it back to disk. Maybe what’s in lib/CodeGen/AsmPrinter to emit the debug info would fit the bill, but I won't be sure until I try to write the code. I’ll see along the way if libDebugInfo should grow its own DWARF streaming capabilities. Opinions welcome.
>>
>> Although the implementation of the dsymutil command line tool will be fairly Darwin-specific (it accepts Mach-O files as input and emits a dSYM bundle), most of the implementation will be format-agnostic. I’ll make an effort to split the Mach-O-specific parts into their own files so that this code can be reused in a generic way. Would there be interest in that kind of code for other platforms also? What’s the story of lld DWARF support for ELF?
>>
>> I plan on sending the initial code (which basically only parses the debug map of Mach-O files) out for review in the coming days if there are no objections to the general principle.
>>
>> Fred
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation
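
A minimal sketch of the debug-map idea described in this message, assuming a much simpler model than the real ld64 debug map. All type and function names here (DebugMapObject, findLinkedAddress, patchFunction) are hypothetical and are not taken from ld64, dsymutil, or LLVM. It only illustrates the point made above: a symbol the linker dropped is simply absent from the map, so its debug info can be discarded, while a surviving symbol gets its address patched to the linked one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model of one object file's entry in the debug map.
struct DebugMapObject {
  std::string ObjectPath;
  // Symbol name -> address in the linked binary. Symbols the linker
  // dropped have no entry at all (assumption mirroring the message).
  std::map<std::string, uint64_t> LinkedAddresses;
};

// Returns a pointer to the linked address of Symbol, or nullptr when
// the map does not mention it (meaning the atom was dropped).
const uint64_t *findLinkedAddress(const DebugMapObject &Obj,
                                  const std::string &Symbol) {
  std::map<std::string, uint64_t>::const_iterator It =
      Obj.LinkedAddresses.find(Symbol);
  if (It == Obj.LinkedAddresses.end())
    return nullptr;
  return &It->second;
}

// Outcome for one function's debug info: keep it with a patched start
// address, or drop it entirely.
struct PatchResult {
  bool Keep;
  uint64_t LowPC;
};

PatchResult patchFunction(const DebugMapObject &Obj,
                          const std::string &Symbol) {
  if (const uint64_t *Addr = findLinkedAddress(Obj, Symbol))
    return {true, *Addr}; // present in the binary: use linked address
  return {false, 0};      // not in the map: discard its debug info
}
```

With an object whose map contains only _main, patchFunction keeps _main at its linked address and reports any other symbol as dropped.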
Shankar Easwaran
2014-Nov-07 18:07 UTC
[LLVMdev] Reimplementing Darwin's dsymutil as an lld helper
Thanks for your reply, Fred.

It might work better if it's in the form of an API, so that the linker could call an API instead of running a tool?

On 11/7/2014 11:53 AM, Frédéric Riss wrote:
>> On Nov 7, 2014, at 9:20 AM, Shankar Easwaran <shankare at codeaurora.org> wrote:
>>
>> Hi Fred,
>>
>> Could this tool be extended to read DWARF information in the final image and then pack it differently for other architectures as well?
> I guess it could, depending on what exactly you mean by “pack it differently”. It could certainly strip some parts, or merge it with other files’ debug information (but I’m not sure why you’d do that on a fully linked binary).

I meant pack it differently for de-duplication.

Shankar Easwaran

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation
Frédéric Riss
2014-Nov-07 18:25 UTC
[LLVMdev] Reimplementing Darwin's dsymutil as an lld helper
> On Nov 7, 2014, at 10:07 AM, Shankar Easwaran <shankare at codeaurora.org> wrote:
>
> Thanks for your reply, Fred.
>
> It might work better if it's in the form of an API, so that the linker could call an API instead of running a tool?

Yes, the long-term goal would be to expose it as an API. The separate tool is just a first step. I can’t give you the exact shape of the API now, but basically the main entry point would be something like:

    DwarfLinker::link(const DebugMap& map);

where the DebugMap is a collection of object files containing debug info, associated with symbol mappings. In the end the dsymutil utility should just be a thin command line wrapper around that.

> On 11/7/2014 11:53 AM, Frédéric Riss wrote:
>>> On Nov 7, 2014, at 9:20 AM, Shankar Easwaran <shankare at codeaurora.org> wrote:
>>>
>>> Hi Fred,
>>>
>>> Could this tool be extended to read DWARF information in the final image and then pack it differently for other architectures as well?
>> I guess it could, depending on what exactly you mean by “pack it differently”. It could certainly strip some parts, or merge it with other files’ debug information (but I’m not sure why you’d do that on a fully linked binary).
> I meant pack it differently for de-duplication.

I see. Just in order to reduce the debug info size. I /think/ it should be possible to apply some of dsymutil’s optimizations to a linked DWARF binary. It’s outside the scope of what I plan to do for the initial implementation, but it should be possible to reuse some of the code to implement that. I can definitely try to keep that use case in mind when I write the code, though.

Fred

> Shankar Easwaran
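
A rough sketch of what an entry point of that shape could look like. Only the name DwarfLinker::link and its const DebugMap& parameter come from the message above. The return type, the choice of a static member, and every other type and field (DebugMapObject, FunctionDebugInfo, LinkedFunction) are assumptions made up for illustration, and the "DWARF" here is reduced to a list of function names per object file.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the debug info of one function in a .o file.
struct FunctionDebugInfo {
  std::string Symbol;
};

// One object file with its symbol mappings (assumed layout).
struct DebugMapObject {
  std::string Path;
  std::map<std::string, uint64_t> SymbolToAddress; // linked addresses
  std::vector<FunctionDebugInfo> Functions;        // pretend-parsed DWARF
};

// "A collection of object files containing debug info associated with
// symbol mappings", as the message puts it.
struct DebugMap {
  std::vector<DebugMapObject> Objects;
};

// Hypothetical output record: a function and its final address.
struct LinkedFunction {
  std::string Symbol;
  uint64_t Address;
};

class DwarfLinker {
public:
  // Walks every object in the map, keeps the debug info of functions
  // that the map mentions, and rewrites their addresses. The return
  // type is an assumption; the message does not specify one.
  static std::vector<LinkedFunction> link(const DebugMap &Map) {
    std::vector<LinkedFunction> Output;
    for (const DebugMapObject &Obj : Map.Objects) {
      for (const FunctionDebugInfo &F : Obj.Functions) {
        std::map<std::string, uint64_t>::const_iterator It =
            Obj.SymbolToAddress.find(F.Symbol);
        if (It == Obj.SymbolToAddress.end())
          continue; // not in the map: dropped by the linker
        Output.push_back({F.Symbol, It->second});
      }
    }
    return Output;
  }
};
```

Under this model, a command line tool would only need to build a DebugMap from its input and hand it to DwarfLinker::link, which is the "thin wrapper" arrangement the message describes.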
Eric Christopher
2014-Nov-10 20:32 UTC
[LLVMdev] Reimplementing Darwin's dsymutil as an lld helper
On Fri Nov 07 2014 at 9:53:14 AM Frédéric Riss <friss at apple.com> wrote:

> On Nov 7, 2014, at 9:20 AM, Shankar Easwaran <shankare at codeaurora.org> wrote:
>
> Hi Fred,
>
> Could this tool be extended to read DWARF information in the final image
> and then pack it differently for other architectures as well?
>
> I guess it could, depending on what exactly you mean by “pack it
> differently”. It could certainly strip some parts, or merge it with other
> files’ debug information (but I’m not sure why you’d do that on a fully
> linked binary).

FWIW you'll want to look at some tools like dwz etc. that can produce debug packages that deduplicate debug info across entire linked objects like OpenOffice. It's a pretty useful tool for small archival debug info if you only want to debug a particular process rather than having debug info for a particular set of libraries on the system.

> b) How will it work with LTO?
>
> With LTO you have to get access to the object file generated by the LTO
> link to be able to extract its debug info. ld64 has an option for that
> (-object_path_lto) that instructs it to write out the object file at the
> given path rather than /tmp/lto.o, and not to delete it when it has
> finished the link. It is then the build system’s duty to delete this
> temporary file once it has run dsymutil on the binary. This is cumbersome
> and is one of the reasons why the dsymutil link step should really be
> carried out by lld itself, so that the build system doesn’t need to be
> aware of that kind of subtlety.

Seems reasonable. It would be nice to still be able to call it by hand as well IMO - for testing if nothing else.

-eric
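
As a rough illustration of the de-duplication discussed in this thread, here is a sketch that uniques type descriptions across compilation units. It is not how dsymutil or dwz actually work: it assumes, purely for illustration, that two types are the same whenever their fully qualified names match, and all names (TypeDIE, CompileUnit, UniquedTypes, uniqueTypes) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified description of a type.
struct TypeDIE {
  std::string QualifiedName;
  unsigned ByteSize;
};

struct CompileUnit {
  std::string Name;
  std::vector<TypeDIE> Types;
};

// Result of uniquing: one canonical copy per distinct type name.
struct UniquedTypes {
  std::vector<TypeDIE> Types;
  std::map<std::string, std::size_t> IndexByName;
};

// Keeps the first definition seen for each name and maps every later
// occurrence to it. Returns, per compilation unit, the canonical index
// that each of its types now refers to.
std::vector<std::vector<std::size_t> >
uniqueTypes(const std::vector<CompileUnit> &CUs, UniquedTypes &Out) {
  std::vector<std::vector<std::size_t> > Refs;
  for (const CompileUnit &CU : CUs) {
    std::vector<std::size_t> CURefs;
    for (const TypeDIE &T : CU.Types) {
      // insert() leaves an existing entry untouched and reports whether
      // a new one was created.
      std::pair<std::map<std::string, std::size_t>::iterator, bool> Res =
          Out.IndexByName.insert(
              std::make_pair(T.QualifiedName, Out.Types.size()));
      if (Res.second)
        Out.Types.push_back(T); // first time this type is seen
      CURefs.push_back(Res.first->second);
    }
    Refs.push_back(CURefs);
  }
  return Refs;
}
```

If two compilation units both describe a type named Bar, only one copy ends up in the canonical list and both units refer to the same index, which is where the size reduction comes from.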