thr3ads.net - llvm dev - [LLVMdev] Reimplementing Darwin's dsymutil as an lld helper [Nov 2014]

Frédéric Riss

2014-Nov-07 16:09 UTC

[LLVMdev] Reimplementing Darwin's dsymutil as an lld helper

Hi,

[ I Cc'd the lld and debug info people. Apologies if I omitted any
stakeholders. ]

As stated in the subject, I’d like to start working on an in-tree
reimplementation of Darwin’s dsymutil utility. This is an initial step on the
path to having lld handle the debug information itself.

For those who are not familiar with the debug flow on macOS, dsymutil is a DWARF
linker. Darwin’s linker (ld64) doesn’t link the DWARF debug info found in the
object files; instead, it writes a “debug-map” in the linked binary. This
debug-map describes which objects were linked together and which atoms of each
object file are present in the binary, along with their addresses. The debug-map
has two uses:
1) During the build->debug cycle, lldb reads the debug-map and uses it to
find the .o files and extract the relevant DWARF debug info.
2) For Release builds, dsymutil reads the debug-map, then loads, merges, and
optimizes all the DWARF debug info and writes it out as a .dSYM bundle.
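To make the description above concrete, here is a minimal sketch of the information a debug-map carries. The class and field names (DebugMap, DebugMapObject, lookup) are assumptions made for illustration; they are not ld64's on-disk format nor any existing LLVM API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DebugMapObject:
    """One .o file that took part in the link (hypothetical model)."""
    path: str
    # Symbol name -> address of that atom in the linked binary.
    # Atoms that did not make it into the binary are simply not listed.
    symbols: Dict[str, int] = field(default_factory=dict)


@dataclass
class DebugMap:
    """Hypothetical in-memory view of the debug-map of one linked binary."""
    binary_path: str
    objects: List[DebugMapObject] = field(default_factory=list)

    def lookup(self, object_path: str, symbol: str) -> Optional[int]:
        """Return the linked address of `symbol` from `object_path`, or None."""
        for obj in self.objects:
            if obj.path == object_path:
                return obj.symbols.get(symbol)
        return None


debug_map = DebugMap(
    binary_path="a.out",
    objects=[
        DebugMapObject("main.o", {"_main": 0x100000F40}),
        DebugMapObject("util.o", {"_helper": 0x100000F80}),
    ],
)
```

Both consumers mentioned above (lldb and dsymutil) would start from this kind of lookup: find the .o file, then map each of its atoms to a final address.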

The long-term goal is for DWARF linking functionality to be available as a library
for LLVM tools. Eventually, we’d like lld to be able to make use of the DWARF
linking library and not need a standalone dsymutil tool. The first step is to
use the DWARF linking library in a standalone dsymutil replacement tool. We
want this tool to be bit-for-bit compatible with the existing Darwin dsymutil.

The main reason we want to take the first step of a separate tool is
testability. The code committed to the LLVM repository will feature unit tests,
but they won’t offer the coverage that real-world usage would. I plan to run
the new tool through big internal validation campaigns during which the output
of the LLVM-powered dsymutil would be compared to that of the system’s dsymutil.
This is also the reason we aim for bit-for-bit compatibility.
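The comparison step of such a validation run could look like the sketch below. It is only an illustration: the function names are made up here, and nothing in this message specifies how the actual campaign is implemented.

```python
from typing import Optional


def first_difference(reference: bytes, candidate: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (ref_byte, cand_byte) in enumerate(zip(reference, candidate)):
        if ref_byte != cand_byte:
            return offset
    if len(reference) != len(candidate):
        # One output is a strict prefix of the other.
        return min(len(reference), len(candidate))
    return None


def compare_outputs(reference_path: str, candidate_path: str) -> Optional[int]:
    """Compare two output files (paths are supplied by the caller) byte for byte."""
    with open(reference_path, "rb") as ref_file:
        reference = ref_file.read()
    with open(candidate_path, "rb") as cand_file:
        candidate = cand_file.read()
    return first_difference(reference, candidate)
```

Reporting the first differing offset, rather than a plain yes/no, makes it easier to locate which part of the output diverged.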

The current plan is to host the code in the llvm repository. dsymutil will make
heavy use of libDebugInfo and won’t share anything with the lld codebase (The
underlying concepts are just too different). It’s also not clear yet where most
of the implementation logic will end up. I expect most of the core logic to be
in tools/dsymutil, but some of it might be better folded directly into
libDebugInfo.

So how does it work? dsymutil doesn’t simply paste the debug sections together
while applying relocations to them. This wouldn’t work for ld64, as it is able
(like lld) to split the sections apart and discard or reorder their contents. Thus
dsymutil needs some semantic knowledge of the DWARF contents to be able to
“patch” the relocatable debug info with accurate values. It is also able to
remove parts of the DIE tree that aren’t needed, or to unique types across
compilation unit boundaries. In libDebugInfo, we have the needed tooling to read
the debug info, but we currently lack the ability to write it back to disk.
Maybe what’s in lib/CodeGen/AsmPrinter to emit the debug info would fit the
bill, but I won't be sure until I try to write the code. I’ll see along the
way whether libDebugInfo should grow its own DWARF streaming capabilities. Opinions
welcome.
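A toy illustration of the "patch and prune" idea described above: walk a DIE tree, replace object-file addresses with the addresses from the debug-map, and drop the entries whose code is not in the linked binary. The Die class and link_die function are hypothetical and heavily simplified (real DWARF attributes, relocations and type uniquing are not modelled).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Die:
    """Simplified stand-in for a DWARF debugging information entry."""
    tag: str
    name: str
    low_pc: Optional[int] = None  # address in the .o file; None if no code
    children: List["Die"] = field(default_factory=list)


def link_die(die: Die, linked_addresses: Dict[str, int]) -> Optional[Die]:
    """Return a patched copy of `die`, or None if its code was not linked."""
    if die.tag == "subprogram":
        if die.name not in linked_addresses:
            return None  # the linker dropped this function: prune the subtree
        low_pc = linked_addresses[die.name]  # patch with the final address
    else:
        low_pc = die.low_pc
    kept = []
    for child in die.children:
        linked_child = link_die(child, linked_addresses)
        if linked_child is not None:
            kept.append(linked_child)
    return Die(die.tag, die.name, low_pc, kept)


unit = Die("compile_unit", "a.c", None, [
    Die("subprogram", "main", 0x0),
    Die("subprogram", "unused", 0x20),
    Die("variable", "g"),
])
linked = link_die(unit, {"main": 0x100000F40})
```

In this example the result keeps "main" (now at its linked address) and the variable "g", while "unused" is gone. This is why a plain concatenation of sections is not enough: the tool has to understand which values are addresses and which subtrees depend on them.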

Although the implementation of the dsymutil command line tool will be fairly
Darwin-specific (it accepts Mach-O files as input and emits a dSYM bundle), most
of the implementation will be format-agnostic. I’ll make an effort to split the
Mach-O-specific parts into their own files so that this code can be reused in a
generic way. Would there be interest in that kind of code for other platforms
as well? What’s the state of DWARF support for ELF in lld?

I plan on sending the initial code (which basically only parses the debug map
of Mach-O files) out for review in the coming days if there are no objections to
the general principle.

Fred

Shankar Easwaran

2014-Nov-07 17:20 UTC


Hi Fred,

Could this tool be extended to read DWARF information in the final image
and then pack it differently for other architectures as well?

I believe this could be important for Fission as well, when other
formats accommodate Fission.

A few OSes, such as HP-UX, used to run something called PXDB for this purpose.

<snip from ld man page:
http://nixdoc.net/man-pages/hp-ux/man1/ld_pa.1.html>

The LD_PXDB environment variable defines the full execution path for
       the debug preprocessor pxdb.  The default value is
       /opt/langtools/bin/pxdb.  ld invokes pxdb on its output file if that
       file is executable and contains debug information.  To defer
       invocation of pxdb until the first debug session, set LD_PXDB to
       /bin/true.

</snip>

A few questions:

a) Will the utility understand that the linker garbage-collected a few
functions, and not create a map for them?
b) How will it work with LTO?

Shankar Easwaran


Frédéric Riss

2014-Nov-07 17:53 UTC


> On Nov 7, 2014, at 9:20 AM, Shankar Easwaran <shankare at codeaurora.org> wrote:
>
> Hi Fred,
>
> Could this tool be extended to read DWARF information in the final image
> and then pack it differently for other architectures as well?

I guess it could, depending on what exactly you mean by “pack it differently”.
It could certainly strip some parts, or merge it with another file’s debug
information (but I’m not sure why you’d do that on a fully linked binary).

> I believe this could be important for Fission as well, when other formats
> accommodate Fission.
>
> A few OSes, such as HP-UX, used to run something called PXDB for this purpose.
>
> <snip from ld man page:
> http://nixdoc.net/man-pages/hp-ux/man1/ld_pa.1.html>
> The LD_PXDB environment variable defines the full execution path for
>       the debug preprocessor pxdb.  The default value is
>       /opt/langtools/bin/pxdb.  ld invokes pxdb on its output file if that
>       file is executable and contains debug information.  To defer
>       invocation of pxdb until the first debug session, set LD_PXDB to
>       /bin/true.
>
> </snip>
>
> A few questions:
>
> a) Will the utility understand that the linker garbage-collected a few
> functions, and not create a map for them?

Yes. It’s not dsymutil that creates the map, though. It’s the linker that emits
the map, and the map tells dsymutil that some atoms aren’t present in the linked
binary (in fact the map won’t mention these at all, and that’s how the utility
knows that they have been dropped).

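In other words, "dropped" is inferred from absence. A one-function sketch of that inference, with a made-up helper name and made-up symbol names:

```python
from typing import Iterable, List


def dropped_atoms(object_symbols: Iterable[str],
                  debug_map_symbols: Iterable[str]) -> List[str]:
    """Symbols defined in the .o file that the debug map does not mention.

    Hypothetical helper: anything the map is silent about is treated as
    having been removed (for example, garbage-collected) by the linker.
    """
    return sorted(set(object_symbols) - set(debug_map_symbols))


dropped = dropped_atoms(["_main", "_helper", "_unused"], ["_main", "_helper"])
```

Here `dropped` contains only "_unused", so the debug info describing that function would not be emitted.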
> b) How will it work with LTO?

With LTO, you need access to the object file generated by the LTO link to
be able to extract its debug info. ld64 has an option for that
(-object_path_lto) that instructs it to write the object file to the given
path rather than /tmp/lto.o, and not to delete it when the link has finished.
It is then the build system’s duty to delete this temporary file once it has run
dsymutil on the binary. This is cumbersome, and it is one of the reasons why the
dsymutil link step should really be carried out by lld itself, so that the build
system doesn’t need to be aware of this kind of subtlety.
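The sequence a build system has to follow can be sketched as three commands. The function below only builds the argument lists and does not run anything. Driving the link through clang with -Wl, and all file names, are assumptions made for the example; only the -object_path_lto option comes from the explanation above.

```python
from typing import List


def lto_debug_steps(objects: List[str], output: str,
                    lto_object: str) -> List[List[str]]:
    """Return the commands for an LTO link followed by a dSYM generation."""
    link = ["clang", "-flto", "-g", *objects, "-o", output,
            # Ask the linker to keep the LTO-generated object at a known path.
            "-Wl,-object_path_lto," + lto_object]
    make_dsym = ["dsymutil", output]     # reads the debug map, needs lto_object
    cleanup = ["rm", "-f", lto_object]   # the build system's duty afterwards
    return [link, make_dsym, cleanup]


steps = lto_debug_steps(["a.o", "b.o"], "app", "app.lto.o")
```

The ordering matters: the temporary object must still exist when dsymutil runs, and nothing but the build system knows when it is safe to delete it.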

Fred

Alexey Samsonov

2014-Nov-07 19:26 UTC


Sounds reasonable to me. It would be nice to have dsymutil implemented as
an LLVM tool, update it as needed as we change the debug info emitted by
the compiler, ensure that it understands and behaves well with reduced
-gline-tables-only debug info, etc.

It also sounds like you'd have to extend libDebugInfo with DWARF emission
capabilities, that is, reuse part of the code currently stored in
AsmPrinter. Note that currently the LLVM backend tools (and Clang) don't
depend on libDebugInfo, and that's probably the right thing: they don't need
to read/analyze DWARF or symbolize addresses. I wonder if we'd have to
change the library layout: have some generic library that would describe
DWARF entities (something more powerful than a bunch of enums declared in
Support/Dwarf.h), and make the current AsmPrinter and DebugInfo its two
specialized users. That way, Clang would only need the former,
llvm-dwarfdump and llvm-symbolizer would only need the latter, and DWARF
transformation tools (be it a linker or dsymutil) could use both.


-- 
Alexey Samsonov
vonosmas at gmail.com

Frédéric Riss

2014-Nov-07 19:54 UTC


> On Nov 7, 2014, at 11:26 AM, Alexey Samsonov <vonosmas at gmail.com> wrote:
>
> Sounds reasonable to me. It would be nice to have dsymutil implemented as
> an LLVM tool, update it as needed as we change the debug info emitted by the
> compiler, ensure that it understands and behaves well with reduced
> -gline-tables-only debug info, etc.

Yes, once the tool is complete, this will be the first benefit we get from
having it in tree.

> It also sounds like you'd have to extend libDebugInfo with DWARF emission
> capabilities, that is, reuse part of the code currently stored in
> AsmPrinter. Note that currently the LLVM backend tools (and Clang) don't
> depend on libDebugInfo, and that's probably the right thing: they don't need
> to read/analyze DWARF or symbolize addresses. I wonder if we'd have to
> change the library layout: have some generic library that would describe
> DWARF entities (something more powerful than a bunch of enums declared in
> Support/Dwarf.h), and make the current AsmPrinter and DebugInfo its two
> specialized users. That way, Clang would only need the former,
> llvm-dwarfdump and llvm-symbolizer would only need the latter, and DWARF
> transformation tools (be it a linker or dsymutil) could use both.

I’m open to opinions about this, but I also think we need to see the code to
make an educated decision. My current plan is to try to reuse what’s in
AsmPrinter first. I’ll then see whether the abstraction’s overhead is too high
(the streaming use in dsymutil is fairly different from the compiler’s debug info
emission use). All that’s needed if that works out is a bridge between
llvm::DWARFDebugInfoEntryMinimal and llvm::DIE. I can host the bridge in the
tool directory for a start, and then, once we know this works, we can decide to
restructure the libraries to integrate it as some kind of common abstraction.
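To give an idea of what such a bridge does, here is a language-neutral sketch in Python: it turns read-only parsed entries into a mutable tree that an emitter could consume. ParsedEntry and EmitNode are hypothetical stand-ins for the read-side and write-side classes, and representing the parsed entries as a flat pre-order list with a depth field is an assumption of this sketch, not a statement about the actual LLVM data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ParsedEntry:
    """Read-only entry as a parser might expose it (hypothetical)."""
    depth: int
    tag: str
    attributes: Tuple[Tuple[str, object], ...] = ()


@dataclass
class EmitNode:
    """Mutable node as an emitter might want it (hypothetical)."""
    tag: str
    attributes: Dict[str, object] = field(default_factory=dict)
    children: List["EmitNode"] = field(default_factory=list)


def bridge(entries: List[ParsedEntry]) -> Optional[EmitNode]:
    """Build an emit-side tree from a pre-order list with a single root."""
    root = None
    stack: List[EmitNode] = []  # stack[d] is the open node at depth d
    for entry in entries:
        node = EmitNode(entry.tag, dict(entry.attributes))
        del stack[entry.depth:]  # close nodes at this depth or deeper
        if stack:
            stack[-1].children.append(node)
        else:
            root = node
        stack.append(node)
    return root


tree = bridge([
    ParsedEntry(0, "compile_unit", (("name", "a.c"),)),
    ParsedEntry(1, "subprogram", (("name", "main"),)),
    ParsedEntry(2, "variable", (("name", "x"),)),
    ParsedEntry(1, "subprogram", (("name", "helper"),)),
])
```

Keeping this conversion in one place is what would let it live in the tool directory at first and move into a library later.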

Fred


Rui Ueyama

2014-Nov-07 23:37 UTC


I don't have enough knowledge of either the Darwin-specific command or
debug info in general, but as far as I can tell, I don't see any red
flags. Bringing DWARF support for ELF and PE/COFF to LLD is on my to-do
list, so I'm happy to see that you have started working on DWARF support.

