thr3ads.net - llvm dev - [llvm-dev] RFC: Adding "minidump" support to obj2yaml [Mar 2019]

Pavel Labath via llvm-dev

2019-Mar-06 14:00 UTC

[llvm-dev] RFC: Adding "minidump" support to obj2yaml

Hello all,

Yesterday I sent an email
<http://lists.llvm.org/pipermail/lldb-dev/2019-March/014811.html> to
lldb-dev proposing a new tool in lldb for yamlization of minidump files.
It's been suggested to me that instead of a new tool it may be better to
add support for that format to obj2yaml. Hence, this email. :)

As I expect most people are unfamiliar with this format, I'm going to
start off with a brief introduction.

Minidump is the native "core file" format for Windows systems. However,
it is widely used on other systems too. Probably the most popular tools
producing this format are the Google "breakpad" and "crashpad" crash
reporting systems. LLDB has supported this format since 2016, when it
was added as a GSoC project by Dimitar Vlahovski. It is currently in active
use and development by several lldb contributors.

The format itself is fairly simple and extensible. The file starts off
with a header containing some basic info and a collection of "streams".
Each stream contains various types of information about the state of the
process at the time when the snapshot (minidump) was taken. This
includes information such as:
- list of loaded modules
- list of threads
- chunks of process memory
- etc.
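
To make the layout concrete, here is a minimal Python sketch of reading the header and the stream directory. It is not taken from the lldb parser; the field layout is assumed from the publicly documented MINIDUMP_HEADER and MINIDUMP_DIRECTORY structures, and the function name read_stream_directory is made up for illustration.

```python
import struct

# Assumed layout (little-endian), per the published minidump structures:
#   header:    signature, version, stream count, directory RVA,
#              checksum, timestamp (6 x u32), flags (u64)
#   directory: stream type, data size, data RVA (3 x u32) per stream
HEADER = struct.Struct("<IIIIIIQ")
DIRECTORY_ENTRY = struct.Struct("<III")
SIGNATURE = 0x504D444D  # the bytes "MDMP" read as a little-endian u32


def read_stream_directory(data):
    """Return a list of (stream_type, payload_bytes) tuples."""
    signature, _version, count, dir_rva, _checksum, _time, _flags = (
        HEADER.unpack_from(data, 0)
    )
    if signature != SIGNATURE:
        raise ValueError("not a minidump")
    streams = []
    for i in range(count):
        offset = dir_rva + i * DIRECTORY_ENTRY.size
        stream_type, size, rva = DIRECTORY_ENTRY.unpack_from(data, offset)
        if rva + size > len(data):
            raise ValueError("stream extends past end of file")
        streams.append((stream_type, data[rva:rva + size]))
    return streams


# Hand-built sample: one stream of type 3 (ThreadListStream) whose payload
# is a 4-byte thread count of zero.
payload = struct.pack("<I", 0)
dir_rva = HEADER.size
payload_rva = dir_rva + DIRECTORY_ENTRY.size
sample = (
    HEADER.pack(SIGNATURE, 0xA793, 1, dir_rva, 0, 0, 0)
    + DIRECTORY_ENTRY.pack(3, len(payload), payload_rva)
    + payload
)
print(read_stream_directory(sample))
```

The sketch only walks the directory; interpreting each stream's payload (module list, thread list, memory list) would be layered on top of it.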

The problem I'm trying to solve right now is how to write tests for this
functionality. We currently don't have any tool which could create
minidump files from human-readable descriptions of them, so our tests
are relying on checking in opaque binary blobs. This makes reviewing the
changes hard and also complicates creating test cases (real-world
minidumps tend to be large). In other words, we are missing a tool like
yaml2minidump.
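
As a rough illustration of what such a tool would do, the Python sketch below turns a human-readable description into minidump bytes. The description schema (a dict with "flags" and "streams", each stream carrying a numeric "type" and hex "content") is hypothetical and stands in for whatever YAML schema would eventually be chosen; the binary layout is assumed from the published MINIDUMP_HEADER and MINIDUMP_DIRECTORY structures.

```python
import struct

HEADER = struct.Struct("<IIIIIIQ")       # see MINIDUMP_HEADER
DIRECTORY_ENTRY = struct.Struct("<III")  # stream type, data size, data RVA
SIGNATURE = 0x504D444D                   # "MDMP"
VERSION = 0xA793                         # low 16 bits of the version field


def build_minidump(description):
    """Serialize a (hypothetical) description dict into minidump bytes."""
    streams = description["streams"]
    dir_rva = HEADER.size
    # Stream payloads are placed right after the directory.
    data_rva = dir_rva + len(streams) * DIRECTORY_ENTRY.size
    directory = b""
    payloads = b""
    for stream in streams:
        content = bytes.fromhex(stream["content"])
        directory += DIRECTORY_ENTRY.pack(
            stream["type"], len(content), data_rva + len(payloads)
        )
        payloads += content
    header = HEADER.pack(
        SIGNATURE, VERSION, len(streams), dir_rva, 0, 0,
        description.get("flags", 0),
    )
    return header + directory + payloads


description = {
    "flags": 0,
    "streams": [
        {"type": 3, "content": "00000000"},  # ThreadListStream, zero threads
        {"type": 4, "content": "00000000"},  # ModuleListStream, zero modules
    ],
}
blob = build_minidump(description)
print(len(blob), blob[:4])
```

A test could then check in the small textual description instead of the binary blob, which is what makes the change reviewable.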

=== end of introduction ===

While we could create an lldb tool for converting between minidump and
yaml files, there is some appeal in making everything available from a
single tool (i.e., yaml2obj). The main obstacle to that is that there is
currently no support for parsing these files in llvm, and apart from
yaml2obj, it's not clear to me whether any other llvm tool/project would
benefit from this functionality being available in the main llvm
project. For example, tools like llvm-readelf have support for elf core
files, but this is mostly a byproduct of the fact that elf core files
are similar to elf executables. However, there is no "executable" form
of minidumps.

So I am asking this question: Do you think having minidump parsing code
in llvm is a good idea?

To give you an idea of what this involves, the current minidump parser
in lldb is about 2000 LOC. It's already fairly independent of the rest
of lldb, though it would need to be cleaned up a bit to be up to llvm
standards. My expectation is that the yaml conversion code would add
another 1-2 kLOC.

The natural place for this in llvm would seem to be the Object library,
so I'd propose for this code to be placed there. The thing I'm not sure
about is whether it makes sense to integrate this into the existing
ObjectFile hierarchy. While the minidump "streams" could be represented
as sections, I'm not sure we'd be doing anyone a favour by doing that.
The ObjectFile sections assume they are referring to sections in regular
object files, which have things like relocations, symbol lists, etc., and
minidump streams have none of those. Therefore I'm leaning towards the
option of just implementing this as a standalone MinidumpFile class.
This would be kind of similar to the existing ELFFile class, only there
wouldn't be an ELFObjectFile sitting on top of that.
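
To show the shape of that option, here is a small Python sketch of a standalone class that exposes streams by type and deliberately has no notion of sections, symbols or relocations. The method names (stream_types, get_raw_stream) are hypothetical and are not the proposed llvm interface; the layout is again assumed from the published minidump structures.

```python
import struct


class MinidumpFile:
    """Standalone reader: streams looked up by type, nothing section-like."""

    _HEADER = struct.Struct("<IIIIIIQ")
    _ENTRY = struct.Struct("<III")  # stream type, data size, data RVA
    _SIGNATURE = 0x504D444D         # "MDMP"

    def __init__(self, data):
        sig, _version, count, dir_rva, *_rest = self._HEADER.unpack_from(data, 0)
        if sig != self._SIGNATURE:
            raise ValueError("not a minidump")
        self._data = data
        self._streams = {}
        for i in range(count):
            stype, size, rva = self._ENTRY.unpack_from(
                data, dir_rva + i * self._ENTRY.size
            )
            # Keep the first stream of each type (a choice made for this sketch).
            self._streams.setdefault(stype, (rva, size))

    def stream_types(self):
        return sorted(self._streams)

    def get_raw_stream(self, stream_type):
        """Return the stream's bytes, or None if the stream is absent."""
        location = self._streams.get(stream_type)
        if location is None:
            return None
        rva, size = location
        return self._data[rva:rva + size]


# One stream of type 7 (SystemInfoStream) with a two-byte placeholder payload.
raw = (
    struct.pack("<IIIIIIQ", 0x504D444D, 0xA793, 1, 32, 0, 0, 0)
    + struct.pack("<III", 7, 2, 44)
    + b"\x09\x00"
)
minidump = MinidumpFile(raw)
print(minidump.stream_types(), minidump.get_raw_stream(7))
```

Typed accessors for individual streams could be added on top of the raw lookup without forcing the streams into a section-shaped interface.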

Please let me know what you think,
pavel

James Henderson via llvm-dev

2019-Mar-06 14:30 UTC

I'm all for anything that allows people to test without having to use
pre-canned binaries. I'm not particularly familiar with the minidump
format, so I'm not sure what the best place for code relating to it would
be, but I do agree that extending yaml2obj sounds like a good idea. From
what you say, minidumps don't sound like they'd fit the ObjectFile class
well, so I don't see an issue with a new MinidumpFile class, if it will
work well with how yaml2obj is currently written.

James

Adrian Prantl via llvm-dev

2019-Mar-06 16:43 UTC

I have no problem with extending yaml2obj. As for the minidump parsing code, do
you think it would be possible to lay it out in a way that compiling it can be
optional? I would imagine that this feature is less interesting for people who
want to build, e.g., non-crosscompiling Linux toolchains and since the code size
of LLVM is growing very quickly people are becoming more sensitive to it.

-- adrian

Pavel Labath via llvm-dev

2019-Mar-06 18:44 UTC

Thanks for the support, James.

Adrian, I do share the concerns about code size. I suppose I could put
the minidump parsing code into a subfolder of lib/Object, such that it
is a separate library and can be disabled by excluding it from
LLVM_DYLIB_COMPONENTS by people trying to minimize size footprint (I
don't expect this to have an impact on anything other than the llvm
shared library, as the tools which don't use this code simply will not
have it linked in). If that's the consensus, then I'm happy to
implement that, but I worry that this would give the minidump code
more prominence than it deserves (i.e., why should it get a
special subfolder while the elf/macho/coff/wasm code is stuffed into the
same folder).

Or we could just say that the niceness of having a single tool for
yaml<->binary conversions (and to me that really seems like the main
advantage of putting this code in llvm) isn't worth the size increase,
and just have a separate tool for that in the lldb repo, at least
until we have another reason to have minidump parsing code live in
llvm.

regards,
pavel
