Pavel Labath via llvm-dev
2019-Mar-06 14:00 UTC
[llvm-dev] RFC: Adding "minidump" support to obj2yaml
Hello all,

yesterday I sent an email <http://lists.llvm.org/pipermail/lldb-dev/2019-March/014811.html> to lldb-dev proposing a new tool in lldb for yamlization of minidump files. It's been suggested to me that instead of a new tool it may be better to add support for that format to obj2yaml. Hence, this email. :)

As I expect most people are unfamiliar with this format, I'm going to start off with a brief introduction.

Minidump is the native "core file" format for Windows systems. However, it is widely used on other systems too. Probably the most popular tools producing this format are the Google "breakpad" and "crashpad" crash reporting systems. LLDB has supported this format since 2016, when it was added as a GSoC project by Dimitar Vlahovski. It is currently in active use and development by several lldb contributors.

The format itself is fairly simple and extensible. The file starts off with a header containing some basic info and a collection of "streams". Each stream contains various types of information about the state of the process at the time when the snapshot (minidump) was taken. This includes information such as:
- list of loaded modules
- list of threads
- chunks of process memory
- etc.

The problem I'm trying to solve right now is how to write tests for this functionality. We currently don't have any tool which could create minidump files from human-readable descriptions of them, so our tests rely on checking in opaque binary blobs. This makes reviewing the changes hard and also complicates creating test cases (real-world minidumps tend to be large). In other words, we are missing a tool like yaml2minidump.

=== end of introduction ===

While we could create an lldb tool for converting between minidump and yaml files, there is some appeal in making everything available from a single tool (i.e., yaml2obj). The main obstacle to that is that there is currently no support for parsing these files in llvm, and apart from yaml2obj, it's not clear to me whether any other llvm tool/project would benefit from this functionality being available in the main llvm project. For example, tools like llvm-readelf have support for ELF core files, but this is mostly a byproduct of the fact that ELF core files are similar to ELF executables. However, there is no "executable" form of minidumps.

So I am asking this question: Do you think having minidump parsing code in llvm is a good idea?

To give you an idea of what this involves, the current minidump parser in lldb is about 2000 LOC. It's already fairly independent of the rest of lldb, though it would need to be cleaned up a bit to be up to llvm standards. My expectation is that the yaml conversion code would add another 1-2 kLOC.

The natural place for this in llvm would seem to be the Object library, so I'd propose for this code to be placed there. The thing I'm not sure about is whether it makes sense to integrate this into the existing ObjectFile hierarchy. While the minidump "streams" could be represented as sections, I'm not sure we'd be doing anyone a favour by doing that. The ObjectFile sections assume they are referring to sections in regular object files, which have things like relocations, symbol lists, etc., and minidump streams have none of those. Therefore I'm leaning towards the option of just implementing this as a standalone MinidumpFile class. This would be kind of similar to the existing ELFFile class, only there wouldn't be an ELFObjectFile sitting on top of that.

Please let me know what you think,
pavel
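To make the "header plus a collection of streams" layout from the introduction more concrete, here is a minimal Python sketch of a round trip between a human-readable description and a minidump-shaped binary. It is only an illustration under stated assumptions, not the proposed LLVM code: the function names (write_minidump, read_minidump) and the description shape (a list of {"Type", "Content"} dicts standing in for YAML, with content as hex) are hypothetical. The field layout follows the publicly documented MINIDUMP_HEADER and MINIDUMP_DIRECTORY structures.

```python
import struct

# Layout taken from the public MINIDUMP_HEADER / MINIDUMP_DIRECTORY
# structures; all fields are little-endian.
SIGNATURE = 0x504D444D  # the bytes "MDMP"
VERSION = 0xA793
# signature, version, stream count, directory offset, checksum, timestamp, flags
HEADER = struct.Struct("<IIIIIIQ")
# stream type, data size, data offset
DIRECTORY_ENTRY = struct.Struct("<III")

# A few well-known stream type codes (only these can be written by this sketch).
STREAM_TYPES = {"ThreadList": 3, "ModuleList": 4, "MemoryList": 5}
STREAM_NAMES = {code: name for name, code in STREAM_TYPES.items()}


def write_minidump(description):
    """Serialize a hypothetical description: a list of {"Type", "Content"} dicts."""
    data_offset = HEADER.size + DIRECTORY_ENTRY.size * len(description)
    directory, data = b"", b""
    for stream in description:
        payload = bytes.fromhex(stream["Content"])
        directory += DIRECTORY_ENTRY.pack(
            STREAM_TYPES[stream["Type"]], len(payload), data_offset + len(data))
        data += payload
    # The stream directory is placed directly after the header.
    header = HEADER.pack(SIGNATURE, VERSION, len(description), HEADER.size, 0, 0, 0)
    return header + directory + data


def read_minidump(blob):
    """Parse the header and stream directory back into the description form."""
    if len(blob) < HEADER.size:
        raise ValueError("file too small for a minidump header")
    signature, _version, count, dir_offset, _, _, _ = HEADER.unpack_from(blob, 0)
    if signature != SIGNATURE:
        raise ValueError("missing MDMP signature")
    description = []
    for index in range(count):
        entry_offset = dir_offset + index * DIRECTORY_ENTRY.size
        if entry_offset + DIRECTORY_ENTRY.size > len(blob):
            raise ValueError("stream directory runs past end of file")
        stream_type, size, offset = DIRECTORY_ENTRY.unpack_from(blob, entry_offset)
        if offset + size > len(blob):
            raise ValueError("stream data runs past end of file")
        # Unknown stream types are reported by their numeric code.
        name = STREAM_NAMES.get(stream_type, str(stream_type))
        description.append({"Type": name, "Content": blob[offset:offset + size].hex()})
    return description
```

With this sketch, a test input such as [{"Type": "ModuleList", "Content": "00000000"}] can be reviewed as text, turned into bytes with write_minidump, and read back with read_minidump. Note that the streams are handled as opaque byte ranges located through the directory, with no notion of relocations or symbols, which is the reason given above for not modelling them as ObjectFile sections.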
James Henderson via llvm-dev
2019-Mar-06 14:30 UTC
[llvm-dev] RFC: Adding "minidump" support to obj2yaml
I'm all for anything that allows people to test without having to use pre-canned binaries. I'm not particularly familiar with the minidump format, so I'm not sure what the best place for code relating to it would be, but I do agree that extending yaml2obj sounds like a good idea. From what you say, minidumps don't sound like they'd fit the ObjectFile class well, so I don't see an issue with a new MinidumpFile class, if it will work well with how yaml2obj is currently written.

James
Adrian Prantl via llvm-dev
2019-Mar-06 16:43 UTC
[llvm-dev] RFC: Adding "minidump" support to obj2yaml
I have no problem with extending yaml2obj. As for the minidump parsing code, do you think it would be possible to lay it out in a way that compiling it can be optional? I would imagine that this feature is less interesting for people who want to build, e.g., non-crosscompiling Linux toolchains, and since the code size of LLVM is growing very quickly, people are becoming more sensitive to it.

-- adrian
Pavel Labath via llvm-dev
2019-Mar-06 18:44 UTC
[llvm-dev] RFC: Adding "minidump" support to obj2yaml
Thanks for the support, James.

Adrian, I do share the concerns about code size. I suppose I could put the minidump parsing code into a subfolder of lib/Object, so that it is a separate library that people trying to minimize their size footprint can disable by excluding it from LLVM_DYLIB_COMPONENTS. (I don't expect this to have an impact on anything other than the llvm shared library, as the tools which don't use this code simply will not have it linked in.) If that's the consensus, then I'm happy to implement that, but I'm not sure whether this gives the minidump code more prominence than it deserves (i.e., why should it get a special subfolder while the elf/macho/coff/wasm code is all stuffed into the same folder?).

Or we could just say that the niceness of having a single tool for yaml<->binary conversions (and to me that really seems like the main advantage of putting this code in llvm) isn't worth the size increase, and just have a separate tool for that in the lldb repo, at least until we have another reason to have minidump parsing code live in llvm.

regards,
pavel