All,

Greetings. I am new to this mailing list.

We have been working with XML for digital forensics. One of the areas that we wish to create a schema for is the representation of registry entries.

We are interested in hivexml as a tool for extracting the registry as an XML representation.

In our discussions with possible users, we have generally come to the conclusion that it is useful to represent each XML key as a fully expanded path, rather than preserving the tree structure of the registry hive. Although this may seem verbose, it makes processing the data significantly easier.

Is anyone working with the hivexml system in a production environment? If so, do you have any thoughts on this matter?

You can find an example of the digital forensics XML at:
http://www.forensicswiki.org/wiki/Fiwalk

Regards,

Simson Garfinkel
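To make the two layouts concrete, here is a minimal Python sketch that turns a nested key tree into one element per key carrying its fully expanded path. The element and attribute names (`hive`, `key`, `value`, `name`, `path`, `data`) and the sample keys are illustrative assumptions, not hivexml's actual output format or the Fiwalk schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree-structured input. These element and attribute names are
# assumptions for illustration, not the format hivexml really emits.
TREE_XML = """
<hive>
  <key name="Software">
    <key name="Microsoft">
      <key name="Windows">
        <value name="Version" data="6.1"/>
      </key>
    </key>
    <key name="Example">
      <value name="Installed" data="1"/>
    </key>
  </key>
</hive>
"""

def iter_keys(element, prefix=""):
    """Yield (full_path, value_elements) for every key below element."""
    for child in element.findall("key"):
        path = prefix + "\\" + child.get("name")
        yield path, child.findall("value")
        # Recurse so that subkeys inherit the expanded parent path.
        yield from iter_keys(child, path)

def to_flat_xml(tree_xml):
    """Build a flat document: every key is a sibling with a 'path' attribute."""
    root = ET.fromstring(tree_xml)
    flat = ET.Element("hive")
    for path, values in iter_keys(root):
        key = ET.SubElement(flat, "key", path=path)
        for v in values:
            ET.SubElement(key, "value", name=v.get("name"), data=v.get("data"))
    return flat

flat_root = to_flat_xml(TREE_XML)
flat_paths = [k.get("path") for k in flat_root.findall("key")]
for p in flat_paths:
    print(p)
```

In the flat form each record is self-describing, so a line-oriented or streaming consumer never needs to track parent context; the cost is that every parent path is repeated in each descendant.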
This would be very much dependent on the kind of processing desired; I can immediately see several XPath queries I might want to write which would be unwieldy to represent without the tree structure preserved.

Flattening the document removes much of the utility of XML-based toolchains, while still paying a penalty in storage size and parser complexity; at that point, why not just export to the conventional .reg text format?

On Fri, Mar 19, 2010 at 3:45 PM, Simson Garfinkel <simsong at acm.org> wrote:
> [...]
> In our discussion with possible users, we have generally come to the
> conclusion that it is useful to represent each XML key as a fully expanded
> path, rather than preserving the tree structure of the registry hive.
> Although this may seem verbose, it makes processing the data significantly
> easier.
> [...]
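As one illustration of the kind of query the reply has in mind, the sketch below asks for "every value anywhere under Software\Microsoft" against both layouts, using the limited XPath subset in Python's `xml.etree.ElementTree`. Both sample documents and all element and attribute names are assumptions made for the example, not hivexml output.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree-structured document (names are illustrative only).
TREE = ET.fromstring("""
<hive>
  <key name="Software">
    <key name="Microsoft">
      <value name="Vendor" data="MS"/>
      <key name="Windows">
        <value name="Version" data="6.1"/>
      </key>
    </key>
    <key name="Example">
      <value name="Installed" data="1"/>
    </key>
  </key>
</hive>
""")

# The same data with each key flattened to a fully expanded path.
FLAT = ET.fromstring(r"""
<hive>
  <key path="\Software"/>
  <key path="\Software\Microsoft"><value name="Vendor" data="MS"/></key>
  <key path="\Software\Microsoft\Windows"><value name="Version" data="6.1"/></key>
  <key path="\Software\Example"><value name="Installed" data="1"/></key>
</hive>
""")

def values_under_tree(root):
    """Tree layout: a single path expression selects the whole subtree."""
    hits = root.findall("./key[@name='Software']/key[@name='Microsoft']//value")
    return sorted(v.get("name") for v in hits)

def values_under_flat(root, prefix=r"\Software\Microsoft"):
    """Flat layout: the subtree test becomes string matching on 'path'."""
    names = []
    for key in root.findall("key"):
        path = key.get("path")
        # Match the key itself or any descendant, but not "\Software\MicrosoftX".
        if path == prefix or path.startswith(prefix + "\\"):
            names.extend(v.get("name") for v in key.findall("value"))
    return sorted(names)

print(values_under_tree(TREE))
print(values_under_flat(FLAT))
```

Both functions return the same values; the difference is that the tree query is expressed in the path language itself, while the flat query needs prefix logic written in the host language.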
Richard W.M. Jones
2010-Mar-20 08:20 UTC
[Libguestfs] hivexml - Flattened vs. Expanded XML
On Fri, Mar 19, 2010 at 01:45:34PM -0700, Simson Garfinkel wrote:
> Greetings. I am new to this mailing list.
>
> We have been working with XML for digital forensics. One of the areas that we wish to create a schema for is the representation of registry entries.
>
> We are interested in hivexml as a tool for extracting the registry as an XML representation.

'hivexml' is really just a demo program that I wrote. You can change the XML format or even rewrite it -- the whole program is only 345 lines of code.

http://git.annexia.org/?p=hivex.git;a=blob;f=xml/hivexml.c;hb=HEAD

If you are interested in forensic analysis, it might be worth looking at the analysis tools we wrote as well / instead:

http://git.annexia.org/?p=hivex.git;a=tree;f=lib/tools;hb=HEAD

These analysis tools look at the registry in much more detail and can look for inconsistencies, hidden keys etc. which we don't care so much about in the main hivex library.

One issue that may be of concern is string encoding in registry values, which is not well defined. Naturally for XML I suppose you'd want to represent string values as UTF-8. However, it's almost impossible to know for sure how strings are encoded in the registry, so doing this conversion would either involve a heuristic, or you'd have to store binary blobs in the XML (encoded as Base64 or as hex digits). The registry is a mess in this respect.

[...]
> You can find an example of the digital forensics XML at:
> http://www.forensicswiki.org/wiki/Fiwalk

Looks interesting. It should be easily possible to get libguestfs to write this format for disk images. There is already a (trivial) demo program I wrote along those lines:

http://git.annexia.org/?p=libguestfs.git;a=blob;f=examples/to-xml.c;hb=HEAD

- - -

If you have changes for libguestfs or hivex, please submit them to this mailing list as for any open source project:

http://people.redhat.com/~rjones/how-to-supply-code-to-open-source-projects/

Rich.
-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora
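The string-encoding trade-off described in the message above (a heuristic conversion versus storing a binary blob) can be sketched in Python as follows. This is only one possible heuristic under a stated assumption, namely that well-formed string values are UTF-16LE; it is not what hivex or hivexml does, and the function name and return labels are hypothetical.

```python
import base64

def encode_registry_string(raw: bytes):
    """Return (label, text) for a raw registry string value.

    Assumption for this sketch: a "good" string is strict UTF-16LE with no
    control characters that XML 1.0 forbids. Anything else is kept as a
    Base64 blob so that no bytes are lost.
    """
    def as_blob():
        return ("base64", base64.b64encode(raw).decode("ascii"))

    try:
        # Strict decoding rejects odd lengths and unpaired surrogates.
        text = raw.decode("utf-16-le")
    except UnicodeDecodeError:
        return as_blob()

    # Registry strings are commonly NUL-terminated; drop trailing NULs.
    text = text.rstrip("\x00")

    # XML 1.0 cannot carry most C0 control characters, so fall back to a blob.
    if any(ch < " " and ch not in "\t\n\r" for ch in text):
        return as_blob()

    return ("text", text)

print(encode_registry_string("Hello".encode("utf-16-le") + b"\x00\x00"))
print(encode_registry_string(b"\x01\x02\x03"))
```

A heuristic like this can still guess wrong: a value that was really stored as single-byte text may happen to decode as valid UTF-16LE and come out as unrelated characters, which is why keeping the Base64 form alongside, or instead, may be preferable for forensic use.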