Richard W.M. Jones
2009-Oct-29 15:44 UTC
[Libguestfs] [PATCH] Support for Windows Registry.
-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/
-------------- next part --------------
From a59adc51418f30194b3d3a8c94bef054104f2d60 Mon Sep 17 00:00:00 2001
From: Richard Jones <rjones at redhat.com>
Date: Mon, 26 Oct 2009 11:03:07 +0000
Subject: [PATCH] Support for Windows Registry.

In hivex/: This mini-library allows us to extract Windows Registry
binary files ("hives"). There are also two tools: hivexml converts
a hive to a self-describing XML format. hivexget can be used to
extract single subkeys from a hive.

New tool: virt-win-reg. This is a wrapper around the library
functionality allowing you to pull out data from the registries
of Windows guests.
---
 .gitignore         |    5 +
 HACKING            |    4 +
 Makefile.am        |    2 +-
 README             |    2 +
 configure.ac       |    7 +
 hivex/LICENSE      |  508 +++++++++++++++++++
 hivex/Makefile.am  |   71 +++
 hivex/README       |   32 ++
 hivex/hivex.c      | 1398 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 hivex/hivex.h      |  114 +++++
 hivex/hivex.pod    |  396 +++++++++++++++
 hivex/hivexget.c   |  266 ++++++++++
 hivex/hivexget.pod |   94 ++++
 hivex/hivexml.c    |  330 +++++++++++++
 hivex/hivexml.pod  |   64 +++
 po/POTFILES.in     |    3 +
 tools/Makefile.am  |    2 +-
 tools/run-locally  |    3 +-
 tools/virt-win-reg |  300 +++++++++++
 19 files changed, 3598 insertions(+), 3 deletions(-)
 create mode 100644 hivex/LICENSE
 create mode 100644 hivex/Makefile.am
 create mode 100644 hivex/README
 create mode 100644 hivex/hivex.c
 create mode 100644 hivex/hivex.h
 create mode 100644 hivex/hivex.pod
 create mode 100644 hivex/hivexget.c
 create mode 100644 hivex/hivexget.pod
 create mode 100644 hivex/hivexml.c
 create mode 100644 hivex/hivexml.pod
 create mode 100755 tools/virt-win-reg

diff --git a/.gitignore b/.gitignore
index 3c3b956..06e9850 100644
--- a/.gitignore
+++ b/.gitignore
@@ -66,6 +66,10 @@ haskell/Guestfs010Launch
 haskell/Guestfs050LVCreate
 haskell/Guestfs.hs
 *.hi
+hivex/*.1
+hivex/*.3
+hivex/hivexget
+hivex/hivexml
 html/guestfish.1.html
 html/guestfs.3.html
 html/recipes.html
@@ -76,6 +80,7 @@ html/virt-inspector.1.html
 html/virt-ls.1.html
 html/virt-rescue.1.html
 html/virt-tar.1.html
+html/virt-win-reg.1.html
 images/100kallnewlines
 images/100kallspaces
 images/100kallzeroes
diff --git a/HACKING b/HACKING
index d4e030c..cc5b1c2 100644
--- a/HACKING
+++ b/HACKING
@@ -86,6 +86,10 @@ fish/
 haskell/
 	Haskell bindings.
 
+hivex/
+	Hive extraction library, for reading Windows Registry files.
+	See hivex/README for more details.
+
 images/
 	Some guest images to test against. These are gzipped to save
 	space. You have to unzip them before use.
diff --git a/Makefile.am b/Makefile.am
index 4689686..9d423a5 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -17,7 +17,7 @@
 
 ACLOCAL_AMFLAGS = -I m4
 
-SUBDIRS = gnulib/lib src daemon appliance fish po examples images \
+SUBDIRS = gnulib/lib hivex src daemon appliance fish po examples images \
 	gnulib/tests capitests regressions test-tool
 
 # NB: Must build inspector directory after perl and before ocaml.
diff --git a/README b/README
index 5e00e4e..41902ea 100644
--- a/README
+++ b/README
@@ -50,6 +50,8 @@ Requirements
 
 - genisoimage / mkisofs
 
+- libxml2
+
 - (Optional) Augeas (http://augeas.net/)
 
 - perldoc (pod2man, pod2text) to generate the manual pages and
diff --git a/configure.ac b/configure.ac
index 7fd23de..88f2a96 100644
--- a/configure.ac
+++ b/configure.ac
@@ -86,6 +86,7 @@ if test "$gl_gcc_warnings" = yes; then
   # ?? -Wstrict-overflow
   nw="$nw -Wunsafe-loop-optimizations" # just a warning that an optimization
                                        # was not possible, safe to ignore
+  nw="$nw -Wpacked" # Allow attribute((packed)) on structs
 
   gl_MANYWARN_ALL_GCC([ws])
   gl_MANYWARN_COMPLEMENT([ws], [$ws], [$nw])
@@ -415,6 +416,11 @@ dnl For i18n.
 AM_GNU_GETTEXT([external])
 AM_GNU_GETTEXT_VERSION([0.17])
 
+dnl libxml2 is used by the hivex library.
+PKG_CHECK_MODULES([LIBXML2], [libxml-2.0])
+AC_SUBST([LIBXML2_CFLAGS])
+AC_SUBST([LIBXML2_LIBS])
+
 dnl Check for OCaml (optional, for OCaml bindings).
 AC_PROG_OCAML
 AC_PROG_FINDLIB
@@ -722,6 +728,7 @@ AC_CONFIG_FILES([Makefile
                  libguestfs.pc
                  gnulib/lib/Makefile
                  gnulib/tests/Makefile
+                 hivex/Makefile
                  ocaml/META
                  perl/Makefile.PL])
 AC_OUTPUT
diff --git a/hivex/LICENSE b/hivex/LICENSE
new file mode 100644
index 0000000..f641b6d
--- /dev/null
+++ b/hivex/LICENSE
@@ -0,0 +1,508 @@
+This is the license for the hivex library.
+
+----------------------------------------------------------------------
+
+                  GNU LESSER GENERAL PUBLIC LICENSE
+                       Version 2.1, February 1999
+
+ Copyright (C) 1991, 1999 Free Software Foundation, Inc.
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+[This is the first released version of the Lesser GPL. It also counts
+ as the successor of the GNU Library Public License, version 2, hence
+ the version number 2.1.]
+
+                            Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+Licenses are intended to guarantee your freedom to share and change
+free software--to make sure the software is free for all its users.
+
+  This license, the Lesser General Public License, applies to some
+specially designated software packages--typically libraries--of the
+Free Software Foundation and other authors who decide to use it. You
+can use it too, but we suggest you first think carefully about whether
+this license or the ordinary General Public License is the better
+strategy to use in any particular case, based on the explanations below.
+
+  When we speak of free software, we are referring to freedom of use,
+not price. Our General Public Licenses are designed to make sure that
+you have the freedom to distribute copies of free software (and charge
+for this service if you wish); that you receive source code or can get
+it if you want it; that you can change the software and use pieces of
+it in new free programs; and that you are informed that you can do
+these things.
+
+  To protect your rights, we need to make restrictions that forbid
+distributors to deny you these rights or to ask you to surrender these
+rights. These restrictions translate to certain responsibilities for
+you if you distribute copies of the library or if you modify it.
+
+  For example, if you distribute copies of the library, whether gratis
+or for a fee, you must give the recipients all the rights that we gave
+you. You must make sure that they, too, receive or can get the source
+code. If you link other code with the library, you must provide
+complete object files to the recipients, so that they can relink them
+with the library after making changes to the library and recompiling
+it. And you must show them these terms so they know their rights.
+
+  We protect your rights with a two-step method: (1) we copyright the
+library, and (2) we offer you this license, which gives you legal
+permission to copy, distribute and/or modify the library.
+
+  To protect each distributor, we want to make it very clear that
+there is no warranty for the free library. Also, if the library is
+modified by someone else and passed on, the recipients should know
+that what they have is not the original version, so that the original
+author's reputation will not be affected by problems that might be
+introduced by others.
+
+  Finally, software patents pose a constant threat to the existence of
+any free program. We wish to make sure that a company cannot
+effectively restrict the users of a free program by obtaining a
+restrictive license from a patent holder. Therefore, we insist that
+any patent license obtained for a version of the library must be
+consistent with the full freedom of use specified in this license.
+
+  Most GNU software, including some libraries, is covered by the
+ordinary GNU General Public License. This license, the GNU Lesser
+General Public License, applies to certain designated libraries, and
+is quite different from the ordinary General Public License. We use
+this license for certain libraries in order to permit linking those
+libraries into non-free programs.
+
+  When a program is linked with a library, whether statically or using
+a shared library, the combination of the two is legally speaking a
+combined work, a derivative of the original library. The ordinary
+General Public License therefore permits such linking only if the
+entire combination fits its criteria of freedom. The Lesser General
+Public License permits more lax criteria for linking other code with
+the library.
+
+  We call this license the "Lesser" General Public License because it
+does Less to protect the user's freedom than the ordinary General
+Public License. It also provides other free software developers Less
+of an advantage over competing non-free programs. These disadvantages
+are the reason we use the ordinary General Public License for many
+libraries. However, the Lesser license provides advantages in certain
+special circumstances.
+
+  For example, on rare occasions, there may be a special need to
+encourage the widest possible use of a certain library, so that it becomes
+a de-facto standard. To achieve this, non-free programs must be
+allowed to use the library. A more frequent case is that a free
+library does the same job as widely used non-free libraries. In this
+case, there is little to gain by limiting the free library to free
+software only, so we use the Lesser General Public License.
+
+  In other cases, permission to use a particular library in non-free
+programs enables a greater number of people to use a large body of
+free software. For example, permission to use the GNU C Library in
+non-free programs enables many more people to use the whole GNU
+operating system, as well as its variant, the GNU/Linux operating
+system.
+
+  Although the Lesser General Public License is Less protective of the
+users' freedom, it does ensure that the user of a program that is
+linked with the Library has the freedom and the wherewithal to run
+that program using a modified version of the Library.
+
+  The precise terms and conditions for copying, distribution and
+modification follow. Pay close attention to the difference between a
+"work based on the library" and a "work that uses the library". The
+former contains code derived from the library, whereas the latter must
+be combined with the library in order to run.
+
+                  GNU LESSER GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License Agreement applies to any software library or other
+program which contains a notice placed by the copyright holder or
+other authorized party saying it may be distributed under the terms of
+this Lesser General Public License (also called "this License").
+Each licensee is addressed as "you".
+
+  A "library" means a collection of software functions and/or data
+prepared so as to be conveniently linked with application programs
+(which use some of those functions and data) to form executables.
+
+  The "Library", below, refers to any such software library or work
+which has been distributed under these terms. A "work based on the
+Library" means either the Library or any derivative work under
+copyright law: that is to say, a work containing the Library or a
+portion of it, either verbatim or with modifications and/or translated
+straightforwardly into another language. (Hereinafter, translation is
+included without limitation in the term "modification".)
+
+  "Source code" for a work means the preferred form of the work for
+making modifications to it. For a library, complete source code means
+all the source code for all modules it contains, plus any associated
+interface definition files, plus the scripts used to control compilation
+and installation of the library.
+
+  Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running a program using the Library is not restricted, and output from
+such a program is covered only if its contents constitute a work based
+on the Library (independent of the use of the Library in a tool for
+writing it). Whether that is true depends on what the Library does
+and what the program that uses the Library does.
+
+  1. You may copy and distribute verbatim copies of the Library's
+complete source code as you receive it, in any medium, provided that
+you conspicuously and appropriately publish on each copy an
+appropriate copyright notice and disclaimer of warranty; keep intact
+all the notices that refer to this License and to the absence of any
+warranty; and distribute a copy of this License along with the
+Library.
+
+  You may charge a fee for the physical act of transferring a copy,
+and you may at your option offer warranty protection in exchange for a
+fee.
+
+  2. You may modify your copy or copies of the Library or any portion
+of it, thus forming a work based on the Library, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+    a) The modified work must itself be a software library.
+
+    b) You must cause the files modified to carry prominent notices
+    stating that you changed the files and the date of any change.
+
+    c) You must cause the whole of the work to be licensed at no
+    charge to all third parties under the terms of this License.
+
+    d) If a facility in the modified Library refers to a function or a
+    table of data to be supplied by an application program that uses
+    the facility, other than as an argument passed when the facility
+    is invoked, then you must make a good faith effort to ensure that,
+    in the event an application does not supply such function or
+    table, the facility still operates, and performs whatever part of
+    its purpose remains meaningful.
+
+    (For example, a function in a library to compute square roots has
+    a purpose that is entirely well-defined independent of the
+    application. Therefore, Subsection 2d requires that any
+    application-supplied function or table used by this function must
+    be optional: if the application does not supply it, the square
+    root function must still compute square roots.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Library,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Library, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote
+it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Library.
+
+In addition, mere aggregation of another work not based on the Library
+with the Library (or with a work based on the Library) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may opt to apply the terms of the ordinary GNU General Public
+License instead of this License to a given copy of the Library. To do
+this, you must alter all the notices that refer to this License, so
+that they refer to the ordinary GNU General Public License, version 2,
+instead of to this License. (If a newer version than version 2 of the
+ordinary GNU General Public License has appeared, then you can specify
+that version instead if you wish.) Do not make any other change in
+these notices.
+
+  Once this change is made in a given copy, it is irreversible for
+that copy, so the ordinary GNU General Public License applies to all
+subsequent copies and derivative works made from that copy.
+
+  This option is useful when you wish to copy part of the code of
+the Library into a program that is not a library.
+
+  4. You may copy and distribute the Library (or a portion or
+derivative of it, under Section 2) in object code or executable form
+under the terms of Sections 1 and 2 above provided that you accompany
+it with the complete corresponding machine-readable source code, which
+must be distributed under the terms of Sections 1 and 2 above on a
+medium customarily used for software interchange.
+
+  If distribution of object code is made by offering access to copy
+from a designated place, then offering equivalent access to copy the
+source code from the same place satisfies the requirement to
+distribute the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  5. A program that contains no derivative of any portion of the
+Library, but is designed to work with the Library by being compiled or
+linked with it, is called a "work that uses the Library". Such a
+work, in isolation, is not a derivative work of the Library, and
+therefore falls outside the scope of this License.
+
+  However, linking a "work that uses the Library" with the Library
+creates an executable that is a derivative of the Library (because it
+contains portions of the Library), rather than a "work that uses the
+library". The executable is therefore covered by this License.
+Section 6 states terms for distribution of such executables.
+
+  When a "work that uses the Library" uses material from a header file
+that is part of the Library, the object code for the work may be a
+derivative work of the Library even though the source code is not.
+Whether this is true is especially significant if the work can be
+linked without the Library, or if the work is itself a library. The
+threshold for this to be true is not precisely defined by law.
+
+  If such an object file uses only numerical parameters, data
+structure layouts and accessors, and small macros and small inline
+functions (ten lines or less in length), then the use of the object
+file is unrestricted, regardless of whether it is legally a derivative
+work. (Executables containing this object code plus portions of the
+Library will still fall under Section 6.)
+
+  Otherwise, if the work is a derivative of the Library, you may
+distribute the object code for the work under the terms of Section 6.
+Any executables containing that work also fall under Section 6,
+whether or not they are linked directly with the Library itself.
+
+  6. As an exception to the Sections above, you may also combine or
+link a "work that uses the Library" with the Library to produce a
+work containing portions of the Library, and distribute that work
+under terms of your choice, provided that the terms permit
+modification of the work for the customer's own use and reverse
+engineering for debugging such modifications.
+
+  You must give prominent notice with each copy of the work that the
+Library is used in it and that the Library and its use are covered by
+this License. You must supply a copy of this License. If the work
+during execution displays copyright notices, you must include the
+copyright notice for the Library among them, as well as a reference
+directing the user to the copy of this License. Also, you must do one
+of these things:
+
+    a) Accompany the work with the complete corresponding
+    machine-readable source code for the Library including whatever
+    changes were used in the work (which must be distributed under
+    Sections 1 and 2 above); and, if the work is an executable linked
+    with the Library, with the complete machine-readable "work that
+    uses the Library", as object code and/or source code, so that the
+    user can modify the Library and then relink to produce a modified
+    executable containing the modified Library. (It is understood
+    that the user who changes the contents of definitions files in the
+    Library will not necessarily be able to recompile the application
+    to use the modified definitions.)
+
+    b) Use a suitable shared library mechanism for linking with the
+    Library. A suitable mechanism is one that (1) uses at run time a
+    copy of the library already present on the user's computer system,
+    rather than copying library functions into the executable, and (2)
+    will operate properly with a modified version of the library, if
+    the user installs one, as long as the modified version is
+    interface-compatible with the version that the work was made with.
+
+    c) Accompany the work with a written offer, valid for at
+    least three years, to give the same user the materials
+    specified in Subsection 6a, above, for a charge no more
+    than the cost of performing this distribution.
+
+    d) If distribution of the work is made by offering access to copy
+    from a designated place, offer equivalent access to copy the above
+    specified materials from the same place.
+
+    e) Verify that the user has already received a copy of these
+    materials or that you have already sent this user a copy.
+
+  For an executable, the required form of the "work that uses the
+Library" must include any data and utility programs needed for
+reproducing the executable from it. However, as a special exception,
+the materials to be distributed need not include anything that is
+normally distributed (in either source or binary form) with the major
+components (compiler, kernel, and so on) of the operating system on
+which the executable runs, unless that component itself accompanies
+the executable.
+
+  It may happen that this requirement contradicts the license
+restrictions of other proprietary libraries that do not normally
+accompany the operating system. Such a contradiction means you cannot
+use both them and the Library together in an executable that you
+distribute.
+
+  7. You may place library facilities that are a work based on the
+Library side-by-side in a single library together with other library
+facilities not covered by this License, and distribute such a combined
+library, provided that the separate distribution of the work based on
+the Library and of the other library facilities is otherwise
+permitted, and provided that you do these two things:
+
+    a) Accompany the combined library with a copy of the same work
+    based on the Library, uncombined with any other library
+    facilities. This must be distributed under the terms of the
+    Sections above.
+
+    b) Give prominent notice with the combined library of the fact
+    that part of it is a work based on the Library, and explaining
+    where to find the accompanying uncombined form of the same work.
+
+  8. You may not copy, modify, sublicense, link with, or distribute
+the Library except as expressly provided under this License. Any
+attempt otherwise to copy, modify, sublicense, link with, or
+distribute the Library is void, and will automatically terminate your
+rights under this License. However, parties who have received copies,
+or rights, from you under this License will not have their licenses
+terminated so long as such parties remain in full compliance.
+
+  9. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Library or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Library (or any work based on the
+Library), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Library or works based on it.
+
+  10. Each time you redistribute the Library (or any work based on the
+Library), the recipient automatically receives a license from the
+original licensor to copy, distribute, link with or modify the Library
+subject to these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties with
+this License.
+
+  11. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Library at all. For example, if a patent
+license would not permit royalty-free redistribution of the Library by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Library.
+
+If any portion of this section is held invalid or unenforceable under any
+particular circumstance, the balance of the section is intended to apply,
+and the section as a whole is intended to apply in other circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  12. If the distribution and/or use of the Library is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Library under this License may add
+an explicit geographical distribution limitation excluding those countries,
+so that distribution is permitted only in or among countries not thus
+excluded. In such case, this License incorporates the limitation as if
+written in the body of this License.
+
+  13. The Free Software Foundation may publish revised and/or new
+versions of the Lesser General Public License from time to time.
+Such new versions will be similar in spirit to the present version,
+but may differ in detail to address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Library
+specifies a version number of this License which applies to it and
+"any later version", you have the option of following the terms and
+conditions either of that version or of any later version published by
+the Free Software Foundation. If the Library does not specify a
+license version number, you may choose any version ever published by
+the Free Software Foundation.
+
+  14. If you wish to incorporate parts of the Library into other free
+programs whose distribution conditions are incompatible with these,
+write to the author to ask for permission. For software which is
+copyrighted by the Free Software Foundation, write to the Free
+Software Foundation; we sometimes make exceptions for this. Our
+decision will be guided by the two goals of preserving the free status
+of all derivatives of our free software and of promoting the sharing
+and reuse of software generally.
+
+                            NO WARRANTY
+
+  15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
+WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
+EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
+OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
+KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
+LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
+THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+  16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
+WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
+AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
+FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
+CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
+LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
+RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
+FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
+SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
+DAMAGES.
+
+                     END OF TERMS AND CONDITIONS
+
+           How to Apply These Terms to Your New Libraries
+
+  If you develop a new library, and you want it to be of the greatest
+possible use to the public, we recommend making it free software that
+everyone can redistribute and change. You can do so by permitting
+redistribution under these terms (or, alternatively, under the terms of the
+ordinary General Public License).
+
+  To apply these terms, attach the following notices to the library. It is
+safest to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least the
+"copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the library's name and a brief idea of what it does.>
+    Copyright (C) <year> <name of author>
+
+    This library is free software; you can redistribute it and/or
+    modify it under the terms of the GNU Lesser General Public
+    License as published by the Free Software Foundation; either
+    version 2.1 of the License, or (at your option) any later version.
+
+    This library is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+    Lesser General Public License for more details.
+
+    You should have received a copy of the GNU Lesser General Public
+    License along with this library; if not, write to the Free Software
+    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
+
+Also add information on how to contact you by electronic and paper mail.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the library, if
+necessary. Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the
+  library `Frob' (a library for tweaking knobs) written by James Random Hacker.
+
+  <signature of Ty Coon>, 1 April 1990
+  Ty Coon, President of Vice
+
+That's all there is to it!
+
+
diff --git a/hivex/Makefile.am b/hivex/Makefile.am
new file mode 100644
index 0000000..418abf1
--- /dev/null
+++ b/hivex/Makefile.am
@@ -0,0 +1,71 @@
+# libguestfs
+# Copyright (C) 2009 Red Hat Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ +EXTRA_DIST = hivex.pod hivexml.pod hivexget.pod + +lib_LTLIBRARIES = libhivex.la + +libhivex_la_SOURCES = \ + hivex.c \ + hivex.h + +libhivex_la_LDFLAGS = -version-info 0:0:0 +libhivex_la_CFLAGS = \ + $(WARN_CFLAGS) $(WERROR_CFLAGS) + +bin_PROGRAMS = hivexml hivexget + +hivexml_SOURCES = \ + hivexml.c + +hivexml_LDADD = libhivex.la $(LIBXML2_LIBS) +hivexml_CFLAGS = \ + $(LIBXML2_CFLAGS) \ + $(WARN_CFLAGS) $(WERROR_CFLAGS) + +hivexget_SOURCES = \ + hivexget.c + +hivexget_LDADD = libhivex.la +hivexget_CFLAGS = \ + $(WARN_CFLAGS) $(WERROR_CFLAGS) + +man_MANS = hivex.3 hivexml.1 hivexget.1 + +hivex.3: hivex.pod + $(POD2MAN) \ + --section 3 \ + -c "Windows Registry" \ + --name "hivex" \ + --release "$(PACKAGE_NAME)-$(PACKAGE_VERSION)" \ + $< > $@-t; mv $@-t $@ + +hivexml.1: hivexml.pod + $(POD2MAN) \ + --section 1 \ + -c "Windows Registry" \ + --name "hivexml" \ + --release "$(PACKAGE_NAME)-$(PACKAGE_VERSION)" \ + $< > $@-t; mv $@-t $@ + +hivexget.1: hivexget.pod + $(POD2MAN) \ + --section 1 \ + -c "Windows Registry" \ + --name "hivexget" \ + --release "$(PACKAGE_NAME)-$(PACKAGE_VERSION)" \ + $< > $@-t; mv $@-t $@ diff --git a/hivex/README b/hivex/README new file mode 100644 index 0000000..449db77 --- /dev/null +++ b/hivex/README @@ -0,0 +1,32 @@ +hivex - by Richard W.M. Jones, rjones at redhat.com +Copyright (C) 2009 Red Hat Inc. +---------------------------------------------------------------------- + +This is a self-contained library for reading Windows Registry "hive" +binary files. + +It is totally dedicated to reading the files and doesn't deal with +writing or modifying them in any way. + +Unlike many other tools in this area, it doesn't use the textual .REG +format for output, because parsing that is as much trouble as parsing +the original binary format. Instead it makes the file available +through a C API, or there is a separate program to export the hive as +XML. + +This library was derived from several sources: + + . 
NTREG registry reader/writer library by Petter Nordahl-Hagen + (LGPL v2.1 licensed library and program) + . http://home.eunet.no/pnordahl/ntpasswd/WinReg.txt + . dumphive (a BSD-licensed Pascal program by Markus Stephany) + +Like NTREG, this library only attempts to read Windows NT registry +files (ie. not Windows 3.1 or Windows 95/98/ME). See the link above +for documentation on the older formats if you wish to read them. + +Unlike NTREG, this code is much more careful about handling error +cases, corrupt and malicious registry files, and endianness. + +The license for this library is LGPL v2.1, but not later versions. +For full details, see the file LICENSE in this directory. diff --git a/hivex/hivex.c b/hivex/hivex.c new file mode 100644 index 0000000..2274102 --- /dev/null +++ b/hivex/hivex.c @@ -0,0 +1,1398 @@ +/* hivex - Windows Registry "hive" extraction library. + * Copyright (C) 2009 Red Hat Inc. + * Derived from code by Petter Nordahl-Hagen under a compatible license: + * Copyright (c) 1997-2007 Petter Nordahl-Hagen. + * Derived from code by Markus Stephany under a compatible license: + * Copyright (c) 2000-2004, Markus Stephany. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; + * version 2.1 of the License. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * See file LICENSE for the full license. 
+ */ + +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <string.h> +#include <endian.h> +#include <fcntl.h> +#include <unistd.h> +#include <errno.h> +#include <iconv.h> +#include <sys/mman.h> +#include <sys/stat.h> +#include <assert.h> + +#include "hivex.h" + +struct hive_h { + int fd; + size_t size; + int msglvl; + + /* Memory-mapped (readonly) registry file. */ + union { + char *addr; + struct ntreg_header *hdr; + }; + + /* Use a bitmap to store which file offsets are valid (point to a + * used block). We only need to store 1 bit per 32 bits of the file + * (because blocks are 4-byte aligned). We found that the average + * block size in a registry file is ~50 bytes. So roughly 1 in 12 + * bits in the bitmap will be set, making it likely a more efficient + * structure than a hash table. + */ + char *bitmap; +#define BITMAP_SET(bitmap,off) (bitmap[(off)>>5] |= 1 << (((off)>>2)&7)) +#define BITMAP_CLR(bitmap,off) (bitmap[(off)>>5] &= ~ (1 << (((off)>>2)&7))) +#define BITMAP_TST(bitmap,off) (bitmap[(off)>>5] & (1 << (((off)>>2)&7))) +#define IS_VALID_BLOCK(h,off) \ + (((off) & 3) == 0 && \ + (off) >= 0x1000 && \ + (off) < (h)->size && \ + BITMAP_TST((h)->bitmap,(off))) + + /* Fields from the header, extracted from little-endianness. */ + size_t rootoffs; /* Root key offset (always an nk-block). */ + + /* Stats. */ + size_t pages; /* Number of hbin pages read. */ + size_t blocks; /* Total number of blocks found. */ + size_t used_blocks; /* Total number of used blocks found. */ + size_t used_size; /* Total size (bytes) of used blocks. */ +}; + +/* NB. All fields are little endian. 
*/ +struct ntreg_header { + char magic[4]; /* "regf" */ + uint32_t unknown1; + uint32_t unknown2; + char last_modified[8]; + uint32_t unknown3; /* 1 */ + uint32_t unknown4; /* 3 */ + uint32_t unknown5; /* 0 */ + uint32_t unknown6; /* 1 */ + uint32_t offset; /* offset of root key record - 4KB */ + uint32_t blocks; /* size in bytes of data (filesize - 4KB) */ + uint32_t unknown7; /* 1 */ + char name[0x1fc-0x2c]; + uint32_t csum; /* checksum: sum of 32 bit words 0-0x1fb. */ +} __attribute__((__packed__)); + +struct ntreg_hbin_page { + char magic[4]; /* "hbin" */ + uint32_t offset_first; /* offset from 1st block */ + uint32_t offset_next; /* offset of next (relative to this) */ + char unknown[20]; + /* Linked list of blocks follows here. */ +} __attribute__((__packed__)); + +struct ntreg_hbin_block { + int32_t seg_len; /* length of this block (-ve for used block) */ + char id[2]; /* the block type (eg. "nk" for nk record) */ + /* Block data follows here. */ +} __attribute__((__packed__)); + +#define BLOCK_ID_EQ(h,offs,eqid) \ + (strncmp (((struct ntreg_hbin_block *)((h)->addr + (offs)))->id, (eqid), 2) == 0) + +static size_t +block_len (hive_h *h, size_t blkoff, int *used) +{ + struct ntreg_hbin_block *block; + block = (struct ntreg_hbin_block *) (h->addr + blkoff); + + int32_t len = le32toh (block->seg_len); + if (len < 0) { + if (used) *used = 1; + len = -len; + } else { + if (used) *used = 0; + } + + return (size_t) len; +} + +struct ntreg_nk_record { + int32_t seg_len; /* length (always -ve because used) */ + char id[2]; /* "nk" */ + uint16_t flags; + char timestamp[12]; + uint32_t parent; /* offset of owner/parent */ + uint32_t nr_subkeys; /* number of subkeys */ + uint32_t unknown1; + uint32_t subkey_lf; /* lf record containing list of subkeys */ + uint32_t unknown2; + uint32_t nr_values; /* number of values */ + uint32_t vallist; /* value-list record */ + uint32_t sk; /* offset of sk-record */ + uint32_t classname; /* offset of classname record */ + char 
unknown3[16]; + uint32_t unknown4; + uint16_t name_len; /* length of name */ + uint16_t classname_len; /* length of classname */ + char name[1]; /* name follows here */ +} __attribute__((__packed__)); + +struct ntreg_lf_record { + int32_t seg_len; + char id[2]; /* "lf" */ + uint16_t nr_keys; /* number of keys in this record */ + struct { + uint32_t offset; /* offset of nk-record for this subkey */ + char name[4]; /* first 4 characters of subkey name */ + } keys[1]; +} __attribute__((__packed__)); + +struct ntreg_ri_record { + int32_t seg_len; + char id[2]; /* "ri" */ + uint16_t nr_offsets; /* number of pointers to lh records */ + uint32_t offset[1]; /* list of pointers to lh records */ +} __attribute__((__packed__)); + +/* This has no ID header. */ +struct ntreg_value_list { + int32_t seg_len; + uint32_t offset[1]; /* list of pointers to vk records */ +} __attribute__((__packed__)); + +struct ntreg_vk_record { + int32_t seg_len; /* length (always -ve because used) */ + char id[2]; /* "vk" */ + uint16_t name_len; /* length of name */ + /* length of the data: + * If data_len is <= 4, then it's stored inline. + * If data_len is 0x80000000, then it's an inline dword. + * Top bit may be set or not set at random. 
+ */ + uint32_t data_len; + uint32_t data_offset; /* pointer to the data (or data if inline) */ + hive_type data_type; /* type of the data */ + uint16_t unknown1; /* possibly always 1 */ + uint16_t unknown2; + char name[1]; /* key name follows here */ +} __attribute__((__packed__)); + +hive_h * +hivex_open (const char *filename, int flags) +{ + hive_h *h = NULL; + + h = calloc (1, sizeof *h); + if (h == NULL) + goto error; + + h->msglvl = flags & HIVEX_OPEN_MSGLVL_MASK; + + const char *debug = getenv ("HIVEX_DEBUG"); + if (debug && strcmp (debug, "1") == 0) + h->msglvl = 2; + + if (h->msglvl >= 2) + printf ("hivex_open: created handle %p\n", h); + + h->fd = open (filename, O_RDONLY); + if (h->fd == -1) + goto error; + + struct stat statbuf; + if (fstat (h->fd, &statbuf) == -1) + goto error; + + h->size = statbuf.st_size; + + h->addr = mmap (NULL, h->size, PROT_READ, MAP_SHARED, h->fd, 0); + if (h->addr == MAP_FAILED) + goto error; + + if (h->msglvl >= 2) + printf ("hivex_open: mapped file at %p\n", h->addr); + + /* Check header. */ + if (h->hdr->magic[0] != 'r' || + h->hdr->magic[1] != 'e' || + h->hdr->magic[2] != 'g' || + h->hdr->magic[3] != 'f') { + fprintf (stderr, "hivex: %s: not a Windows NT Registry hive file\n", + filename); + errno = ENOTSUP; + goto error; + } + + h->bitmap = calloc (1 + h->size / 32, 1); + +#if 0 /* Doesn't work. */ + /* Header checksum. */ + uint32_t *daddr = h->addr; + size_t i; + uint32_t sum = 0; + for (i = 0; i < 0x1fc / 4; ++i) { + sum += le32toh (*daddr); + daddr++; + } + if (sum != le32toh (h->hdr->csum)) { + fprintf (stderr, "hivex: %s: bad checksum in hive header\n", filename); + errno = EINVAL; + goto error; + } +#endif + + h->rootoffs = le32toh (h->hdr->offset) + 0x1000; + + if (h->msglvl >= 2) + printf ("hivex_open: root offset = %zu\n", h->rootoffs); + + /* We'll set this flag when we see a block with the root offset (ie. + * the root block). 
+ */ + int seen_root_block = 0, bad_root_block = 0; + + /* Read the pages and blocks. The aim here is to be robust against + * corrupt or malicious registries. So we make sure the loops + * always make forward progress. We add the address of each block + * we read to a hash table so pointers will only reference the start + * of valid blocks. + */ + size_t off; + struct ntreg_hbin_page *page; + for (off = 0x1000; off < h->size; off += le32toh (page->offset_next)) { + h->pages++; + + page = (struct ntreg_hbin_page *) (h->addr + off); + if (page->magic[0] != 'h' || + page->magic[1] != 'b' || + page->magic[2] != 'i' || + page->magic[3] != 'n') { + /* This error is seemingly common in uncorrupt registry files. */ + /* + fprintf (stderr, "hivex: %s: ignoring trailing garbage at end of file (at %zu, after %zu pages)\n", + filename, off, h->pages); + */ + break; + } + + if (h->msglvl >= 2) + printf ("hivex_open: page at %zu\n", off); + + if (le32toh (page->offset_next) <= sizeof (struct ntreg_hbin_page) || + (le32toh (page->offset_next) & 3) != 0) { + fprintf (stderr, "hivex: %s: pagesize %d at %zu, bad registry\n", + filename, le32toh (page->offset_next), off); + errno = ENOTSUP; + goto error; + } + + /* Read the blocks in this page. */ + size_t blkoff; + struct ntreg_hbin_block *block; + int32_t seg_len; + for (blkoff = off + 0x20; + blkoff < off + le32toh (page->offset_next); + blkoff += seg_len) { + h->blocks++; + + int is_root = blkoff == h->rootoffs; + if (is_root) + seen_root_block = 1; + + block = (struct ntreg_hbin_block *) (h->addr + blkoff); + int used; + seg_len = block_len (h, blkoff, &used); + if (seg_len <= 4 || (seg_len & 3) != 0) { + fprintf (stderr, "hivex: %s: block size %d at %zu, bad registry\n", + filename, le32toh (block->seg_len), blkoff); + errno = ENOTSUP; + goto error; + } + + if (h->msglvl >= 2) + printf ("hivex_open: %s block id %d,%d at %zu%s\n", + used ? "used" : "free", block->id[0], block->id[1], blkoff, + is_root ? 
" (root)" : ""); + + if (is_root && !used) + bad_root_block = 1; + + if (used) { + h->used_blocks++; + h->used_size += seg_len; + + /* Root block must be an nk-block. */ + if (is_root && (block->id[0] != 'n' || block->id[1] != 'k')) + bad_root_block = 1; + + /* Note this blkoff is a valid address. */ + BITMAP_SET (h->bitmap, blkoff); + } + } + } + + if (!seen_root_block) { + fprintf (stderr, "hivex: %s: no root block found\n", filename); + errno = ENOTSUP; + goto error; + } + + if (bad_root_block) { + fprintf (stderr, "hivex: %s: bad root block (free or not nk)\n", filename); + errno = ENOTSUP; + goto error; + } + + if (h->msglvl >= 1) + printf ("hivex_open: successfully read Windows Registry hive file:\n" + " pages: %zu\n" + " blocks: %zu\n" + " blocks used: %zu\n" + " bytes used: %zu\n", + h->pages, h->blocks, h->used_blocks, h->used_size); + + return h; + + error:; + int err = errno; + if (h) { + free (h->bitmap); + if (h->addr && h->size && h->addr != MAP_FAILED) + munmap (h->addr, h->size); + if (h->fd >= 0) + close (h->fd); + free (h); + } + errno = err; + return NULL; +} + +int +hivex_close (hive_h *h) +{ + int r; + + free (h->bitmap); + munmap (h->addr, h->size); + r = close (h->fd); + free (h); + + return r; +} + +hive_node_h +hivex_root (hive_h *h) +{ + hive_node_h ret = h->rootoffs; + if (!IS_VALID_BLOCK (h, ret)) { + errno = ENOKEY; + return 0; + } + return ret; +} + +char * +hivex_node_name (hive_h *h, hive_node_h node) +{ + if (!IS_VALID_BLOCK (h, node) || !BLOCK_ID_EQ (h, node, "nk")) { + errno = EINVAL; + return NULL; + } + + struct ntreg_nk_record *nk = (struct ntreg_nk_record *) (h->addr + node); + + /* AFAIK the node name is always plain ASCII, so no conversion + * to UTF-8 is necessary. However we do need to nul-terminate + * the string. + */ + + /* nk->name_len is unsigned, 16 bit, so this is safe ... However + * we have to make sure the length doesn't exceed the block length. 
+ */ + size_t len = le16toh (nk->name_len); + size_t seg_len = block_len (h, node, NULL); + if (sizeof (struct ntreg_nk_record) + len - 1 > seg_len) { + if (h->msglvl >= 2) + printf ("hivex_node_name: returning EFAULT because node name is too long (%zu, %zu)\n", + len, seg_len); + errno = EFAULT; + return NULL; + } + + char *ret = malloc (len + 1); + if (ret == NULL) + return NULL; + memcpy (ret, nk->name, len); + ret[len] = '\0'; + return ret; +} + +#if 0 +/* I think the documentation for the sk and classname fields in the nk + * record is wrong, or else the offset field is in the wrong place. + * Otherwise this makes no sense. Disabled this for now -- it's not + * useful for reading the registry anyway. + */ + +hive_security_h +hivex_node_security (hive_h *h, hive_node_h node) +{ + if (!IS_VALID_BLOCK (h, node) || !BLOCK_ID_EQ (h, node, "nk")) { + errno = EINVAL; + return 0; + } + + struct ntreg_nk_record *nk = (struct ntreg_nk_record *) (h->addr + node); + + hive_node_h ret = le32toh (nk->sk); + ret += 0x1000; + if (!IS_VALID_BLOCK (h, ret)) { + errno = EFAULT; + return 0; + } + return ret; +} + +hive_classname_h +hivex_node_classname (hive_h *h, hive_node_h node) +{ + if (!IS_VALID_BLOCK (h, node) || !BLOCK_ID_EQ (h, node, "nk")) { + errno = EINVAL; + return 0; + } + + struct ntreg_nk_record *nk = (struct ntreg_nk_record *) (h->addr + node); + + hive_node_h ret = le32toh (nk->classname); + ret += 0x1000; + if (!IS_VALID_BLOCK (h, ret)) { + errno = EFAULT; + return 0; + } + return ret; +} +#endif + +hive_node_h * +hivex_node_children (hive_h *h, hive_node_h node) +{ + if (!IS_VALID_BLOCK (h, node) || !BLOCK_ID_EQ (h, node, "nk")) { + errno = EINVAL; + return NULL; + } + + struct ntreg_nk_record *nk = (struct ntreg_nk_record *) (h->addr + node); + + size_t nr_subkeys_in_nk = le32toh (nk->nr_subkeys); + + /* Deal with the common "no subkeys" case quickly. 
*/ + hive_node_h *ret; + if (nr_subkeys_in_nk == 0) { + ret = malloc (sizeof (hive_node_h)); + if (ret == NULL) + return NULL; + ret[0] = 0; + return ret; + } + + /* Arbitrarily limit the number of subkeys we will ever deal with. */ + if (nr_subkeys_in_nk > 1000000) { + errno = ERANGE; + return NULL; + } + + /* The subkey_lf field can point either to an lf-record, which is + * the common case, or if there are lots of subkeys, to an + * ri-record. + */ + size_t subkey_lf = le32toh (nk->subkey_lf); + subkey_lf += 0x1000; + if (!IS_VALID_BLOCK (h, subkey_lf)) { + if (h->msglvl >= 2) + printf ("hivex_node_children: returning EFAULT because subkey_lf is not a valid block (%zu)\n", + subkey_lf); + errno = EFAULT; + return NULL; + } + + struct ntreg_hbin_block *block = + (struct ntreg_hbin_block *) (h->addr + subkey_lf); + + /* Points to lf-record? (Note, also "lh" but that is basically the + * same as "lf" as far as we are concerned here). + */ + if (block->id[0] == 'l' && (block->id[1] == 'f' || block->id[1] == 'h')) { + struct ntreg_lf_record *lf = (struct ntreg_lf_record *) block; + + /* Check number of subkeys in the nk-record matches number of subkeys + * in the lf-record. + */ + size_t nr_subkeys_in_lf = le16toh (lf->nr_keys); + + if (h->msglvl >= 2) + printf ("hivex_node_children: nr_subkeys_in_nk = %zu, nr_subkeys_in_lf = %zu\n", + nr_subkeys_in_nk, nr_subkeys_in_lf); + + if (nr_subkeys_in_nk != nr_subkeys_in_lf) { + errno = ENOTSUP; + return NULL; + } + + size_t len = block_len (h, subkey_lf, NULL); + if (8 + nr_subkeys_in_lf * 8 > len) { + if (h->msglvl >= 2) + printf ("hivex_node_children: returning EFAULT because too many subkeys (%zu, %zu)\n", + nr_subkeys_in_lf, len); + errno = EFAULT; + return NULL; + } + + /* Allocate space for the returned values. Note that + * nr_subkeys_in_lf is limited to a 16 bit value. 
+ */ + ret = malloc ((1 + nr_subkeys_in_lf) * sizeof (hive_node_h)); + if (ret == NULL) + return NULL; + + size_t i; + for (i = 0; i < nr_subkeys_in_lf; ++i) { + hive_node_h subkey = lf->keys[i].offset; + subkey += 0x1000; + if (!IS_VALID_BLOCK (h, subkey)) { + if (h->msglvl >= 2) + printf ("hivex_node_children: returning EFAULT because subkey is not a valid block (%zu)\n", + subkey); + errno = EFAULT; + free (ret); + return NULL; + } + ret[i] = subkey; + } + ret[i] = 0; + return ret; + } + /* Points to ri-record? */ + else if (block->id[0] == 'r' && block->id[1] == 'i') { + struct ntreg_ri_record *ri = (struct ntreg_ri_record *) block; + + size_t nr_offsets = le16toh (ri->nr_offsets); + + /* Count total number of children. */ + size_t i, count = 0; + for (i = 0; i < nr_offsets; ++i) { + hive_node_h offset = ri->offset[i]; + offset += 0x1000; + if (!IS_VALID_BLOCK (h, offset)) { + if (h->msglvl >= 2) + printf ("hivex_node_children: returning EFAULT because ri-offset is not a valid block (%zu)\n", + offset); + errno = EFAULT; + return NULL; + } + if (!BLOCK_ID_EQ (h, offset, "lf") && !BLOCK_ID_EQ (h, offset, "lh")) { + errno = ENOTSUP; + return NULL; + } + + struct ntreg_lf_record *lf = + (struct ntreg_lf_record *) (h->addr + offset); + + count += le16toh (lf->nr_keys); + } + + if (h->msglvl >= 2) + printf ("hivex_node_children: nr_subkeys_in_nk = %zu, counted = %zu\n", + nr_subkeys_in_nk, count); + + if (nr_subkeys_in_nk != count) { + errno = ENOTSUP; + return NULL; + } + + /* Copy list of children. Note nr_subkeys_in_nk is limited to + * something reasonable above. 
+ */ + ret = malloc ((1 + nr_subkeys_in_nk) * sizeof (hive_node_h)); + if (ret == NULL) + return NULL; + + count = 0; + for (i = 0; i < nr_offsets; ++i) { + hive_node_h offset = ri->offset[i]; + offset += 0x1000; + if (!IS_VALID_BLOCK (h, offset)) { + if (h->msglvl >= 2) + printf ("hivex_node_children: returning EFAULT because ri-offset is not a valid block (%zu)\n", + offset); + errno = EFAULT; + return NULL; + } + if (!BLOCK_ID_EQ (h, offset, "lf") && !BLOCK_ID_EQ (h, offset, "lh")) { + errno = ENOTSUP; + return NULL; + } + + struct ntreg_lf_record *lf = + (struct ntreg_lf_record *) (h->addr + offset); + + size_t j; + for (j = 0; j < le16toh (lf->nr_keys); ++j) { + hive_node_h subkey = lf->keys[j].offset; + subkey += 0x1000; + if (!IS_VALID_BLOCK (h, subkey)) { + if (h->msglvl >= 2) + printf ("hivex_node_children: returning EFAULT because indirect subkey is not a valid block (%zu)\n", + subkey); + errno = EFAULT; + free (ret); + return NULL; + } + ret[count++] = subkey; + } + } + ret[count] = 0; + + return ret; + } + else { + errno = ENOTSUP; + return NULL; + } +} + +/* Very inefficient, but at least having a separate API call + * allows us to make it more efficient in future. 
+ */ +hive_node_h +hivex_node_get_child (hive_h *h, hive_node_h node, const char *nname) +{ + hive_node_h *children = NULL; + char *name = NULL; + hive_node_h ret = 0; + + children = hivex_node_children (h, node); + if (!children) goto error; + + size_t i; + for (i = 0; children[i] != 0; ++i) { + name = hivex_node_name (h, children[i]); + if (!name) goto error; + if (strcasecmp (name, nname) == 0) { + ret = children[i]; + break; + } + free (name); name = NULL; + } + + error: + free (children); + free (name); + return ret; +} + +hive_node_h +hivex_node_parent (hive_h *h, hive_node_h node) +{ + if (!IS_VALID_BLOCK (h, node) || !BLOCK_ID_EQ (h, node, "nk")) { + errno = EINVAL; + return 0; + } + + struct ntreg_nk_record *nk = (struct ntreg_nk_record *) (h->addr + node); + + hive_node_h ret = le32toh (nk->parent); + ret += 0x1000; + printf ("parent = %zu\n", ret); + if (!IS_VALID_BLOCK (h, ret)) { + if (h->msglvl >= 2) + printf ("hivex_node_parent: returning EFAULT because parent is not a valid block (%zu)\n", + ret); + errno = EFAULT; + return 0; + } + return ret; +} + +hive_value_h * +hivex_node_values (hive_h *h, hive_node_h node) +{ + if (!IS_VALID_BLOCK (h, node) || !BLOCK_ID_EQ (h, node, "nk")) { + errno = EINVAL; + return 0; + } + + struct ntreg_nk_record *nk = (struct ntreg_nk_record *) (h->addr + node); + + size_t nr_values = le32toh (nk->nr_values); + + if (h->msglvl >= 2) + printf ("hivex_node_values: nr_values = %zu\n", nr_values); + + /* Deal with the common "no values" case quickly. */ + hive_node_h *ret; + if (nr_values == 0) { + ret = malloc (sizeof (hive_node_h)); + if (ret == NULL) + return NULL; + ret[0] = 0; + return ret; + } + + /* Arbitrarily limit the number of values we will ever deal with. */ + if (nr_values > 100000) { + errno = ERANGE; + return NULL; + } + + /* Get the value list and check it looks reasonable. 
*/ + size_t vlist_offset = le32toh (nk->vallist); + vlist_offset += 0x1000; + if (!IS_VALID_BLOCK (h, vlist_offset)) { + if (h->msglvl >= 2) + printf ("hivex_node_values: returning EFAULT because value list is not a valid block (%zu)\n", + vlist_offset); + errno = EFAULT; + return NULL; + } + + struct ntreg_value_list *vlist = + (struct ntreg_value_list *) (h->addr + vlist_offset); + + size_t len = block_len (h, vlist_offset, NULL); + if (4 + nr_values * 4 > len) { + if (h->msglvl >= 2) + printf ("hivex_node_values: returning EFAULT because value list is too long (%zu, %zu)\n", + nr_values, len); + errno = EFAULT; + return NULL; + } + + /* Allocate return array and copy values in. */ + ret = malloc ((1 + nr_values) * sizeof (hive_node_h)); + if (ret == NULL) + return NULL; + + size_t i; + for (i = 0; i < nr_values; ++i) { + hive_node_h value = vlist->offset[i]; + value += 0x1000; + if (!IS_VALID_BLOCK (h, value)) { + if (h->msglvl >= 2) + printf ("hivex_node_values: returning EFAULT because value is not a valid block (%zu)\n", + value); + errno = EFAULT; + free (ret); + return NULL; + } + ret[i] = value; + } + + ret[i] = 0; + return ret; +} + +/* Very inefficient, but at least having a separate API call + * allows us to make it more efficient in future. 
+ */ +hive_value_h +hivex_node_get_value (hive_h *h, hive_node_h node, const char *key) +{ + hive_value_h *values = NULL; + char *name = NULL; + hive_value_h ret = 0; + + values = hivex_node_values (h, node); + if (!values) goto error; + + size_t i; + for (i = 0; values[i] != 0; ++i) { + name = hivex_value_key (h, values[i]); + if (!name) goto error; + if (strcasecmp (name, key) == 0) { + ret = values[i]; + break; + } + free (name); name = NULL; + } + + error: + free (values); + free (name); + return ret; +} + +char * +hivex_value_key (hive_h *h, hive_value_h value) +{ + if (!IS_VALID_BLOCK (h, value) || !BLOCK_ID_EQ (h, value, "vk")) { + errno = EINVAL; + return 0; + } + + struct ntreg_vk_record *vk = (struct ntreg_vk_record *) (h->addr + value); + + /* AFAIK the key is always plain ASCII, so no conversion to UTF-8 is + * necessary. However we do need to nul-terminate the string. + */ + + /* vk->name_len is unsigned, 16 bit, so this is safe ... However + * we have to make sure the length doesn't exceed the block length. 
+ */ + size_t len = le16toh (vk->name_len); + size_t seg_len = block_len (h, value, NULL); + if (sizeof (struct ntreg_vk_record) + len - 1 > seg_len) { + if (h->msglvl >= 2) + printf ("hivex_value_key: returning EFAULT because key length is too long (%zu, %zu)\n", + len, seg_len); + errno = EFAULT; + return NULL; + } + + char *ret = malloc (len + 1); + if (ret == NULL) + return NULL; + memcpy (ret, vk->name, len); + ret[len] = '\0'; + return ret; +} + +int +hivex_value_type (hive_h *h, hive_value_h value, hive_type *t, size_t *len) +{ + if (!IS_VALID_BLOCK (h, value) || !BLOCK_ID_EQ (h, value, "vk")) { + errno = EINVAL; + return -1; + } + + struct ntreg_vk_record *vk = (struct ntreg_vk_record *) (h->addr + value); + + if (t) + *t = le32toh (vk->data_type); + + if (len) { + *len = le32toh (vk->data_len); + if (*len == 0x80000000) { /* special case */ + *len = 4; + if (t) *t = hive_t_dword; + } + *len &= 0x7fffffff; + } + + return 0; +} + +char * +hivex_value_value (hive_h *h, hive_value_h value, + hive_type *t_rtn, size_t *len_rtn) +{ + if (!IS_VALID_BLOCK (h, value) || !BLOCK_ID_EQ (h, value, "vk")) { + errno = EINVAL; + return NULL; + } + + struct ntreg_vk_record *vk = (struct ntreg_vk_record *) (h->addr + value); + + hive_type t; + size_t len; + + t = le32toh (vk->data_type); + + len = le32toh (vk->data_len); + if (len == 0x80000000) { /* special case */ + len = 4; + t = hive_t_dword; + } + len &= 0x7fffffff; + + if (h->msglvl >= 2) + printf ("hivex_value_value: value=%zu, t=%d, len=%zu\n", + value, t, len); + + if (t_rtn) + *t_rtn = t; + if (len_rtn) + *len_rtn = len; + + /* Arbitrarily limit the length that we will read. */ + if (len > 1000000) { + errno = ERANGE; + return NULL; + } + + char *ret = malloc (len); + if (ret == NULL) + return NULL; + + /* If length is <= 4 it's always stored inline. 
*/ + if (len <= 4) { + memcpy (ret, (char *) &vk->data_offset, len); + return ret; + } + + size_t data_offset = vk->data_offset; + data_offset += 0x1000; + if (!IS_VALID_BLOCK (h, data_offset)) { + if (h->msglvl >= 2) + printf ("hivex_value_value: returning EFAULT because data offset is not a valid block (%zu)\n", + data_offset); + errno = EFAULT; + free (ret); + return NULL; + } + + /* Check that the declared size isn't larger than the block its in. */ + size_t blen = block_len (h, data_offset, NULL); + if (blen < len) { + if (h->msglvl >= 2) + printf ("hivex_value_value: returning EFAULT because data is longer than its block (%zu, %zu)\n", + blen, len); + errno = EFAULT; + free (ret); + return NULL; + } + + char *data = h->addr + data_offset + 4; + memcpy (ret, data, len); + return ret; +} + +static char * +windows_utf16_to_utf8 (/* const */ char *input, size_t len) +{ + iconv_t ic = iconv_open ("UTF-8", "UTF-16"); + if (ic == (iconv_t) -1) + return NULL; + + /* iconv(3) has an insane interface ... */ + + /* Mostly UTF-8 will be smaller, so this is a good initial guess. */ + size_t outalloc = len; + + again:; + size_t inlen = len; + size_t outlen = outalloc; + char *out = malloc (outlen + 1); + if (out == NULL) { + int err = errno; + iconv_close (ic); + errno = err; + return NULL; + } + char *inp = input; + char *outp = out; + + size_t r = iconv (ic, &inp, &inlen, &outp, &outlen); + if (r == (size_t) -1) { + if (errno == E2BIG) { + /* Try again with a larger output buffer. */ + free (out); + outalloc *= 2; + goto again; + } + else { + /* Else some conversion failure, eg. EILSEQ, EINVAL. 
*/ + int err = errno; + iconv_close (ic); + free (out); + errno = err; + return NULL; + } + } + + *outp = '\0'; + iconv_close (ic); + + return out; +} + +char * +hivex_value_string (hive_h *h, hive_value_h value) +{ + hive_type t; + size_t len; + char *data = hivex_value_value (h, value, &t, &len); + + if (data == NULL) + return NULL; + + if (t != hive_t_string && t != hive_t_expand_string && t != hive_t_link) { + free (data); + errno = EINVAL; + return NULL; + } + + char *ret = windows_utf16_to_utf8 (data, len); + free (data); + if (ret == NULL) + return NULL; + + return ret; +} + +static void +free_strings (char **argv) +{ + if (argv) { + size_t i; + + for (i = 0; argv[i] != NULL; ++i) + free (argv[i]); + free (argv); + } +} + +/* Get the length of a UTF-16 format string. Handle the string as + * pairs of bytes, looking for the first \0\0 pair. + */ +static size_t +utf16_string_len_in_bytes (const char *str) +{ + size_t ret = 0; + + while (str[0] || str[1]) { + str += 2; + ret += 2; + } + + return ret; +} + +/* http://blogs.msdn.com/oldnewthing/archive/2009/10/08/9904646.aspx */ +char ** +hivex_value_multiple_strings (hive_h *h, hive_value_h value) +{ + hive_type t; + size_t len; + char *data = hivex_value_value (h, value, &t, &len); + + if (data == NULL) + return NULL; + + if (t != hive_t_multiple_strings) { + free (data); + errno = EINVAL; + return NULL; + } + + size_t nr_strings = 0; + char **ret = malloc ((1 + nr_strings) * sizeof (char *)); + if (ret == NULL) { + free (data); + return NULL; + } + ret[0] = NULL; + + char *p = data; + size_t plen; + + while (p < data + len && (plen = utf16_string_len_in_bytes (p)) > 0) { + nr_strings++; + char **ret2 = realloc (ret, (1 + nr_strings) * sizeof (char *)); + if (ret2 == NULL) { + free_strings (ret); + free (data); + return NULL; + } + ret = ret2; + + ret[nr_strings-1] = windows_utf16_to_utf8 (p, plen); + ret[nr_strings] = NULL; + if (ret[nr_strings-1] == NULL) { + free_strings (ret); + free (data); + return NULL; 
+ } + + p += plen + 2 /* skip over UTF-16 \0\0 at the end of this string */; + } + + free (data); + return ret; +} + +int32_t +hivex_value_dword (hive_h *h, hive_value_h value) +{ + hive_type t; + size_t len; + char *data = hivex_value_value (h, value, &t, &len); + + if (data == NULL) + return -1; + + if ((t != hive_t_dword && t != hive_t_dword_be) || len != 4) { + free (data); + errno = EINVAL; + return -1; + } + + int32_t ret = *(int32_t*)data; + free (data); + if (t == hive_t_dword) /* little endian */ + ret = le32toh (ret); + else + ret = be32toh (ret); + + return ret; +} + +int64_t +hivex_value_qword (hive_h *h, hive_value_h value) +{ + hive_type t; + size_t len; + char *data = hivex_value_value (h, value, &t, &len); + + if (data == NULL) + return -1; + + if (t != hive_t_qword || len != 8) { + free (data); + errno = EINVAL; + return -1; + } + + int64_t ret = *(int64_t*)data; + free (data); + ret = le64toh (ret); /* always little endian */ + + return ret; +} + +int +hivex_visit (hive_h *h, const struct hivex_visitor *visitor, size_t len, + void *opaque, int flags) +{ + return hivex_visit_node (h, hivex_root (h), visitor, len, opaque, flags); +} + +static int hivex__visit_node (hive_h *h, hive_node_h node, const struct hivex_visitor *vtor, char *unvisited, void *opaque, int flags); + +int +hivex_visit_node (hive_h *h, hive_node_h node, + const struct hivex_visitor *visitor, size_t len, void *opaque, + int flags) +{ + struct hivex_visitor vtor; + memset (&vtor, 0, sizeof vtor); + + /* Note that len might be larger *or smaller* than the expected size. */ + size_t copysize = len <= sizeof vtor ? len : sizeof vtor; + memcpy (&vtor, visitor, copysize); + + /* This bitmap records unvisited nodes, so we don't loop if the + * registry contains cycles. 
+ */ + char *unvisited = malloc (1 + h->size / 32); + if (unvisited == NULL) + return -1; + memcpy (unvisited, h->bitmap, 1 + h->size / 32); + + int r = hivex__visit_node (h, node, &vtor, unvisited, opaque, flags); + free (unvisited); + return r; +} + +static int +hivex__visit_node (hive_h *h, hive_node_h node, + const struct hivex_visitor *vtor, char *unvisited, + void *opaque, int flags) +{ + int skip_bad = flags & HIVEX_VISIT_SKIP_BAD; + char *name = NULL; + hive_value_h *values = NULL; + hive_node_h *children = NULL; + char *key = NULL; + char *str = NULL; + char **strs = NULL; + int i; + + /* Return -1 on all callback errors. However on internal errors, + * check if skip_bad is set and suppress those errors if so. + */ + int ret = -1; + + if (!BITMAP_TST (unvisited, node)) { + if (h->msglvl >= 2) + printf ("hivex__visit_node: contains cycle: visited node %zu already\n", + node); + + errno = ELOOP; + return skip_bad ? 0 : -1; + } + BITMAP_CLR (unvisited, node); + + name = hivex_node_name (h, node); + if (!name) return skip_bad ? 0 : -1; + if (vtor->node_start && vtor->node_start (h, opaque, node, name) == -1) + goto error; + + values = hivex_node_values (h, node); + if (!values) { + ret = skip_bad ? 0 : -1; + goto error; + } + + for (i = 0; values[i] != 0; ++i) { + hive_type t; + size_t len; + + if (hivex_value_type (h, values[i], &t, &len) == -1) { + ret = skip_bad ? 0 : -1; + goto error; + } + + key = hivex_value_key (h, values[i]); + if (key == NULL) { + ret = skip_bad ? 0 : -1; + goto error; + } + + switch (t) { + case hive_t_none: + str = hivex_value_value (h, values[i], &t, &len); + if (str == NULL) { + ret = skip_bad ? 0 : -1; + goto error; + } + if (t != hive_t_none) { + ret = skip_bad ? 
0 : -1; + goto error; + } + if (vtor->value_none && + vtor->value_none (h, opaque, node, values[i], t, len, key, str) == -1) + goto error; + free (str); str = NULL; + break; + + case hive_t_string: + case hive_t_expand_string: + case hive_t_link: + str = hivex_value_string (h, values[i]); + if (str == NULL) { + if (errno != EILSEQ && errno != EINVAL) { + ret = skip_bad ? 0 : -1; + goto error; + } + if (vtor->value_string_invalid_utf16) { + str = hivex_value_value (h, values[i], &t, &len); + if (vtor->value_string_invalid_utf16 (h, opaque, node, values[i], t, len, key, str) == -1) + goto error; + free (str); str = NULL; + } + break; + } + if (vtor->value_string && + vtor->value_string (h, opaque, node, values[i], t, len, key, str) == -1) + goto error; + free (str); str = NULL; + break; + + case hive_t_dword: + case hive_t_dword_be: { + int32_t i32 = hivex_value_dword (h, values[i]); + if (vtor->value_dword && + vtor->value_dword (h, opaque, node, values[i], t, len, key, i32) == -1) + goto error; + break; + } + + case hive_t_qword: { + int64_t i64 = hivex_value_qword (h, values[i]); + if (vtor->value_qword && + vtor->value_qword (h, opaque, node, values[i], t, len, key, i64) == -1) + goto error; + break; + } + + case hive_t_binary: + str = hivex_value_value (h, values[i], &t, &len); + if (str == NULL) { + ret = skip_bad ? 0 : -1; + goto error; + } + if (t != hive_t_binary) { + ret = skip_bad ? 0 : -1; + goto error; + } + if (vtor->value_binary && + vtor->value_binary (h, opaque, node, values[i], t, len, key, str) == -1) + goto error; + free (str); str = NULL; + break; + + case hive_t_multiple_strings: + strs = hivex_value_multiple_strings (h, values[i]); + if (strs == NULL) { + if (errno != EILSEQ && errno != EINVAL) { + ret = skip_bad ? 
0 : -1; + goto error; + } + if (vtor->value_string_invalid_utf16) { + str = hivex_value_value (h, values[i], &t, &len); + if (vtor->value_string_invalid_utf16 (h, opaque, node, values[i], t, len, key, str) == -1) + goto error; + free (str); str = NULL; + } + break; + } + if (vtor->value_multiple_strings && + vtor->value_multiple_strings (h, opaque, node, values[i], t, len, key, strs) == -1) + goto error; + free_strings (strs); strs = NULL; + break; + + case hive_t_resource_list: + case hive_t_full_resource_description: + case hive_t_resource_requirements_list: + default: + str = hivex_value_value (h, values[i], &t, &len); + if (str == NULL) { + ret = skip_bad ? 0 : -1; + goto error; + } + if (vtor->value_other && + vtor->value_other (h, opaque, node, values[i], t, len, key, str) == -1) + goto error; + free (str); str = NULL; + break; + } + + free (key); key = NULL; + } + + children = hivex_node_children (h, node); + if (children == NULL) { + ret = skip_bad ? 0 : -1; + goto error; + } + + for (i = 0; children[i] != 0; ++i) { + if (h->msglvl >= 2) + printf ("hivex__visit_node: %s: visiting subkey %d (%zu)\n", + name, i, children[i]); + + if (hivex__visit_node (h, children[i], vtor, unvisited, opaque, flags) == -1) + goto error; + } + + if (vtor->node_end && vtor->node_end (h, opaque, node, name) == -1) + goto error; + + ret = 0; + + error: + free (name); + free (values); + free (children); + free (key); + free (str); + free_strings (strs); + return ret; +} diff --git a/hivex/hivex.h b/hivex/hivex.h new file mode 100644 index 0000000..14bdcc5 --- /dev/null +++ b/hivex/hivex.h @@ -0,0 +1,114 @@ +/* hivex - Windows Registry "hive" extraction library. + * Copyright (C) 2009 Red Hat Inc. + * Derived from code by Petter Nordahl-Hagen under a compatible license: + * Copyright (c) 1997-2007 Petter Nordahl-Hagen. + * Derived from code by Markus Stephany under a compatible license: + * Copyright (c)2000-2004, Markus Stephany. 
+ * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; + * version 2.1 of the License. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * See file LICENSE for the full license. + */ + +#ifndef HIVEX_H_ +#define HIVEX_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +/* NOTE: This API is documented in the man page hivex(3). */ + +typedef struct hive_h hive_h; +typedef size_t hive_node_h; +typedef size_t hive_value_h; + +enum hive_type { + /* Just a key without a value. */ + hive_t_none = 0, + + /* A UTF-16 Windows string. */ + hive_t_string = 1, + + /* A UTF-16 Windows string that contains %env% (environment variable + * substitutions). + */ + hive_t_expand_string = 2, + + /* A blob of binary. */ + hive_t_binary = 3, + + /* Two ways to encode DWORDs (32 bit words). The first is little-endian. */ + hive_t_dword = 4, + hive_t_dword_be = 5, + + /* Symbolic link, we think to another part of the registry tree. */ + hive_t_link = 6, + + /* Multiple UTF-16 Windows strings, each separated by zero byte. See: + * http://blogs.msdn.com/oldnewthing/archive/2009/10/08/9904646.aspx + */ + hive_t_multiple_strings = 7, + + /* These three are unknown. */ + hive_t_resource_list = 8, + hive_t_full_resource_description = 9, + hive_t_resource_requirements_list = 10, + + /* A QWORD (64 bit word). This is stored in the file little-endian. 
*/ + hive_t_qword = 11 +}; + +typedef enum hive_type hive_type; + +#define HIVEX_OPEN_VERBOSE 1 +#define HIVEX_OPEN_DEBUG 2 +#define HIVEX_OPEN_MSGLVL_MASK 3 + +extern hive_h *hivex_open (const char *filename, int flags); +extern int hivex_close (hive_h *h); +extern hive_node_h hivex_root (hive_h *h); +extern char *hivex_node_name (hive_h *h, hive_node_h node); +extern hive_node_h *hivex_node_children (hive_h *h, hive_node_h node); +extern hive_node_h hivex_node_get_child (hive_h *h, hive_node_h node, const char *name); +extern hive_node_h hivex_node_parent (hive_h *h, hive_node_h node); +extern hive_value_h *hivex_node_values (hive_h *h, hive_node_h node); +extern hive_value_h hivex_node_get_value (hive_h *h, hive_node_h node, const char *key); +extern char *hivex_value_key (hive_h *h, hive_value_h value); +extern int hivex_value_type (hive_h *h, hive_value_h value, hive_type *t, size_t *len); +extern char *hivex_value_value (hive_h *h, hive_value_h value, hive_type *t, size_t *len); +extern char *hivex_value_string (hive_h *h, hive_value_h value); +extern char **hivex_value_multiple_strings (hive_h *h, hive_value_h value); +extern int32_t hivex_value_dword (hive_h *h, hive_value_h value); +extern int64_t hivex_value_qword (hive_h *h, hive_value_h value); +struct hivex_visitor { + int (*node_start) (hive_h *, void *opaque, hive_node_h, const char *name); + int (*node_end) (hive_h *, void *opaque, hive_node_h, const char *name); + int (*value_string) (hive_h *, void *opaque, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, const char *str); + int (*value_multiple_strings) (hive_h *, void *opaque, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, char **argv); + int (*value_string_invalid_utf16) (hive_h *, void *opaque, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, const char *str); + int (*value_dword) (hive_h *, void *opaque, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, int32_t); 
+ int (*value_qword) (hive_h *, void *opaque, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, int64_t); + int (*value_binary) (hive_h *, void *opaque, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, const char *value); + int (*value_none) (hive_h *, void *opaque, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, const char *value); + int (*value_other) (hive_h *, void *opaque, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, const char *value); +}; + +#define HIVEX_VISIT_SKIP_BAD 1 + +extern int hivex_visit (hive_h *h, const struct hivex_visitor *visitor, size_t len, void *opaque, int flags); +extern int hivex_visit_node (hive_h *h, hive_node_h node, const struct hivex_visitor *visitor, size_t len, void *opaque, int flags); + +#ifdef __cplusplus +} +#endif + +#endif /* HIVEX_H_ */ diff --git a/hivex/hivex.pod b/hivex/hivex.pod new file mode 100644 index 0000000..1078bf1 --- /dev/null +++ b/hivex/hivex.pod @@ -0,0 +1,396 @@ +=encoding utf8 + +=head1 NAME + +hivex - Windows Registry "hive" extraction library + +=head1 SYNOPSIS + + hive_h *hivex_open (const char *filename, int flags); + int hivex_close (hive_h *h); + +=head1 DESCRIPTION + +libhivex is a library for extracting the contents of Windows Registry +"hive" files. It is designed to be secure against buggy or malicious +registry files, and to have limited functionality (writing or +modifying these files is not in the scope of this library). + +Unlike many other tools in this area, it doesn't use the textual .REG +format for output, because parsing that is as much trouble as parsing +the original binary format. Instead it makes the file available +through a C API, or there is a separate program to export the hive as +XML (see L<hivexml(1)>), or to get individual keys (see +L<hivexget(1)>). 
+ +=head2 OPENING AND CLOSING A HIVE + +=over 4 + +=item hive_h *hivex_open (const char *filename, int flags); + +Opens the hive named C<filename> for reading. + +Flags is an ORed list of the open flags (or C<0> if you don't +want to pass any flags). Currently the only +flags defined are: + +=over 4 + +=item HIVEX_OPEN_VERBOSE + +Verbose messages. + +=item HIVEX_OPEN_DEBUG + +Very verbose messages, suitable for debugging problems in the library +itself. + +This is also selected if the C<HIVEX_DEBUG> environment variable +is set to 1. + +=back + +C<hivex_open> returns a hive handle. On error this returns NULL and +sets C<errno> to indicate the error. + +=item int hivex_close (hive_h *h); + +Close a hive handle and free all associated resources. + +Returns 0 on success. On error this returns -1 and sets errno. + +=back + +=head2 NAVIGATING THE TREE OF HIVE SUBKEYS + +=over 4 + +=item hive_node_h hivex_root (hive_h *h); + +Return root node of the hive. All valid registries must contain +a root node. + +On error this returns 0 and sets errno. + +=item char *hivex_node_name (hive_h *h, hive_node_h node); + +Return the name of the node. The name is reencoded as UTF-8 +and returned as a C string. + +The string should be freed by the caller when it is no longer needed. + +Note that the name of the root node is a dummy, such as +C<$$$PROTO.HIV> (other names are possible: it seems to depend on the +tool or program that created the hive in the first place). You can +only know the "real" name of the root node by knowing which registry +file this hive originally comes from, which is knowledge that is +outside the scope of this library. + +On error this returns NULL and sets errno. + +=item hive_node_h *hivex_node_children (hive_h *h, hive_node_h node); + +Return a 0-terminated array of nodes which are the subkeys +(children) of C<node>. + +The array should be freed by the caller when it is no longer needed. + +On error this returns NULL and sets errno. 
+ +=item hive_node_h hivex_node_get_child (hive_h *h, hive_node_h node, const char *name); + +Return the child of node with the name C<name>, if it exists. + +The name is matched case insensitively. + +If the child node does not exist, this returns 0 without +setting errno. + +On error this returns 0 and sets errno. + +=item hive_node_h hivex_node_parent (hive_h *h, hive_node_h node); + +Return the parent of C<node>. + +On error this returns 0 and sets errno. + +The parent pointer of the root node in registry files that we +have examined seems to be invalid, and so this function will +return an error if called on the root node. + +=back + +=head2 GETTING VALUES AT A NODE + +The enum below describes the possible types for the value(s) +stored at each node. + + enum hive_type { + hive_t_none = 0, + hive_t_string = 1, + hive_t_expand_string = 2, + hive_t_binary = 3, + hive_t_dword = 4, + hive_t_dword_be = 5, + hive_t_link = 6, + hive_t_multiple_strings = 7, + hive_t_resource_list = 8, + hive_t_full_resource_description = 9, + hive_t_resource_requirements_list = 10, + hive_t_qword = 11 + }; + +=over 4 + +=item hive_value_h *hivex_node_values (hive_h *h, hive_node_h node); + +Return the 0-terminated array of (key, value) pairs attached to +this node. + +The array should be freed by the caller when it is no longer needed. + +On error this returns NULL and sets errno. + +=item hive_value_h hivex_node_get_value (hive_h *h, hive_node_h node, const char *key); + +Return the value attached to this node which has the name C<key>, +if it exists. + +The key name is matched case insensitively. + +Note that to get the default key, you should pass the empty +string C<""> here. The default key is often written C<"@">, but +inside hives that has no meaning and won't give you the +default key. + +If no such key exists, this returns 0 and does not set errno. + +On error this returns 0 and sets errno. 
+ +=item char *hivex_value_key (hive_h *h, hive_value_h value); + +Return the key (name) of a (key, value) pair. The name +is reencoded as UTF-8 and returned as a C string. + +The string should be freed by the caller when it is no longer needed. + +Note that this function can return a zero-length string. In the +context of Windows Registries, this means that this value is the +default key for this node in the tree. This is usually written +as C<"@">. + +On error this returns NULL and sets errno. + +=item int hivex_value_type (hive_h *h, hive_value_h value, hive_type *t, size_t *len); + +Return the data type and length of the value in this (key, value) +pair. See also C<hivex_value_value>, which returns all this +information and the value itself, and the C<hivex_value_*> functions +below, which can be used to return the value in a more useful form when +you know the type in advance. + +Returns 0 on success. On error this returns -1 and sets errno. + +=item char *hivex_value_value (hive_h *h, hive_value_h value, hive_type *t, size_t *len); + +Return the value of this (key, value) pair. The value should +be interpreted according to its type (see C<hive_type>). + +The value is returned in an array of bytes of length C<len>. + +The value should be freed by the caller when it is no longer needed. + +On error this returns NULL and sets errno. + +=item char *hivex_value_string (hive_h *h, hive_value_h value); + +If this value is a string, return the string reencoded as UTF-8 +(as a C string). This only works for values which have type +C<hive_t_string>, C<hive_t_expand_string> or C<hive_t_link>. + +The string should be freed by the caller when it is no longer needed. + +On error this returns NULL and sets errno. + +=item char **hivex_value_multiple_strings (hive_h *h, hive_value_h value); + +If this value is a multiple-string, return the strings reencoded +as UTF-8 (as a NULL-terminated array of C strings). 
This only +works for values which have type C<hive_t_multiple_strings>. + +The string array and each string in it should be freed by the +caller when they are no longer needed. + +On error this returns NULL and sets errno. + +=item int32_t hivex_value_dword (hive_h *h, hive_value_h value); + +If this value is a DWORD (Windows int32), return it. This only works +for values which have type C<hive_t_dword> or C<hive_t_dword_be>. + +=item int64_t hivex_value_qword (hive_h *h, hive_value_h value); + +If this value is a QWORD (Windows int64), return it. This only +works for values which have type C<hive_t_qword>. + +=back + +=head2 VISITING ALL NODES + +The visitor pattern is useful if you want to visit all nodes +in the tree or all nodes below a certain point in the tree. + +First you set up your own C<struct hivex_visitor> with your +callback functions. + +Each of these callback functions should return 0 on success or -1 +on error. If any callback returns -1, then the entire visit +terminates immediately. If you don't need a callback function at +all, set the function pointer to NULL. 
+ + struct hivex_visitor { + int (*node_start) (hive_h *, void *opaque, hive_node_h, const char *name); + int (*node_end) (hive_h *, void *opaque, hive_node_h, const char *name); + int (*value_string) (hive_h *, void *opaque, hive_node_h, hive_value_h, + hive_type t, size_t len, const char *key, const char *str); + int (*value_multiple_strings) (hive_h *, void *opaque, hive_node_h, + hive_value_h, hive_type t, size_t len, const char *key, char **argv); + int (*value_string_invalid_utf16) (hive_h *, void *opaque, hive_node_h, + hive_value_h, hive_type t, size_t len, const char *key, + const char *str); + int (*value_dword) (hive_h *, void *opaque, hive_node_h, hive_value_h, + hive_type t, size_t len, const char *key, int32_t); + int (*value_qword) (hive_h *, void *opaque, hive_node_h, hive_value_h, + hive_type t, size_t len, const char *key, int64_t); + int (*value_binary) (hive_h *, void *opaque, hive_node_h, hive_value_h, + hive_type t, size_t len, const char *key, const char *value); + int (*value_none) (hive_h *, void *opaque, hive_node_h, hive_value_h, + hive_type t, size_t len, const char *key, const char *value); + int (*value_other) (hive_h *, void *opaque, hive_node_h, hive_value_h, + hive_type t, size_t len, const char *key, const char *value); + }; + +=over 4 + +=item int hivex_visit (hive_h *h, const struct hivex_visitor *visitor, size_t len, void *opaque, int flags); + +Visit all the nodes recursively in the hive C<h>. + +C<visitor> should be a C<hivex_visitor> structure with callback +fields filled in as required (unwanted callbacks can be set to +NULL). C<len> must be the length of the 'visitor' struct (you +should pass C<sizeof (struct hivex_visitor)> for this). + +This returns 0 if the whole recursive visit was completed +successfully. On error this returns -1. If one of the callback +functions returned an error then we don't touch errno. If the +error was generated internally then we set errno. 
+ +You can skip bad registry entries by setting C<flags> to +C<HIVEX_VISIT_SKIP_BAD>. If this flag is not set, then a bad registry +causes the function to return an error immediately. + +This function is robust if the registry contains cycles or +pointers which are invalid or outside the registry. It detects +these cases and returns an error. + +=item int hivex_visit_node (hive_h *h, hive_node_h node, const struct hivex_visitor *visitor, size_t len, void *opaque, int flags); + +Same as C<hivex_visit> but instead of starting out at the root, this +starts at C<node>. + +=back + +=head1 THE STRUCTURE OF THE WINDOWS REGISTRY + +Note: To understand the relationship between hives and the common +Windows Registry keys (like C<HKEY_LOCAL_MACHINE>) please see the +Wikipedia page on the Windows Registry. + +The Windows Registry is split across various binary files, each +file being known as a "hive". This library only handles a single +hive file at a time. + +Hives are n-ary trees with a single root. Each node in the tree +has a name. + +Each node in the tree (including non-leaf nodes) may have an +arbitrary list of (key, value) pairs attached to it. It may +be the case that one of these pairs has an empty key. This +is referred to as the default key for the node. + +The (key, value) pairs are the place where the useful data is +stored in the registry. The key is always a string (possibly the +empty string for the default key). The value is a typed object +(eg. string, int32, binary, etc.). + +=head1 NOTE ON THE USE OF ERRNO + +Many functions in this library set errno to indicate errors. +These are the values of errno you may encounter: + +=over 4 + +=item ENOTSUP + +Corrupt or unsupported Registry file format. + +=item ENOKEY + +Missing root key. + +=item EINVAL + +Passed an invalid argument to the function. + +=item EFAULT + +Followed a Registry pointer which goes outside +the registry or outside a registry block. + +=item ELOOP + +Registry contains cycles. 
+ +=item ERANGE + +Field in the registry out of range. + +=back + +=head1 SEE ALSO + +L<hivexml(1)>, +L<hivexget(1)>, +L<virt-win-reg(1)>, +L<guestfs(3)>, +L<http://libguestfs.org/>, +L<virt-cat(1)>, +L<virt-edit(1)>. + +=head1 AUTHORS + +Richard W.M. Jones (C<rjones at redhat dot com>) + +=head1 COPYRIGHT + +Copyright (C) 2009 Red Hat Inc. + +Derived from code by Petter Nordahl-Hagen under a compatible license: +Copyright (c) 1997-2007 Petter Nordahl-Hagen. + +Derived from code by Markus Stephany under a compatible license: +Copyright (c)2000-2004, Markus Stephany. + +This library is free software; you can redistribute it and/or +modify it under the terms of the GNU Lesser General Public +License as published by the Free Software Foundation; +version 2.1 of the License. + +This library is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +Lesser General Public License for more details. + +See file LICENSE for the full license. diff --git a/hivex/hivexget.c b/hivex/hivexget.c new file mode 100644 index 0000000..9bb6bbb --- /dev/null +++ b/hivex/hivexget.c @@ -0,0 +1,266 @@ +/* hivexget - Get single subkeys or values from a hive. + * Copyright (C) 2009 Red Hat Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <stdint.h> +#include <inttypes.h> +#include <errno.h> + +#include "hivex.h" + +int +main (int argc, char *argv[]) +{ + if (argc < 3 || argc > 4) { + fprintf (stderr, "hivexget regfile path [key]\n"); + exit (1); + } + char *file = argv[1]; + char *path = argv[2]; + char *key = argv[3]; /* could be NULL */ + + if (path[0] != '\\') { + fprintf (stderr, "hivexget: path must start with a \\ character\n"); + exit (1); + } + if (path[1] == '\\') { + doubled: + fprintf (stderr, "hivexget: %s: \\ characters in path are doubled - are you escaping the path parameter correctly?\n", path); + exit (1); + } + + hive_h *h = hivex_open (file, 0); + if (h == NULL) { + error: + perror (file); + exit (1); + } + + /* Navigate to the desired node. */ + hive_node_h node = hivex_root (h); + if (!node) + goto error; + + char *p = path+1, *pnext; + size_t len; + while (*p) { + len = strcspn (p, "\\"); + + if (len == 0) + goto doubled; + + if (p[len] == '\\') { + p[len] = '\0'; + pnext = p + len + 1; + } else + pnext = p + len; + + errno = 0; + node = hivex_node_get_child (h, node, p); + if (node == 0) { + if (errno) + goto error; + /* else node not found */ + fprintf (stderr, "hivexget: %s: %s: path element not found\n", + path, p); + exit (2); + } + + p = pnext; + } + + /* Get the desired key, or print all keys. 
*/ + if (key) { + hive_value_h value; + + errno = 0; + if (key[0] == '@' && key[1] == '\0') /* default key written as "@" */ + value = hivex_node_get_value (h, node, ""); + else + value = hivex_node_get_value (h, node, key); + + if (value == 0) { + if (errno) + goto error; + /* else key not found */ + fprintf (stderr, "hivexget: %s: key not found\n", key); + exit (2); + } + + /* Print the value. */ + hive_type t; + size_t len; + if (hivex_value_type (h, value, &t, &len) == -1) + goto error; + + switch (t) { + case hive_t_string: + case hive_t_expand_string: + case hive_t_link: { + char *str = hivex_value_string (h, value); + if (!str) + goto error; + + puts (str); /* note: this adds a single \n character */ + free (str); + break; + } + + case hive_t_dword: + case hive_t_dword_be: { + int32_t j = hivex_value_dword (h, value); + printf ("%" PRIi32 "\n", j); + break; + } + + case hive_t_qword: { + int64_t j = hivex_value_qword (h, value); + printf ("%" PRIi64 "\n", j); + break; + } + + case hive_t_multiple_strings: { + char **strs = hivex_value_multiple_strings (h, value); + if (!strs) + goto error; + size_t j; + for (j = 0; strs[j] != NULL; ++j) { + puts (strs[j]); + free (strs[j]); + } + free (strs); + break; + } + + case hive_t_none: + case hive_t_binary: + case hive_t_resource_list: + case hive_t_full_resource_description: + case hive_t_resource_requirements_list: + default: { + char *data = hivex_value_value (h, value, &t, &len); + if (!data) + goto error; + + if (fwrite (data, 1, len, stdout) != len) + goto error; + + free (data); + break; + } + } /* switch */ + } else { + /* No key specified, so print all keys in this node. We do this + * in a format which looks like the output of regedit, although + * this isn't a particularly useful format. 
+ */ + hive_value_h *values; + + values = hivex_node_values (h, node); + if (values == NULL) + goto error; + + size_t i; + for (i = 0; values[i] != 0; ++i) { + char *key = hivex_value_key (h, values[i]); + if (!key) goto error; + + if (*key) { + putchar ('"'); + size_t j; + for (j = 0; key[j] != 0; ++j) { + if (key[j] == '"' || key[j] == '\\') + putchar ('\\'); + putchar (key[j]); + } + putchar ('"'); + } else + printf ("\"@\""); /* default key in regedit files */ + putchar ('='); + free (key); + + hive_type t; + size_t len; + if (hivex_value_type (h, values[i], &t, &len) == -1) + goto error; + + switch (t) { + case hive_t_string: + case hive_t_expand_string: + case hive_t_link: { + char *str = hivex_value_string (h, values[i]); + if (!str) + goto error; + + if (t != hive_t_string) + printf ("str(%d):", t); + putchar ('"'); + size_t j; + for (j = 0; str[j] != 0; ++j) { + if (str[j] == '"' || str[j] == '\\') + putchar ('\\'); + putchar (str[j]); + } + putchar ('"'); + free (str); + break; + } + + case hive_t_dword: + case hive_t_dword_be: { + int32_t j = hivex_value_dword (h, values[i]); + printf ("dword:%08" PRIx32, j); + break; + } + + case hive_t_qword: /* sic */ + case hive_t_none: + case hive_t_binary: + case hive_t_multiple_strings: + case hive_t_resource_list: + case hive_t_full_resource_description: + case hive_t_resource_requirements_list: + default: { + char *data = hivex_value_value (h, values[i], &t, &len); + if (!data) + goto error; + + printf ("hex(%d):", t); + size_t j; + for (j = 0; j < len; ++j) { + if (j > 0) + putchar (','); + printf ("%02x", (unsigned char) data[j]); + } + break; + } + } /* switch */ + + putchar ('\n'); + } /* for */ + + free (values); + } + + if (hivex_close (h) == -1) + goto error; + + exit (0); +} diff --git a/hivex/hivexget.pod b/hivex/hivexget.pod new file mode 100644 index 0000000..fa390e0 --- /dev/null +++ b/hivex/hivexget.pod @@ -0,0 +1,94 @@ +=encoding utf8 + +=head1 NAME + +hivexget - Get subkey from a Windows Registry binary 
"hive" file + +=head1 SYNOPSIS + + hivexget hivefile '\Path\To\SubKey' + + hivexget hivefile '\Path\To\SubKey' name + +=head1 DESCRIPTION + +I<Note:> This is a low-level tool. For a more convenient way to +navigate the Windows Registry in Windows virtual machines, see +L<virt-win-reg(1)>. + +This program navigates through a Windows Registry binary "hive" +file and extracts I<either> all the (key, value) data pairs +stored in that subkey I<or> just the single named data item. + +In the first form: + + hivexget hivefile '\Path\To\SubKey' + +C<hivefile> is some Windows Registry binary hive, and C<\Path\To\Subkey> +is a path within that hive. I<NB> the path is relative to the top +of this hive, and is I<not> the full path as you would use in Windows +(eg. C<\HKEY_LOCAL_MACHINE> is not a valid path). + +If the subkey exists, then the output lists all data pairs under this +subkey, in a format compatible with C<regedit> in Windows. + +In the second form: + + hivexget hivefile '\Path\To\SubKey' name + +C<hivefile> and path are as above. C<name> is the name of the value +of interest (use C<@> for the default value). + +The corresponding data item is printed "raw" (ie. no processing or +escaping) except: + +=over 4 + +=item 1 + +If it's a string we will convert it from Windows UTF-16 to UTF-8, if +this conversion is possible. The string is printed with a single +trailing newline. + +=item 2 + +If it's a multiple-string value, each string is printed on a separate +line. + +=item 3 + +If it's a numeric value, it is printed as a decimal number. + +=back + +=head1 SEE ALSO + +L<hivex(3)>, +L<hivexml(1)>, +L<virt-win-reg(1)>, +L<guestfs(3)>, +L<http://libguestfs.org/>, +L<virt-cat(1)>, +L<virt-edit(1)>. + +=head1 AUTHORS + +Richard W.M. Jones (C<rjones at redhat dot com>) + +=head1 COPYRIGHT + +Copyright (C) 2009 Red Hat Inc. 
+ +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License along +with this program; if not, write to the Free Software Foundation, Inc., +51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. diff --git a/hivex/hivexml.c b/hivex/hivexml.c new file mode 100644 index 0000000..af3de9e --- /dev/null +++ b/hivex/hivexml.c @@ -0,0 +1,330 @@ +/* hivexml - Convert Windows Registry "hive" to XML file. + * Copyright (C) 2009 Red Hat Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <stdint.h> +#include <inttypes.h> +#include <unistd.h> +#include <errno.h> + +#include <libxml/xmlwriter.h> + +#include "hivex.h" + +/* Callback functions. 
*/ +static int node_start (hive_h *, void *, hive_node_h, const char *name); +static int node_end (hive_h *, void *, hive_node_h, const char *name); +static int value_string (hive_h *, void *, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, const char *str); +static int value_multiple_strings (hive_h *, void *, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, char **argv); +static int value_string_invalid_utf16 (hive_h *, void *, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, const char *str); +static int value_dword (hive_h *, void *, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, int32_t); +static int value_qword (hive_h *, void *, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, int64_t); +static int value_binary (hive_h *, void *, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, const char *value); +static int value_none (hive_h *, void *, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, const char *value); +static int value_other (hive_h *, void *, hive_node_h, hive_value_h, hive_type t, size_t len, const char *key, const char *value); + +static struct hivex_visitor visitor = { + .node_start = node_start, + .node_end = node_end, + .value_string = value_string, + .value_multiple_strings = value_multiple_strings, + .value_string_invalid_utf16 = value_string_invalid_utf16, + .value_dword = value_dword, + .value_qword = value_qword, + .value_binary = value_binary, + .value_none = value_none, + .value_other = value_other +}; + +#define XML_CHECK(proc, args) \ + do { \ + if ((proc args) == -1) { \ + fprintf (stderr, "%s: failed to write XML document\n", #proc); \ + exit (1); \ + } \ + } while (0) + +int +main (int argc, char *argv[]) +{ + int c; + int open_flags = 0; + int visit_flags = 0; + + while ((c = getopt (argc, argv, "dk")) != EOF) { + switch (c) { + case 'd': + open_flags |= HIVEX_OPEN_DEBUG; + break; + case 'k': + 
visit_flags |= HIVEX_VISIT_SKIP_BAD; + break; + default: + fprintf (stderr, "hivexml [-dk] regfile > output.xml\n"); + exit (1); + } + } + + if (optind + 1 != argc) { + fprintf (stderr, "hivexml: missing name of input file\n"); + exit (1); + } + + hive_h *h = hivex_open (argv[optind], open_flags); + if (h == NULL) { + perror (argv[optind]); + exit (1); + } + + /* Note both this macro, and xmlTextWriterStartDocument leak memory. There + * doesn't seem to be any way to recover that memory, but it's not a + * large amount. + */ + LIBXML_TEST_VERSION; + + xmlTextWriterPtr writer; + writer = xmlNewTextWriterFilename ("/dev/stdout", 0); + if (writer == NULL) { + fprintf (stderr, "xmlNewTextWriterFilename: failed to create XML writer\n"); + exit (1); + } + + XML_CHECK (xmlTextWriterStartDocument, (writer, NULL, "utf-8", NULL)); + XML_CHECK (xmlTextWriterStartElement, (writer, BAD_CAST "hive")); + + if (hivex_visit (h, &visitor, sizeof visitor, writer, visit_flags) == -1) { + perror (argv[optind]); + exit (1); + } + + if (hivex_close (h) == -1) { + perror (argv[optind]); + exit (1); + } + + XML_CHECK (xmlTextWriterEndElement, (writer)); + XML_CHECK (xmlTextWriterEndDocument, (writer)); + xmlFreeTextWriter (writer); + + exit (0); +} + +static int +node_start (hive_h *h, void *writer_v, hive_node_h node, const char *name) +{ + xmlTextWriterPtr writer = (xmlTextWriterPtr) writer_v; + XML_CHECK (xmlTextWriterStartElement, (writer, BAD_CAST "node")); + XML_CHECK (xmlTextWriterWriteAttribute, (writer, BAD_CAST "name", BAD_CAST name)); + return 0; +} + +static int +node_end (hive_h *h, void *writer_v, hive_node_h node, const char *name) +{ + xmlTextWriterPtr writer = (xmlTextWriterPtr) writer_v; + XML_CHECK (xmlTextWriterEndElement, (writer)); + return 0; +} + +static void +start_value (xmlTextWriterPtr writer, + const char *key, const char *type, const char *encoding) +{ + XML_CHECK (xmlTextWriterStartElement, (writer, BAD_CAST "value")); + XML_CHECK 
(xmlTextWriterWriteAttribute, (writer, BAD_CAST "type", BAD_CAST type)); + if (encoding) + XML_CHECK (xmlTextWriterWriteAttribute, (writer, BAD_CAST "encoding", BAD_CAST encoding)); + if (*key) + XML_CHECK (xmlTextWriterWriteAttribute, (writer, BAD_CAST "key", BAD_CAST key)); + else /* default key */ + XML_CHECK (xmlTextWriterWriteAttribute, (writer, BAD_CAST "default", BAD_CAST "1")); +} + +static void +end_value (xmlTextWriterPtr writer) +{ + XML_CHECK (xmlTextWriterEndElement, (writer)); +} + +static int +value_string (hive_h *h, void *writer_v, hive_node_h node, hive_value_h value, + hive_type t, size_t len, const char *key, const char *str) +{ + xmlTextWriterPtr writer = (xmlTextWriterPtr) writer_v; + const char *type; + + switch (t) { + case hive_t_string: type = "string"; break; + case hive_t_expand_string: type = "expand"; break; + case hive_t_link: type = "link"; break; + + case hive_t_none: + case hive_t_binary: + case hive_t_dword: + case hive_t_dword_be: + case hive_t_multiple_strings: + case hive_t_resource_list: + case hive_t_full_resource_description: + case hive_t_resource_requirements_list: + case hive_t_qword: + abort (); /* internal error - should not happen */ + + default: + type = "unknown"; + } + + start_value (writer, key, type, NULL); + XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST str)); + end_value (writer); + return 0; +} + +static int +value_multiple_strings (hive_h *h, void *writer_v, hive_node_h node, + hive_value_h value, hive_type t, size_t len, + const char *key, char **argv) +{ + xmlTextWriterPtr writer = (xmlTextWriterPtr) writer_v; + start_value (writer, key, "string-list", NULL); + + size_t i; + for (i = 0; argv[i] != NULL; ++i) { + XML_CHECK (xmlTextWriterStartElement, (writer, BAD_CAST "string")); + XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST argv[i])); + XML_CHECK (xmlTextWriterEndElement, (writer)); + } + + end_value (writer); + return 0; +} + +static int +value_string_invalid_utf16 (hive_h *h, void 
*writer_v, hive_node_h node, + hive_value_h value, hive_type t, size_t len, + const char *key, + const char *str /* original data */) +{ + xmlTextWriterPtr writer = (xmlTextWriterPtr) writer_v; + const char *type; + + switch (t) { + case hive_t_string: type = "bad-string"; break; + case hive_t_expand_string: type = "bad-expand"; break; + case hive_t_link: type = "bad-link"; break; + case hive_t_multiple_strings: type = "bad-string-list"; break; + + case hive_t_none: + case hive_t_binary: + case hive_t_dword: + case hive_t_dword_be: + case hive_t_resource_list: + case hive_t_full_resource_description: + case hive_t_resource_requirements_list: + case hive_t_qword: + abort (); /* internal error - should not happen */ + + default: + type = "unknown"; + } + + start_value (writer, key, type, "base64"); + XML_CHECK (xmlTextWriterWriteBase64, (writer, str, 0, len)); + end_value (writer); + + return 0; +} + +static int +value_dword (hive_h *h, void *writer_v, hive_node_h node, hive_value_h value, + hive_type t, size_t len, const char *key, int32_t v) +{ + xmlTextWriterPtr writer = (xmlTextWriterPtr) writer_v; + start_value (writer, key, "int32", NULL); + XML_CHECK (xmlTextWriterWriteFormatString, (writer, "%" PRIi32, v)); + end_value (writer); + return 0; +} + +static int +value_qword (hive_h *h, void *writer_v, hive_node_h node, hive_value_h value, + hive_type t, size_t len, const char *key, int64_t v) +{ + xmlTextWriterPtr writer = (xmlTextWriterPtr) writer_v; + start_value (writer, key, "int64", NULL); + XML_CHECK (xmlTextWriterWriteFormatString, (writer, "%" PRIi64, v)); + end_value (writer); + return 0; +} + +static int +value_binary (hive_h *h, void *writer_v, hive_node_h node, hive_value_h value, + hive_type t, size_t len, const char *key, const char *v) +{ + xmlTextWriterPtr writer = (xmlTextWriterPtr) writer_v; + start_value (writer, key, "binary", "base64"); + XML_CHECK (xmlTextWriterWriteBase64, (writer, v, 0, len)); + end_value (writer); + return 0; +} + +static 
int +value_none (hive_h *h, void *writer_v, hive_node_h node, hive_value_h value, + hive_type t, size_t len, const char *key, const char *v) +{ + xmlTextWriterPtr writer = (xmlTextWriterPtr) writer_v; + start_value (writer, key, "none", "base64"); + if (len > 0) XML_CHECK (xmlTextWriterWriteBase64, (writer, v, 0, len)); + end_value (writer); + return 0; +} + +static int +value_other (hive_h *h, void *writer_v, hive_node_h node, hive_value_h value, + hive_type t, size_t len, const char *key, const char *v) +{ + xmlTextWriterPtr writer = (xmlTextWriterPtr) writer_v; + const char *type; + + switch (t) { + case hive_t_none: + case hive_t_binary: + case hive_t_dword: + case hive_t_dword_be: + case hive_t_qword: + case hive_t_string: + case hive_t_expand_string: + case hive_t_link: + case hive_t_multiple_strings: + abort (); /* internal error - should not happen */ + + case hive_t_resource_list: type = "resource-list"; break; + case hive_t_full_resource_description: type = "resource-description"; break; + case hive_t_resource_requirements_list: type = "resource-requirements"; break; + + default: + type = "unknown"; + } + + start_value (writer, key, type, "base64"); + if (len > 0) XML_CHECK (xmlTextWriterWriteBase64, (writer, v, 0, len)); + end_value (writer); + + return 0; +} diff --git a/hivex/hivexml.pod b/hivex/hivexml.pod new file mode 100644 index 0000000..448c4f6 --- /dev/null +++ b/hivex/hivexml.pod @@ -0,0 +1,64 @@ +=encoding utf8 + +=head1 NAME + +hivexml - Convert Windows Registry binary "hive" into XML + +=head1 SYNOPSIS + + hivexml [-dk] hivefile > output.xml + +=head1 DESCRIPTION + +This program converts a single Windows Registry binary "hive" +file into a self-describing XML format. + +=head1 OPTIONS + +=over 4 + +=item B<-d> + +Enable lots of debug messages. If you find a Registry file +that this program cannot parse, please enable this option and +post the complete output I<and> the Registry file in your +bug report. 
+ +=item B<-k> + +Keep going even if we find errors in the Registry file. This +skips over any parts of the Registry that we cannot read. + +=back + +=head1 SEE ALSO + +L<hivex(3)>, +L<hivexget(1)>, +L<virt-win-reg(1)>, +L<guestfs(3)>, +L<http://libguestfs.org/>, +L<virt-cat(1)>, +L<virt-edit(1)>. + +=head1 AUTHORS + +Richard W.M. Jones (C<rjones at redhat dot com>) + +=head1 COPYRIGHT + +Copyright (C) 2009 Red Hat Inc. + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License along +with this program; if not, write to the Free Software Foundation, Inc., +51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. diff --git a/po/POTFILES.in b/po/POTFILES.in index a70072b..a4a6cd1 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -69,6 +69,9 @@ fish/rc.c fish/reopen.c fish/tilde.c fish/time.c +hivex/hivex.c +hivex/hivexget.c +hivex/hivexml.c inspector/virt-inspector java/com_redhat_et_libguestfs_GuestFS.c ocaml/guestfs_c.c diff --git a/tools/Makefile.am b/tools/Makefile.am index f593d46..f48edae 100644 --- a/tools/Makefile.am +++ b/tools/Makefile.am @@ -15,7 +15,7 @@ # along with this program; if not, write to the Free Software # Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
-tools = cat df edit ls rescue tar +tools = cat df edit ls rescue tar win-reg EXTRA_DIST = \ run-locally \ diff --git a/tools/run-locally b/tools/run-locally index 0bf1c0a..eb55d18 100755 --- a/tools/run-locally +++ b/tools/run-locally @@ -48,9 +48,10 @@ while(-l $path) { # Get the absolute path of the parent directory $path = abs_path(dirname($path).'/..'); +$ENV{PATH} = $path.'/hivex:'.$ENV{PATH}; $ENV{LD_LIBRARY_PATH} = $path.'/src/.libs'; $ENV{LIBGUESTFS_PATH} = $path.'/appliance'; $ENV{PERL5LIB} = $path.'/perl/blib/lib:'.$path.'/perl/blib/arch'; -print (join " ", ("$path/tools/virt-$tool", @ARGV), "\n"); +#print (join " ", ("$path/tools/virt-$tool", @ARGV), "\n"); exec('perl', "$path/tools/virt-$tool", @ARGV); diff --git a/tools/virt-win-reg b/tools/virt-win-reg new file mode 100755 index 0000000..10f4872 --- /dev/null +++ b/tools/virt-win-reg @@ -0,0 +1,300 @@ +#!/usr/bin/perl -w +# virt-win-reg +# Copyright (C) 2009 Red Hat Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ +use warnings; +use strict; + +use Sys::Guestfs; +use Sys::Guestfs::Lib qw(open_guest get_partitions resolve_windows_path + inspect_all_partitions inspect_partition + inspect_operating_systems mount_operating_system); +use Pod::Usage; +use Getopt::Long; +use File::Temp qw/tempdir/; +use Locale::TextDomain 'libguestfs'; + +=encoding utf8 + +=head1 NAME + +virt-win-reg - Display Windows Registry entries from a Windows guest + +=head1 SYNOPSIS + + virt-win-reg [--options] domname '\Path\To\Subkey' name ['\Path'...] + + virt-win-reg [--options] domname '\Path\To\Subkey' @ ['\Path'...] + + virt-win-reg [--options] domname '\Path\To\Subkey' ['\Path'...] + + virt-win-reg [--options] disk.img [...] '\Path\To\Subkey' (name|@) + +=head1 DESCRIPTION + +This program can display Windows Registry entries from a Windows +guest. + +The first parameter is the libvirt guest name or the raw disk image of +the Windows guest. + +Then follow one or more sets of path specifiers. The path must begin +with a C<\> (backslash) character, and may be something like +C<'\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion'>. + +The next parameter after that is either a value name, the single +at-character C<@>, or missing. + +If it's a value name, then we print the data associated with that +value. If it's C<@>, then we print the default data associated with +the subkey. If it's missing, then we print all the data associated +with the subkey. + +If this is confusing, look at the L</EXAMPLES> section below. + +Usually you should use single quotes to protect backslashes in the +path from the shell. + +Paths and value names are case-insensitive. + +=head2 SUPPORTED SYSTEMS + +The program currently supports Windows NT-derived guests starting with +Windows XP through to at least Windows 7. + +Registry support is done for C<\HKEY_LOCAL_MACHINE\SAM>, +C<\HKEY_LOCAL_MACHINE\SECURITY>, C<\HKEY_LOCAL_MACHINE\SOFTWARE>, +C<\HKEY_LOCAL_MACHINE\SYSTEM> and C<\HKEY_USERS\.DEFAULT>. 
+ +C<\HKEY_USERS\$SID> and C<\HKEY_CURRENT_USER> are B<not> supported at +this time. + +=head2 NOTES + +This program is only meant for simple access to the registry. If you +want to do complicated things with the registry, we suggest you +download the Registry hive files from the guest using C<libguestfs(3)> +or C<guestfish(1)> and access them locally, eg. using C<hivex(3)>, +C<hivexml(1)> or C<reged(1)>. + +=head1 EXAMPLES + + $ virt-win-reg MyWinGuest \ + '\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion' \ + ProductName + Microsoft Windows Server 2003 + + + + + + +=head1 OPTIONS + +=over 4 + +=cut + +my $help; + +=item B<--help> + +Display brief help. + +=cut + +my $version; + +=item B<--version> + +Display version number and exit. + +=cut + +my $uri; + +=item B<--connect URI> | B<-c URI> + +If using libvirt, connect to the given I<URI>. If omitted, then we +connect to the default libvirt hypervisor. + +If you specify guest block devices directly, then libvirt is not used +at all. + +=back + +=cut + +GetOptions ("help|?" => \$help, + "version" => \$version, + "connect|c=s" => \$uri, + ) or pod2usage (2); +pod2usage (1) if $help; +if ($version) { + my $g = Sys::Guestfs->new (); + my %h = $g->version (); + print "$h{major}.$h{minor}.$h{release}$h{extra}\n"; + exit +} + +# Split the command line at the first path. Paths begin with +# backslash so this is predictable. + +my @lib_args; +my $i; + +for ($i = 0; $i < @ARGV; ++$i) { + if (substr ($ARGV[$i], 0, 1) eq "\\") { + @lib_args = @ARGV[0 .. ($i-1)]; + @ARGV = @ARGV[$i .. $#ARGV]; + last; + } +} + +pod2usage (__"virt-win-reg: no VM name, disk images or Registry path given") if 0 == @lib_args; + +my $g; +if ($uri) { + $g = open_guest (\@lib_args, address => $uri); +} else { + $g = open_guest (\@lib_args); +} + +$g->launch (); + +# List of possible filesystems. +my @partitions = get_partitions ($g); + +# Now query each one to build up a picture of what's in it. 
+my %fses = + inspect_all_partitions ($g, \@partitions, + use_windows_registry => 0); + +my $oses = inspect_operating_systems ($g, \%fses); + +my @roots = keys %$oses; +die __"no root device found in this operating system image" if @roots == 0; +die __"multiboot operating systems are not supported by virt-win-reg" if @roots > 1; +my $root_dev = $roots[0]; + +my $os = $oses->{$root_dev}; +mount_operating_system ($g, $os); + +# Create a working directory to store the downloaded registry files. +my $tmpdir = tempdir (CLEANUP => 1); + +# Now process each request in turn. +my $winfile; +my $localhive; +my $path; + +for ($i = 0; $i < @ARGV; ++$i) { + $_ = $ARGV[$i]; + + if (/^\\HKEY_LOCAL_MACHINE\\SAM(\\.*)/i) { + $winfile = "/windows/system32/config/sam"; + $localhive = "$tmpdir/sam"; + $path = $1; + } + elsif (/^\\HKEY_LOCAL_MACHINE\\SECURITY(\\.*)/i) { + $winfile = "/windows/system32/config/security"; + $localhive = "$tmpdir/security"; + $path = $1; + } + elsif (/^\\HKEY_LOCAL_MACHINE\\SOFTWARE(\\.*)/i) { + $winfile = "/windows/system32/config/software"; + $localhive = "$tmpdir/software"; + $path = $1; + } + elsif (/^\\HKEY_LOCAL_MACHINE\\SYSTEM(\\.*)/i) { + $winfile = "/windows/system32/config/system"; + $localhive = "$tmpdir/system"; + $path = $1; + } + elsif (/^\\HKEY_USERS\\.DEFAULT(\\.*)/i) { + $winfile = "/windows/system32/config/default"; + $localhive = "$tmpdir/default"; + $path = $1; + } + else { + die "virt-win-reg: $_: not a supported Windows Registry path\n" + } + + unless (-f $localhive) { + # Check the hive file exists and get the real name. + eval { + $winfile = $g->case_sensitive_path ($winfile); + $g->download ($winfile, $localhive); + }; + if ($@) { + die "virt-win-reg: $winfile: could not download registry file: $@\n" + } + } + + # What sort of request is it? Peek at the next arg. 
+ my $name; # will be: undefined, @ or a name + if ($i+1 < @ARGV) { + if (substr ($ARGV[$i+1], 0, 1) ne "\\") { + $name = $ARGV[$i+1]; + $i++; + } + } + + my @cmd; + if (defined $name) { + @cmd = ("hivexget", $localhive, $path, $name); + } else { + @cmd = ("hivexget", $localhive, $path); + } + + system (@cmd) == 0 + or die "hivexget command failed: $?\n"; +} + +=head1 SEE ALSO + +L<hivex(3)>, +L<hivexget(1)>, +L<guestfs(3)>, +L<guestfish(1)>, +L<virt-cat(1)>, +L<Sys::Guestfs(3)>, +L<Sys::Guestfs::Lib(3)>, +L<Sys::Virt(3)>, +L<http://libguestfs.org/>. + +=head1 AUTHOR + +Richard W.M. Jones L<http://et.redhat.com/~rjones/> + +=head1 COPYRIGHT + +Copyright (C) 2009 Red Hat Inc. + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with this program; if not, write to the Free Software +Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. -- 1.6.5.rc2
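For anyone reading the patch who wants the path handling in virt-win-reg spelled out: the Perl loop above takes a full Windows path, picks the hive file inside the guest, and passes only the remainder of the path to hivexget. Below is a rough C sketch of that same lookup. It is illustration only and not part of the patch; the names `struct hive_mapping`, `hive_mappings` and `map_registry_path` are made up here, and the table simply mirrors the five prefixes the script recognises.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper, not in the patch: one row per supported hive. */
struct hive_mapping {
  const char *prefix;      /* leading part of the full Windows path */
  const char *guest_file;  /* hive file inside the guest filesystem */
};

static const struct hive_mapping hive_mappings[] = {
  { "\\HKEY_LOCAL_MACHINE\\SAM",      "/windows/system32/config/sam" },
  { "\\HKEY_LOCAL_MACHINE\\SECURITY", "/windows/system32/config/security" },
  { "\\HKEY_LOCAL_MACHINE\\SOFTWARE", "/windows/system32/config/software" },
  { "\\HKEY_LOCAL_MACHINE\\SYSTEM",   "/windows/system32/config/system" },
  { "\\HKEY_USERS\\.DEFAULT",         "/windows/system32/config/default" },
};

/* Return the guest hive file for full_path, or NULL if the path is not
 * one of the supported roots.  On success *subkey points at the part of
 * full_path that is relative to the top of the hive (it starts with a
 * backslash), which is the form hivexget expects.
 */
const char *
map_registry_path (const char *full_path, const char **subkey)
{
  size_t i;

  assert (full_path != NULL && subkey != NULL);

  for (i = 0; i < sizeof hive_mappings / sizeof hive_mappings[0]; ++i) {
    size_t n = strlen (hive_mappings[i].prefix);

    /* Case-insensitive prefix match, and the prefix must be followed
     * by a backslash, as in the script's regular expressions.
     */
    if (strncasecmp (full_path, hive_mappings[i].prefix, n) == 0 &&
        full_path[n] == '\\') {
      *subkey = full_path + n;
      return hive_mappings[i].guest_file;
    }
  }

  return NULL;
}
```

With this sketch, `\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft` maps to the `software` hive with subkey `\Microsoft`, while `\HKEY_CURRENT_USER\...` returns NULL, matching the "not a supported Windows Registry path" case in the script.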