On Tue, Jan 31, 2012 at 12:21:44PM -0500, Todd Mummert
wrote:> I've made some changes to hivexml.c, that I think addresses the
> invalid XML that is being generated (as of version 1.3.3). I'm only
> addressing invalid XML characters in string values and string-lists.
> As I saw mentioned in earlier forum messages
> (https://www.redhat.com/archives/libguestfs/2011-September/msg00082.html),
> there could be invalid XML character issues in keys as well, but that
> is not addressed in this patch.
>
> Secondly, I slightly modified the string element in that I used a
> value attribute to more closely align with the earlier change that
> Alex did to singular string values.
>
> I've addressed the three of you because you all commented on the
> earlier patch, and I would like to get your feedback on mine. If
> you'd like me to post to the libguestfs mailing list, I could do so...
> let me know if there are patch standards that I should use.
>
> The approach I took was toI use XMLisCharQ() to check each UTF-8
> string value in value_string and value_mulitple_string.
>
>
>
> The issue I encountered was that the following UTF-8 string was
> occurring in a clean Win7 install --
>
> e689a4 02 eb8288c 7937 46 5
>
> The 02 (^B) is an invalid XML character in the 1.0 spec, and was
> causing xmllint to fail in the following manner
>
> [hivex-1.3.3-orig]# xmllint xmlout.orig
>
> xmlout.orig:2: parser error : PCDATA invalid Char value 2
> 18T02:23:42Z</mtime><value type="string-list"
key="Autorecover MOFs"><string>??
>
^
>
>
> After the patch, ...with all the surrounding stuff cut away
> [hivex-1.3.3-new]# xmllint xmlout.new | tidy -raw -xml -i
>
> <?xml version="1.0" encoding="utf-8"?>
> <hive>
> <mtime>2011-02-18T02:23:42Z</mtime>
> <value type="string-list" key="Autorecover
MOFs">
> <string encoding="base64"
value="5omkAuuCiMeTdGU=" />
> </value>
> </hive>
>
>
> And the patch itself...
>
> [hivex]# cat hivexml.patch
> diff -uprN hivex-1.3.3-orig/xml/hivexml.c hivex-1.3.3-new/xml/hivexml.c
> --- hivex-1.3.3-orig/xml/hivexml.c 2011-09-22 09:17:09.000000000 -0400
> +++ hivex-1.3.3-new/xml/hivexml.c 2012-01-30 23:39:14.995639422 -0500
> @@ -33,6 +33,7 @@
> #endif
>
> #include <libxml/xmlwriter.h>
> +#include <libxml/chvalid.h>
>
> #include "hivex.h"
>
> @@ -209,6 +210,22 @@ filetime_to_8601 (int64_t windows_ticks)
> }
>
> static int
> +isValidXMLString(const char *string)
Can we call this is_valid_xml_string? I hate camel-casing.
> +{
> + int c;
> + int len = strlen(string);
> + int pos = 0;
> + int charlen = len;
> + while ((c = xmlGetUTF8Char(string+pos, &charlen)) >= 0) {
> + if (xmlIsCharQ(c) == 0)
> + return 0;
> + pos += charlen;
> + charlen = len - pos;
> + }
> + return 1;
> +}
> +
> +static int
> node_start (hive_h *h, void *writer_v, hive_node_h node, const char *name)
> {
> int64_t last_modified;
> @@ -265,6 +282,20 @@ end_value (xmlTextWriterPtr writer)
> XML_CHECK (xmlTextWriterEndElement, (writer));
> }
>
> +static void
> +start_string(xmlTextWriterPtr writer, const char *encoding)
> +{
> + XML_CHECK (xmlTextWriterStartElement, (writer, BAD_CAST
"string"));
> + if (encoding)
> + XML_CHECK (xmlTextWriterWriteAttribute, (writer, BAD_CAST
> "encoding", BAD_CAST encoding));
> +}
> +
> +static void
> +end_string(xmlTextWriterPtr writer)
> +{
> + XML_CHECK (xmlTextWriterEndElement, (writer));
> +}
> +
> static int
> value_string (hive_h *h, void *writer_v, hive_node_h node, hive_value_h
value,
> hive_type t, size_t len, const char *key, const char *str)
> @@ -292,9 +323,14 @@ value_string (hive_h *h, void *writer_v,
> type = "unknown";
> }
>
> - start_value (writer, key, type, NULL);
> + int validXML = isValidXMLString(str);
> + start_value (writer, key, type, validXML ? NULL : "base64");
> XML_CHECK (xmlTextWriterStartAttribute, (writer, BAD_CAST
"value"));
> - XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST str));
> + if (validXML)
> + XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST str));
> + else
> + XML_CHECK (xmlTextWriterWriteBase64, (writer, str, 0, strlen(str)));
> +
> XML_CHECK (xmlTextWriterEndAttribute, (writer));
> end_value (writer);
> return 0;
> @@ -310,9 +346,14 @@ value_multiple_strings (hive_h *h, void
>
> size_t i;
> for (i = 0; argv[i] != NULL; ++i) {
> - XML_CHECK (xmlTextWriterStartElement, (writer, BAD_CAST
"string"));
> - XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST argv[i]));
> - XML_CHECK (xmlTextWriterEndElement, (writer));
> + int validXML = isValidXMLString(argv[i]);
> + start_string(writer, validXML ? NULL : "base64");
> + XML_CHECK (xmlTextWriterStartAttribute, (writer, BAD_CAST
"value"));
> + if (validXML)
> + XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST argv[i]));
> + else
> + XML_CHECK (xmlTextWriterWriteBase64, (writer, argv[i], 0,
> strlen(argv[i])));
> + end_string(writer);
> }
>
> end_value (writer);
> [hivex]#
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
New in Fedora 11: Fedora Windows cross-compiler. Compile Windows
programs, test, and build Windows installers. Over 70 libraries supprt'd
http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw