On Tue, Jan 31, 2012 at 12:21:44PM -0500, Todd Mummert
wrote:> I've made some changes to hivexml.c, that I think addresses the
> invalid XML that is being generated (as of version 1.3.3).    I'm only
> addressing invalid XML characters in string values and string-lists.
> As I saw mentioned in earlier forum messages
> (https://www.redhat.com/archives/libguestfs/2011-September/msg00082.html),
> there could be invalid XML character issues in keys as well, but that
> is not addressed in this patch.
> 
> Secondly, I slightly modified the string element in that I used a
> value attribute to more closely align with the earlier change that
> Alex did to singular string values.
> 
> I've addressed the three of you because you all commented on the
> earlier patch, and I would like to get your feedback on mine.    If
> you'd like me to post to the libguestfs mailing list, I could do so...
> let me know if there are patch standards that I should use.
> 
> The approach I took was toI use XMLisCharQ() to check each UTF-8
> string value in value_string and value_mulitple_string.
> 
> 
> 
> The issue I encountered was that the following UTF-8 string was
> occurring in a clean Win7 install --
> 
> e689a4 02 eb8288c 7937 46 5
> 
> The 02 (^B) is an invalid XML character in the 1.0 spec, and was
> causing xmllint to fail in the following manner
> 
> [hivex-1.3.3-orig]# xmllint xmlout.orig
> 
> xmlout.orig:2: parser error : PCDATA invalid Char value 2
> 18T02:23:42Z</mtime><value type="string-list"
key="Autorecover MOFs"><string>??
>                                                                            
^
> 
> 
> After the patch,  ...with all the surrounding stuff cut away
> [hivex-1.3.3-new]# xmllint xmlout.new | tidy -raw -xml -i
> 
> <?xml version="1.0" encoding="utf-8"?>
> <hive>
>   <mtime>2011-02-18T02:23:42Z</mtime>
>   <value type="string-list" key="Autorecover
MOFs">
>     <string encoding="base64"
value="5omkAuuCiMeTdGU=" />
>   </value>
> </hive>
> 
> 
> And the patch itself...
> 
> [hivex]# cat hivexml.patch
> diff -uprN hivex-1.3.3-orig/xml/hivexml.c hivex-1.3.3-new/xml/hivexml.c
> --- hivex-1.3.3-orig/xml/hivexml.c      2011-09-22 09:17:09.000000000 -0400
> +++ hivex-1.3.3-new/xml/hivexml.c       2012-01-30 23:39:14.995639422 -0500
> @@ -33,6 +33,7 @@
>  #endif
> 
>  #include <libxml/xmlwriter.h>
> +#include <libxml/chvalid.h>
> 
>  #include "hivex.h"
> 
> @@ -209,6 +210,22 @@ filetime_to_8601 (int64_t windows_ticks)
>  }
> 
>  static int
> +isValidXMLString(const char *string)
Can we call this is_valid_xml_string?  I hate camel-casing.
> +{
> +  int c;
> +  int len = strlen(string);
> +  int pos = 0;
> +  int charlen = len;
> +  while ((c = xmlGetUTF8Char(string+pos, &charlen)) >= 0) {
> +    if (xmlIsCharQ(c) == 0)
> +      return 0;
> +    pos += charlen;
> +    charlen = len - pos;
> +  }
> +  return 1;
> +}
> +
> +static int
>  node_start (hive_h *h, void *writer_v, hive_node_h node, const char *name)
>  {
>    int64_t last_modified;
> @@ -265,6 +282,20 @@ end_value (xmlTextWriterPtr writer)
>    XML_CHECK (xmlTextWriterEndElement, (writer));
>  }
> 
> +static void
> +start_string(xmlTextWriterPtr writer, const char *encoding)
> +{
> +  XML_CHECK (xmlTextWriterStartElement, (writer, BAD_CAST
"string"));
> +  if (encoding)
> +    XML_CHECK (xmlTextWriterWriteAttribute, (writer, BAD_CAST
> "encoding", BAD_CAST encoding));
> +}
> +
> +static void
> +end_string(xmlTextWriterPtr writer)
> +{
> +  XML_CHECK (xmlTextWriterEndElement, (writer));
> +}
> +
>  static int
>  value_string (hive_h *h, void *writer_v, hive_node_h node, hive_value_h
value,
>                hive_type t, size_t len, const char *key, const char *str)
> @@ -292,9 +323,14 @@ value_string (hive_h *h, void *writer_v,
>      type = "unknown";
>    }
> 
> -  start_value (writer, key, type, NULL);
> +  int validXML = isValidXMLString(str);
> +  start_value (writer, key, type, validXML ? NULL : "base64");
>    XML_CHECK (xmlTextWriterStartAttribute, (writer, BAD_CAST
"value"));
> -  XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST str));
> +  if (validXML)
> +    XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST str));
> +  else
> +    XML_CHECK (xmlTextWriterWriteBase64, (writer, str, 0, strlen(str)));
> +
>    XML_CHECK (xmlTextWriterEndAttribute, (writer));
>    end_value (writer);
>    return 0;
> @@ -310,9 +346,14 @@ value_multiple_strings (hive_h *h, void
> 
>    size_t i;
>    for (i = 0; argv[i] != NULL; ++i) {
> -    XML_CHECK (xmlTextWriterStartElement, (writer, BAD_CAST
"string"));
> -    XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST argv[i]));
> -    XML_CHECK (xmlTextWriterEndElement, (writer));
> +    int validXML = isValidXMLString(argv[i]);
> +    start_string(writer, validXML ? NULL : "base64");
> +    XML_CHECK (xmlTextWriterStartAttribute, (writer, BAD_CAST
"value"));
> +    if (validXML)
> +      XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST argv[i]));
> +    else
> +      XML_CHECK (xmlTextWriterWriteBase64, (writer, argv[i], 0,
> strlen(argv[i])));
> +    end_string(writer);
>    }
> 
>    end_value (writer);
> [hivex]#
Rich.
-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
New in Fedora 11: Fedora Windows cross-compiler. Compile Windows
programs, test, and build Windows installers. Over 70 libraries supprt'd
http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw