Richard W.M. Jones
2013-Nov-25 09:06 UTC
Re: [Libguestfs] [PATCH 3/3] lib: Add support for creating nodes (keys) and values with UTF-16LE-encoded names
On Sun, Nov 24, 2013 at 11:25:53PM +0100, Hilko Bengen wrote:> --- > lib/write.c | 49 ++++++++++++++++++++++++++++++++++--------------- > 1 file changed, 34 insertions(+), 15 deletions(-) > > diff --git a/lib/write.c b/lib/write.c > index dbb8292..72b1f8a 100644 > --- a/lib/write.c > +++ b/lib/write.c > @@ -608,9 +608,17 @@ hivex_node_add_child (hive_h *h, hive_node_h parent, const char *name) > return 0; > } > > + size_t recoded_name_len; > + int use_utf16 = 0; > + char* recoded_name = _hivex_encode_string (name, &recoded_name_len, &use_utf16); > + if (recoded_name == NULL) { > + SET_ERRNO (EINVAL, "malformed name"); > + return 0; > + } > + > /* Create the new nk-record. */ > static const char nk_id[2] = { 'n', 'k' }; > - size_t seg_len = sizeof (struct ntreg_nk_record) + strlen (name); > + size_t seg_len = sizeof (struct ntreg_nk_record) + recoded_name_len; > hive_node_h nkoffset = allocate_block (h, seg_len, nk_id); > if (nkoffset == 0) > return 0; > @@ -619,14 +627,18 @@ hivex_node_add_child (hive_h *h, hive_node_h parent, const char *name) > > struct ntreg_nk_record *nk > (struct ntreg_nk_record *) ((char *) h->addr + nkoffset); > - nk->flags = htole16 (0x0020); /* key is ASCII. */ > + if (use_utf16) > + nk->flags = htole16 (0x0000); > + else > + nk->flags = htole16 (0x0020); > nk->parent = htole32 (parent - 0x1000); > nk->subkey_lf = htole32 (0xffffffff); > nk->subkey_lf_volatile = htole32 (0xffffffff); > nk->vallist = htole32 (0xffffffff); > nk->classname = htole32 (0xffffffff); > - nk->name_len = htole16 (strlen (name)); > - strcpy (nk->name, name); > + nk->name_len = htole16 (recoded_name_len); > + memcpy (nk->name, recoded_name, recoded_name_len); > + free(recoded_name);Please put spaces after function names! It improves readability: http://www1.psych.purdue.edu/~zpizlo/rsteinma/Bob-FOR%20CV/Epelboim%20et%20al%201997%20Fillers%20in%20Reading.pdf> /* Inherit parent sk. */ > struct ntreg_nk_record *parent_nk > @@ -719,9 +731,9 @@ hivex_node_add_child (hive_h *h, hive_node_h parent, const char *name) > parent_nk->nr_subkeys = htole32 (nr_subkeys_in_parent_nk); > > /* Update max_subkey_name_len in parent nk. */ > - uint16_t max = le16toh (parent_nk->max_subkey_name_len); > - if (max < strlen (name) * 2) /* *2 because "recoded" in UTF16-LE. */ > - parent_nk->max_subkey_name_len = htole16 (strlen (name) * 2); > + size_t utf16_len = use_utf16 ? recoded_name_len : recoded_name_len * 2;* 2 is probably wrong here for non-BMP characters, but the original code makes the same mistake ... Could we get the true length from the hivex_encode_string function?> + if (le16toh (parent_nk->max_subkey_name_len) < utf16_len) > + parent_nk->max_subkey_name_len = htole16 (utf16_len); > return nkoffset; > } > @@ -942,7 +954,12 @@ hivex_node_set_values (hive_h *h, hive_node_h node, > for (i = 0; i < nr_values; ++i) { > /* Allocate vk record to store this (key, value) pair. */ > static const char vk_id[2] = { 'v', 'k' }; > - seg_len = sizeof (struct ntreg_vk_record) + strlen (values[i].key); > + size_t name_len = strlen (values[i].key); > + size_t recoded_name_len; > + int use_utf16; > + char* recoded_name = _hivex_encode_string (values[i].key, &recoded_name_len, > + &use_utf16); > + seg_len = sizeof (struct ntreg_vk_record) + recoded_name_len; > size_t vk_offs = allocate_block (h, seg_len, vk_id); > if (vk_offs == 0) > return -1; > @@ -957,15 +974,17 @@ hivex_node_set_values (hive_h *h, hive_node_h node, > > struct ntreg_vk_record *vk > (struct ntreg_vk_record *) ((char *) h->addr + vk_offs); > - size_t name_len = strlen (values[i].key); > - vk->name_len = htole16 (name_len); > - strcpy (vk->name, values[i].key); > + vk->name_len = htole16 (recoded_name_len); > + memcpy (vk->name, recoded_name, recoded_name_len); > vk->data_type = htole32 (values[i].t); > uint32_t len = values[i].len; > if (len <= 4) /* store it inline => set MSB flag */ > len |= 0x80000000; > vk->data_len = htole32 (len); > - vk->flags = name_len == 0 ? 0 : 1; > + if (recoded_name_len == 0) > + vk->flags = 0; > + else > + vk->flags = htole16 (!use_utf16); > if (values[i].len <= 4) /* store it inline */ > memcpy (&vk->data_offset, values[i].value, values[i].len); > @@ -985,9 +1004,9 @@ hivex_node_set_values (hive_h *h, hive_node_h node, > vk->data_offset = htole32 (offs - 0x1000); > } > > - if (name_len * 2 > le32toh (nk->max_vk_name_len)) > - /* * 2 for UTF16-LE "reencoding" */ > - nk->max_vk_name_len = htole32 (name_len * 2); > + size_t utf16_len = use_utf16 ? recoded_name_len : recoded_name_len * 2;* 2 - see above.> + if (utf16_len > le32toh (nk->max_vk_name_len)) > + nk->max_vk_name_len = htole32 (utf16_len); > if (values[i].len > le32toh (nk->max_vk_data_len)) > nk->max_vk_data_len = htole32 (values[i].len); > } > -- > 1.8.4.42/3 & 3/3 are generally good, so ACK. Any comment on *2 issue above? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming blog: http://rwmj.wordpress.com Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
Hilko Bengen
2013-Nov-25 21:52 UTC
Re: [Libguestfs] [PATCH 3/3] lib: Add support for creating nodes (keys) and values with UTF-16LE-encoded names
* Richard W.M. Jones:>> - nk->name_len = htole16 (strlen (name)); >> - strcpy (nk->name, name); >> + nk->name_len = htole16 (recoded_name_len); >> + memcpy (nk->name, recoded_name, recoded_name_len); >> + free(recoded_name); > > Please put spaces after function names! It improves readability:Sorry, I'll fix those. I also forgot to add a free() in hivex_node_set_values.>> /* Update max_subkey_name_len in parent nk. */ >> - uint16_t max = le16toh (parent_nk->max_subkey_name_len); >> - if (max < strlen (name) * 2) /* *2 because "recoded" in UTF16-LE. */ >> - parent_nk->max_subkey_name_len = htole16 (strlen (name) * 2); >> + size_t utf16_len = use_utf16 ? recoded_name_len : recoded_name_len * 2; > > * 2 is probably wrong here for non-BMP characters, but the original > code makes the same mistake ... Could we get the true length from the > hivex_encode_string function?Are there any non-BMP characters that can be encoded in Latin1 -- or whatever 1-byte encoding one is supposed to use there? Peter Norris' master's thesis[1] suggests that recoded_name_len : recoded_name_len * 2 is probably right. Cheers, -Hilko [1] http://amnesia.gtisc.gatech.edu/~moyix/suzibandit.ltd.uk/MSc/Registry%20Structure%20-%20Main%20V4.pdf, p.79
Hilko Bengen
2013-Nov-25 21:56 UTC
[Libguestfs] [PATCH 3/3, take 2] lib: Add support for creating nodes (keys) and values with UTF-16LE-encoded names
--- lib/write.c | 50 +++++++++++++++++++++++++++++++++++--------------- 1 file changed, 35 insertions(+), 15 deletions(-) diff --git a/lib/write.c b/lib/write.c index dbb8292..8c4dd8e 100644 --- a/lib/write.c +++ b/lib/write.c @@ -608,9 +608,17 @@ hivex_node_add_child (hive_h *h, hive_node_h parent, const char *name) return 0; } + size_t recoded_name_len; + int use_utf16 = 0; + char* recoded_name = _hivex_encode_string (name, &recoded_name_len, &use_utf16); + if (recoded_name == NULL) { + SET_ERRNO (EINVAL, "malformed name"); + return 0; + } + /* Create the new nk-record. */ static const char nk_id[2] = { 'n', 'k' }; - size_t seg_len = sizeof (struct ntreg_nk_record) + strlen (name); + size_t seg_len = sizeof (struct ntreg_nk_record) + recoded_name_len; hive_node_h nkoffset = allocate_block (h, seg_len, nk_id); if (nkoffset == 0) return 0; @@ -619,14 +627,18 @@ hivex_node_add_child (hive_h *h, hive_node_h parent, const char *name) struct ntreg_nk_record *nk (struct ntreg_nk_record *) ((char *) h->addr + nkoffset); - nk->flags = htole16 (0x0020); /* key is ASCII. */ + if (use_utf16) + nk->flags = htole16 (0x0000); + else + nk->flags = htole16 (0x0020); nk->parent = htole32 (parent - 0x1000); nk->subkey_lf = htole32 (0xffffffff); nk->subkey_lf_volatile = htole32 (0xffffffff); nk->vallist = htole32 (0xffffffff); nk->classname = htole32 (0xffffffff); - nk->name_len = htole16 (strlen (name)); - strcpy (nk->name, name); + nk->name_len = htole16 (recoded_name_len); + memcpy (nk->name, recoded_name, recoded_name_len); + free (recoded_name); /* Inherit parent sk. */ struct ntreg_nk_record *parent_nk @@ -719,9 +731,9 @@ hivex_node_add_child (hive_h *h, hive_node_h parent, const char *name) parent_nk->nr_subkeys = htole32 (nr_subkeys_in_parent_nk); /* Update max_subkey_name_len in parent nk. */ - uint16_t max = le16toh (parent_nk->max_subkey_name_len); - if (max < strlen (name) * 2) /* *2 because "recoded" in UTF16-LE. */ - parent_nk->max_subkey_name_len = htole16 (strlen (name) * 2); + size_t utf16_len = use_utf16 ? recoded_name_len : recoded_name_len * 2; + if (le16toh (parent_nk->max_subkey_name_len) < utf16_len) + parent_nk->max_subkey_name_len = htole16 (utf16_len); return nkoffset; } @@ -942,7 +954,12 @@ hivex_node_set_values (hive_h *h, hive_node_h node, for (i = 0; i < nr_values; ++i) { /* Allocate vk record to store this (key, value) pair. */ static const char vk_id[2] = { 'v', 'k' }; - seg_len = sizeof (struct ntreg_vk_record) + strlen (values[i].key); + size_t name_len = strlen (values[i].key); + size_t recoded_name_len; + int use_utf16; + char* recoded_name = _hivex_encode_string (values[i].key, &recoded_name_len, + &use_utf16); + seg_len = sizeof (struct ntreg_vk_record) + recoded_name_len; size_t vk_offs = allocate_block (h, seg_len, vk_id); if (vk_offs == 0) return -1; @@ -957,15 +974,18 @@ hivex_node_set_values (hive_h *h, hive_node_h node, struct ntreg_vk_record *vk (struct ntreg_vk_record *) ((char *) h->addr + vk_offs); - size_t name_len = strlen (values[i].key); - vk->name_len = htole16 (name_len); - strcpy (vk->name, values[i].key); + vk->name_len = htole16 (recoded_name_len); + memcpy (vk->name, recoded_name, recoded_name_len); + free (recoded_name); vk->data_type = htole32 (values[i].t); uint32_t len = values[i].len; if (len <= 4) /* store it inline => set MSB flag */ len |= 0x80000000; vk->data_len = htole32 (len); - vk->flags = name_len == 0 ? 0 : 1; + if (recoded_name_len == 0) + vk->flags = 0; + else + vk->flags = htole16 (!use_utf16); if (values[i].len <= 4) /* store it inline */ memcpy (&vk->data_offset, values[i].value, values[i].len); @@ -985,9 +1005,9 @@ hivex_node_set_values (hive_h *h, hive_node_h node, vk->data_offset = htole32 (offs - 0x1000); } - if (name_len * 2 > le32toh (nk->max_vk_name_len)) - /* * 2 for UTF16-LE "reencoding" */ - nk->max_vk_name_len = htole32 (name_len * 2); + size_t utf16_len = use_utf16 ? recoded_name_len : recoded_name_len * 2; + if (utf16_len > le32toh (nk->max_vk_name_len)) + nk->max_vk_name_len = htole32 (utf16_len); if (values[i].len > le32toh (nk->max_vk_data_len)) nk->max_vk_data_len = htole32 (values[i].len); } -- 1.8.4.4
Richard W.M. Jones
2013-Nov-25 22:35 UTC
Re: [Libguestfs] [PATCH 3/3] lib: Add support for creating nodes (keys) and values with UTF-16LE-encoded names
On Mon, Nov 25, 2013 at 10:52:30PM +0100, Hilko Bengen wrote:> * Richard W.M. Jones: > > >> - nk->name_len = htole16 (strlen (name)); > >> - strcpy (nk->name, name); > >> + nk->name_len = htole16 (recoded_name_len); > >> + memcpy (nk->name, recoded_name, recoded_name_len); > >> + free(recoded_name); > > > > Please put spaces after function names! It improves readability: > > Sorry, I'll fix those. I also forgot to add a free() in > hivex_node_set_values. > > >> /* Update max_subkey_name_len in parent nk. */ > >> - uint16_t max = le16toh (parent_nk->max_subkey_name_len); > >> - if (max < strlen (name) * 2) /* *2 because "recoded" in UTF16-LE. */ > >> - parent_nk->max_subkey_name_len = htole16 (strlen (name) * 2); > >> + size_t utf16_len = use_utf16 ? recoded_name_len : recoded_name_len * 2; > > > > * 2 is probably wrong here for non-BMP characters, but the original > > code makes the same mistake ... Could we get the true length from the > > hivex_encode_string function?[Note: I read this email after my reply to version 2 of this patch]> Are there any non-BMP characters that can be encoded in Latin1 -- or > whatever 1-byte encoding one is supposed to use there?OK I guess the original code is correct.> Peter Norris' master's thesis[1] suggests that > > recoded_name_len : recoded_name_len * 2 > > is probably right.However I still think *2 is incorrect, despite what the thesis says. (The thesis is -- how shall I put this -- "unclear" in the way he uses the word "Unicode", never mentioning "UTF" at all). For example, the encoding of U13057 (a rather elegant Egyptian hieroglyph 𓁗, if you can find a font that can render it) is { 0xD80C, 0xDC57 } (4 bytes) in UTF-16. However, I also doubt that Windows works correctly with non-BMP UTF-16. ie. Windows probably means UCS-2.> Cheers, > -Hilko > > [1] http://amnesia.gtisc.gatech.edu/~moyix/suzibandit.ltd.uk/MSc/Registry%20Structure%20-%20Main%20V4.pdf, p.79Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming blog: http://rwmj.wordpress.com Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
Possibly Parallel Threads
- Re: [PATCH 3/3] lib: Add support for creating nodes (keys) and values with UTF-16LE-encoded names
- [PATCH 3/3] lib: Add support for creating nodes (keys) and values with UTF-16LE-encoded names
- [PATCH 1/3] lib: Further generalize iconv wrapper function.
- Re: [PATCH 3/3] lib: Add support for creating nodes (keys) and values with UTF-16LE-encoded names
- [PATCH] Add a cache for iconv_t handles to hive_t