Laszlo Ersek
2021-Sep-09 23:06 UTC
[Libguestfs] [hivex PATCH] lib: write: improve key collation compatibility with Windows
There are multiple problems with using strcasecmp() for ordering registry
keys:

(1) strcasecmp() is influenced by LC_CTYPE.

(2) strcasecmp() cannot implement case conversion for multibyte UTF-8
    sequences.

(3) Even with LC_CTYPE=POSIX and key names consisting solely of ASCII
    characters, strcasecmp() converts characters to lowercase, for
    comparison. But on Windows, the CompareStringOrdinal() function
    converts characters to uppercase. This makes a difference when
    comparing a letter to one of the characters that fall between 'Z'
    (0x5A) and 'a' (0x61), namely {'[', '\\', ']', '^', '_', '`'}. For
    example,

      'c' (0x63) > '_' (0x5F)
      'C' (0x43) < '_' (0x5F)

Compare key names byte for byte, eliminating problems (1) and (3).

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1648520
Signed-off-by: Laszlo Ersek <lersek at redhat.com>
---
 lib/write.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/lib/write.c b/lib/write.c
index 70105c9d9907..d9a13a3c18b6 100644
--- a/lib/write.c
+++ b/lib/write.c
@@ -462,7 +462,37 @@ compare_name_with_nk_name (hive_h *h, const char *name, hive_node_h nk_offs)
     return 0;
   }
 
-  int r = strcasecmp (name, nname);
+  /* Perform a limited case-insensitive comparison. ASCII letters will be
+   * *upper-cased*. Multibyte sequences will produce nonsensical orderings.
+   */
+  int r = 0;
+  const char *s1 = name;
+  const char *s2 = nname;
+
+  for (;;) {
+    unsigned char c1 = *(s1++);
+    unsigned char c2 = *(s2++);
+
+    if (c1 >= 'a' && c1 <= 'z')
+      c1 = 'A' + (c1 - 'a');
+    if (c2 >= 'a' && c2 <= 'z')
+      c2 = 'A' + (c2 - 'a');
+    if (c1 < c2) {
+      /* Also covers the case when "name" is a prefix of "nname". */
+      r = -1;
+      break;
+    }
+    if (c1 > c2) {
+      /* Also covers the case when "nname" is a prefix of "name". */
+      r = 1;
+      break;
+    }
+    if (c1 == '\0') {
+      /* Both strings end. */
+      break;
+    }
+  }
+
   free (nname);
 
   return r;
-- 
2.19.1.3.g30247aa5d201
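For readers who want to see point (3) in isolation, here is a minimal standalone sketch of the ordering difference. It is not hivex code: compare_folded(), fold_upper(), fold_lower() and demo_checks() are hypothetical names invented for this illustration, and fold_lower() only approximates what strcasecmp() does in the POSIX locale. The sketch also assumes an ASCII-compatible execution character set.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; none of these exist in hivex. */
typedef unsigned char (*fold_fn) (unsigned char);

/* Upper-case ASCII letters only, leaving every other byte untouched. */
static unsigned char
fold_upper (unsigned char c)
{
  return (c >= 'a' && c <= 'z') ? (unsigned char) ('A' + (c - 'a')) : c;
}

/* Lower-case ASCII letters only; approximates strcasecmp() in the POSIX
 * locale (an assumption made for this sketch).
 */
static unsigned char
fold_lower (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? (unsigned char) ('a' + (c - 'A')) : c;
}

/* Compare two NUL-terminated names byte for byte after folding each byte.
 * Returns -1, 0 or 1.
 */
static int
compare_folded (const char *name1, const char *name2, fold_fn fold)
{
  const unsigned char *s1 = (const unsigned char *) name1;
  const unsigned char *s2 = (const unsigned char *) name2;

  for (;;) {
    unsigned char c1 = fold (*s1++);
    unsigned char c2 = fold (*s2++);

    if (c1 < c2)
      return -1;      /* also: name1 is a proper prefix of name2 */
    if (c1 > c2)
      return 1;       /* also: name2 is a proper prefix of name1 */
    if (c1 == '\0')
      return 0;       /* both strings ended together */
  }
}

static void
demo_checks (void)
{
  /* 'c' folds to 'C' (0x43), which sorts before '_' (0x5F). */
  assert (compare_folded ("c", "_", fold_upper) == -1);
  /* 'c' stays 0x63 when lower-casing, which sorts after '_' (0x5F). */
  assert (compare_folded ("c", "_", fold_lower) == 1);
  /* Case differences in ASCII letters are ignored either way. */
  assert (compare_folded ("Foo", "fOO", fold_upper) == 0);
  /* A proper prefix sorts first. */
  assert (compare_folded ("abc", "abcd", fold_upper) == -1);
  assert (compare_folded ("abcd", "abc", fold_upper) == 1);
}
```

Calling demo_checks() from a test program should pass silently. The first two assertions are the interesting ones: the same pair of names sorts in opposite directions depending on the folding direction, which is why a lowercase-based comparison can disagree with an uppercase-based one about where a new key belongs.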
Richard W.M. Jones
2021-Sep-10 07:48 UTC
[Libguestfs] [hivex PATCH] lib: write: improve key collation compatibility with Windows
On Fri, Sep 10, 2021 at 01:06:17AM +0200, Laszlo Ersek wrote:
> There are multiple problems with using strcasecmp() for ordering registry
> keys:
>
> (1) strcasecmp() is influenced by LC_CTYPE.
>
> (2) strcasecmp() cannot implement case conversion for multibyte UTF-8
>     sequences.
>
> (3) Even with LC_CTYPE=POSIX and key names consisting solely of ASCII
>     characters, strcasecmp() converts characters to lowercase, for
>     comparison. But on Windows, the CompareStringOrdinal() function
>     converts characters to uppercase. This makes a difference when
>     comparing a letter to one of the characters that fall between 'Z'
>     (0x5A) and 'a' (0x61), namely {'[', '\\', ']', '^', '_', '`'}. For
>     example,
>
>       'c' (0x63) > '_' (0x5F)
>       'C' (0x43) < '_' (0x5F)
>
> Compare key names byte for byte, eliminating problems (1) and (3).
>
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1648520
> Signed-off-by: Laszlo Ersek <lersek at redhat.com>
> ---
>  lib/write.c | 32 +++++++++++++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/lib/write.c b/lib/write.c
> index 70105c9d9907..d9a13a3c18b6 100644
> --- a/lib/write.c
> +++ b/lib/write.c
> @@ -462,7 +462,37 @@ compare_name_with_nk_name (hive_h *h, const char *name, hive_node_h nk_offs)
>      return 0;
>    }
>
> -  int r = strcasecmp (name, nname);
> +  /* Perform a limited case-insensitive comparison. ASCII letters will be
> +   * *upper-cased*. Multibyte sequences will produce nonsensical orderings.
> +   */
> +  int r = 0;
> +  const char *s1 = name;
> +  const char *s2 = nname;
> +
> +  for (;;) {
> +    unsigned char c1 = *(s1++);
> +    unsigned char c2 = *(s2++);
> +
> +    if (c1 >= 'a' && c1 <= 'z')
> +      c1 = 'A' + (c1 - 'a');
> +    if (c2 >= 'a' && c2 <= 'z')
> +      c2 = 'A' + (c2 - 'a');
> +    if (c1 < c2) {
> +      /* Also covers the case when "name" is a prefix of "nname". */
> +      r = -1;
> +      break;
> +    }
> +    if (c1 > c2) {
> +      /* Also covers the case when "nname" is a prefix of "name". */
> +      r = 1;
> +      break;
> +    }
> +    if (c1 == '\0') {
> +      /* Both strings end. */
> +      break;
> +    }
> +  }
> +
>    free (nname);
>
>    return r;

Thanks for the detailed analysis on the BZ.

ACK - since it's an incremental improvement over what we have now and
fixes the bug.

There may be registries with multibyte keys (nothing surprises me about
the Windows registry), but as this only affects the ability to insert
new keys into a node that has such keys, the problem that we don't
handle those is limited in practice.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
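To make the remaining multibyte limitation concrete, here is a small sketch, again not taken from hivex. cmp_ascii_upper() is a hypothetical stand-in for the loop in the patch, and the example assumes the names being compared are UTF-8 encoded, as the commit message suggests. The byte values used are the standard UTF-8 encodings of U+00E9 (C3 A9) and U+00C9 (C3 89).

```c
#include <assert.h>

/* Hypothetical stand-in for the patch's loop: upper-case ASCII letters
 * only, then compare as unsigned bytes. Returns -1, 0 or 1.
 */
static int
cmp_ascii_upper (const char *a, const char *b)
{
  for (;; a++, b++) {
    unsigned char c1 = (unsigned char) *a;
    unsigned char c2 = (unsigned char) *b;

    if (c1 >= 'a' && c1 <= 'z')
      c1 = (unsigned char) ('A' + (c1 - 'a'));
    if (c2 >= 'a' && c2 <= 'z')
      c2 = (unsigned char) ('A' + (c2 - 'a'));

    if (c1 != c2)
      return c1 < c2 ? -1 : 1;
    if (c1 == '\0')
      return 0;
  }
}

static void
check_multibyte_limit (void)
{
  /* UTF-8 bytes spelled out explicitly so the source encoding is irrelevant:
   * U+00E9 (e with acute, lowercase) is C3 A9,
   * U+00C9 (E with acute, uppercase) is C3 89.
   */
  const char lower_e_acute[] = "\xC3\xA9";
  const char upper_e_acute[] = "\xC3\x89";

  /* ASCII letters differing only in case compare equal. */
  assert (cmp_ascii_upper ("e", "E") == 0);

  /* The accented pair is not folded: the second bytes differ
   * (0xA9 > 0x89), so the two names compare as different.
   */
  assert (cmp_ascii_upper (lower_e_acute, upper_e_acute) == 1);
}
```

A Unicode-aware case-insensitive comparison would presumably treat the accented pair as equal, so this is the kind of name where the ordering could still diverge from what Windows expects. As noted above, that only matters when inserting a new key into a node that already contains such names.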