Laszlo Ersek
2023-May-18 15:42 UTC
[Libguestfs] [libguestfs-common PATCH v2 2/2] options/keys: introduce unescape_device_mapper_lvm()
On 5/18/23 15:42, Richard W.M. Jones wrote:> On Thu, May 18, 2023 at 03:09:42PM +0200, Laszlo Ersek wrote: >> Libguestfs uses the >> >> /dev/VG/LV >> >> format internally, for identifying LVM logical volumes, but the user might >> want to specify the >> >> /dev/mapper/VG-LV ID >> >> format with the "--key ID:..." options. >> >> Introduce unescape_device_mapper_lvm() for turning >> >> /dev/mapper/VG-LV >> >> key IDs into >> >> /dev/VG/LV >> >> ones, unescaping doubled hyphens to single hyphens in both VG and LV in >> the process. >> >> Call unescape_device_mapper_lvm() from key_store_import_key(). That is, >> translate the ID as soon as the "--key" option is processed -- let the >> keystore only know about the usual /dev/VG/LV format, for later matching. >> >> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168506 >> Signed-off-by: Laszlo Ersek <lersek at redhat.com> >> --- >> >> Notes: >> v2: >> >> - rewrite without regular expressions [Rich, Laszlo] >> >> - restructure the commit message >> >> v1: >> >> regcomp() must definitely allocate memory dynamically, and we >> (intentionally) never free that -- we never call regfree(). I assume >> valgrind will catch this as a leak, so we might have to extend >> "valgrind-suppressions" in each dependent superproject. However, I'm >> unable to run "make check-valgrind" successfully in e.g. virt-v2v even >> before these patches; see also >> <https://listman.redhat.com/archives/libguestfs/2023-May/031496.html>. >> >> options/keys.c | 100 ++++++++++++++++++++ >> 1 file changed, 100 insertions(+) >> >> diff --git a/options/keys.c b/options/keys.c >> index bc7459749276..52b27369016a 100644 >> --- a/options/keys.c >> +++ b/options/keys.c >> @@ -260,6 +260,105 @@ key_store_add_from_selector (struct key_store *ks, const char *selector) >> return key_store_import_key (ks, &key); >> } >> >> +/* Turn /dev/mapper/VG-LV into /dev/VG/LV, in-place. */ >> +static void >> +unescape_device_mapper_lvm (char *id) >> +{ >> + static const char dev[] = "/dev/", dev_mapper[] = "/dev/mapper/"; >> + const char *input_start; >> + char *output; >> + enum { M_SCAN, M_FILL, M_DONE } mode; >> + >> + if (!STRPREFIX (id, dev_mapper)) >> + return; >> + >> + /* Start parsing "VG-LV" from "id" after "/dev/mapper/". */ >> + input_start = id + (sizeof dev_mapper - 1); >> + >> + /* Start writing the unescaped "VG/LV" output after "/dev/". */ >> + output = id + (sizeof dev - 1); >> + >> + for (mode = M_SCAN; mode < M_DONE; ++mode) { >> + char c; >> + const char *input = input_start; >> + const char *hyphen_buffered = NULL; >> + bool single_hyphen_seen = false; >> + >> + do { >> + c = *input; >> + >> + switch (c) { >> + case '-': >> + if (hyphen_buffered == NULL) >> + /* This hyphen may start an escaped hyphen, or it could be the >> + * separator in VG-LV. >> + */ >> + hyphen_buffered = input; >> + else { >> + /* This hyphen completes an escaped hyphen; unescape it. */ >> + if (mode == M_FILL) >> + *output++ = '-'; >> + hyphen_buffered = NULL; >> + } >> + break; >> + >> + case '/': >> + /* Slash characters are forbidden in VG-LV anywhere. If there's any, >> + * we'll find it in the first (i.e., scanning) phase, before we output >> + * anything back to "id". >> + */ >> + assert (mode == M_SCAN); >> + return; >> + >> + default: >> + /* Encountered a non-slash, non-hyphen character -- which also may be >> + * the terminating NUL. >> + */ >> + if (hyphen_buffered != NULL) { >> + /* The non-hyphen character comes after a buffered hyphen, so the >> + * buffered hyphen is supposed to be the single hyphen that separates >> + * VG from LV in VG-LV. There are three requirements for this >> + * separator: (a) it must be unique (we must not have seen another >> + * such separator earlier), (b) it must not be at the start of VG-LV >> + * (because VG would be empty that way), (c) it must not be at the end >> + * of VG-LV (because LV would be empty that way). Should any of these >> + * be violated, we'll catch that during the first (i.e., scanning) >> + * phase, before modifying "id". >> + */ >> + if (single_hyphen_seen || hyphen_buffered == input_start || >> + c == '\0') { >> + assert (mode == M_SCAN); >> + return; >> + } >> + >> + /* Translate the separator hyphen to a slash character. */ >> + if (mode == M_FILL) >> + *output++ = '/'; >> + hyphen_buffered = NULL; >> + single_hyphen_seen = true; >> + } >> + >> + /* Output the non-hyphen character (including the terminating NUL) >> + * regardless of whether there was a buffered hyphen separator (which, >> + * by now, we'll have attempted to translate and flush). >> + */ >> + if (mode == M_FILL) >> + *output++ = c; >> + } >> + >> + ++input; >> + } while (c != '\0'); >> + >> + /* We must have seen the VG-LV separator. If that's not the case, we'll >> + * catch it before modifying "id". >> + */ >> + if (!single_hyphen_seen) { >> + assert (mode == M_SCAN); >> + return; >> + } >> + } >> +} > > So this code can never return an error?It surely can; the one error mode is exactly the same as it was in v1 (the regex version). Namely, the error mode is that the input pathname in "id" does not match the expected form (i.e., /dev/mapper/VG-LV), and then we don't modify "id" at all, we let "id" be added to the keystore exactly as the user specified it. The error mode is activated whenever we reach any of the explicit return statements in the function: - when the first STRPREFIX call falls - or when any of the other three "return" statements are reached (which can only be reached during the scanning phase, M_SCAN; that is, *before* we write anything at all into "id"). The point of separating M_SCAN from M_FILL is to verify the full format match before we start modifying "id" in-place. The traversal (parsing) through "VG-LV" is exactly the same in both cases, but in the first pass, we just check. If that completes fine, we repeat the same traversal, but then we also modify "id" in-place. The translation only ever contracts VG-LV (it produces at most as many bytes as it consumes), and the write point is always to the left of the read point, so the in-place modification, including the final NUL-termination, is safe.> eg if the input was "A-B-C", > I think it would translate the string to "A/B-C" which is an output > but the input seems like it was wrong.If "id" contains the string "/dev/mapper/A-B-C", then it *mismatches* the expected form "/dev/mapper/VG-LV" just as much as an ID containing "/dev/sda2" would mismatch the expected form. In either of those cases, "id" does not get modified, and gets added to the keystore precisely as the user specified it on the command line. We don't say "this ID is wrong, error!". We don't know what the ID is, we only know what it is *not*. We only say "this is not a /dev/mapper/VG-LV specification, we just don't know what it is supposed to be, so we won't touch it, we'll add it as it is, to the keystore". That's exactly the behavior (= blind addition) that we have right now, i.e., pre-patch. unescape_device_mapper_lvm() either modifies "id" in place (if "id" on input has the expected form), or leaves "id" alone. The logic following the unescape_device_mapper_lvm() *call site* doesn't know which way the function went, but it does not *need* to know.> Or is it that only the VG is > escaped? (I don't imagine there is some specification for /dev/mapper > device names.)"/dev/mapper/A-B-C" simply cannot be interpreted according to the "/dev/mapper/VG-LV" scheme. For such an interpretation to be possible, we'd have to have one of the following strings in "id": - "/dev/mapper/A--B-C" --> "/dev/A-B/C" - "/dev/mapper/A-B--C" --> "/dev/A/B-C" If the user specifies (for example) "/dev/mapper/A-B-C", we reject that, so the ID remains "/dev/mapper/A-B-C", and that is what gets added to the keystore -- exactly as it happens pre-patch. If the user specifies (for example) "/dev/mapper/A--B--C", we also reject that (there is no separator to tell us what is the VG and what is the LV), so the ID remains "/dev/mapper/A--B--C", and that is what gets added to the keystore -- exactly as it happens pre-patch. FWIW, this aspect is identical between version 1 and version 2 of this patch: in v1, if the regex didn't match, then "id" would simply not be rewritten, and would get added to the keystore as specified by the user. The v1->v2 update is an implementation detal, internal to unescape_device_mapper_lvm() -- it only affects how the expected form is recognized (regex versus manual parsing), and how a matching input is translated (the output being constructed from matched regex subexpressions, versus byte-wise buffering and flushing integrated with scanning). Laszlo> > Rich. > >> struct key_store * >> key_store_import_key (struct key_store *ks, struct key_store_key *key) >> { >> @@ -278,6 +377,7 @@ key_store_import_key (struct key_store *ks, struct key_store_key *key) >> error (EXIT_FAILURE, errno, "realloc"); >> >> ks->keys = new_keys; >> + unescape_device_mapper_lvm (key->id); >> ks->keys[ks->nr_keys] = *key; >> ++ks->nr_keys; >> >> _______________________________________________ >> Libguestfs mailing list >> Libguestfs at redhat.com >> https://listman.redhat.com/mailman/listinfo/libguestfs >
Richard W.M. Jones
2023-May-18 15:59 UTC
[Libguestfs] [libguestfs-common PATCH v2 2/2] options/keys: introduce unescape_device_mapper_lvm()
On Thu, May 18, 2023 at 05:42:03PM +0200, Laszlo Ersek wrote:> On 5/18/23 15:42, Richard W.M. Jones wrote: > > On Thu, May 18, 2023 at 03:09:42PM +0200, Laszlo Ersek wrote: > >> Libguestfs uses the > >> > >> /dev/VG/LV > >> > >> format internally, for identifying LVM logical volumes, but the user might > >> want to specify the > >> > >> /dev/mapper/VG-LV ID > >> > >> format with the "--key ID:..." options. > >> > >> Introduce unescape_device_mapper_lvm() for turning > >> > >> /dev/mapper/VG-LV > >> > >> key IDs into > >> > >> /dev/VG/LV > >> > >> ones, unescaping doubled hyphens to single hyphens in both VG and LV in > >> the process. > >> > >> Call unescape_device_mapper_lvm() from key_store_import_key(). That is, > >> translate the ID as soon as the "--key" option is processed -- let the > >> keystore only know about the usual /dev/VG/LV format, for later matching. > >> > >> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168506 > >> Signed-off-by: Laszlo Ersek <lersek at redhat.com> > >> --- > >> > >> Notes: > >> v2: > >> > >> - rewrite without regular expressions [Rich, Laszlo] > >> > >> - restructure the commit message > >> > >> v1: > >> > >> regcomp() must definitely allocate memory dynamically, and we > >> (intentionally) never free that -- we never call regfree(). I assume > >> valgrind will catch this as a leak, so we might have to extend > >> "valgrind-suppressions" in each dependent superproject. However, I'm > >> unable to run "make check-valgrind" successfully in e.g. virt-v2v even > >> before these patches; see also > >> <https://listman.redhat.com/archives/libguestfs/2023-May/031496.html>. > >> > >> options/keys.c | 100 ++++++++++++++++++++ > >> 1 file changed, 100 insertions(+) > >> > >> diff --git a/options/keys.c b/options/keys.c > >> index bc7459749276..52b27369016a 100644 > >> --- a/options/keys.c > >> +++ b/options/keys.c > >> @@ -260,6 +260,105 @@ key_store_add_from_selector (struct key_store *ks, const char *selector) > >> return key_store_import_key (ks, &key); > >> } > >> > >> +/* Turn /dev/mapper/VG-LV into /dev/VG/LV, in-place. */ > >> +static void > >> +unescape_device_mapper_lvm (char *id) > >> +{ > >> + static const char dev[] = "/dev/", dev_mapper[] = "/dev/mapper/"; > >> + const char *input_start; > >> + char *output; > >> + enum { M_SCAN, M_FILL, M_DONE } mode; > >> + > >> + if (!STRPREFIX (id, dev_mapper)) > >> + return; > >> + > >> + /* Start parsing "VG-LV" from "id" after "/dev/mapper/". */ > >> + input_start = id + (sizeof dev_mapper - 1); > >> + > >> + /* Start writing the unescaped "VG/LV" output after "/dev/". */ > >> + output = id + (sizeof dev - 1); > >> + > >> + for (mode = M_SCAN; mode < M_DONE; ++mode) { > >> + char c; > >> + const char *input = input_start; > >> + const char *hyphen_buffered = NULL; > >> + bool single_hyphen_seen = false; > >> + > >> + do { > >> + c = *input; > >> + > >> + switch (c) { > >> + case '-': > >> + if (hyphen_buffered == NULL) > >> + /* This hyphen may start an escaped hyphen, or it could be the > >> + * separator in VG-LV. > >> + */ > >> + hyphen_buffered = input; > >> + else { > >> + /* This hyphen completes an escaped hyphen; unescape it. */ > >> + if (mode == M_FILL) > >> + *output++ = '-'; > >> + hyphen_buffered = NULL; > >> + } > >> + break; > >> + > >> + case '/': > >> + /* Slash characters are forbidden in VG-LV anywhere. If there's any, > >> + * we'll find it in the first (i.e., scanning) phase, before we output > >> + * anything back to "id". > >> + */ > >> + assert (mode == M_SCAN); > >> + return; > >> + > >> + default: > >> + /* Encountered a non-slash, non-hyphen character -- which also may be > >> + * the terminating NUL. > >> + */ > >> + if (hyphen_buffered != NULL) { > >> + /* The non-hyphen character comes after a buffered hyphen, so the > >> + * buffered hyphen is supposed to be the single hyphen that separates > >> + * VG from LV in VG-LV. There are three requirements for this > >> + * separator: (a) it must be unique (we must not have seen another > >> + * such separator earlier), (b) it must not be at the start of VG-LV > >> + * (because VG would be empty that way), (c) it must not be at the end > >> + * of VG-LV (because LV would be empty that way). Should any of these > >> + * be violated, we'll catch that during the first (i.e., scanning) > >> + * phase, before modifying "id". > >> + */ > >> + if (single_hyphen_seen || hyphen_buffered == input_start || > >> + c == '\0') { > >> + assert (mode == M_SCAN); > >> + return; > >> + } > >> + > >> + /* Translate the separator hyphen to a slash character. */ > >> + if (mode == M_FILL) > >> + *output++ = '/'; > >> + hyphen_buffered = NULL; > >> + single_hyphen_seen = true; > >> + } > >> + > >> + /* Output the non-hyphen character (including the terminating NUL) > >> + * regardless of whether there was a buffered hyphen separator (which, > >> + * by now, we'll have attempted to translate and flush). > >> + */ > >> + if (mode == M_FILL) > >> + *output++ = c; > >> + } > >> + > >> + ++input; > >> + } while (c != '\0'); > >> + > >> + /* We must have seen the VG-LV separator. If that's not the case, we'll > >> + * catch it before modifying "id". > >> + */ > >> + if (!single_hyphen_seen) { > >> + assert (mode == M_SCAN); > >> + return; > >> + } > >> + } > >> +} > > > > So this code can never return an error? > > It surely can; the one error mode is exactly the same as it was in v1 > (the regex version). Namely, the error mode is that the input pathname > in "id" does not match the expected form (i.e., /dev/mapper/VG-LV), and > then we don't modify "id" at all, we let "id" be added to the keystore > exactly as the user specified it. > > The error mode is activated whenever we reach any of the explicit return > statements in the function: > > - when the first STRPREFIX call falls > > - or when any of the other three "return" statements are reached (which > can only be reached during the scanning phase, M_SCAN; that is, *before* > we write anything at all into "id"). > > The point of separating M_SCAN from M_FILL is to verify the full format > match before we start modifying "id" in-place. The traversal (parsing) > through "VG-LV" is exactly the same in both cases, but in the first > pass, we just check. If that completes fine, we repeat the same > traversal, but then we also modify "id" in-place. > > The translation only ever contracts VG-LV (it produces at most as many > bytes as it consumes), and the write point is always to the left of the > read point, so the in-place modification, including the final > NUL-termination, is safe. > > > eg if the input was "A-B-C", > > I think it would translate the string to "A/B-C" which is an output > > but the input seems like it was wrong. > > If "id" contains the string "/dev/mapper/A-B-C", then it *mismatches* > the expected form "/dev/mapper/VG-LV" just as much as an ID containing > "/dev/sda2" would mismatch the expected form. > > In either of those cases, "id" does not get modified, and gets added to > the keystore precisely as the user specified it on the command line. > > We don't say "this ID is wrong, error!". We don't know what the ID is, > we only know what it is *not*. We only say "this is not a > /dev/mapper/VG-LV specification, we just don't know what it is supposed > to be, so we won't touch it, we'll add it as it is, to the keystore". > > That's exactly the behavior (= blind addition) that we have right now, > i.e., pre-patch.OK got it, in that case: Reviewed-by: Richard W.M. Jones <rjones at redhat.com>> unescape_device_mapper_lvm() either modifies "id" in place (if "id" on > input has the expected form), or leaves "id" alone. > > The logic following the unescape_device_mapper_lvm() *call site* doesn't > know which way the function went, but it does not *need* to know. > > > Or is it that only the VG is > > escaped? (I don't imagine there is some specification for /dev/mapper > > device names.) > > "/dev/mapper/A-B-C" simply cannot be interpreted according to the > "/dev/mapper/VG-LV" scheme. For such an interpretation to be possible, > we'd have to have one of the following strings in "id": > > - "/dev/mapper/A--B-C" --> "/dev/A-B/C" > - "/dev/mapper/A-B--C" --> "/dev/A/B-C" > > If the user specifies (for example) "/dev/mapper/A-B-C", we reject that, > so the ID remains "/dev/mapper/A-B-C", and that is what gets added to > the keystore -- exactly as it happens pre-patch. > > If the user specifies (for example) "/dev/mapper/A--B--C", we also > reject that (there is no separator to tell us what is the VG and what is > the LV), so the ID remains "/dev/mapper/A--B--C", and that is what gets > added to the keystore -- exactly as it happens pre-patch. > > FWIW, this aspect is identical between version 1 and version 2 of this > patch: in v1, if the regex didn't match, then "id" would simply not be > rewritten, and would get added to the keystore as specified by the user. > The v1->v2 update is an implementation detal, internal to > unescape_device_mapper_lvm() -- it only affects how the expected form is > recognized (regex versus manual parsing), and how a matching input is > translated (the output being constructed from matched regex > subexpressions, versus byte-wise buffering and flushing integrated with > scanning).Thanks, Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top