Eric Blake
2022-Feb-01 19:44 UTC
[Libguestfs] [libnbd PATCH 0/4] Work towards fixing nbdcopy silent corruption
These are the trivial patches modeled after Nir's commit 7ba6ef67, fixing further places that fail to check *error during callbacks. nbdcopy still has a bigger bug in that it is NOT properly checking for *error in the pread/pwrite completion handlers, but fixing that gracefully (rather than just quitting the process immediately, as we can do in the examples), requires more invasive patches which I will be posting separately for review, while the ones posted here are trivial enough that I'm pushing them now (commits 23bcea4a..1373d423). I also audited other uses of completion callbacks: - in tests, closure-lifetimes, errors, server-death, and shutdown-flags are all safe (either the callback returns 0 and the code later calls nbd_aio_command_completed, or we are explicitly testing aspects of completion callbacks) - in fuse/operations.c, the completion callback returns 0 and calls nbd_aio_command_completed later - in examples/strict-structured-reads.c, the completion callback properly checks *error I also checked that nbdkit's nbd plugin properly checks *error. Eric Blake (4): examples: aio-connect-read.c: Fix error handling examples: glib-main-loop: don't strand commands when gsource disappears examples: glib-main-loop: Fix error handling copy: Ignore extents if we encountered earlier error examples/aio-connect-read.c | 7 +++++++ examples/glib-main-loop.c | 14 ++++++++++++-- copy/nbd-ops.c | 4 ++-- 3 files changed, 21 insertions(+), 4 deletions(-) -- 2.34.1
Eric Blake
2022-Feb-01 19:44 UTC
[Libguestfs] [libnbd PATCH 1/4] examples: aio-connect-read.c: Fix error handling
This example failed to check the *error parameter to the read completion callback. Although the buffers in this example start life zero-initialized due to living in .bss (so there is no leak of heap contents), later reads could repeat the decoding of a buffer returned from an earlier read rather than reporting the failure. Fixed by reporting the error and exiting early. Here is an example failure fixed by this patch: $ ./run nbdkit -U - eval get_size='echo 64k' pread='echo EIO >&2; exit 1' \ --run 'examples/aio-connect-read $unixsocket' nbdkit: eval[1]: error: /tmp/nbdkit5kfDY2/pread: failed to read: Input/output error nbdkit: eval[1]: error: /tmp/nbdkit5kfDY2/pread: nbdkit: eval[1]: error: write error reply: Broken pipe $ echo $? 1 which previously dumped 64k of all zeroes before exiting with status 0. --- examples/aio-connect-read.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/examples/aio-connect-read.c b/examples/aio-connect-read.c index 799525ad..26ac3f99 100644 --- a/examples/aio-connect-read.c +++ b/examples/aio-connect-read.c @@ -17,6 +17,7 @@ #include <stdlib.h> #include <stdint.h> #include <inttypes.h> +#include <errno.h> #include <assert.h> #include <libnbd.h> @@ -35,6 +36,12 @@ hexdump (void *user_data, int *error) struct data *data = user_data; FILE *pp; + if (*error) { + errno = *error; + perror ("failed to read"); + exit (EXIT_FAILURE); + } + printf ("sector at offset 0x%" PRIx64 ":\n", data->offset); pp = popen ("hexdump -C", "w"); -- 2.34.1
Eric Blake
2022-Feb-01 19:44 UTC
[Libguestfs] [libnbd PATCH 2/4] examples: glib-main-loop: don't strand commands when gsource disappears
Returning 0 from a callback handler implies that we will later call nbd_aio_command_completed(), but we weren't doing that, and our finalize() handler instead asserts that no outstanding commands remain. Even when we are going to ignore a late completion (due to the gsource already being torn down), we should still auto-retire the command. Fixes: 7e53d48e ("api: Allow completion callbacks to auto-retire", v0.1.9) --- examples/glib-main-loop.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/examples/glib-main-loop.c b/examples/glib-main-loop.c index 9c279d39..2df27878 100644 --- a/examples/glib-main-loop.c +++ b/examples/glib-main-loop.c @@ -400,7 +400,7 @@ finished_read (void *vp, int *error) struct buffer *buffer = vp; if (gssrc == NULL) - return 0; + return 1; /* Nothing we can do, auto-retire the callback */ DEBUG (gssrc, "finished_read: read completed"); @@ -446,7 +446,7 @@ finished_write (void *vp, int *error) struct buffer *buffer = vp; if (gsdest == NULL) - return 0; + return 1; /* Nothing we can do, auto-retire the callback */ DEBUG (gsdest, "finished_write: write completed"); -- 2.34.1
Eric Blake
2022-Feb-01 19:44 UTC
[Libguestfs] [libnbd PATCH 3/4] examples: glib-main-loop: Fix error handling
This example failed to check the *error parameter to the read and write completion callbacks. What's more, the buffers start life uninitialized, so a failed read can end up leaking heap contents to the corresponding write. Fortunately, the example program is hard-coded to two particular nbdkit invocations, and not usable to corrupt a user's data. However, as this example is likely to be copied elsewhere, we should at least perform rudimentary error checking to avoid bad copy-and-paste that could turn into a more severe problem in code derived from this. Easiest for the example is to just punt and exit the process on any detected errors from either the source or destination. Since the nbdkit invocations are hard-coded, testing that this patch works involves using a temporary patch, such as: | diff --git i/examples/glib-main-loop.c w/examples/glib-main-loop.c | index 1982f941..f60a4a38 100644 | --- i/examples/glib-main-loop.c | +++ w/examples/glib-main-loop.c | @@ -232,7 +232,8 @@ static struct NBDSource *gssrc, *gsdest; | #define SIZE (1024*1024*1024) | | static const char *src_args[] = { | - "nbdkit", "-s", "--exit-with-parent", "-r", "pattern", "size=1G", NULL | + "nbdkit", "-s", "--exit-with-parent", "-r", "--filter=error", | + "pattern", "size=1G", "error-pread-rate=0.1", NULL | }; | | static const char *dest_args[] = { Fixes: bc3b4230 ("examples: Include an example of integrating with the glib main loop", v0.1.9) --- examples/glib-main-loop.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/examples/glib-main-loop.c b/examples/glib-main-loop.c index 2df27878..1982f941 100644 --- a/examples/glib-main-loop.c +++ b/examples/glib-main-loop.c @@ -402,6 +402,11 @@ finished_read (void *vp, int *error) if (gssrc == NULL) return 1; /* Nothing we can do, auto-retire the callback */ + if (*error) { + fprintf (stderr, "finished_read: read failed: %s\n", strerror (*error)); + exit (EXIT_FAILURE); + } + DEBUG (gssrc, "finished_read: read completed"); assert (buffer->state == BUFFER_READING); @@ -448,6 +453,11 @@ finished_write (void *vp, int *error) if (gsdest == NULL) return 1; /* Nothing we can do, auto-retire the callback */ + if (*error) { + fprintf (stderr, "finished_write: write failed: %s\n", strerror (*error)); + exit (EXIT_FAILURE); + } + DEBUG (gsdest, "finished_write: write completed"); assert (buffer->state == BUFFER_WRITING); -- 2.34.1
Eric Blake
2022-Feb-01 19:44 UTC
[Libguestfs] [libnbd PATCH 4/4] copy: Ignore extents if we encountered earlier error
In practice, the nbd extents callback will only be called once per nbd_aio_block_status, because we have only requested exactly one metacontext id, and because compliant servers send only one reply chunk per id. In theory, a server could reply first with an error chunk (perhaps meaning "I'm unable to get status beyond offset X") followed by an extent chunk ("but here's what I was able to get") while still being compliant. And a non-compliant server can send unexpected chunks, just to try and confuse us. Right now, it would take a custom NBD server to actually trigger any scenario where *error would ever be non-zero on entry to next_extent() (nbdkit does not have that much power). So while *error ever being non-zero is unlikely, our easiest course is to handle it the same way we would for a server sending us an unexpected id: ignore the rest of the extent callback. The later completion callback will handle the overall command failing. --- copy/nbd-ops.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/copy/nbd-ops.c b/copy/nbd-ops.c index dfc62f20..ab30badd 100644 --- a/copy/nbd-ops.c +++ b/copy/nbd-ops.c @@ -1,5 +1,5 @@ /* NBD client library in userspace. - * Copyright (C) 2020 Red Hat Inc. + * Copyright (C) 2020-2022 Red Hat Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -337,7 +337,7 @@ add_extent (void *vp, const char *metacontext, extent_list *ret = vp; size_t i; - if (strcmp (metacontext, "base:allocation") != 0) + if (strcmp (metacontext, "base:allocation") != 0 || *error) return 0; for (i = 0; i < nr_entries; i += 2) { -- 2.34.1
Laszlo Ersek
2022-Feb-02 08:23 UTC
[Libguestfs] [libnbd PATCH 0/4] Work towards fixing nbdcopy silent corruption
On 02/01/22 20:44, Eric Blake wrote:> These are the trivial patches modeled after Nir's commit 7ba6ef67, > fixing further places that fail to check *error during callbacks.I find these interfaces very difficult to use. https://libguestfs.org/nbd_aio_block_status.3.html We have two callbacks here, each takes a *error, and I have zero idea of the order the callbacks are supposed to be called in. The last paragraph of the commit message speaks about this, and the code updates seem to suggest some ordering, but I'm still unsure. ... In fact, I'm now uncertain whether ignoring "err" in virt-v2v's "lib/utils.ml", function "get_disk_allocated", is safe. It's a "sync" (not aio) nbd_block_status() call, but the documentation <https://libguestfs.org/nbd_block_status.3.html> says: "On entry to the callback, the error parameter contains the errno value of any previously detected error." So what should we do then, in "get_disk_allocated"? Return -1 immediately? And if we do so, what is that going to mean for "NBD.block_status"? Will it raise an exception? Thanks, Laszlo> nbdcopy still has a bigger bug in that it is NOT properly checking for > *error in the pread/pwrite completion handlers, but fixing that > gracefully (rather than just quitting the process immediately, as we > can do in the examples), requires more invasive patches which I will > be posting separately for review, while the ones posted here are > trivial enough that I'm pushing them now (commits 23bcea4a..1373d423). > > I also audited other uses of completion callbacks: > - in tests, closure-lifetimes, errors, server-death, and > shutdown-flags are all safe (either the callback returns 0 and the > code later calls nbd_aio_command_completed, or we are explicitly > testing aspects of completion callbacks) > - in fuse/operations.c, the completion callback returns 0 and calls > nbd_aio_command_completed later > - in examples/strict-structured-reads.c, the completion callback > properly checks *error > > I also checked that nbdkit's nbd plugin properly checks *error. > > Eric Blake (4): > examples: aio-connect-read.c: Fix error handling > examples: glib-main-loop: don't strand commands when gsource > disappears > examples: glib-main-loop: Fix error handling > copy: Ignore extents if we encountered earlier error > > examples/aio-connect-read.c | 7 +++++++ > examples/glib-main-loop.c | 14 ++++++++++++-- > copy/nbd-ops.c | 4 ++-- > 3 files changed, 21 insertions(+), 4 deletions(-) >