Eric Blake
2022-Feb-01 19:44 UTC
[Libguestfs] [libnbd PATCH 0/4] Work towards fixing nbdcopy silent corruption
These are the trivial patches modeled after Nir's commit 7ba6ef67,
fixing further places that fail to check *error during callbacks.
nbdcopy still has a bigger bug in that it is NOT properly checking for
*error in the pread/pwrite completion handlers, but fixing that
gracefully (rather than just quitting the process immediately, as we
can do in the examples), requires more invasive patches which I will
be posting separately for review, while the ones posted here are
trivial enough that I'm pushing them now (commits 23bcea4a..1373d423).
I also audited other uses of completion callbacks:
- in tests, closure-lifetimes, errors, server-death, and
shutdown-flags are all safe (either the callback returns 0 and the
code later calls nbd_aio_command_completed, or we are explicitly
testing aspects of completion callbacks)
- in fuse/operations.c, the completion callback returns 0 and calls
nbd_aio_command_completed later
- in examples/strict-structured-reads.c, the completion callback
properly checks *error
I also checked that nbdkit's nbd plugin properly checks *error.
Eric Blake (4):
examples: aio-connect-read.c: Fix error handling
examples: glib-main-loop: don't strand commands when gsource
disappears
examples: glib-main-loop: Fix error handling
copy: Ignore extents if we encountered earlier error
examples/aio-connect-read.c | 7 +++++++
examples/glib-main-loop.c | 14 ++++++++++++--
copy/nbd-ops.c | 4 ++--
3 files changed, 21 insertions(+), 4 deletions(-)
--
2.34.1
Eric Blake
2022-Feb-01 19:44 UTC
[Libguestfs] [libnbd PATCH 1/4] examples: aio-connect-read.c: Fix error handling
This example failed to check the *error parameter to the read
completion callback. Although the buffers in this example start life
zero-initialized due to living in .bss (so there is no leak of heap
contents), later reads could repeat the decoding of a buffer returned
from an earlier read rather than reporting the failure.
Fixed by reporting the error and exiting early.
Here is an example failure fixed by this patch:
$ ./run nbdkit -U - eval get_size='echo 64k' pread='echo EIO
>&2; exit 1' \
--run 'examples/aio-connect-read $unixsocket'
nbdkit: eval[1]: error: /tmp/nbdkit5kfDY2/pread:
failed to read: Input/output error
nbdkit: eval[1]: error: /tmp/nbdkit5kfDY2/pread:
nbdkit: eval[1]: error: write error reply: Broken pipe
$ echo $?
1
which previously dumped 64k of all zeroes before exiting with status 0.
---
examples/aio-connect-read.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/examples/aio-connect-read.c b/examples/aio-connect-read.c
index 799525ad..26ac3f99 100644
--- a/examples/aio-connect-read.c
+++ b/examples/aio-connect-read.c
@@ -17,6 +17,7 @@
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
+#include <errno.h>
#include <assert.h>
#include <libnbd.h>
@@ -35,6 +36,12 @@ hexdump (void *user_data, int *error)
struct data *data = user_data;
FILE *pp;
+ if (*error) {
+ errno = *error;
+ perror ("failed to read");
+ exit (EXIT_FAILURE);
+ }
+
printf ("sector at offset 0x%" PRIx64 ":\n",
data->offset);
pp = popen ("hexdump -C", "w");
--
2.34.1
Eric Blake
2022-Feb-01 19:44 UTC
[Libguestfs] [libnbd PATCH 2/4] examples: glib-main-loop: don't strand commands when gsource disappears
Returning 0 from a callback handler implies that we will later call
nbd_aio_command_completed(), but we weren't doing that, and our
finalize() handler instead asserts that no outstanding commands
remain. Even when we are going to ignore a late completion (due to
the gsource already being torn down), we should still auto-retire the
command.
Fixes: 7e53d48e ("api: Allow completion callbacks to auto-retire",
v0.1.9)
---
examples/glib-main-loop.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/examples/glib-main-loop.c b/examples/glib-main-loop.c
index 9c279d39..2df27878 100644
--- a/examples/glib-main-loop.c
+++ b/examples/glib-main-loop.c
@@ -400,7 +400,7 @@ finished_read (void *vp, int *error)
struct buffer *buffer = vp;
if (gssrc == NULL)
- return 0;
+ return 1; /* Nothing we can do, auto-retire the callback */
DEBUG (gssrc, "finished_read: read completed");
@@ -446,7 +446,7 @@ finished_write (void *vp, int *error)
struct buffer *buffer = vp;
if (gsdest == NULL)
- return 0;
+ return 1; /* Nothing we can do, auto-retire the callback */
DEBUG (gsdest, "finished_write: write completed");
--
2.34.1
Eric Blake
2022-Feb-01 19:44 UTC
[Libguestfs] [libnbd PATCH 3/4] examples: glib-main-loop: Fix error handling
This example failed to check the *error parameter to the read and
write completion callbacks. What's more, the buffers start life
uninitialized, so a failed read can end up leaking heap contents to
the corresponding write.
Fortunately, the example program is hard-coded to two particular
nbdkit invocations, and not usable to corrupt a user's data. However,
as this example is likely to be copied elsewhere, we should at least
perform rudimentary error checking to avoid bad copy-and-paste that
could turn into a more severe problem in code derived from this.
Easiest for the example is to just punt and exit the process on any
detected errors from either the source or destination.
Since the nbdkit invocations are hard-coded, testing that this patch
works involves using a temporary patch, such as:
| diff --git i/examples/glib-main-loop.c w/examples/glib-main-loop.c
| index 1982f941..f60a4a38 100644
| --- i/examples/glib-main-loop.c
| +++ w/examples/glib-main-loop.c
| @@ -232,7 +232,8 @@ static struct NBDSource *gssrc, *gsdest;
| #define SIZE (1024*1024*1024)
|
| static const char *src_args[] = {
| - "nbdkit", "-s", "--exit-with-parent",
"-r", "pattern", "size=1G", NULL
| + "nbdkit", "-s", "--exit-with-parent",
"-r", "--filter=error",
| + "pattern", "size=1G", "error-pread-rate=0.1",
NULL
| };
|
| static const char *dest_args[] = {
Fixes: bc3b4230 ("examples: Include an example of integrating with the glib
main loop", v0.1.9)
---
examples/glib-main-loop.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/examples/glib-main-loop.c b/examples/glib-main-loop.c
index 2df27878..1982f941 100644
--- a/examples/glib-main-loop.c
+++ b/examples/glib-main-loop.c
@@ -402,6 +402,11 @@ finished_read (void *vp, int *error)
if (gssrc == NULL)
return 1; /* Nothing we can do, auto-retire the callback */
+ if (*error) {
+ fprintf (stderr, "finished_read: read failed: %s\n", strerror
(*error));
+ exit (EXIT_FAILURE);
+ }
+
DEBUG (gssrc, "finished_read: read completed");
assert (buffer->state == BUFFER_READING);
@@ -448,6 +453,11 @@ finished_write (void *vp, int *error)
if (gsdest == NULL)
return 1; /* Nothing we can do, auto-retire the callback */
+ if (*error) {
+ fprintf (stderr, "finished_write: write failed: %s\n", strerror
(*error));
+ exit (EXIT_FAILURE);
+ }
+
DEBUG (gsdest, "finished_write: write completed");
assert (buffer->state == BUFFER_WRITING);
--
2.34.1
Eric Blake
2022-Feb-01 19:44 UTC
[Libguestfs] [libnbd PATCH 4/4] copy: Ignore extents if we encountered earlier error
In practice, the nbd extents callback will only be called once per
nbd_aio_block_status, because we have only requested exactly one
metacontext id, and because compliant servers send only one reply
chunk per id. In theory, a server could reply first with an error
chunk (perhaps meaning "I'm unable to get status beyond offset X")
followed by an extent chunk ("but here's what I was able to get")
while still being compliant. And a non-compliant server can send
unexpected chunks, just to try and confuse us. Right now, it would
take a custom NBD server to actually trigger any scenario where *error
would ever be non-zero on entry to next_extent() (nbdkit does not have
that much power). So while *error ever being non-zero is unlikely,
our easiest course is to handle it the same way we would for a server
sending us an unexpected id: ignore the rest of the extent callback.
The later completion callback will handle the overall command failing.
---
copy/nbd-ops.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/copy/nbd-ops.c b/copy/nbd-ops.c
index dfc62f20..ab30badd 100644
--- a/copy/nbd-ops.c
+++ b/copy/nbd-ops.c
@@ -1,5 +1,5 @@
/* NBD client library in userspace.
- * Copyright (C) 2020 Red Hat Inc.
+ * Copyright (C) 2020-2022 Red Hat Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -337,7 +337,7 @@ add_extent (void *vp, const char *metacontext,
extent_list *ret = vp;
size_t i;
- if (strcmp (metacontext, "base:allocation") != 0)
+ if (strcmp (metacontext, "base:allocation") != 0 || *error)
return 0;
for (i = 0; i < nr_entries; i += 2) {
--
2.34.1
Laszlo Ersek
2022-Feb-02 08:23 UTC
[Libguestfs] [libnbd PATCH 0/4] Work towards fixing nbdcopy silent corruption
On 02/01/22 20:44, Eric Blake wrote:> These are the trivial patches modeled after Nir's commit 7ba6ef67, > fixing further places that fail to check *error during callbacks.I find these interfaces very difficult to use. https://libguestfs.org/nbd_aio_block_status.3.html We have two callbacks here, each takes a *error, and I have zero idea of the order the callbacks are supposed to be called in. The last paragraph of the commit message speaks about this, and the code updates seem to suggest some ordering, but I'm still unsure. ... In fact, I'm now uncertain whether ignoring "err" in virt-v2v's "lib/utils.ml", function "get_disk_allocated", is safe. It's a "sync" (not aio) nbd_block_status() call, but the documentation <https://libguestfs.org/nbd_block_status.3.html> says: "On entry to the callback, the error parameter contains the errno value of any previously detected error." So what should we do then, in "get_disk_allocated"? Return -1 immediately? And if we do so, what is that going to mean for "NBD.block_status"? Will it raise an exception? Thanks, Laszlo> nbdcopy still has a bigger bug in that it is NOT properly checking for > *error in the pread/pwrite completion handlers, but fixing that > gracefully (rather than just quitting the process immediately, as we > can do in the examples), requires more invasive patches which I will > be posting separately for review, while the ones posted here are > trivial enough that I'm pushing them now (commits 23bcea4a..1373d423). > > I also audited other uses of completion callbacks: > - in tests, closure-lifetimes, errors, server-death, and > shutdown-flags are all safe (either the callback returns 0 and the > code later calls nbd_aio_command_completed, or we are explicitly > testing aspects of completion callbacks) > - in fuse/operations.c, the completion callback returns 0 and calls > nbd_aio_command_completed later > - in examples/strict-structured-reads.c, the completion callback > properly checks *error > > I also checked that nbdkit's nbd plugin properly checks *error. > > Eric Blake (4): > examples: aio-connect-read.c: Fix error handling > examples: glib-main-loop: don't strand commands when gsource > disappears > examples: glib-main-loop: Fix error handling > copy: Ignore extents if we encountered earlier error > > examples/aio-connect-read.c | 7 +++++++ > examples/glib-main-loop.c | 14 ++++++++++++-- > copy/nbd-ops.c | 4 ++-- > 3 files changed, 21 insertions(+), 4 deletions(-) >