Eric Blake
2022-Feb-03 20:25 UTC
[Libguestfs] [libnbd PATCH v2 5/5] copy: CVE-2022-0485: Fail nbdcopy if NBD read or write fails
nbdcopy has a nasty bug when performing multi-threaded copies using asynchronous nbd calls - it was blindly treating the completion of an asynchronous command as successful, rather than checking the *error parameter. This can result in the silent creation of a corrupted image in two different ways: when a read fails, we blindly wrote garbage to the destination; when a write fails, we did not flag that the destination was not written. Since nbdcopy already calls exit() on a synchronous read or write failure to a file, doing the same for an asynchronous op to an NBD server is the simplest solution. A nicer solution, but more invasive to code and thus not done here, might be to allow up to N retries of the transaction (in case the read or write failure was transient), or even having a mode where as much data is copied as possible (portions of the copy that failed would be logged on stderr, and nbdcopy would still fail with a non-zero exit status, but this would copy more than just stopping at the first error, as can be done with rsync or ddrescue). Note that since we rely on auto-retiring and do NOT call nbd_aio_command_completed, our completion callbacks must always return 1 (if they do not exit() first), even when acting on *error, so as not leave the command allocated until nbd_close. As such, there is no sane way to return an error to a manual caller of the callback, and therefore we can drop dead code that calls perror() and exit() if the callback "failed". It is also worth documenting the contract on when we must manually call the callback during the asynch_zero callback, so that we do not leak or double-free the command; thankfully, all the existing code paths were correct. The added testsuite script demonstrates several scenarios, some of which fail without the rest of this patch in place, and others which showcase ways in which sparse images can bypass errors. Once backports are complete, a followup patch on the main branch will edit docs/libnbd-security.pod with the mailing list announcement of the stable branch commit ids and release versions that incorporate this fix. Reported-by: Nir Soffer <nsoffer at redhat.com> Fixes: bc896eec4d ("copy: Implement multi-conn, multiple threads, multiple requests in flight.", v1.5.6) Fixes: https://bugzilla.redhat.com/2046194 --- TODO | 1 + copy/Makefile.am | 4 +- copy/copy-nbd-error.sh | 78 +++++++++++++++++++++++++++++++++++++ copy/file-ops.c | 17 +++----- copy/multi-thread-copying.c | 15 +++++++ copy/nbdcopy.h | 7 ++-- copy/null-ops.c | 10 +---- 7 files changed, 108 insertions(+), 24 deletions(-) create mode 100755 copy/copy-nbd-error.sh diff --git a/TODO b/TODO index da157942..7c9c15e2 100644 --- a/TODO +++ b/TODO @@ -33,6 +33,7 @@ nbdcopy: - Better page cache usage, see nbdkit-file-plugin options fadvise=sequential cache=none. - Consider io_uring if there are performance bottlenecks. + - Configurable retries in response to read or write failures. nbdfuse: - If you write beyond the end of the virtual file, it returns EIO. diff --git a/copy/Makefile.am b/copy/Makefile.am index f2100853..e729f86a 100644 --- a/copy/Makefile.am +++ b/copy/Makefile.am @@ -1,5 +1,5 @@ # nbd client library in userspace -# Copyright (C) 2020 Red Hat Inc. +# Copyright (C) 2020-2022 Red Hat Inc. # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public @@ -33,6 +33,7 @@ EXTRA_DIST = \ copy-nbd-to-small-nbd-error.sh \ copy-nbd-to-sparse-file.sh \ copy-nbd-to-stdout.sh \ + copy-nbd-error.sh \ copy-progress-bar.sh \ copy-sparse.sh \ copy-sparse-allocated.sh \ @@ -124,6 +125,7 @@ TESTS += \ copy-stdin-to-nbd.sh \ copy-stdin-to-null.sh \ copy-nbd-to-stdout.sh \ + copy-nbd-error.sh \ copy-progress-bar.sh \ copy-sparse.sh \ copy-sparse-allocated.sh \ diff --git a/copy/copy-nbd-error.sh b/copy/copy-nbd-error.sh new file mode 100755 index 00000000..89f0a2f1 --- /dev/null +++ b/copy/copy-nbd-error.sh @@ -0,0 +1,78 @@ +#!/usr/bin/env bash +# nbd client library in userspace +# Copyright (C) 2022 Red Hat Inc. +# +# This library is free software; you can redistribute it and/or +# modify it under the terms of the GNU Lesser General Public +# License as published by the Free Software Foundation; either +# version 2 of the License, or (at your option) any later version. +# +# This library is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# Lesser General Public License for more details. +# +# You should have received a copy of the GNU Lesser General Public +# License along with this library; if not, write to the Free Software +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + +# Tests several scenarios of handling NBD server errors +# Serves as a regression test for the CVE-2022-0485 fix. + +. ../tests/functions.sh + +set -e +set -x + +requires nbdkit --exit-with-parent --version + +fail=0 + +# Failure to get block status should not be fatal, but merely downgrade to +# reading the entire image as if data +echo "Testing extents failures on source" +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \ + error-extents-rate=1 ] null: || fail=1 + +# Failure to read should be fatal +echo "Testing read failures on non-sparse source" +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \ + error-pread-rate=0.5 ] null: && fail=1 + +# However, reliable block status on a sparse image can avoid the need to read +echo "Testing read failures on sparse source" +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error null 5M \ + error-pread-rate=1 ] null: || fail=1 + +# Failure to write data should be fatal +echo "Testing write data failures on arbitrary destination" +$VG nbdcopy -- [ nbdkit --exit-with-parent -v pattern 5M ] \ + [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \ + memory 5M error-pwrite-rate=0.5 ] && fail=1 + +# However, writing zeroes can bypass the need for normal writes +echo "Testing write data failures from sparse source" +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ + [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \ + memory 5M error-pwrite-rate=1 ] || fail=1 + +# Failure to write zeroes should be fatal +echo "Testing write zero failures on arbitrary destination" +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ + [ nbdkit --exit-with-parent -v --filter=error memory 5M \ + error-zero-rate=1 ] && fail=1 + +# However, assuming/learning destination is zero can skip need to write +echo "Testing write failures on pre-zeroed destination" +$VG nbdcopy --destination-is-zero -- \ + [ nbdkit --exit-with-parent -v null 5M ] \ + [ nbdkit --exit-with-parent -v --filter=error memory 5M \ + error-pwrite-rate=1 error-zero-rate=1 ] || fail=1 + +# Likewise, when write zero is not advertised, fallback to normal write works +echo "Testing write zeroes to destination without zero support" +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ + [ nbdkit --exit-with-parent -v --filter=nozero --filter=error memory 5M \ + error-zero-rate=1 ] || fail=1 + +exit $fail diff --git a/copy/file-ops.c b/copy/file-ops.c index e37a5014..aaf04ade 100644 --- a/copy/file-ops.c +++ b/copy/file-ops.c @@ -591,10 +591,8 @@ file_asynch_read (struct rw *rw, file_synch_read (rw, slice_ptr (command->slice), command->slice.len, command->offset); - if (cb.callback (cb.user_data, &dummy) == -1) { - perror (rw->name); - exit (EXIT_FAILURE); - } + /* file_synch_read called exit() on error */ + cb.callback (cb.user_data, &dummy); } static void @@ -606,10 +604,8 @@ file_asynch_write (struct rw *rw, file_synch_write (rw, slice_ptr (command->slice), command->slice.len, command->offset); - if (cb.callback (cb.user_data, &dummy) == -1) { - perror (rw->name); - exit (EXIT_FAILURE); - } + /* file_synch_write called exit() on error */ + cb.callback (cb.user_data, &dummy); } static bool @@ -620,10 +616,7 @@ file_asynch_zero (struct rw *rw, struct command *command, if (!file_synch_zero (rw, command->offset, command->slice.len, allocate)) return false; - if (cb.callback (cb.user_data, &dummy) == -1) { - perror (rw->name); - exit (EXIT_FAILURE); - } + cb.callback (cb.user_data, &dummy); return true; } diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c index 815b8a02..2aa74d63 100644 --- a/copy/multi-thread-copying.c +++ b/copy/multi-thread-copying.c @@ -28,6 +28,7 @@ #include <errno.h> #include <assert.h> #include <sys/stat.h> +#include <inttypes.h> #include <pthread.h> @@ -376,6 +377,13 @@ finished_read (void *vp, int *error) { struct command *command = vp; + /* XXX - is it worth retrying a failed command? */ + if (*error) { + fprintf (stderr, "read at offset 0x%" PRIx64 "failed: %s\n", + command->offset, strerror (*error)); + exit (EXIT_FAILURE); + } + if (allocated || sparse_size == 0) { /* If sparseness detection (see below) is turned off then we write * the whole command. @@ -548,6 +556,13 @@ free_command (void *vp, int *error) struct command *command = vp; struct buffer *buffer = command->slice.buffer; + /* XXX - is it worth retrying a failed command? */ + if (*error) { + fprintf (stderr, "write at offset 0x%" PRIx64 "failed: %s\n", + command->offset, strerror (*error)); + exit (EXIT_FAILURE); + } + if (buffer != NULL) { if (--buffer->refs == 0) { free (buffer->data); diff --git a/copy/nbdcopy.h b/copy/nbdcopy.h index e7fe1eab..c070f8d7 100644 --- a/copy/nbdcopy.h +++ b/copy/nbdcopy.h @@ -1,5 +1,5 @@ /* NBD client library in userspace. - * Copyright (C) 2020-2021 Red Hat Inc. + * Copyright (C) 2020-2022 Red Hat Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -157,7 +157,8 @@ struct rw_ops { bool allocate); /* Asynchronous I/O operations. These start the operation and call - * 'cb' on completion. + * 'cb' on completion. 'cb' will return 1, for auto-retiring with + * asynchronous libnbd calls. * * The file_ops versions are actually implemented synchronously, but * still call 'cb'. @@ -173,7 +174,7 @@ struct rw_ops { nbd_completion_callback cb); /* Asynchronously zero. command->slice.buffer is not used. If not possible, - * returns false. + * returns false. 'cb' must be called only if returning true. */ bool (*asynch_zero) (struct rw *rw, struct command *command, nbd_completion_callback cb, bool allocate); diff --git a/copy/null-ops.c b/copy/null-ops.c index 11cd5116..5f1fda50 100644 --- a/copy/null-ops.c +++ b/copy/null-ops.c @@ -119,10 +119,7 @@ null_asynch_write (struct rw *rw, { int dummy = 0; - if (cb.callback (cb.user_data, &dummy) == -1) { - perror (rw->name); - exit (EXIT_FAILURE); - } + cb.callback (cb.user_data, &dummy); } static bool @@ -131,10 +128,7 @@ null_asynch_zero (struct rw *rw, struct command *command, { int dummy = 0; - if (cb.callback (cb.user_data, &dummy) == -1) { - perror (rw->name); - exit (EXIT_FAILURE); - } + cb.callback (cb.user_data, &dummy); return true; } -- 2.34.1
Richard W.M. Jones
2022-Feb-03 21:48 UTC
[Libguestfs] [libnbd PATCH v2 5/5] copy: CVE-2022-0485: Fail nbdcopy if NBD read or write fails
On Thu, Feb 03, 2022 at 02:25:58PM -0600, Eric Blake wrote: ...> +# Tests several scenarios of handling NBD server errors > +# Serves as a regression test for the CVE-2022-0485 fix. > + > +. ../tests/functions.sh > + > +set -e > +set -x > + > +requires nbdkit --exit-with-parent --versionIt's not very likely but in theory the pattern plugin or the error filter might not be installed. These additional tests would catch that and skip the test, although we don't do that in other tests yet ... requires nbdkit pattern --version requires nbdkit --filter=error null --version (https://libguestfs.org/nbdkit-probing.1.html)> +fail=0 > + > +# Failure to get block status should not be fatal, but merely downgrade to > +# reading the entire image as if data > +echo "Testing extents failures on source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \ > + error-extents-rate=1 ] null: || fail=1 > + > +# Failure to read should be fatal > +echo "Testing read failures on non-sparse source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \ > + error-pread-rate=0.5 ] null: && fail=1 > + > +# However, reliable block status on a sparse image can avoid the need to read > +echo "Testing read failures on sparse source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error null 5M \ > + error-pread-rate=1 ] null: || fail=1 > + > +# Failure to write data should be fatal > +echo "Testing write data failures on arbitrary destination" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v pattern 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \ > + memory 5M error-pwrite-rate=0.5 ] && fail=1 > + > +# However, writing zeroes can bypass the need for normal writes > +echo "Testing write data failures from sparse source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \ > + memory 5M error-pwrite-rate=1 ] || fail=1 > + > +# Failure to write zeroes should be fatal > +echo "Testing write zero failures on arbitrary destination" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error memory 5M \ > + error-zero-rate=1 ] && fail=1 > + > +# However, assuming/learning destination is zero can skip need to write > +echo "Testing write failures on pre-zeroed destination" > +$VG nbdcopy --destination-is-zero -- \ > + [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error memory 5M \ > + error-pwrite-rate=1 error-zero-rate=1 ] || fail=1 > + > +# Likewise, when write zero is not advertised, fallback to normal write works > +echo "Testing write zeroes to destination without zero support" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=nozero --filter=error memory 5M \ > + error-zero-rate=1 ] || fail=1 > + > +exit $fail > diff --git a/copy/file-ops.c b/copy/file-ops.c > index e37a5014..aaf04ade 100644 > --- a/copy/file-ops.c > +++ b/copy/file-ops.c > @@ -591,10 +591,8 @@ file_asynch_read (struct rw *rw, > > file_synch_read (rw, slice_ptr (command->slice), > command->slice.len, command->offset); > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + /* file_synch_read called exit() on error */ > + cb.callback (cb.user_data, &dummy); > } > > static void > @@ -606,10 +604,8 @@ file_asynch_write (struct rw *rw, > > file_synch_write (rw, slice_ptr (command->slice), > command->slice.len, command->offset); > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + /* file_synch_write called exit() on error */ > + cb.callback (cb.user_data, &dummy); > } > > static bool > @@ -620,10 +616,7 @@ file_asynch_zero (struct rw *rw, struct command *command, > > if (!file_synch_zero (rw, command->offset, command->slice.len, allocate)) > return false; > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + cb.callback (cb.user_data, &dummy); > return true; > } > > diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c > index 815b8a02..2aa74d63 100644 > --- a/copy/multi-thread-copying.c > +++ b/copy/multi-thread-copying.c > @@ -28,6 +28,7 @@ > #include <errno.h> > #include <assert.h> > #include <sys/stat.h> > +#include <inttypes.h> > > #include <pthread.h> > > @@ -376,6 +377,13 @@ finished_read (void *vp, int *error) > { > struct command *command = vp; > > + /* XXX - is it worth retrying a failed command? */I would say no. nbdkit can use the retry or retry-request filter if retrying is desirable, and other Unix commands (eg. cp) don't have automatic retry behaviour by default.> + if (*error) { > + fprintf (stderr, "read at offset 0x%" PRIx64 "failed: %s\n", > + command->offset, strerror (*error)); > + exit (EXIT_FAILURE); > + } > + > if (allocated || sparse_size == 0) { > /* If sparseness detection (see below) is turned off then we write > * the whole command. > @@ -548,6 +556,13 @@ free_command (void *vp, int *error) > struct command *command = vp; > struct buffer *buffer = command->slice.buffer; > > + /* XXX - is it worth retrying a failed command? */ > + if (*error) { > + fprintf (stderr, "write at offset 0x%" PRIx64 "failed: %s\n", > + command->offset, strerror (*error)); > + exit (EXIT_FAILURE); > + } > + > if (buffer != NULL) { > if (--buffer->refs == 0) { > free (buffer->data); > diff --git a/copy/nbdcopy.h b/copy/nbdcopy.h > index e7fe1eab..c070f8d7 100644 > --- a/copy/nbdcopy.h > +++ b/copy/nbdcopy.h > @@ -1,5 +1,5 @@ > /* NBD client library in userspace. > - * Copyright (C) 2020-2021 Red Hat Inc. > + * Copyright (C) 2020-2022 Red Hat Inc. > * > * This library is free software; you can redistribute it and/or > * modify it under the terms of the GNU Lesser General Public > @@ -157,7 +157,8 @@ struct rw_ops { > bool allocate); > > /* Asynchronous I/O operations. These start the operation and call > - * 'cb' on completion. > + * 'cb' on completion. 'cb' will return 1, for auto-retiring with > + * asynchronous libnbd calls. > * > * The file_ops versions are actually implemented synchronously, but > * still call 'cb'. > @@ -173,7 +174,7 @@ struct rw_ops { > nbd_completion_callback cb); > > /* Asynchronously zero. command->slice.buffer is not used. If not possible, > - * returns false. > + * returns false. 'cb' must be called only if returning true. > */ > bool (*asynch_zero) (struct rw *rw, struct command *command, > nbd_completion_callback cb, bool allocate); > diff --git a/copy/null-ops.c b/copy/null-ops.c > index 11cd5116..5f1fda50 100644 > --- a/copy/null-ops.c > +++ b/copy/null-ops.c > @@ -119,10 +119,7 @@ null_asynch_write (struct rw *rw, > { > int dummy = 0; > > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + cb.callback (cb.user_data, &dummy); > } > > static bool > @@ -131,10 +128,7 @@ null_asynch_zero (struct rw *rw, struct command *command, > { > int dummy = 0; > > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + cb.callback (cb.user_data, &dummy); > return true; > }ACK series, thanks! Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Nir Soffer
2022-Feb-04 00:08 UTC
[Libguestfs] [libnbd PATCH v2 5/5] copy: CVE-2022-0485: Fail nbdcopy if NBD read or write fails
On Thu, Feb 3, 2022 at 10:26 PM Eric Blake <eblake at redhat.com> wrote:> > nbdcopy has a nasty bug when performing multi-threaded copies using > asynchronous nbd calls - it was blindly treating the completion of an > asynchronous command as successful, rather than checking the *error > parameter. This can result in the silent creation of a corrupted > image in two different ways: when a read fails, we blindly wrote > garbage to the destination; when a write fails, we did not flag that > the destination was not written. > > Since nbdcopy already calls exit() on a synchronous read or write > failure to a file, doing the same for an asynchronous op to an NBD > server is the simplest solution. A nicer solution, but more invasive > to code and thus not done here, might be to allow up to N retries of > the transaction (in case the read or write failure was transient), or > even having a mode where as much data is copied as possible (portions > of the copy that failed would be logged on stderr, and nbdcopy would > still fail with a non-zero exit status, but this would copy more than > just stopping at the first error, as can be done with rsync or > ddrescue).Maybe this belongs to the nbdresuce tool :-)> Note that since we rely on auto-retiring and do NOT call > nbd_aio_command_completed, our completion callbacks must always return > 1 (if they do not exit() first), even when acting on *error, so as not > leave the command allocated until nbd_close. As such, there is no > sane way to return an error to a manual caller of the callback, and > therefore we can drop dead code that calls perror() and exit() if the > callback "failed". It is also worth documenting the contract on when > we must manually call the callback during the asynch_zero callback, so > that we do not leak or double-free the command; thankfully, all the > existing code paths were correct. > > The added testsuite script demonstrates several scenarios, some of > which fail without the rest of this patch in place, and others which > showcase ways in which sparse images can bypass errors. > > Once backports are complete, a followup patch on the main branch will > edit docs/libnbd-security.pod with the mailing list announcement of > the stable branch commit ids and release versions that incorporate > this fix. > > Reported-by: Nir Soffer <nsoffer at redhat.com> > Fixes: bc896eec4d ("copy: Implement multi-conn, multiple threads, multiple requests in flight.", v1.5.6) > Fixes: https://bugzilla.redhat.com/2046194 > --- > TODO | 1 + > copy/Makefile.am | 4 +- > copy/copy-nbd-error.sh | 78 +++++++++++++++++++++++++++++++++++++ > copy/file-ops.c | 17 +++----- > copy/multi-thread-copying.c | 15 +++++++ > copy/nbdcopy.h | 7 ++-- > copy/null-ops.c | 10 +---- > 7 files changed, 108 insertions(+), 24 deletions(-) > create mode 100755 copy/copy-nbd-error.sh > > diff --git a/TODO b/TODO > index da157942..7c9c15e2 100644 > --- a/TODO > +++ b/TODO > @@ -33,6 +33,7 @@ nbdcopy: > - Better page cache usage, see nbdkit-file-plugin options > fadvise=sequential cache=none. > - Consider io_uring if there are performance bottlenecks. > + - Configurable retries in response to read or write failures. > > nbdfuse: > - If you write beyond the end of the virtual file, it returns EIO. > diff --git a/copy/Makefile.am b/copy/Makefile.am > index f2100853..e729f86a 100644 > --- a/copy/Makefile.am > +++ b/copy/Makefile.am > @@ -1,5 +1,5 @@ > # nbd client library in userspace > -# Copyright (C) 2020 Red Hat Inc. > +# Copyright (C) 2020-2022 Red Hat Inc. > # > # This library is free software; you can redistribute it and/or > # modify it under the terms of the GNU Lesser General Public > @@ -33,6 +33,7 @@ EXTRA_DIST = \ > copy-nbd-to-small-nbd-error.sh \ > copy-nbd-to-sparse-file.sh \ > copy-nbd-to-stdout.sh \ > + copy-nbd-error.sh \ > copy-progress-bar.sh \ > copy-sparse.sh \ > copy-sparse-allocated.sh \ > @@ -124,6 +125,7 @@ TESTS += \ > copy-stdin-to-nbd.sh \ > copy-stdin-to-null.sh \ > copy-nbd-to-stdout.sh \ > + copy-nbd-error.sh \ > copy-progress-bar.sh \ > copy-sparse.sh \ > copy-sparse-allocated.sh \ > diff --git a/copy/copy-nbd-error.sh b/copy/copy-nbd-error.sh > new file mode 100755 > index 00000000..89f0a2f1 > --- /dev/null > +++ b/copy/copy-nbd-error.sh > @@ -0,0 +1,78 @@ > +#!/usr/bin/env bash > +# nbd client library in userspace > +# Copyright (C) 2022 Red Hat Inc. > +# > +# This library is free software; you can redistribute it and/or > +# modify it under the terms of the GNU Lesser General Public > +# License as published by the Free Software Foundation; either > +# version 2 of the License, or (at your option) any later version. > +# > +# This library is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +# Lesser General Public License for more details. > +# > +# You should have received a copy of the GNU Lesser General Public > +# License along with this library; if not, write to the Free Software > +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + > +# Tests several scenarios of handling NBD server errors > +# Serves as a regression test for the CVE-2022-0485 fix. > + > +. ../tests/functions.sh > + > +set -e > +set -x > + > +requires nbdkit --exit-with-parent --version > + > +fail=0 > + > +# Failure to get block status should not be fatal, but merely downgrade to > +# reading the entire image as if data > +echo "Testing extents failures on source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \ > + error-extents-rate=1 ] null: || fail=1 > + > +# Failure to read should be fatal > +echo "Testing read failures on non-sparse source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \ > + error-pread-rate=0.5 ] null: && fail=1 > + > +# However, reliable block status on a sparse image can avoid the need to read > +echo "Testing read failures on sparse source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error null 5M \ > + error-pread-rate=1 ] null: || fail=1 > + > +# Failure to write data should be fatal > +echo "Testing write data failures on arbitrary destination" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v pattern 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \ > + memory 5M error-pwrite-rate=0.5 ] && fail=1 > + > +# However, writing zeroes can bypass the need for normal writes > +echo "Testing write data failures from sparse source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \ > + memory 5M error-pwrite-rate=1 ] || fail=1 > + > +# Failure to write zeroes should be fatal > +echo "Testing write zero failures on arbitrary destination" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error memory 5M \ > + error-zero-rate=1 ] && fail=1 > + > +# However, assuming/learning destination is zero can skip need to write > +echo "Testing write failures on pre-zeroed destination" > +$VG nbdcopy --destination-is-zero -- \ > + [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error memory 5M \ > + error-pwrite-rate=1 error-zero-rate=1 ] || fail=1 > + > +# Likewise, when write zero is not advertised, fallback to normal write works > +echo "Testing write zeroes to destination without zero support" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=nozero --filter=error memory 5M \ > + error-zero-rate=1 ] || fail=1Good test! All this behavior was not tested before?> + > +exit $fail > diff --git a/copy/file-ops.c b/copy/file-ops.c > index e37a5014..aaf04ade 100644 > --- a/copy/file-ops.c > +++ b/copy/file-ops.c > @@ -591,10 +591,8 @@ file_asynch_read (struct rw *rw, > > file_synch_read (rw, slice_ptr (command->slice), > command->slice.len, command->offset); > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + /* file_synch_read called exit() on error */ > + cb.callback (cb.user_data, &dummy); > } > > static void > @@ -606,10 +604,8 @@ file_asynch_write (struct rw *rw, > > file_synch_write (rw, slice_ptr (command->slice), > command->slice.len, command->offset); > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + /* file_synch_write called exit() on error */ > + cb.callback (cb.user_data, &dummy); > } > > static bool > @@ -620,10 +616,7 @@ file_asynch_zero (struct rw *rw, struct command *command, > > if (!file_synch_zero (rw, command->offset, command->slice.len, allocate)) > return false; > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + cb.callback (cb.user_data, &dummy); > return true; > } > > diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c > index 815b8a02..2aa74d63 100644 > --- a/copy/multi-thread-copying.c > +++ b/copy/multi-thread-copying.c > @@ -28,6 +28,7 @@ > #include <errno.h> > #include <assert.h> > #include <sys/stat.h> > +#include <inttypes.h> > > #include <pthread.h> > > @@ -376,6 +377,13 @@ finished_read (void *vp, int *error) > { > struct command *command = vp; > > + /* XXX - is it worth retrying a failed command? */ > + if (*error) { > + fprintf (stderr, "read at offset 0x%" PRIx64 "failed: %s\n",Good error message, but there is missing space before "failed". Why use hex offset? I find it very confusing. I don't think nbdcopy users think in hex.> + command->offset, strerror (*error)); > + exit (EXIT_FAILURE); > + } > + > if (allocated || sparse_size == 0) { > /* If sparseness detection (see below) is turned off then we write > * the whole command. > @@ -548,6 +556,13 @@ free_command (void *vp, int *error) > struct command *command = vp; > struct buffer *buffer = command->slice.buffer; > > + /* XXX - is it worth retrying a failed command? */ > + if (*error) { > + fprintf (stderr, "write at offset 0x%" PRIx64 "failed: %s\n", > + command->offset, strerror (*error)); > + exit (EXIT_FAILURE); > + } > + > if (buffer != NULL) { > if (--buffer->refs == 0) { > free (buffer->data); > diff --git a/copy/nbdcopy.h b/copy/nbdcopy.h > index e7fe1eab..c070f8d7 100644 > --- a/copy/nbdcopy.h > +++ b/copy/nbdcopy.h > @@ -1,5 +1,5 @@ > /* NBD client library in userspace. > - * Copyright (C) 2020-2021 Red Hat Inc. > + * Copyright (C) 2020-2022 Red Hat Inc. > * > * This library is free software; you can redistribute it and/or > * modify it under the terms of the GNU Lesser General Public > @@ -157,7 +157,8 @@ struct rw_ops { > bool allocate); > > /* Asynchronous I/O operations. These start the operation and call > - * 'cb' on completion. > + * 'cb' on completion. 'cb' will return 1, for auto-retiring with > + * asynchronous libnbd calls. > * > * The file_ops versions are actually implemented synchronously, but > * still call 'cb'. > @@ -173,7 +174,7 @@ struct rw_ops { > nbd_completion_callback cb); > > /* Asynchronously zero. command->slice.buffer is not used. If not possible, > - * returns false. > + * returns false. 'cb' must be called only if returning true. > */ > bool (*asynch_zero) (struct rw *rw, struct command *command, > nbd_completion_callback cb, bool allocate); > diff --git a/copy/null-ops.c b/copy/null-ops.c > index 11cd5116..5f1fda50 100644 > --- a/copy/null-ops.c > +++ b/copy/null-ops.c > @@ -119,10 +119,7 @@ null_asynch_write (struct rw *rw, > { > int dummy = 0; > > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + cb.callback (cb.user_data, &dummy); > } > > static bool > @@ -131,10 +128,7 @@ null_asynch_zero (struct rw *rw, struct command *command, > { > int dummy = 0; > > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + cb.callback (cb.user_data, &dummy); > return true; > } > > -- > 2.34.1 >Looks ready after fixing the error message. Nir
Laszlo Ersek
2022-Feb-04 09:09 UTC
[Libguestfs] [libnbd PATCH v2 5/5] copy: CVE-2022-0485: Fail nbdcopy if NBD read or write fails
On 02/03/22 21:25, Eric Blake wrote:> nbdcopy has a nasty bug when performing multi-threaded copies using > asynchronous nbd calls - it was blindly treating the completion of an > asynchronous command as successful, rather than checking the *error > parameter. This can result in the silent creation of a corrupted > image in two different ways: when a read fails, we blindly wrote > garbage to the destination; when a write fails, we did not flag that > the destination was not written. > > Since nbdcopy already calls exit() on a synchronous read or write > failure to a file, doing the same for an asynchronous op to an NBD > server is the simplest solution. A nicer solution, but more invasive > to code and thus not done here, might be to allow up to N retries of > the transaction (in case the read or write failure was transient), or > even having a mode where as much data is copied as possible (portions > of the copy that failed would be logged on stderr, and nbdcopy would > still fail with a non-zero exit status, but this would copy more than > just stopping at the first error, as can be done with rsync or > ddrescue). > > Note that since we rely on auto-retiring and do NOT call > nbd_aio_command_completed, our completion callbacks must always return > 1 (if they do not exit() first), even when acting on *error, so as not > leave the command allocated until nbd_close. As such, there is no > sane way to return an error to a manual caller of the callback, and > therefore we can drop dead code that calls perror() and exit() if the > callback "failed". It is also worth documenting the contract on when > we must manually call the callback during the asynch_zero callback, so > that we do not leak or double-free the command; thankfully, all the > existing code paths were correct. > > The added testsuite script demonstrates several scenarios, some of > which fail without the rest of this patch in place, and others which > showcase ways in which sparse images can bypass errors. > > Once backports are complete, a followup patch on the main branch will > edit docs/libnbd-security.pod with the mailing list announcement of > the stable branch commit ids and release versions that incorporate > this fix. > > Reported-by: Nir Soffer <nsoffer at redhat.com> > Fixes: bc896eec4d ("copy: Implement multi-conn, multiple threads, multiple requests in flight.", v1.5.6) > Fixes: https://bugzilla.redhat.com/2046194 > --- > TODO | 1 + > copy/Makefile.am | 4 +- > copy/copy-nbd-error.sh | 78 +++++++++++++++++++++++++++++++++++++ > copy/file-ops.c | 17 +++----- > copy/multi-thread-copying.c | 15 +++++++ > copy/nbdcopy.h | 7 ++-- > copy/null-ops.c | 10 +---- > 7 files changed, 108 insertions(+), 24 deletions(-) > create mode 100755 copy/copy-nbd-error.sh > > diff --git a/TODO b/TODO > index da157942..7c9c15e2 100644 > --- a/TODO > +++ b/TODO > @@ -33,6 +33,7 @@ nbdcopy: > - Better page cache usage, see nbdkit-file-plugin options > fadvise=sequential cache=none. > - Consider io_uring if there are performance bottlenecks. > + - Configurable retries in response to read or write failures. > > nbdfuse: > - If you write beyond the end of the virtual file, it returns EIO. > diff --git a/copy/Makefile.am b/copy/Makefile.am > index f2100853..e729f86a 100644 > --- a/copy/Makefile.am > +++ b/copy/Makefile.am > @@ -1,5 +1,5 @@ > # nbd client library in userspace > -# Copyright (C) 2020 Red Hat Inc. > +# Copyright (C) 2020-2022 Red Hat Inc. > # > # This library is free software; you can redistribute it and/or > # modify it under the terms of the GNU Lesser General Public > @@ -33,6 +33,7 @@ EXTRA_DIST = \ > copy-nbd-to-small-nbd-error.sh \ > copy-nbd-to-sparse-file.sh \ > copy-nbd-to-stdout.sh \ > + copy-nbd-error.sh \ > copy-progress-bar.sh \ > copy-sparse.sh \ > copy-sparse-allocated.sh \ > @@ -124,6 +125,7 @@ TESTS += \ > copy-stdin-to-nbd.sh \ > copy-stdin-to-null.sh \ > copy-nbd-to-stdout.sh \ > + copy-nbd-error.sh \ > copy-progress-bar.sh \ > copy-sparse.sh \ > copy-sparse-allocated.sh \ > diff --git a/copy/copy-nbd-error.sh b/copy/copy-nbd-error.sh > new file mode 100755 > index 00000000..89f0a2f1 > --- /dev/null > +++ b/copy/copy-nbd-error.sh > @@ -0,0 +1,78 @@ > +#!/usr/bin/env bash > +# nbd client library in userspace > +# Copyright (C) 2022 Red Hat Inc. > +# > +# This library is free software; you can redistribute it and/or > +# modify it under the terms of the GNU Lesser General Public > +# License as published by the Free Software Foundation; either > +# version 2 of the License, or (at your option) any later version. > +# > +# This library is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +# Lesser General Public License for more details. > +# > +# You should have received a copy of the GNU Lesser General Public > +# License along with this library; if not, write to the Free Software > +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + > +# Tests several scenarios of handling NBD server errors > +# Serves as a regression test for the CVE-2022-0485 fix. > + > +. ../tests/functions.sh > + > +set -e > +set -x > + > +requires nbdkit --exit-with-parent --version > + > +fail=0 > + > +# Failure to get block status should not be fatal, but merely downgrade to > +# reading the entire image as if data > +echo "Testing extents failures on source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \ > + error-extents-rate=1 ] null: || fail=1 > + > +# Failure to read should be fatal > +echo "Testing read failures on non-sparse source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \ > + error-pread-rate=0.5 ] null: && fail=1 > + > +# However, reliable block status on a sparse image can avoid the need to read > +echo "Testing read failures on sparse source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error null 5M \ > + error-pread-rate=1 ] null: || fail=1 > + > +# Failure to write data should be fatal > +echo "Testing write data failures on arbitrary destination" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v pattern 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \ > + memory 5M error-pwrite-rate=0.5 ] && fail=1 > + > +# However, writing zeroes can bypass the need for normal writes > +echo "Testing write data failures from sparse source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \ > + memory 5M error-pwrite-rate=1 ] || fail=1 > + > +# Failure to write zeroes should be fatal > +echo "Testing write zero failures on arbitrary destination" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error memory 5M \ > + error-zero-rate=1 ] && fail=1 > + > +# However, assuming/learning destination is zero can skip need to write > +echo "Testing write failures on pre-zeroed destination" > +$VG nbdcopy --destination-is-zero -- \ > + [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error memory 5M \ > + error-pwrite-rate=1 error-zero-rate=1 ] || fail=1 > + > +# Likewise, when write zero is not advertised, fallback to normal write works > +echo "Testing write zeroes to destination without zero support" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=nozero --filter=error memory 5M \ > + error-zero-rate=1 ] || fail=1 > + > +exit $fail > diff --git a/copy/file-ops.c b/copy/file-ops.c > index e37a5014..aaf04ade 100644 > --- a/copy/file-ops.c > +++ b/copy/file-ops.c > @@ -591,10 +591,8 @@ file_asynch_read (struct rw *rw, > > file_synch_read (rw, slice_ptr (command->slice), > command->slice.len, command->offset); > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + /* file_synch_read called exit() on error */ > + cb.callback (cb.user_data, &dummy); > } > > static void > @@ -606,10 +604,8 @@ file_asynch_write (struct rw *rw, > > file_synch_write (rw, slice_ptr (command->slice), > command->slice.len, command->offset); > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + /* file_synch_write called exit() on error */ > + cb.callback (cb.user_data, &dummy); > } > > static bool > @@ -620,10 +616,7 @@ file_asynch_zero (struct rw *rw, struct command *command, > > if (!file_synch_zero (rw, command->offset, command->slice.len, allocate)) > return false; > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + cb.callback (cb.user_data, &dummy); > return true; > } > > diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c > index 815b8a02..2aa74d63 100644 > --- a/copy/multi-thread-copying.c > +++ b/copy/multi-thread-copying.c > @@ -28,6 +28,7 @@ > #include <errno.h> > #include <assert.h> > #include <sys/stat.h> > +#include <inttypes.h> > > #include <pthread.h> > > @@ -376,6 +377,13 @@ finished_read (void *vp, int *error) > { > struct command *command = vp; > > + /* XXX - is it worth retrying a failed command? */ > + if (*error) { > + fprintf (stderr, "read at offset 0x%" PRIx64 "failed: %s\n",Like Nir said, it should be '" failed..."'. (I'm neutral on PRIx64 vs. PRIu64.) Reviewed-by: Laszlo Ersek <lersek at redhat.com> Thanks Laszlo> + command->offset, strerror (*error)); > + exit (EXIT_FAILURE); > + } > + > if (allocated || sparse_size == 0) { > /* If sparseness detection (see below) is turned off then we write > * the whole command. > @@ -548,6 +556,13 @@ free_command (void *vp, int *error) > struct command *command = vp; > struct buffer *buffer = command->slice.buffer; > > + /* XXX - is it worth retrying a failed command? */ > + if (*error) { > + fprintf (stderr, "write at offset 0x%" PRIx64 "failed: %s\n", > + command->offset, strerror (*error)); > + exit (EXIT_FAILURE); > + } > + > if (buffer != NULL) { > if (--buffer->refs == 0) { > free (buffer->data); > diff --git a/copy/nbdcopy.h b/copy/nbdcopy.h > index e7fe1eab..c070f8d7 100644 > --- a/copy/nbdcopy.h > +++ b/copy/nbdcopy.h > @@ -1,5 +1,5 @@ > /* NBD client library in userspace. > - * Copyright (C) 2020-2021 Red Hat Inc. > + * Copyright (C) 2020-2022 Red Hat Inc. > * > * This library is free software; you can redistribute it and/or > * modify it under the terms of the GNU Lesser General Public > @@ -157,7 +157,8 @@ struct rw_ops { > bool allocate); > > /* Asynchronous I/O operations. These start the operation and call > - * 'cb' on completion. > + * 'cb' on completion. 'cb' will return 1, for auto-retiring with > + * asynchronous libnbd calls. > * > * The file_ops versions are actually implemented synchronously, but > * still call 'cb'. > @@ -173,7 +174,7 @@ struct rw_ops { > nbd_completion_callback cb); > > /* Asynchronously zero. command->slice.buffer is not used. If not possible, > - * returns false. > + * returns false. 'cb' must be called only if returning true. > */ > bool (*asynch_zero) (struct rw *rw, struct command *command, > nbd_completion_callback cb, bool allocate); > diff --git a/copy/null-ops.c b/copy/null-ops.c > index 11cd5116..5f1fda50 100644 > --- a/copy/null-ops.c > +++ b/copy/null-ops.c > @@ -119,10 +119,7 @@ null_asynch_write (struct rw *rw, > { > int dummy = 0; > > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + cb.callback (cb.user_data, &dummy); > } > > static bool > @@ -131,10 +128,7 @@ null_asynch_zero (struct rw *rw, struct command *command, > { > int dummy = 0; > > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + cb.callback (cb.user_data, &dummy); > return true; > } >