Laszlo Ersek
2022-Feb-04 09:09 UTC
[Libguestfs] [libnbd PATCH v2 5/5] copy: CVE-2022-0485: Fail nbdcopy if NBD read or write fails
On 02/03/22 21:25, Eric Blake wrote:> nbdcopy has a nasty bug when performing multi-threaded copies using > asynchronous nbd calls - it was blindly treating the completion of an > asynchronous command as successful, rather than checking the *error > parameter. This can result in the silent creation of a corrupted > image in two different ways: when a read fails, we blindly wrote > garbage to the destination; when a write fails, we did not flag that > the destination was not written. > > Since nbdcopy already calls exit() on a synchronous read or write > failure to a file, doing the same for an asynchronous op to an NBD > server is the simplest solution. A nicer solution, but more invasive > to code and thus not done here, might be to allow up to N retries of > the transaction (in case the read or write failure was transient), or > even having a mode where as much data is copied as possible (portions > of the copy that failed would be logged on stderr, and nbdcopy would > still fail with a non-zero exit status, but this would copy more than > just stopping at the first error, as can be done with rsync or > ddrescue). > > Note that since we rely on auto-retiring and do NOT call > nbd_aio_command_completed, our completion callbacks must always return > 1 (if they do not exit() first), even when acting on *error, so as not > leave the command allocated until nbd_close. As such, there is no > sane way to return an error to a manual caller of the callback, and > therefore we can drop dead code that calls perror() and exit() if the > callback "failed". It is also worth documenting the contract on when > we must manually call the callback during the asynch_zero callback, so > that we do not leak or double-free the command; thankfully, all the > existing code paths were correct. > > The added testsuite script demonstrates several scenarios, some of > which fail without the rest of this patch in place, and others which > showcase ways in which sparse images can bypass errors. > > Once backports are complete, a followup patch on the main branch will > edit docs/libnbd-security.pod with the mailing list announcement of > the stable branch commit ids and release versions that incorporate > this fix. > > Reported-by: Nir Soffer <nsoffer at redhat.com> > Fixes: bc896eec4d ("copy: Implement multi-conn, multiple threads, multiple requests in flight.", v1.5.6) > Fixes: https://bugzilla.redhat.com/2046194 > --- > TODO | 1 + > copy/Makefile.am | 4 +- > copy/copy-nbd-error.sh | 78 +++++++++++++++++++++++++++++++++++++ > copy/file-ops.c | 17 +++----- > copy/multi-thread-copying.c | 15 +++++++ > copy/nbdcopy.h | 7 ++-- > copy/null-ops.c | 10 +---- > 7 files changed, 108 insertions(+), 24 deletions(-) > create mode 100755 copy/copy-nbd-error.sh > > diff --git a/TODO b/TODO > index da157942..7c9c15e2 100644 > --- a/TODO > +++ b/TODO > @@ -33,6 +33,7 @@ nbdcopy: > - Better page cache usage, see nbdkit-file-plugin options > fadvise=sequential cache=none. > - Consider io_uring if there are performance bottlenecks. > + - Configurable retries in response to read or write failures. > > nbdfuse: > - If you write beyond the end of the virtual file, it returns EIO. > diff --git a/copy/Makefile.am b/copy/Makefile.am > index f2100853..e729f86a 100644 > --- a/copy/Makefile.am > +++ b/copy/Makefile.am > @@ -1,5 +1,5 @@ > # nbd client library in userspace > -# Copyright (C) 2020 Red Hat Inc. > +# Copyright (C) 2020-2022 Red Hat Inc. > # > # This library is free software; you can redistribute it and/or > # modify it under the terms of the GNU Lesser General Public > @@ -33,6 +33,7 @@ EXTRA_DIST = \ > copy-nbd-to-small-nbd-error.sh \ > copy-nbd-to-sparse-file.sh \ > copy-nbd-to-stdout.sh \ > + copy-nbd-error.sh \ > copy-progress-bar.sh \ > copy-sparse.sh \ > copy-sparse-allocated.sh \ > @@ -124,6 +125,7 @@ TESTS += \ > copy-stdin-to-nbd.sh \ > copy-stdin-to-null.sh \ > copy-nbd-to-stdout.sh \ > + copy-nbd-error.sh \ > copy-progress-bar.sh \ > copy-sparse.sh \ > copy-sparse-allocated.sh \ > diff --git a/copy/copy-nbd-error.sh b/copy/copy-nbd-error.sh > new file mode 100755 > index 00000000..89f0a2f1 > --- /dev/null > +++ b/copy/copy-nbd-error.sh > @@ -0,0 +1,78 @@ > +#!/usr/bin/env bash > +# nbd client library in userspace > +# Copyright (C) 2022 Red Hat Inc. > +# > +# This library is free software; you can redistribute it and/or > +# modify it under the terms of the GNU Lesser General Public > +# License as published by the Free Software Foundation; either > +# version 2 of the License, or (at your option) any later version. > +# > +# This library is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +# Lesser General Public License for more details. > +# > +# You should have received a copy of the GNU Lesser General Public > +# License along with this library; if not, write to the Free Software > +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + > +# Tests several scenarios of handling NBD server errors > +# Serves as a regression test for the CVE-2022-0485 fix. > + > +. ../tests/functions.sh > + > +set -e > +set -x > + > +requires nbdkit --exit-with-parent --version > + > +fail=0 > + > +# Failure to get block status should not be fatal, but merely downgrade to > +# reading the entire image as if data > +echo "Testing extents failures on source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \ > + error-extents-rate=1 ] null: || fail=1 > + > +# Failure to read should be fatal > +echo "Testing read failures on non-sparse source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \ > + error-pread-rate=0.5 ] null: && fail=1 > + > +# However, reliable block status on a sparse image can avoid the need to read > +echo "Testing read failures on sparse source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error null 5M \ > + error-pread-rate=1 ] null: || fail=1 > + > +# Failure to write data should be fatal > +echo "Testing write data failures on arbitrary destination" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v pattern 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \ > + memory 5M error-pwrite-rate=0.5 ] && fail=1 > + > +# However, writing zeroes can bypass the need for normal writes > +echo "Testing write data failures from sparse source" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \ > + memory 5M error-pwrite-rate=1 ] || fail=1 > + > +# Failure to write zeroes should be fatal > +echo "Testing write zero failures on arbitrary destination" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error memory 5M \ > + error-zero-rate=1 ] && fail=1 > + > +# However, assuming/learning destination is zero can skip need to write > +echo "Testing write failures on pre-zeroed destination" > +$VG nbdcopy --destination-is-zero -- \ > + [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=error memory 5M \ > + error-pwrite-rate=1 error-zero-rate=1 ] || fail=1 > + > +# Likewise, when write zero is not advertised, fallback to normal write works > +echo "Testing write zeroes to destination without zero support" > +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \ > + [ nbdkit --exit-with-parent -v --filter=nozero --filter=error memory 5M \ > + error-zero-rate=1 ] || fail=1 > + > +exit $fail > diff --git a/copy/file-ops.c b/copy/file-ops.c > index e37a5014..aaf04ade 100644 > --- a/copy/file-ops.c > +++ b/copy/file-ops.c > @@ -591,10 +591,8 @@ file_asynch_read (struct rw *rw, > > file_synch_read (rw, slice_ptr (command->slice), > command->slice.len, command->offset); > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + /* file_synch_read called exit() on error */ > + cb.callback (cb.user_data, &dummy); > } > > static void > @@ -606,10 +604,8 @@ file_asynch_write (struct rw *rw, > > file_synch_write (rw, slice_ptr (command->slice), > command->slice.len, command->offset); > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + /* file_synch_write called exit() on error */ > + cb.callback (cb.user_data, &dummy); > } > > static bool > @@ -620,10 +616,7 @@ file_asynch_zero (struct rw *rw, struct command *command, > > if (!file_synch_zero (rw, command->offset, command->slice.len, allocate)) > return false; > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + cb.callback (cb.user_data, &dummy); > return true; > } > > diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c > index 815b8a02..2aa74d63 100644 > --- a/copy/multi-thread-copying.c > +++ b/copy/multi-thread-copying.c > @@ -28,6 +28,7 @@ > #include <errno.h> > #include <assert.h> > #include <sys/stat.h> > +#include <inttypes.h> > > #include <pthread.h> > > @@ -376,6 +377,13 @@ finished_read (void *vp, int *error) > { > struct command *command = vp; > > + /* XXX - is it worth retrying a failed command? */ > + if (*error) { > + fprintf (stderr, "read at offset 0x%" PRIx64 "failed: %s\n",Like Nir said, it should be '" failed..."'. (I'm neutral on PRIx64 vs. PRIu64.) Reviewed-by: Laszlo Ersek <lersek at redhat.com> Thanks Laszlo> + command->offset, strerror (*error)); > + exit (EXIT_FAILURE); > + } > + > if (allocated || sparse_size == 0) { > /* If sparseness detection (see below) is turned off then we write > * the whole command. > @@ -548,6 +556,13 @@ free_command (void *vp, int *error) > struct command *command = vp; > struct buffer *buffer = command->slice.buffer; > > + /* XXX - is it worth retrying a failed command? */ > + if (*error) { > + fprintf (stderr, "write at offset 0x%" PRIx64 "failed: %s\n", > + command->offset, strerror (*error)); > + exit (EXIT_FAILURE); > + } > + > if (buffer != NULL) { > if (--buffer->refs == 0) { > free (buffer->data); > diff --git a/copy/nbdcopy.h b/copy/nbdcopy.h > index e7fe1eab..c070f8d7 100644 > --- a/copy/nbdcopy.h > +++ b/copy/nbdcopy.h > @@ -1,5 +1,5 @@ > /* NBD client library in userspace. > - * Copyright (C) 2020-2021 Red Hat Inc. > + * Copyright (C) 2020-2022 Red Hat Inc. > * > * This library is free software; you can redistribute it and/or > * modify it under the terms of the GNU Lesser General Public > @@ -157,7 +157,8 @@ struct rw_ops { > bool allocate); > > /* Asynchronous I/O operations. These start the operation and call > - * 'cb' on completion. > + * 'cb' on completion. 'cb' will return 1, for auto-retiring with > + * asynchronous libnbd calls. > * > * The file_ops versions are actually implemented synchronously, but > * still call 'cb'. > @@ -173,7 +174,7 @@ struct rw_ops { > nbd_completion_callback cb); > > /* Asynchronously zero. command->slice.buffer is not used. If not possible, > - * returns false. > + * returns false. 'cb' must be called only if returning true. > */ > bool (*asynch_zero) (struct rw *rw, struct command *command, > nbd_completion_callback cb, bool allocate); > diff --git a/copy/null-ops.c b/copy/null-ops.c > index 11cd5116..5f1fda50 100644 > --- a/copy/null-ops.c > +++ b/copy/null-ops.c > @@ -119,10 +119,7 @@ null_asynch_write (struct rw *rw, > { > int dummy = 0; > > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + cb.callback (cb.user_data, &dummy); > } > > static bool > @@ -131,10 +128,7 @@ null_asynch_zero (struct rw *rw, struct command *command, > { > int dummy = 0; > > - if (cb.callback (cb.user_data, &dummy) == -1) { > - perror (rw->name); > - exit (EXIT_FAILURE); > - } > + cb.callback (cb.user_data, &dummy); > return true; > } >
Eric Blake
2022-Feb-04 14:44 UTC
[Libguestfs] [libnbd PATCH v2 5/5] copy: CVE-2022-0485: Fail nbdcopy if NBD read or write fails
On Fri, Feb 04, 2022 at 10:09:26AM +0100, Laszlo Ersek wrote:> On 02/03/22 21:25, Eric Blake wrote: > > nbdcopy has a nasty bug when performing multi-threaded copies using > > asynchronous nbd calls - it was blindly treating the completion of an > > asynchronous command as successful, rather than checking the *error > > parameter. This can result in the silent creation of a corrupted > > image in two different ways: when a read fails, we blindly wrote > > garbage to the destination; when a write fails, we did not flag that > > the destination was not written. > > > > + /* XXX - is it worth retrying a failed command? */ > > + if (*error) { > > + fprintf (stderr, "read at offset 0x%" PRIx64 "failed: %s\n", > > Like Nir said, it should be '" failed..."'. (I'm neutral on PRIx64 vs. > PRIu64.) > > Reviewed-by: Laszlo Ersek <lersek at redhat.com> > > Thanks > LaszloNow pushed upstream as a865526..8d444b4, with tweaks to patches 1 and 5 content and patch 4 commit message per review comments. I'm starting the backport process to stable branches, and will followup with a top-level post as the security announcement (although given my schedule today, the announcement may be delayed to Monday). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org