thr3ads.net - Libguestfs - [Libguestfs] [libnbd PATCH] copy: Fail nbdcopy if NBD read or write fails [Feb 2022]

If this information is useful, please help other people find it:
Share via:

Eric Blake

2022-Feb-03 14:12 UTC

[Libguestfs] [libnbd PATCH] copy: Fail nbdcopy if NBD read or write fails

On Thu, Feb 03, 2022 at 02:46:27PM +0200, Nir Soffer
wrote:> On Thu, Feb 3, 2022 at 3:51 AM Eric Blake <eblake at redhat.com>
wrote:
> >
> > FIXME: This is CVE-2022-XXXXX (still awaiting assignment of the CVE
number).
> >
> > nbdcopy has a nasty bug when performing multi-threaded copies using
> > asynchronous nbd calls - it was blindly treating the completion of an
> > asynchronous command as successful, rather than checking the *error
> > parameter.  This can result in the silent creation of a corrupted
> > image in two different ways: when a read fails, we blindly wrote
> > garbage to the destination; when a write fails, we did not flag that
> > the destination was not written.
> 
> Writing zeroes is third failure case
It was the same callback shared between nbd_aio_pwrite and
nbd_aio_zero, but you are right that...
> > +
> > +# Run an nbdkit server that randomly fails.
> > +nbdkit --exit-with-parent -f -v -P $pidfile -U $sock \
> > +       --filter=error --filter=noextents \
> > +       memory size=5M error-pread-rate=0.5 error-pwrite-rate=0.5
&
> 
> What about write_zeroes errors?
...I did not have test coverage of writing zeroes being yet another
way to trigger the problem.  I may further tweak the testsuite to use
'nbdcopy -- [ nbdkit ... ] [ nbdkit ...]' tuned to each failure
scenario, rather than creating a pid file for a long-running process.
One of the things I ran into when first testing is that a mere
error-rate=0.5 is too broad (it can fails the nbd_block_status calls,
which were already exiting nbdcopy), and I also had to inject
--filter=noextents so that reads would not see that the source was
entirely sparse (because then it skips the nbd_aio_pread).
> > @@ -589,11 +589,8 @@ file_asynch_read (struct rw *rw,
> >  {
> >    file_synch_read (rw, slice_ptr (command->slice),
> >                     command->slice.len, command->offset);
> > -  errno = 0;
> > -  if (cb.callback (cb.user_data, &errno) == -1) {
> > -    perror (rw->name);
> > -    exit (EXIT_FAILURE);
> > -  }
> > +  errno = 0; /* safe, since file_synch_read exits on error */
> > +  cb.callback (cb.user_data, &errno);
> 
> Much nicer - should we assert that the callback returns 1?
I'm not sure that the assert would buy us much; I already added the
documentation contract in nbdcopy.h, and really, the only time we
would even have the bug of not returning 1 would be when *error is not
0 on entry, but the file plugin never tickles that case for the
assertion to detect us intrducing such a bug into the callback.
> > @@ -376,6 +376,13 @@ finished_read (void *vp, int *error)
> >  {
> >    struct command *command = vp;
> >
> > +  /* XXX - is it worth retrying a failed command? */
> > +  if (*error) {
> > +    errno = *error;
> > +    perror("read");
> > +    exit (EXIT_FAILURE);
> > +  }
> 
> Setting errno to use perror is clever, but I think in future we would
> like to exit with a more detailed message, like:
> 
>     nbdcopy: read failed at offset 44040192: Input output error
Good idea - we DO have the offset available as part of *command, so
I'll rework it into the error message for v2.
> 
> So we would use fprintf and strerror, and setting errno would be needed
> although it can be used with %m (is it portable?).
%m is not portable in *printf(3).  It IS portable in syslog(3), which
is why glibc printf(3) supports %m as well; and in nbdkit, we use a
printf wrapper on non-glibc to give unconditional %m support to nbdkit
plugins.  But we do not have that printf wrapper here.  Using
strerror() instead of perror() is doable in applications (we shouldn't
do it directly in libnbd since strerror() has multi-threaded
implications and we don't control what the application may be doing in
other threads, but in nbdcopy we control the entire application).
> 
> For now I think this is fine, we want a minimal change for easy backport.
Yep, and since I'm the one that will be doing the backports, I'm aware
of the problems of making the patch too complex ;)
> 
> > +
> >    if (allocated || sparse_size == 0) {
> >      /* If sparseness detection (see below) is turned off then we
write
> >       * the whole command.
> > @@ -475,6 +482,7 @@ finished_read (void *vp, int *error)
> >      /* Free the original command since it has been split into
> >       * subcommands and the original is no longer needed.
> >       */
> > +    errno = 0;
> >      free_command (command, &errno);
> 
> Using free_command as a callback is ugly, but good enough for now.
Is your complaint that in some places we call free_command() directly,
in other places, we rely on libnbd to call it as a callback?  Yeah, it
can get confusing, and it took me a while to convince myself that we
call it exactly once on all call paths, particularly when asynch_zero
returns true vs. false.
> 
> Would be good to test also writing zeroes, but otherwise the fix is
complete.
Given that I will be revising the patch to add in a CVE number
anyways, I'll improve the test for v2.
> 
> Reviewed-by: Nir Soffer <nsoffer at redhat.com>
Thanks.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Laszlo Ersek

2022-Feb-04 08:31 UTC

head link

[Libguestfs] [libnbd PATCH] copy: Fail nbdcopy if NBD read or write fails

On 02/03/22 15:12, Eric Blake wrote:> On Thu, Feb 03, 2022 at 02:46:27PM +0200, Nir Soffer wrote:
>> So we would use fprintf and strerror, and setting errno would be needed
>> although it can be used with %m (is it portable?).
> 
> %m is not portable in *printf(3).  It IS portable in syslog(3), which
> is why glibc printf(3) supports %m as well; and in nbdkit, we use a
> printf wrapper on non-glibc to give unconditional %m support to nbdkit
> plugins.  But we do not have that printf wrapper here.  Using
> strerror() instead of perror() is doable in applications (we shouldn't
> do it directly in libnbd since strerror() has multi-threaded
> implications and we don't control what the application may be doing in
> other threads, but in nbdcopy we control the entire application).
Side topic: what do we think about strerror_r()?

... Hmmm I recall something... Yep: we now have guestfs_int_strerror()
in the libguestfs-common project at least.

(We cannot just copy it over I think, due to licence differences;
however, with Rich having authored commit 9ec7cd8e23b5 ("utils: Fix
usage of strerror_r", 2021-12-09), Red Hat seems to "own the copyright
on that function" (if this statement makes any sense), so we could
"relicense" it.)

Thanks,
Laszlo
> 
>>
>> For now I think this is fine, we want a minimal change for easy
backport.
> 
> Yep, and since I'm the one that will be doing the backports, I'm
aware
> of the problems of making the patch too complex ;)
> 
>>
>>> +
>>>    if (allocated || sparse_size == 0) {
>>>      /* If sparseness detection (see below) is turned off then we
write
>>>       * the whole command.
>>> @@ -475,6 +482,7 @@ finished_read (void *vp, int *error)
>>>      /* Free the original command since it has been split into
>>>       * subcommands and the original is no longer needed.
>>>       */
>>> +    errno = 0;
>>>      free_command (command, &errno);
>>
>> Using free_command as a callback is ugly, but good enough for now.
> 
> Is your complaint that in some places we call free_command() directly,
> in other places, we rely on libnbd to call it as a callback?  Yeah, it
> can get confusing, and it took me a while to convince myself that we
> call it exactly once on all call paths, particularly when asynch_zero
> returns true vs. false.
> 
>>
>> Would be good to test also writing zeroes, but otherwise the fix is
complete.
> 
> Given that I will be revising the patch to add in a CVE number
> anyways, I'll improve the test for v2.
> 
>>
>> Reviewed-by: Nir Soffer <nsoffer at redhat.com>
> 
> Thanks.
>

Libguestfs - Feb 2022 - [libnbd PATCH] copy: Fail nbdcopy if NBD read or write fails

[Libguestfs] [libnbd PATCH] copy: Fail nbdcopy if NBD read or write fails

[Libguestfs] [libnbd PATCH] copy: Fail nbdcopy if NBD read or write fails