thr3ads.net - Libguestfs - [Libguestfs] [libnbd PATCH 0/4] Work towards fixing nbdcopy silent corruption [Feb 2022]

If this information is useful, please help other people find it:
Share via:

Laszlo Ersek

2022-Feb-02 08:59 UTC

[Libguestfs] [libnbd PATCH 0/4] Work towards fixing nbdcopy silent corruption

On 02/02/22 09:23, Laszlo Ersek wrote:> On 02/01/22 20:44, Eric Blake wrote:
>> These are the trivial patches modeled after Nir's commit 7ba6ef67,
>> fixing further places that fail to check *error during callbacks.
>
> I find these interfaces very difficult to use.
>
> https://libguestfs.org/nbd_aio_block_status.3.html
>
> We have two callbacks here, each takes a *error, and I have zero idea of
> the order the callbacks are supposed to be called in.
>
> The last paragraph of the commit message speaks about this, and the code
> updates seem to suggest some ordering, but I'm still unsure.
>
> ... In fact, I'm now uncertain whether ignoring "err" in
virt-v2v's
> "lib/utils.ml", function "get_disk_allocated", is safe.
It's a "sync"
> (not aio) nbd_block_status() call, but the documentation
> <https://libguestfs.org/nbd_block_status.3.html> says:
>
> "On entry to the callback, the error parameter contains the errno
value
> of any previously detected error."
>
> So what should we do then, in "get_disk_allocated"? Return -1
> immediately?
>From patch#4, it seems that we need exactly this. Here's the function:
let get_disk_allocated ~dir ~disknr   let socket = sprintf "%s/out%d"
dir disknr
  and alloc_ctx = "base:allocation" in
  with_nbd_connect_unix ~socket ~meta_contexts:[alloc_ctx]
    (fun nbd ->
         if NBD.can_meta_context nbd alloc_ctx then (
           (* Get the list of extents, using a 2GiB chunk size as hint. *)
           let size = NBD.get_size nbd
           and allocated = ref 0_L
           and fetch_offset = ref 0_L in
           while !fetch_offset < size do
             let remaining = size -^ !fetch_offset in
             let fetch_size = min 0x8000_0000_L remaining in
             NBD.block_status nbd fetch_size !fetch_offset
               (fun ctx offset entries err ->
                  assert (ctx = alloc_ctx);
                  for i = 0 to Array.length entries / 2 - 1 do
                    let len = entries.(i * 2)
                    and typ = entries.(i * 2 + 1) in
                    assert (len > 0_L);
                    if typ &^ 1_L = 0_L then
                      allocated := !allocated +^ len;
                    fetch_offset := !fetch_offset +^ len
                  done;
                  0
               )
           done;
           Some !allocated
         ) else None
       )

If our (nameless) function is entered with a nonzero "err", then we
cannot trust the "entries" array. Which means not only
"allocated" may
be miscalculated, but even "fetch_offset" -- and the latter can as
well
break the chunking (= outer) loop. This seems to imply we *must not*
continue the outer loop.

So how do we handle this in OCaml? If we return -1 from the (nameless)
extents callback, will NBD.block_status raise an exception?

Hmmmm: maybe. In the generated nbd_internal_ocaml_nbd_block_status()
function, in file "ocaml/nbd-c.c", we have:

  caml_enter_blocking_section ();
  r =  nbd_block_status (h, count, offset, extent_callback, flags);
  caml_leave_blocking_section ();

  if (r == -1)
    nbd_internal_ocaml_raise_error ();

However, in extent_wrapper() -> extent_wrapper_locked(), there is more
confusion for me:

  metacontextv = caml_copy_string (metacontext);
  offsetv = caml_copy_int64 (offset);
  entriesv = nbd_internal_ocaml_alloc_int64_from_uint32_array (entries,
nr_entries);
  errorv = caml_alloc_tuple (1);
  Store_field (errorv, 0, Val_int (*error));
  args[0] = metacontextv;
  args[1] = offsetv;
  args[2] = entriesv;
  args[3] = errorv;
  rv = caml_callbackN_exn (data->fnv, 4, args);
  *error = Int_val (Field (errorv, 0));
  if (Is_exception_result (rv)) {
    nbd_internal_ocaml_exception_in_wrapper ("extent", rv);
    CAMLreturnT (int, -1);
  }

  r = Int_val (rv);
  assert (r >= 0);
  CAMLreturnT (int, r);

If I understand correctly, if we returned (-1) from the (nameless)
extents callback, that would trip the assert(); would it not? Thus, we
would not reach the nbd_internal_ocaml_raise_error() in the above-quoted
nbd_internal_ocaml_nbd_block_status() function.

Thanks,
Laszlo

> And if we do so, what is that going to mean for
> "NBD.block_status"? Will it raise an exception?
>
> Thanks,
> Laszlo
>
>> nbdcopy still has a bigger bug in that it is NOT properly checking for
>> *error in the pread/pwrite completion handlers, but fixing that
>> gracefully (rather than just quitting the process immediately, as we
>> can do in the examples), requires more invasive patches which I will
>> be posting separately for review, while the ones posted here are
>> trivial enough that I'm pushing them now (commits
23bcea4a..1373d423).
>>
>> I also audited other uses of completion callbacks:
>> - in tests, closure-lifetimes, errors, server-death, and
>>   shutdown-flags are all safe (either the callback returns 0 and the
>>   code later calls nbd_aio_command_completed, or we are explicitly
>>   testing aspects of completion callbacks)
>> - in fuse/operations.c, the completion callback returns 0 and calls
>>   nbd_aio_command_completed later
>> - in examples/strict-structured-reads.c, the completion callback
>>   properly checks *error
>>
>> I also checked that nbdkit's nbd plugin properly checks *error.
>>
>> Eric Blake (4):
>>   examples: aio-connect-read.c: Fix error handling
>>   examples: glib-main-loop: don't strand commands when gsource
>>     disappears
>>   examples: glib-main-loop: Fix error handling
>>   copy: Ignore extents if we encountered earlier error
>>
>>  examples/aio-connect-read.c |  7 +++++++
>>  examples/glib-main-loop.c   | 14 ++++++++++++--
>>  copy/nbd-ops.c              |  4 ++--
>>  3 files changed, 21 insertions(+), 4 deletions(-)
>>
>

Richard W.M. Jones

2022-Feb-02 10:55 UTC

head link

[Libguestfs] [libnbd PATCH 0/4] Work towards fixing nbdcopy silent corruption

On Wed, Feb 02, 2022 at 09:59:04AM +0100, Laszlo Ersek
wrote:> However, in extent_wrapper() -> extent_wrapper_locked(), there is more
> confusion for me:
> 
>   metacontextv = caml_copy_string (metacontext);
>   offsetv = caml_copy_int64 (offset);
>   entriesv = nbd_internal_ocaml_alloc_int64_from_uint32_array (entries,
nr_entries);
>   errorv = caml_alloc_tuple (1);
>   Store_field (errorv, 0, Val_int (*error));
>   args[0] = metacontextv;
>   args[1] = offsetv;
>   args[2] = entriesv;
>   args[3] = errorv;
>   rv = caml_callbackN_exn (data->fnv, 4, args);
>   *error = Int_val (Field (errorv, 0));
>   if (Is_exception_result (rv)) {
>     nbd_internal_ocaml_exception_in_wrapper ("extent", rv);
>     CAMLreturnT (int, -1);
>   }
> 
>   r = Int_val (rv);
>   assert (r >= 0);
>   CAMLreturnT (int, r);
> 
> If I understand correctly, if we returned (-1) from the (nameless)
> extents callback, that would trip the assert(); would it not? Thus, we
> would not reach the nbd_internal_ocaml_raise_error() in the above-quoted
> nbd_internal_ocaml_nbd_block_status() function.
There are five cases:

 - Callback returns > 1
   => The program shouldn't do this, but it's treated the same
      as the next case.

 - Callback returns 1
   => For completion callbacks this indicates that the callback was
      successful and you want to autoretire the command.  For other
      types of callbacks the program shouldn't do this but it is the
      same as the next case.

 - Callback returns 0
   => Indicates that the callback was successful.

 - Callback throws an exception
   => nbd_internal_ocaml_exception_in_wrapper is called
   => Callback returns -1 indicating (to libnbd) that there was
      an error.  cmd->error is updated with !err || EPROTO.

 - Callback returns < 0
   => assert fails!

The assert is a bit harsh, but it indicates a bug in the OCaml code.
It probably means the OCaml code thought that returning -1 indicates
an error, whereas the OCaml code should throw an exception instead.

For the history of why that assert was added see:

https://gitlab.com/nbdkit/libnbd/-/commit/d881d160e1cd9c9964782300a7652ffb4e506c27

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

Libguestfs - Feb 2022 - [libnbd PATCH 0/4] Work towards fixing nbdcopy silent corruption

[Libguestfs] [libnbd PATCH 0/4] Work towards fixing nbdcopy silent corruption

[Libguestfs] [libnbd PATCH 0/4] Work towards fixing nbdcopy silent corruption