thr3ads.net - Libguestfs - [Libguestfs] [nbdkit PATCH v2 5/5] nbd: Test .extents [Apr 2019]

If this information is useful, please help other people find it:
Share via:

Eric Blake

2019-Apr-25 18:01 UTC

[Libguestfs] [nbdkit PATCH v2 0/5] structured replies/.extents for nbd plugin

Updated based on other changes that have happened in the meantime:
- rely more on cleanup.h (throughout)
- split structured read for easier review (patch 2 and 3 were combined in v1)
- rely on nbdkit not leaking a server's partial answer (patch 3)
- add tests (patch 5)
- other bug fixes I found while testing it
- drop EOVERFLOW patch for now; it will be separate once upstream
NBD protocol specification is clarified

Eric Blake (5):
  nbd: Implement NBD_OPT_GO client request
  nbd: Refactor receive loop to prepare for structured replies
  nbd: Implement structured reads from server
  nbd: Implement .extents
  nbd: Test .extents

 plugins/nbd/nbd.c         | 534 +++++++++++++++++++++++++++++++++++---
 tests/Makefile.am         |   3 +
 tests/test-nbd-extents.sh | 113 ++++++++
 3 files changed, 608 insertions(+), 42 deletions(-)
 create mode 100755 tests/test-nbd-extents.sh

-- 
2.20.1

Eric Blake

2019-Apr-25 18:01 UTC

head link

[Libguestfs] [nbdkit PATCH v2 1/5] nbd: Implement NBD_OPT_GO client request

The NBD spec was recently patched (nbd.git commit 7827f3ae and
friends) to recommend NBD_OPT_GO for baseline interoperability, with
the aim of fewer servers and clients falling back to
NBD_OPT_EXPORT_NAME. And since nbdkit as server recently started
supporting NBD_OPT_GO (commit f7dd9799), our nbd client as plugin
should take advantage of it, which involves learning how to do option
haggling.

The use of option haggling is also a prerequisite to teaching the nbd
plugin about structured replies, which in turn will allow an
implementation of .extents.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 plugins/nbd/nbd.c | 200 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 182 insertions(+), 18 deletions(-)

diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c
index 71c1376..7f40035 100644
--- a/plugins/nbd/nbd.c
+++ b/plugins/nbd/nbd.c
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2017-2018 Red Hat Inc.
+ * Copyright (C) 2017-2019 Red Hat Inc.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
@@ -434,6 +434,157 @@ nbd_reply (struct handle *h, int fd)
   return err ? -1 : 0;
 }

+/* Receive response to @option into @reply, and consume any
+   payload. If @payload is non-NULL, caller must free *payload. Return
+   0 on success, or -1 if communication to server is no longer
+   possible. */
+static int
+nbd_newstyle_recv_option_reply (struct handle *h, uint32_t option,
+                                struct fixed_new_option_reply *reply,
+                                void **payload)
+{
+  CLEANUP_FREE char *buffer = NULL;
+
+  if (payload)
+    *payload = NULL;
+  if (read_full (h->fd, reply, sizeof *reply)) {
+    nbdkit_error ("unable to read option reply: %m");
+    return -1;
+  }
+  reply->magic = be64toh (reply->magic);
+  reply->option = be32toh (reply->option);
+  reply->reply = be32toh (reply->reply);
+  reply->replylen = be32toh (reply->replylen);
+  if (reply->magic != NBD_REP_MAGIC || reply->option != option) {
+    nbdkit_error ("unexpected option reply");
+    return -1;
+  }
+  if (reply->replylen) {
+    if (reply->reply == NBD_REP_ACK) {
+      nbdkit_error ("NBD_REP_ACK should not have replylen %" PRId32,
+                    reply->replylen);
+      return -1;
+    }
+    if (reply->replylen > 16 * 1024 * 1024) {
+      nbdkit_error ("option reply length is suspiciously large: %"
PRId32,
+                    reply->replylen);
+      return -1;
+    }
+    /* buffer is a string for NBD_REP_ERR_*; adding a NUL terminator
+       makes that string easier to use, without hurting other reply
+       types where buffer is not a string */
+    buffer = malloc (reply->replylen + 1);
+    if (!buffer) {
+      nbdkit_error ("malloc: %m");
+      return -1;
+    }
+    if (read_full (h->fd, buffer, reply->replylen)) {
+      nbdkit_error ("unable to read option reply payload: %m");
+      return -1;
+    }
+    buffer[reply->replylen] = '\0';
+    if (!payload)
+      nbdkit_debug ("ignoring option reply payload");
+    else {
+      *payload = buffer;
+      buffer = NULL;
+    }
+  }
+  return 0;
+}
+
+/* Attempt to negotiate structured reads, block status, and NBD_OPT_GO.
+   Return 1 if haggling completed, 0 if haggling failed but
+   NBD_OPT_EXPORT_NAME is still viable, or -1 on inability to connect. */
+static int
+nbd_newstyle_haggle (struct handle *h)
+{
+  struct new_option opt;
+  uint32_t exportnamelen = htobe32 (strlen (export));
+  /* For now, we make no NBD_INFO_* requests, relying on the server to
+     send its defaults. TODO: nbdkit should let plugins report block
+     sizes, at which point we should request NBD_INFO_BLOCK_SIZE and
+     obey any sizes set by server. */
+  uint16_t nrinfos = htobe16 (0);
+  struct fixed_new_option_reply reply;
+
+  /* TODO: structured reads, block status */
+
+  /* Try NBD_OPT_GO */
+  nbdkit_debug ("trying NBD_OPT_GO");
+  opt.version = htobe64 (NEW_VERSION);
+  opt.option = htobe32 (NBD_OPT_GO);
+  opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) +
+                        sizeof nrinfos);
+  if (write_full (h->fd, &opt, sizeof opt) ||
+      write_full (h->fd, &exportnamelen, sizeof exportnamelen) ||
+      write_full (h->fd, export, strlen (export)) ||
+      write_full (h->fd, &nrinfos, sizeof nrinfos)) {
+    nbdkit_error ("unable to request NBD_OPT_GO: %m");
+    return -1;
+  }
+  while (1) {
+    CLEANUP_FREE void *buffer;
+    struct fixed_new_option_reply_info_export *reply_export;
+    uint16_t info;
+
+    if (nbd_newstyle_recv_option_reply (h, NBD_OPT_GO, &reply, &buffer)
< 0)
+      return -1;
+    switch (reply.reply) {
+    case NBD_REP_INFO:
+      /* Parse payload, but ignore all except NBD_INFO_EXPORT */
+      if (reply.replylen < 2) {
+        nbdkit_error ("NBD_REP_INFO reply too short");
+        return -1;
+      }
+      memcpy (&info, buffer, sizeof info);
+      info = be16toh (info);
+      switch (info) {
+      case NBD_INFO_EXPORT:
+        if (reply.replylen != sizeof *reply_export) {
+          nbdkit_error ("NBD_INFO_EXPORT reply wrong size");
+          return -1;
+        }
+        reply_export = buffer;
+        h->size = be64toh (reply_export->exportsize);
+        h->flags = be16toh (reply_export->eflags);
+        break;
+      default:
+        nbdkit_debug ("ignoring server info %d", info);
+      }
+      break;
+    case NBD_REP_ACK:
+      /* End of replies, valid if server already sent NBD_INFO_EXPORT,
+         observable since h->flags must contain NBD_FLAG_HAS_FLAGS */
+      assert (!buffer);
+      if (!h->flags) {
+        nbdkit_error ("server omitted NBD_INFO_EXPORT reply to
NBD_OPT_GO");
+        return -1;
+      }
+      nbdkit_debug ("NBD_OPT_GO complete");
+      return 1;
+    case NBD_REP_ERR_UNSUP:
+      /* Special case this failure to fall back to NBD_OPT_EXPORT_NAME */
+      nbdkit_debug ("server lacks NBD_OPT_GO support");
+      return 0;
+    default:
+      /* Unexpected. Either the server sent a legitimate error or an
+         unexpected reply, but either way, we can't connect. */
+      if (NBD_REP_IS_ERR (reply.reply))
+        if (reply.replylen)
+          nbdkit_error ("server rejected NBD_OPT_GO with %s: %s",
+                        name_of_nbd_rep (reply.reply), (char *) buffer);
+        else
+          nbdkit_error ("server rejected NBD_OPT_GO with %s",
+                        name_of_nbd_rep (reply.reply));
+      else
+        nbdkit_error ("server used unexpected reply %s to
NBD_OPT_GO",
+                      name_of_nbd_rep (reply.reply));
+      return -1;
+    }
+  }
+}
+
 /* Create the per-connection handle. */
 static void *
 nbd_open (int readonly)
@@ -473,6 +624,7 @@ nbd_open (int readonly)
   }
   version = be64toh (old.version);
   if (version == OLD_VERSION) {
+    nbdkit_debug ("trying oldstyle connection");
     if (read_full (h->fd,
                    (char *) &old + offsetof (struct old_handshake,
exportsize),
                    sizeof old - offsetof (struct old_handshake, exportsize))) {
@@ -489,6 +641,7 @@ nbd_open (int readonly)
     struct new_handshake_finish finish;
     size_t expect;

+    nbdkit_debug ("trying newstyle connection");
     if (read_full (h->fd, &gflags, sizeof gflags)) {
       nbdkit_error ("unable to read global flags: %m");
       goto err;
@@ -500,25 +653,36 @@ nbd_open (int readonly)
       goto err;
     }

-    /* For now, we don't do any option haggling, but go straight into
-       transmission phase */
-    opt.version = htobe64 (NEW_VERSION);
-    opt.option = htobe32 (NBD_OPT_EXPORT_NAME);
-    opt.optlen = htobe32 (strlen (export));
-    if (write_full (h->fd, &opt, sizeof opt) ||
-        write_full (h->fd, export, strlen (export))) {
-      nbdkit_error ("unable to request export '%s': %m",
export);
-      goto err;
+    /* Prefer NBD_OPT_GO if possible */
+    if (gflags & NBD_FLAG_FIXED_NEWSTYLE) {
+      int rc = nbd_newstyle_haggle (h);
+      if (rc < 0)
+        goto err;
+      if (!rc)
+        goto export_name;
     }
-    expect = sizeof finish;
-    if (gflags & NBD_FLAG_NO_ZEROES)
-      expect -= sizeof finish.zeroes;
-    if (read_full (h->fd, &finish, expect)) {
-      nbdkit_error ("unable to read new handshake: %m");
-      goto err;
+    else {
+    export_name:
+      /* Option haggling untried or failed, use older NBD_OPT_EXPORT_NAME */
+      nbdkit_debug ("trying NBD_OPT_EXPORT_NAME");
+      opt.version = htobe64 (NEW_VERSION);
+      opt.option = htobe32 (NBD_OPT_EXPORT_NAME);
+      opt.optlen = htobe32 (strlen (export));
+      if (write_full (h->fd, &opt, sizeof opt) ||
+          write_full (h->fd, export, strlen (export))) {
+        nbdkit_error ("unable to request export '%s': %m",
export);
+        goto err;
+      }
+      expect = sizeof finish;
+      if (gflags & NBD_FLAG_NO_ZEROES)
+        expect -= sizeof finish.zeroes;
+      if (read_full (h->fd, &finish, expect)) {
+        nbdkit_error ("unable to read new handshake: %m");
+        goto err;
+      }
+      h->size = be64toh (finish.exportsize);
+      h->flags = be16toh (finish.eflags);
     }
-    h->size = be64toh (finish.exportsize);
-    h->flags = be16toh (finish.eflags);
   }
   else {
     nbdkit_error ("unexpected version %#" PRIx64, version);
-- 
2.20.1

Eric Blake

2019-Apr-25 18:01 UTC

head link

[Libguestfs] [nbdkit PATCH v2 2/5] nbd: Refactor receive loop to prepare for structured replies

Lay the groundwork for handling structured replies, by refactoring the
logic of handling a server reply to support different magic numbers,
and to support deferring an error from an early chunk until the final
structured chunk is received. Also, structured reads report absolute
offsets, be we want to read with a relative offset into the buffer, so
we have to track the original client offset of each transaction.

However, this patch intentionally does not change existing semantic
behavior (other than adding a sane error message when the server
replies with unexpected magic) because it does not actually negotiate
structured replies yet.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 plugins/nbd/nbd.c | 88 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 68 insertions(+), 20 deletions(-)

diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c
index 7f40035..1d3b2a3 100644
--- a/plugins/nbd/nbd.c
+++ b/plugins/nbd/nbd.c
@@ -128,7 +128,9 @@ struct transaction {
     int fds[2];
   } u;
   void *buf;
+  uint64_t offset;
   uint32_t count;
+  uint32_t err;
   struct transaction *next;
 };

@@ -138,6 +140,7 @@ struct handle {
   int fd;
   int flags;
   int64_t size;
+  bool structured;
   pthread_t reader;

   /* Prevents concurrent threads from interleaving writes to server */
@@ -216,9 +219,10 @@ nbd_mark_dead (struct handle *h)
   return -1;
 }

-/* Find and remove the transaction corresponding to cookie from the list. */
+/* Find and possibly remove the transaction corresponding to cookie
+   from the list. */
 static struct transaction *
-find_trans_by_cookie (struct handle *h, uint64_t cookie)
+find_trans_by_cookie (struct handle *h, uint64_t cookie, bool remove)
 {
   struct transaction **ptr;
   struct transaction *trans;
@@ -230,7 +234,7 @@ find_trans_by_cookie (struct handle *h, uint64_t cookie)
       break;
     ptr = &trans->next;
   }
-  if (trans)
+  if (trans && remove)
     *ptr = trans->next;
   return trans;
 }
@@ -287,6 +291,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t
type,
   }
   trans->buf = rep_buf;
   trans->count = rep_buf ? count : 0;
+  trans->offset = offset;
   {
     ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock);
     if (h->dead)
@@ -298,7 +303,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t
type,
   }
   if (nbd_request_raw (h, flags, type, offset, count, cookie, req_buf) == 0)
     return fd;
-  trans = find_trans_by_cookie (h, cookie);
+  trans = find_trans_by_cookie (h, cookie, true);

  err:
   err = errno;
@@ -323,35 +328,77 @@ nbd_request (struct handle *h, uint16_t flags, uint16_t
type, uint64_t offset,

 /* Read a reply, and look up the fd corresponding to the transaction.
    Return the server's non-negative answer (converted to local errno
-   value) on success, or -1 on read failure. */
+   value) on success, or -1 on read failure. If structured replies
+   were negotiated, fd is set to -1 if there are still more replies
+   expected. */
 static int
 nbd_reply_raw (struct handle *h, int *fd)
 {
-  struct simple_reply rep;
+  union {
+    struct simple_reply simple;
+    struct structured_reply structured;
+  } rep;
   struct transaction *trans;
-  void *buf;
+  void *buf = NULL;
   uint32_t count;
+  int error = NBD_SUCCESS;
+  bool more = false;

   *fd = -1;
-  if (read_full (h->fd, &rep, sizeof rep) < 0)
+  /* magic and handle overlap between simple and structured replies */
+  if (read_full (h->fd, &rep, sizeof rep.simple))
     return nbd_mark_dead (h);
-  if (be32toh (rep.magic) != NBD_SIMPLE_REPLY_MAGIC)
+  rep.simple.magic = be32toh (rep.simple.magic);
+  switch (rep.simple.magic) {
+  case NBD_SIMPLE_REPLY_MAGIC:
+    nbdkit_debug ("received simple reply for cookie %#" PRIx64
", status %s",
+                  rep.simple.handle,
+                  name_of_nbd_error(be32toh (rep.simple.error)));
+    error = be32toh (rep.simple.error);
+    break;
+  case NBD_STRUCTURED_REPLY_MAGIC:
+    if (!h->structured) {
+      nbdkit_error ("structured response without negotiation");
+      return nbd_mark_dead (h);
+    }
+    /* TODO - set 'more' based on NBD_REPLY_FLAG_DONE, parse the
+       various reply types, etc. */
+    abort ();
+
+  default:
+    nbdkit_error ("received unexpected magic in reply: %#" PRIx32,
+                  rep.simple.magic);
     return nbd_mark_dead (h);
-  nbdkit_debug ("received reply for cookie %#" PRIx64 ", status
%s",
-                rep.handle, name_of_nbd_error(be32toh (rep.error)));
-  trans = find_trans_by_cookie (h, rep.handle);
+  }
+
+  trans = find_trans_by_cookie (h, rep.simple.handle, !more);
   if (!trans) {
-    nbdkit_error ("reply with unexpected cookie %#" PRIx64,
rep.handle);
+    nbdkit_error ("reply with unexpected cookie %#" PRIx64,
rep.simple.handle);
     return nbd_mark_dead (h);
   }

-  *fd = trans->u.fds[1];
   buf = trans->buf;
   count = trans->count;
-  free (trans);
-  switch (be32toh (rep.error)) {
+  if (buf && h->structured && rep.simple.magic ==
NBD_SIMPLE_REPLY_MAGIC) {
+    nbdkit_error ("simple read reply when structured was expected");
+    return nbd_mark_dead (h);
+  }
+
+  /* Thanks to structured replies, we must preserve an error in any
+     earlier chunk for replay during the final chunk. */
+  if (!more) {
+    *fd = trans->u.fds[1];
+    if (!error)
+      error = trans->err;
+    free (trans);
+  }
+  else if (error && !trans->err)
+    trans->err = error;
+
+  /* Convert from wire value to local errno, and perform any final read */
+  switch (error) {
   case NBD_SUCCESS:
-    if (buf && read_full (h->fd, buf, count) < 0)
+    if (buf && read_full (h->fd, buf, count))
       return nbd_mark_dead (h);
     return 0;
   case NBD_EPERM:
@@ -361,8 +408,7 @@ nbd_reply_raw (struct handle *h, int *fd)
   case NBD_ENOMEM:
     return ENOMEM;
   default:
-    nbdkit_debug ("unexpected error %d, squashing to EINVAL",
-                  be32toh (rep.error));
+    nbdkit_debug ("unexpected error %d, squashing to EINVAL", error);
     /* fallthrough */
   case NBD_EINVAL:
     return EINVAL;
@@ -386,7 +432,9 @@ nbd_reader (void *handle)

     r = nbd_reply_raw (h, &fd);
     if (r >= 0) {
-      if (write (fd, &r, sizeof r) != sizeof r) {
+      if (fd < 0)
+        nbdkit_debug ("partial reply handled, waiting for final
reply");
+      else if (write (fd, &r, sizeof r) != sizeof r) {
         nbdkit_error ("failed to write pipe: %m");
         abort ();
       }
-- 
2.20.1

Eric Blake

2019-Apr-25 18:01 UTC

head link

[Libguestfs] [nbdkit PATCH v2 3/5] nbd: Implement structured reads from server

Time to enhance the nbd plugin to request structured replies from the
server. For now, deal only with structured reads. The server can now
return sparse reads, even though we need to design nbdkit version 3
before we can in turn return sparse reads back to the client.

In general, we have to assume the server is malicious, and so we must
sanity check that it sends replies we expect, and kill the connection
if we detect protocol non-compliance that may affect our ability to
further communicate with the server. However, we have a choice between
either implementing bookkeeping to ensure that the server sends
exactly as many non-overlapping chunks as needed to cover the entire
read request, or else assuming that the server was compliant. Since
nbdkit was just recently patched to ensure that a plugin that does not
make a full reply will merely be exposing data used in a previous read
or write command, we don't bother to pre-wipe the buffer or add any
reconstruction bookkeeping (if our assumption about a compliant server
is wrong, the client may notice confusing results because of the stale
data being replayed at server-omitted offsets, but the data corruption
fault then lies in the non-compliant server).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 plugins/nbd/nbd.c | 140 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 136 insertions(+), 4 deletions(-)

diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c
index 1d3b2a3..b2b3075 100644
--- a/plugins/nbd/nbd.c
+++ b/plugins/nbd/nbd.c
@@ -340,9 +340,14 @@ nbd_reply_raw (struct handle *h, int *fd)
   } rep;
   struct transaction *trans;
   void *buf = NULL;
+  CLEANUP_FREE char *payload = NULL;
   uint32_t count;
   int error = NBD_SUCCESS;
   bool more = false;
+  uint32_t len = 0; /* 0 except for structured reads */
+  uint64_t offset = 0; /* if len, absolute offset of structured read chunk */
+  bool zero = false; /* if len, whether to read or memset */
+  uint16_t errlen;

   *fd = -1;
   /* magic and handle overlap between simple and structured replies */
@@ -361,9 +366,100 @@ nbd_reply_raw (struct handle *h, int *fd)
       nbdkit_error ("structured response without negotiation");
       return nbd_mark_dead (h);
     }
-    /* TODO - set 'more' based on NBD_REPLY_FLAG_DONE, parse the
-       various reply types, etc. */
-    abort ();
+    if (read_full (h->fd, sizeof rep.simple + (char *) &rep,
+                   sizeof rep - sizeof rep.simple))
+      return nbd_mark_dead (h);
+    rep.structured.flags = be16toh (rep.structured.flags);
+    rep.structured.type = be16toh (rep.structured.type);
+    rep.structured.length = be32toh (rep.structured.length);
+    nbdkit_debug ("received structured reply %s for cookie %#" PRIx64
+                  ", payload length %" PRId32,
+                  name_of_nbd_reply_type(rep.structured.type),
+                  rep.structured.handle, rep.structured.length);
+    if (rep.structured.length > 64 * 1024 * 1024) {
+      nbdkit_error ("structured reply length is suspiciously large:
%" PRId32,
+                    rep.structured.length);
+      return nbd_mark_dead (h);
+    }
+    if (rep.structured.length) {
+      /* Special case for OFFSET_DATA in order to read tail of chunk
+         directly into final buffer later on */
+      len = (rep.structured.type == NBD_REPLY_TYPE_OFFSET_DATA &&
+             rep.structured.length > sizeof offset) ? sizeof offset :
+        rep.structured.length;
+      payload = malloc (len);
+      if (!payload) {
+        nbdkit_error ("reading structured reply payload: %m");
+        return nbd_mark_dead (h);
+      }
+      if (read_full (h->fd, payload, len))
+        return nbd_mark_dead (h);
+      len = 0;
+    }
+    more = !(rep.structured.flags & NBD_REPLY_FLAG_DONE);
+    switch (rep.structured.type) {
+    case NBD_REPLY_TYPE_NONE:
+      if (rep.structured.length) {
+        nbdkit_error ("NBD_REPLY_TYPE_NONE with invalid payload");
+        return nbd_mark_dead (h);
+      }
+      if (more) {
+        nbdkit_error ("NBD_REPLY_TYPE_NONE without done flag");
+        return nbd_mark_dead (h);
+      }
+      break;
+    case NBD_REPLY_TYPE_OFFSET_DATA:
+      if (rep.structured.length <= sizeof offset) {
+        nbdkit_error ("structured reply OFFSET_DATA too small");
+        return nbd_mark_dead (h);
+      }
+      memcpy (&offset, payload, sizeof offset);
+      offset = be64toh (offset);
+      len = rep.structured.length - sizeof offset;
+      break;
+    case NBD_REPLY_TYPE_OFFSET_HOLE:
+      if (rep.structured.length != sizeof offset + sizeof len) {
+        nbdkit_error ("structured reply OFFSET_HOLE size incorrect");
+        return nbd_mark_dead (h);
+      }
+      memcpy (&offset, payload, sizeof offset);
+      offset = be64toh (offset);
+      memcpy (&len, payload, sizeof len);
+      len = be32toh (len);
+      if (!len) {
+        nbdkit_error ("structured reply OFFSET_HOLE length
incorrect");
+        return nbd_mark_dead (h);
+      }
+      zero = true;
+      break;
+    default:
+      if (!NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) {
+        nbdkit_error ("received unexpected structured reply %s",
+                      name_of_nbd_reply_type(rep.structured.type));
+        return nbd_mark_dead (h);
+      }
+
+      if (rep.structured.length < sizeof error + sizeof errlen) {
+        nbdkit_error ("structured reply error size incorrect");
+        return nbd_mark_dead (h);
+      }
+      memcpy (&errlen, payload + sizeof error, sizeof errlen);
+      errlen = be16toh (errlen);
+      if (errlen > rep.structured.length - sizeof error - sizeof errlen) {
+        nbdkit_error ("structured reply error message size
incorrect");
+        return nbd_mark_dead (h);
+      }
+      memcpy (&error, payload, sizeof error);
+      error = be32toh (error);
+      if (errlen)
+        nbdkit_debug ("received structured error %s with message:
%.*s",
+                      name_of_nbd_error (error), (int) errlen,
+                      payload + sizeof error + sizeof errlen);
+      else
+        nbdkit_debug ("received structured error %s without message",
+                      name_of_nbd_error (error));
+    }
+    break;

   default:
     nbdkit_error ("received unexpected magic in reply: %#" PRIx32,
@@ -383,6 +479,24 @@ nbd_reply_raw (struct handle *h, int *fd)
     nbdkit_error ("simple read reply when structured was expected");
     return nbd_mark_dead (h);
   }
+  if (len) {
+    if (!buf) {
+      nbdkit_error ("structured read response to a non-read
command");
+      return nbd_mark_dead (h);
+    }
+    if (offset < trans->offset || offset > INT64_MAX ||
+        offset + len > trans->offset + count) {
+      nbdkit_error ("structured read reply with unexpected
offset/length");
+      return nbd_mark_dead (h);
+    }
+    buf = (char *) buf + offset - trans->offset;
+    if (zero) {
+      memset (buf, 0, len);
+      buf = NULL;
+    }
+    else
+      count = len;
+  }

   /* Thanks to structured replies, we must preserve an error in any
      earlier chunk for replay during the final chunk. */
@@ -556,7 +670,25 @@ nbd_newstyle_haggle (struct handle *h)
   uint16_t nrinfos = htobe16 (0);
   struct fixed_new_option_reply reply;

-  /* TODO: structured reads, block status */
+  nbdkit_debug ("trying NBD_OPT_STRUCTURED_REPLY");
+  opt.version = htobe64 (NEW_VERSION);
+  opt.option = htobe32 (NBD_OPT_STRUCTURED_REPLY);
+  opt.optlen = htobe32 (0);
+  if (write_full (h->fd, &opt, sizeof opt)) {
+    nbdkit_error ("unable to request NBD_OPT_STRUCTURED_REPLY: %m");
+    return -1;
+  }
+  if (nbd_newstyle_recv_option_reply (h, NBD_OPT_STRUCTURED_REPLY, &reply,
+                                      NULL) < 0)
+    return -1;
+  if (reply.reply == NBD_REP_ACK) {
+    nbdkit_debug ("structured replies enabled");
+    h->structured = true;
+    /* TODO: block status */
+  }
+  else {
+    nbdkit_debug ("structured replies disabled");
+  }

   /* Try NBD_OPT_GO */
   nbdkit_debug ("trying NBD_OPT_GO");
-- 
2.20.1

Eric Blake

2019-Apr-25 18:01 UTC

head link

[Libguestfs] [nbdkit PATCH v2 4/5] nbd: Implement .extents

Add support for calling NBD_CMD_BLOCK_STATUS on the remote server when
it is supported. I could have parsed that the server's id response to
NBD_OPT_SET_META_CONTEXT is the same as what later appears in
NBD_REPLY_TYPE_BLOCK_STATUS, but didn't think it worth the effort as
long as we expect exactly one meta context.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 plugins/nbd/nbd.c | 118 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 112 insertions(+), 6 deletions(-)

diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c
index b2b3075..fd1855e 100644
--- a/plugins/nbd/nbd.c
+++ b/plugins/nbd/nbd.c
@@ -131,6 +131,7 @@ struct transaction {
   uint64_t offset;
   uint32_t count;
   uint32_t err;
+  struct nbdkit_extents *extents;
   struct transaction *next;
 };

@@ -141,6 +142,7 @@ struct handle {
   int flags;
   int64_t size;
   bool structured;
+  bool extents;
   pthread_t reader;

   /* Prevents concurrent threads from interleaving writes to server */
@@ -270,7 +272,7 @@ nbd_request_raw (struct handle *h, uint16_t flags, uint16_t
type,
 static int
 nbd_request_full (struct handle *h, uint16_t flags, uint16_t type,
                   uint64_t offset, uint32_t count, const void *req_buf,
-                  void *rep_buf)
+                  void *rep_buf, struct nbdkit_extents *extents)
 {
   int err;
   struct transaction *trans;
@@ -292,6 +294,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t
type,
   trans->buf = rep_buf;
   trans->count = rep_buf ? count : 0;
   trans->offset = offset;
+  trans->extents = extents;
   {
     ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock);
     if (h->dead)
@@ -323,7 +326,7 @@ static int
 nbd_request (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset,
              uint32_t count)
 {
-  return nbd_request_full (h, flags, type, offset, count, NULL, NULL);
+  return nbd_request_full (h, flags, type, offset, count, NULL, NULL, NULL);
 }

 /* Read a reply, and look up the fd corresponding to the transaction.
@@ -342,6 +345,9 @@ nbd_reply_raw (struct handle *h, int *fd)
   void *buf = NULL;
   CLEANUP_FREE char *payload = NULL;
   uint32_t count;
+  uint32_t id;
+  struct block_descriptor *extents = NULL;
+  size_t nextents = 0;
   int error = NBD_SUCCESS;
   bool more = false;
   uint32_t len = 0; /* 0 except for structured reads */
@@ -432,6 +438,23 @@ nbd_reply_raw (struct handle *h, int *fd)
       }
       zero = true;
       break;
+    case NBD_REPLY_TYPE_BLOCK_STATUS:
+      if (!h->extents) {
+        nbdkit_error ("block status response without negotiation");
+        return nbd_mark_dead (h);
+      }
+      if (rep.structured.length < sizeof *extents ||
+          rep.structured.length % sizeof *extents != sizeof id) {
+        nbdkit_error ("structured reply OFFSET_HOLE size incorrect");
+        return nbd_mark_dead (h);
+      }
+      nextents = rep.structured.length / sizeof *extents;
+      extents = (struct block_descriptor *) &payload[sizeof id];
+      memcpy (&id, payload, sizeof id);
+      id = be32toh (id);
+      nbdkit_debug ("parsing %zu extents for context id %" PRId32,
+                    nextents, id);
+      break;
     default:
       if (!NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) {
         nbdkit_error ("received unexpected structured reply %s",
@@ -475,6 +498,23 @@ nbd_reply_raw (struct handle *h, int *fd)

   buf = trans->buf;
   count = trans->count;
+  if (nextents) {
+    if (!trans->extents) {
+      nbdkit_error ("block status response to a non-status command");
+      return nbd_mark_dead (h);
+    }
+    offset = trans->offset;
+    for (size_t i = 0; i < nextents; i++) {
+      /* We rely on the fact that NBDKIT_EXTENT_* match NBD_STATE_* */
+      if (nbdkit_add_extent (trans->extents, offset,
+                             be32toh (extents[i].length),
+                             be32toh (extents[i].status_flags)) == -1) {
+        error = errno;
+        break;
+      }
+      offset += be32toh (extents[i].length);
+    }
+  }
   if (buf && h->structured && rep.simple.magic ==
NBD_SIMPLE_REPLY_MAGIC) {
     nbdkit_error ("simple read reply when structured was expected");
     return nbd_mark_dead (h);
@@ -661,8 +701,11 @@ nbd_newstyle_recv_option_reply (struct handle *h, uint32_t
option,
 static int
 nbd_newstyle_haggle (struct handle *h)
 {
+  const char *const query = "base:allocation";
   struct new_option opt;
   uint32_t exportnamelen = htobe32 (strlen (export));
+  uint32_t nrqueries = htobe32 (1);
+  uint32_t querylen = htobe32 (strlen (query));
   /* For now, we make no NBD_INFO_* requests, relying on the server to
      send its defaults. TODO: nbdkit should let plugins report block
      sizes, at which point we should request NBD_INFO_BLOCK_SIZE and
@@ -682,9 +725,47 @@ nbd_newstyle_haggle (struct handle *h)
                                       NULL) < 0)
     return -1;
   if (reply.reply == NBD_REP_ACK) {
-    nbdkit_debug ("structured replies enabled");
+    nbdkit_debug ("structured replies enabled, trying
NBD_OPT_SET_META_CONTEXT");
     h->structured = true;
-    /* TODO: block status */
+
+    opt.version = htobe64 (NEW_VERSION);
+    opt.option = htobe32 (NBD_OPT_SET_META_CONTEXT);
+    opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) +
+                          sizeof nrqueries + sizeof querylen + strlen (query));
+    if (write_full (h->fd, &opt, sizeof opt) ||
+        write_full (h->fd, &exportnamelen, sizeof exportnamelen) ||
+        write_full (h->fd, export, strlen (export)) ||
+        write_full (h->fd, &nrqueries, sizeof nrqueries) ||
+        write_full (h->fd, &querylen, sizeof querylen) ||
+        write_full (h->fd, query, strlen (query))) {
+      nbdkit_error ("unable to request NBD_OPT_SET_META_CONTEXT:
%m");
+      return -1;
+    }
+    if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT,
&reply,
+                                        NULL) < 0)
+      return -1;
+    if (reply.reply == NBD_REP_META_CONTEXT) {
+      /* Cheat: we asked for exactly one context. We could double
+         check that the server is replying with exactly the
+         "base:allocation" context, and then remember the id it tells
+         us to later confirm that responses to NBD_CMD_BLOCK_STATUS
+         match up; but in the absence of multiple contexts, it's
+         easier to just assume the server is compliant, and will reuse
+         the same id, without bothering to check further. */
+      nbdkit_debug ("extents enabled");
+      h->extents = true;
+      if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT,
&reply,
+                                          NULL) < 0)
+        return -1;
+    }
+    if (reply.reply != NBD_REP_ACK) {
+      if (h->extents) {
+        nbdkit_error ("unexpected response to set meta context");
+        return -1;
+      }
+      nbdkit_debug ("ignoring meta context response %s",
+                    name_of_nbd_rep (reply.reply));
+    }
   }
   else {
     nbdkit_debug ("structured replies disabled");
@@ -979,6 +1060,14 @@ nbd_can_multi_conn (void *handle)
   return h->flags & NBD_FLAG_CAN_MULTI_CONN;
 }

+static int
+nbd_can_extents (void *handle)
+{
+  struct handle *h = handle;
+
+  return h->extents;
+}
+
 /* Read data from the file. */
 static int
 nbd_pread (void *handle, void *buf, uint32_t count, uint64_t offset,
@@ -988,7 +1077,7 @@ nbd_pread (void *handle, void *buf, uint32_t count,
uint64_t offset,
   int c;

   assert (!flags);
-  c = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf);
+  c = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf, NULL);
   return c < 0 ? c : nbd_reply (h, c);
 }

@@ -1002,7 +1091,7 @@ nbd_pwrite (void *handle, const void *buf, uint32_t count,
uint64_t offset,

   assert (!(flags & ~NBDKIT_FLAG_FUA));
   c = nbd_request_full (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0,
-                        NBD_CMD_WRITE, offset, count, buf, NULL);
+                        NBD_CMD_WRITE, offset, count, buf, NULL, NULL);
   return c < 0 ? c : nbd_reply (h, c);
 }

@@ -1050,6 +1139,21 @@ nbd_flush (void *handle, uint32_t flags)
   return c < 0 ? c : nbd_reply (h, c);
 }

+/* Read extents of the file. */
+static int
+nbd_extents (void *handle, uint32_t count, uint64_t offset,
+             uint32_t flags, struct nbdkit_extents *extents)
+{
+  struct handle *h = handle;
+  int c;
+
+  assert (!(flags & ~NBDKIT_FLAG_REQ_ONE) && h->extents);
+  c = nbd_request_full (h, flags & NBDKIT_FLAG_REQ_ONE ?
NBD_CMD_FLAG_REQ_ONE : 0,
+                        NBD_CMD_BLOCK_STATUS, offset, count, NULL, NULL,
+                        extents);
+  return c < 0 ? c : nbd_reply (h, c);
+}
+
 static struct nbdkit_plugin plugin = {
   .name               = "nbd",
   .longname           = "nbdkit nbd plugin",
@@ -1068,11 +1172,13 @@ static struct nbdkit_plugin plugin = {
   .can_zero           = nbd_can_zero,
   .can_fua            = nbd_can_fua,
   .can_multi_conn     = nbd_can_multi_conn,
+  .can_extents        = nbd_can_extents,
   .pread              = nbd_pread,
   .pwrite             = nbd_pwrite,
   .zero               = nbd_zero,
   .flush              = nbd_flush,
   .trim               = nbd_trim,
+  .extents            = nbd_extents,
   .errno_is_preserved = 1,
 };

-- 
2.20.1

Eric Blake

2019-Apr-25 18:01 UTC

head link

[Libguestfs] [nbdkit PATCH v2 5/5] nbd: Test .extents

Borrows heavily from the existing test-truncate-extents.sh (basically,
adding another layer of nbdkit to prove that the nbd plugin doesn't
change anything from what the truncate filter is already tested to
do).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 tests/Makefile.am         |   3 +
 tests/test-nbd-extents.sh | 113 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+)
 create mode 100755 tests/test-nbd-extents.sh

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 08bdf85..4b7aa22 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -79,6 +79,7 @@ EXTRA_DIST = \
 	test.lua \
 	test-memory-largest.sh \
 	test-memory-largest-for-qemu.sh \
+	test-nbd-extents.sh \
 	test-nozero.sh \
 	test-null-extents.sh \
 	test_ocaml_plugin.ml \
@@ -530,6 +531,8 @@ TESTS += \

 # nbd plugin test.
 LIBGUESTFS_TESTS += test-nbd
+TESTS += \
+	test-nbd-extents.sh

 test_nbd_SOURCES = test-nbd.c test.h
 test_nbd_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS)
diff --git a/tests/test-nbd-extents.sh b/tests/test-nbd-extents.sh
new file mode 100755
index 0000000..76f4f47
--- /dev/null
+++ b/tests/test-nbd-extents.sh
@@ -0,0 +1,113 @@
+#!/usr/bin/env bash
+# nbdkit
+# Copyright (C) 2019 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS
IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+# Unfortunately the output of this test depends on the PAGE_SIZE
+# defined in common/sparse/sparse.c and would change (breaking the
+# test) if we ever changed that definition.
+
+source ./functions.sh
+set -e
+set -x
+
+requires jq --version
+requires qemu-img --version
+requires qemu-img map --help
+
+out="test-nbd-extents.out"
+expected="test-nbd-extents.expected"
+socket="test-nbd-extents.sock"
+pid1="test-nbd-extents.pid1"
+pid2="test-nbd-extents.pid2"
+pid3="test-nbd-extents.pid3"
+pid4="test-nbd-extents.pid4"
+pid5="test-nbd-extents.pid5"
+files="$out $expected $socket $pid1 $pid2 $pid3 $pid4 $pid5"
+rm -f $files
+cleanup_fn rm -f $files
+
+do_test ()
+{
+    start_nbdkit -P "$4" -U "$socket" \
+           --filter=truncate \
+           data data="$1" size="$2" \
+           truncate="$3"
+    # We use jq to normalize the output and convert it to plain text.
+    nbdkit -U - nbd socket="$socket" \
+           --run 'qemu-img map -f raw --output=json $nbd' |
+        jq -c '.[] | {start:.start, length:.length, data:.data,
zero:.zero}' \
+           > $out
+    rm -f "$socket"
+    if ! cmp $out $expected; then
+        echo "$0: output did not match expected data"
+        echo "expected:"
+        cat $expected
+        echo "output:"
+        cat $out
+        exit 1
+    fi
+}
+
+# Completely sparse disk.
+cat > $expected <<'EOF'
+{"start":0,"length":65536,"data":false,"zero":true}
+EOF
+do_test "" 1M 65536 "$pid1"
+
+# Completely allocated disk.
+cat > $expected <<'EOF'
+{"start":0,"length":32768,"data":true,"zero":false}
+EOF
+do_test "1 @32768 1 @65536 1 @98304 1" 128K 32768 "$pid2"
+
+#----------------------------------------------------------------------
+# The above are the easy cases.  Now let's truncate to a larger
+# size which should create a hole at the end.
+
+# Completely sparse disk.
+cat > $expected <<'EOF'
+{"start":0,"length":1048576,"data":false,"zero":true}
+EOF
+do_test "" 65536 1M "$pid3"
+
+# Completely allocated disk.
+cat > $expected <<'EOF'
+{"start":0,"length":512,"data":true,"zero":false}
+{"start":512,"length":1048064,"data":false,"zero":true}
+EOF
+do_test "1" 512 1M "$pid4"
+
+# Zero-length plugin.  Unlike nbdkit-zero-plugin, the data plugin
+# advertises extents and so will behave differently.
+cat > $expected <<'EOF'
+{"start":0,"length":0,"data":false,"zero":false}
+EOF
+do_test "" 0 0 "$pid5"
-- 
2.20.1

Richard W.M. Jones

2019-Apr-26 16:12 UTC

head link

Re: [Libguestfs] [nbdkit PATCH v2 5/5] nbd: Test .extents

ACK series.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html

Possibly Parallel Threads

Search for more apparently analagous threads

Libguestfs - Apr 2019 - [nbdkit PATCH v2 5/5] nbd: Test .extents

[Libguestfs] [nbdkit PATCH v2 0/5] structured replies/.extents for nbd plugin

[Libguestfs] [nbdkit PATCH v2 1/5] nbd: Implement NBD_OPT_GO client request

[Libguestfs] [nbdkit PATCH v2 2/5] nbd: Refactor receive loop to prepare for structured replies

[Libguestfs] [nbdkit PATCH v2 3/5] nbd: Implement structured reads from server

[Libguestfs] [nbdkit PATCH v2 4/5] nbd: Implement .extents

[Libguestfs] [nbdkit PATCH v2 5/5] nbd: Test .extents

Re: [Libguestfs] [nbdkit PATCH v2 5/5] nbd: Test .extents

Possibly Parallel Threads