thr3ads.net - Libguestfs - Re: [Libguestfs] [nbdkit PATCH 3/7] RFC: protocol: Only send EOVERFLOW when valid [Apr 2019]

If this information is useful, please help other people find it:
Share via:

Eric Blake

2019-Apr-23 00:50 UTC

[Libguestfs] [nbdkit PATCH 0/7] Implement structured replies in nbd plugin

I'm hoping to implement .extents for the nbd plugin; this is a
prerequisite. I'm not sure about patch 3 - if we like it, I'll
squash it to 2, if we don't, I think we are okay just dropping
it. I'm also wondering if we have to worry about malicious
plugins that don't populate the entire .pread buffer in an
effort to get nbdkit to expose portions of the heap; my patch 7
loses some time on extra memset()s to avoid that, although it's
probably in the noise compared to the time spent talking over
sockets to the remote server.

Eric Blake (7):
  protocol: Send correct structured reply error for non-Linux
  protocol: Support EOVERFLOW
  RFC: protocol: Only send EOVERFLOW when valid
  nbd: Honor server global flags on little-endian
  protocol: Add helpers for error response handling
  nbd: Implement NBD_OPT_GO client request
  nbd: Implement structured replies

 common/protocol/protocol.h |  21 +-
 plugins/nbd/nbd.c          | 434 +++++++++++++++++++++++++++++++++----
 server/protocol.c          |  16 +-
 3 files changed, 422 insertions(+), 49 deletions(-)

-- 
2.20.1

Eric Blake

2019-Apr-23 00:50 UTC

head link

[Libguestfs] [nbdkit PATCH 1/7] protocol: Send correct structured reply error for non-Linux

Commit f141228d introduced a mapping from errno values to NBD protocol
error numbers, to match the fact that non-Linux systems do not
necessarily share the same errno values. But we failed to honor that
mapping when we added structured replies.

Fixes: eaa4c6e9
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 server/protocol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/server/protocol.c b/server/protocol.c
index 3d5668f..212bed0 100644
--- a/server/protocol.c
+++ b/server/protocol.c
@@ -591,7 +591,7 @@ send_structured_reply_error (struct connection *conn,
   }

   /* Send the error. */
-  error_data.error = htobe32 (error);
+  error_data.error = htobe32 (nbd_errno (error));
   error_data.len = htobe16 (0);
   r = conn->send (conn, &error_data, sizeof error_data);
   if (r == -1) {
-- 
2.20.1

Eric Blake

2019-Apr-23 00:50 UTC

head link

[Libguestfs] [nbdkit PATCH 2/7] protocol: Support EOVERFLOW

When the NBD spec added structured replies, it also added the
EOVERFLOW error as valid for an NBD_CMD_FLAG_DF request larger than
64k if the server can't support that without fragmentation. Once the
nbd plugin supports structured replies, we will need to properly pass
that error through to the end client (even though our current .pread
interface implies we are unlikely to ever generate the error locally).

The protocol says that EOVERFLOW is only valid if the client actually
sent NBD_CMD_FLAG_DF (which in turn requires that the client
negotiated NBD_OPT_STRUCTURED_REPLY), so if we wanted, we could add
some logic to continue to squash all other cases of EOVERFLOW to
EINVAL to remain protcol-compliant; but I figured it was not worth the
complications of ensuring that plugins are not setting that particular
error when it is not appropriate.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 common/protocol/protocol.h | 1 +
 plugins/nbd/nbd.c          | 2 ++
 server/protocol.c          | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/common/protocol/protocol.h b/common/protocol/protocol.h
index 57ce3d8..94dd6c5 100644
--- a/common/protocol/protocol.h
+++ b/common/protocol/protocol.h
@@ -230,6 +230,7 @@ extern const char *name_of_nbd_error (int);
 #define NBD_ENOMEM     12
 #define NBD_EINVAL     22
 #define NBD_ENOSPC     28
+#define NBD_EOVERFLOW  75
 #define NBD_ESHUTDOWN 108

 #endif /* NBDKIT_PROTOCOL_H */
diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c
index ae15eb7..72e833c 100644
--- a/plugins/nbd/nbd.c
+++ b/plugins/nbd/nbd.c
@@ -385,6 +385,8 @@ nbd_reply_raw (struct handle *h, int *fd)
     return EINVAL;
   case NBD_ENOSPC:
     return ENOSPC;
+  case NBD_EOVERFLOW:
+    return EOVERFLOW;
   case NBD_ESHUTDOWN:
     return ESHUTDOWN;
   }
diff --git a/server/protocol.c b/server/protocol.c
index 212bed0..a52bb56 100644
--- a/server/protocol.c
+++ b/server/protocol.c
@@ -348,6 +348,8 @@ nbd_errno (int error)
   case ESHUTDOWN:
     return NBD_ESHUTDOWN;
 #endif
+  case EOVERFLOW:
+    return NBD_EOVERFLOW;
   case EINVAL:
   default:
     return NBD_EINVAL;
-- 
2.20.1

Eric Blake

2019-Apr-23 00:50 UTC

head link

[Libguestfs] [nbdkit PATCH 3/7] RFC: protocol: Only send EOVERFLOW when valid

Previously, we were squashing EOVERFLOW into EINVAL; continue to do so
at points in the protocol where the client may not be expecting
EOVERFLOW.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 server/protocol.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/server/protocol.c b/server/protocol.c
index a52bb56..0a9f73c 100644
--- a/server/protocol.c
+++ b/server/protocol.c
@@ -326,7 +326,7 @@ skip_over_write_buffer (int sock, size_t count)

 /* Convert a system errno to an NBD_E* error code. */
 static int
-nbd_errno (int error)
+nbd_errno (int error, bool flag_df)
 {
   switch (error) {
   case 0:
@@ -349,7 +349,9 @@ nbd_errno (int error)
     return NBD_ESHUTDOWN;
 #endif
   case EOVERFLOW:
-    return NBD_EOVERFLOW;
+    if (flag_df)
+      return NBD_EOVERFLOW;
+    /* fallthrough */
   case EINVAL:
   default:
     return NBD_EINVAL;
@@ -368,7 +370,7 @@ send_simple_reply (struct connection *conn,

   reply.magic = htobe32 (NBD_SIMPLE_REPLY_MAGIC);
   reply.handle = handle;
-  reply.error = htobe32 (nbd_errno (error));
+  reply.error = htobe32 (nbd_errno (error, false));

   r = conn->send (conn, &reply, sizeof reply);
   if (r == -1) {
@@ -573,7 +575,8 @@ send_structured_reply_block_status (struct connection *conn,

 static int
 send_structured_reply_error (struct connection *conn,
-                             uint64_t handle, uint16_t cmd, uint32_t error)
+                             uint64_t handle, uint16_t cmd, uint16_t flags,
+                             uint32_t error)
 {
   ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&conn->write_lock);
   struct structured_reply reply;
@@ -593,7 +596,7 @@ send_structured_reply_error (struct connection *conn,
   }

   /* Send the error. */
-  error_data.error = htobe32 (nbd_errno (error));
+  error_data.error = htobe32 (nbd_errno (error, flags & NBD_CMD_FLAG_DF));
   error_data.len = htobe16 (0);
   r = conn->send (conn, &error_data, sizeof error_data);
   if (r == -1) {
@@ -737,7 +740,8 @@ protocol_recv_request_send_reply (struct connection *conn)
                                                    extents);
     }
     else
-      return send_structured_reply_error (conn, request.handle, cmd, error);
+      return send_structured_reply_error (conn, request.handle, cmd, flags,
+                                          error);
   }
   else
     return send_simple_reply (conn, request.handle, cmd, buf, count, error);
-- 
2.20.1

Eric Blake

2019-Apr-23 00:50 UTC

head link

[Libguestfs] [nbdkit PATCH 4/7] nbd: Honor server global flags on little-endian

In the nbd plugin, I forgot to swap the server's global flags into
host endian order before checking which flags the server had set. As a
result, on little-endian machines, we were always replying with cflags
of 0 instead of the intended mirroring of the supported server flags,
although fortunately it made no real difference up to now (we were
never sending any option other than NBD_OPT_EXPORT_NAME, so not
mirroring NBD_FLAG_FIXED_NEWSTYLE had no effect; not mirroring
NBD_FLAG_NO_ZEROES merely means a longer handshake; and the NBD
protocol does not yet define flag 0x100 or 0x200 to confuse
us). However, once we add code for NBD_OPT_GO, it becomes essential to
get it right.

Fixes: 0e8e8eb1
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 plugins/nbd/nbd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c
index 72e833c..14e6806 100644
--- a/plugins/nbd/nbd.c
+++ b/plugins/nbd/nbd.c
@@ -512,6 +512,7 @@ nbd_open (int readonly)
       nbdkit_error ("unable to read global flags: %m");
       goto err;
     }
+    gflags = be16toh (gflags);
     cflags = htobe32(gflags & (NBD_FLAG_FIXED_NEWSTYLE |
NBD_FLAG_NO_ZEROES));
     if (write_full (h->fd, &cflags, sizeof cflags)) {
       nbdkit_error ("unable to return global flags: %m");
-- 
2.20.1

Eric Blake

2019-Apr-23 00:50 UTC

head link

[Libguestfs] [nbdkit PATCH 5/7] protocol: Add helpers for error response handling

The NBD protocol specifically sets a witness bit in NBD_REP_ERR_* and
NBD_REPLY_TYPE_ERROR_* to make it possible to recognize even unknown
error messages at least to the point of being able to parse out the
server's optional message as a debugging aid.  Make it easier to
declare new errors, as well as adding *_IS_ERR macros that the nbd
plugin will use in later patches for handling arbitrary server errors.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 common/protocol/protocol.h | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/common/protocol/protocol.h b/common/protocol/protocol.h
index 94dd6c5..c27104c 100644
--- a/common/protocol/protocol.h
+++ b/common/protocol/protocol.h
@@ -108,16 +108,19 @@ extern const char *name_of_nbd_opt (int);
 #define NBD_OPT_LIST_META_CONTEXT  9
 #define NBD_OPT_SET_META_CONTEXT   10

+#define NBD_REP_ERR(val) (0x80000000 | (val))
+#define NBD_REP_IS_ERR(val) (!!((val) & 0x80000000))
+
 extern const char *name_of_nbd_rep (int);
 #define NBD_REP_ACK          1
 #define NBD_REP_SERVER       2
 #define NBD_REP_INFO         3
 #define NBD_REP_META_CONTEXT 4
-#define NBD_REP_ERR_UNSUP    0x80000001
-#define NBD_REP_ERR_POLICY   0x80000002
-#define NBD_REP_ERR_INVALID  0x80000003
-#define NBD_REP_ERR_PLATFORM 0x80000004
-#define NBD_REP_ERR_TLS_REQD 0x80000005
+#define NBD_REP_ERR_UNSUP    NBD_REP_ERR (1)
+#define NBD_REP_ERR_POLICY   NBD_REP_ERR (2)
+#define NBD_REP_ERR_INVALID  NBD_REP_ERR (3)
+#define NBD_REP_ERR_PLATFORM NBD_REP_ERR (4)
+#define NBD_REP_ERR_TLS_REQD NBD_REP_ERR (5)

 extern const char *name_of_nbd_info (int);
 #define NBD_INFO_EXPORT      0
@@ -195,14 +198,17 @@ struct structured_reply_error {
 extern const char *name_of_nbd_reply_flag (int);
 #define NBD_REPLY_FLAG_DONE         (1<<0)

+#define NBD_REPLY_TYPE_ERR(val) ((1<<15) | (val))
+#define NBD_REPLY_TYPE_IS_ERR(val) (!!((val) & (1<<15)))
+
 /* Structured reply types. */
 extern const char *name_of_nbd_reply_type (int);
 #define NBD_REPLY_TYPE_NONE         0
 #define NBD_REPLY_TYPE_OFFSET_DATA  1
 #define NBD_REPLY_TYPE_OFFSET_HOLE  2
 #define NBD_REPLY_TYPE_BLOCK_STATUS 5
-#define NBD_REPLY_TYPE_ERROR        ((1<<15) + 1)
-#define NBD_REPLY_TYPE_ERROR_OFFSET ((1<<15) + 2)
+#define NBD_REPLY_TYPE_ERROR        NBD_REPLY_TYPE_ERR (1)
+#define NBD_REPLY_TYPE_ERROR_OFFSET NBD_REPLY_TYPE_ERR (2)

 /* NBD commands. */
 extern const char *name_of_nbd_cmd (int);
-- 
2.20.1

Eric Blake

2019-Apr-23 00:50 UTC

head link

[Libguestfs] [nbdkit PATCH 6/7] nbd: Implement NBD_OPT_GO client request

The NBD spec was recently patched (nbd.git commit 7827f3ae and friends)
to require NBD_OPT_GO for baseline interoperability, with the aim of
fewer servers and clients falling back to NBD_OPT_EXPORT_NAME. And
since nbdkit as server recently started supporting NBD_OPT_GO (commit
f7dd9799), our nbd client as plugin should take advantage of it.

This patch is a prerequisite to teaching the nbd plugin about
structured replies, which in turn will allow an implementation of
.extents.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 plugins/nbd/nbd.c | 201 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 183 insertions(+), 18 deletions(-)

diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c
index 14e6806..313bf59 100644
--- a/plugins/nbd/nbd.c
+++ b/plugins/nbd/nbd.c
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2017-2018 Red Hat Inc.
+ * Copyright (C) 2017-2019 Red Hat Inc.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
@@ -453,6 +453,158 @@ nbd_reply (struct handle *h, int fd)
   return err ? -1 : 0;
 }

+static int
+nbd_newstyle_recv_option_reply (struct handle *h, uint32_t option,
+                                struct fixed_new_option_reply *reply,
+                                char **payload)
+{
+  char *buffer;
+
+  if (payload)
+    *payload = NULL;
+  if (read_full (h->fd, reply, sizeof *reply)) {
+    nbdkit_error ("unable to read option reply: %m");
+    return -1;
+  }
+  reply->magic = be64toh (reply->magic);
+  reply->option = be32toh (reply->option);
+  reply->reply = be32toh (reply->reply);
+  reply->replylen = be32toh (reply->replylen);
+  if (reply->magic != NBD_REP_MAGIC || reply->option != option) {
+    nbdkit_error ("unexpected option reply");
+    return -1;
+  }
+  if (reply->replylen) {
+    if (reply->reply == NBD_REP_ACK) {
+      nbdkit_error ("NBD_REP_ACK should not have replylen %" PRId32,
+                    reply->replylen);
+      return -1;
+    }
+    if (reply->replylen > 16 * 1024 * 1024) {
+      nbdkit_error ("option reply length is suspiciously large: %"
PRId32,
+                    reply->replylen);
+      return -1;
+    }
+    /* buffer is a string for NBD_REP_ERR_*; adding a NUL terminator
+       makes that string easier to use, without hurting other reply
+       types where buffer is not a string */
+    buffer = malloc (reply->replylen + 1);
+    if (!buffer) {
+      nbdkit_error ("malloc: %m");
+      return -1;
+    }
+    if (read_full (h->fd, buffer, reply->replylen)) {
+      nbdkit_error ("unable to read option reply payload: %m");
+      free (buffer);
+      return -1;
+    }
+    buffer[reply->replylen] = '\0';
+    if (!payload) {
+      nbdkit_debug ("ignoring option reply payload");
+      free (buffer);
+    }
+    else
+      *payload = buffer;
+  }
+  return 0;
+}
+
+/* Attempt to negotiate structured reads, block status, and NBD_OPT_GO.
+   Return 1 if haggling completed, 0 if haggling failed but
+   NBD_OPT_EXPORT_NAME is still viable, or -1 on inability to connect. */
+static int
+nbd_newstyle_haggle (struct handle *h)
+{
+  struct new_option opt;
+  uint32_t exportnamelen = htobe32 (strlen (export));
+  /* For now, we make no NBD_INFO_* requests, relying on the server to
+     send its defaults. TODO: nbdkit should let plugins report block
+     sizes, at which point we should request NBD_INFO_BLOCK_SIZE and
+     obey any sizes set by server. */
+  uint16_t nrinfos = htobe16 (0);
+  struct fixed_new_option_reply reply;
+  struct fixed_new_option_reply_info_export reply_export;
+  char *buffer;
+  uint16_t info;
+
+  /* TODO: structured reads, block status */
+
+  /* Try NBD_OPT_GO */
+  nbdkit_debug ("trying NBD_OPT_GO");
+  opt.version = htobe64 (NEW_VERSION);
+  opt.option = htobe32 (NBD_OPT_GO);
+  opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) +
+                        sizeof nrinfos);
+  if (write_full (h->fd, &opt, sizeof opt) ||
+      write_full (h->fd, &exportnamelen, sizeof exportnamelen) ||
+      write_full (h->fd, export, strlen (export)) ||
+      write_full (h->fd, &nrinfos, sizeof nrinfos)) {
+    nbdkit_error ("unable to request NBD_OPT_GO: %m");
+    return -1;
+  }
+  while (1) {
+    if (nbd_newstyle_recv_option_reply (h, NBD_OPT_GO, &reply, &buffer)
< 0)
+      return -1;
+    switch (reply.reply) {
+    case NBD_REP_INFO:
+      /* Parse payload, but ignore all except NBD_INFO_EXPORT */
+      if (reply.replylen < 2) {
+        nbdkit_error ("NBD_REP_INFO reply too short");
+        free (buffer);
+        return -1;
+      }
+      memcpy (&info, buffer, sizeof info);
+      info = be16toh (info);
+      switch (info) {
+      case NBD_INFO_EXPORT:
+        if (reply.replylen != sizeof reply_export) {
+          nbdkit_error ("NBD_INFO_EXPORT reply wrong size");
+          free (buffer);
+          return -1;
+        }
+        memcpy (&reply_export, buffer, sizeof reply_export);
+        h->size = be64toh (reply_export.exportsize);
+        h->flags = be16toh (reply_export.eflags);
+        break;
+      default:
+        nbdkit_debug ("ignoring server info %d", info);
+      }
+      free (buffer);
+      break;
+    case NBD_REP_ACK:
+      /* End of replies, valid if server already sent NBD_INFO_EXPORT,
+         which is observable since h->flags will be set */
+      assert (!buffer);
+      if (!h->flags) {
+        nbdkit_error ("server omitted NBD_INFO_EXPORT reply to
NBD_OPT_GO");
+        return -1;
+      }
+      nbdkit_debug ("NBD_OPT_GO complete");
+      return 1;
+    case NBD_REP_ERR_UNSUP:
+      /* Special case this failure to fall back to NBD_OPT_EXPORT_NAME */
+      nbdkit_debug ("server lacks NBD_OPT_GO support");
+      free (buffer);
+      return 0;
+    default:
+      /* Unexpected. Either the server sent a legitimate error or an
+         unexpected reply, but either way, we can't connect. */
+      if (NBD_REP_IS_ERR (reply.reply))
+        if (reply.replylen)
+          nbdkit_error ("server rejected NBD_OPT_GO with %s: %s",
+                        name_of_nbd_rep (reply.reply), buffer);
+        else
+          nbdkit_error ("server rejected NBD_OPT_GO with %s",
+                        name_of_nbd_rep (reply.reply));
+      else
+        nbdkit_error ("server used unexpected reply %s to
NBD_OPT_GO",
+                      name_of_nbd_rep (reply.reply));
+      free (buffer);
+      return -1;
+    }
+  }
+}
+
 /* Create the per-connection handle. */
 static void *
 nbd_open (int readonly)
@@ -492,6 +644,7 @@ nbd_open (int readonly)
   }
   version = be64toh (old.version);
   if (version == OLD_VERSION) {
+    nbdkit_debug ("trying oldstyle connection");
     if (read_full (h->fd,
                    (char *) &old + offsetof (struct old_handshake,
exportsize),
                    sizeof old - offsetof (struct old_handshake, exportsize))) {
@@ -508,6 +661,7 @@ nbd_open (int readonly)
     struct new_handshake_finish finish;
     size_t expect;

+    nbdkit_debug ("trying newstyle connection");
     if (read_full (h->fd, &gflags, sizeof gflags)) {
       nbdkit_error ("unable to read global flags: %m");
       goto err;
@@ -519,25 +673,36 @@ nbd_open (int readonly)
       goto err;
     }

-    /* For now, we don't do any option haggling, but go straight into
-       transmission phase */
-    opt.version = htobe64 (NEW_VERSION);
-    opt.option = htobe32 (NBD_OPT_EXPORT_NAME);
-    opt.optlen = htobe32 (strlen (export));
-    if (write_full (h->fd, &opt, sizeof opt) ||
-        write_full (h->fd, export, strlen (export))) {
-      nbdkit_error ("unable to request export '%s': %m",
export);
-      goto err;
+    /* Prefer NBD_OPT_GO if possible */
+    if (gflags & NBD_FLAG_FIXED_NEWSTYLE) {
+      int rc = nbd_newstyle_haggle (h);
+      if (rc < 0)
+        goto err;
+      if (!rc)
+        goto export_name;
     }
-    expect = sizeof finish;
-    if (gflags & NBD_FLAG_NO_ZEROES)
-      expect -= sizeof finish.zeroes;
-    if (read_full (h->fd, &finish, expect)) {
-      nbdkit_error ("unable to read new handshake: %m");
-      goto err;
+    else {
+    export_name:
+      /* Option haggling untried or failed, use older NBD_OPT_EXPORT_NAME */
+      nbdkit_debug ("trying NBD_OPT_EXPORT_NAME");
+      opt.version = htobe64 (NEW_VERSION);
+      opt.option = htobe32 (NBD_OPT_EXPORT_NAME);
+      opt.optlen = htobe32 (strlen (export));
+      if (write_full (h->fd, &opt, sizeof opt) ||
+          write_full (h->fd, export, strlen (export))) {
+        nbdkit_error ("unable to request export '%s': %m",
export);
+        goto err;
+      }
+      expect = sizeof finish;
+      if (gflags & NBD_FLAG_NO_ZEROES)
+        expect -= sizeof finish.zeroes;
+      if (read_full (h->fd, &finish, expect)) {
+        nbdkit_error ("unable to read new handshake: %m");
+        goto err;
+      }
+      h->size = be64toh (finish.exportsize);
+      h->flags = be16toh (finish.eflags);
     }
-    h->size = be64toh (finish.exportsize);
-    h->flags = be16toh (finish.eflags);
   }
   else {
     nbdkit_error ("unexpected version %#" PRIx64, version);
-- 
2.20.1

Eric Blake

2019-Apr-23 00:50 UTC

head link

[Libguestfs] [nbdkit PATCH 7/7] nbd: Implement structured replies

Time to enhance the nbd plugin to request structured replies from the
server. For now, deal only with structured reads. The server can now
return sparse reads, even though we need nbdkit version 3 before we
can in turn return sparse reads back to the client.

In general, we have to assume the server is malicious, and so we must
sanity check that it sends replies we expect. Thus, we have a choice
between either implementing bookkeeping to ensure that the server
sends exactly as many non-overlapping chunks as needed to cover the
entire read request, or else ensuring that even when the server
cheats, we do not leak uninitialized memory back to our client. I
chose for simplicity, with the result that I end up calling memset()
more frequently than necessary for a compliant server, rather than
worrying about bookkeeping to detect a non-compliant server that is
attempting to cause an information leak.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 plugins/nbd/nbd.c | 232 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 212 insertions(+), 20 deletions(-)

diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c
index 313bf59..f729051 100644
--- a/plugins/nbd/nbd.c
+++ b/plugins/nbd/nbd.c
@@ -127,7 +127,9 @@ struct transaction {
     int fds[2];
   } u;
   void *buf;
+  uint64_t offset;
   uint32_t count;
+  uint32_t err;
   struct transaction *next;
 };

@@ -137,6 +139,7 @@ struct handle {
   int fd;
   int flags;
   int64_t size;
+  bool structured;
   pthread_t reader;

   /* Prevents concurrent threads from interleaving writes to server */
@@ -230,9 +233,10 @@ nbd_mark_dead (struct handle *h)
   return -1;
 }

-/* Find and remove the transaction corresponding to cookie from the list. */
+/* Find and possibly remove the transaction corresponding to cookie
+   from the list. */
 static struct transaction *
-find_trans_by_cookie (struct handle *h, uint64_t cookie)
+find_trans_by_cookie (struct handle *h, uint64_t cookie, bool remove)
 {
   struct transaction **ptr;
   struct transaction *trans;
@@ -244,7 +248,7 @@ find_trans_by_cookie (struct handle *h, uint64_t cookie)
       break;
     ptr = &trans->next;
   }
-  if (trans)
+  if (trans && remove)
     *ptr = trans->next;
   nbd_unlock (h);
   return trans;
@@ -303,6 +307,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t
type,
   }
   trans->buf = rep_buf;
   trans->count = rep_buf ? count : 0;
+  trans->offset = offset;
   nbd_lock (h);
   if (h->dead) {
     nbd_unlock (h);
@@ -315,7 +320,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t
type,
   nbd_unlock (h);
   if (nbd_request_raw (h, flags, type, offset, count, cookie, req_buf) == 0)
     return fd;
-  trans = find_trans_by_cookie (h, cookie);
+  trans = find_trans_by_cookie (h, cookie, true);

  err:
   err = errno;
@@ -340,35 +345,196 @@ nbd_request (struct handle *h, uint16_t flags, uint16_t
type, uint64_t offset,

 /* Read a reply, and look up the fd corresponding to the transaction.
    Return the server's non-negative answer (converted to local errno
-   value) on success, or -1 on read failure. */
+   value) on success, or -1 on read failure. If structured replies
+   were negotiated, fd is set to -1 if there are still more replies
+   expected. */
 static int
 nbd_reply_raw (struct handle *h, int *fd)
 {
-  struct simple_reply rep;
+  union {
+    struct simple_reply simple;
+    struct structured_reply structured;
+  } rep;
   struct transaction *trans;
-  void *buf;
+  void *buf = NULL;
   uint32_t count;
+  int error = NBD_SUCCESS;
+  bool more = false;
+  uint64_t offset = 0; /* absolute offset of structured read chunk from buf */
+  uint32_t len = 0; /* 0 except for structured reads */
+  bool zero = false; /* Whether to read or memset on structured read */

   *fd = -1;
-  if (read_full (h->fd, &rep, sizeof rep) < 0)
+  /* magic and handle overlap between simple and structured replies */
+  if (read_full (h->fd, &rep, sizeof rep.simple))
     return nbd_mark_dead (h);
-  if (be32toh (rep.magic) != NBD_SIMPLE_REPLY_MAGIC)
+  switch (be32toh (rep.simple.magic)) {
+  case NBD_SIMPLE_REPLY_MAGIC:
+    nbdkit_debug ("received simple reply for cookie %#" PRIx64
", status %s",
+                  rep.simple.handle,
+                  name_of_nbd_error(be32toh (rep.simple.error)));
+    error = be32toh (rep.simple.error);
+    break;
+  case NBD_STRUCTURED_REPLY_MAGIC:
+    if (!h->structured) {
+      nbdkit_error ("structured response without negotiation");
+      return nbd_mark_dead (h);
+    }
+    if (read_full (h->fd, sizeof rep.simple + (char *) &rep,
+                   sizeof rep - sizeof rep.simple))
+      return nbd_mark_dead (h);
+    rep.structured.flags = be16toh (rep.structured.flags);
+    rep.structured.type = be16toh (rep.structured.type);
+    rep.structured.length = be32toh (rep.structured.length);
+    nbdkit_debug ("received structured reply %s for cookie %#" PRIx64
+                  ", payload length %" PRId32,
+                  name_of_nbd_reply_type(rep.structured.type),
+                  rep.structured.handle, rep.structured.length);
+    if (rep.structured.length > 64 * 1024 * 1024) {
+      nbdkit_error ("structured reply length is suspiciously large:
%" PRId32,
+                    rep.structured.length);
+      return nbd_mark_dead (h);
+    }
+    if (rep.structured.length) {
+      /* Special case for OFFSET_DATA in order to read tail of chunk
+         directly into final buffer later on */
+      len = (rep.structured.type == NBD_REPLY_TYPE_OFFSET_DATA &&
+             rep.structured.length > sizeof offset) ? sizeof offset :
+        rep.structured.length;
+      buf = malloc (len);
+      if (!buf) {
+        nbdkit_error ("reading structured reply payload: %m");
+        return nbd_mark_dead (h);
+      }
+      if (read_full (h->fd, buf, len)) {
+        free (buf);
+        return nbd_mark_dead (h);
+      }
+      len = 0;
+    }
+    more = !(rep.structured.flags & NBD_REPLY_FLAG_DONE);
+    switch (rep.structured.type) {
+    case NBD_REPLY_TYPE_NONE:
+      if (rep.structured.length) {
+        nbdkit_error ("NBD_REPLY_TYPE_NONE with invalid payload");
+        free (buf);
+        return nbd_mark_dead (h);
+      }
+      if (more) {
+        nbdkit_error ("NBD_REPLY_TYPE_NONE without done flag");
+        return nbd_mark_dead (h);
+      }
+      break;
+    case NBD_REPLY_TYPE_OFFSET_DATA:
+      if (rep.structured.length <= sizeof offset) {
+        nbdkit_error ("structured reply OFFSET_DATA too small");
+        free (buf);
+        return nbd_mark_dead (h);
+      }
+      memcpy (&offset, buf, sizeof offset);
+      offset = be64toh (offset);
+      len = rep.structured.length - sizeof offset;
+      break;
+    case NBD_REPLY_TYPE_OFFSET_HOLE:
+      if (rep.structured.length != sizeof offset + sizeof len) {
+        nbdkit_error ("structured reply OFFSET_HOLE size incorrect");
+        free (buf);
+        return nbd_mark_dead (h);
+      }
+      memcpy (&offset, buf, sizeof offset);
+      offset = be64toh (offset);
+      memcpy (&len, buf, sizeof len);
+      len = be32toh (len);
+      if (!len) {
+        nbdkit_error ("structured reply OFFSET_HOLE length
incorrect");
+        free (buf);
+        return nbd_mark_dead (h);
+      }
+      zero = true;
+      break;
+    default:
+      if (NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) {
+        uint16_t errlen;
+
+        if (rep.structured.length < sizeof error + sizeof errlen) {
+          nbdkit_error ("structured reply error size incorrect");
+          free (buf);
+          return nbd_mark_dead (h);
+        }
+        memcpy (&errlen, (char *) buf + sizeof error, sizeof errlen);
+        errlen = be16toh (errlen);
+        if (errlen > rep.structured.length - sizeof error - sizeof errlen) {
+          nbdkit_error ("structured reply error message size
incorrect");
+          free (buf);
+          return nbd_mark_dead (h);
+        }
+        memcpy (&error, buf, sizeof error);
+        error = be32toh (error);
+        if (errlen)
+          nbdkit_debug ("received structured error %s with message:
%.*s",
+                        name_of_nbd_error (error), errlen,
+                        (char *) buf + sizeof error + sizeof errlen);
+        else
+          nbdkit_debug ("received structured error %s without
message",
+                        name_of_nbd_error (error));
+        free (buf);
+      }
+      else {
+        nbdkit_error ("received unexpected structured reply %s",
+                      name_of_nbd_reply_type(rep.structured.type));
+        free (buf);
+        return nbd_mark_dead (h);
+      }
+    }
+    break;
+  default:
+    nbdkit_error ("received unexpected magic in reply: %#" PRIx32,
+                  rep.simple.magic);
     return nbd_mark_dead (h);
-  nbdkit_debug ("received reply for cookie %#" PRIx64 ", status
%s",
-                rep.handle, name_of_nbd_error(be32toh (rep.error)));
-  trans = find_trans_by_cookie (h, rep.handle);
+  }
+
+  trans = find_trans_by_cookie (h, rep.simple.handle, !more);
   if (!trans) {
-    nbdkit_error ("reply with unexpected cookie %#" PRIx64,
rep.handle);
+    nbdkit_error ("reply with unexpected cookie %#" PRIx64,
rep.simple.handle);
     return nbd_mark_dead (h);
   }

-  *fd = trans->u.fds[1];
+  if (!more)
+    *fd = trans->u.fds[1];
+  else if (error && !trans->err)
+    trans->err = error;
   buf = trans->buf;
   count = trans->count;
+  if (buf && h->structured &&
+      be32toh (rep.simple.magic) == NBD_SIMPLE_REPLY_MAGIC) {
+    nbdkit_error ("simple read reply when structured was expected");
+    return nbd_mark_dead (h);
+  }
+  if (len) {
+    if (!buf) {
+      nbdkit_error ("structured read response to a non-read
command");
+      return nbd_mark_dead (h);
+    }
+    if (offset < trans->offset || offset > INT64_MAX ||
+        offset + len > trans->offset + count) {
+      nbdkit_error ("structured read reply with unexpected
offset/length");
+      return nbd_mark_dead (h);
+    }
+    buf = (char *) buf + offset - trans->offset;
+    if (zero) {
+      /* We do not check whether the server mistakenly sends
+         non-compliant overlapping chunks, thus we must re-zero holes
+         rather than relying on our pre-initialization to zero */
+      memset (buf, 0, len);
+      buf = NULL;
+    }
+    else
+      count = len;
+  }
   free (trans);
-  switch (be32toh (rep.error)) {
+  switch (error) {
   case NBD_SUCCESS:
-    if (buf && read_full (h->fd, buf, count) < 0)
+    if (buf && read_full (h->fd, buf, count))
       return nbd_mark_dead (h);
     return 0;
   case NBD_EPERM:
@@ -378,8 +544,7 @@ nbd_reply_raw (struct handle *h, int *fd)
   case NBD_ENOMEM:
     return ENOMEM;
   default:
-    nbdkit_debug ("unexpected error %d, squashing to EINVAL",
-                  be32toh (rep.error));
+    nbdkit_debug ("unexpected error %d, squashing to EINVAL", error);
     /* fallthrough */
   case NBD_EINVAL:
     return EINVAL;
@@ -405,7 +570,9 @@ nbd_reader (void *handle)

     r = nbd_reply_raw (h, &fd);
     if (r >= 0) {
-      if (write (fd, &r, sizeof r) != sizeof r) {
+      if (fd < 0)
+        nbdkit_debug ("partial reply handled, waiting for final
reply");
+      else if (write (fd, &r, sizeof r) != sizeof r) {
         nbdkit_error ("failed to write pipe: %m");
         abort ();
       }
@@ -527,7 +694,25 @@ nbd_newstyle_haggle (struct handle *h)
   char *buffer;
   uint16_t info;

-  /* TODO: structured reads, block status */
+  nbdkit_debug ("trying NBD_OPT_STRUCTURED_REPLY");
+  opt.version = htobe64 (NEW_VERSION);
+  opt.option = htobe32 (NBD_OPT_STRUCTURED_REPLY);
+  opt.optlen = htobe32 (0);
+  if (write_full (h->fd, &opt, sizeof opt)) {
+    nbdkit_error ("unable to request NBD_OPT_STRUCTURED_REPLY: %m");
+    return -1;
+  }
+  if (nbd_newstyle_recv_option_reply (h, NBD_OPT_STRUCTURED_REPLY, &reply,
+                                      NULL) < 0)
+    return -1;
+  if (reply.reply == NBD_REP_ACK) {
+    nbdkit_debug ("structured replies enabled");
+    h->structured = true;
+    /* TODO: block status */
+  }
+  else {
+    nbdkit_debug ("structured replies disabled");
+  }

   /* Try NBD_OPT_GO */
   nbdkit_debug ("trying NBD_OPT_GO");
@@ -828,6 +1013,13 @@ nbd_pread (void *handle, void *buf, uint32_t count,
uint64_t offset,
   int c;

   assert (!flags);
+  /* We do not check whether structured read chunks cover the entire
+     buffer; as such, to avoid leaking uninit memory back to the
+     client, we must pre-zero the buffer. TODO: should nbdkit
+     guarantee a clean buffer to .pread, perhaps based on a flag set
+     when we register our plugin? */
+  if (h->structured)
+    memset (buf, 0, count);
   c = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf);
   return c < 0 ? c : nbd_reply (h, c);
 }
-- 
2.20.1

Eric Blake

2019-Apr-23 02:17 UTC

head link

Re: [Libguestfs] [nbdkit PATCH 7/7] nbd: Implement structured replies

On 4/22/19 7:50 PM, Eric Blake wrote:> Time to enhance the nbd plugin to request structured replies from the
> server. For now, deal only with structured reads. The server can now
> return sparse reads, even though we need nbdkit version 3 before we
> can in turn return sparse reads back to the client.
> 
> In general, we have to assume the server is malicious, and so we must
> sanity check that it sends replies we expect. Thus, we have a choice
> between either implementing bookkeeping to ensure that the server
> sends exactly as many non-overlapping chunks as needed to cover the
> entire read request, or else ensuring that even when the server
> cheats, we do not leak uninitialized memory back to our client. I
> chose for simplicity, with the result that I end up calling memset()
> more frequently than necessary for a compliant server, rather than
> worrying about bookkeeping to detect a non-compliant server that is
> attempting to cause an information leak.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  plugins/nbd/nbd.c | 232 ++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 212 insertions(+), 20 deletions(-)
> 
> +  case NBD_STRUCTURED_REPLY_MAGIC:
> +    if (!h->structured) {
> +      nbdkit_error ("structured response without negotiation");
> +      return nbd_mark_dead (h);
> +    }
> +    if (read_full (h->fd, sizeof rep.simple + (char *) &rep,
> +                   sizeof rep - sizeof rep.simple))
> +      return nbd_mark_dead (h);
> +    rep.structured.flags = be16toh (rep.structured.flags);
> +    rep.structured.type = be16toh (rep.structured.type);
> +    rep.structured.length = be32toh (rep.structured.length);
> +    nbdkit_debug ("received structured reply %s for cookie %#"
PRIx64
> +                  ", payload length %" PRId32,
> +                  name_of_nbd_reply_type(rep.structured.type),
> +                  rep.structured.handle, rep.structured.length);
> +    if (rep.structured.length > 64 * 1024 * 1024) {
> +      nbdkit_error ("structured reply length is suspiciously large:
%" PRId32,
> +                    rep.structured.length);
> +      return nbd_mark_dead (h);
> +    }
> +    if (rep.structured.length) {
> +      /* Special case for OFFSET_DATA in order to read tail of chunk
> +         directly into final buffer later on */
> +      len = (rep.structured.type == NBD_REPLY_TYPE_OFFSET_DATA &&
> +             rep.structured.length > sizeof offset) ? sizeof offset :
> +        rep.structured.length;
> +      buf = malloc (len);...
> +    case NBD_REPLY_TYPE_OFFSET_DATA:
> +      if (rep.structured.length <= sizeof offset) {
> +        nbdkit_error ("structured reply OFFSET_DATA too small");
> +        free (buf);
> +        return nbd_mark_dead (h);
> +      }
> +      memcpy (&offset, buf, sizeof offset);
> +      offset = be64toh (offset);
> +      len = rep.structured.length - sizeof offset;
> +      break;
leaks buf
> +    case NBD_REPLY_TYPE_OFFSET_HOLE:
> +      if (rep.structured.length != sizeof offset + sizeof len) {
> +        nbdkit_error ("structured reply OFFSET_HOLE size
incorrect");
> +        free (buf);
> +        return nbd_mark_dead (h);
> +      }
> +      memcpy (&offset, buf, sizeof offset);
> +      offset = be64toh (offset);
> +      memcpy (&len, buf, sizeof len);
> +      len = be32toh (len);
> +      if (!len) {
> +        nbdkit_error ("structured reply OFFSET_HOLE length
incorrect");
> +        free (buf);
> +        return nbd_mark_dead (h);
> +      }
> +      zero = true;
> +      break;
likewise

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Richard W.M. Jones

2019-Apr-23 07:36 UTC

head link

Re: [Libguestfs] [nbdkit PATCH 3/7] RFC: protocol: Only send EOVERFLOW when valid

On Mon, Apr 22, 2019 at 07:50:22PM -0500, Eric Blake
wrote:> Previously, we were squashing EOVERFLOW into EINVAL; continue to do so
> at points in the protocol where the client may not be expecting
> EOVERFLOW.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  server/protocol.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/server/protocol.c b/server/protocol.c
> index a52bb56..0a9f73c 100644
> --- a/server/protocol.c
> +++ b/server/protocol.c
> @@ -326,7 +326,7 @@ skip_over_write_buffer (int sock, size_t count)
> 
>  /* Convert a system errno to an NBD_E* error code. */
>  static int
> -nbd_errno (int error)
> +nbd_errno (int error, bool flag_df)
>  {
>    switch (error) {
>    case 0:
> @@ -349,7 +349,9 @@ nbd_errno (int error)
>      return NBD_ESHUTDOWN;
>  #endif
>    case EOVERFLOW:
> -    return NBD_EOVERFLOW;
> +    if (flag_df)
> +      return NBD_EOVERFLOW;
> +    /* fallthrough */
>    case EINVAL:
>    default:
>      return NBD_EINVAL;
> @@ -368,7 +370,7 @@ send_simple_reply (struct connection *conn,
> 
>    reply.magic = htobe32 (NBD_SIMPLE_REPLY_MAGIC);
>    reply.handle = handle;
> -  reply.error = htobe32 (nbd_errno (error));
> +  reply.error = htobe32 (nbd_errno (error, false));
> 
>    r = conn->send (conn, &reply, sizeof reply);
>    if (r == -1) {
> @@ -573,7 +575,8 @@ send_structured_reply_block_status (struct connection
*conn,
> 
>  static int
>  send_structured_reply_error (struct connection *conn,
> -                             uint64_t handle, uint16_t cmd, uint32_t
error)
> +                             uint64_t handle, uint16_t cmd, uint16_t
flags,
> +                             uint32_t error)
>  {
>    ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&conn->write_lock);
>    struct structured_reply reply;
> @@ -593,7 +596,7 @@ send_structured_reply_error (struct connection *conn,
>    }
> 
>    /* Send the error. */
> -  error_data.error = htobe32 (nbd_errno (error));
> +  error_data.error = htobe32 (nbd_errno (error, flags &
NBD_CMD_FLAG_DF));
>    error_data.len = htobe16 (0);
>    r = conn->send (conn, &error_data, sizeof error_data);
>    if (r == -1) {
> @@ -737,7 +740,8 @@ protocol_recv_request_send_reply (struct connection
*conn)
>                                                     extents);
>      }
>      else
> -      return send_structured_reply_error (conn, request.handle, cmd,
error);
> +      return send_structured_reply_error (conn, request.handle, cmd,
flags,
> +                                          error);
>    }
>    else
>      return send_simple_reply (conn, request.handle, cmd, buf, count,
error);
> -- 
> 2.20.1
The protocol spec is unclear on whether EOVERFLOW can be returned in
cases other than the DF flag being set.  Whether we include this patch
or not seems to hinge on the interpretation of the protocol spec which
I'm not really in a position to make.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

Richard W.M. Jones

2019-Apr-23 07:42 UTC

head link

Re: [Libguestfs] [nbdkit PATCH 6/7] nbd: Implement NBD_OPT_GO client request

So patches 1-6 are fine.  I'm not in a position to say whether patch 3
is required or not, but I have no objection to it either way.

Patch 7 seems to have a leak, but apart from that is also fine.

About memset-ing the buffer before passing it to plugin .pread
methods, I'll send you a patch for that in a minute.

Thanks,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

Eric Blake

2019-Apr-23 14:41 UTC

head link

[Libguestfs] [nbdkit PATCH 8/7] nbd: Implement .extents

Add support for calling NBD_CMD_BLOCK_STATUS on the remote server when
it is supported. I could have parsed that the server's id response to
NBD_OPT_SET_META_CONTEXT is the same as what later appears in
NBD_REPLY_TYPE_BLOCK_STATUS, but didn't think it worth the effort as
long as we expect exactly one meta context.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 plugins/nbd/nbd.c | 135 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 129 insertions(+), 6 deletions(-)

diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c
index 8eb8a31..9986d6c 100644
--- a/plugins/nbd/nbd.c
+++ b/plugins/nbd/nbd.c
@@ -130,6 +130,7 @@ struct transaction {
   uint64_t offset;
   uint32_t count;
   uint32_t err;
+  struct nbdkit_extents *extents;
   struct transaction *next;
 };

@@ -140,6 +141,7 @@ struct handle {
   int flags;
   int64_t size;
   bool structured;
+  bool extents;
   pthread_t reader;

   /* Prevents concurrent threads from interleaving writes to server */
@@ -286,7 +288,7 @@ nbd_request_raw (struct handle *h, uint16_t flags, uint16_t
type,
 static int
 nbd_request_full (struct handle *h, uint16_t flags, uint16_t type,
                   uint64_t offset, uint32_t count, const void *req_buf,
-                  void *rep_buf)
+                  void *rep_buf, struct nbdkit_extents *extents)
 {
   int err;
   struct transaction *trans;
@@ -308,6 +310,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t
type,
   trans->buf = rep_buf;
   trans->count = rep_buf ? count : 0;
   trans->offset = offset;
+  trans->extents = extents;
   nbd_lock (h);
   if (h->dead) {
     nbd_unlock (h);
@@ -340,7 +343,7 @@ static int
 nbd_request (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset,
              uint32_t count)
 {
-  return nbd_request_full (h, flags, type, offset, count, NULL, NULL);
+  return nbd_request_full (h, flags, type, offset, count, NULL, NULL, NULL);
 }

 /* Read a reply, and look up the fd corresponding to the transaction.
@@ -358,6 +361,12 @@ nbd_reply_raw (struct handle *h, int *fd)
   struct transaction *trans;
   void *buf = NULL;
   uint32_t count;
+  uint32_t id;
+  struct {
+    uint32_t length;
+    uint32_t flags;
+  } *extents = NULL;
+  size_t nextents = 0;
   int error = NBD_SUCCESS;
   bool more = false;
   uint64_t offset = 0; /* absolute offset of structured read chunk from buf */
@@ -452,6 +461,34 @@ nbd_reply_raw (struct handle *h, int *fd)
       }
       zero = true;
       break;
+    case NBD_REPLY_TYPE_BLOCK_STATUS:
+      if (!h->extents) {
+        nbdkit_error ("block status response without negotiation");
+        free (buf);
+        return nbd_mark_dead (h);
+      }
+      if (rep.structured.length < sizeof *extents ||
+          rep.structured.length % sizeof *extents != sizeof id) {
+        nbdkit_error ("structured reply OFFSET_HOLE size incorrect");
+        free (buf);
+        return nbd_mark_dead (h);
+      }
+      nextents = rep.structured.length / sizeof *extents;
+      extents = malloc (rep.structured.length - sizeof id);
+      if (!extents) {
+        nbdkit_error ("malloc: %m");
+        return nbd_mark_dead (h);
+      }
+      memcpy (&id, buf, sizeof id);
+      id = be32toh (id);
+      nbdkit_debug ("parsing %zu extents for context id %" PRId32,
+                    nextents, id);
+      memcpy (extents, (char *) buf + sizeof id, sizeof *extents * nextents);
+      for (size_t i = 0; i < nextents; i++) {
+        extents[i].length = be32toh (extents[i].length);
+        extents[i].flags = be32toh (extents[i].flags);
+      }
+      break;
     default:
       if (NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) {
         uint16_t errlen;
@@ -497,9 +534,29 @@ nbd_reply_raw (struct handle *h, int *fd)
   trans = find_trans_by_cookie (h, rep.simple.handle, !more);
   if (!trans) {
     nbdkit_error ("reply with unexpected cookie %#" PRIx64,
rep.simple.handle);
+    free (extents);
     return nbd_mark_dead (h);
   }

+  if (nextents) {
+    if (!trans->extents) {
+      nbdkit_error ("block status response to a non-status command");
+      free (extents);
+      return nbd_mark_dead (h);
+    }
+    offset = trans->offset;
+    for (size_t i = 0; i < nextents; i++) {
+      /* We rely on the fact that NBDKIT_EXTENT_* match NBD_STATE_* */
+      if (nbdkit_add_extent (trans->extents, offset, extents[i].length,
+                             extents[i].flags) == -1) {
+        error = EINVAL;
+        break;
+      }
+      offset += extents[i].length;
+    }
+    free (extents);
+  }
+
   if (!more) {
     *fd = trans->u.fds[1];
     if (!error)
@@ -686,8 +743,11 @@ nbd_newstyle_recv_option_reply (struct handle *h, uint32_t
option,
 static int
 nbd_newstyle_haggle (struct handle *h)
 {
+  const char *const query = "base:allocation";
   struct new_option opt;
   uint32_t exportnamelen = htobe32 (strlen (export));
+  uint32_t nrqueries = htobe32 (1);
+  uint32_t querylen = htobe32 (strlen (query));
   /* For now, we make no NBD_INFO_* requests, relying on the server to
      send its defaults. TODO: nbdkit should let plugins report block
      sizes, at which point we should request NBD_INFO_BLOCK_SIZE and
@@ -710,9 +770,47 @@ nbd_newstyle_haggle (struct handle *h)
                                       NULL) < 0)
     return -1;
   if (reply.reply == NBD_REP_ACK) {
-    nbdkit_debug ("structured replies enabled");
+    nbdkit_debug ("structured replies enabled, trying
NBD_OPT_SET_META_CONTEXT");
     h->structured = true;
-    /* TODO: block status */
+
+    opt.version = htobe64 (NEW_VERSION);
+    opt.option = htobe32 (NBD_OPT_SET_META_CONTEXT);
+    opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) +
+                          sizeof nrqueries + sizeof querylen + strlen (query));
+    if (write_full (h->fd, &opt, sizeof opt) ||
+        write_full (h->fd, &exportnamelen, sizeof exportnamelen) ||
+        write_full (h->fd, export, strlen (export)) ||
+        write_full (h->fd, &nrqueries, sizeof nrqueries) ||
+        write_full (h->fd, &querylen, sizeof querylen) ||
+        write_full (h->fd, query, strlen (query))) {
+      nbdkit_error ("unable to request NBD_OPT_SET_META_CONTEXT:
%m");
+      return -1;
+    }
+    if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT,
&reply,
+                                        NULL) < 0)
+      return -1;
+    if (reply.reply == NBD_REP_META_CONTEXT) {
+      /* Cheat: we asked for exactly one context. We could double
+         check that the server is replying with exactly the
+         "base:allocation" context, and then remember the id it tells
+         us to later confirm that responses to NBD_CMD_BLOCK_STATUS
+         match up; but in the absence of multiple contexts, it's
+         easier to just assume the server is compliant, and will reuse
+         the same id, without bothering to check further. */
+      nbdkit_debug ("extents enabled");
+      h->extents = true;
+      if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT,
&reply,
+                                          NULL) < 0)
+        return -1;
+    }
+    if (reply.reply != NBD_REP_ACK) {
+      if (h->extents) {
+        nbdkit_error ("unexpected response to set meta context");
+        return -1;
+      }
+      nbdkit_debug ("ignoring meta context response %s",
+                    name_of_nbd_rep (reply.reply));
+    }
   }
   else {
     nbdkit_debug ("structured replies disabled");
@@ -1008,6 +1106,14 @@ nbd_can_multi_conn (void *handle)
   return h->flags & NBD_FLAG_CAN_MULTI_CONN;
 }

+static int
+nbd_can_extents (void *handle)
+{
+  struct handle *h = handle;
+
+  return h->extents;
+}
+
 /* Read data from the file. */
 static int
 nbd_pread (void *handle, void *buf, uint32_t count, uint64_t offset,
@@ -1024,7 +1130,7 @@ nbd_pread (void *handle, void *buf, uint32_t count,
uint64_t offset,
      when we register our plugin? */
   if (h->structured)
     memset (buf, 0, count);
-  c = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf);
+  c = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf, NULL);
   return c < 0 ? c : nbd_reply (h, c);
 }

@@ -1038,7 +1144,7 @@ nbd_pwrite (void *handle, const void *buf, uint32_t count,
uint64_t offset,

   assert (!(flags & ~NBDKIT_FLAG_FUA));
   c = nbd_request_full (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0,
-                        NBD_CMD_WRITE, offset, count, buf, NULL);
+                        NBD_CMD_WRITE, offset, count, buf, NULL, NULL);
   return c < 0 ? c : nbd_reply (h, c);
 }

@@ -1086,6 +1192,21 @@ nbd_flush (void *handle, uint32_t flags)
   return c < 0 ? c : nbd_reply (h, c);
 }

+/* Read extents of the file. */
+static int
+nbd_extents (void *handle, uint32_t count, uint64_t offset,
+             uint32_t flags, struct nbdkit_extents *extents)
+{
+  struct handle *h = handle;
+  int c;
+
+  assert (!(flags & ~NBDKIT_FLAG_REQ_ONE) && h->extents);
+  c = nbd_request_full (h, flags & NBDKIT_FLAG_REQ_ONE ?
NBD_CMD_FLAG_REQ_ONE : 0,
+                        NBD_CMD_BLOCK_STATUS, offset, count, NULL, NULL,
+                        extents);
+  return c < 0 ? c : nbd_reply (h, c);
+}
+
 static struct nbdkit_plugin plugin = {
   .name               = "nbd",
   .longname           = "nbdkit nbd plugin",
@@ -1104,11 +1225,13 @@ static struct nbdkit_plugin plugin = {
   .can_zero           = nbd_can_zero,
   .can_fua            = nbd_can_fua,
   .can_multi_conn     = nbd_can_multi_conn,
+  .can_extents        = nbd_can_extents,
   .pread              = nbd_pread,
   .pwrite             = nbd_pwrite,
   .zero               = nbd_zero,
   .flush              = nbd_flush,
   .trim               = nbd_trim,
+  .extents            = nbd_extents,
   .errno_is_preserved = 1,
 };

-- 
2.20.1

Eric Blake

2019-Apr-23 20:25 UTC

head link

Re: [Libguestfs] [nbdkit PATCH 7/7] nbd: Implement structured replies

On 4/22/19 7:50 PM, Eric Blake wrote:> Time to enhance the nbd plugin to request structured replies from the
> server. For now, deal only with structured reads. The server can now
> return sparse reads, even though we need nbdkit version 3 before we
> can in turn return sparse reads back to the client.
> 
> +  case NBD_STRUCTURED_REPLY_MAGIC:
> +    more = !(rep.structured.flags & NBD_REPLY_FLAG_DONE);
> +  trans = find_trans_by_cookie (h, rep.simple.handle, !more);
>    if (!trans) {
> -    nbdkit_error ("reply with unexpected cookie %#" PRIx64,
rep.handle);
> +    nbdkit_error ("reply with unexpected cookie %#" PRIx64,
rep.simple.handle);
>      return nbd_mark_dead (h);
>    }
> 
> -  *fd = trans->u.fds[1];
> +  if (!more)
> +    *fd = trans->u.fds[1];
> +  else if (error && !trans->err)
> +    trans->err = error;
>    buf = trans->buf;
>    count = trans->count;
> +  if (buf && h->structured &&
> +  }
>    free (trans);
This sets up a use-after-free if the server replies with more than one
chunk. The free(trans) call must happen only if !more. Looks like I'll
be sending a v2 of the tail of this series on top of my work to utilize
cleanup.h (I've applied the obvious bug fixes in 1, 4, and 5, and am
waiting for the NBD list to respond to my question about a possible
protocol spec change before deciding to push 2 alone or squashed with 3).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Reasonably Related Threads

Search for more apparently analagous threads

Libguestfs - Apr 2019 - Re: [nbdkit PATCH 3/7] RFC: protocol: Only send EOVERFLOW when valid

[Libguestfs] [nbdkit PATCH 0/7] Implement structured replies in nbd plugin

[Libguestfs] [nbdkit PATCH 1/7] protocol: Send correct structured reply error for non-Linux

[Libguestfs] [nbdkit PATCH 2/7] protocol: Support EOVERFLOW

[Libguestfs] [nbdkit PATCH 3/7] RFC: protocol: Only send EOVERFLOW when valid

[Libguestfs] [nbdkit PATCH 4/7] nbd: Honor server global flags on little-endian

[Libguestfs] [nbdkit PATCH 5/7] protocol: Add helpers for error response handling

[Libguestfs] [nbdkit PATCH 6/7] nbd: Implement NBD_OPT_GO client request

[Libguestfs] [nbdkit PATCH 7/7] nbd: Implement structured replies

Re: [Libguestfs] [nbdkit PATCH 7/7] nbd: Implement structured replies

Re: [Libguestfs] [nbdkit PATCH 3/7] RFC: protocol: Only send EOVERFLOW when valid

Re: [Libguestfs] [nbdkit PATCH 6/7] nbd: Implement NBD_OPT_GO client request

[Libguestfs] [nbdkit PATCH 8/7] nbd: Implement .extents

Re: [Libguestfs] [nbdkit PATCH 7/7] nbd: Implement structured replies

Reasonably Related Threads