thr3ads.net - Libguestfs - [Libguestfs] [libnbd PATCH v2 0/5] Avoid deadlock with in-flight commands [May 2019]

If this information is useful, please help other people find it:
Share via:

Eric Blake

2019-May-22 03:15 UTC

[Libguestfs] [libnbd PATCH v2 0/5] Avoid deadlock with in-flight commands

On v1, we discussed whether cmds_to_issue needed to be a list, since
it never had more than one element. I played with the idea of making
it a list, and allowing the client to queue up new commands regardless
of whether the state machine is currently in READY. I also polished up
the tmp demo into a bit more full-fledged example file, worth
including since it also let me discover a hard-to-hit race with large
NBD_CMD_WRITE vs. EAGAIN failures (now fixed).

Eric Blake (5):
  lib: Refactor state event into command_common
  commands: Allow for a command queue
  states: Split ISSUE_COMMAND.SEND_REQUEST
  states: Allow in-flight read while writing next command
  examples: Add example to demonstrate just-fixed deadlock scenario

 .gitignore                       |   1 +
 examples/Makefile.am             |  10 ++
 examples/batched-read-write.c    | 199 +++++++++++++++++++++++++++++++
 generator/generator              |  40 +++++--
 generator/states-issue-command.c |  68 ++++++++---
 generator/states.c               |   6 +
 lib/disconnect.c                 |  17 +--
 lib/internal.h                   |  10 ++
 lib/rw.c                         |  41 ++++---
 9 files changed, 334 insertions(+), 58 deletions(-)
 create mode 100644 examples/batched-read-write.c

-- 
2.20.1

Eric Blake

2019-May-22 03:15 UTC

head link

[Libguestfs] [libnbd PATCH v2 1/5] lib: Refactor state event into command_common

The next patch wants to make queuing commands smarter; rather than
duplicating the logic at each command's call site, it's better to
centralize things in command_common, and to also make this helper
routine visible for use in NBD_CMD_DISC.
---
 lib/disconnect.c | 17 +++--------------
 lib/internal.h   |  6 ++++++
 lib/rw.c         | 19 +++----------------
 3 files changed, 12 insertions(+), 30 deletions(-)

diff --git a/lib/disconnect.c b/lib/disconnect.c
index bc43b4c..26d7298 100644
--- a/lib/disconnect.c
+++ b/lib/disconnect.c
@@ -62,25 +62,14 @@ nbd_unlocked_aio_disconnect (struct nbd_connection *conn)
 {
   struct command_in_flight *cmd;

-  cmd = malloc (sizeof *cmd);
-  if (cmd == NULL) {
-    set_error (errno, "malloc");
+  cmd = command_common (conn, 0, NBD_CMD_DISC, 0, 0, NULL);
+  if (cmd == NULL)
     return -1;
-  }
-  cmd->flags = 0;
-  cmd->type = NBD_CMD_DISC;
-  cmd->handle = conn->h->unique++;
-  cmd->offset = 0;
-  cmd->count = 0;
-  cmd->data = NULL;
-
-  cmd->next = conn->cmds_to_issue;
-  conn->cmds_to_issue = cmd;

   /* This will leave the command on the in-flight list.  Is this a
    * problem?  Probably it isn't.  If it is, we could add a flag to
    * the command struct to tell SEND_REQUEST not to add it to the
    * in-flight list.
    */
-  return nbd_internal_run (conn->h, conn, cmd_issue);
+  return 0;
 }
diff --git a/lib/internal.h b/lib/internal.h
index 3f2b729..67bd52a 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -265,6 +265,12 @@ extern void nbd_internal_set_last_error (int errnum, char
*error);
 extern int nbd_internal_errno_of_nbd_error (uint32_t error);
 extern const char *nbd_internal_name_of_nbd_cmd (uint16_t type);

+/* rw.c */
+extern struct command_in_flight *command_common (struct nbd_connection *conn,
+                                                 uint16_t flags, uint16_t type,
+                                                 uint64_t offset,
+                                                 uint64_t count, void *data);
+
 /* socket.c */
 struct socket *nbd_internal_socket_create (int fd);

diff --git a/lib/rw.c b/lib/rw.c
index 9dfce97..a7587e9 100644
--- a/lib/rw.c
+++ b/lib/rw.c
@@ -241,7 +241,7 @@ nbd_unlocked_block_status (struct nbd_handle *h,
   return r == -1 ? -1 : 0;
 }

-static struct command_in_flight *
+struct command_in_flight *
 command_common (struct nbd_connection *conn,
                 uint16_t flags, uint16_t type,
                 uint64_t offset, uint64_t count, void *data)
@@ -298,6 +298,8 @@ command_common (struct nbd_connection *conn,

   cmd->next = conn->cmds_to_issue;
   conn->cmds_to_issue = cmd;
+  if (nbd_internal_run (conn->h, conn, cmd_issue) == -1)
+    return NULL;

   return cmd;
 }
@@ -311,8 +313,6 @@ nbd_unlocked_aio_pread (struct nbd_connection *conn, void
*buf,
   cmd = command_common (conn, 0, NBD_CMD_READ, offset, count, buf);
   if (!cmd)
     return -1;
-  if (nbd_internal_run (conn->h, conn, cmd_issue) == -1)
-    return -1;

   return cmd->handle;
 }
@@ -344,8 +344,6 @@ nbd_unlocked_aio_pwrite (struct nbd_connection *conn, const
void *buf,
                         (void *) buf);
   if (!cmd)
     return -1;
-  if (nbd_internal_run (conn->h, conn, cmd_issue) == -1)
-    return -1;

   return cmd->handle;
 }
@@ -363,8 +361,6 @@ nbd_unlocked_aio_flush (struct nbd_connection *conn)
   cmd = command_common (conn, 0, NBD_CMD_FLUSH, 0, 0, NULL);
   if (!cmd)
     return -1;
-  if (nbd_internal_run (conn->h, conn, cmd_issue) == -1)
-    return -1;

   return cmd->handle;
 }
@@ -400,8 +396,6 @@ nbd_unlocked_aio_trim (struct nbd_connection *conn,
   cmd = command_common (conn, flags, NBD_CMD_TRIM, offset, count, NULL);
   if (!cmd)
     return -1;
-  if (nbd_internal_run (conn->h, conn, cmd_issue) == -1)
-    return -1;

   return cmd->handle;
 }
@@ -424,8 +418,6 @@ nbd_unlocked_aio_cache (struct nbd_connection *conn,
   cmd = command_common (conn, 0, NBD_CMD_CACHE, offset, count, NULL);
   if (!cmd)
     return -1;
-  if (nbd_internal_run (conn->h, conn, cmd_issue) == -1)
-    return -1;

   return cmd->handle;
 }
@@ -461,8 +453,6 @@ nbd_unlocked_aio_zero (struct nbd_connection *conn,
   cmd = command_common (conn, flags, NBD_CMD_WRITE_ZEROES, offset, count,
NULL);
   if (!cmd)
     return -1;
-  if (nbd_internal_run (conn->h, conn, cmd_issue) == -1)
-    return -1;

   return cmd->handle;
 }
@@ -505,8 +495,5 @@ nbd_unlocked_aio_block_status (struct nbd_connection *conn,
   cmd->extent_fn = extent;
   cmd->extent_id = id;

-  if (nbd_internal_run (conn->h, conn, cmd_issue) == -1)
-    return -1;
-
   return cmd->handle;
 }
-- 
2.20.1

Eric Blake

2019-May-22 03:15 UTC

head link

[Libguestfs] [libnbd PATCH v2 2/5] commands: Allow for a command queue

Previously, our 'cmds_to_issue' list was at most 1 element long,
because we reject all commands except from state READY, but don't get
back to state READY until the issue-commands sequence has completed.
However, this is not as friendly on the client - once we are in
transmission phase, a client may want to queue up another command
whether or not the state machine is still tied up in processing a
previous command.  We still want to reject commands sent before the
first time we reach READY, as well as keep the cmd_issue event for
kicking the state machine into action when there is no previous
command being worked on, but otherwise, the state machine itself can
recognize when the command queue needs draining.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 generator/states.c |  6 ++++++
 lib/internal.h     |  3 +++
 lib/rw.c           | 24 ++++++++++++++++++++----
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/generator/states.c b/generator/states.c
index 202c305..a4eecbd 100644
--- a/generator/states.c
+++ b/generator/states.c
@@ -110,6 +110,12 @@ send_from_wbuf (struct nbd_connection *conn)
 /*----- End of prologue. -----*/

 /* STATE MACHINE */ {
+ READY:
+  conn->reached_ready = true;
+  if (conn->cmds_to_issue)
+    SET_NEXT_STATE (%ISSUE_COMMAND.START);
+  return 0;
+
  DEAD:
   if (conn->sock) {
     conn->sock->ops->close (conn->sock);
diff --git a/lib/internal.h b/lib/internal.h
index 67bd52a..2d6ad9d 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -175,6 +175,9 @@ struct nbd_connection {
   /* Global flags from the server. */
   uint16_t gflags;

+  /* True if we've ever reached the READY state. */
+  bool reached_ready;
+
   /* When issuing a command, the first list contains commands waiting
    * to be issued.  The second list contains commands which have been
    * issued and waiting for replies.  The third list contains commands
diff --git a/lib/rw.c b/lib/rw.c
index a7587e9..ebd4ff9 100644
--- a/lib/rw.c
+++ b/lib/rw.c
@@ -246,7 +246,8 @@ command_common (struct nbd_connection *conn,
                 uint16_t flags, uint16_t type,
                 uint64_t offset, uint64_t count, void *data)
 {
-  struct command_in_flight *cmd;
+  struct command_in_flight *cmd, *prev_cmd;
+  bool event = !conn->reached_ready;

   switch (type) {
     /* Commands which send or receive data are limited to MAX_REQUEST_SIZE. */
@@ -296,9 +297,24 @@ command_common (struct nbd_connection *conn,
   if (conn->structured_replies && cmd->data && type ==
NBD_CMD_READ)
     memset (cmd->data, 0, cmd->count);

-  cmd->next = conn->cmds_to_issue;
-  conn->cmds_to_issue = cmd;
-  if (nbd_internal_run (conn->h, conn, cmd_issue) == -1)
+  /* If we have not reached READY yet, sending the event lets the
+   * state machine fail for requesting a command too early. If there
+   * are no queued commands and we are already in state READY, send an
+   * event to kick the state machine. In all other cases, append our
+   * command to the end of the queue, and the state machine will
+   * eventually get to it when it cycles back to READY.
+   */
+  if (conn->cmds_to_issue != NULL) {
+    prev_cmd = conn->cmds_to_issue;
+    while (prev_cmd->next)
+      prev_cmd = prev_cmd->next;
+    prev_cmd->next = cmd;
+  }
+  else {
+    conn->cmds_to_issue = cmd;
+    event |= conn->state == STATE_READY;
+  }
+  if (event && nbd_internal_run (conn->h, conn, cmd_issue) == -1)
     return NULL;

   return cmd;
-- 
2.20.1

Eric Blake

2019-May-22 03:15 UTC

head link

[Libguestfs] [libnbd PATCH v2 3/5] states: Split ISSUE_COMMAND.SEND_REQUEST

In order to handle reading an in-flight response while in the middle
of sending a second command, we'll need a way to jump back into the
middle of a command being sent. This is easier if the state that sets
wbuf/wlen is distinct from the state that reads into wbuf and alters
wlen, and also easier if we don't move the command to the in-flight
queue until after the writes finish (since the previous patch was
careful to queue new pending commands to the end of the list, the head
of the list is always the current command needing to be sent).
---
 generator/generator              | 14 ++++++++++
 generator/states-issue-command.c | 45 ++++++++++++++++++++------------
 2 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/generator/generator b/generator/generator
index a1bf41d..a4ad362 100755
--- a/generator/generator
+++ b/generator/generator
@@ -637,12 +637,26 @@ and issue_command_state_machine = [
     external_events = [ NotifyWrite, "" ];
   };

+  State {
+    default_state with
+    name = "PREPARE_WRITE_PAYLOAD";
+    comment = "Prepare the write payload to send to the remote
server";
+    external_events = [];
+  };
+
   State {
     default_state with
     name = "SEND_WRITE_PAYLOAD";
     comment = "Sending the write payload to the remote server";
     external_events = [ NotifyWrite, "" ];
   };
+
+State {
+    default_state with
+    name = "FINISH";
+    comment = "Finish issuing a command";
+    external_events = [];
+  };
 ]

 (* Receiving a reply from the server. *)
diff --git a/generator/states-issue-command.c b/generator/states-issue-command.c
index a57f40f..e24ea34 100644
--- a/generator/states-issue-command.c
+++ b/generator/states-issue-command.c
@@ -24,9 +24,6 @@

   assert (conn->cmds_to_issue != NULL);
   cmd = conn->cmds_to_issue;
-  conn->cmds_to_issue = cmd->next;
-  cmd->next = conn->cmds_in_flight;
-  conn->cmds_in_flight = cmd;

   conn->sbuf.request.magic = htobe32 (NBD_REQUEST_MAGIC);
   conn->sbuf.request.flags = htobe16 (cmd->flags);
@@ -40,29 +37,43 @@
   return 0;

  ISSUE_COMMAND.SEND_REQUEST:
-  struct command_in_flight *cmd;
-
   switch (send_from_wbuf (conn)) {
   case -1: SET_NEXT_STATE (%.DEAD); return -1;
-  case 0:
-    assert (conn->cmds_in_flight != NULL);
-    cmd = conn->cmds_in_flight;
-    assert (cmd->handle == be64toh (conn->sbuf.request.handle));
-    if (cmd->type == NBD_CMD_WRITE) {
-      conn->wbuf = cmd->data;
-      conn->wlen = cmd->count;
-      SET_NEXT_STATE (%SEND_WRITE_PAYLOAD);
-    }
-    else
-      SET_NEXT_STATE (%.READY);
+  case 0:  SET_NEXT_STATE (%PREPARE_WRITE_PAYLOAD);
   }
   return 0;

+ ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD:
+  struct command_in_flight *cmd;
+
+  assert (conn->cmds_to_issue != NULL);
+  cmd = conn->cmds_to_issue;
+  assert (cmd->handle == be64toh (conn->sbuf.request.handle));
+  if (cmd->type == NBD_CMD_WRITE) {
+    conn->wbuf = cmd->data;
+    conn->wlen = cmd->count;
+    SET_NEXT_STATE (%SEND_WRITE_PAYLOAD);
+  }
+  else
+    SET_NEXT_STATE (%FINISH);
+  return 0;
+
  ISSUE_COMMAND.SEND_WRITE_PAYLOAD:
   switch (send_from_wbuf (conn)) {
   case -1: SET_NEXT_STATE (%.DEAD); return -1;
-  case 0:  SET_NEXT_STATE (%.READY);
+  case 0:  SET_NEXT_STATE (%FINISH);
   }
   return 0;

+ ISSUE_COMMAND.FINISH:
+  struct command_in_flight *cmd;
+
+  assert (conn->cmds_to_issue != NULL);
+  cmd = conn->cmds_to_issue;
+  conn->cmds_to_issue = cmd->next;
+  cmd->next = conn->cmds_in_flight;
+  conn->cmds_in_flight = cmd;
+  SET_NEXT_STATE (%.READY);
+  return 0;
+
 } /* END STATE MACHINE */
-- 
2.20.1

Eric Blake

2019-May-22 03:15 UTC

head link

[Libguestfs] [libnbd PATCH v2 4/5] states: Allow in-flight read while writing next command

As already noted in our state machine, a client that batches up a
large read followed by large writes, coupled with a server that only
processes commands in order, can result in deadlock (the server won't
read more until we unblock its ability to write out its reply to our
first command; but we aren't willing to read until we are done writing
out our second command).  Break the deadlock by teaching the generator
that while we are in the middle of writing a command, we must remain
responsive to read_notify events; if the server has data for us to
read, we should consume that before jumping back into the middle of
our command issue (and consuming a reply can invalidate sbuf, so we
have to drop an assertion in PREPARE_WRITE_PAYLOAD).
---
 generator/generator              | 26 ++++++++++++++++++--------
 generator/states-issue-command.c | 25 ++++++++++++++++++++++++-
 lib/internal.h                   |  1 +
 3 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/generator/generator b/generator/generator
index a4ad362..5c84a5d 100755
--- a/generator/generator
+++ b/generator/generator
@@ -620,12 +620,6 @@ and issue_command_state_machine = [
   State {
     default_state with
     name = "START";
-    (* XXX There's a possible deadlock here if a server cannot
-     * handle multiple requests pipelined on a single connection.
-     * We could try to issue a command and block, but reads might
-     * be available.  It should be possible to break this with
-     * another state.
-     *)
     comment = "Begin issuing a command to the remote server";
     external_events = [];
   };
@@ -634,7 +628,15 @@ and issue_command_state_machine = [
     default_state with
     name = "SEND_REQUEST";
     comment = "Sending a request to the remote server";
-    external_events = [ NotifyWrite, "" ];
+    external_events = [ NotifyWrite, "";
+                        NotifyRead, "PAUSE_SEND_REQUEST" ];
+  };
+
+  State {
+    default_state with
+    name = "PAUSE_SEND_REQUEST";
+    comment = "Interrupt send request to receive an earlier command's
reply";
+    external_events = [];
   };

   State {
@@ -648,7 +650,15 @@ and issue_command_state_machine = [
     default_state with
     name = "SEND_WRITE_PAYLOAD";
     comment = "Sending the write payload to the remote server";
-    external_events = [ NotifyWrite, "" ];
+    external_events = [ NotifyWrite, "";
+                        NotifyRead, "PAUSE_WRITE_PAYLOAD" ];
+  };
+
+State {
+    default_state with
+    name = "PAUSE_WRITE_PAYLOAD";
+    comment = "Interrupt write payload to receive an earlier command's
reply";
+    external_events = [];
   };

 State {
diff --git a/generator/states-issue-command.c b/generator/states-issue-command.c
index e24ea34..3a5980d 100644
--- a/generator/states-issue-command.c
+++ b/generator/states-issue-command.c
@@ -25,6 +25,15 @@
   assert (conn->cmds_to_issue != NULL);
   cmd = conn->cmds_to_issue;

+  /* Were we interrupted by reading a reply to an earlier command? */
+  if (conn->wlen) {
+    if (conn->in_write_payload)
+      SET_NEXT_STATE(%SEND_WRITE_PAYLOAD);
+    else
+      SET_NEXT_STATE(%SEND_REQUEST);
+    return 0;
+  }
+
   conn->sbuf.request.magic = htobe32 (NBD_REQUEST_MAGIC);
   conn->sbuf.request.flags = htobe16 (cmd->flags);
   conn->sbuf.request.type = htobe16 (cmd->type);
@@ -43,12 +52,18 @@
   }
   return 0;

+ ISSUE_COMMAND.PAUSE_SEND_REQUEST:
+  assert (conn->wlen);
+  assert (conn->cmds_to_issue != NULL);
+  conn->in_write_payload = false;
+  SET_NEXT_STATE (%^REPLY.START);
+  return 0;
+
  ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD:
   struct command_in_flight *cmd;

   assert (conn->cmds_to_issue != NULL);
   cmd = conn->cmds_to_issue;
-  assert (cmd->handle == be64toh (conn->sbuf.request.handle));
   if (cmd->type == NBD_CMD_WRITE) {
     conn->wbuf = cmd->data;
     conn->wlen = cmd->count;
@@ -65,9 +80,17 @@
   }
   return 0;

+ ISSUE_COMMAND.PAUSE_WRITE_PAYLOAD:
+  assert (conn->wlen);
+  assert (conn->cmds_to_issue != NULL);
+  conn->in_write_payload = true;
+  SET_NEXT_STATE (%^REPLY.START);
+  return 0;
+
  ISSUE_COMMAND.FINISH:
   struct command_in_flight *cmd;

+  assert (!conn->wlen);
   assert (conn->cmds_to_issue != NULL);
   cmd = conn->cmds_to_issue;
   conn->cmds_to_issue = cmd->next;
diff --git a/lib/internal.h b/lib/internal.h
index 2d6ad9d..91c056c 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -185,6 +185,7 @@ struct nbd_connection {
    * acknowledge them.
    */
   struct command_in_flight *cmds_to_issue, *cmds_in_flight, *cmds_done;
+  bool in_write_payload;
 };

 struct meta_context {
-- 
2.20.1

Eric Blake

2019-May-22 03:15 UTC

head link

[Libguestfs] [libnbd PATCH v2 5/5] examples: Add example to demonstrate just-fixed deadlock scenario

Added merely as an example to be run by hand, rather than automated
into 'make check', since nbdkit --filter=noparallel is quite new.
---
 .gitignore                    |   1 +
 examples/Makefile.am          |  10 ++
 examples/batched-read-write.c | 199 ++++++++++++++++++++++++++++++++++
 3 files changed, 210 insertions(+)
 create mode 100644 examples/batched-read-write.c

diff --git a/.gitignore b/.gitignore
index 66ff811..2ea7da0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -30,6 +30,7 @@ Makefile.in
 /docs/libnbd.3
 /docs/libnbd-api.3
 /docs/libnbd-api.pod
+/examples/batched-read-write
 /examples/threaded-reads-and-writes
 /examples/simple-fetch-first-sector
 /examples/simple-reads-and-writes
diff --git a/examples/Makefile.am b/examples/Makefile.am
index be3f21d..6825384 100644
--- a/examples/Makefile.am
+++ b/examples/Makefile.am
@@ -18,6 +18,7 @@
 include $(top_srcdir)/subdir-rules.mk

 noinst_PROGRAMS = \
+	batched-read-write \
 	simple-fetch-first-sector \
 	simple-reads-and-writes \
 	threaded-reads-and-writes
@@ -50,3 +51,12 @@ threaded_reads_and_writes_CFLAGS = \
 threaded_reads_and_writes_LDADD = \
 	$(top_builddir)/lib/libnbd.la \
 	$(PTHREAD_LIBS)
+
+batched_read_write_SOURCES = \
+	batched-read-write.c
+batched_read_write_CPPFLAGS = \
+	-I$(top_srcdir)/include
+batched_read_write_CFLAGS = \
+	$(WARNINGS_CFLAGS)
+batched_read_write_LDADD = \
+	$(top_builddir)/lib/libnbd.la
diff --git a/examples/batched-read-write.c b/examples/batched-read-write.c
new file mode 100644
index 0000000..888f4b6
--- /dev/null
+++ b/examples/batched-read-write.c
@@ -0,0 +1,199 @@
+/* This example can be copied, used and modified for any purpose
+ * without restrictions.
+ *
+ * Example usage with nbdkit:
+ *
+ * timeout 10s nbdkit -U - --filter=noparallel memory 2M \
+ *   --run './batched-read-write $unixsocket'
+ *
+ * This will attempt to batch a large aio read request immediately
+ * followed by a large aio write request, prior to waiting for any
+ * command replies from the server. A naive client that does not check
+ * for available read data related to the first command while trying
+ * to write data for the second command, coupled with a server that
+ * only processes commands serially, would cause deadlock (both
+ * processes fill up their write buffers waiting for a reader); thus,
+ * this tests that libnbd is smart enough to always respond to replies
+ * for in-flight requests even when it has batched up other commands
+ * to write.
+ *
+ * To run it against a remote server over TCP:
+ *
+ * ./batched-read-write hostname port
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <poll.h>
+#include <time.h>
+#include <assert.h>
+
+#include <libnbd.h>
+
+/* The single NBD handle. */
+static struct nbd_handle *nbd;
+
+/* Buffers used for the test. */
+static char *in, *out;
+static int64_t packetsize;
+
+static int
+try_deadlock (void *arg)
+{
+  struct pollfd fds[1];
+  struct nbd_connection *conn;
+  size_t i, j;
+  int64_t handles[2];
+  size_t in_flight;        /* counts number of requests in flight */
+  int dir, r;
+
+  /* The single thread "owns" the connection. */
+  conn = nbd_get_connection (nbd, 0);
+
+  /* Issue commands. */
+  in_flight = 0;
+  fprintf (stderr, " * before aio_pread\n");
+  handles[0] = nbd_aio_pread (conn, in, packetsize, 0);
+  if (handles[0] == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    goto error;
+  }
+  fprintf (stderr, " * after aio_pread\n");
+  in_flight++;
+  fprintf (stderr, " * before aio_pwrite\n");
+  handles[1] = nbd_aio_pwrite (conn, out, packetsize, packetsize, 0);
+  if (handles[1] == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    goto error;
+  }
+  fprintf (stderr, " * after aio_pwrite\n");
+  in_flight++;
+
+  /* Now wait for commands to retire, or for deadlock to occur */
+  while (in_flight > 0) {
+    if (nbd_aio_is_dead (conn) || nbd_aio_is_closed (conn)) {
+      fprintf (stderr, "connection is dead or closed\n");
+      goto error;
+    }
+
+    fds[0].fd = nbd_aio_get_fd (conn);
+    fds[0].events = 0;
+    fds[0].revents = 0;
+    dir = nbd_aio_get_direction (conn);
+    if ((dir & LIBNBD_AIO_DIRECTION_READ) != 0)
+      fds[0].events |= POLLIN;
+    if ((dir & LIBNBD_AIO_DIRECTION_WRITE) != 0)
+      fds[0].events |= POLLOUT;
+
+    if (poll (fds, 1, -1) == -1) {
+      perror ("poll");
+      goto error;
+    }
+
+    if ((dir & LIBNBD_AIO_DIRECTION_READ) != 0 &&
+        (fds[0].revents & POLLIN) != 0)
+      nbd_aio_notify_read (conn);
+    else if ((dir & LIBNBD_AIO_DIRECTION_WRITE) != 0 &&
+             (fds[0].revents & POLLOUT) != 0)
+      nbd_aio_notify_write (conn);
+
+    /* If a command is ready to retire, retire it. */
+    for (j = 0; j < in_flight; ++j) {
+      r = nbd_aio_command_completed (conn, handles[j]);
+      if (r == -1) {
+        fprintf (stderr, "%s\n", nbd_get_error ());
+        goto error;
+      }
+      if (r) {
+        memmove (&handles[j], &handles[j+1],
+                 sizeof (handles[0]) * (in_flight - j - 1));
+        j--;
+        in_flight--;
+      }
+    }
+  }
+
+  printf ("finished OK\n");
+
+  return 0;
+
+ error:
+  fprintf (stderr, "failed\n");
+  return -1;
+}
+
+int
+main (int argc, char *argv[])
+{
+  int err;
+  int64_t exportsize;
+
+  if (argc < 2 || argc > 3) {
+    fprintf (stderr, "%s socket | hostname port\n", argv[0]);
+    exit (EXIT_FAILURE);
+  }
+
+  nbd = nbd_create ();
+  if (nbd == NULL) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  /* Connect synchronously as this is simpler. */
+  if (argc == 2) {
+    if (nbd_connect_unix (nbd, argv[1]) == -1) {
+      fprintf (stderr, "%s\n", nbd_get_error ());
+      exit (EXIT_FAILURE);
+    }
+  }
+  else {
+    if (nbd_connect_tcp (nbd, argv[1], argv[2]) == -1) {
+      fprintf (stderr, "%s\n", nbd_get_error ());
+      exit (EXIT_FAILURE);
+    }
+  }
+
+  if (nbd_read_only (nbd) == 1) {
+    fprintf (stderr, "%s: error: this NBD export is read-only\n",
argv[0]);
+    exit (EXIT_FAILURE);
+  }
+
+  exportsize = nbd_get_size (nbd);
+  if (exportsize == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+  packetsize = exportsize / 2;
+  if (packetsize > 2 * 1024 * 1024)
+    packetsize = 2 * 1024 * 1024;
+
+  in = malloc (packetsize);
+  out = malloc (packetsize);
+  if (!in || !out) {
+    fprintf (stderr, "insufficient memory\n");
+    exit (EXIT_FAILURE);
+  }
+
+  /* Attempt to be non-destructive, by writing what file already contains */
+  if (nbd_pread (nbd, out, packetsize, packetsize) == -1) {
+    fprintf (stderr, "sync read failed: %s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  if (try_deadlock (NULL) == -1)
+    exit (EXIT_FAILURE);
+
+  if (nbd_shutdown (nbd) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  nbd_close (nbd);
+
+  return EXIT_SUCCESS;
+}
-- 
2.20.1

Richard W.M. Jones

2019-May-22 07:03 UTC

head link

Re: [Libguestfs] [libnbd PATCH v2 0/5] Avoid deadlock with in-flight commands

There are a couple of unused variables in the new program.  The
attached patch fixes it.  ./configure --enable-gcc-warnings should
catch these automatically.  I need to document this ...

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

Richard W.M. Jones

2019-May-22 09:13 UTC

head link

Re: [Libguestfs] [libnbd PATCH v2 1/5] lib: Refactor state event into command_common

On Tue, May 21, 2019 at 10:15:48PM -0500, Eric Blake
wrote:> The next patch wants to make queuing commands smarter; rather than
> duplicating the logic at each command's call site, it's better to
> centralize things in command_common, and to also make this helper
> routine visible for use in NBD_CMD_DISC.
There's a few problems with this patch.

To avoid leaking symbols -- even though we have a linker script, not
all future platforms are expected to support it -- global symbols
which are not exported should be called nbd_internal_*.  So
command_common should be nbd_internal_command_common.

The bigger problem is:
> @@ -505,8 +495,5 @@ nbd_unlocked_aio_block_status (struct nbd_connection
*conn,
>    cmd->extent_fn = extent;
>    cmd->extent_id = id;
> 
> -  if (nbd_internal_run (conn->h, conn, cmd_issue) == -1)
> -    return -1;
> -
>    return cmd->handle;
This changes the semantics of the function because the cmd->extent_*
fields are set too late.  (Yes, command_common is rather ad hoc :-)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

Richard W.M. Jones

2019-May-22 09:23 UTC

head link

Re: [Libguestfs] [libnbd PATCH v2 2/5] commands: Allow for a command queue

On Tue, May 21, 2019 at 10:15:49PM -0500, Eric Blake
wrote:> Previously, our 'cmds_to_issue' list was at most 1 element long,
> because we reject all commands except from state READY, but don't get
> back to state READY until the issue-commands sequence has completed.
> However, this is not as friendly on the client - once we are in
> transmission phase, a client may want to queue up another command
> whether or not the state machine is still tied up in processing a
> previous command.  We still want to reject commands sent before the
> first time we reach READY, as well as keep the cmd_issue event for
> kicking the state machine into action when there is no previous
> command being worked on, but otherwise, the state machine itself can
> recognize when the command queue needs draining.
I have a queued series which adds some new calls that may make
this patch simpler:

  int nbd_[unlocked_]aio_is_created (conn)
     Returns true if the connection is in the START state

  int nbd_[unlocked_]aio_is_connecting (conn)
     Returns true if the connection is connecting or handshaking

  int nbd_[unlocked_]aio_is_processing (conn)
     Returns true if the connection is issuing or replying to
     commands (but not in the READY state)

I will post these later this morning.

I believe they will let you get rid of the reached_ready flag, as well
as making this patch more correct because you probably want to avoid
issuing commands on a connection which is closed or dead.

You'd use a test like:

  if (nbd_unlocked_aio_is_ready (conn)) {
    // call nbd_internal_run
  }
  else if (nbd_unlocked_aio_is_processing (conn)) {
    // can't call nbd_internal_run because we're in the
    // wrong state, but we can enqueue the command and it
    // will be processed next time we get to READY
  }
  else {
    // this is an error
  }
> @@ -296,9 +297,24 @@ command_common (struct nbd_connection *conn,
>    if (conn->structured_replies && cmd->data && type
== NBD_CMD_READ)
>      memset (cmd->data, 0, cmd->count);
> 
> -  cmd->next = conn->cmds_to_issue;
> -  conn->cmds_to_issue = cmd;
> -  if (nbd_internal_run (conn->h, conn, cmd_issue) == -1)
> +  /* If we have not reached READY yet, sending the event lets the
> +   * state machine fail for requesting a command too early. If there
> +   * are no queued commands and we are already in state READY, send an
> +   * event to kick the state machine. In all other cases, append our
> +   * command to the end of the queue, and the state machine will
> +   * eventually get to it when it cycles back to READY.
> +   */
I think this test is wrong.  We don't know what the failure was,
it might have failed for a variety of reasons.

It's better to simple test if nbd_unlocked_aio_is_ready, then
call nbd_internal_run (if true), otherwise queue the command.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

Richard W.M. Jones

2019-May-22 09:24 UTC

head link

Re: [Libguestfs] [libnbd PATCH v2 4/5] states: Allow in-flight read while writing next command

Did we think it was better that the READY state should check if there
are commands to issue and automatically transition to
PREPARE_FOR_WRITE if so?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

Eric Blake

2019-May-22 11:21 UTC

head link

Re: [Libguestfs] [libnbd PATCH v2 4/5] states: Allow in-flight read while writing next command

On 5/21/19 10:15 PM, Eric Blake wrote:> As already noted in our state machine, a client that batches up a
> large read followed by large writes, coupled with a server that only
> processes commands in order, can result in deadlock (the server won't
> read more until we unblock its ability to write out its reply to our
> first command; but we aren't willing to read until we are done writing
> out our second command).  Break the deadlock by teaching the generator
> that while we are in the middle of writing a command, we must remain
> responsive to read_notify events; if the server has data for us to
> read, we should consume that before jumping back into the middle of
> our command issue (and consuming a reply can invalidate sbuf, so we
> have to drop an assertion in PREPARE_WRITE_PAYLOAD).
That's a hint of a latent bug...
> ---
>  generator/generator              | 26 ++++++++++++++++++--------
>  generator/states-issue-command.c | 25 ++++++++++++++++++++++++-
>  lib/internal.h                   |  1 +
>  3 files changed, 43 insertions(+), 9 deletions(-)
> @@ -43,12 +52,18 @@
>    }
>    return 0;
> 
> + ISSUE_COMMAND.PAUSE_SEND_REQUEST:
> +  assert (conn->wlen);
> +  assert (conn->cmds_to_issue != NULL);
> +  conn->in_write_payload = false;
> +  SET_NEXT_STATE (%^REPLY.START);
> +  return 0;
...Since processing REPLY is allowed to scribble over conn->sbuf,
> +
>   ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD:
>    struct command_in_flight *cmd;
> 
>    assert (conn->cmds_to_issue != NULL);
>    cmd = conn->cmds_to_issue;
> -  assert (cmd->handle == be64toh (conn->sbuf.request.handle));
it is also possible that resuming SEND_REQUEST will be sending scribbled
data. So I need to split conn->sbuf into two parts - a section used for
command issue, and the rest used for REPLY, at which point the assert is
still viable after all.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Reasonably Related Threads

Search for more reasonably related threads

Libguestfs - May 2019 - [libnbd PATCH v2 0/5] Avoid deadlock with in-flight commands

[Libguestfs] [libnbd PATCH v2 0/5] Avoid deadlock with in-flight commands

[Libguestfs] [libnbd PATCH v2 1/5] lib: Refactor state event into command_common

[Libguestfs] [libnbd PATCH v2 2/5] commands: Allow for a command queue

[Libguestfs] [libnbd PATCH v2 3/5] states: Split ISSUE_COMMAND.SEND_REQUEST

[Libguestfs] [libnbd PATCH v2 4/5] states: Allow in-flight read while writing next command

[Libguestfs] [libnbd PATCH v2 5/5] examples: Add example to demonstrate just-fixed deadlock scenario

Re: [Libguestfs] [libnbd PATCH v2 0/5] Avoid deadlock with in-flight commands

Re: [Libguestfs] [libnbd PATCH v2 1/5] lib: Refactor state event into command_common

Re: [Libguestfs] [libnbd PATCH v2 2/5] commands: Allow for a command queue

Re: [Libguestfs] [libnbd PATCH v2 4/5] states: Allow in-flight read while writing next command

Re: [Libguestfs] [libnbd PATCH v2 4/5] states: Allow in-flight read while writing next command

Reasonably Related Threads