Eric Blake
2019-May-22 03:15 UTC
[Libguestfs] [libnbd PATCH v2 0/5] Avoid deadlock with in-flight commands
On v1, we discussed whether cmds_to_issue needed to be a list, since it never had more than one element. I played with the idea of making it a list, and allowing the client to queue up new commands regardless of whether the state machine is currently in READY. I also polished up the tmp demo into a bit more full-fledged example file, worth including since it also let me discover a hard-to-hit race with large NBD_CMD_WRITE vs. EAGAIN failures (now fixed). Eric Blake (5): lib: Refactor state event into command_common commands: Allow for a command queue states: Split ISSUE_COMMAND.SEND_REQUEST states: Allow in-flight read while writing next command examples: Add example to demonstrate just-fixed deadlock scenario .gitignore | 1 + examples/Makefile.am | 10 ++ examples/batched-read-write.c | 199 +++++++++++++++++++++++++++++++ generator/generator | 40 +++++-- generator/states-issue-command.c | 68 ++++++++--- generator/states.c | 6 + lib/disconnect.c | 17 +-- lib/internal.h | 10 ++ lib/rw.c | 41 ++++--- 9 files changed, 334 insertions(+), 58 deletions(-) create mode 100644 examples/batched-read-write.c -- 2.20.1
Eric Blake
2019-May-22 03:15 UTC
[Libguestfs] [libnbd PATCH v2 1/5] lib: Refactor state event into command_common
The next patch wants to make queuing commands smarter; rather than duplicating the logic at each command's call site, it's better to centralize things in command_common, and to also make this helper routine visible for use in NBD_CMD_DISC. --- lib/disconnect.c | 17 +++-------------- lib/internal.h | 6 ++++++ lib/rw.c | 19 +++---------------- 3 files changed, 12 insertions(+), 30 deletions(-) diff --git a/lib/disconnect.c b/lib/disconnect.c index bc43b4c..26d7298 100644 --- a/lib/disconnect.c +++ b/lib/disconnect.c @@ -62,25 +62,14 @@ nbd_unlocked_aio_disconnect (struct nbd_connection *conn) { struct command_in_flight *cmd; - cmd = malloc (sizeof *cmd); - if (cmd == NULL) { - set_error (errno, "malloc"); + cmd = command_common (conn, 0, NBD_CMD_DISC, 0, 0, NULL); + if (cmd == NULL) return -1; - } - cmd->flags = 0; - cmd->type = NBD_CMD_DISC; - cmd->handle = conn->h->unique++; - cmd->offset = 0; - cmd->count = 0; - cmd->data = NULL; - - cmd->next = conn->cmds_to_issue; - conn->cmds_to_issue = cmd; /* This will leave the command on the in-flight list. Is this a * problem? Probably it isn't. If it is, we could add a flag to * the command struct to tell SEND_REQUEST not to add it to the * in-flight list. */ - return nbd_internal_run (conn->h, conn, cmd_issue); + return 0; } diff --git a/lib/internal.h b/lib/internal.h index 3f2b729..67bd52a 100644 --- a/lib/internal.h +++ b/lib/internal.h @@ -265,6 +265,12 @@ extern void nbd_internal_set_last_error (int errnum, char *error); extern int nbd_internal_errno_of_nbd_error (uint32_t error); extern const char *nbd_internal_name_of_nbd_cmd (uint16_t type); +/* rw.c */ +extern struct command_in_flight *command_common (struct nbd_connection *conn, + uint16_t flags, uint16_t type, + uint64_t offset, + uint64_t count, void *data); + /* socket.c */ struct socket *nbd_internal_socket_create (int fd); diff --git a/lib/rw.c b/lib/rw.c index 9dfce97..a7587e9 100644 --- a/lib/rw.c +++ b/lib/rw.c @@ -241,7 +241,7 @@ nbd_unlocked_block_status (struct nbd_handle *h, return r == -1 ? -1 : 0; } -static struct command_in_flight * +struct command_in_flight * command_common (struct nbd_connection *conn, uint16_t flags, uint16_t type, uint64_t offset, uint64_t count, void *data) @@ -298,6 +298,8 @@ command_common (struct nbd_connection *conn, cmd->next = conn->cmds_to_issue; conn->cmds_to_issue = cmd; + if (nbd_internal_run (conn->h, conn, cmd_issue) == -1) + return NULL; return cmd; } @@ -311,8 +313,6 @@ nbd_unlocked_aio_pread (struct nbd_connection *conn, void *buf, cmd = command_common (conn, 0, NBD_CMD_READ, offset, count, buf); if (!cmd) return -1; - if (nbd_internal_run (conn->h, conn, cmd_issue) == -1) - return -1; return cmd->handle; } @@ -344,8 +344,6 @@ nbd_unlocked_aio_pwrite (struct nbd_connection *conn, const void *buf, (void *) buf); if (!cmd) return -1; - if (nbd_internal_run (conn->h, conn, cmd_issue) == -1) - return -1; return cmd->handle; } @@ -363,8 +361,6 @@ nbd_unlocked_aio_flush (struct nbd_connection *conn) cmd = command_common (conn, 0, NBD_CMD_FLUSH, 0, 0, NULL); if (!cmd) return -1; - if (nbd_internal_run (conn->h, conn, cmd_issue) == -1) - return -1; return cmd->handle; } @@ -400,8 +396,6 @@ nbd_unlocked_aio_trim (struct nbd_connection *conn, cmd = command_common (conn, flags, NBD_CMD_TRIM, offset, count, NULL); if (!cmd) return -1; - if (nbd_internal_run (conn->h, conn, cmd_issue) == -1) - return -1; return cmd->handle; } @@ -424,8 +418,6 @@ nbd_unlocked_aio_cache (struct nbd_connection *conn, cmd = command_common (conn, 0, NBD_CMD_CACHE, offset, count, NULL); if (!cmd) return -1; - if (nbd_internal_run (conn->h, conn, cmd_issue) == -1) - return -1; return cmd->handle; } @@ -461,8 +453,6 @@ nbd_unlocked_aio_zero (struct nbd_connection *conn, cmd = command_common (conn, flags, NBD_CMD_WRITE_ZEROES, offset, count, NULL); if (!cmd) return -1; - if (nbd_internal_run (conn->h, conn, cmd_issue) == -1) - return -1; return cmd->handle; } @@ -505,8 +495,5 @@ nbd_unlocked_aio_block_status (struct nbd_connection *conn, cmd->extent_fn = extent; cmd->extent_id = id; - if (nbd_internal_run (conn->h, conn, cmd_issue) == -1) - return -1; - return cmd->handle; } -- 2.20.1
Eric Blake
2019-May-22 03:15 UTC
[Libguestfs] [libnbd PATCH v2 2/5] commands: Allow for a command queue
Previously, our 'cmds_to_issue' list was at most 1 element long, because we reject all commands except from state READY, but don't get back to state READY until the issue-commands sequence has completed. However, this is not as friendly on the client - once we are in transmission phase, a client may want to queue up another command whether or not the state machine is still tied up in processing a previous command. We still want to reject commands sent before the first time we reach READY, as well as keep the cmd_issue event for kicking the state machine into action when there is no previous command being worked on, but otherwise, the state machine itself can recognize when the command queue needs draining. Signed-off-by: Eric Blake <eblake@redhat.com> --- generator/states.c | 6 ++++++ lib/internal.h | 3 +++ lib/rw.c | 24 ++++++++++++++++++++---- 3 files changed, 29 insertions(+), 4 deletions(-) diff --git a/generator/states.c b/generator/states.c index 202c305..a4eecbd 100644 --- a/generator/states.c +++ b/generator/states.c @@ -110,6 +110,12 @@ send_from_wbuf (struct nbd_connection *conn) /*----- End of prologue. -----*/ /* STATE MACHINE */ { + READY: + conn->reached_ready = true; + if (conn->cmds_to_issue) + SET_NEXT_STATE (%ISSUE_COMMAND.START); + return 0; + DEAD: if (conn->sock) { conn->sock->ops->close (conn->sock); diff --git a/lib/internal.h b/lib/internal.h index 67bd52a..2d6ad9d 100644 --- a/lib/internal.h +++ b/lib/internal.h @@ -175,6 +175,9 @@ struct nbd_connection { /* Global flags from the server. */ uint16_t gflags; + /* True if we've ever reached the READY state. */ + bool reached_ready; + /* When issuing a command, the first list contains commands waiting * to be issued. The second list contains commands which have been * issued and waiting for replies. The third list contains commands diff --git a/lib/rw.c b/lib/rw.c index a7587e9..ebd4ff9 100644 --- a/lib/rw.c +++ b/lib/rw.c @@ -246,7 +246,8 @@ command_common (struct nbd_connection *conn, uint16_t flags, uint16_t type, uint64_t offset, uint64_t count, void *data) { - struct command_in_flight *cmd; + struct command_in_flight *cmd, *prev_cmd; + bool event = !conn->reached_ready; switch (type) { /* Commands which send or receive data are limited to MAX_REQUEST_SIZE. */ @@ -296,9 +297,24 @@ command_common (struct nbd_connection *conn, if (conn->structured_replies && cmd->data && type == NBD_CMD_READ) memset (cmd->data, 0, cmd->count); - cmd->next = conn->cmds_to_issue; - conn->cmds_to_issue = cmd; - if (nbd_internal_run (conn->h, conn, cmd_issue) == -1) + /* If we have not reached READY yet, sending the event lets the + * state machine fail for requesting a command too early. If there + * are no queued commands and we are already in state READY, send an + * event to kick the state machine. In all other cases, append our + * command to the end of the queue, and the state machine will + * eventually get to it when it cycles back to READY. + */ + if (conn->cmds_to_issue != NULL) { + prev_cmd = conn->cmds_to_issue; + while (prev_cmd->next) + prev_cmd = prev_cmd->next; + prev_cmd->next = cmd; + } + else { + conn->cmds_to_issue = cmd; + event |= conn->state == STATE_READY; + } + if (event && nbd_internal_run (conn->h, conn, cmd_issue) == -1) return NULL; return cmd; -- 2.20.1
Eric Blake
2019-May-22 03:15 UTC
[Libguestfs] [libnbd PATCH v2 3/5] states: Split ISSUE_COMMAND.SEND_REQUEST
In order to handle reading an in-flight response while in the middle of sending a second command, we'll need a way to jump back into the middle of a command being sent. This is easier if the state that sets wbuf/wlen is distinct from the state that reads into wbuf and alters wlen, and also easier if we don't move the command to the in-flight queue until after the writes finish (since the previous patch was careful to queue new pending commands to the end of the list, the head of the list is always the current command needing to be sent). --- generator/generator | 14 ++++++++++ generator/states-issue-command.c | 45 ++++++++++++++++++++------------ 2 files changed, 42 insertions(+), 17 deletions(-) diff --git a/generator/generator b/generator/generator index a1bf41d..a4ad362 100755 --- a/generator/generator +++ b/generator/generator @@ -637,12 +637,26 @@ and issue_command_state_machine = [ external_events = [ NotifyWrite, "" ]; }; + State { + default_state with + name = "PREPARE_WRITE_PAYLOAD"; + comment = "Prepare the write payload to send to the remote server"; + external_events = []; + }; + State { default_state with name = "SEND_WRITE_PAYLOAD"; comment = "Sending the write payload to the remote server"; external_events = [ NotifyWrite, "" ]; }; + +State { + default_state with + name = "FINISH"; + comment = "Finish issuing a command"; + external_events = []; + }; ] (* Receiving a reply from the server. *) diff --git a/generator/states-issue-command.c b/generator/states-issue-command.c index a57f40f..e24ea34 100644 --- a/generator/states-issue-command.c +++ b/generator/states-issue-command.c @@ -24,9 +24,6 @@ assert (conn->cmds_to_issue != NULL); cmd = conn->cmds_to_issue; - conn->cmds_to_issue = cmd->next; - cmd->next = conn->cmds_in_flight; - conn->cmds_in_flight = cmd; conn->sbuf.request.magic = htobe32 (NBD_REQUEST_MAGIC); conn->sbuf.request.flags = htobe16 (cmd->flags); @@ -40,29 +37,43 @@ return 0; ISSUE_COMMAND.SEND_REQUEST: - struct command_in_flight *cmd; - switch (send_from_wbuf (conn)) { case -1: SET_NEXT_STATE (%.DEAD); return -1; - case 0: - assert (conn->cmds_in_flight != NULL); - cmd = conn->cmds_in_flight; - assert (cmd->handle == be64toh (conn->sbuf.request.handle)); - if (cmd->type == NBD_CMD_WRITE) { - conn->wbuf = cmd->data; - conn->wlen = cmd->count; - SET_NEXT_STATE (%SEND_WRITE_PAYLOAD); - } - else - SET_NEXT_STATE (%.READY); + case 0: SET_NEXT_STATE (%PREPARE_WRITE_PAYLOAD); } return 0; + ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD: + struct command_in_flight *cmd; + + assert (conn->cmds_to_issue != NULL); + cmd = conn->cmds_to_issue; + assert (cmd->handle == be64toh (conn->sbuf.request.handle)); + if (cmd->type == NBD_CMD_WRITE) { + conn->wbuf = cmd->data; + conn->wlen = cmd->count; + SET_NEXT_STATE (%SEND_WRITE_PAYLOAD); + } + else + SET_NEXT_STATE (%FINISH); + return 0; + ISSUE_COMMAND.SEND_WRITE_PAYLOAD: switch (send_from_wbuf (conn)) { case -1: SET_NEXT_STATE (%.DEAD); return -1; - case 0: SET_NEXT_STATE (%.READY); + case 0: SET_NEXT_STATE (%FINISH); } return 0; + ISSUE_COMMAND.FINISH: + struct command_in_flight *cmd; + + assert (conn->cmds_to_issue != NULL); + cmd = conn->cmds_to_issue; + conn->cmds_to_issue = cmd->next; + cmd->next = conn->cmds_in_flight; + conn->cmds_in_flight = cmd; + SET_NEXT_STATE (%.READY); + return 0; + } /* END STATE MACHINE */ -- 2.20.1
Eric Blake
2019-May-22 03:15 UTC
[Libguestfs] [libnbd PATCH v2 4/5] states: Allow in-flight read while writing next command
As already noted in our state machine, a client that batches up a large read followed by large writes, coupled with a server that only processes commands in order, can result in deadlock (the server won't read more until we unblock its ability to write out its reply to our first command; but we aren't willing to read until we are done writing out our second command). Break the deadlock by teaching the generator that while we are in the middle of writing a command, we must remain responsive to read_notify events; if the server has data for us to read, we should consume that before jumping back into the middle of our command issue (and consuming a reply can invalidate sbuf, so we have to drop an assertion in PREPARE_WRITE_PAYLOAD). --- generator/generator | 26 ++++++++++++++++++-------- generator/states-issue-command.c | 25 ++++++++++++++++++++++++- lib/internal.h | 1 + 3 files changed, 43 insertions(+), 9 deletions(-) diff --git a/generator/generator b/generator/generator index a4ad362..5c84a5d 100755 --- a/generator/generator +++ b/generator/generator @@ -620,12 +620,6 @@ and issue_command_state_machine = [ State { default_state with name = "START"; - (* XXX There's a possible deadlock here if a server cannot - * handle multiple requests pipelined on a single connection. - * We could try to issue a command and block, but reads might - * be available. It should be possible to break this with - * another state. - *) comment = "Begin issuing a command to the remote server"; external_events = []; }; @@ -634,7 +628,15 @@ and issue_command_state_machine = [ default_state with name = "SEND_REQUEST"; comment = "Sending a request to the remote server"; - external_events = [ NotifyWrite, "" ]; + external_events = [ NotifyWrite, ""; + NotifyRead, "PAUSE_SEND_REQUEST" ]; + }; + + State { + default_state with + name = "PAUSE_SEND_REQUEST"; + comment = "Interrupt send request to receive an earlier command's reply"; + external_events = []; }; State { @@ -648,7 +650,15 @@ and issue_command_state_machine = [ default_state with name = "SEND_WRITE_PAYLOAD"; comment = "Sending the write payload to the remote server"; - external_events = [ NotifyWrite, "" ]; + external_events = [ NotifyWrite, ""; + NotifyRead, "PAUSE_WRITE_PAYLOAD" ]; + }; + +State { + default_state with + name = "PAUSE_WRITE_PAYLOAD"; + comment = "Interrupt write payload to receive an earlier command's reply"; + external_events = []; }; State { diff --git a/generator/states-issue-command.c b/generator/states-issue-command.c index e24ea34..3a5980d 100644 --- a/generator/states-issue-command.c +++ b/generator/states-issue-command.c @@ -25,6 +25,15 @@ assert (conn->cmds_to_issue != NULL); cmd = conn->cmds_to_issue; + /* Were we interrupted by reading a reply to an earlier command? */ + if (conn->wlen) { + if (conn->in_write_payload) + SET_NEXT_STATE(%SEND_WRITE_PAYLOAD); + else + SET_NEXT_STATE(%SEND_REQUEST); + return 0; + } + conn->sbuf.request.magic = htobe32 (NBD_REQUEST_MAGIC); conn->sbuf.request.flags = htobe16 (cmd->flags); conn->sbuf.request.type = htobe16 (cmd->type); @@ -43,12 +52,18 @@ } return 0; + ISSUE_COMMAND.PAUSE_SEND_REQUEST: + assert (conn->wlen); + assert (conn->cmds_to_issue != NULL); + conn->in_write_payload = false; + SET_NEXT_STATE (%^REPLY.START); + return 0; + ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD: struct command_in_flight *cmd; assert (conn->cmds_to_issue != NULL); cmd = conn->cmds_to_issue; - assert (cmd->handle == be64toh (conn->sbuf.request.handle)); if (cmd->type == NBD_CMD_WRITE) { conn->wbuf = cmd->data; conn->wlen = cmd->count; @@ -65,9 +80,17 @@ } return 0; + ISSUE_COMMAND.PAUSE_WRITE_PAYLOAD: + assert (conn->wlen); + assert (conn->cmds_to_issue != NULL); + conn->in_write_payload = true; + SET_NEXT_STATE (%^REPLY.START); + return 0; + ISSUE_COMMAND.FINISH: struct command_in_flight *cmd; + assert (!conn->wlen); assert (conn->cmds_to_issue != NULL); cmd = conn->cmds_to_issue; conn->cmds_to_issue = cmd->next; diff --git a/lib/internal.h b/lib/internal.h index 2d6ad9d..91c056c 100644 --- a/lib/internal.h +++ b/lib/internal.h @@ -185,6 +185,7 @@ struct nbd_connection { * acknowledge them. */ struct command_in_flight *cmds_to_issue, *cmds_in_flight, *cmds_done; + bool in_write_payload; }; struct meta_context { -- 2.20.1
Eric Blake
2019-May-22 03:15 UTC
[Libguestfs] [libnbd PATCH v2 5/5] examples: Add example to demonstrate just-fixed deadlock scenario
Added merely as an example to be run by hand, rather than automated into 'make check', since nbdkit --filter=noparallel is quite new. --- .gitignore | 1 + examples/Makefile.am | 10 ++ examples/batched-read-write.c | 199 ++++++++++++++++++++++++++++++++++ 3 files changed, 210 insertions(+) create mode 100644 examples/batched-read-write.c diff --git a/.gitignore b/.gitignore index 66ff811..2ea7da0 100644 --- a/.gitignore +++ b/.gitignore @@ -30,6 +30,7 @@ Makefile.in /docs/libnbd.3 /docs/libnbd-api.3 /docs/libnbd-api.pod +/examples/batched-read-write /examples/threaded-reads-and-writes /examples/simple-fetch-first-sector /examples/simple-reads-and-writes diff --git a/examples/Makefile.am b/examples/Makefile.am index be3f21d..6825384 100644 --- a/examples/Makefile.am +++ b/examples/Makefile.am @@ -18,6 +18,7 @@ include $(top_srcdir)/subdir-rules.mk noinst_PROGRAMS = \ + batched-read-write \ simple-fetch-first-sector \ simple-reads-and-writes \ threaded-reads-and-writes @@ -50,3 +51,12 @@ threaded_reads_and_writes_CFLAGS = \ threaded_reads_and_writes_LDADD = \ $(top_builddir)/lib/libnbd.la \ $(PTHREAD_LIBS) + +batched_read_write_SOURCES = \ + batched-read-write.c +batched_read_write_CPPFLAGS = \ + -I$(top_srcdir)/include +batched_read_write_CFLAGS = \ + $(WARNINGS_CFLAGS) +batched_read_write_LDADD = \ + $(top_builddir)/lib/libnbd.la diff --git a/examples/batched-read-write.c b/examples/batched-read-write.c new file mode 100644 index 0000000..888f4b6 --- /dev/null +++ b/examples/batched-read-write.c @@ -0,0 +1,199 @@ +/* This example can be copied, used and modified for any purpose + * without restrictions. + * + * Example usage with nbdkit: + * + * timeout 10s nbdkit -U - --filter=noparallel memory 2M \ + * --run './batched-read-write $unixsocket' + * + * This will attempt to batch a large aio read request immediately + * followed by a large aio write request, prior to waiting for any + * command replies from the server. A naive client that does not check + * for available read data related to the first command while trying + * to write data for the second command, coupled with a server that + * only processes commands serially, would cause deadlock (both + * processes fill up their write buffers waiting for a reader); thus, + * this tests that libnbd is smart enough to always respond to replies + * for in-flight requests even when it has batched up other commands + * to write. + * + * To run it against a remote server over TCP: + * + * ./batched-read-write hostname port + */ + +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <stdint.h> +#include <string.h> +#include <unistd.h> +#include <errno.h> +#include <poll.h> +#include <time.h> +#include <assert.h> + +#include <libnbd.h> + +/* The single NBD handle. */ +static struct nbd_handle *nbd; + +/* Buffers used for the test. */ +static char *in, *out; +static int64_t packetsize; + +static int +try_deadlock (void *arg) +{ + struct pollfd fds[1]; + struct nbd_connection *conn; + size_t i, j; + int64_t handles[2]; + size_t in_flight; /* counts number of requests in flight */ + int dir, r; + + /* The single thread "owns" the connection. */ + conn = nbd_get_connection (nbd, 0); + + /* Issue commands. */ + in_flight = 0; + fprintf (stderr, " * before aio_pread\n"); + handles[0] = nbd_aio_pread (conn, in, packetsize, 0); + if (handles[0] == -1) { + fprintf (stderr, "%s\n", nbd_get_error ()); + goto error; + } + fprintf (stderr, " * after aio_pread\n"); + in_flight++; + fprintf (stderr, " * before aio_pwrite\n"); + handles[1] = nbd_aio_pwrite (conn, out, packetsize, packetsize, 0); + if (handles[1] == -1) { + fprintf (stderr, "%s\n", nbd_get_error ()); + goto error; + } + fprintf (stderr, " * after aio_pwrite\n"); + in_flight++; + + /* Now wait for commands to retire, or for deadlock to occur */ + while (in_flight > 0) { + if (nbd_aio_is_dead (conn) || nbd_aio_is_closed (conn)) { + fprintf (stderr, "connection is dead or closed\n"); + goto error; + } + + fds[0].fd = nbd_aio_get_fd (conn); + fds[0].events = 0; + fds[0].revents = 0; + dir = nbd_aio_get_direction (conn); + if ((dir & LIBNBD_AIO_DIRECTION_READ) != 0) + fds[0].events |= POLLIN; + if ((dir & LIBNBD_AIO_DIRECTION_WRITE) != 0) + fds[0].events |= POLLOUT; + + if (poll (fds, 1, -1) == -1) { + perror ("poll"); + goto error; + } + + if ((dir & LIBNBD_AIO_DIRECTION_READ) != 0 && + (fds[0].revents & POLLIN) != 0) + nbd_aio_notify_read (conn); + else if ((dir & LIBNBD_AIO_DIRECTION_WRITE) != 0 && + (fds[0].revents & POLLOUT) != 0) + nbd_aio_notify_write (conn); + + /* If a command is ready to retire, retire it. */ + for (j = 0; j < in_flight; ++j) { + r = nbd_aio_command_completed (conn, handles[j]); + if (r == -1) { + fprintf (stderr, "%s\n", nbd_get_error ()); + goto error; + } + if (r) { + memmove (&handles[j], &handles[j+1], + sizeof (handles[0]) * (in_flight - j - 1)); + j--; + in_flight--; + } + } + } + + printf ("finished OK\n"); + + return 0; + + error: + fprintf (stderr, "failed\n"); + return -1; +} + +int +main (int argc, char *argv[]) +{ + int err; + int64_t exportsize; + + if (argc < 2 || argc > 3) { + fprintf (stderr, "%s socket | hostname port\n", argv[0]); + exit (EXIT_FAILURE); + } + + nbd = nbd_create (); + if (nbd == NULL) { + fprintf (stderr, "%s\n", nbd_get_error ()); + exit (EXIT_FAILURE); + } + + /* Connect synchronously as this is simpler. */ + if (argc == 2) { + if (nbd_connect_unix (nbd, argv[1]) == -1) { + fprintf (stderr, "%s\n", nbd_get_error ()); + exit (EXIT_FAILURE); + } + } + else { + if (nbd_connect_tcp (nbd, argv[1], argv[2]) == -1) { + fprintf (stderr, "%s\n", nbd_get_error ()); + exit (EXIT_FAILURE); + } + } + + if (nbd_read_only (nbd) == 1) { + fprintf (stderr, "%s: error: this NBD export is read-only\n", argv[0]); + exit (EXIT_FAILURE); + } + + exportsize = nbd_get_size (nbd); + if (exportsize == -1) { + fprintf (stderr, "%s\n", nbd_get_error ()); + exit (EXIT_FAILURE); + } + packetsize = exportsize / 2; + if (packetsize > 2 * 1024 * 1024) + packetsize = 2 * 1024 * 1024; + + in = malloc (packetsize); + out = malloc (packetsize); + if (!in || !out) { + fprintf (stderr, "insufficient memory\n"); + exit (EXIT_FAILURE); + } + + /* Attempt to be non-destructive, by writing what file already contains */ + if (nbd_pread (nbd, out, packetsize, packetsize) == -1) { + fprintf (stderr, "sync read failed: %s\n", nbd_get_error ()); + exit (EXIT_FAILURE); + } + + if (try_deadlock (NULL) == -1) + exit (EXIT_FAILURE); + + if (nbd_shutdown (nbd) == -1) { + fprintf (stderr, "%s\n", nbd_get_error ()); + exit (EXIT_FAILURE); + } + + nbd_close (nbd); + + return EXIT_SUCCESS; +} -- 2.20.1
Richard W.M. Jones
2019-May-22 07:03 UTC
Re: [Libguestfs] [libnbd PATCH v2 0/5] Avoid deadlock with in-flight commands
There are a couple of unused variables in the new program. The attached patch fixes it. ./configure --enable-gcc-warnings should catch these automatically. I need to document this ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Richard W.M. Jones
2019-May-22 09:13 UTC
Re: [Libguestfs] [libnbd PATCH v2 1/5] lib: Refactor state event into command_common
On Tue, May 21, 2019 at 10:15:48PM -0500, Eric Blake wrote:> The next patch wants to make queuing commands smarter; rather than > duplicating the logic at each command's call site, it's better to > centralize things in command_common, and to also make this helper > routine visible for use in NBD_CMD_DISC.There's a few problems with this patch. To avoid leaking symbols -- even though we have a linker script, not all future platforms are expected to support it -- global symbols which are not exported should be called nbd_internal_*. So command_common should be nbd_internal_command_common. The bigger problem is:> @@ -505,8 +495,5 @@ nbd_unlocked_aio_block_status (struct nbd_connection *conn, > cmd->extent_fn = extent; > cmd->extent_id = id; > > - if (nbd_internal_run (conn->h, conn, cmd_issue) == -1) > - return -1; > - > return cmd->handle;This changes the semantics of the function because the cmd->extent_* fields are set too late. (Yes, command_common is rather ad hoc :-) Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Richard W.M. Jones
2019-May-22 09:23 UTC
Re: [Libguestfs] [libnbd PATCH v2 2/5] commands: Allow for a command queue
On Tue, May 21, 2019 at 10:15:49PM -0500, Eric Blake wrote:> Previously, our 'cmds_to_issue' list was at most 1 element long, > because we reject all commands except from state READY, but don't get > back to state READY until the issue-commands sequence has completed. > However, this is not as friendly on the client - once we are in > transmission phase, a client may want to queue up another command > whether or not the state machine is still tied up in processing a > previous command. We still want to reject commands sent before the > first time we reach READY, as well as keep the cmd_issue event for > kicking the state machine into action when there is no previous > command being worked on, but otherwise, the state machine itself can > recognize when the command queue needs draining.I have a queued series which adds some new calls that may make this patch simpler: int nbd_[unlocked_]aio_is_created (conn) Returns true if the connection is in the START state int nbd_[unlocked_]aio_is_connecting (conn) Returns true if the connection is connecting or handshaking int nbd_[unlocked_]aio_is_processing (conn) Returns true if the connection is issuing or replying to commands (but not in the READY state) I will post these later this morning. I believe they will let you get rid of the reached_ready flag, as well as making this patch more correct because you probably want to avoid issuing commands on a connection which is closed or dead. You'd use a test like: if (nbd_unlocked_aio_is_ready (conn)) { // call nbd_internal_run } else if (nbd_unlocked_aio_is_processing (conn)) { // can't call nbd_internal_run because we're in the // wrong state, but we can enqueue the command and it // will be processed next time we get to READY } else { // this is an error }> @@ -296,9 +297,24 @@ command_common (struct nbd_connection *conn, > if (conn->structured_replies && cmd->data && type == NBD_CMD_READ) > memset (cmd->data, 0, cmd->count); > > - cmd->next = conn->cmds_to_issue; > - conn->cmds_to_issue = cmd; > - if (nbd_internal_run (conn->h, conn, cmd_issue) == -1) > + /* If we have not reached READY yet, sending the event lets the > + * state machine fail for requesting a command too early. If there > + * are no queued commands and we are already in state READY, send an > + * event to kick the state machine. In all other cases, append our > + * command to the end of the queue, and the state machine will > + * eventually get to it when it cycles back to READY. > + */I think this test is wrong. We don't know what the failure was, it might have failed for a variety of reasons. It's better to simple test if nbd_unlocked_aio_is_ready, then call nbd_internal_run (if true), otherwise queue the command. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Richard W.M. Jones
2019-May-22 09:24 UTC
Re: [Libguestfs] [libnbd PATCH v2 4/5] states: Allow in-flight read while writing next command
Did we think it was better that the READY state should check if there are commands to issue and automatically transition to PREPARE_FOR_WRITE if so? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Eric Blake
2019-May-22 11:21 UTC
Re: [Libguestfs] [libnbd PATCH v2 4/5] states: Allow in-flight read while writing next command
On 5/21/19 10:15 PM, Eric Blake wrote:> As already noted in our state machine, a client that batches up a > large read followed by large writes, coupled with a server that only > processes commands in order, can result in deadlock (the server won't > read more until we unblock its ability to write out its reply to our > first command; but we aren't willing to read until we are done writing > out our second command). Break the deadlock by teaching the generator > that while we are in the middle of writing a command, we must remain > responsive to read_notify events; if the server has data for us to > read, we should consume that before jumping back into the middle of > our command issue (and consuming a reply can invalidate sbuf, so we > have to drop an assertion in PREPARE_WRITE_PAYLOAD).That's a hint of a latent bug...> --- > generator/generator | 26 ++++++++++++++++++-------- > generator/states-issue-command.c | 25 ++++++++++++++++++++++++- > lib/internal.h | 1 + > 3 files changed, 43 insertions(+), 9 deletions(-)> @@ -43,12 +52,18 @@ > } > return 0; > > + ISSUE_COMMAND.PAUSE_SEND_REQUEST: > + assert (conn->wlen); > + assert (conn->cmds_to_issue != NULL); > + conn->in_write_payload = false; > + SET_NEXT_STATE (%^REPLY.START); > + return 0;...Since processing REPLY is allowed to scribble over conn->sbuf,> + > ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD: > struct command_in_flight *cmd; > > assert (conn->cmds_to_issue != NULL); > cmd = conn->cmds_to_issue; > - assert (cmd->handle == be64toh (conn->sbuf.request.handle));it is also possible that resuming SEND_REQUEST will be sending scribbled data. So I need to split conn->sbuf into two parts - a section used for command issue, and the rest used for REPLY, at which point the assert is still viable after all. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Possibly Parallel Threads
- [libnbd PATCH 3/3] states: Allow in-flight read while writing next command
- [libnbd PATCH 2/3] states: Split ISSUE_COMMAND.SEND_REQUEST
- [libnbd PATCH 0/3] Avoid deadlock with in-flight commands
- [libnbd PATCH v2 0/5] Avoid deadlock with in-flight commands
- [libnbd PATCH 4/4] states: Add NBD_OPT_EXPORT_NAME handling