Eric Blake
2020-Mar-30 19:59 UTC
[Libguestfs] [libnbd PATCH 0/2] fix hangs against nbdkit 1.2
nbdkit 1.2 as a server waits for read() to see EOF, even after the client has sent NBD_CMD_DISC. That was fixed in mbdkit 1.4; and most modern NBD servers are smarter than this (they close() the write end of their traffic soon after NBD_CMD_DISC). But it's easy enough to revert nbdkit commit c70616f8 to get back to a server with the same behavior as the older nbdkit, at which point both 'cd /path/to/libnbd && make check PATH=/path/to/nbdkit' and 'cd /path/to/nbdkit && /path/to/libnbd/run make check' will hang without this series. In short, this series is restoring the shutdown(SHUT_WR) call that got lost from plugins/nbd/nbd.c when nbdkit switched to libnbd in commit ab7760fc. Eric Blake (2): sockets: Add .shut_writes callback states: Add state for shutdown/gnutls_bye after NBD_CMD_DISC lib/internal.h | 4 +++- generator/state_machine.ml | 19 +++++++++++++++++-- generator/states-issue-command.c | 20 ++++++++++++++++++-- lib/crypto.c | 21 +++++++++++++++++---- lib/socket.c | 12 +++++++++++- 5 files changed, 66 insertions(+), 10 deletions(-) -- 2.26.0.rc2
Eric Blake
2020-Mar-30 19:59 UTC
[Libguestfs] [libnbd PATCH 1/2] sockets: Add .shut_writes callback
The next commit wants to teach libnbd to ensure that a server will read EOF from a bi-directional connection as soon as libnbd is done writing, rather than making libnbd dependent on the server being the connection endpoint to initiate closing the socket or waiting on the user to eventually call nbd_close(). For that to work, we need to expose the equivalent of shutdown(SHUT_WR) for our various connection types. Note that shutdown() is only applicable to sockets, since it only really aids in bi-directional communication on a single fd. For now, libnbd only uses sockets (even for nbd_connect_command, we use socketpair() to pass a single fd to both stdin and stdout, rather than two pipe() calls to create a pair of one-way communication). But if we were to ever support NBD connections over a pair of pipe()s, the obvious implementation would be to close() the write pipe during .shut_writes, while leaving the read pipe open until .close. Also note that shutdown() is effectively non-blocking, but that gnutls_bye() can fail with GNUTLS_E_AGAIN if it is waiting for space to send notification to the server. But at the same time, while socket read and write normally distinguish between success, EAGAIN, and other failures, we don't need the tri-state return from our new hook (we will only ever attempt to shut down writes after sending NBD_CMD_DISC, and moving to CLOSED instead of DEAD at that point doesn't change the results of any earlier requests). Finally, note that the semantics of this new callback are specifically only for SHUT_WR, and not generic enough for SHUT_RDWR. This is because gnutls_bye() handles SHUT_WR with only writes to the socket, but when handling SHUT_RDWR it also waits for a subsequent read. Getting the ping-pong I/O correct would require more states, for no real gain when the user can just blindly call nbd_close() and close the entire socket rather than having to massage their poll() loop through more states. --- lib/internal.h | 3 ++- lib/crypto.c | 21 +++++++++++++++++---- lib/socket.c | 12 +++++++++++- 3 files changed, 30 insertions(+), 6 deletions(-) diff --git a/lib/internal.h b/lib/internal.h index 6eb50d8..0d02687 100644 --- a/lib/internal.h +++ b/lib/internal.h @@ -1,5 +1,5 @@ /* nbd client library in userspace: internal definitions - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -267,6 +267,7 @@ struct socket_ops { struct socket *sock, const void *buf, size_t len, int flags); bool (*pending) (struct socket *sock); int (*get_fd) (struct socket *sock); + bool (*shut_writes) (struct nbd_handle *h, struct socket *sock); int (*close) (struct socket *sock); }; diff --git a/lib/crypto.c b/lib/crypto.c index 3e33af7..516651c 100644 --- a/lib/crypto.c +++ b/lib/crypto.c @@ -1,5 +1,5 @@ /* NBD client library in userspace - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -230,9 +230,21 @@ tls_get_fd (struct socket *sock) return sock->u.tls.oldsock->ops->get_fd (sock->u.tls.oldsock); } -/* XXX Calling gnutls_bye is possible, but it may send and receive - * data over the wire which would require modifications to the state - * machine. So instead we abruptly drop the TLS session. +static bool +tls_shut_writes (struct nbd_handle *h, struct socket *sock) +{ + int r = gnutls_bye (sock->u.tls.session, GNUTLS_SHUT_WR); + + if (r == GNUTLS_E_AGAIN || r == GNUTLS_E_INTERRUPTED) + return false; + if (r != 0) + debug (h, "ignoring gnutls_bye failure: %s", gnutls_strerror (r)); + return sock->u.tls.oldsock->ops->shut_writes (h, sock->u.tls.oldsock); +} + +/* XXX Calling gnutls_bye(GNUTLS_SHUT_RDWR) is possible, but it may send + * and receive data over the wire which would require further modifications + * to the state machine. So instead we abruptly drop the TLS session. */ static int tls_close (struct socket *sock) @@ -254,6 +266,7 @@ static struct socket_ops crypto_ops = { .send = tls_send, .pending = tls_pending, .get_fd = tls_get_fd, + .shut_writes = tls_shut_writes, .close = tls_close, }; diff --git a/lib/socket.c b/lib/socket.c index ac60286..6698941 100644 --- a/lib/socket.c +++ b/lib/socket.c @@ -1,5 +1,5 @@ /* NBD client library in userspace - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -63,6 +63,15 @@ socket_get_fd (struct socket *sock) return sock->u.fd; } +static bool +socket_shut_writes (struct nbd_handle *h, struct socket *sock) +{ + if (shutdown (sock->u.fd, SHUT_WR) == -1) + debug (h, "ignoring shutdown failure: %s", strerror (errno)); + /* Regardless of any errors, we don't need to retry. */ + return true; +} + static int socket_close (struct socket *sock) { @@ -76,6 +85,7 @@ static struct socket_ops socket_ops = { .recv = socket_recv, .send = socket_send, .get_fd = socket_get_fd, + .shut_writes = socket_shut_writes, .close = socket_close, }; -- 2.26.0.rc2
Eric Blake
2020-Mar-30 19:59 UTC
[Libguestfs] [libnbd PATCH 2/2] states: Add state for shutdown/gnutls_bye after NBD_CMD_DISC
nbdkit prior to commit c70616f87c was prone to deadlock on a bi-directional connection: it would end up stuck in a read() after getting NBD_CMD_DISC to make sure that the client wasn't erroneously sending any more requests, at the same time that the client could get stuck in a read() waiting for EOF from the server responding to NBD_CMD_DISC. While nbdkit as a server has been fixed, it is possible that there are other servers out there with the same issue. What's more, nbdkit 430f814102 demonstrated that clients can be robust to such servers, by utilizing shutdown() to give the server an early EOF even while the bi-directional socket remains open. So, libnbd needs to use the same solution as what nbdkit had done as a client. We need to wire up a couple of new states in order to call the just-added .shut_writes hook as needed (in practice, a client won't be calling nbd_aio_disconnect while there are still pending earlier requests, and thus we are unlikely to hit the case where writes performed during gnutls_bye() actually block; but the logic to support this is easy to copy from what we already have in place for NBD_CMD_WRITE). --- lib/internal.h | 1 + generator/state_machine.ml | 19 +++++++++++++++++-- generator/states-issue-command.c | 20 ++++++++++++++++++-- 3 files changed, 36 insertions(+), 4 deletions(-) diff --git a/lib/internal.h b/lib/internal.h index 0d02687..c99c3e7 100644 --- a/lib/internal.h +++ b/lib/internal.h @@ -192,6 +192,7 @@ struct nbd_handle { */ struct nbd_request request; bool in_write_payload; + bool in_write_shutdown; /* When connecting, this stores the socket address. */ struct sockaddr_storage connaddr; diff --git a/generator/state_machine.ml b/generator/state_machine.ml index 6c7f8bd..bd65ffb 100644 --- a/generator/state_machine.ml +++ b/generator/state_machine.ml @@ -602,14 +602,29 @@ and issue_command_state_machine = [ NotifyRead, "PAUSE_WRITE_PAYLOAD" ]; }; -State { + State { default_state with name = "PAUSE_WRITE_PAYLOAD"; comment = "Interrupt write payload to receive an earlier command's reply"; external_events = []; }; -State { + State { + default_state with + name = "SEND_WRITE_SHUTDOWN"; + comment = "Sending write shutdown notification to the remote server"; + external_events = [ NotifyWrite, ""; + NotifyRead, "PAUSE_WRITE_SHUTDOWN" ]; + }; + + State { + default_state with + name = "PAUSE_WRITE_SHUTDOWN"; + comment = "Interrupt write shutdown to receive an earlier command's reply"; + external_events = []; + }; + + State { default_state with name = "FINISH"; comment = "Finish issuing a command"; diff --git a/generator/states-issue-command.c b/generator/states-issue-command.c index bd9946d..a810114 100644 --- a/generator/states-issue-command.c +++ b/generator/states-issue-command.c @@ -1,5 +1,5 @@ /* nbd client library in userspace: state machine - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -31,7 +31,9 @@ STATE_MACHINE { * writes yet; we want to advance back to the correct state but * without trying a send_from_wbuf that will likely return 1. */ - if (h->wlen) { + if (h->in_write_shutdown) + SET_NEXT_STATE_AND_BLOCK (%SEND_WRITE_SHUTDOWN); + else if (h->wlen) { if (h->in_write_payload) SET_NEXT_STATE_AND_BLOCK (%SEND_WRITE_PAYLOAD); else @@ -79,6 +81,10 @@ STATE_MACHINE { h->wflags = MSG_MORE; SET_NEXT_STATE (%SEND_WRITE_PAYLOAD); } + else if (cmd->type == NBD_CMD_DISC) { + h->in_write_shutdown = true; + SET_NEXT_STATE (%SEND_WRITE_SHUTDOWN); + } else SET_NEXT_STATE (%FINISH); return 0; @@ -97,6 +103,16 @@ STATE_MACHINE { SET_NEXT_STATE (%^REPLY.START); return 0; + ISSUE_COMMAND.SEND_WRITE_SHUTDOWN: + if (h->sock->ops->shut_writes (h, h->sock)) + SET_NEXT_STATE (%FINISH); + return 0; + + ISSUE_COMMAND.PAUSE_WRITE_SHUTDOWN: + assert (h->in_write_shutdown); + SET_NEXT_STATE (%^REPLY.START); + return 0; + ISSUE_COMMAND.FINISH: struct command *cmd; -- 2.26.0.rc2
Eric Blake
2020-Mar-30 20:16 UTC
Re: [Libguestfs] [libnbd PATCH 0/2] fix hangs against nbdkit 1.2
On 3/30/20 2:59 PM, Eric Blake wrote:> nbdkit 1.2 as a server waits for read() to see EOF, even after the > client has sent NBD_CMD_DISC. That was fixed in mbdkit 1.4; and most > modern NBD servers are smarter than this (they close() the write end > of their traffic soon after NBD_CMD_DISC). But it's easy enough to > revert nbdkit commit c70616f8 to get back to a server with the same > behavior as the older nbdkit, at which point both 'cd /path/to/libnbd > && make check PATH=/path/to/nbdkit'Making the obvious correction: cd /path/to/libnbd && make check PATH=/path/to/nbdkit:$PATH> and 'cd /path/to/nbdkit && > /path/to/libnbd/run make check' will hang without this series. > > In short, this series is restoring the shutdown(SHUT_WR) call that got > lost from plugins/nbd/nbd.c when nbdkit switched to libnbd in commit > ab7760fc.-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2020-Mar-31 12:55 UTC
Re: [Libguestfs] [libnbd PATCH 0/2] fix hangs against nbdkit 1.2
On Mon, Mar 30, 2020 at 03:16:59PM -0500, Eric Blake wrote:> On 3/30/20 2:59 PM, Eric Blake wrote: > >nbdkit 1.2 as a server waits for read() to see EOF, even after the > >client has sent NBD_CMD_DISC. That was fixed in mbdkit 1.4; and most > >modern NBD servers are smarter than this (they close() the write end > >of their traffic soon after NBD_CMD_DISC). But it's easy enough to > >revert nbdkit commit c70616f8 to get back to a server with the same > >behavior as the older nbdkit, at which point both 'cd /path/to/libnbd > >&& make check PATH=/path/to/nbdkit' > > Making the obvious correction: > cd /path/to/libnbd && make check PATH=/path/to/nbdkit:$PATH > > >and 'cd /path/to/nbdkit && > >/path/to/libnbd/run make check' will hang without this series. > > > >In short, this series is restoring the shutdown(SHUT_WR) call that got > >lost from plugins/nbd/nbd.c when nbdkit switched to libnbd in commit > >ab7760fc.The series looks fine on inspection, although I guess it's the kind of thing that may reveal weird deadlocks in some corner cases, but we'll find those through testing. ACK Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Possibly Parallel Threads
- [nbdkit PATCH 0/2] Improve shutdown race in nbd plugin
- [libnbd PATCH] tls: Check for pending bytes in gnutls buffers
- Re: [nbdkit PATCH 2/2] nbd: Reorder cleanup to avoid getting stuck in poll()
- [nbdkit PATCH 0/2] Fix testsuite deadlocks during close
- [libnbd PATCH 0/3] opt_list_meta_context