Eric Blake
2020-Mar-27 22:33 UTC
[Libguestfs] [nbdkit PATCH 0/2] Improve shutdown race in nbd plugin
I still need more soak time on testing, to see whether I can: a) reproduce the hang with patch 2 not applied b) feel confident that patch 2 is sufficient to fix the race, or else determine that I also need to augment the loop condition in the reader thread to additionally break out of the loop when the pipe-to-self sees EOF even when nbd_aio_is_dead() has not yet been satisfied I'm also fairly confident that I'll need to patch libnbd to call shutdown(SHUT_WR) on plaintext and gnutls_bye(GNUTLS_SHUT_WR) on TLS connections after sending NBD_CMD_DISC, as that is something we lowt when the nbd plugin switched to libnbd, even after documenting in commit 430f8141 why it is important for preventing shutdown deadlocks. Eric Blake (2): nbd: Don't reference stale errno in reader loop nbd: Reorder cleanup to avoid getting stuck in poll() plugins/nbd/nbd.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) -- 2.26.0.rc2
Eric Blake
2020-Mar-27 22:33 UTC
[Libguestfs] [nbdkit PATCH 1/2] nbd: Don't reference stale errno in reader loop
When switching to libnbd in commit ab7760fcfd, I mistakenly assumed that after a POLLIN event fires on the pipe-to-self, then read() will return either 1 or -1. But this is not true; read() can also return 0 (if the pipe hits EOF), in which case POSIX says errno has an unspecified value, and we should not be deciding whether to log an error based on a random value. I did not manage to fix the bug even when later refactoring the pipe-to-self to be non-blocking, although it appears to be latent as long as the pipe is non-blocking and the write end is not closed. Fixes: e0d32468 Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbd.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index d020beec..e5b8f338 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -332,9 +332,11 @@ nbdplug_reader (void *handle) /* Check if we were kicked because a command was started */ if (fds[1].revents & POLLIN) { - while (read (h->fds[0], &c, 1) == 1) + int r; + + while ((r = read (h->fds[0], &c, 1)) == 1) /* Drain any backlog */; - if (errno != EAGAIN) { + if (r == -1 && errno != EAGAIN) { nbdkit_error ("failed to read pipe: %m"); break; } -- 2.26.0.rc2
Eric Blake
2020-Mar-27 22:33 UTC
[Libguestfs] [nbdkit PATCH 2/2] nbd: Reorder cleanup to avoid getting stuck in poll()
We have been seeing sporadic hangs on test-nbd-tls-psk.sh, where even though the client to the 'nbdkit nbd' process has cleanly exited, things are stalled in .close where nbd is trying to pthread_join() the reader thread, while the reader thread is itself blocked on a poll() that will never make additional progress. Tracing the race is difficult: nbd_shutdown() sends NBD_CMD_DISC to the server, and the NBD protocol does not require the server to send a response but does not forbid the server from using gnutls_bye to at least gracefully end the TLS session. So where a plaintext server just closes the socket (and the resulting EOF exits our poll()), it appears that with TLS, we can sometimes hit the situation where the server is waiting for a response to gnutls_bye() (note that qemu does not use this function, but nbdkit does), while at the same time our client is waiting to see EOF on the socket, resulting in deadlock. In the past (see commit 430f8141), we added shutdown(SHUT_WR) for plaintext clients to prod the server into closing the socket faster, but libnbd does not yet have that counterpart action. But I can at least reuse the same mechanism we have for waking up the poll() loop when first sending a command to the server. That is, since we already have a pipe-to-self in addition to reading from the server, it's trivial to argue that closing the pipe-to-self will guarantee that the reader thread sees something interesting to break out of its poll() loop, regardless of whether it also sees something interesting from the server after having sent NBD_CMD_DISC, and regardless of whether I need to add in more gnutls_bye() calls to either nbdkit or libnbd. Fixes: ab7760fc Signed-off-by: Eric Blake <eblake@redhat.com> --- May be incomplete: I might also need to break out of the reader loop when read() returns 0. plugins/nbd/nbd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index e5b8f338..23a7da06 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -537,10 +537,10 @@ nbdplug_close_handle (struct handle *h) { if (nbd_shutdown (h->nbd, 0) == -1) nbdkit_debug ("failed to clean up handle: %s", nbd_get_error ()); + close (h->fds[1]); if ((errno = pthread_join (h->reader, NULL))) nbdkit_debug ("failed to join reader thread: %m"); close (h->fds[0]); - close (h->fds[1]); nbd_close (h->nbd); free (h); } -- 2.26.0.rc2
Richard W.M. Jones
2020-Mar-27 23:27 UTC
Re: [Libguestfs] [nbdkit PATCH 1/2] nbd: Don't reference stale errno in reader loop
On Fri, Mar 27, 2020 at 05:33:27PM -0500, Eric Blake wrote:> When switching to libnbd in commit ab7760fcfd, I mistakenly assumed > that after a POLLIN event fires on the pipe-to-self, then read() will > return either 1 or -1. But this is not true; read() can also return 0 > (if the pipe hits EOF), in which case POSIX says errno has an > unspecified value, and we should not be deciding whether to log an > error based on a random value. I did not manage to fix the bug even > when later refactoring the pipe-to-self to be non-blocking, although > it appears to be latent as long as the pipe is non-blocking and the > write end is not closed. > > Fixes: e0d32468 > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > plugins/nbd/nbd.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c > index d020beec..e5b8f338 100644 > --- a/plugins/nbd/nbd.c > +++ b/plugins/nbd/nbd.c > @@ -332,9 +332,11 @@ nbdplug_reader (void *handle) > > /* Check if we were kicked because a command was started */ > if (fds[1].revents & POLLIN) { > - while (read (h->fds[0], &c, 1) == 1) > + int r; > + > + while ((r = read (h->fds[0], &c, 1)) == 1) > /* Drain any backlog */; > - if (errno != EAGAIN) { > + if (r == -1 && errno != EAGAIN) { > nbdkit_error ("failed to read pipe: %m"); > break; > }This one seems pretty obvious, so ACK. Although personally I would have made one other change: The semicolon after the comment was invisible to me until I had studied the code above and the original code for a while. I actually wrote a quite lengthy email here, which was completely wrong once I noticed the semicolon. Can we make that more visible somehow? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Richard W.M. Jones
2020-Mar-27 23:28 UTC
Re: [Libguestfs] [nbdkit PATCH 2/2] nbd: Reorder cleanup to avoid getting stuck in poll()
On Fri, Mar 27, 2020 at 05:33:28PM -0500, Eric Blake wrote:> We have been seeing sporadic hangs on test-nbd-tls-psk.sh, where even > though the client to the 'nbdkit nbd' process has cleanly exited, > things are stalled in .close where nbd is trying to pthread_join() the > reader thread, while the reader thread is itself blocked on a poll() > that will never make additional progress. Tracing the race is > difficult: nbd_shutdown() sends NBD_CMD_DISC to the server, and the > NBD protocol does not require the server to send a response but does > not forbid the server from using gnutls_bye to at least gracefully end > the TLS session. So where a plaintext server just closes the socket > (and the resulting EOF exits our poll()), it appears that with TLS, we > can sometimes hit the situation where the server is waiting for a > response to gnutls_bye() (note that qemu does not use this function, > but nbdkit does), while at the same time our client is waiting to see > EOF on the socket, resulting in deadlock. In the past (see commit > 430f8141), we added shutdown(SHUT_WR) for plaintext clients to prod > the server into closing the socket faster, but libnbd does not yet > have that counterpart action. > > But I can at least reuse the same mechanism we have for waking up the > poll() loop when first sending a command to the server. That is, > since we already have a pipe-to-self in addition to reading from the > server, it's trivial to argue that closing the pipe-to-self will > guarantee that the reader thread sees something interesting to break > out of its poll() loop, regardless of whether it also sees something > interesting from the server after having sent NBD_CMD_DISC, and > regardless of whether I need to add in more gnutls_bye() calls to > either nbdkit or libnbd. > > Fixes: ab7760fc > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > > May be incomplete: I might also need to break out of the reader loop > when read() returns 0. > > plugins/nbd/nbd.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c > index e5b8f338..23a7da06 100644 > --- a/plugins/nbd/nbd.c > +++ b/plugins/nbd/nbd.c > @@ -537,10 +537,10 @@ nbdplug_close_handle (struct handle *h) > { > if (nbd_shutdown (h->nbd, 0) == -1) > nbdkit_debug ("failed to clean up handle: %s", nbd_get_error ()); > + close (h->fds[1]); > if ((errno = pthread_join (h->reader, NULL))) > nbdkit_debug ("failed to join reader thread: %m"); > close (h->fds[0]); > - close (h->fds[1]); > nbd_close (h->nbd); > free (h); > }Probably deserves a comment in the code to mention to future Eric (and Richard) not to move that close call. But yes this is also fine, so ACK. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Apparently Analagous Threads
- [nbdkit PATCH 0/2] Improve shutdown race in nbd plugin
- [nbdkit PATCH] Experiment: nbd: Use ppoll() instead of pipe-to-self
- [nbdkit PATCH 2/2] nbd: Use nbdkit aio_*_notify variants
- [nbdkit PATCH 2/2] nbd: Reorder cleanup to avoid getting stuck in poll()
- [PATCH nbdkit 6/9] nbd: Don't cache nbd_aio_get_fd in the handle.