Richard W.M. Jones
2019-Jul-25 17:43 UTC
[Libguestfs] [PATCH libnbd] lib: Kill subprocess in nbd_close.
This is a simple patch which stops nbd_close from waiting too long for a server subprocess to shut down. I wanted to send SIGHUP because the server will be able to catch it and do a clean shutdown if that is required. Is another signal better? Is it right to send a signal here? Rich.
Richard W.M. Jones
2019-Jul-25 17:43 UTC
[Libguestfs] [PATCH libnbd] lib: Kill subprocess in nbd_close.
$ time nbdsh -c 'h.connect_command (["nbdkit", "-s", "null", "size=512", "--filter=delay", "delay-read=10"]); b = nbd.aio_buffer(1); h.aio_pread (b, 0); del (h)' real 0m10.499s user 0m0.065s sys 0m0.023s With this patch the elapsed time is near instantaneous. --- lib/handle.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/handle.c b/lib/handle.c index 1fe4467..111cfc5 100644 --- a/lib/handle.c +++ b/lib/handle.c @@ -124,8 +124,10 @@ nbd_close (struct nbd_handle *h) freeaddrinfo (h->result); if (h->sock) h->sock->ops->close (h->sock); - if (h->pid >= 0) /* XXX kill it? */ + if (h->pid >= 0) { + kill (h->pid, SIGHUP); waitpid (h->pid, NULL, 0); + } free (h->export_name); free (h->tls_certificates); -- 2.22.0
Eric Blake
2019-Jul-25 18:28 UTC
Re: [Libguestfs] [PATCH libnbd] lib: Kill subprocess in nbd_close.
On 7/25/19 12:43 PM, Richard W.M. Jones wrote:> $ time nbdsh -c 'h.connect_command (["nbdkit", "-s", "null", "size=512", "--filter=delay", "delay-read=10"]); b = nbd.aio_buffer(1); h.aio_pread (b, 0); del (h)' > real 0m10.499s > user 0m0.065s > sys 0m0.023s > > With this patch the elapsed time is near instantaneous. > --- > lib/handle.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/lib/handle.c b/lib/handle.c > index 1fe4467..111cfc5 100644 > --- a/lib/handle.c > +++ b/lib/handle.c > @@ -124,8 +124,10 @@ nbd_close (struct nbd_handle *h) > freeaddrinfo (h->result); > if (h->sock) > h->sock->ops->close (h->sock);Note that this patch is trying to speed up a slow waitpid() issued _after_ we call h->sock->ops->close(). You'd think that after we close() our end, that the remote end doing a read() would see an immediate EOF, and that would be enough for the server to shut down quickly instead of waiting for the delay to expire and finally notice during its write() that we are no longer reading. However, POSIX says that when sockets are involved, a close() on our end does not trigger an immediate EOF on the other side's read end, but rather may start an asynchronous timer to try to finish the TCP connection gracefully to flush any pending traffic, where both directions have to agree that the connection is dead before either direction sees EOF/EPIPE. And that's where the shutdown(2) API comes in handy: it forces the kernel to realize that one or both directions of traffic are no longer needed, making this asynchronous handling in close() much faster. At least nbdkit is known to react favorably to a shutdown() request - see nbdkit commit 430f8141. We may want to add h->sock->ops->shutdown() (especially if we want to handle SHUT_WR through adding more states for more graceful shutdown paths, including when TLS is involved), but even in the short-term, something as simple as adding shutdown(h->fd, SHUT_RDWR) immediately before the s->sock->ops->close() would be helpful. That said, I think we want a two-pronged approach - adding a use of shutdown(), but also playing with signal:> - if (h->pid >= 0) /* XXX kill it? */We're only doing this for connect_command. We must NOT kill TCP or Unix servers, as those are pre-existing and presumably must stick around for further clients; but when we spawned the server, it is indeed likely that the server should not continue to exist after our handle goes away - it is unlikely that a client would ever expect the server they just created to continue to exist after the handle to serve other clients. So this makes sense.> + if (h->pid >= 0) { > + kill (h->pid, SIGHUP);I don't know if SIGTERM is any better, but think SIGHUP is okay for now. Also, I don't know if we want to put a time limit on waiting for the server, and follow up with an even harder SIGKILL if it hasn't gracefully exited. Maybe it should be a knob where the user can call nbd_set_FOO to tune the knob for their desired cleanup behavior (whether for which signal to send, or whether to follow up with SIGKILL, or even how long to wait)? Or are we just reinventing timeout(1) from coreutils? So for now this seems okay.> waitpid (h->pid, NULL, 0); > + }-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Possibly Parallel Threads
- [PATCH libnbd] lib: Kill subprocess in nbd_close.
- [PATCH libnbd] lib: Kill subprocess in nbd_close.
- [PATCH libnbd] api: New nbd_kill_command API for sending a signal to the command subprocess.
- [PATCH libnbd 1/2] lib: Avoid killing subprocess twice.
- [libnbd PATCH 0/2] fix hangs against nbdkit 1.2