While working on a new test of what happens when the server goes away while commands are in flight, I managed to hit a race where I hit death from SIGPIPE instead of a clean transition to the DEAD state. I also found myself wanting to use nbd_poll from the test, but with a way to distinguish between the state machine progressing vs. hanging. Eric Blake (2): socket: Avoid SIGPIPE where possible poll: Improve our interface generator/generator | 18 +++++++++++++----- lib/poll.c | 13 ++++++++++--- lib/socket.c | 7 +++++++ 3 files changed, 30 insertions(+), 8 deletions(-) -- 2.20.1
Eric Blake
2019-Jun-27 04:29 UTC
[Libguestfs] [libnbd PATCH 1/2] socket: Avoid SIGPIPE where possible
If the server hangs up while we are writing, we'd rather have send() fail with EPIPE than deal with a SIGPIPE signal - especially when we are being run as a library inside a larger program that may have other intentions on what signals should do for its other activities. At least Linux makes this easy with the MSG_NOSIGNAL flag (it requires that we use send() instead of write(), which in turn requires socketpair() rather than pipe() for nbd_connect_command - but we already do that). If we ever branch out into accepting pre-opened fds, we'll have to deal with SIGPIPE suppression for non-sockets at that time. --- lib/socket.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/lib/socket.c b/lib/socket.c index 8555855..a0f3d61 100644 --- a/lib/socket.c +++ b/lib/socket.c @@ -46,6 +46,13 @@ socket_send (struct nbd_handle *h, { ssize_t r; + /* We don't want to die from SIGPIPE, but also don't want to force a + * changed signal handler on the rest of the application. + */ +#ifdef MSG_NOSIGNAL + flags |= MSG_NOSIGNAL; +#endif + r = send (sock->u.fd, buf, len, flags); if (r == -1 && errno != EAGAIN && errno != EWOULDBLOCK) set_error (errno, "send"); -- 2.20.1
Eric Blake
2019-Jun-27 04:29 UTC
[Libguestfs] [libnbd PATCH 2/2] poll: Improve our interface
Make nbd_poll slightly more like poll(), allowing a user to detect timeouts by returning 0 on timeout and 1 when we made progress. It turns out that none of our internal uses ever expect a timeout (we only call nbd_internal_poll with timeout==-1 because we expect a reply from the server), but the public function might as well be nicer. Also handle POLLERR (server closed its read end, so our writes will fail) and POLLNVAL (the fd itself is closed) as outright errors, and POLLHUP the same as POLLIN (the server closed its write end, so read() will eventually see EOF even if we have to drain a buffer first). Perhaps we also want to add an nbd_aio_notify_err() that can inform the machine of any externally-detected error on the fd to the server, where the poll loop invokes that on POLLERR, and where we tweak the generator to permit that external event in any other public state. But that's a bigger patch. --- generator/generator | 18 +++++++++++++----- lib/poll.c | 13 ++++++++++--- 2 files changed, 23 insertions(+), 8 deletions(-) diff --git a/generator/generator b/generator/generator index 684fb44..85fa66f 100755 --- a/generator/generator +++ b/generator/generator @@ -1571,13 +1571,21 @@ validate that the server obeyed the flag."; "poll", { default_call with - args = [ Int "timeout" ]; ret = RErr; + args = [ Int "timeout" ]; ret = RInt; shortdesc = "poll the handle once"; longdesc = "\ This is a simple implementation of L<poll(2)> which is used -internally by synchronous API calls. It is mainly useful as -an example of how you might integrate libnbd with your own -main loop, rather than being intended as something you would use."; +internally by synchronous API calls. It returns 0 if the +C<timeout> (in milliseconds) occurs, or 1 if the poll completed +and the state machine progressed. Set C<timeout> to C<-1> to +block indefinitely (but be careful that eventual action is +actually expected - for example, if the connection is +established but there are no commands in flight, using an +infinite timeout will permanently block). + +This function is mainly useful as an example of how you might +integrate libnbd with your own main loop, rather than being +intended as something you would use."; }; "aio_connect", { @@ -1838,7 +1846,7 @@ come from some other means such as C<nbd_aio_connect>. We are expected next to read from the server. If using L<poll(2)> you would set C<events = POLLIN>. If C<revents> returns C<POLLIN> -you would then call C<nbd_aio_notify_read>. +or C<POLLHUP> you would then call C<nbd_aio_notify_read>. Note that once libnbd reaches C<nbd_aio_is_ready>, this direction is returned even before a command is issued via C<nbd_aio_pwrite> and diff --git a/lib/poll.c b/lib/poll.c index 982b172..d356afe 100644 --- a/lib/poll.c +++ b/lib/poll.c @@ -57,21 +57,28 @@ nbd_unlocked_poll (struct nbd_handle *h, int timeout) set_error (errno, "poll"); return -1; } + if (r == 0) + return 0; /* POLLIN and POLLOUT might both be set. However we shouldn't call * both nbd_aio_notify_read and nbd_aio_notify_write at this time * since the first might change the handle state, making the second * notification invalid. Nothing bad happens by ignoring one of the * notifications since if it's still valid it will be picked up by a - * subsequent poll. + * subsequent poll. Prefer notifying on read, since the reply is + * for a command older than what we are trying to write. */ r = 0; - if ((fds[0].revents & POLLIN) != 0) + if ((fds[0].revents & (POLLIN | POLLHUP)) != 0) r = nbd_unlocked_aio_notify_read (h); else if ((fds[0].revents & POLLOUT) != 0) r = nbd_unlocked_aio_notify_write (h); + else if ((fds[0].revents & (POLLERR | POLLNVAL)) != 0) { + set_error (ENOTCONN, "server closed socket unexpectedly"); + return -1; + } if (r == -1) return -1; - return 0; + return 1; } -- 2.20.1
Eric Blake
2019-Jun-27 05:03 UTC
Re: [Libguestfs] [libnbd PATCH 2/2] poll: Improve our interface
On 6/26/19 11:29 PM, Eric Blake wrote:> Make nbd_poll slightly more like poll(), allowing a user to detect > timeouts by returning 0 on timeout and 1 when we made progress. It > turns out that none of our internal uses ever expect a timeout (we > only call nbd_internal_poll with timeout==-1 because we expect a reply > from the server), but the public function might as well be nicer. > > Also handle POLLERR (server closed its read end, so our writes will > fail) and POLLNVAL (the fd itself is closed) as outright errors, and > POLLHUP the same as POLLIN (the server closed its write end, so read() > will eventually see EOF even if we have to drain a buffer first). > > Perhaps we also want to add an nbd_aio_notify_err() that can inform > the machine of any externally-detected error on the fd to the server, > where the poll loop invokes that on POLLERR, and where we tweak the > generator to permit that external event in any other public state. > But that's a bigger patch. > --- > generator/generator | 18 +++++++++++++----- > lib/poll.c | 13 ++++++++++--- > 2 files changed, 23 insertions(+), 8 deletions(-) >> +++ b/lib/poll.c > @@ -57,21 +57,28 @@ nbd_unlocked_poll (struct nbd_handle *h, int timeout) > set_error (errno, "poll"); > return -1; > }I also need to squash in this, to fix our use of an uninitialized variable when nbd_aio_get_direction returns 0 (such as when we are already DEAD): diff --git i/lib/poll.c w/lib/poll.c index d356afe..fc6aae5 100644 --- i/lib/poll.c +++ w/lib/poll.c @@ -45,6 +45,8 @@ nbd_unlocked_poll (struct nbd_handle *h, int timeout) case LIBNBD_AIO_DIRECTION_BOTH: fds[0].events = POLLIN|POLLOUT; break; + default: + fds[0].events = 0; } fds[0].revents = 0; -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org