Richard W.M. Jones
2019-Jun-09 18:34 UTC
[Libguestfs] [PATCH libnbd] states: In recv_into_rbuf and send_from_wbuf loop until EAGAIN.
I thought this would produce a fairly dramatic performance gain; in fact I couldn't measure any performance difference at all. I think what's happening is that we're actually paying an extra syscall (to discover the socket would block) and then doing the poll anyway, so I don't know if this patch is worth having. It could be argued that it makes the code shorter (and therefore simpler), so I'm posting it, but I'm ambivalent.

Rich.
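To see where that extra syscall lands, here is a minimal sketch of a poll-driven caller of recv_into_rbuf. This is a hypothetical driver loop for illustration only (libnbd's real caller is the generated state machine), assuming only the return-value contract visible in the patch below: 0 = move to next state, 1 = would block, -1 = error.

#include <poll.h>

struct nbd_handle;                       /* opaque here */

/* Same contract as recv_into_rbuf in states.c (see patch below). */
extern int recv_into_rbuf (struct nbd_handle *h);

static int
drive_receive (struct nbd_handle *h, int fd)
{
  for (;;) {
    int r = recv_into_rbuf (h);
    if (r == -1)
      return -1;                         /* set_error was already called */
    if (r == 0)
      return 0;                          /* rbuf filled: next state */
    /* r == 1: with the patch applied, the recv(2) that just happened
     * returned EAGAIN -- a syscall spent only to learn that we must
     * block -- and now we pay for the poll(2) we would have done anyway.
     */
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    if (poll (&pfd, 1, -1) == -1)
      return -1;
  }
}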
Previously we performed a single call to recv(2) or send(2) (or the
GnuTLS equivalents), and even if more data/space was immediately
available to receive/send we would return to poll. Instead of this,
loop until the socket returns EAGAIN.
---
 generator/states.c | 91 ++++++++++++++++++++++------------------------
 1 file changed, 43 insertions(+), 48 deletions(-)

diff --git a/generator/states.c b/generator/states.c
index 145e8c1..cde934a 100644
--- a/generator/states.c
+++ b/generator/states.c
@@ -46,43 +46,40 @@ recv_into_rbuf (struct nbd_handle *h)
   void *rbuf;
   size_t rlen;
 
-  if (h->rlen == 0)
-    return 0; /* move to next state */
+  while (h->rlen > 0) {
+    /* As a special case h->rbuf is allowed to be NULL, meaning
+     * throw away the data.
+     */
+    if (h->rbuf) {
+      rbuf = h->rbuf;
+      rlen = h->rlen;
+    }
+    else {
+      rbuf = &buf;
+      rlen = h->rlen > sizeof buf ? sizeof buf : h->rlen;
+    }
 
-  /* As a special case h->rbuf is allowed to be NULL, meaning
-   * throw away the data.
-   */
-  if (h->rbuf) {
-    rbuf = h->rbuf;
-    rlen = h->rlen;
-  }
-  else {
-    rbuf = &buf;
-    rlen = h->rlen > sizeof buf ? sizeof buf : h->rlen;
-  }
-
-  r = h->sock->ops->recv (h, h->sock, rbuf, rlen);
-  if (r == -1) {
-    if (errno == EAGAIN || errno == EWOULDBLOCK)
-      return 1; /* more data */
-    /* sock->ops->recv called set_error already. */
-    return -1;
-  }
-  if (r == 0) {
-    set_error (0, "recv: server disconnected unexpectedly");
-    return -1;
-  }
+    r = h->sock->ops->recv (h, h->sock, rbuf, rlen);
+    if (r == -1) {
+      if (errno == EAGAIN || errno == EWOULDBLOCK)
+        return 1; /* more data */
+      /* sock->ops->recv called set_error already. */
+      return -1;
+    }
+    if (r == 0) {
+      set_error (0, "recv: server disconnected unexpectedly");
+      return -1;
+    }
 #ifdef DUMP_PACKETS
-  if (h->rbuf != NULL)
-    nbd_internal_hexdump (h->rbuf, r, stderr);
+    if (h->rbuf != NULL)
+      nbd_internal_hexdump (h->rbuf, r, stderr);
 #endif
-  if (h->rbuf)
-    h->rbuf += r;
-  h->rlen -= r;
-  if (h->rlen == 0)
-    return 0; /* move to next state */
-  else
-    return 1; /* more data */
+    if (h->rbuf)
+      h->rbuf += r;
+    h->rlen -= r;
+  }
+
+  return 0; /* move to next state */
 }
 
 static int
@@ -90,21 +87,19 @@ send_from_wbuf (struct nbd_handle *h)
 {
   ssize_t r;
 
-  if (h->wlen == 0)
-    return 0; /* move to next state */
-  r = h->sock->ops->send (h, h->sock, h->wbuf, h->wlen);
-  if (r == -1) {
-    if (errno == EAGAIN || errno == EWOULDBLOCK)
-      return 1; /* more data */
-    /* sock->ops->send called set_error already. */
-    return -1;
+  while (h->wlen > 0) {
+    r = h->sock->ops->send (h, h->sock, h->wbuf, h->wlen);
+    if (r == -1) {
+      if (errno == EAGAIN || errno == EWOULDBLOCK)
+        return 1; /* more data */
+      /* sock->ops->send called set_error already. */
+      return -1;
+    }
+    h->wbuf += r;
+    h->wlen -= r;
   }
-  h->wbuf += r;
-  h->wlen -= r;
-  if (h->wlen == 0)
-    return 0; /* move to next state */
-  else
-    return 1; /* more data */
+
+  return 0; /* move to next state */
 }
 
 /*----- End of prologue. -----*/
-- 
2.21.0
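For reference, the EAGAIN/EWOULDBLOCK handling above only arises because the socket is in non-blocking mode. A minimal standalone sketch of that setup (plain POSIX fcntl(2); not libnbd's actual connect path):

#include <fcntl.h>

/* Put a connected socket into non-blocking mode, so that recv(2) and
 * send(2) return -1 with errno == EAGAIN/EWOULDBLOCK instead of
 * blocking when no data/space is available.
 */
static int
set_nonblock (int fd)
{
  int flags = fcntl (fd, F_GETFL, 0);
  if (flags == -1)
    return -1;
  return fcntl (fd, F_SETFL, flags | O_NONBLOCK);
}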
Eric Blake
2019-Jun-10 22:48 UTC
Re: [Libguestfs] [PATCH libnbd] states: In recv_into_rbuf and send_from_wbuf loop until EAGAIN.
On 6/9/19 1:34 PM, Richard W.M. Jones wrote:
> Previously we performed a single call to recv(2) or send(2) (or the
> GnuTLS equivalents), and even if more data/space was immediately
> available to receive/send we would return to poll. Instead of this,
> loop until the socket returns EAGAIN.

In general, if you get a short read() or write(), it's BECAUSE the socket blocked (and an immediate additional read or write will likely just fail with EAGAIN, because you are still blocked). So I don't think this patch as-is is worthwhile.

But unless we have a larger buffer to read into, we're really limiting ourselves by the length we read. I think you might actually get a performance boost on read if you rework things to maintain a 4k or similar buffer of pre-read data and consult that instead of going into read(2) (I already added the .pending() callback since gnutls in fact does something like that already) - but then you have to try to read a full buffer amount.

So, pre-patch, you had something like:

Most replies:
  read(4) = 4
  next state
  read(4) = 4
  next state
  back to READY
  poll()

vs. NBD_CMD_READ reply:
  read(4) = 4
  next state
  read(1M) = 64k
  block
  poll()
  read(1M-64k) = 64k
  block
  poll()
  ...

where this patch didn't really change short replies, but changes large reads into:

  read(4) = 4
  next state
  read(1M) = 64k
  read(1M-64k) = -1/EAGAIN
  block
  poll()
  read(1M-64k) = 64k
  ...

so yeah, this patch does NOT look beneficial as-is. But with my idea of reading into a 4k buffer or so, we change to something like:

  buffer is empty
  service 4 to read simple_reply.magic
  read(4k) = 16; buffer contains 16 bytes pre-read
  consume 4 from buffer; buffer contains 12 bytes
  next state
  service 4 to read simple_reply.error
  consume 4 from buffer; buffer contains 8 bytes - we skipped read :)
  next state
  service 8 to read simple_reply.handle
  consume 8 from buffer; buffer is empty
  next state
  back to READY, time to block
  poll()

or even:

  service 4 to read simple_reply.magic
  read(4k) = 32; buffer contains 32 bytes pre-read
  consume 4 from buffer; buffer contains 28 bytes
  next state
  service 4 to read simple_reply.error
  consume 4 from buffer; buffer contains 24 bytes - we skipped read :)
  next state
  service 8 to read simple_reply.handle
  consume 8 from buffer; buffer contains 16 bytes
  next state
  back to READY, buffer is non-empty, so .pending returns true
  service 4 to read simple_reply.magic
  consume 4 from buffer - we again skipped read
  etc.

We do, however, lose a bit of efficiency where we are no longer reading NBD_CMD_READ replies directly into the user's buffer - we'll have to memcpy() any prefix (up to the length of the buffer, which is why sizing the buffer larger than 4k is probably a loss) before doing a direct read() into the tail of the user's buffer. So we'd have to measure the effects, both with a client that does lots of 512-byte reads and with one that does lots of large reads, to see whether pre-buffered read()ing followed by memcpy() saves enough syscalls to be noticeable.

Then on the write side, there MIGHT be a minor performance boost if we set MSG_MORE any time we complete a request but know that more requests are pending, at which point we can continue the state machine on to that next request and pack multiple commands into one TCP segment. But that's different from trying to call write() in a loop after a short write.
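To make the pre-read buffer idea concrete, here is a minimal sketch under stated assumptions: the names (struct pread_buf, pbuf_consume) are hypothetical, not an existing or proposed libnbd API; it just shows how small header reads get served from one large read(2).

#include <string.h>
#include <unistd.h>

struct pread_buf {
  char data[4096];
  size_t start, len;        /* unconsumed bytes: data[start..start+len) */
};

/* Copy up to n pre-read bytes into out.  Returns the number of bytes
 * copied (may be short), 0 on EOF, or -1 on error (errno == EAGAIN
 * meaning: go back to poll()).
 */
static ssize_t
pbuf_consume (struct pread_buf *b, int fd, void *out, size_t n)
{
  if (b->len == 0) {        /* buffer empty: refill with one big read */
    ssize_t r = read (fd, b->data, sizeof b->data);
    if (r <= 0)
      return r;             /* 0 = EOF, -1 = error or EAGAIN */
    b->start = 0;
    b->len = r;
  }
  if (n > b->len)
    n = b->len;             /* serve a short "read" from the buffer */
  memcpy (out, b->data + b->start, n);
  b->start += n;
  b->len -= n;
  return n;
}

With something like this, a .pending()-style check reduces to b->len > 0, and only NBD_CMD_READ payloads longer than the buffered prefix need a direct read() into the caller's buffer.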
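And the write-side MSG_MORE idea is roughly this (a sketch: MSG_MORE is the real Linux send(2) flag, while more_queued is a hypothetical "another request is already waiting" indicator supplied by the caller):

#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send one buffer, hinting to the kernel that more data follows so
 * several small commands can be packed into one TCP segment.
 */
static ssize_t
send_hinted (int fd, const void *buf, size_t len, bool more_queued)
{
  return send (fd, buf, len, more_queued ? MSG_MORE : 0);
}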
Loops like this are good for blocking connections, but we are intentionally using non-blocking sockets, where a short read/write more likely means we are actually blocked than that a signal interrupted us and our next attempt would succeed.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org