Eric Blake
2019-Jun-06 19:36 UTC
[Libguestfs] [libnbd PATCH] tls: Check for pending bytes in gnutls buffers
Checking for poll(POLLIN) only wakes us up when the server sends more
bytes over the wire. But the way gnutls is implemented, it reads as
many non-blocking bytes as possible from the wire before then parsing
where the encoded message boundaries lie within those bytes. As a
result, we may already have the server's next reply in memory without
needing to block on the server sending yet more bytes (especially if
the server was employing tricks like corking to batch things up for
fewer packets over the network).

If we are using synchronous API, this is not a problem (we never have
more than one request in flight at once, which implies there are no
outstanding replies from the server when we send a request, so we
always get the needed POLLIN when the server's next reply begins); but
if we are using asynch API and have more than one message in flight,
it is conceivable that the server flushes its complete reply queue
(two or more replies) contained within a single encrypted network
packet, but where our state machine reads only the first reply and
then waits for a POLLIN kick before reading again. Fortunately, I
could not think of a scenario where this would turn into direct
deadlock in our state machine alone (although I also haven't
definitively ruled out that possibility) - in the common case, the
only way the server can send more than one reply at once is if it
supports parallel in-flight requests; which means we are safe that the
next asynch message we send will be received by the server, and its
reply to that message will give us the POLLIN kick we need to finally
process the reply that had been stuck (for a possibly long time) in
the local buffer.

But I _did_ come up with a problematic scenario - if the server's
final two replies before our NBD_CMD_DISC occur in the same packet, we
could end up with the socket closed without ever handling the second
reply, giving us data loss for that reply. It's also a potential
indirect deadlock: the NBD spec recommends that a client not call
NBD_CMD_DISC until it has received all responses to outstanding
in-flight requests, so a client that has finished its work and does
not plan to make any more aio requests other than NBD_CMD_DISC and
wants to block until all server replies are in will hang waiting for a
POLLIN that the server never sends because we already have the reply
and aren't sending more requests.

Testing shows that this patch has a slight performance improvement (to
be expected, since we are eliminating an artificial latency on
starting the REPLY sequence in the state machine) - on my machine,
running tests/aio-parallel-load-tls saw an increase from 11522.9 IOPS
pre-patch to 12558.1 IOPS post-patch.
---

I'm pushing this because of the performance improvement, but wanted
to post it because it was an interesting problem. (Sad when my writeup
is three times longer than the patch itself...)

 generator/states.c | 5 +++++
 lib/crypto.c       | 7 +++++++
 lib/internal.h     | 1 +
 3 files changed, 13 insertions(+)

diff --git a/generator/states.c b/generator/states.c
index bce4f85..145e8c1 100644
--- a/generator/states.c
+++ b/generator/states.c
@@ -113,6 +113,11 @@ send_from_wbuf (struct nbd_handle *h)
  READY:
   if (h->cmds_to_issue)
     SET_NEXT_STATE (%ISSUE_COMMAND.START);
+  else {
+    assert (h->sock);
+    if (h->sock->ops->pending && h->sock->ops->pending (h->sock))
+      SET_NEXT_STATE (%REPLY.START);
+  }
   return 0;

  DEAD:
diff --git a/lib/crypto.c b/lib/crypto.c
index 3f7c744..e0f173f 100644
--- a/lib/crypto.c
+++ b/lib/crypto.c
@@ -181,6 +181,12 @@ tls_send (struct nbd_handle *h,
   return r;
 }

+static bool
+tls_pending (struct socket *sock)
+{
+  return gnutls_record_check_pending (sock->u.tls.session) > 0;
+}
+
 static int
 tls_get_fd (struct socket *sock)
 {
@@ -209,6 +215,7 @@ tls_close (struct socket *sock)
 static struct socket_ops crypto_ops = {
   .recv = tls_recv,
   .send = tls_send,
+  .pending = tls_pending,
   .get_fd = tls_get_fd,
   .close = tls_close,
 };
diff --git a/lib/internal.h b/lib/internal.h
index 5b6152b..61ddbde 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -192,6 +192,7 @@ struct socket_ops {
                    struct socket *sock, void *buf, size_t len);
   ssize_t (*send) (struct nbd_handle *h,
                    struct socket *sock, const void *buf, size_t len);
+  bool (*pending) (struct socket *sock);
   int (*get_fd) (struct socket *sock);
   int (*close) (struct socket *sock);
 };
--
2.20.1
Richard W.M. Jones
2019-Jun-07 14:46 UTC
Re: [Libguestfs] [libnbd PATCH] tls: Check for pending bytes in gnutls buffers
On Thu, Jun 06, 2019 at 02:36:48PM -0500, Eric Blake wrote:
> [patch quoted in full; snipped]

Sorry for the late review, was busy with meetings this week.

Patch looks good to me, ACK.

Thanks,
Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/