Eric Blake
2023-Mar-24 19:41 UTC
[Libguestfs] [PATCH 1/1] nbd/server: push pending frames after sending reply
On Fri, Mar 24, 2023 at 11:47:20AM +0100, Florian Westphal wrote:
> qemu-nbd doesn't set TCP_NODELAY on the tcp socket.
>
> Kernel waits for more data and avoids transmission of small packets.
> Without TLS this is barely noticeable, but with TLS this really shows.
>
> Booting a VM via qemu-nbd on localhost (with tls) takes more than
> 2 minutes on my system.  tcpdump shows frequent wait periods, where no
> packets get sent for a 40ms period.

Thank you for this analysis.

> Add explicit (un)corking when processing (and responding to) requests.
> "TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data.
>
> VM Boot time:
> main:    no tls: 23s, with tls: 2m45s
> patched: no tls: 14s, with tls: 15s
>
> VM Boot time, qemu-nbd via network (same lan):
> main:    no tls: 18s, with tls: 1m50s
> patched: no tls: 17s, with tls: 18s

And the timings bear out that it matters.

> Future optimization: if we could detect if there is another pending
> request we could defer the uncork operation because more data would be
> appended.

nbdkit and libnbd do this with the MSG_MORE flag (plaintext) and TLS
corking (tls); when building up a message to the other side, a flag is
set any time we know we are likely to send more data very shortly.
nbdkit wraps it under a flag SEND_MORE, which applies both to plaintext:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/server/connections.c#L415
and to TLS connections:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/server/crypto.c#L396
while libnbd uses MSG_MORE a bit more directly for the same purpose for
plaintext, but isn't (yet) doing TLS corking:
https://gitlab.com/nbdkit/libnbd/-/blob/master/generator/states-issue-command.c#L53
https://gitlab.com/nbdkit/libnbd/-/blob/master/lib/internal.h#L57

I would love to see a future patch to the qio_channel code to support
MSG_MORE in the same way nbdkit uses its SEND_MORE flag: the caller
often has better information on whether it is sending a short prefix or
is done with a conceptual message and ready to uncork, and a per-send
flag can be more efficient than separate passes through cork/uncork
calls.  But even your initial work at properly corking is a good step
in the right direction.

And surprisingly, qemu IS using corking on the client side:
https://gitlab.com/qemu-project/qemu/-/blob/master/block/nbd.c#L525
just not on the server side, before your patch.

> Signed-off-by: Florian Westphal <fw at strlen.de>
> ---
>  nbd/server.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/nbd/server.c b/nbd/server.c
> index a4750e41880a..848836d41405 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -2667,6 +2667,8 @@ static coroutine_fn void nbd_trip(void *opaque)
>          goto disconnect;
>      }
>
> +    qio_channel_set_cork(client->ioc, true);
> +
>      if (ret < 0) {
>          /* It wasn't -EIO, so, according to nbd_co_receive_request()
>           * semantics, we should return the error to the client. */
> @@ -2692,6 +2694,7 @@ static coroutine_fn void nbd_trip(void *opaque)
>          goto disconnect;
>      }
>
> +    qio_channel_set_cork(client->ioc, false);

Reviewed-by: Eric Blake <eblake at redhat.com>

> done:
>      nbd_request_put(req);
>      nbd_client_put(client);
> --
> 2.39.2
>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
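[Editorial note: for readers unfamiliar with the hint discussed above, here is
a minimal sketch of the idea, written against plain POSIX sockets on Linux.
The SEND_MORE flag and the helper names are hypothetical illustrations; this is
not the nbdkit, libnbd, or qemu implementation.  The caller passes a "more data
follows" hint, which maps to MSG_MORE on a plaintext socket and to TCP_CORK
around the writes of a TLS connection.]

/* Sketch: caller-provided hint that more data will be sent very shortly. */
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#define SEND_MORE 0x1   /* hypothetical flag: more data follows shortly */

static ssize_t send_hinted(int fd, const void *buf, size_t len, unsigned flags)
{
    /* MSG_MORE tells the kernel not to push a small segment yet,
     * because the next send() will append to it. */
    int msg_flags = (flags & SEND_MORE) ? MSG_MORE : 0;
    return send(fd, buf, len, msg_flags);
}

/* For a TLS connection the encrypted bytes are written by the TLS library,
 * so the equivalent hint is corking the underlying socket and uncorking
 * only once the conceptual message is complete. */
static void set_cork(int fd, bool corked)
{
    int val = corked ? 1 : 0;
    setsockopt(fd, IPPROTO_TCP, TCP_CORK, &val, sizeof(val));
}

[A server would call send_hinted(fd, hdr, hdrlen, SEND_MORE) for a reply header
it knows will be followed immediately by payload, then send the payload with no
flag so the kernel can push the combined segment.]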
Eric Blake
2023-Mar-24 20:03 UTC
[Libguestfs] [PATCH 1/1] nbd/server: push pending frames after sending reply
On Fri, Mar 24, 2023 at 02:41:20PM -0500, Eric Blake wrote:
> On Fri, Mar 24, 2023 at 11:47:20AM +0100, Florian Westphal wrote:
> > qemu-nbd doesn't set TCP_NODELAY on the tcp socket.

Replying to myself, WHY aren't we setting TCP_NODELAY on the socket?

> And surprisingly, qemu IS using corking on the client side:
> https://gitlab.com/qemu-project/qemu/-/blob/master/block/nbd.c#L525
> just not on the server side, before your patch.

Corking matters more when TCP_NODELAY is enabled.  The entire reason
Nagle's algorithm exists (and is on by default unless you enable
TCP_NODELAY) is that merging small piecemeal packets into larger
traffic is easier to do in one common place, for code that isn't
super-sensitive to latency and message boundaries.  But once you get
to the point where corking or MSG_MORE makes a difference, you
obviously know your message boundaries, and will benefit from
TCP_NODELAY, at the expense of potentially more network traffic
overhead.

One more code search, and I find that we use TCP_NODELAY in all of:

qemu client:
https://gitlab.com/qemu-project/qemu/-/blob/master/nbd/client-connection.c#L143

nbdkit:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/server/sockets.c#L430

libnbd:
https://gitlab.com/nbdkit/libnbd/-/blob/master/generator/states-connect.c#L41

so I think we _should_ be calling qio_channel_set_delay(false) for
qemu-nbd as well.  That doesn't negate your patch, but rather argues
that we can go for even better performance with TCP_NODELAY also
turned on.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
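[Editorial note: for context, a minimal sketch of roughly what
qio_channel_set_delay(false) amounts to for a TCP socket, written here against
plain POSIX sockets rather than the actual qemu qio_channel code.]

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Disable Nagle's algorithm so small writes go out immediately.  The
 * server then relies on explicit corking (or MSG_MORE) to batch a reply
 * header with its payload, instead of on the kernel's delayed
 * transmission.  Returns 0 on success, -1 on error (see errno). */
static int enable_nodelay(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}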