On 5/29/19 10:13 PM, Eric Blake wrote:
> Benchmark-wise, using the same setup as in commit e897ed70, I see
> an order-of-magnitude slowdown:
>
> Pre-patch, the runs averaged 1.266s, 1.30E+08 bits/s
> Post-patch, the runs averaged 11.663s, 1.41E+07 bits/s
>
> This will need further profiling to determine how much is nbdkit's
> fault, and how much is libnbd's.  I think that we are probably holding
> some locks for too long, resulting in some serialized performance.
> Also, the standalone code was able to run read of command 1 in
> parallel with write of command 2 via separate threads, whereas
> libnbd's state machine is serializing everything (whether or not the
> state machine spreads out the I/O to do writes from the thread calling
> nbd_aio_FOO and reads from the reader thread, the two are never run at
> once).

Rich identified and fixed the culprit - libnbd was not setting TCP_NODELAY (disabling Nagle's algorithm) the way nbd-standalone.c did, which meant that any request split across TCP window sizes waits to send the second packet until the server has ACKed the first.  While disabling Nagle's algorithm increases network overhead (you send more short packets, with their per-packet overhead, rather than bundling data into larger packets during the wait for the ACK), the wait itself is an artificial delay (the server can't process anything until the packets arrive).  libnbd 0.1.2 now disables Nagle's algorithm, and my repeat of the tests with that change showed an improvement to 2.180s, which is more easily explained by the remaining serialization: libnbd never read()s from one thread at the same time another thread is write()ing.
I'd have to intentionally cripple nbd-standalone.c to do that same serialization, to see if libnbd actually offers an overall performance gain when I/O is serialized (may not be as easy as it sounds; I was relying on blocking I/O from separate threads, but merely serializing that so that no thread is doing I/O concurrent with another risks deadlock in the case of a client sending a large NBD_CMD_READ followed by NBD_CMD_WRITE to a server that responds strictly serially, where the server is waiting for the client to finish the read's response but the client is waiting for the server to parse the write's request; libnbd works around that by using non-blocking sockets and prioritizing reads over writes rather than insisting on complete transactions).

> -/* Connect to a TCP socket, returning the fd on success */
> -static int
> -nbdplug_connect_tcp (void)
> -{
> -  struct addrinfo hints = { .ai_family = AF_UNSPEC,
> -                            .ai_socktype = SOCK_STREAM, };
> -  struct addrinfo *result, *rp;
> -  int r;
> -  const int optval = 1;
> -  int fd;
> -
> -
> -  if (setsockopt (fd, IPPROTO_TCP, TCP_NODELAY, &optval,
> -                  sizeof (int)) == -1) {
> -    nbdkit_error ("cannot set TCP_NODELAY option: %m");
> -    close (fd);
> -    return -1;
> -  }

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
Richard W.M. Jones
2019-May-31 08:43 UTC
Re: [Libguestfs] [nbdkit PATCH 3/4] nbd: Use libnbd 0.1
On Thu, May 30, 2019 at 02:00:48PM -0500, Eric Blake wrote:
> [...] which is more easily explained by the
> serialization nature that libnbd is never read()ing from one thread at
> the same time another thread is write()ing.
>
> [...] libnbd
> works around that by using non-blocking sockets and prioritizing reads
> over writes rather than insisting on complete transactions).

Indeed, this is something that even a more complex state machine in
libnbd could not solve.  It fundamentally assumes that there is one
thread running the handle at any one time, but to both write to and
read from the socket simultaneously would require two threads.  That's
a large architectural change (I'm not even sure how to do it).

The question is whether this use case can be covered by using
multi-conn, or if there's a requirement for parallel writes and reads
on the same socket.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
On 5/31/19 3:43 AM, Richard W.M. Jones wrote:
> On Thu, May 30, 2019 at 02:00:48PM -0500, Eric Blake wrote:
>> [...] which is more easily explained by the
>> serialization nature that libnbd is never read()ing from one thread at
>> the same time another thread is write()ing.
>>
>> [...] libnbd
>> works around that by using non-blocking sockets and prioritizing reads
>> over writes rather than insisting on complete transactions).
>
> Indeed, this is something that even a more complex state machine in
> libnbd could not solve.  It fundamentally assumes that there is one
> thread running the handle at any one time, but to both write to and
> read from the socket simultaneously would require two threads.  That's
> a large architectural change (I'm not even sure how to do it).

There's always POSIX aio_* (which, in glibc, creates a worker thread on
your behalf rather than being truly single-threaded async).  And
https://blog.cloudflare.com/io_submit-the-epoll-alternative-youve-never-heard-about/
states that the Linux-only io_submit() (probably via libaio) can do a
truly async single-threaded network polling loop that can therefore
manage both simultaneous reads and writes; but that requires an entirely
different control loop (io_getevents instead of poll), so it may be a
lot of work to wire up before we'd even know whether it actually adds
any gain.

> The question is whether this use case can be covered by using
> multi-conn, or if there's a requirement for parallel writes and reads
> on the same socket.

A client truly wanting to saturate performance will probably implement
its own multi-conn on top of libnbd.  But if we want libnbd to be used
in qemu, which is currently using coroutines, then yes, qemu is
currently exploiting the fact that coroutines can track simultaneous
read and write on the same socket.  So I don't think we have to
redesign everything now, but it is food for thought for future
performance improvements.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org