Abhay Raj Singh
2021-Aug-23 15:56 UTC
[Libguestfs] nbdcpy: from scratch nbdcopy using io_uring
I had an idea for optimizing my current approach; it's good in some
ways, but it could be faster with some breaking changes to the protocol.

Currently, we read (from the socket connected to the source) one request
at a time. The simple flow looks like `read_header(io_uring) ---- success
---> recv(data) --- success ---> send(data) & queue another read header`,
but it's not as efficient as it could be; at best it's a hack.

Another approach I am thinking about is a large buffer into which we can
read all of the socket's data and process packets from that buffer as
the I/O is handled. This minimizes the number of read requests to the
kernel, as we do 1 read for multiple NBD packets.

Further optimization requires changing the NBD protocol a bit.

Current protocol:
1. Memory representation of a response (20-byte header + data)
2. Memory representation of a request (28-byte header + data)

HHHHH_DDDDDDDDD...
HHHHHHH_DDDDDDDDD...

H and D represent 4 bytes each; _ represents 0 bytes (just a separator).

With the large-buffer approach, we read data into a large buffer, then
copy the NBD packet's data to a new buffer, strap a new header onto it
and send it. This copying is what we wanted to avoid in the first place.

If the response header were 28 bytes, or the first 8 bytes of data were
useless, we could have just overwritten the header part and sent the data
directly from the large buffer, therefore avoiding the copy.

What are your thoughts?

Thanks and regards,
Abhay
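
To make the zero-copy idea above concrete, here is a minimal C sketch of
the in-place header rewrite, assuming the hypothetical protocol change
where a reply header is 28 bytes, the same size as a request header
(today's NBD does not work this way). The request layout and the
NBD_CMD_WRITE constant match the current protocol; the helper names, the
liburing wiring, and the equal-sized reply header are assumptions made
for illustration only.

    /* Sketch only: assumes a hypothetical reply header that is 28 bytes,
     * i.e. the same size as an NBD request header, so the reply header
     * sitting in the large buffer can be overwritten by a write request
     * and the payload sent without copying it. */

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>   /* htonl, htons */
    #include <endian.h>      /* htobe64 (Linux-specific) */
    #include <liburing.h>

    /* 28-byte NBD request header; this layout matches the current protocol. */
    struct nbd_request {
        uint32_t magic;      /* NBD_REQUEST_MAGIC, 0x25609513 */
        uint16_t flags;
        uint16_t type;       /* NBD_CMD_WRITE == 1 */
        uint64_t cookie;     /* opaque, echoed back by the server */
        uint64_t offset;
        uint32_t count;
    } __attribute__((packed));

    /* One complete reply sits at buf[0 .. 28+len).  Overwrite its
     * (hypothetical 28-byte) header with a write request for the
     * destination, so header and payload go out as one contiguous send. */
    static void reply_to_write_request(uint8_t *buf, uint64_t cookie,
                                       uint64_t offset, uint32_t len)
    {
        struct nbd_request req = {
            .magic  = htonl(0x25609513),
            .flags  = 0,
            .type   = htons(1),          /* NBD_CMD_WRITE */
            .cookie = cookie,            /* cookies are opaque; no byte swap */
            .offset = htobe64(offset),
            .count  = htonl(len),
        };
        memcpy(buf, &req, sizeof req);   /* clobber the old reply header */
    }

    /* Queue the rewritten header plus payload as a single send on the ring;
     * io_uring_submit() happens elsewhere in the event loop. */
    static int queue_send(struct io_uring *ring, int dest_fd,
                          const uint8_t *buf, uint32_t payload_len)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        if (!sqe)
            return -1;                   /* submission queue full */
        io_uring_prep_send(sqe, dest_fd, buf,
                           sizeof(struct nbd_request) + payload_len, 0);
        return 0;
    }

Under the current protocol the bytes preceding the payload are only 16
or 20, which is exactly the gap the message above points out.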
[adding the NBD list into cc]

On Mon, Aug 23, 2021 at 09:26:34PM +0530, Abhay Raj Singh wrote:
> I had an idea for optimizing my current approach; it's good in some
> ways, but it could be faster with some breaking changes to the protocol.
>
> Currently, we read (from the socket connected to the source) one request
> at a time. The simple flow looks like `read_header(io_uring) ---- success
> ---> recv(data) --- success ---> send(data) & queue another read header`,
> but it's not as efficient as it could be; at best it's a hack.
>
> Another approach I am thinking about is a large buffer into which we can
> read all of the socket's data and process packets from that buffer as
> the I/O is handled. This minimizes the number of read requests to the
> kernel, as we do 1 read for multiple NBD packets.
>
> Further optimization requires changing the NBD protocol a bit.
>
> Current protocol:
> 1. Memory representation of a response (20-byte header + data)
> 2. Memory representation of a request (28-byte header + data)
>
> HHHHH_DDDDDDDDD...
> HHHHHHH_DDDDDDDDD...
>
> H and D represent 4 bytes each; _ represents 0 bytes (just a separator).

You are correct that requests are currently a 28-byte header plus any
payload (where a payload currently only occurs in NBD_CMD_WRITE).  But
responses come in two different lengths: simple responses are 16 bytes +
payload (payload only for NBD_CMD_READ, and only if structured replies
were not negotiated), while structured responses are 20 bytes + payload
(and while NBD_CMD_READ and NBD_CMD_BLOCK_STATUS require structured
replies, a compliant server can still send simple replies to other
commands).  So it's even trickier than you represent here: reading a
reply as a fixed 20-byte header is not always going to do the right
thing.

> With the large-buffer approach, we read data into a large buffer, then
> copy the NBD packet's data to a new buffer, strap a new header onto it
> and send it. This copying is what we wanted to avoid in the first place.
>
> If the response header were 28 bytes, or the first 8 bytes of data were
> useless, we could have just overwritten the header part and sent the data
> directly from the large buffer, therefore avoiding the copy.
>
> What are your thoughts?

There are already discussions about what it would take to extend the
NBD protocol to support 64-bit requests (not that we'd want to go beyond
the current server restrictions of a 32M or 64M maximum for NBD_CMD_READ
and NBD_CMD_WRITE, but rather so that we can permit quick image zeroing
via a 64-bit NBD_CMD_WRITE_ZEROES).  Your observation that equally sized
request and response headers would allow more efficient handling is
worth considering as part of such a protocol extension.  Of necessity,
it would have to be done via an NBD_OPT_* option requested by the client
during negotiation and responded to affirmatively by the server, before
both sides then use the new-size packets in both directions after
NBD_OPT_GO (and a client would still have to be prepared to fall back to
the unequal-sized headers if the server doesn't understand the option).
For that matter, is there a benefit to cache-line-optimized sizing,
where all headers are exactly 32 bytes (both requests and responses, and
both simple and structured replies)?  I'm thinking
NBD_OPT_FIXED_SIZE_HEADER might be a sane name for such an option.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
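
For reference, a C sketch of the three wire headers being contrasted
above, with sizes as described in the NBD spec; the struct and field
names are illustrative only, and real code must still convert every
multi-byte field from network byte order.

    #include <stdint.h>

    struct nbd_request {             /* 28 bytes, client -> server */
        uint32_t magic;              /* 0x25609513 */
        uint16_t flags;
        uint16_t type;               /* NBD_CMD_* */
        uint64_t cookie;
        uint64_t offset;
        uint32_t count;
    } __attribute__((packed));

    struct nbd_simple_reply {        /* 16 bytes, server -> client */
        uint32_t magic;              /* 0x67446698 */
        uint32_t error;
        uint64_t cookie;
        /* data follows only for NBD_CMD_READ without structured replies */
    } __attribute__((packed));

    struct nbd_structured_reply {    /* 20 bytes per chunk, server -> client */
        uint32_t magic;              /* 0x668e33ef */
        uint16_t flags;
        uint16_t type;               /* e.g. NBD_REPLY_TYPE_OFFSET_DATA */
        uint64_t cookie;
        uint32_t length;             /* bytes of chunk payload that follow */
    } __attribute__((packed));

Since the simple-reply and structured-reply magics differ, a receive
loop has to look at the leading 4-byte magic before it knows whether 16
or 20 header bytes are on the wire, which is the trickiness mentioned in
the reply above.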