Wouter Verhelst
2023-Apr-18 12:33 UTC
[Libguestfs] [PATCH v3 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT
On Thu, Apr 13, 2023 at 05:02:41PM -0500, Eric Blake wrote:
> Rather than requiring all servers and clients to have a 32-bit limit
> on maximum NBD_CMD_READ/WRITE sizes, we can choose to standardize
> support for a 64-bit single I/O transaction now.
> NBD_REPLY_TYPE_OFFSET_DATA can already handle a large reply, but
> NBD_REPLY_TYPE_OFFSET_HOLE needs a 64-bit counterpart.
>
> By standardizing this, all clients must be prepared to support both
> types of hole type replies, even though most server implementations of
> extended replies are likely to only send one hole type.

I think it's probably a better idea to push this patch to a separate
"extension-*" branch, and link to that from proto.md on master. Those
are documented as "we standardized this, but no first implementor exists
yet".

If someone actually comes up with a reason for 64-bit transactions, we
can then see if the spec matches the need and merge it to master.

Otherwise this feels too much like a solution in search of a problem to
me.

With that said, for the things I didn't reply to, you can add:

Reviewed-By: Wouter Verhelst <w at uter.be>

-- 
w at uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.
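For reference, the payload difference under discussion can be sketched as follows. This is a sketch based on the patch description as quoted above, not merged spec text: the existing NBD_REPLY_TYPE_OFFSET_HOLE chunk carries a 64-bit offset plus a 32-bit hole size, and the proposed _EXT variant widens the size field to 64 bits. All fields are big-endian, as elsewhere in the NBD protocol.

```python
import struct

def hole_payload(offset, size):
    """NBD_REPLY_TYPE_OFFSET_HOLE payload: 64-bit offset + 32-bit hole
    size, big-endian. The 32-bit size field is the limit at issue."""
    if size >= 2**32:
        raise ValueError("hole size does not fit in 32 bits")
    return struct.pack(">QL", offset, size)

def hole_ext_payload(offset, size):
    """Proposed NBD_REPLY_TYPE_OFFSET_HOLE_EXT payload: 64-bit offset +
    64-bit hole size (field order as I read the patch)."""
    return struct.pack(">QQ", offset, size)
```

A server sending a hole larger than 4 GiB - 1 in a single structured-reply chunk has no way to express it with the 32-bit form, which is why the counterpart is needed once 64-bit transactions exist.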
Richard W.M. Jones
2023-Apr-18 14:13 UTC
[Libguestfs] [PATCH v3 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT
On Tue, Apr 18, 2023 at 02:33:41PM +0200, Wouter Verhelst wrote:
> On Thu, Apr 13, 2023 at 05:02:41PM -0500, Eric Blake wrote:
> > Rather than requiring all servers and clients to have a 32-bit limit
> > on maximum NBD_CMD_READ/WRITE sizes, we can choose to standardize
> > support for a 64-bit single I/O transaction now.
> > NBD_REPLY_TYPE_OFFSET_DATA can already handle a large reply, but
> > NBD_REPLY_TYPE_OFFSET_HOLE needs a 64-bit counterpart.
> >
> > By standardizing this, all clients must be prepared to support both
> > types of hole type replies, even though most server implementations of
> > extended replies are likely to only send one hole type.
>
> I think it's probably a better idea to push this patch to a separate
> "extension-*" branch, and link to that from proto.md on master. Those
> are documented as "we standardized this, but no first implementor exists
> yet".
>
> If someone actually comes up with a reason for 64-bit transactions, we
> can then see if the spec matches the need and merge it to master.
>
> Otherwise this feels too much like a solution in search of a problem to
> me.

I'd like to push back a bit on this. Firstly, Eric does have two
complete implementations. It's true, however, that they are not
upstream in either case.

But we also need this because there are relatively serious issues
observed, particularly around trimming/zeroing, and extents.
The trimming problem can be demonstrated very easily in fact:

$ nbdkit -U - --filter=stats memory 1P statsfile=/dev/stdout --run '
    time guestfish add "" protocol:nbd server:unix:$unixsocket discard:enable format:raw : run : mkfs xfs /dev/sda '

real    4m17.531s
user    0m0.032s
sys     0m0.040s
total: 1066328 ops, 257.558068 s, 1048578.04 GiB, 4071.23 GiB/s
read: 4356 ops, 0.003335 s, 14.61 MiB, 4.28 GiB/s op, 58.08 KiB/s total
  Request size and alignment breakdown:
    12 bits: 50.8% (2215 reqs, 8.65 MiB total)
      12 bit aligned: 100.0% (2215)
      13 bit aligned:  51.6% (1143)
      14 bit aligned:  26.9% (595)
      15 bit aligned:  14.6% (323)
      16 bit aligned:   8.4% (185)
      17+ bit-aligned:  4.9% (109)
     9 bits: 47.4% (2064 reqs, 1.01 MiB total)
       9 bit aligned: 100.0% (2064)
      10+ bit-aligned:  0.6% (13)
    other sizes: 1.8% (77 reqs, 14.61 MiB total)
write: 13325 ops, 0.046782 s, 31.29 MiB, 668.91 MiB/s op, 124.41 KiB/s total
  Request size and alignment breakdown:
    12 bits: 53.8% (7170 reqs, 28.01 MiB total)
      12 bit aligned: 100.0% (7170)
      13 bit aligned:  50.0% (3585)
      14 bit aligned:  25.0% (1793)
      15 bit aligned:  12.5% (896)
      16 bit aligned:   6.2% (448)
      17+ bit-aligned:  3.1% (224)
     9 bits: 46.2% (6150 reqs, 3.00 MiB total)
       9 bit aligned: 100.0% (6150)
      10 bit aligned:  33.4% (2054)
      12 bit aligned:  16.7% (1029)
      13 bit aligned:   8.4% (515)
      14+ bit-aligned:  4.2% (259)
    other sizes: 0.0% (5 reqs, 31.29 MiB total)
trim: 1048576 ops, 306.059735 s, 1048576.00 GiB, 3426.05 GiB/s op, 4071.22 GiB/s total
  Request size and alignment breakdown:
    30 bits: 100.0% (1048576 reqs, 1048576.00 GiB total)
      30 bit aligned: 100.0% (1048576)
      31 bit aligned:  50.0% (524288)
      32 bit aligned:  25.0% (262144)
      33 bit aligned:  12.5% (131072)
      34 bit aligned:   6.2% (65536)
      35+ bit-aligned:  3.1% (32768)
zero: 64 ops, 0.003912 s, 1.99 GiB, 508.75 GiB/s op, 7.91 MiB/s total
  Request size and alignment breakdown:
    25 bits: 98.4% (63 reqs, 1.97 GiB total)
      13 bit aligned: 100.0% (63)
    other sizes: 1.6% (1 reqs, 1.99 GiB total)
flush: 7 ops, 0.000001 s, 0 bytes, 0 bytes/s op, 0 bytes/s total

Note how trim takes a million operations and most of the time. That
should be done in one operation. If you stop advertising discard
support on the disk ("discard:disable") it takes only a fraction of
the time.

The extents one is harder to demonstrate, but it makes our code
considerably more complex in that we cannot just grab the extent map
for a whole disk larger than 4GB in a single command. (The complexity
won't go away, but the querying will be faster with fewer round trips
with this change.)

Nevertheless I'm not opposed to keeping this as an extension until the
implementations are upstream and bedded in.

Rich.

> With that said, for the things I didn't reply to, you can add:
>
> Reviewed-By: Wouter Verhelst <w at uter.be>
>
> -- 
> w at uter.{be,co.za}
> wouter@{grep.be,fosdem.org,debian.org}
>
> I will have a Tin-Actinium-Potassium mixture, thanks.
>
> _______________________________________________
> Libguestfs mailing list
> Libguestfs at redhat.com
> https://listman.redhat.com/mailman/listinfo/libguestfs

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
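The million-trim figure in the stats above falls straight out of the 32-bit length field. A back-of-the-envelope sketch (the 1 GiB chunk size is inferred from the "30 bits" request-size line in the trim breakdown):

```python
# Why clearing a 1 PiB nbdkit memory disk takes ~a million NBD_CMD_TRIMs:
# request lengths are 32 bits, and this client issues 1 GiB (2**30) trims,
# so the whole disk needs disk_size / trim_chunk separate round trips.
disk_size = 2**50    # 1 PiB
trim_chunk = 2**30   # 1 GiB per trim, matching "30 bits" in the stats
ops = disk_size // trim_chunk
print(ops)           # 1048576, matching the trim op count above
```

With a 64-bit effect length the same region could be described in a single request, which is exactly the class of command (trim, zero, block status) where effect length is independent of payload size.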
Eric Blake
2023-Apr-18 15:54 UTC
[Libguestfs] [PATCH v3 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT
On Tue, Apr 18, 2023 at 02:33:41PM +0200, Wouter Verhelst wrote:
> On Thu, Apr 13, 2023 at 05:02:41PM -0500, Eric Blake wrote:
> > Rather than requiring all servers and clients to have a 32-bit limit
> > on maximum NBD_CMD_READ/WRITE sizes, we can choose to standardize
> > support for a 64-bit single I/O transaction now.
> > NBD_REPLY_TYPE_OFFSET_DATA can already handle a large reply, but
> > NBD_REPLY_TYPE_OFFSET_HOLE needs a 64-bit counterpart.
> >
> > By standardizing this, all clients must be prepared to support both
> > types of hole type replies, even though most server implementations of
> > extended replies are likely to only send one hole type.
>
> I think it's probably a better idea to push this patch to a separate
> "extension-*" branch, and link to that from proto.md on master. Those
> are documented as "we standardized this, but no first implementor exists
> yet".
>
> If someone actually comes up with a reason for 64-bit transactions, we
> can then see if the spec matches the need and merge it to master.

Okay, based on that, I'll merge this in a new branch, at which point we
WILL have something definitive to point to as a way to unjam the patches
to libnbd and qemu (they were waiting in part on having a spec close
enough to final). I think patches 1 and 2 are ready for mainstream, and
only 3 and later need the extension branch.

I'm also very reluctant to check patch 6 into the extension branch for
now (having it just be in the mail archives is good enough), since I
have not yet played with making qemu or libnbd support payloads larger
than 64M, and am not sure if it ever makes sense to try to do an
NBD_CMD_READ or NBD_CMD_WRITE with more than 4G of material at once.
For that matter, ssize_t constraints on send() and recv() may make it
impractical to ever allow a maximum payload larger than 2G.

> Otherwise this feels too much like a solution in search of a problem to
> me.

Rich has already followed up with some demonstrations of where larger
effect lengths can matter (on commands where effect length is
orthogonal to payload length).

> With that said, for the things I didn't reply to, you can add:
>
> Reviewed-By: Wouter Verhelst <w at uter.be>

I've replied to your other reviews with a couple of ideas to squash in.
The insistence on only a 64-bit block status reply when extended
headers are in effect will have the most ripple effect on my
qemu/libnbd patches, but I don't think it is insurmountable. I agree
that insisting on extended headers mandating 64-bit responses is a bit
simpler than a client that has to handle both 32- and 64-bit responses
(a client may still need the complexity of handling both types in order
to talk to servers without extended headers, but that is a different
sort of complexity, and one that can be phased out over time if lots of
servers decide to move towards 64-bit headers).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
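Eric's ssize_t point can be made concrete: a single send() or recv() returns ssize_t, so with a 32-bit ssize_t one call can move at most SSIZE_MAX (2**31 - 1) bytes, and any larger payload must be split across multiple syscalls regardless of what the protocol allows. A sketch of the resulting call count (`send_calls_needed` is a hypothetical helper, not from any of the implementations discussed):

```python
MAX_CHUNK = 2**31 - 1  # SSIZE_MAX with a 32-bit ssize_t

def send_calls_needed(payload_len):
    """Minimum number of send() calls to move payload_len bytes when
    each call can transfer at most MAX_CHUNK bytes (ceiling division)."""
    return -(-payload_len // MAX_CHUNK)
```

Note the off-by-one: a 4 GiB payload needs three calls, not two, because each call tops out one byte short of 2 GiB. This is the practical argument for capping the maximum payload at or below 2G even if effect lengths go to 64 bits.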