Wouter Verhelst
2023-Apr-18 12:33 UTC
[Libguestfs] [PATCH v3 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT
On Thu, Apr 13, 2023 at 05:02:41PM -0500, Eric Blake wrote:
> Rather than requiring all servers and clients to have a 32-bit limit
> on maximum NBD_CMD_READ/WRITE sizes, we can choose to standardize
> support for a 64-bit single I/O transaction now.
> NBD_REPLY_TYPE_OFFSET_DATA can already handle a large reply, but
> NBD_REPLY_TYPE_OFFSET_HOLE needs a 64-bit counterpart.
>
> By standardizing this, all clients must be prepared to support both
> types of hole type replies, even though most server implementations of
> extended replies are likely to only send one hole type.

I think it's probably a better idea to push this patch to a separate
"extension-*" branch, and link to that from proto.md on master. Those
are documented as "we standardized this, but no first implementor exists
yet".

If someone actually comes up with a reason for 64-bit transactions, we
can then see if the spec matches the need and merge it to master.

Otherwise this feels too much like a solution in search of a problem to
me.

With that said, for the things I didn't reply to, you can add:

Reviewed-By: Wouter Verhelst <w at uter.be>

-- 
w at uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.
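For reference, the payload difference under discussion can be sketched as follows. This is a sketch based on the patch description as quoted above, not merged spec text: the existing NBD_REPLY_TYPE_OFFSET_HOLE chunk carries a 64-bit offset plus a 32-bit hole size, and the proposed _EXT variant widens the size field to 64 bits. All fields are big-endian, as elsewhere in the NBD protocol.

```python
import struct

def hole_payload(offset, size):
    """NBD_REPLY_TYPE_OFFSET_HOLE payload: 64-bit offset + 32-bit hole
    size, big-endian. The 32-bit size field is the limit at issue."""
    if size >= 2**32:
        raise ValueError("hole size does not fit in 32 bits")
    return struct.pack(">QL", offset, size)

def hole_ext_payload(offset, size):
    """Proposed NBD_REPLY_TYPE_OFFSET_HOLE_EXT payload: 64-bit offset +
    64-bit hole size (field order as I read the patch)."""
    return struct.pack(">QQ", offset, size)
```

A server sending a hole larger than 4 GiB - 1 in a single structured-reply chunk has no way to express it with the 32-bit form, which is why the counterpart is needed once 64-bit transactions exist.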
Richard W.M. Jones
2023-Apr-18 14:13 UTC
[Libguestfs] [PATCH v3 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT
On Tue, Apr 18, 2023 at 02:33:41PM +0200, Wouter Verhelst wrote:
> On Thu, Apr 13, 2023 at 05:02:41PM -0500, Eric Blake wrote:
> > Rather than requiring all servers and clients to have a 32-bit limit
> > on maximum NBD_CMD_READ/WRITE sizes, we can choose to standardize
> > support for a 64-bit single I/O transaction now.
> > NBD_REPLY_TYPE_OFFSET_DATA can already handle a large reply, but
> > NBD_REPLY_TYPE_OFFSET_HOLE needs a 64-bit counterpart.
> >
> > By standardizing this, all clients must be prepared to support both
> > types of hole type replies, even though most server implementations of
> > extended replies are likely to only send one hole type.
>
> I think it's probably a better idea to push this patch to a separate
> "extension-*" branch, and link to that from proto.md on master. Those
> are documented as "we standardized this, but no first implementor exists
> yet".
>
> If someone actually comes up with a reason for 64-bit transactions, we
> can then see if the spec matches the need and merge it to master.
>
> Otherwise this feels too much like a solution in search of a problem to
> me.

I'd like to push back a bit on this. Firstly, Eric does have two
complete implementations. It's true, however, that they are not
upstream in either case.

But we also need this because there are relatively serious issues
observed, particularly around trimming/zeroing, and extents.
The trimming problem can be demonstrated very easily in fact:

$ nbdkit -U - --filter=stats memory 1P statsfile=/dev/stdout --run '
    time guestfish add "" protocol:nbd server:unix:$unixsocket discard:enable format:raw : run : mkfs xfs /dev/sda '

real    4m17.531s
user    0m0.032s
sys     0m0.040s
total: 1066328 ops, 257.558068 s, 1048578.04 GiB, 4071.23 GiB/s
read: 4356 ops, 0.003335 s, 14.61 MiB, 4.28 GiB/s op, 58.08 KiB/s total
  Request size and alignment breakdown:
    12 bits: 50.8% (2215 reqs, 8.65 MiB total)
      12 bit aligned: 100.0% (2215)
      13 bit aligned:  51.6% (1143)
      14 bit aligned:  26.9% (595)
      15 bit aligned:  14.6% (323)
      16 bit aligned:   8.4% (185)
      17+ bit-aligned:  4.9% (109)
     9 bits: 47.4% (2064 reqs, 1.01 MiB total)
       9 bit aligned: 100.0% (2064)
      10+ bit-aligned:  0.6% (13)
    other sizes: 1.8% (77 reqs, 14.61 MiB total)
write: 13325 ops, 0.046782 s, 31.29 MiB, 668.91 MiB/s op, 124.41 KiB/s total
  Request size and alignment breakdown:
    12 bits: 53.8% (7170 reqs, 28.01 MiB total)
      12 bit aligned: 100.0% (7170)
      13 bit aligned:  50.0% (3585)
      14 bit aligned:  25.0% (1793)
      15 bit aligned:  12.5% (896)
      16 bit aligned:   6.2% (448)
      17+ bit-aligned:  3.1% (224)
     9 bits: 46.2% (6150 reqs, 3.00 MiB total)
       9 bit aligned: 100.0% (6150)
      10 bit aligned:  33.4% (2054)
      12 bit aligned:  16.7% (1029)
      13 bit aligned:   8.4% (515)
      14+ bit-aligned:  4.2% (259)
    other sizes: 0.0% (5 reqs, 31.29 MiB total)
trim: 1048576 ops, 306.059735 s, 1048576.00 GiB, 3426.05 GiB/s op, 4071.22 GiB/s total
  Request size and alignment breakdown:
    30 bits: 100.0% (1048576 reqs, 1048576.00 GiB total)
      30 bit aligned: 100.0% (1048576)
      31 bit aligned:  50.0% (524288)
      32 bit aligned:  25.0% (262144)
      33 bit aligned:  12.5% (131072)
      34 bit aligned:   6.2% (65536)
      35+ bit-aligned:  3.1% (32768)
zero: 64 ops, 0.003912 s, 1.99 GiB, 508.75 GiB/s op, 7.91 MiB/s total
  Request size and alignment breakdown:
    25 bits: 98.4% (63 reqs, 1.97 GiB total)
      13 bit aligned: 100.0% (63)
    other sizes: 1.6% (1 reqs, 1.99 GiB total)
flush: 7 ops, 0.000001 s, 0 bytes, 0 bytes/s op, 0 bytes/s total

Note how trim takes a million operations and most of the time. That
should be done in one operation. If you stop advertising discard
support on the disk ("discard:disable") it takes only a fraction of
the time.

The extents one is harder to demonstrate, but it makes our code
considerably more complex in that we cannot just grab the extent map
for a whole disk larger than 4GB in a single command. (The complexity
won't go away, but the querying will be faster with fewer round trips
with this change.)

Nevertheless I'm not opposed to keeping this as an extension until the
implementations are upstream and bedded in.

Rich.

> With that said, for the things I didn't reply to, you can add:
>
> Reviewed-By: Wouter Verhelst <w at uter.be>
>
> -- 
> w at uter.{be,co.za}
> wouter@{grep.be,fosdem.org,debian.org}
>
> I will have a Tin-Actinium-Potassium mixture, thanks.
>
> _______________________________________________
> Libguestfs mailing list
> Libguestfs at redhat.com
> https://listman.redhat.com/mailman/listinfo/libguestfs

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
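The million-trim figure in the stats above falls straight out of the 32-bit length field. A back-of-the-envelope sketch (the 1 GiB chunk size is inferred from the "30 bits" request-size line in the trim breakdown):

```python
# Why clearing a 1 PiB nbdkit memory disk takes ~a million NBD_CMD_TRIMs:
# request lengths are 32 bits, and this client issues 1 GiB (2**30) trims,
# so the whole disk needs disk_size / trim_chunk separate round trips.
disk_size = 2**50    # 1 PiB
trim_chunk = 2**30   # 1 GiB per trim, matching "30 bits" in the stats
ops = disk_size // trim_chunk
print(ops)           # 1048576, matching the trim op count above
```

With a 64-bit effect length the same region could be described in a single request, which is exactly the class of command (trim, zero, block status) where effect length is independent of payload size.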
Eric Blake
2023-Apr-18 15:54 UTC
[Libguestfs] [PATCH v3 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT
On Tue, Apr 18, 2023 at 02:33:41PM +0200, Wouter Verhelst wrote:
> On Thu, Apr 13, 2023 at 05:02:41PM -0500, Eric Blake wrote:
> > Rather than requiring all servers and clients to have a 32-bit limit
> > on maximum NBD_CMD_READ/WRITE sizes, we can choose to standardize
> > support for a 64-bit single I/O transaction now.
> > NBD_REPLY_TYPE_OFFSET_DATA can already handle a large reply, but
> > NBD_REPLY_TYPE_OFFSET_HOLE needs a 64-bit counterpart.
> >
> > By standardizing this, all clients must be prepared to support both
> > types of hole type replies, even though most server implementations of
> > extended replies are likely to only send one hole type.
>
> I think it's probably a better idea to push this patch to a separate
> "extension-*" branch, and link to that from proto.md on master. Those
> are documented as "we standardized this, but no first implementor exists
> yet".
>
> If someone actually comes up with a reason for 64-bit transactions, we
> can then see if the spec matches the need and merge it to master.

Okay, based on that, I'll merge this in a new branch, at which point we
WILL have something definitive to point to as a way to unjam the patches
to libnbd and qemu (they were waiting in part on having a spec close
enough to final). I think patches 1 and 2 are ready for mainstream, and
only 3 and later need the extension branch.

I'm also very reluctant to check patch 6 into the extension branch for
now (having it just be in the mail archives is good enough), since I
have not yet played with making qemu or libnbd support payloads larger
than 64M, and am not sure if it ever makes sense to try to do an
NBD_CMD_READ or NBD_CMD_WRITE with more than 4G of material at once.
For that matter, ssize_t constraints on send() and recv() may make it
impractical to ever allow a maximum payload larger than 2G.

> Otherwise this feels too much like a solution in search of a problem to
> me.

Rich has already followed up with some demonstrations of where larger
effect lengths can matter (on commands where effect length is
orthogonal to payload length).

> With that said, for the things I didn't reply to, you can add:
>
> Reviewed-By: Wouter Verhelst <w at uter.be>

I've replied to your other reviews with a couple of ideas to squash in.
The insistence on only a 64-bit block status reply when extended
headers are in effect will have the most ripple effect on my
qemu/libnbd patches, but I don't think it is insurmountable. I agree
that insisting on extended headers mandating 64-bit responses is a bit
simpler than a client that has to handle both 32- and 64-bit responses
(a client may still need the complexity of handling both types in order
to talk to servers without extended headers, but that is a different
sort of complexity, and one that can be phased out over time if lots of
servers decide to move towards 64-bit headers).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
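Eric's ssize_t point can be made concrete: a single send() or recv() returns ssize_t, so with a 32-bit ssize_t one call can move at most SSIZE_MAX (2**31 - 1) bytes, and any larger payload must be split across multiple syscalls regardless of what the protocol allows. A sketch of the resulting call count (`send_calls_needed` is a hypothetical helper, not from any of the implementations discussed):

```python
MAX_CHUNK = 2**31 - 1  # SSIZE_MAX with a 32-bit ssize_t

def send_calls_needed(payload_len):
    """Minimum number of send() calls to move payload_len bytes when
    each call can transfer at most MAX_CHUNK bytes (ceiling division)."""
    return -(-payload_len // MAX_CHUNK)
```

Note the off-by-one: a 4 GiB payload needs three calls, not two, because each call tops out one byte short of 2 GiB. This is the practical argument for capping the maximum payload at or below 2G even if effect lengths go to 64 bits.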