Laszlo Ersek
2023-May-30 15:48 UTC
[Libguestfs] [libnbd PATCH v3 03/22] protocol: Add definitions for extended headers
On 5/30/23 16:56, Wouter Verhelst wrote:> On Tue, May 30, 2023 at 01:50:59PM +0200, Laszlo Ersek wrote: >> On 5/25/23 15:00, Eric Blake wrote:>>> +struct nbd_structured_reply_block_status_ext_hdr { >>> + uint32_t context_id; /* metadata context ID */ >>> + uint32_t count; /* length of following array */ >> >> (1) I think that "length of following array" is confusing here. Per >> spec, this is "descriptor count" -- that's clearer. "Length" could be >> mistaken for "number of bytes". >> >> Also, what was the justification for *not* making "count" uint64_t? >> (Just asking.) I do understand a server is permitted to respond with a >> block status reply that covers less than the requested range, so I >> understand a server, if it needs to, *can* truncate its response for the >> sake of fitting "count" into a uint32_t. > > This is, as you say, the number of block descriptor messages we are > going to send to the client. Each such message is at 8 bytes long.16 bytes, isn't it? (uint64_t length, uint64_t status_flags)> That would mean that by the time you overflow a uint32_t with the number > of extents that are to be sent, you'll be promising to send 2^32 * 8 > (i.e., 2^36) bytes of data, or 32 GiB.(Or 64 GiB if we agree that sizeof(nbd_block_descriptor_ext)=16.) But... yes, I'm aware this is exorbitant, practically speaking. My concern was that "practical considerations" must have been what led to the original 32-bit-only: struct nbd_block_descriptor { uint32_t length; /* length of block */ uint32_t status_flags; /* block type (hole etc) */ } NBD_ATTRIBUTE_PACKED; which is now proving too restrictive (being the basis for this entire work, IIUC). A 2^(32+9) B = 2 TB image is not unthinkable. If the image used 512B sectors, and sectors alternated between being a hole and being allocated, then 2^32 extended descriptors would be justified. May not be practical, but that's "policy"; the "mechanism" could still exist (if it doesn't come with undesirable costs).> That would be a ridiculously amount of data, and no user is going to > wait for a client to finish parsing that amount of data without a hard > reboot of their client.Over a 10gbit/s connection, transferring 64GB should take on the order of a minute. ... *parsing* 4.3 billion entries is a different matter, of course ;) OK!> The only change that would be reasonable here is to reduce the size of > this field 16 to bits, rather than increasing it (but then we lose > alignment, so that's also not a good idea)Putting aside alignment even, I don't understand why reducing "count" to uint16_t would be reasonable. With the current 32-bit-only block descriptor, we already need to write loops in libnbd clients, because we can't cover the entire remote image in one API call [*]. If I understood Eric right earlier, the 64-bit extensions were supposed to remedy that -- but as it stands, clients will still need loops ("chunking") around block status fetching; is that right? Let me emphasize again that I'm not challenging the spec; also I don't secretly wish for the patches to do more than required by the spec. I guess I'd like to understand the intent of the spec, plus the consequences for client applications. [*] References (in this order!): - https://github.com/libguestfs/virt-v2v/commit/27c056cdc6aa0 - https://gitlab.com/nbdkit/libnbd/-/commit/0e714a6e06e6 - https://github.com/libguestfs/virt-v2v/commit/a2afed32d8b1f To be less cryptic, the first commit added "chunked" block status fetching to virt-v2v. Because our OCaml language libnbd mappings weren't proper at the time, that loop could move backwards and get stuck. We fixed that in the second commit (regarding the bindings) and then adapted virt-v2v's loop to the fixed bindings in the thirds commit. I've been hoping that such complexities could be eliminated in the future by virtue of not having to "chunk" the block status fetching. ( BTW I'm foreseeing a problem: if the extended block descriptor can provide an unsigned 64-bit length, we're going to have trouble exposing that in OCaml, because OCaml only has signed 64-bit integers. So that's going to reproduce the same issue, only for OCaml callers of the *new* API. I can see Eric's series includes patches like "ocaml: Add example for 64-bit extents" -- I've not looked at those yet; for now I'm just wondering what tricks we might need in the bindings generator. The method seen in the "middle patch" above won't work; we don't have a native OCaml "i128" type for example that we could use as an escape hatch, for representing C's uint64_t. ) Thanks! Laszlo
Eric Blake
2023-May-30 18:48 UTC
[Libguestfs] [libnbd PATCH v3 03/22] protocol: Add definitions for extended headers
On Tue, May 30, 2023 at 05:48:25PM +0200, Laszlo Ersek wrote:> On 5/30/23 16:56, Wouter Verhelst wrote: > > On Tue, May 30, 2023 at 01:50:59PM +0200, Laszlo Ersek wrote: > >> On 5/25/23 15:00, Eric Blake wrote: > > >>> +struct nbd_structured_reply_block_status_ext_hdr { > >>> + uint32_t context_id; /* metadata context ID */ > >>> + uint32_t count; /* length of following array */ > >> > >> (1) I think that "length of following array" is confusing here. Per > >> spec, this is "descriptor count" -- that's clearer. "Length" could be > >> mistaken for "number of bytes". > >> > >> Also, what was the justification for *not* making "count" uint64_t? > >> (Just asking.) I do understand a server is permitted to respond with a > >> block status reply that covers less than the requested range, so I > >> understand a server, if it needs to, *can* truncate its response for the > >> sake of fitting "count" into a uint32_t. > > > > This is, as you say, the number of block descriptor messages we are > > going to send to the client. Each such message is at 8 bytes long. > > 16 bytes, isn't it? (uint64_t length, uint64_t status_flags)Existing block status was 8 bytes per extent (uint32_t * 2), but yes, the extended extent info is 16 bytes (uint64_t * 2).> > > That would mean that by the time you overflow a uint32_t with the number > > of extents that are to be sent, you'll be promising to send 2^32 * 8 > > (i.e., 2^36) bytes of data, or 32 GiB. > > (Or 64 GiB if we agree that sizeof(nbd_block_descriptor_ext)=16.) > > But... yes, I'm aware this is exorbitant, practically speaking.We already coded into the NBD spec (commit 926a51df) that a server SHOULD truncate its response rather than sending more than 2M extent entries; I picked that number because 32M (the existing payload cap in qemu) / 16 bytes is 2M extents (for non-extended headers, it actually caps things at 16M instead of 32M). In that commit, I pointed out that qemu actually caps things at 128k entries (payload cap 1M without extended headers), and nbdkit caps things at 1M entries.> > My concern was that "practical considerations" must have been what led > to the original 32-bit-only: > > struct nbd_block_descriptor { > uint32_t length; /* length of block */ > uint32_t status_flags; /* block type (hole etc) */ > } NBD_ATTRIBUTE_PACKED; > > which is now proving too restrictive (being the basis for this entire > work, IIUC). > > A 2^(32+9) B = 2 TB image is not unthinkable. If the image used 512B > sectors, and sectors alternated between being a hole and being > allocated, then 2^32 extended descriptors would be justified. > > May not be practical, but that's "policy"; the "mechanism" could still > exist (if it doesn't come with undesirable costs). > > > That would be a ridiculously amount of data, and no user is going to > > wait for a client to finish parsing that amount of data without a hard > > reboot of their client. > > Over a 10gbit/s connection, transferring 64GB should take on the order > of a minute. > > ... *parsing* 4.3 billion entries is a different matter, of course ;) > > OK! > > > The only change that would be reasonable here is to reduce the size of > > this field 16 to bits, rather than increasing it (but then we lose > > alignment, so that's also not a good idea) > > Putting aside alignment even, I don't understand why reducing "count" to > uint16_t would be reasonable. With the current 32-bit-only block > descriptor, we already need to write loops in libnbd clients, because we > can't cover the entire remote image in one API call [*]. If I understood > Eric right earlier, the 64-bit extensions were supposed to remedy that > -- but as it stands, clients will still need loops ("chunking") around > block status fetching; is that right?While the larger extents reduce the need for looping, it does not entirely eliminate it. For example, just because the server can now tell you that an image is entirely data in just one reply does not mean that it will actually do so - qemu in particular limits block status of a qcow2 file to reporting just one cluster at a time for consistency reasons, where even if you use the maximum size of 2M clusters, you can never get more than (2M/16)*2M = 256G status reported in a single request. But without 64-bit lengths, you are guaranteed to need at least 2 (and possibly more) round trips to map out an entire 6G image, while with 64-bit lengths, you can often get an entire image map in one round trip. Reducing 'count' to uint16_t for 64-bit responses would be possible if we wanted to hard-code a server limit of never sending more than 64k extents in one go; but above, I argued that existing servers currently do exceed that cap even for 32-bit responses.> > Let me emphasize again that I'm not challenging the spec; also I don't > secretly wish for the patches to do more than required by the spec. I > guess I'd like to understand the intent of the spec, plus the > consequences for client applications. > > [*] References (in this order!): > > - https://github.com/libguestfs/virt-v2v/commit/27c056cdc6aa0 > - https://gitlab.com/nbdkit/libnbd/-/commit/0e714a6e06e6 > - https://github.com/libguestfs/virt-v2v/commit/a2afed32d8b1f > > To be less cryptic, the first commit added "chunked" block status > fetching to virt-v2v. Because our OCaml language libnbd mappings weren't > proper at the time, that loop could move backwards and get stuck. We > fixed that in the second commit (regarding the bindings) and then > adapted virt-v2v's loop to the fixed bindings in the thirds commit. I've > been hoping that such complexities could be eliminated in the future by > virtue of not having to "chunk" the block status fetching. >We'll always have to deal with servers that send shorter replies than we asked for, and where consecutive replies may have the same status. The best the spec was able to do was recommend that the server return as much as it can, and without consecutive status, where feasible.> ( > > BTW I'm foreseeing a problem: if the extended block descriptor can > provide an unsigned 64-bit length, we're going to have trouble exposing > that in OCaml, because OCaml only has signed 64-bit integers. So that's > going to reproduce the same issue, only for OCaml callers of the *new* API.At one point, I was playing with an idea to add a patch to the NBD spec to REQUIRE that an export be capped at 2^63-1 bytes (that is, the maximum of 'off_t'). It doesn't change any existing implementations, and actually frees us up to use signed 64-bit numbers where negative values are indeed error cases. I'll probably try to revisit my thoughts on that patch front, but don't think it holds up this series.> > I can see Eric's series includes patches like "ocaml: Add example for > 64-bit extents" -- I've not looked at those yet; for now I'm just > wondering what tricks we might need in the bindings generator. The > method seen in the "middle patch" above won't work; we don't have a > native OCaml "i128" type for example that we could use as an escape > hatch, for representing C's uint64_t.I _did_ go with the (currently implicit) assumption that no server will ever expose larger than off_t can represent, and therefore a signed 64-bit size is good enough. Flags has to be unsigned, but flags is representing something different than image size. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2023-Jun-07 10:00 UTC
[Libguestfs] [libnbd PATCH v3 03/22] protocol: Add definitions for extended headers
On Tue, May 30, 2023 at 05:48:25PM +0200, Laszlo Ersek wrote:> BTW I'm foreseeing a problem: if the extended block descriptor can > provide an unsigned 64-bit length, we're going to have trouble exposing > that in OCaml, because OCaml only has signed 64-bit integers. So that's > going to reproduce the same issue, only for OCaml callers of the *new* API. > > I can see Eric's series includes patches like "ocaml: Add example for > 64-bit extents" -- I've not looked at those yet; for now I'm just > wondering what tricks we might need in the bindings generator. The > method seen in the "middle patch" above won't work; we don't have a > native OCaml "i128" type for example that we could use as an escape > hatch, for representing C's uint64_t.I think that's OK because disk sizes are already limited to 2^63 - 1 by the kernel (and for qemu even less than that). The OCaml bindings return a (signed) int64 for NBD.get_size. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com nbdkit - Flexible, fast NBD server with plugins https://gitlab.com/nbdkit/nbdkit