Eric Blake
2022-Apr-07 21:37 UTC
[Libguestfs] [PATCH 0/2] More NBD spec prep-work before 64-bit headers
In implementing my proof-of-concept 64-bit headers, I found the following spec changes to be independent enough to post for review now. Eric Blake (2): spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length spec: Tweak description of maximum block size doc/proto.md | 87 ++++++++++++++++++++++++++++++++-------------------- 1 file changed, 53 insertions(+), 34 deletions(-) -- 2.35.1
Eric Blake
2022-Apr-07 21:37 UTC
[Libguestfs] [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
The spec was silent on how many extents a server could reply with. However, both qemu and nbdkit (the two server implementations known to have implemented the NBD_CMD_BLOCK_STATUS extension) implement a hard cap, and will truncate the amount of extents in a reply to avoid sending a client a reply larger than the maximum NBD_CMD_READ response they are willing to tolerate: When qemu first implemented NBD_CMD_BLOCK_STATUS for the base:allocation context (qemu commit e7b1948d51, Mar 2018), it behaved as if NBD_CMD_FLAG_REQ_ONE were always passed by the client, and never responded with more than one extent. Later, when adding its qemu:dirty-bitmap:XXX context extension (qemu commit 3d068aff16, Jun 2018), it added a cap to 128k extents (1M+4 bytes), and that cap was applied to base:allocation once qemu started sending multiple extents for that context as well (qemu commit fb7afc797e, Jul 2018). Qemu extents are never smaller than 512 bytes (other than an exception at the end of a file whose size is not aligned to 512), but even so, a request for just under 4G of block status could produce 8M extents, resulting in a reply of 64M if it were not capped smaller. When nbdkit first implemented NBD_CMD_BLOCK_STATUS (nbdkit 4ca66f70a5, Mar 2019), it did not impose any restriction on the number of extents in the reply chunk. But because it allows extents as small as one byte, it is easy to write a server that can amplify a client's request of status over 1M of the image into a reply over 8M in size, and it was very easy to demonstrate that a hard cap was needed to avoid crashing clients or otherwise killing the connection (a bad server impacting the client negatively); unique to nbdkit's situation is the fact that because it is designed for plugin server implementations, not capping the length of extent also posed a problem to nbdkit as the server (a client requesting large block status could cause the server to run out of memory depending on the plugin providing the server callbacks). So nbdkit enforced a bound of 1M extents (8M+4 bytes, nbdkit commit 6e0dc839ea, Jun 2019). Since the limit chosen by these two implementations is different, and since nbdkit has versions that were not limited, add this as a SHOULD NOT instead of MUST NOT constraint on servers implementing block status. It does not matter that qemu picked a smaller limit that it truncates to, since we have already documented that the server may truncate for other reasons (such as it being inefficient to collect that many extents in the first place). But documenting the limit now becomes even more important in the face of a future addition of 64-bit requests, where a client's request is no longer bounded to 4G and could thereby produce even more than 8M extents for the corner case when every 512 bytes is a new extent, if it were not for this recommendation. --- doc/proto.md | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/doc/proto.md b/doc/proto.md index bacccfa..c3c7cd9 100644 --- a/doc/proto.md +++ b/doc/proto.md @@ -1815,6 +1815,12 @@ MUST initiate a hard disconnect. the different contexts need not have the same number of extents or cumulative extent length. + Servers SHOULD NOT send more than 2^20 extents in a single reply + chunk; in other words, the maximum size of + `NBD_REPLY_TYPE_BLOCK_STATUS` should not be more than 4 + 8*2^20 + (8,388,612 bytes), even if this requires that the server truncate + the response in relation to the *length* requested by the client. + Even if the client did not use the `NBD_CMD_FLAG_REQ_ONE` flag in its request, the server MAY return fewer descriptors in the reply than would be required to fully specify the whole range of requested @@ -2181,10 +2187,13 @@ The following request types exist: multiple descriptors, and the final descriptor MAY extend beyond the original requested size if the server can determine a larger length without additional effort. On the other hand, the server MAY - return less data than requested. However the server MUST return at - least one status descriptor (and since each status descriptor has - a non-zero length, a client can always make progress on a - successful return). The server SHOULD use different *status* + return less data than requested. In particular, a server SHOULD NOT + send more than 2^20 status descriptors in a single chunk. + + However the server MUST return at least one status descriptor, + and since each status descriptor has a non-zero length, a client + can always make progress on a successful return. The server SHOULD + use different *status* values between consecutive descriptors where feasible, although the client SHOULD be prepared to handle consecutive descriptors with the same *status* value. The server SHOULD use descriptor -- 2.35.1
Eric Blake
2022-Apr-07 21:37 UTC
[Libguestfs] [PATCH 2/2] spec: Tweak description of maximum block size
Commit 9f30fedb improved the spec to allow non-payload requests that exceed any advertised maximum block size. Take this one step further by permitting the server to use NBD_EOVERFLOW as a hint to the client when a request is oversize (while permitting NBD_EINVAL for back-compat), and by rewording the text to explicitly call out that what is being advertised is the maximum payload length, not maximum block size. This becomes more important when we add 64-bit extensions, where it becomes possible to extend `NBD_CMD_BLOCK_STATUS` to have both an effect length (how much of the image does the client want status on) and a payload length (filtering the response to a subset of negotiated metadata contexts). --- doc/proto.md | 70 ++++++++++++++++++++++++++++++---------------------- 1 file changed, 40 insertions(+), 30 deletions(-) diff --git a/doc/proto.md b/doc/proto.md index c3c7cd9..e332e21 100644 --- a/doc/proto.md +++ b/doc/proto.md @@ -769,8 +769,8 @@ learn the server's constraints without committing to them. If block size constraints have not been advertised or agreed on externally, then a server SHOULD support a default minimum block size -of 1, a preferred block size of 2^12 (4,096), and a maximum block size -that is effectively unlimited (0xffffffff, or the export size if that +of 1, a preferred block size of 2^12 (4,096), and a maximum block payload +size that is at least 2^25 (33,554,432) (or the export size if that is smaller), while a client desiring maximum interoperability SHOULD constrain its requests to a minimum block size of 2^9 (512), and limit `NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum block size of @@ -815,8 +815,8 @@ the preferred block size for that export. The server MAY advertise an export size that is not an integer multiple of the preferred block size. -The maximum block size represents the maximum length that the server -is willing to handle in one request. If advertised, it MAY be +The maximum block size represents the maximum payload length that the +server is willing to handle in one request. If advertised, it MAY be something other than a power of 2, but MUST be either an integer multiple of the minimum block size or the value 0xffffffff for no inherent limit, MUST be at least as large as the smaller of the @@ -825,7 +825,20 @@ preferred block size or export size, and SHOULD be at least 2^20 MAY advertise a maximum block size that is larger than the export size, although in that case, the client MUST treat the export size as the effective maximum block size (as further constrained by a nonzero -offset). +offset). For commands that require a payload in either direction and +where the client controls the payload length (`NBD_CMD_WRITE` or +`NBD_CMD_READ`), the client MUST NOT use a length larger than the +maximum block size. For replies where the payload length is controlled +by the server (such as `NBD_CMD_BLOCK_STATUS` without the flag +`NBD_CMD_FLAG_REQ_ONE`), the server MUST NOT send a reply larger than +the maximum block size. For commands that do not require a payload +(such as `NBD_CMD_TRIM`), the client MAY request a length larger than +the maximum; the server SHOULD NOT disconnect, but MAY reply with an +`NBD_EOVERFLOW` or `NBD_EINVAL` error if the oversize request would +require more server resources than the same command operating on only +the maximum block size (such as some implementations of +`NBD_CMD_WRITE_ZEROES` without the flag `NBD_CMD_FLAG_FAST_ZERO`, or +`NBD_CMD_CACHE`). Where a transmission request can have a nonzero *offset* and/or *length* (such as `NBD_CMD_READ`, `NBD_CMD_WRITE`, or `NBD_CMD_TRIM`), @@ -833,24 +846,17 @@ the client MUST ensure that *offset* and *length* are integer multiples of any advertised minimum block size, and SHOULD use integer multiples of any advertised preferred block size where possible. For those requests, the client MUST NOT use a *length* which, when added to -*offset*, would exceed the export size. Also for NBD_CMD_READ, -NBD_CMD_WRITE, NBD_CMD_CACHE and NBD_CMD_WRITE_ZEROES (except for -when NBD_CMD_FLAG_FAST_ZERO is set), the client MUST NOT use a *length* -larger than any advertised maximum block size. -The server SHOULD report an `NBD_EINVAL` error if -the client's request is not aligned to advertised minimum block size -boundaries, or is larger than the advertised maximum block size. +*offset*, would exceed the export size. The server SHOULD report an +`NBD_EINVAL` error if the client's request is not aligned to advertised +minimum block size boundaries or would exceed the export size. + Notwithstanding any maximum block size advertised, either the server -or the client MAY initiate a hard disconnect if the payload of an -`NBD_CMD_WRITE` request or `NBD_CMD_READ` reply would be large enough -to be deemed a denial of service attack; however, for maximum -portability, any *length* less than 2^25 (33,554,432) bytes SHOULD NOT -be considered a denial of service attack (even if the advertised -maximum block size is smaller). For all other commands, where the -*length* is not reflected in the payload (such as `NBD_CMD_TRIM` or -`NBD_CMD_WRITE_ZEROES`), a server SHOULD merely fail the command with -an `NBD_EINVAL` error for a client that exceeds the maximum block size, -rather than initiating a hard disconnect. +or the client MAY initiate a hard disconnect if a payload length of +either a request or a reply would be large enough to be deemed a +denial of service attack; however, for maximum portability, any +*length* less than 2^25 (33,554,432) bytes SHOULD NOT be considered a +denial of service attack (even if the advertised maximum block size is +smaller). ## Metadata querying @@ -1592,7 +1598,7 @@ during option haggling in the fixed newstyle negotiation. - 16 bits, `NBD_INFO_BLOCK_SIZE` - 32 bits, minimum block size - 32 bits, preferred block size - - 32 bits, maximum block size + - 32 bits, maximum block payload size * `NBD_REP_META_CONTEXT` (4) @@ -1740,7 +1746,8 @@ Some chunk types can additionally be categorized by role, such as *error chunks* or *content chunks*. Each type determines how to interpret the "length" bytes of payload. If the client receives an unknown or unexpected type, other than an *error chunk*, it -MUST initiate a hard disconnect. +MUST initiate a hard disconnect. A server MUST NOT send a chunk +larger than any advertised maximum block payload size. * `NBD_REPLY_TYPE_NONE` (0) @@ -2111,11 +2118,12 @@ The following request types exist: If the server advertised `NBD_FLAG_SEND_FAST_ZERO` but `NBD_CMD_FLAG_FAST_ZERO` is not set, then the server MUST NOT fail with `NBD_ENOTSUP`, even if the operation is no faster than a - corresponding `NBD_CMD_WRITE`. Conversely, if - `NBD_CMD_FLAG_FAST_ZERO` is set, the server MUST fail quickly with - `NBD_ENOTSUP` unless the request can be serviced in less time than - a corresponding `NBD_CMD_WRITE`, and SHOULD NOT alter the contents - of the export when returning this failure. The server's + corresponding `NBD_CMD_WRITE`. Conversely, if `NBD_CMD_FLAG_FAST_ZERO` + is set, the server SHOULD NOT fail with `NBD_EOVERFLOW` regardless of + the client length, MUST fail quickly with `NBD_ENOTSUP` unless the + request can be serviced in less time than a corresponding + `NBD_CMD_WRITE`, and SHOULD NOT alter the contents of the export when + returning an `NBD_ENOTSUP` failure. The server's determination on whether to fail a fast request MAY depend on a number of factors, such as whether the request was suitably aligned, on whether the `NBD_CMD_FLAG_NO_HOLE` flag was present, @@ -2266,7 +2274,9 @@ SHOULD return `NBD_EPERM` if it receives a write or trim request on a read-only export. The server SHOULD NOT return `NBD_EOVERFLOW` except as documented in -response to `NBD_CMD_READ` when `NBD_CMD_FLAG_DF` is supported. +response to `NBD_CMD_READ` when `NBD_CMD_FLAG_DF` is supported, or when +a command without payload requests a length larger than an advertised +maximum block payload length. The server SHOULD NOT return `NBD_ENOTSUP` except as documented in response to `NBD_CMD_WRITE_ZEROES` when `NBD_CMD_FLAG_FAST_ZERO` is -- 2.35.1