Laszlo Ersek
2023-Jun-01 09:04 UTC
[Libguestfs] [libnbd PATCH v3 05/22] states: Prepare to receive 64-bit replies
On 5/25/23 15:00, Eric Blake wrote:> Support receiving headers for 64-bit replies if extended headers were > negotiated. We already insist that the server not send us too much > payload in one reply, so we can exploit that and merge the 64-bit > length back into a normalized 32-bit field for the rest of the payload > length calculations. The NBD protocol specifically documents that > extended mode takes precedence over structured replies, and that there > are no simple replies in extended mode. We can also take advantage > that the handle field is in the same offset in all the various reply > types.(1) We might want to replace "handle" with "cookie" at this point.> > Note that if we negotiate extended headers, but a non-compliant server > replies with a non-extended header, this patch will stall waiting for > the server to send more bytesAh, yes. If we negotiate extended headers, we set "rlen" to "sizeof extended" in REPLY.START, which is larger than either of "sizeof simple" and "sizeof structured". Therefore we'll get stuck in REPLY.RECV_REPLY. Correct?> rather than noticing that the magic > number is wrong (for aio operations, you'll get a magic number > mismatch once you send a second command that elicits a reply; but for > blocking operations, we basically deadlock). The easy alternative > would be to read just the first 4 bytes of magic, then determine how > many more bytes to expect; but that would require more states and > syscalls, and not worth it since the typical server will be compliant.Not liking this "not worth it" argument. But...> The other alternative is what the next patch implements: teaching > REPLY.RECV_REPLY to handle short reads that were at least long enough > to transmit magic to specifically look for magic number mismatch.... that sounds OK!> > At this point, h->extended_headers is permanently false (we can't > enable it until all other aspects of the protocol have likewise been > converted). > > Signed-off-by: Eric Blake <eblake at redhat.com> > --- > lib/internal.h | 1 + > generator/states-reply-structured.c | 52 +++++++++++++++++++---------- > generator/states-reply.c | 34 ++++++++++++------- > 3 files changed, 58 insertions(+), 29 deletions(-)Probably best to reorder the files in this patch for review:> > diff --git a/lib/internal.h b/lib/internal.h > index 8a5f93d4..e4e21a4d 100644 > --- a/lib/internal.h > +++ b/lib/internal.h > @@ -243,6 +243,7 @@ struct nbd_handle { > union { > struct nbd_simple_reply simple; > struct nbd_structured_reply structured; > + struct nbd_extended_reply extended; > } hdr; > union { > struct nbd_structured_reply_offset_data offset_data;Seems OK.> diff --git a/generator/states-reply.c b/generator/states-reply.c > index 31e4bd2f..4e9f2dde 100644 > --- a/generator/states-reply.c > +++ b/generator/states-reply.c > @@ -62,15 +62,19 @@ REPLY.START: > */ > ssize_t r; > > - /* We read all replies initially as if they are simple replies, but > - * check the magic in CHECK_SIMPLE_OR_STRUCTURED_REPLY below. > - * This works because the structured_reply header is larger. > + /* With extended headers, there is only one size to read, so we can do it > + * all in one syscall. But for older structured replies, we don't know if > + * we have a simple or structured reply until we read the magic number, > + * requiring a two-part read with CHECK_SIMPLE_OR_STRUCTURED_REPLY below. > */ > assert (h->reply_cmd == NULL); > assert (h->rlen == 0); > > h->rbuf = &h->sbuf.reply.hdr; > - h->rlen = sizeof h->sbuf.reply.hdr.simple; > + if (h->extended_headers) > + h->rlen = sizeof h->sbuf.reply.hdr.extended; > + else > + h->rlen = sizeof h->sbuf.reply.hdr.simple; > > r = h->sock->ops->recv (h, h->sock, h->rbuf, h->rlen); > if (r == -1) {The comment "This works because the structured_reply header is larger" is being removed, which makes the non-ext branch even less comprehensible than before. (2) Can we add the STATIC_ASSERT() here, stating that "sizeof simple" is smaller than "sizeof structured"?> @@ -116,16 +120,22 @@ REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY: > uint64_t cookie; > > magic = be32toh (h->sbuf.reply.hdr.simple.magic); > - if (magic == NBD_SIMPLE_REPLY_MAGIC) { > + switch (magic) { > + case NBD_SIMPLE_REPLY_MAGIC: > + if (h->extended_headers) > + goto invalid; > SET_NEXT_STATE (%SIMPLE_REPLY.START); > - } > - else if (magic == NBD_STRUCTURED_REPLY_MAGIC) { > + break; > + case NBD_STRUCTURED_REPLY_MAGIC: > + case NBD_EXTENDED_REPLY_MAGIC: > + if ((magic == NBD_STRUCTURED_REPLY_MAGIC) == h->extended_headers)Heh, I love this ((a == b) == c) "equivalence" form! :)> + goto invalid; > SET_NEXT_STATE (%STRUCTURED_REPLY.START); > - } > - else { > - /* We've probably lost synchronization. */ > - SET_NEXT_STATE (%.DEAD); > - set_error (0, "invalid reply magic 0x%" PRIx32, magic); > + break; > + default: > + invalid: > + SET_NEXT_STATE (%.DEAD); /* We've probably lost synchronization. */ > + set_error (0, "invalid or unexpected reply magic 0x%" PRIx32, magic); > #if 0 /* uncomment to see desynchronized data */ > nbd_internal_hexdump (&h->sbuf.reply.hdr.simple, > sizeof (h->sbuf.reply.hdr.simple),My apologies, but I find *this* placement of "invalid" terrible. I thought the "goto invalid" statements were jumping to the end of the function. Now I see they jump effectively to another case label. (3) How about this (on top of your patch, to be squashed): /--------------------- | diff --git a/generator/states-reply.c b/generator/states-reply.c | index 4e9f2ddeb567..ea7bd4e7db91 100644 | --- a/generator/states-reply.c | +++ b/generator/states-reply.c | @@ -133,15 +133,8 @@ REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY: | SET_NEXT_STATE (%STRUCTURED_REPLY.START); | break; | default: | - invalid: | - SET_NEXT_STATE (%.DEAD); /* We've probably lost synchronization. */ | - set_error (0, "invalid or unexpected reply magic 0x%" PRIx32, magic); | -#if 0 /* uncomment to see desynchronized data */ | - nbd_internal_hexdump (&h->sbuf.reply.hdr.simple, | - sizeof (h->sbuf.reply.hdr.simple), | - stderr); | -#endif | - return 0; | + /* We've probably lost synchronization. */ | + goto invalid: | } | | /* NB: This works for both simple and structured replies because the | @@ -159,6 +152,16 @@ REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY: | h->reply_cmd = cmd; | return 0; | | +invalid: | + SET_NEXT_STATE (%.DEAD); | + set_error (0, "invalid or unexpected reply magic 0x%" PRIx32, magic); | +#if 0 /* uncomment to see desynchronized data */ | + nbd_internal_hexdump (&h->sbuf.reply.hdr.simple, | + sizeof (h->sbuf.reply.hdr.simple), | + stderr); | +#endif | + return 0; | + | REPLY.FINISH_COMMAND: | struct command *prev_cmd, *cmd; | uint64_t cookie; \--------------------- ... At the same time, even with this "cleanup" for the labels, I find it relatively confusing (albeit correct, as far as I can tell!) that in REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY, we first check the magic number(s) and *then* whether we negotiated extended headers, whereas in all the other state handlers, the order (= condition nesting) is the opposite: we first check extended headers, and deal with any other objects second. This makes the assert()s in REPLY.STRUCTURED_REPLY.START especially tricky to demonstrate -- I think I've managed to do it, it looks correct, it's just hard. So that's a half- (or quarter-)hearted proposal to investigate how REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY looked if there too, "extended_headers" were the outer check. Anyway, I don't feel strongly about this. :) (4) Still in REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY, we have a comment saying /* NB: This works for both simple and structured replies because the * handle (our cookie) is stored at the same offset. */ The code under it applies to extended headers too, so the comment should be updated. (There's a similar comment in REPLY.FINISH_COMMAND, too; I'm not sure if that state is relevant for extended headers.) Then:> diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c > index df509216..c53aed7b 100644 > --- a/generator/states-reply-structured.c > +++ b/generator/states-reply-structured.c > @@ -45,14 +45,20 @@ structured_reply_in_bounds (uint64_t offset, uint32_t length, > > STATE_MACHINE { > REPLY.STRUCTURED_REPLY.START: > - /* We've only read the simple_reply. The structured_reply is longer, > - * so read the remaining part. > + /* If we have extended headers, we've already read the entire header. > + * Otherwise, we've only read enough for a simple_reply; since structured > + * replies are longer, read the remaining part. > */(5) Ah, the word "simple_reply", which this change preserves, is a leftover. It should be updated in patch#2, "internal: Refactor layout of replies in sbuf", where the "simple_reply" field is replaced with "reply.hdr.simple" in "sbuf". I guess I didn't notice this omission in patch#2 because the context there only shows the "so read the remaining part" line of the comment, and "simple_reply" is on the preceding line. And then this patch too should refer to "reply.hdr.simple", in the new comment. Alternatively, perhaps use the type name (struct tag, actually): "nbd_simple_reply".> - h->rbuf = &h->sbuf; > - h->rbuf = (char *)h->rbuf + sizeof h->sbuf.reply.hdr.simple; > - h->rlen = sizeof h->sbuf.reply.hdr.structured; > - h->rlen -= sizeof h->sbuf.reply.hdr.simple; > - SET_NEXT_STATE (%RECV_REMAINING); > + if (h->extended_headers) { > + assert (h->rbuf == sizeof h->sbuf.reply.hdr.extended + (char*)&h->sbuf); > + SET_NEXT_STATE (%CHECK); > + } > + else { > + assert (h->rbuf == sizeof h->sbuf.reply.hdr.simple + (char*)&h->sbuf); > + h->rlen = sizeof h->sbuf.reply.hdr.structured - > + sizeof h->sbuf.reply.hdr.simple; > + SET_NEXT_STATE (%RECV_REMAINING); > + } > return 0; > > REPLY.STRUCTURED_REPLY.RECV_REMAINING:This looks OK, but can we make it less verbose? How about: /---------------------- | diff --git a/lib/internal.h b/lib/internal.h | index e4e21a4d1ffa..a630b2511ff7 100644 | --- a/lib/internal.h | +++ b/lib/internal.h | @@ -240,7 +240,7 @@ struct nbd_handle { | } or; | struct nbd_export_name_option_reply export_name_reply; | struct { | - union { | + union reply_header { | struct nbd_simple_reply simple; | struct nbd_structured_reply structured; | struct nbd_extended_reply extended; | diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c | index c53aed7bb859..852cb5b6053c 100644 | --- a/generator/states-reply-structured.c | +++ b/generator/states-reply-structured.c | @@ -49,14 +49,14 @@ REPLY.STRUCTURED_REPLY.START: | * Otherwise, we've only read enough for a simple_reply; since structured | * replies are longer, read the remaining part. | */ | + union reply_header *hdr = &h->sbuf.reply.hdr; | if (h->extended_headers) { | - assert (h->rbuf == sizeof h->sbuf.reply.hdr.extended + (char*)&h->sbuf); | + assert (h->rbuf == sizeof hdr->extended + (char*)&h->sbuf); | SET_NEXT_STATE (%CHECK); | } | else { | - assert (h->rbuf == sizeof h->sbuf.reply.hdr.simple + (char*)&h->sbuf); | - h->rlen = sizeof h->sbuf.reply.hdr.structured - | - sizeof h->sbuf.reply.hdr.simple; | + assert (h->rbuf == sizeof hdr->simple + (char*)&h->sbuf); | + h->rlen = sizeof hdr->structured - sizeof hdr->simple; | SET_NEXT_STATE (%RECV_REMAINING); | } | return 0; \---------------------- Anyway, feel free to ignore this -- I can see two counter-arguments myself: - the rest of the code remains just as verbose, - one could argue that the addition sizeof hdr->simple + (char*)&h->sbuf is *more* confusing than sizeof h->sbuf.reply.hdr.simple + (char*)&h->sbuf Then:> @@ -69,11 +75,18 @@ REPLY.STRUCTURED_REPLY.RECV_REMAINING: > REPLY.STRUCTURED_REPLY.CHECK: > struct command *cmd = h->reply_cmd; > uint16_t flags, type; > - uint32_t length; > + uint64_t length; > + uint64_t offset = -1;(6) I disagree with initializing the local variable "offset" here. Below (in the rest of REPLY.STRUCTURED_REPLY.CHECK), we only read "offset" back if "extended_headers" is set. But if "extended_headers" is set, we also store a value to "offset", before the read. Initializing "offset" to -1 suggests that the code might otherwise read an indeterminate value from "offset" -- but that's not the case.> > + assert (cmd);(7) What guarantees this? Is it the lookup code at the end of REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY? But that code can set "reply_cmd" to NULL, in case the cookie is not found; and the cookie lookup there defers to FINISH for diagnosing the unassociated reply.> flags = be16toh (h->sbuf.reply.hdr.structured.flags); > type = be16toh (h->sbuf.reply.hdr.structured.type); > - length = be32toh (h->sbuf.reply.hdr.structured.length); > + if (h->extended_headers) { > + length = be64toh (h->sbuf.reply.hdr.extended.length); > + offset = be64toh (h->sbuf.reply.hdr.extended.offset); > + } > + else > + length = be32toh (h->sbuf.reply.hdr.structured.length); > > /* Reject a server that replies with too much information, but don't > * reject a single structured reply to NBD_CMD_READ on the largest > @@ -83,13 +96,18 @@ REPLY.STRUCTURED_REPLY.CHECK: > * not worth keeping the connection alive. > */ > if (length > MAX_REQUEST_SIZE + sizeof h->sbuf.reply.payload.offset_data) { > - set_error (0, "invalid server reply length %" PRIu32, length); > + set_error (0, "invalid server reply length %" PRIu64, length); > SET_NEXT_STATE (%.DEAD); > return 0; > } > + /* For convenience, we now normalize extended replies into compact, > + * doable since we validated that payload length fits in 32 bits. > + */ > + h->sbuf.reply.hdr.structured.length = length;(8) I'm very confused by this. For an extended reply, this will overwrite the "offset" field. I understand we have stolen that field already, above; but that's little solace, especially without specific comments.> > /* Skip an unexpected structured reply, including to an unknown cookie. */ > - if (cmd == NULL || !h->structured_replies) > + if (cmd == NULL || !h->structured_replies || > + (h->extended_headers && offset != cmd->offset)) > goto resync; > > switch (type) {(9) Preserving the (cmd == NULL) sub-condition, plus the reference to "unkown cookie" in the comment, looks like a logic bug to me: it can never evaluate to "true", given the assert() I'm questioning above at (7). Alternatively, the assert() is wrong. (10) The comment only talks about "unexpected structured reply", but the condition now deals with extended headers too, and I don't know how those relate to each other. What I'm trying to express is that (a) checking "structured_replies" *here*, but (b) checking "extended_headers" against magic numbers in REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY, at the same time, seems inconsistent. If we get a structured reply after *not* having negotiated them, we should be able to catch that in the exact same location -- i.e., where we check magic numbers -- where we can also realize that we're getting an extended reply in spite of *not* having negotiated *those*. In other words, the treatment differs, and I don't know why: if we get an unexpected structured reply, we skip it and "resync", whereas if we get an unexpected extended reply, we move to the DEAD state. I think it's fine to skip replies that refer to unknown cookies, but *reply format* violations should be fatal. Then, this is not a request for an update, just a comment: I don't understand why the spec introduced the "offset" field at all. It does not seem to carry additional information beyond the cookie. So we can certainly check that the response's offset matches the command's offset (the code looks OK), but the larger purpose is unclear. (And this is probably also the reason why you get away with corrupting "nbd_extended_reply.offset" at (8) -- the field is never again needed, beyond the sanity check performed here.)> @@ -168,7 +186,7 @@ REPLY.STRUCTURED_REPLY.RECV_ERROR: > SET_NEXT_STATE (%.READY); > return 0; > case 0: > - length = be32toh (h->sbuf.reply.hdr.structured.length); > + length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */ > assert (length >= sizeof h->sbuf.reply.payload.error.error.error); > assert (cmd); > > @@ -207,7 +225,7 @@ REPLY.STRUCTURED_REPLY.RECV_ERROR_MESSAGE: > SET_NEXT_STATE (%.READY); > return 0; > case 0: > - length = be32toh (h->sbuf.reply.hdr.structured.length); > + length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */ > msglen = be16toh (h->sbuf.reply.payload.error.error.len); > type = be16toh (h->sbuf.reply.hdr.structured.type); > > @@ -307,7 +325,7 @@ REPLY.STRUCTURED_REPLY.RECV_OFFSET_DATA: > SET_NEXT_STATE (%.READY); > return 0; > case 0: > - length = be32toh (h->sbuf.reply.hdr.structured.length); > + length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */ > offset = be64toh (h->sbuf.reply.payload.offset_data.offset); > > assert (cmd); /* guaranteed by CHECK */ > @@ -346,7 +364,7 @@ REPLY.STRUCTURED_REPLY.RECV_OFFSET_DATA_DATA: > SET_NEXT_STATE (%.READY); > return 0; > case 0: > - length = be32toh (h->sbuf.reply.hdr.structured.length); > + length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */ > offset = be64toh (h->sbuf.reply.payload.offset_data.offset); > > assert (cmd); /* guaranteed by CHECK */ > @@ -426,7 +444,7 @@ REPLY.STRUCTURED_REPLY.RECV_BS_HEADER: > SET_NEXT_STATE (%.READY); > return 0; > case 0: > - length = be32toh (h->sbuf.reply.hdr.structured.length); > + length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */ > > assert (cmd); /* guaranteed by CHECK */ > assert (cmd->type == NBD_CMD_BLOCK_STATUS); > @@ -460,7 +478,7 @@ REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES: > SET_NEXT_STATE (%.READY); > return 0; > case 0: > - length = be32toh (h->sbuf.reply.hdr.structured.length); > + length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */ > > assert (cmd); /* guaranteed by CHECK */ > assert (cmd->type == NBD_CMD_BLOCK_STATUS);(11) This is all fallout from (8). The commit message does document it (in the first paragraph), but I don't understand where we *benefit* from stuffing "nbd_extended_reply.length" into "nbd_structured_reply.length". Regarding downsides thereof, I can see two: - as I mentioned, "nbd_extended_reply.offset" becomes unusable, - "nbd_structured_reply.length" will no longer be big-endian but host-endian, which arguably does not match the other fields' endianness in the scratch space (= sbuf). I feel this back-stuffing brings the stashed header into an inconsistent state that is specific to a subset of the automaton's states. I'd rather introduce an entirely new field to "sbuf.reply" -- between "sbuf.reply.hdr" and "sbuf.reply.payload" --, if we really need to cache a host-endian length value, regardless of which kind of reply header we got. Thanks! Laszlo
Eric Blake
2023-Jun-01 13:00 UTC
[Libguestfs] [libnbd PATCH v3 05/22] states: Prepare to receive 64-bit replies
On Thu, Jun 01, 2023 at 11:04:05AM +0200, Laszlo Ersek wrote:> On 5/25/23 15:00, Eric Blake wrote: > > Support receiving headers for 64-bit replies if extended headers were > > negotiated. We already insist that the server not send us too much > > payload in one reply, so we can exploit that and merge the 64-bit > > length back into a normalized 32-bit field for the rest of the payload > > length calculations. The NBD protocol specifically documents that > > extended mode takes precedence over structured replies, and that there > > are no simple replies in extended mode. We can also take advantage > > that the handle field is in the same offset in all the various reply > > types. > > (1) We might want to replace "handle" with "cookie" at this point.Yep, I'm already rebasing the series to do that, now that c1df4df9 landed.> > > > > Note that if we negotiate extended headers, but a non-compliant server > > replies with a non-extended header, this patch will stall waiting for > > the server to send more bytes > > Ah, yes. If we negotiate extended headers, we set "rlen" to "sizeof > extended" in REPLY.START, which is larger than either of "sizeof simple" > and "sizeof structured". Therefore we'll get stuck in REPLY.RECV_REPLY. > Correct?Yes.> > > rather than noticing that the magic > > number is wrong (for aio operations, you'll get a magic number > > mismatch once you send a second command that elicits a reply; but for > > blocking operations, we basically deadlock). The easy alternative > > would be to read just the first 4 bytes of magic, then determine how > > many more bytes to expect; but that would require more states and > > syscalls, and not worth it since the typical server will be compliant. > > Not liking this "not worth it" argument. But... > > > The other alternative is what the next patch implements: teaching > > REPLY.RECV_REPLY to handle short reads that were at least long enough > > to transmit magic to specifically look for magic number mismatch. > > ... that sounds OK!If you haven't guessed already, I actually did hit this bug (my initial qemu patches sent the wrong reply type, and it took me several minutes to figure out why libnbd was hung), so I was also pleasantly surprised at how easy the next patch turned out to be, after first trying (and failing) to go down the rabbit hole mentioned here of adding yet more states. But, as you mentioned earlier in the series, it MAY be worth rearranging states anyways, to specifically route extended header parsing differently from structured header parsing. So I'm still playing with that, and seeing what may happen to this patch as fallout.> > > > > At this point, h->extended_headers is permanently false (we can't > > enable it until all other aspects of the protocol have likewise been > > converted). > > > > Signed-off-by: Eric Blake <eblake at redhat.com> > > --- > > lib/internal.h | 1 + > > generator/states-reply-structured.c | 52 +++++++++++++++++++---------- > > generator/states-reply.c | 34 ++++++++++++------- > > 3 files changed, 58 insertions(+), 29 deletions(-) > > Probably best to reorder the files in this patch for review:I see what you mean: because of the state hierarchy, it is probably worth tweaking the git orderfile to favor files nearer the top of the state tree first, rather than strict alphabetical ordering of *.c. I may just push a patch for that without needing review, as it doesn't affect library semantics at all. ...> > +++ b/generator/states-reply.c > > @@ -62,15 +62,19 @@ REPLY.START: > > */ > > ssize_t r; > > > > - /* We read all replies initially as if they are simple replies, but > > - * check the magic in CHECK_SIMPLE_OR_STRUCTURED_REPLY below. > > - * This works because the structured_reply header is larger. > > + /* With extended headers, there is only one size to read, so we can do it > > + * all in one syscall. But for older structured replies, we don't know if > > + * we have a simple or structured reply until we read the magic number, > > + * requiring a two-part read with CHECK_SIMPLE_OR_STRUCTURED_REPLY below. > > */ > > assert (h->reply_cmd == NULL); > > assert (h->rlen == 0); > > > > h->rbuf = &h->sbuf.reply.hdr; > > - h->rlen = sizeof h->sbuf.reply.hdr.simple; > > + if (h->extended_headers) > > + h->rlen = sizeof h->sbuf.reply.hdr.extended; > > + else > > + h->rlen = sizeof h->sbuf.reply.hdr.simple; > > > > r = h->sock->ops->recv (h, h->sock, h->rbuf, h->rlen); > > if (r == -1) { > > The comment "This works because the structured_reply header is larger" > is being removed, which makes the non-ext branch even less > comprehensible than before. > > (2) Can we add the STATIC_ASSERT() here, stating that "sizeof simple" is > smaller than "sizeof structured"?Yep, I'm already rebasing onto some additional asserts along those lines, based on your reviews earlier in the series.> > > @@ -116,16 +120,22 @@ REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY: > > uint64_t cookie; > > > > magic = be32toh (h->sbuf.reply.hdr.simple.magic); > > - if (magic == NBD_SIMPLE_REPLY_MAGIC) { > > + switch (magic) { > > + case NBD_SIMPLE_REPLY_MAGIC: > > + if (h->extended_headers) > > + goto invalid; > > SET_NEXT_STATE (%SIMPLE_REPLY.START); > > - } > > - else if (magic == NBD_STRUCTURED_REPLY_MAGIC) { > > + break; > > + case NBD_STRUCTURED_REPLY_MAGIC: > > + case NBD_EXTENDED_REPLY_MAGIC: > > + if ((magic == NBD_STRUCTURED_REPLY_MAGIC) == h->extended_headers) > > Heh, I love this ((a == b) == c) "equivalence" form! :)I still do a double-take every time I see 'a < b < c' in languages (like python) that support it as shorthand for 'a < b && b < c'. Even weirder is when you see 'a < b > c'. But yes, C's non-transitive =can be a surprise for the uninitiated.> > > + goto invalid; > > SET_NEXT_STATE (%STRUCTURED_REPLY.START); > > - } > > - else { > > - /* We've probably lost synchronization. */ > > - SET_NEXT_STATE (%.DEAD); > > - set_error (0, "invalid reply magic 0x%" PRIx32, magic); > > + break; > > + default: > > + invalid: > > + SET_NEXT_STATE (%.DEAD); /* We've probably lost synchronization. */ > > + set_error (0, "invalid or unexpected reply magic 0x%" PRIx32, magic); > > #if 0 /* uncomment to see desynchronized data */ > > nbd_internal_hexdump (&h->sbuf.reply.hdr.simple, > > sizeof (h->sbuf.reply.hdr.simple), > > My apologies, but I find *this* placement of "invalid" terrible. I > thought the "goto invalid" statements were jumping to the end of the > function. Now I see they jump effectively to another case label. > > (3) How about this (on top of your patch, to be squashed):That part of the patch pre-dates other reviews I've seen from you on the same topic (this patch series has been percolating on my local builds for a long time), so I'm not surprised by your request, and I'm happy to squash in your improvement. I may have copied from other similar code in the generator/states-*.c files, if so, I'll probably do a separate cleanup patch first.> > ... At the same time, even with this "cleanup" for the labels, I find it > relatively confusing (albeit correct, as far as I can tell!) that in > REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY, we first check the magic > number(s) and *then* whether we negotiated extended headers, whereas in > all the other state handlers, the order (= condition nesting) is the > opposite: we first check extended headers, and deal with any other > objects second. This makes the assert()s in REPLY.STRUCTURED_REPLY.START > especially tricky to demonstrate -- I think I've managed to do it, it > looks correct, it's just hard. So that's a half- (or quarter-)hearted > proposal to investigate how REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY > looked if there too, "extended_headers" were the outer check. Anyway, I > don't feel strongly about this. :) > > (4) Still in REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY, we have a comment > saying > > /* NB: This works for both simple and structured replies because the > * handle (our cookie) is stored at the same offset. > */ > > The code under it applies to extended headers too, so the comment should > be updated. > > (There's a similar comment in REPLY.FINISH_COMMAND, too; I'm not sure if > that state is relevant for extended headers.)Indeed, when rebasing, I need to remember to grep (to cover comments) rather than merely relying on the compiler (which only covers code) for renames.> > Then: > > > diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c > > index df509216..c53aed7b 100644 > > --- a/generator/states-reply-structured.c > > +++ b/generator/states-reply-structured.c > > @@ -45,14 +45,20 @@ structured_reply_in_bounds (uint64_t offset, uint32_t length, > > > > STATE_MACHINE { > > REPLY.STRUCTURED_REPLY.START: > > - /* We've only read the simple_reply. The structured_reply is longer, > > - * so read the remaining part. > > + /* If we have extended headers, we've already read the entire header. > > + * Otherwise, we've only read enough for a simple_reply; since structured > > + * replies are longer, read the remaining part. > > */ > > (5) Ah, the word "simple_reply", which this change preserves, is a > leftover. It should be updated in patch#2, "internal: Refactor layout of > replies in sbuf", where the "simple_reply" field is replaced with > "reply.hdr.simple" in "sbuf". > > I guess I didn't notice this omission in patch#2 because the context > there only shows the "so read the remaining part" line of the comment, > and "simple_reply" is on the preceding line. > > And then this patch too should refer to "reply.hdr.simple", in the new > comment. > > Alternatively, perhaps use the type name (struct tag, actually): > "nbd_simple_reply".Will fix in the appropriate place(s).> > > - h->rbuf = &h->sbuf; > > - h->rbuf = (char *)h->rbuf + sizeof h->sbuf.reply.hdr.simple; > > - h->rlen = sizeof h->sbuf.reply.hdr.structured; > > - h->rlen -= sizeof h->sbuf.reply.hdr.simple; > > - SET_NEXT_STATE (%RECV_REMAINING); > > + if (h->extended_headers) { > > + assert (h->rbuf == sizeof h->sbuf.reply.hdr.extended + (char*)&h->sbuf); > > + SET_NEXT_STATE (%CHECK); > > + } > > + else { > > + assert (h->rbuf == sizeof h->sbuf.reply.hdr.simple + (char*)&h->sbuf); > > + h->rlen = sizeof h->sbuf.reply.hdr.structured - > > + sizeof h->sbuf.reply.hdr.simple; > > + SET_NEXT_STATE (%RECV_REMAINING); > > + } > > return 0; > > > > REPLY.STRUCTURED_REPLY.RECV_REMAINING: > > This looks OK, but can we make it less verbose? How about: > > /---------------------- > | diff --git a/lib/internal.h b/lib/internal.h > | index e4e21a4d1ffa..a630b2511ff7 100644 > | --- a/lib/internal.h > | +++ b/lib/internal.h > | @@ -240,7 +240,7 @@ struct nbd_handle { > | } or; > | struct nbd_export_name_option_reply export_name_reply; > | struct { > | - union { > | + union reply_header {Oh cool. I keep forgetting that you can declare a type name usable at the top level even while declaring a nested struct. Yeah, doing that probably has some nice line length improvements.> | struct nbd_simple_reply simple; > | struct nbd_structured_reply structured; > | struct nbd_extended_reply extended; > | diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c > | index c53aed7bb859..852cb5b6053c 100644 > | --- a/generator/states-reply-structured.c > | +++ b/generator/states-reply-structured.c > | @@ -49,14 +49,14 @@ REPLY.STRUCTURED_REPLY.START: > | * Otherwise, we've only read enough for a simple_reply; since structured > | * replies are longer, read the remaining part. > | */ > | + union reply_header *hdr = &h->sbuf.reply.hdr; > | if (h->extended_headers) { > | - assert (h->rbuf == sizeof h->sbuf.reply.hdr.extended + (char*)&h->sbuf); > | + assert (h->rbuf == sizeof hdr->extended + (char*)&h->sbuf); > | SET_NEXT_STATE (%CHECK); > | } > | else { > | - assert (h->rbuf == sizeof h->sbuf.reply.hdr.simple + (char*)&h->sbuf); > | - h->rlen = sizeof h->sbuf.reply.hdr.structured - > | - sizeof h->sbuf.reply.hdr.simple; > | + assert (h->rbuf == sizeof hdr->simple + (char*)&h->sbuf); > | + h->rlen = sizeof hdr->structured - sizeof hdr->simple; > | SET_NEXT_STATE (%RECV_REMAINING); > | } > | return 0; > \---------------------- > > Anyway, feel free to ignore this -- I can see two counter-arguments > myself: > > - the rest of the code remains just as verbose, >Not necessarily - once I add in more STATIC_ASSERTs, being able to refer to individual nested types without having to go all the way through struct nbd_handle or union sbuf will be shorter.> - one could argue that the addition > > sizeof hdr->simple + (char*)&h->sbuf > > is *more* confusing than > > sizeof h->sbuf.reply.hdr.simple + (char*)&h->sbufThat argument still remains - it's easier to see that we expect a particular offset from sbuf when the use of sbuf in the offset calculation is not hidden several lines away in the intermediate initialization. I'll think about it for this line.> > Then: > > > @@ -69,11 +75,18 @@ REPLY.STRUCTURED_REPLY.RECV_REMAINING: > > REPLY.STRUCTURED_REPLY.CHECK: > > struct command *cmd = h->reply_cmd; > > uint16_t flags, type; > > - uint32_t length; > > + uint64_t length; > > + uint64_t offset = -1; > > (6) I disagree with initializing the local variable "offset" here. > > Below (in the rest of REPLY.STRUCTURED_REPLY.CHECK), we only read > "offset" back if "extended_headers" is set. But if "extended_headers" is > set, we also store a value to "offset", before the read. > > Initializing "offset" to -1 suggests that the code might otherwise read > an indeterminate value from "offset" -- but that's not the case.I'll have to double-check; I thought that offset was read even for structured replies (where there really isn't an offset read from the wire), but my rebasing efforts may have changed that.> > > > > + assert (cmd); > > (7) What guarantees this? > > Is it the lookup code at the end of > REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY? > > But that code can set "reply_cmd" to NULL, in case the cookie is not > found; and the cookie lookup there defers to FINISH for diagnosing the > unassociated reply.Hmmm. Reading ahead, I see what you wrote in (9). This may have been something I copied from other states, but I'll have to double check whether it still makes sense.> > > flags = be16toh (h->sbuf.reply.hdr.structured.flags); > > type = be16toh (h->sbuf.reply.hdr.structured.type); > > - length = be32toh (h->sbuf.reply.hdr.structured.length); > > + if (h->extended_headers) { > > + length = be64toh (h->sbuf.reply.hdr.extended.length); > > + offset = be64toh (h->sbuf.reply.hdr.extended.offset); > > + } > > + else > > + length = be32toh (h->sbuf.reply.hdr.structured.length); > > > > /* Reject a server that replies with too much information, but don't > > * reject a single structured reply to NBD_CMD_READ on the largest > > @@ -83,13 +96,18 @@ REPLY.STRUCTURED_REPLY.CHECK: > > * not worth keeping the connection alive. > > */ > > if (length > MAX_REQUEST_SIZE + sizeof h->sbuf.reply.payload.offset_data) { > > - set_error (0, "invalid server reply length %" PRIu32, length); > > + set_error (0, "invalid server reply length %" PRIu64, length); > > SET_NEXT_STATE (%.DEAD); > > return 0; > > } > > + /* For convenience, we now normalize extended replies into compact, > > + * doable since we validated that payload length fits in 32 bits. > > + */ > > + h->sbuf.reply.hdr.structured.length = length; > > (8) I'm very confused by this. For an extended reply, this will > overwrite the "offset" field.At one point, I considered normalizing in the opposite direction (take structured replies and widen them into the extended header form); it required touching more lines of code, so I didn't pursue it at the time. But I can see how compressing things down (intentionally throwing away information we will no longer reference) can be confusing without good comments, so at a minimum, I will be adding comments, if not outright going with the widening rather than narrowing approach.> > I understand we have stolen that field already, above; but that's little > solace, especially without specific comments. > > > > > /* Skip an unexpected structured reply, including to an unknown cookie. */ > > - if (cmd == NULL || !h->structured_replies) > > + if (cmd == NULL || !h->structured_replies || > > + (h->extended_headers && offset != cmd->offset)) > > goto resync; > > > > switch (type) { > > (9) Preserving the (cmd == NULL) sub-condition, plus the reference to > "unkown cookie" in the comment, looks like a logic bug to me: it can > never evaluate to "true", given the assert() I'm questioning above at > (7). > > Alternatively, the assert() is wrong.Off-hand, I think the assert is correct and I do have a dead condition here; but I'll have to convince myself of that. Since the condition is pre-existing, it may be worth a separate patch adding just the assertion and removing the 'cmd == NULL' check here, if it is correct.> > (10) The comment only talks about "unexpected structured reply", but the > condition now deals with extended headers too, and I don't know how > those relate to each other. > > What I'm trying to express is that > > (a) checking "structured_replies" *here*, but > > (b) checking "extended_headers" against magic numbers in > REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY, > > at the same time, seems inconsistent. If we get a structured reply after > *not* having negotiated them, we should be able to catch that in the > exact same location -- i.e., where we check magic numbers -- where we > can also realize that we're getting an extended reply in spite of *not* > having negotiated *those*. > > In other words, the treatment differs, and I don't know why: if we get > an unexpected structured reply, we skip it and "resync", whereas if we > get an unexpected extended reply, we move to the DEAD state. > > I think it's fine to skip replies that refer to unknown cookies, but > *reply format* violations should be fatal.I think this all stems from me trying to be as generous as possible with bad servers: if we get a wrong reply type per the protocol but it is a reply type we know how to handle, AND we haven't consumed too many bytes for the size of that reply, then we can tolerate the server's bug. If it is the wrong type, and we already read too many bytes for what the right type would be, we can't (easily) undo the fact that we read too much. If we did not negotiate extended headers, we start by reading the length of a simple header; if the magic says it was instead an extended header, we can tolerate the server's mistake by reading the rest of that packet. But I definitely see your point about consolidating the decisions to be local to one state, rather than split across two files, and aiming for consistent handling. I'm liking your idea earlier in the series about reworking the state engine to FULLY parse the header (whether simple, structured, or extended) in states-reply.c, and only then move to states-reply-structured.c if the header was structured or extended, rather than splitting the header parse across two files.> > > Then, this is not a request for an update, just a comment: I don't > understand why the spec introduced the "offset" field at all. It does > not seem to carry additional information beyond the cookie. So we can > certainly check that the response's offset matches the command's offset > (the code looks OK), but the larger purpose is unclear. (And this is > probably also the reason why you get away with corrupting > "nbd_extended_reply.offset" at (8) -- the field is never again needed, > beyond the sanity check performed here.)The evolution of that field: in my first draft, I wanted power-of-2 sizing to the reply header (both request and reply having a header of exactly 32 bytes); this left dead space in the reply packet, which I originally tried to mandate to be zero-filled. But when looking at the types again, I noticed that if the 'offset' field occupies the same relative place in the request and reply struct, a server can rewrite the reply in the same memory as it had saved for the request (I don't know that I actually implemented it that way in qemu as server, but it is possible for some other server). There's also NBD_REPLY_TYPE_OFFSET_DATA and NBD_REPLY_TYPE_OFFSET_HOLE (used in response to NBD_CMD_READ), which each return an offset field in order to handle the case where a single client read request was split across multiple server replies. When we first specified structured replies, the spec was ambiguous: if I issue NBD_CMD_READ(length=1k, offset=0) but the server replies with the pair OFFSET_DATA(length=512, offset=0) and OFFSET_HOLE(length=512, offset=512), it's obvious how to reconstruct the original buffer (read into the first 512 bytes, memset the second 512 bytes). But if I issue NBD_CMD_READ(length=1k, offset=512), should the server reply OFFSET_DATA(length=512, offset=512) and OFFSET_HOLE(length=512, offset=1k) (that is, the reply offsets are absolute to the overall image), or with OFFSET_DATA(length=512, offset=0) and OFFSET_HOLE(length=512, offset=512) (that is, the reply offsets are relative to the start of the 1k buffer of the read request)? The initial implementations of qemu and nbdkit disagreed, and the spec ended up settling in favor of the former (see nbd.git commit 587ba722). As a result, when receiving an absolute offset in a data reply, the client has to place the data into buffer + (reply_offset - request_offset) (that is, convert the server's absolute offset back into a buffer-relative offset), which requires knowing what the reqeust offset was; having the server return this offset in extended replies may make it easier to implement a client that doesn't have to store its request offset separately. Since libnbd also handles non-extended headers (and has to hang on to the request offset for those), you are right that we end up merely validating that the server's reply offset was expected, and then throwing it away as it didn't add anything further than a validation that we haven't lost sync with the server.> > > @@ -168,7 +186,7 @@ REPLY.STRUCTURED_REPLY.RECV_ERROR: > > SET_NEXT_STATE (%.READY); > > return 0; > > case 0: > > - length = be32toh (h->sbuf.reply.hdr.structured.length); > > + length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */ > > assert (length >= sizeof h->sbuf.reply.payload.error.error.error); > > assert (cmd); > > > > @@ -207,7 +225,7 @@ REPLY.STRUCTURED_REPLY.RECV_ERROR_MESSAGE: > > SET_NEXT_STATE (%.READY); > > return 0; > > case 0: > > - length = be32toh (h->sbuf.reply.hdr.structured.length); > > + length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */ > > msglen = be16toh (h->sbuf.reply.payload.error.error.len); > > type = be16toh (h->sbuf.reply.hdr.structured.type); > > > > @@ -307,7 +325,7 @@ REPLY.STRUCTURED_REPLY.RECV_OFFSET_DATA: > > SET_NEXT_STATE (%.READY); > > return 0; > > case 0: > > - length = be32toh (h->sbuf.reply.hdr.structured.length); > > + length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */ > > offset = be64toh (h->sbuf.reply.payload.offset_data.offset); > > > > assert (cmd); /* guaranteed by CHECK */ > > @@ -346,7 +364,7 @@ REPLY.STRUCTURED_REPLY.RECV_OFFSET_DATA_DATA: > > SET_NEXT_STATE (%.READY); > > return 0; > > case 0: > > - length = be32toh (h->sbuf.reply.hdr.structured.length); > > + length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */ > > offset = be64toh (h->sbuf.reply.payload.offset_data.offset); > > > > assert (cmd); /* guaranteed by CHECK */ > > @@ -426,7 +444,7 @@ REPLY.STRUCTURED_REPLY.RECV_BS_HEADER: > > SET_NEXT_STATE (%.READY); > > return 0; > > case 0: > > - length = be32toh (h->sbuf.reply.hdr.structured.length); > > + length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */ > > > > assert (cmd); /* guaranteed by CHECK */ > > assert (cmd->type == NBD_CMD_BLOCK_STATUS); > > @@ -460,7 +478,7 @@ REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES: > > SET_NEXT_STATE (%.READY); > > return 0; > > case 0: > > - length = be32toh (h->sbuf.reply.hdr.structured.length); > > + length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */ > > > > assert (cmd); /* guaranteed by CHECK */ > > assert (cmd->type == NBD_CMD_BLOCK_STATUS); > > (11) This is all fallout from (8). The commit message does document it > (in the first paragraph), but I don't understand where we *benefit* from > stuffing "nbd_extended_reply.length" into "nbd_structured_reply.length".The benefit is that all later code only has to deal with nbd_structured_reply (rather than adding yet more 'if (h->extended_reply)' in all subsequent states). Conversely, if I had widened instead of narrowed, then all later code would only need to doea with nbd_extended_reply.> > Regarding downsides thereof, I can see two: > > - as I mentioned, "nbd_extended_reply.offset" becomes unusable, > > - "nbd_structured_reply.length" will no longer be big-endian but > host-endian, which arguably does not match the other fields' > endianness in the scratch space (= sbuf). > > I feel this back-stuffing brings the stashed header into an inconsistent > state that is specific to a subset of the automaton's states. > > I'd rather introduce an entirely new field to "sbuf.reply" -- between > "sbuf.reply.hdr" and "sbuf.reply.payload" --, if we really need to cache > a host-endian length value, regardless of which kind of reply header we > got.You may be on to something. Leaving the wire reply ALWAYS in network endian order, and adding a few bytes to nbd_reply to stash a normalized host-endian decoded value, would certainly be less confusing than having to figure out "which parts of my wire reply are still network-endian vs. natively converted". It may make the overall structure a few bytes larger, but that's probably in the noise (since our nbd_handle already accomodatese maximum-length NBD strings of 4k, adding up to another 32 bytes for fully-normalized host-endian form isn't bad). I'll keep that in mind for v4 (this is not the only place that needs normalization; handling 32- vs. 64-bit block status replies also needs normalization; the tradeoff between minimal storage by normalizing in place vs. maintainability by normalizing into a copy has interesting implications). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2023-Jun-07 14:55 UTC
[Libguestfs] [libnbd PATCH v3 05/22] states: Prepare to receive 64-bit replies
On Thu, Jun 01, 2023 at 11:04:05AM +0200, Laszlo Ersek wrote:> On 5/25/23 15:00, Eric Blake wrote: > > @@ -69,11 +75,18 @@ REPLY.STRUCTURED_REPLY.RECV_REMAINING: > > REPLY.STRUCTURED_REPLY.CHECK: > > struct command *cmd = h->reply_cmd; > > uint16_t flags, type; > > - uint32_t length; > > + uint64_t length; > > + uint64_t offset = -1; > > (6) I disagree with initializing the local variable "offset" here. > > Below (in the rest of REPLY.STRUCTURED_REPLY.CHECK), we only read > "offset" back if "extended_headers" is set. But if "extended_headers" is > set, we also store a value to "offset", before the read. > > Initializing "offset" to -1 suggests that the code might otherwise read > an indeterminate value from "offset" -- but that's not the case.You may find that the compiler will give a warning. It's usually not good about dealing with the case where a variable being initialized + used depends on another variable being true. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top