Eric Blake
2019-Apr-25 18:01 UTC
[Libguestfs] [nbdkit PATCH v2 0/5] structured replies/.extents for nbd plugin
Updated based on other changes that have happened in the meantime: - rely more on cleanup.h (throughout) - split structured read for easier review (patch 2 and 3 were combined in v1) - rely on nbdkit not leaking a server's partial answer (patch 3) - add tests (patch 5) - other bug fixes I found while testing it - drop EOVERFLOW patch for now; it will be separate once upstream NBD protocol specification is clarified Eric Blake (5): nbd: Implement NBD_OPT_GO client request nbd: Refactor receive loop to prepare for structured replies nbd: Implement structured reads from server nbd: Implement .extents nbd: Test .extents plugins/nbd/nbd.c | 534 +++++++++++++++++++++++++++++++++++--- tests/Makefile.am | 3 + tests/test-nbd-extents.sh | 113 ++++++++ 3 files changed, 608 insertions(+), 42 deletions(-) create mode 100755 tests/test-nbd-extents.sh -- 2.20.1
Eric Blake
2019-Apr-25 18:01 UTC
[Libguestfs] [nbdkit PATCH v2 1/5] nbd: Implement NBD_OPT_GO client request
The NBD spec was recently patched (nbd.git commit 7827f3ae and friends) to recommend NBD_OPT_GO for baseline interoperability, with the aim of fewer servers and clients falling back to NBD_OPT_EXPORT_NAME. And since nbdkit as server recently started supporting NBD_OPT_GO (commit f7dd9799), our nbd client as plugin should take advantage of it, which involves learning how to do option haggling. The use of option haggling is also a prerequisite to teaching the nbd plugin about structured replies, which in turn will allow an implementation of .extents. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbd.c | 200 +++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 182 insertions(+), 18 deletions(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 71c1376..7f40035 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -434,6 +434,157 @@ nbd_reply (struct handle *h, int fd) return err ? -1 : 0; } +/* Receive response to @option into @reply, and consume any + payload. If @payload is non-NULL, caller must free *payload. Return + 0 on success, or -1 if communication to server is no longer + possible. */ +static int +nbd_newstyle_recv_option_reply (struct handle *h, uint32_t option, + struct fixed_new_option_reply *reply, + void **payload) +{ + CLEANUP_FREE char *buffer = NULL; + + if (payload) + *payload = NULL; + if (read_full (h->fd, reply, sizeof *reply)) { + nbdkit_error ("unable to read option reply: %m"); + return -1; + } + reply->magic = be64toh (reply->magic); + reply->option = be32toh (reply->option); + reply->reply = be32toh (reply->reply); + reply->replylen = be32toh (reply->replylen); + if (reply->magic != NBD_REP_MAGIC || reply->option != option) { + nbdkit_error ("unexpected option reply"); + return -1; + } + if (reply->replylen) { + if (reply->reply == NBD_REP_ACK) { + nbdkit_error ("NBD_REP_ACK should not have replylen %" PRId32, + reply->replylen); + return -1; + } + if (reply->replylen > 16 * 1024 * 1024) { + nbdkit_error ("option reply length is suspiciously large: %" PRId32, + reply->replylen); + return -1; + } + /* buffer is a string for NBD_REP_ERR_*; adding a NUL terminator + makes that string easier to use, without hurting other reply + types where buffer is not a string */ + buffer = malloc (reply->replylen + 1); + if (!buffer) { + nbdkit_error ("malloc: %m"); + return -1; + } + if (read_full (h->fd, buffer, reply->replylen)) { + nbdkit_error ("unable to read option reply payload: %m"); + return -1; + } + buffer[reply->replylen] = '\0'; + if (!payload) + nbdkit_debug ("ignoring option reply payload"); + else { + *payload = buffer; + buffer = NULL; + } + } + return 0; +} + +/* Attempt to negotiate structured reads, block status, and NBD_OPT_GO. + Return 1 if haggling completed, 0 if haggling failed but + NBD_OPT_EXPORT_NAME is still viable, or -1 on inability to connect. */ +static int +nbd_newstyle_haggle (struct handle *h) +{ + struct new_option opt; + uint32_t exportnamelen = htobe32 (strlen (export)); + /* For now, we make no NBD_INFO_* requests, relying on the server to + send its defaults. TODO: nbdkit should let plugins report block + sizes, at which point we should request NBD_INFO_BLOCK_SIZE and + obey any sizes set by server. */ + uint16_t nrinfos = htobe16 (0); + struct fixed_new_option_reply reply; + + /* TODO: structured reads, block status */ + + /* Try NBD_OPT_GO */ + nbdkit_debug ("trying NBD_OPT_GO"); + opt.version = htobe64 (NEW_VERSION); + opt.option = htobe32 (NBD_OPT_GO); + opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) + + sizeof nrinfos); + if (write_full (h->fd, &opt, sizeof opt) || + write_full (h->fd, &exportnamelen, sizeof exportnamelen) || + write_full (h->fd, export, strlen (export)) || + write_full (h->fd, &nrinfos, sizeof nrinfos)) { + nbdkit_error ("unable to request NBD_OPT_GO: %m"); + return -1; + } + while (1) { + CLEANUP_FREE void *buffer; + struct fixed_new_option_reply_info_export *reply_export; + uint16_t info; + + if (nbd_newstyle_recv_option_reply (h, NBD_OPT_GO, &reply, &buffer) < 0) + return -1; + switch (reply.reply) { + case NBD_REP_INFO: + /* Parse payload, but ignore all except NBD_INFO_EXPORT */ + if (reply.replylen < 2) { + nbdkit_error ("NBD_REP_INFO reply too short"); + return -1; + } + memcpy (&info, buffer, sizeof info); + info = be16toh (info); + switch (info) { + case NBD_INFO_EXPORT: + if (reply.replylen != sizeof *reply_export) { + nbdkit_error ("NBD_INFO_EXPORT reply wrong size"); + return -1; + } + reply_export = buffer; + h->size = be64toh (reply_export->exportsize); + h->flags = be16toh (reply_export->eflags); + break; + default: + nbdkit_debug ("ignoring server info %d", info); + } + break; + case NBD_REP_ACK: + /* End of replies, valid if server already sent NBD_INFO_EXPORT, + observable since h->flags must contain NBD_FLAG_HAS_FLAGS */ + assert (!buffer); + if (!h->flags) { + nbdkit_error ("server omitted NBD_INFO_EXPORT reply to NBD_OPT_GO"); + return -1; + } + nbdkit_debug ("NBD_OPT_GO complete"); + return 1; + case NBD_REP_ERR_UNSUP: + /* Special case this failure to fall back to NBD_OPT_EXPORT_NAME */ + nbdkit_debug ("server lacks NBD_OPT_GO support"); + return 0; + default: + /* Unexpected. Either the server sent a legitimate error or an + unexpected reply, but either way, we can't connect. */ + if (NBD_REP_IS_ERR (reply.reply)) + if (reply.replylen) + nbdkit_error ("server rejected NBD_OPT_GO with %s: %s", + name_of_nbd_rep (reply.reply), (char *) buffer); + else + nbdkit_error ("server rejected NBD_OPT_GO with %s", + name_of_nbd_rep (reply.reply)); + else + nbdkit_error ("server used unexpected reply %s to NBD_OPT_GO", + name_of_nbd_rep (reply.reply)); + return -1; + } + } +} + /* Create the per-connection handle. */ static void * nbd_open (int readonly) @@ -473,6 +624,7 @@ nbd_open (int readonly) } version = be64toh (old.version); if (version == OLD_VERSION) { + nbdkit_debug ("trying oldstyle connection"); if (read_full (h->fd, (char *) &old + offsetof (struct old_handshake, exportsize), sizeof old - offsetof (struct old_handshake, exportsize))) { @@ -489,6 +641,7 @@ nbd_open (int readonly) struct new_handshake_finish finish; size_t expect; + nbdkit_debug ("trying newstyle connection"); if (read_full (h->fd, &gflags, sizeof gflags)) { nbdkit_error ("unable to read global flags: %m"); goto err; @@ -500,25 +653,36 @@ nbd_open (int readonly) goto err; } - /* For now, we don't do any option haggling, but go straight into - transmission phase */ - opt.version = htobe64 (NEW_VERSION); - opt.option = htobe32 (NBD_OPT_EXPORT_NAME); - opt.optlen = htobe32 (strlen (export)); - if (write_full (h->fd, &opt, sizeof opt) || - write_full (h->fd, export, strlen (export))) { - nbdkit_error ("unable to request export '%s': %m", export); - goto err; + /* Prefer NBD_OPT_GO if possible */ + if (gflags & NBD_FLAG_FIXED_NEWSTYLE) { + int rc = nbd_newstyle_haggle (h); + if (rc < 0) + goto err; + if (!rc) + goto export_name; } - expect = sizeof finish; - if (gflags & NBD_FLAG_NO_ZEROES) - expect -= sizeof finish.zeroes; - if (read_full (h->fd, &finish, expect)) { - nbdkit_error ("unable to read new handshake: %m"); - goto err; + else { + export_name: + /* Option haggling untried or failed, use older NBD_OPT_EXPORT_NAME */ + nbdkit_debug ("trying NBD_OPT_EXPORT_NAME"); + opt.version = htobe64 (NEW_VERSION); + opt.option = htobe32 (NBD_OPT_EXPORT_NAME); + opt.optlen = htobe32 (strlen (export)); + if (write_full (h->fd, &opt, sizeof opt) || + write_full (h->fd, export, strlen (export))) { + nbdkit_error ("unable to request export '%s': %m", export); + goto err; + } + expect = sizeof finish; + if (gflags & NBD_FLAG_NO_ZEROES) + expect -= sizeof finish.zeroes; + if (read_full (h->fd, &finish, expect)) { + nbdkit_error ("unable to read new handshake: %m"); + goto err; + } + h->size = be64toh (finish.exportsize); + h->flags = be16toh (finish.eflags); } - h->size = be64toh (finish.exportsize); - h->flags = be16toh (finish.eflags); } else { nbdkit_error ("unexpected version %#" PRIx64, version); -- 2.20.1
Eric Blake
2019-Apr-25 18:01 UTC
[Libguestfs] [nbdkit PATCH v2 2/5] nbd: Refactor receive loop to prepare for structured replies
Lay the groundwork for handling structured replies, by refactoring the logic of handling a server reply to support different magic numbers, and to support deferring an error from an early chunk until the final structured chunk is received. Also, structured reads report absolute offsets, be we want to read with a relative offset into the buffer, so we have to track the original client offset of each transaction. However, this patch intentionally does not change existing semantic behavior (other than adding a sane error message when the server replies with unexpected magic) because it does not actually negotiate structured replies yet. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbd.c | 88 ++++++++++++++++++++++++++++++++++++----------- 1 file changed, 68 insertions(+), 20 deletions(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 7f40035..1d3b2a3 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -128,7 +128,9 @@ struct transaction { int fds[2]; } u; void *buf; + uint64_t offset; uint32_t count; + uint32_t err; struct transaction *next; }; @@ -138,6 +140,7 @@ struct handle { int fd; int flags; int64_t size; + bool structured; pthread_t reader; /* Prevents concurrent threads from interleaving writes to server */ @@ -216,9 +219,10 @@ nbd_mark_dead (struct handle *h) return -1; } -/* Find and remove the transaction corresponding to cookie from the list. */ +/* Find and possibly remove the transaction corresponding to cookie + from the list. */ static struct transaction * -find_trans_by_cookie (struct handle *h, uint64_t cookie) +find_trans_by_cookie (struct handle *h, uint64_t cookie, bool remove) { struct transaction **ptr; struct transaction *trans; @@ -230,7 +234,7 @@ find_trans_by_cookie (struct handle *h, uint64_t cookie) break; ptr = &trans->next; } - if (trans) + if (trans && remove) *ptr = trans->next; return trans; } @@ -287,6 +291,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, } trans->buf = rep_buf; trans->count = rep_buf ? count : 0; + trans->offset = offset; { ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); if (h->dead) @@ -298,7 +303,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, } if (nbd_request_raw (h, flags, type, offset, count, cookie, req_buf) == 0) return fd; - trans = find_trans_by_cookie (h, cookie); + trans = find_trans_by_cookie (h, cookie, true); err: err = errno; @@ -323,35 +328,77 @@ nbd_request (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset, /* Read a reply, and look up the fd corresponding to the transaction. Return the server's non-negative answer (converted to local errno - value) on success, or -1 on read failure. */ + value) on success, or -1 on read failure. If structured replies + were negotiated, fd is set to -1 if there are still more replies + expected. */ static int nbd_reply_raw (struct handle *h, int *fd) { - struct simple_reply rep; + union { + struct simple_reply simple; + struct structured_reply structured; + } rep; struct transaction *trans; - void *buf; + void *buf = NULL; uint32_t count; + int error = NBD_SUCCESS; + bool more = false; *fd = -1; - if (read_full (h->fd, &rep, sizeof rep) < 0) + /* magic and handle overlap between simple and structured replies */ + if (read_full (h->fd, &rep, sizeof rep.simple)) return nbd_mark_dead (h); - if (be32toh (rep.magic) != NBD_SIMPLE_REPLY_MAGIC) + rep.simple.magic = be32toh (rep.simple.magic); + switch (rep.simple.magic) { + case NBD_SIMPLE_REPLY_MAGIC: + nbdkit_debug ("received simple reply for cookie %#" PRIx64 ", status %s", + rep.simple.handle, + name_of_nbd_error(be32toh (rep.simple.error))); + error = be32toh (rep.simple.error); + break; + case NBD_STRUCTURED_REPLY_MAGIC: + if (!h->structured) { + nbdkit_error ("structured response without negotiation"); + return nbd_mark_dead (h); + } + /* TODO - set 'more' based on NBD_REPLY_FLAG_DONE, parse the + various reply types, etc. */ + abort (); + + default: + nbdkit_error ("received unexpected magic in reply: %#" PRIx32, + rep.simple.magic); return nbd_mark_dead (h); - nbdkit_debug ("received reply for cookie %#" PRIx64 ", status %s", - rep.handle, name_of_nbd_error(be32toh (rep.error))); - trans = find_trans_by_cookie (h, rep.handle); + } + + trans = find_trans_by_cookie (h, rep.simple.handle, !more); if (!trans) { - nbdkit_error ("reply with unexpected cookie %#" PRIx64, rep.handle); + nbdkit_error ("reply with unexpected cookie %#" PRIx64, rep.simple.handle); return nbd_mark_dead (h); } - *fd = trans->u.fds[1]; buf = trans->buf; count = trans->count; - free (trans); - switch (be32toh (rep.error)) { + if (buf && h->structured && rep.simple.magic == NBD_SIMPLE_REPLY_MAGIC) { + nbdkit_error ("simple read reply when structured was expected"); + return nbd_mark_dead (h); + } + + /* Thanks to structured replies, we must preserve an error in any + earlier chunk for replay during the final chunk. */ + if (!more) { + *fd = trans->u.fds[1]; + if (!error) + error = trans->err; + free (trans); + } + else if (error && !trans->err) + trans->err = error; + + /* Convert from wire value to local errno, and perform any final read */ + switch (error) { case NBD_SUCCESS: - if (buf && read_full (h->fd, buf, count) < 0) + if (buf && read_full (h->fd, buf, count)) return nbd_mark_dead (h); return 0; case NBD_EPERM: @@ -361,8 +408,7 @@ nbd_reply_raw (struct handle *h, int *fd) case NBD_ENOMEM: return ENOMEM; default: - nbdkit_debug ("unexpected error %d, squashing to EINVAL", - be32toh (rep.error)); + nbdkit_debug ("unexpected error %d, squashing to EINVAL", error); /* fallthrough */ case NBD_EINVAL: return EINVAL; @@ -386,7 +432,9 @@ nbd_reader (void *handle) r = nbd_reply_raw (h, &fd); if (r >= 0) { - if (write (fd, &r, sizeof r) != sizeof r) { + if (fd < 0) + nbdkit_debug ("partial reply handled, waiting for final reply"); + else if (write (fd, &r, sizeof r) != sizeof r) { nbdkit_error ("failed to write pipe: %m"); abort (); } -- 2.20.1
Eric Blake
2019-Apr-25 18:01 UTC
[Libguestfs] [nbdkit PATCH v2 3/5] nbd: Implement structured reads from server
Time to enhance the nbd plugin to request structured replies from the server. For now, deal only with structured reads. The server can now return sparse reads, even though we need to design nbdkit version 3 before we can in turn return sparse reads back to the client. In general, we have to assume the server is malicious, and so we must sanity check that it sends replies we expect, and kill the connection if we detect protocol non-compliance that may affect our ability to further communicate with the server. However, we have a choice between either implementing bookkeeping to ensure that the server sends exactly as many non-overlapping chunks as needed to cover the entire read request, or else assuming that the server was compliant. Since nbdkit was just recently patched to ensure that a plugin that does not make a full reply will merely be exposing data used in a previous read or write command, we don't bother to pre-wipe the buffer or add any reconstruction bookkeeping (if our assumption about a compliant server is wrong, the client may notice confusing results because of the stale data being replayed at server-omitted offsets, but the data corruption fault then lies in the non-compliant server). Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbd.c | 140 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 136 insertions(+), 4 deletions(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 1d3b2a3..b2b3075 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -340,9 +340,14 @@ nbd_reply_raw (struct handle *h, int *fd) } rep; struct transaction *trans; void *buf = NULL; + CLEANUP_FREE char *payload = NULL; uint32_t count; int error = NBD_SUCCESS; bool more = false; + uint32_t len = 0; /* 0 except for structured reads */ + uint64_t offset = 0; /* if len, absolute offset of structured read chunk */ + bool zero = false; /* if len, whether to read or memset */ + uint16_t errlen; *fd = -1; /* magic and handle overlap between simple and structured replies */ @@ -361,9 +366,100 @@ nbd_reply_raw (struct handle *h, int *fd) nbdkit_error ("structured response without negotiation"); return nbd_mark_dead (h); } - /* TODO - set 'more' based on NBD_REPLY_FLAG_DONE, parse the - various reply types, etc. */ - abort (); + if (read_full (h->fd, sizeof rep.simple + (char *) &rep, + sizeof rep - sizeof rep.simple)) + return nbd_mark_dead (h); + rep.structured.flags = be16toh (rep.structured.flags); + rep.structured.type = be16toh (rep.structured.type); + rep.structured.length = be32toh (rep.structured.length); + nbdkit_debug ("received structured reply %s for cookie %#" PRIx64 + ", payload length %" PRId32, + name_of_nbd_reply_type(rep.structured.type), + rep.structured.handle, rep.structured.length); + if (rep.structured.length > 64 * 1024 * 1024) { + nbdkit_error ("structured reply length is suspiciously large: %" PRId32, + rep.structured.length); + return nbd_mark_dead (h); + } + if (rep.structured.length) { + /* Special case for OFFSET_DATA in order to read tail of chunk + directly into final buffer later on */ + len = (rep.structured.type == NBD_REPLY_TYPE_OFFSET_DATA && + rep.structured.length > sizeof offset) ? sizeof offset : + rep.structured.length; + payload = malloc (len); + if (!payload) { + nbdkit_error ("reading structured reply payload: %m"); + return nbd_mark_dead (h); + } + if (read_full (h->fd, payload, len)) + return nbd_mark_dead (h); + len = 0; + } + more = !(rep.structured.flags & NBD_REPLY_FLAG_DONE); + switch (rep.structured.type) { + case NBD_REPLY_TYPE_NONE: + if (rep.structured.length) { + nbdkit_error ("NBD_REPLY_TYPE_NONE with invalid payload"); + return nbd_mark_dead (h); + } + if (more) { + nbdkit_error ("NBD_REPLY_TYPE_NONE without done flag"); + return nbd_mark_dead (h); + } + break; + case NBD_REPLY_TYPE_OFFSET_DATA: + if (rep.structured.length <= sizeof offset) { + nbdkit_error ("structured reply OFFSET_DATA too small"); + return nbd_mark_dead (h); + } + memcpy (&offset, payload, sizeof offset); + offset = be64toh (offset); + len = rep.structured.length - sizeof offset; + break; + case NBD_REPLY_TYPE_OFFSET_HOLE: + if (rep.structured.length != sizeof offset + sizeof len) { + nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); + return nbd_mark_dead (h); + } + memcpy (&offset, payload, sizeof offset); + offset = be64toh (offset); + memcpy (&len, payload, sizeof len); + len = be32toh (len); + if (!len) { + nbdkit_error ("structured reply OFFSET_HOLE length incorrect"); + return nbd_mark_dead (h); + } + zero = true; + break; + default: + if (!NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) { + nbdkit_error ("received unexpected structured reply %s", + name_of_nbd_reply_type(rep.structured.type)); + return nbd_mark_dead (h); + } + + if (rep.structured.length < sizeof error + sizeof errlen) { + nbdkit_error ("structured reply error size incorrect"); + return nbd_mark_dead (h); + } + memcpy (&errlen, payload + sizeof error, sizeof errlen); + errlen = be16toh (errlen); + if (errlen > rep.structured.length - sizeof error - sizeof errlen) { + nbdkit_error ("structured reply error message size incorrect"); + return nbd_mark_dead (h); + } + memcpy (&error, payload, sizeof error); + error = be32toh (error); + if (errlen) + nbdkit_debug ("received structured error %s with message: %.*s", + name_of_nbd_error (error), (int) errlen, + payload + sizeof error + sizeof errlen); + else + nbdkit_debug ("received structured error %s without message", + name_of_nbd_error (error)); + } + break; default: nbdkit_error ("received unexpected magic in reply: %#" PRIx32, @@ -383,6 +479,24 @@ nbd_reply_raw (struct handle *h, int *fd) nbdkit_error ("simple read reply when structured was expected"); return nbd_mark_dead (h); } + if (len) { + if (!buf) { + nbdkit_error ("structured read response to a non-read command"); + return nbd_mark_dead (h); + } + if (offset < trans->offset || offset > INT64_MAX || + offset + len > trans->offset + count) { + nbdkit_error ("structured read reply with unexpected offset/length"); + return nbd_mark_dead (h); + } + buf = (char *) buf + offset - trans->offset; + if (zero) { + memset (buf, 0, len); + buf = NULL; + } + else + count = len; + } /* Thanks to structured replies, we must preserve an error in any earlier chunk for replay during the final chunk. */ @@ -556,7 +670,25 @@ nbd_newstyle_haggle (struct handle *h) uint16_t nrinfos = htobe16 (0); struct fixed_new_option_reply reply; - /* TODO: structured reads, block status */ + nbdkit_debug ("trying NBD_OPT_STRUCTURED_REPLY"); + opt.version = htobe64 (NEW_VERSION); + opt.option = htobe32 (NBD_OPT_STRUCTURED_REPLY); + opt.optlen = htobe32 (0); + if (write_full (h->fd, &opt, sizeof opt)) { + nbdkit_error ("unable to request NBD_OPT_STRUCTURED_REPLY: %m"); + return -1; + } + if (nbd_newstyle_recv_option_reply (h, NBD_OPT_STRUCTURED_REPLY, &reply, + NULL) < 0) + return -1; + if (reply.reply == NBD_REP_ACK) { + nbdkit_debug ("structured replies enabled"); + h->structured = true; + /* TODO: block status */ + } + else { + nbdkit_debug ("structured replies disabled"); + } /* Try NBD_OPT_GO */ nbdkit_debug ("trying NBD_OPT_GO"); -- 2.20.1
Eric Blake
2019-Apr-25 18:01 UTC
[Libguestfs] [nbdkit PATCH v2 4/5] nbd: Implement .extents
Add support for calling NBD_CMD_BLOCK_STATUS on the remote server when it is supported. I could have parsed that the server's id response to NBD_OPT_SET_META_CONTEXT is the same as what later appears in NBD_REPLY_TYPE_BLOCK_STATUS, but didn't think it worth the effort as long as we expect exactly one meta context. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbd.c | 118 +++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 112 insertions(+), 6 deletions(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index b2b3075..fd1855e 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -131,6 +131,7 @@ struct transaction { uint64_t offset; uint32_t count; uint32_t err; + struct nbdkit_extents *extents; struct transaction *next; }; @@ -141,6 +142,7 @@ struct handle { int flags; int64_t size; bool structured; + bool extents; pthread_t reader; /* Prevents concurrent threads from interleaving writes to server */ @@ -270,7 +272,7 @@ nbd_request_raw (struct handle *h, uint16_t flags, uint16_t type, static int nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset, uint32_t count, const void *req_buf, - void *rep_buf) + void *rep_buf, struct nbdkit_extents *extents) { int err; struct transaction *trans; @@ -292,6 +294,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, trans->buf = rep_buf; trans->count = rep_buf ? count : 0; trans->offset = offset; + trans->extents = extents; { ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); if (h->dead) @@ -323,7 +326,7 @@ static int nbd_request (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset, uint32_t count) { - return nbd_request_full (h, flags, type, offset, count, NULL, NULL); + return nbd_request_full (h, flags, type, offset, count, NULL, NULL, NULL); } /* Read a reply, and look up the fd corresponding to the transaction. @@ -342,6 +345,9 @@ nbd_reply_raw (struct handle *h, int *fd) void *buf = NULL; CLEANUP_FREE char *payload = NULL; uint32_t count; + uint32_t id; + struct block_descriptor *extents = NULL; + size_t nextents = 0; int error = NBD_SUCCESS; bool more = false; uint32_t len = 0; /* 0 except for structured reads */ @@ -432,6 +438,23 @@ nbd_reply_raw (struct handle *h, int *fd) } zero = true; break; + case NBD_REPLY_TYPE_BLOCK_STATUS: + if (!h->extents) { + nbdkit_error ("block status response without negotiation"); + return nbd_mark_dead (h); + } + if (rep.structured.length < sizeof *extents || + rep.structured.length % sizeof *extents != sizeof id) { + nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); + return nbd_mark_dead (h); + } + nextents = rep.structured.length / sizeof *extents; + extents = (struct block_descriptor *) &payload[sizeof id]; + memcpy (&id, payload, sizeof id); + id = be32toh (id); + nbdkit_debug ("parsing %zu extents for context id %" PRId32, + nextents, id); + break; default: if (!NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) { nbdkit_error ("received unexpected structured reply %s", @@ -475,6 +498,23 @@ nbd_reply_raw (struct handle *h, int *fd) buf = trans->buf; count = trans->count; + if (nextents) { + if (!trans->extents) { + nbdkit_error ("block status response to a non-status command"); + return nbd_mark_dead (h); + } + offset = trans->offset; + for (size_t i = 0; i < nextents; i++) { + /* We rely on the fact that NBDKIT_EXTENT_* match NBD_STATE_* */ + if (nbdkit_add_extent (trans->extents, offset, + be32toh (extents[i].length), + be32toh (extents[i].status_flags)) == -1) { + error = errno; + break; + } + offset += be32toh (extents[i].length); + } + } if (buf && h->structured && rep.simple.magic == NBD_SIMPLE_REPLY_MAGIC) { nbdkit_error ("simple read reply when structured was expected"); return nbd_mark_dead (h); @@ -661,8 +701,11 @@ nbd_newstyle_recv_option_reply (struct handle *h, uint32_t option, static int nbd_newstyle_haggle (struct handle *h) { + const char *const query = "base:allocation"; struct new_option opt; uint32_t exportnamelen = htobe32 (strlen (export)); + uint32_t nrqueries = htobe32 (1); + uint32_t querylen = htobe32 (strlen (query)); /* For now, we make no NBD_INFO_* requests, relying on the server to send its defaults. TODO: nbdkit should let plugins report block sizes, at which point we should request NBD_INFO_BLOCK_SIZE and @@ -682,9 +725,47 @@ nbd_newstyle_haggle (struct handle *h) NULL) < 0) return -1; if (reply.reply == NBD_REP_ACK) { - nbdkit_debug ("structured replies enabled"); + nbdkit_debug ("structured replies enabled, trying NBD_OPT_SET_META_CONTEXT"); h->structured = true; - /* TODO: block status */ + + opt.version = htobe64 (NEW_VERSION); + opt.option = htobe32 (NBD_OPT_SET_META_CONTEXT); + opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) + + sizeof nrqueries + sizeof querylen + strlen (query)); + if (write_full (h->fd, &opt, sizeof opt) || + write_full (h->fd, &exportnamelen, sizeof exportnamelen) || + write_full (h->fd, export, strlen (export)) || + write_full (h->fd, &nrqueries, sizeof nrqueries) || + write_full (h->fd, &querylen, sizeof querylen) || + write_full (h->fd, query, strlen (query))) { + nbdkit_error ("unable to request NBD_OPT_SET_META_CONTEXT: %m"); + return -1; + } + if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, + NULL) < 0) + return -1; + if (reply.reply == NBD_REP_META_CONTEXT) { + /* Cheat: we asked for exactly one context. We could double + check that the server is replying with exactly the + "base:allocation" context, and then remember the id it tells + us to later confirm that responses to NBD_CMD_BLOCK_STATUS + match up; but in the absence of multiple contexts, it's + easier to just assume the server is compliant, and will reuse + the same id, without bothering to check further. */ + nbdkit_debug ("extents enabled"); + h->extents = true; + if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, + NULL) < 0) + return -1; + } + if (reply.reply != NBD_REP_ACK) { + if (h->extents) { + nbdkit_error ("unexpected response to set meta context"); + return -1; + } + nbdkit_debug ("ignoring meta context response %s", + name_of_nbd_rep (reply.reply)); + } } else { nbdkit_debug ("structured replies disabled"); @@ -979,6 +1060,14 @@ nbd_can_multi_conn (void *handle) return h->flags & NBD_FLAG_CAN_MULTI_CONN; } +static int +nbd_can_extents (void *handle) +{ + struct handle *h = handle; + + return h->extents; +} + /* Read data from the file. */ static int nbd_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -988,7 +1077,7 @@ nbd_pread (void *handle, void *buf, uint32_t count, uint64_t offset, int c; assert (!flags); - c = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf); + c = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf, NULL); return c < 0 ? c : nbd_reply (h, c); } @@ -1002,7 +1091,7 @@ nbd_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, assert (!(flags & ~NBDKIT_FLAG_FUA)); c = nbd_request_full (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, - NBD_CMD_WRITE, offset, count, buf, NULL); + NBD_CMD_WRITE, offset, count, buf, NULL, NULL); return c < 0 ? c : nbd_reply (h, c); } @@ -1050,6 +1139,21 @@ nbd_flush (void *handle, uint32_t flags) return c < 0 ? c : nbd_reply (h, c); } +/* Read extents of the file. */ +static int +nbd_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) +{ + struct handle *h = handle; + int c; + + assert (!(flags & ~NBDKIT_FLAG_REQ_ONE) && h->extents); + c = nbd_request_full (h, flags & NBDKIT_FLAG_REQ_ONE ? NBD_CMD_FLAG_REQ_ONE : 0, + NBD_CMD_BLOCK_STATUS, offset, count, NULL, NULL, + extents); + return c < 0 ? c : nbd_reply (h, c); +} + static struct nbdkit_plugin plugin = { .name = "nbd", .longname = "nbdkit nbd plugin", @@ -1068,11 +1172,13 @@ static struct nbdkit_plugin plugin = { .can_zero = nbd_can_zero, .can_fua = nbd_can_fua, .can_multi_conn = nbd_can_multi_conn, + .can_extents = nbd_can_extents, .pread = nbd_pread, .pwrite = nbd_pwrite, .zero = nbd_zero, .flush = nbd_flush, .trim = nbd_trim, + .extents = nbd_extents, .errno_is_preserved = 1, }; -- 2.20.1
Borrows heavily from the existing test-truncate-extents.sh (basically, adding another layer of nbdkit to prove that the nbd plugin doesn't change anything from what the truncate filter is already tested to do). Signed-off-by: Eric Blake <eblake@redhat.com> --- tests/Makefile.am | 3 + tests/test-nbd-extents.sh | 113 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 116 insertions(+) create mode 100755 tests/test-nbd-extents.sh diff --git a/tests/Makefile.am b/tests/Makefile.am index 08bdf85..4b7aa22 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -79,6 +79,7 @@ EXTRA_DIST = \ test.lua \ test-memory-largest.sh \ test-memory-largest-for-qemu.sh \ + test-nbd-extents.sh \ test-nozero.sh \ test-null-extents.sh \ test_ocaml_plugin.ml \ @@ -530,6 +531,8 @@ TESTS += \ # nbd plugin test. LIBGUESTFS_TESTS += test-nbd +TESTS += \ + test-nbd-extents.sh test_nbd_SOURCES = test-nbd.c test.h test_nbd_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS) diff --git a/tests/test-nbd-extents.sh b/tests/test-nbd-extents.sh new file mode 100755 index 0000000..76f4f47 --- /dev/null +++ b/tests/test-nbd-extents.sh @@ -0,0 +1,113 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2019 Red Hat Inc. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +# Unfortunately the output of this test depends on the PAGE_SIZE +# defined in common/sparse/sparse.c and would change (breaking the +# test) if we ever changed that definition. + +source ./functions.sh +set -e +set -x + +requires jq --version +requires qemu-img --version +requires qemu-img map --help + +out="test-nbd-extents.out" +expected="test-nbd-extents.expected" +socket="test-nbd-extents.sock" +pid1="test-nbd-extents.pid1" +pid2="test-nbd-extents.pid2" +pid3="test-nbd-extents.pid3" +pid4="test-nbd-extents.pid4" +pid5="test-nbd-extents.pid5" +files="$out $expected $socket $pid1 $pid2 $pid3 $pid4 $pid5" +rm -f $files +cleanup_fn rm -f $files + +do_test () +{ + start_nbdkit -P "$4" -U "$socket" \ + --filter=truncate \ + data data="$1" size="$2" \ + truncate="$3" + # We use jq to normalize the output and convert it to plain text. + nbdkit -U - nbd socket="$socket" \ + --run 'qemu-img map -f raw --output=json $nbd' | + jq -c '.[] | {start:.start, length:.length, data:.data, zero:.zero}' \ + > $out + rm -f "$socket" + if ! cmp $out $expected; then + echo "$0: output did not match expected data" + echo "expected:" + cat $expected + echo "output:" + cat $out + exit 1 + fi +} + +# Completely sparse disk. +cat > $expected <<'EOF' +{"start":0,"length":65536,"data":false,"zero":true} +EOF +do_test "" 1M 65536 "$pid1" + +# Completely allocated disk. +cat > $expected <<'EOF' +{"start":0,"length":32768,"data":true,"zero":false} +EOF +do_test "1 @32768 1 @65536 1 @98304 1" 128K 32768 "$pid2" + +#---------------------------------------------------------------------- +# The above are the easy cases. Now let's truncate to a larger +# size which should create a hole at the end. + +# Completely sparse disk. +cat > $expected <<'EOF' +{"start":0,"length":1048576,"data":false,"zero":true} +EOF +do_test "" 65536 1M "$pid3" + +# Completely allocated disk. +cat > $expected <<'EOF' +{"start":0,"length":512,"data":true,"zero":false} +{"start":512,"length":1048064,"data":false,"zero":true} +EOF +do_test "1" 512 1M "$pid4" + +# Zero-length plugin. Unlike nbdkit-zero-plugin, the data plugin +# advertises extents and so will behave differently. +cat > $expected <<'EOF' +{"start":0,"length":0,"data":false,"zero":false} +EOF +do_test "" 0 0 "$pid5" -- 2.20.1
Richard W.M. Jones
2019-Apr-26 16:12 UTC
Re: [Libguestfs] [nbdkit PATCH v2 5/5] nbd: Test .extents
ACK series. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Possibly Parallel Threads
- [PATCH nbdkit] tests: Cancel trap in cleanup function to avoid recursive traps.
- [PATCH v2 nbdkit 5/5] tests: Add a helper function which waits for nbdkit to start up.
- [PATCH v2 nbdkit 4/5] tests: Use a generic cleanup mechanism instead of explicit trap.
- [nbdkit PATCH] plugins: Add .can_zero callback
- [nbdkit PATCH v2 0/5] structured replies/.extents for nbd plugin