Eric Blake
2019-Apr-23 00:50 UTC
[Libguestfs] [nbdkit PATCH 0/7] Implement structured replies in nbd plugin
I'm hoping to implement .extents for the nbd plugin; this is a prerequisite. I'm not sure about patch 3 - if we like it, I'll squash it to 2, if we don't, I think we are okay just dropping it. I'm also wondering if we have to worry about malicious plugins that don't populate the entire .pread buffer in an effort to get nbdkit to expose portions of the heap; my patch 7 loses some time on extra memset()s to avoid that, although it's probably in the noise compared to the time spent talking over sockets to the remote server. Eric Blake (7): protocol: Send correct structured reply error for non-Linux protocol: Support EOVERFLOW RFC: protocol: Only send EOVERFLOW when valid nbd: Honor server global flags on little-endian protocol: Add helpers for error response handling nbd: Implement NBD_OPT_GO client request nbd: Implement structured replies common/protocol/protocol.h | 21 +- plugins/nbd/nbd.c | 434 +++++++++++++++++++++++++++++++++---- server/protocol.c | 16 +- 3 files changed, 422 insertions(+), 49 deletions(-) -- 2.20.1
Eric Blake
2019-Apr-23 00:50 UTC
[Libguestfs] [nbdkit PATCH 1/7] protocol: Send correct structured reply error for non-Linux
Commit f141228d introduced a mapping from errno values to NBD protocol error numbers, to match the fact that non-Linux systems do not necessarily share the same errno values. But we failed to honor that mapping when we added structured replies. Fixes: eaa4c6e9 Signed-off-by: Eric Blake <eblake@redhat.com> --- server/protocol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/server/protocol.c b/server/protocol.c index 3d5668f..212bed0 100644 --- a/server/protocol.c +++ b/server/protocol.c @@ -591,7 +591,7 @@ send_structured_reply_error (struct connection *conn, } /* Send the error. */ - error_data.error = htobe32 (error); + error_data.error = htobe32 (nbd_errno (error)); error_data.len = htobe16 (0); r = conn->send (conn, &error_data, sizeof error_data); if (r == -1) { -- 2.20.1
Eric Blake
2019-Apr-23 00:50 UTC
[Libguestfs] [nbdkit PATCH 2/7] protocol: Support EOVERFLOW
When the NBD spec added structured replies, it also added the EOVERFLOW error as valid for an NBD_CMD_FLAG_DF request larger than 64k if the server can't support that without fragmentation. Once the nbd plugin supports structured replies, we will need to properly pass that error through to the end client (even though our current .pread interface implies we are unlikely to ever generate the error locally). The protocol says that EOVERFLOW is only valid if the client actually sent NBD_CMD_FLAG_DF (which in turn requires that the client negotiated NBD_OPT_STRUCTURED_REPLY), so if we wanted, we could add some logic to continue to squash all other cases of EOVERFLOW to EINVAL to remain protcol-compliant; but I figured it was not worth the complications of ensuring that plugins are not setting that particular error when it is not appropriate. Signed-off-by: Eric Blake <eblake@redhat.com> --- common/protocol/protocol.h | 1 + plugins/nbd/nbd.c | 2 ++ server/protocol.c | 2 ++ 3 files changed, 5 insertions(+) diff --git a/common/protocol/protocol.h b/common/protocol/protocol.h index 57ce3d8..94dd6c5 100644 --- a/common/protocol/protocol.h +++ b/common/protocol/protocol.h @@ -230,6 +230,7 @@ extern const char *name_of_nbd_error (int); #define NBD_ENOMEM 12 #define NBD_EINVAL 22 #define NBD_ENOSPC 28 +#define NBD_EOVERFLOW 75 #define NBD_ESHUTDOWN 108 #endif /* NBDKIT_PROTOCOL_H */ diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index ae15eb7..72e833c 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -385,6 +385,8 @@ nbd_reply_raw (struct handle *h, int *fd) return EINVAL; case NBD_ENOSPC: return ENOSPC; + case NBD_EOVERFLOW: + return EOVERFLOW; case NBD_ESHUTDOWN: return ESHUTDOWN; } diff --git a/server/protocol.c b/server/protocol.c index 212bed0..a52bb56 100644 --- a/server/protocol.c +++ b/server/protocol.c @@ -348,6 +348,8 @@ nbd_errno (int error) case ESHUTDOWN: return NBD_ESHUTDOWN; #endif + case EOVERFLOW: + return NBD_EOVERFLOW; case EINVAL: default: return NBD_EINVAL; -- 2.20.1
Eric Blake
2019-Apr-23 00:50 UTC
[Libguestfs] [nbdkit PATCH 3/7] RFC: protocol: Only send EOVERFLOW when valid
Previously, we were squashing EOVERFLOW into EINVAL; continue to do so at points in the protocol where the client may not be expecting EOVERFLOW. Signed-off-by: Eric Blake <eblake@redhat.com> --- server/protocol.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/server/protocol.c b/server/protocol.c index a52bb56..0a9f73c 100644 --- a/server/protocol.c +++ b/server/protocol.c @@ -326,7 +326,7 @@ skip_over_write_buffer (int sock, size_t count) /* Convert a system errno to an NBD_E* error code. */ static int -nbd_errno (int error) +nbd_errno (int error, bool flag_df) { switch (error) { case 0: @@ -349,7 +349,9 @@ nbd_errno (int error) return NBD_ESHUTDOWN; #endif case EOVERFLOW: - return NBD_EOVERFLOW; + if (flag_df) + return NBD_EOVERFLOW; + /* fallthrough */ case EINVAL: default: return NBD_EINVAL; @@ -368,7 +370,7 @@ send_simple_reply (struct connection *conn, reply.magic = htobe32 (NBD_SIMPLE_REPLY_MAGIC); reply.handle = handle; - reply.error = htobe32 (nbd_errno (error)); + reply.error = htobe32 (nbd_errno (error, false)); r = conn->send (conn, &reply, sizeof reply); if (r == -1) { @@ -573,7 +575,8 @@ send_structured_reply_block_status (struct connection *conn, static int send_structured_reply_error (struct connection *conn, - uint64_t handle, uint16_t cmd, uint32_t error) + uint64_t handle, uint16_t cmd, uint16_t flags, + uint32_t error) { ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&conn->write_lock); struct structured_reply reply; @@ -593,7 +596,7 @@ send_structured_reply_error (struct connection *conn, } /* Send the error. */ - error_data.error = htobe32 (nbd_errno (error)); + error_data.error = htobe32 (nbd_errno (error, flags & NBD_CMD_FLAG_DF)); error_data.len = htobe16 (0); r = conn->send (conn, &error_data, sizeof error_data); if (r == -1) { @@ -737,7 +740,8 @@ protocol_recv_request_send_reply (struct connection *conn) extents); } else - return send_structured_reply_error (conn, request.handle, cmd, error); + return send_structured_reply_error (conn, request.handle, cmd, flags, + error); } else return send_simple_reply (conn, request.handle, cmd, buf, count, error); -- 2.20.1
Eric Blake
2019-Apr-23 00:50 UTC
[Libguestfs] [nbdkit PATCH 4/7] nbd: Honor server global flags on little-endian
In the nbd plugin, I forgot to swap the server's global flags into host endian order before checking which flags the server had set. As a result, on little-endian machines, we were always replying with cflags of 0 instead of the intended mirroring of the supported server flags, although fortunately it made no real difference up to now (we were never sending any option other than NBD_OPT_EXPORT_NAME, so not mirroring NBD_FLAG_FIXED_NEWSTYLE had no effect; not mirroring NBD_FLAG_NO_ZEROES merely means a longer handshake; and the NBD protocol does not yet define flag 0x100 or 0x200 to confuse us). However, once we add code for NBD_OPT_GO, it becomes essential to get it right. Fixes: 0e8e8eb1 Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 72e833c..14e6806 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -512,6 +512,7 @@ nbd_open (int readonly) nbdkit_error ("unable to read global flags: %m"); goto err; } + gflags = be16toh (gflags); cflags = htobe32(gflags & (NBD_FLAG_FIXED_NEWSTYLE | NBD_FLAG_NO_ZEROES)); if (write_full (h->fd, &cflags, sizeof cflags)) { nbdkit_error ("unable to return global flags: %m"); -- 2.20.1
Eric Blake
2019-Apr-23 00:50 UTC
[Libguestfs] [nbdkit PATCH 5/7] protocol: Add helpers for error response handling
The NBD protocol specifically sets a witness bit in NBD_REP_ERR_* and NBD_REPLY_TYPE_ERROR_* to make it possible to recognize even unknown error messages at least to the point of being able to parse out the server's optional message as a debugging aid. Make it easier to declare new errors, as well as adding *_IS_ERR macros that the nbd plugin will use in later patches for handling arbitrary server errors. Signed-off-by: Eric Blake <eblake@redhat.com> --- common/protocol/protocol.h | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/common/protocol/protocol.h b/common/protocol/protocol.h index 94dd6c5..c27104c 100644 --- a/common/protocol/protocol.h +++ b/common/protocol/protocol.h @@ -108,16 +108,19 @@ extern const char *name_of_nbd_opt (int); #define NBD_OPT_LIST_META_CONTEXT 9 #define NBD_OPT_SET_META_CONTEXT 10 +#define NBD_REP_ERR(val) (0x80000000 | (val)) +#define NBD_REP_IS_ERR(val) (!!((val) & 0x80000000)) + extern const char *name_of_nbd_rep (int); #define NBD_REP_ACK 1 #define NBD_REP_SERVER 2 #define NBD_REP_INFO 3 #define NBD_REP_META_CONTEXT 4 -#define NBD_REP_ERR_UNSUP 0x80000001 -#define NBD_REP_ERR_POLICY 0x80000002 -#define NBD_REP_ERR_INVALID 0x80000003 -#define NBD_REP_ERR_PLATFORM 0x80000004 -#define NBD_REP_ERR_TLS_REQD 0x80000005 +#define NBD_REP_ERR_UNSUP NBD_REP_ERR (1) +#define NBD_REP_ERR_POLICY NBD_REP_ERR (2) +#define NBD_REP_ERR_INVALID NBD_REP_ERR (3) +#define NBD_REP_ERR_PLATFORM NBD_REP_ERR (4) +#define NBD_REP_ERR_TLS_REQD NBD_REP_ERR (5) extern const char *name_of_nbd_info (int); #define NBD_INFO_EXPORT 0 @@ -195,14 +198,17 @@ struct structured_reply_error { extern const char *name_of_nbd_reply_flag (int); #define NBD_REPLY_FLAG_DONE (1<<0) +#define NBD_REPLY_TYPE_ERR(val) ((1<<15) | (val)) +#define NBD_REPLY_TYPE_IS_ERR(val) (!!((val) & (1<<15))) + /* Structured reply types. */ extern const char *name_of_nbd_reply_type (int); #define NBD_REPLY_TYPE_NONE 0 #define NBD_REPLY_TYPE_OFFSET_DATA 1 #define NBD_REPLY_TYPE_OFFSET_HOLE 2 #define NBD_REPLY_TYPE_BLOCK_STATUS 5 -#define NBD_REPLY_TYPE_ERROR ((1<<15) + 1) -#define NBD_REPLY_TYPE_ERROR_OFFSET ((1<<15) + 2) +#define NBD_REPLY_TYPE_ERROR NBD_REPLY_TYPE_ERR (1) +#define NBD_REPLY_TYPE_ERROR_OFFSET NBD_REPLY_TYPE_ERR (2) /* NBD commands. */ extern const char *name_of_nbd_cmd (int); -- 2.20.1
Eric Blake
2019-Apr-23 00:50 UTC
[Libguestfs] [nbdkit PATCH 6/7] nbd: Implement NBD_OPT_GO client request
The NBD spec was recently patched (nbd.git commit 7827f3ae and friends) to require NBD_OPT_GO for baseline interoperability, with the aim of fewer servers and clients falling back to NBD_OPT_EXPORT_NAME. And since nbdkit as server recently started supporting NBD_OPT_GO (commit f7dd9799), our nbd client as plugin should take advantage of it. This patch is a prerequisite to teaching the nbd plugin about structured replies, which in turn will allow an implementation of .extents. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbd.c | 201 +++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 183 insertions(+), 18 deletions(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 14e6806..313bf59 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -453,6 +453,158 @@ nbd_reply (struct handle *h, int fd) return err ? -1 : 0; } +static int +nbd_newstyle_recv_option_reply (struct handle *h, uint32_t option, + struct fixed_new_option_reply *reply, + char **payload) +{ + char *buffer; + + if (payload) + *payload = NULL; + if (read_full (h->fd, reply, sizeof *reply)) { + nbdkit_error ("unable to read option reply: %m"); + return -1; + } + reply->magic = be64toh (reply->magic); + reply->option = be32toh (reply->option); + reply->reply = be32toh (reply->reply); + reply->replylen = be32toh (reply->replylen); + if (reply->magic != NBD_REP_MAGIC || reply->option != option) { + nbdkit_error ("unexpected option reply"); + return -1; + } + if (reply->replylen) { + if (reply->reply == NBD_REP_ACK) { + nbdkit_error ("NBD_REP_ACK should not have replylen %" PRId32, + reply->replylen); + return -1; + } + if (reply->replylen > 16 * 1024 * 1024) { + nbdkit_error ("option reply length is suspiciously large: %" PRId32, + reply->replylen); + return -1; + } + /* buffer is a string for NBD_REP_ERR_*; adding a NUL terminator + makes that string easier to use, without hurting other reply + types where buffer is not a string */ + buffer = malloc (reply->replylen + 1); + if (!buffer) { + nbdkit_error ("malloc: %m"); + return -1; + } + if (read_full (h->fd, buffer, reply->replylen)) { + nbdkit_error ("unable to read option reply payload: %m"); + free (buffer); + return -1; + } + buffer[reply->replylen] = '\0'; + if (!payload) { + nbdkit_debug ("ignoring option reply payload"); + free (buffer); + } + else + *payload = buffer; + } + return 0; +} + +/* Attempt to negotiate structured reads, block status, and NBD_OPT_GO. + Return 1 if haggling completed, 0 if haggling failed but + NBD_OPT_EXPORT_NAME is still viable, or -1 on inability to connect. */ +static int +nbd_newstyle_haggle (struct handle *h) +{ + struct new_option opt; + uint32_t exportnamelen = htobe32 (strlen (export)); + /* For now, we make no NBD_INFO_* requests, relying on the server to + send its defaults. TODO: nbdkit should let plugins report block + sizes, at which point we should request NBD_INFO_BLOCK_SIZE and + obey any sizes set by server. */ + uint16_t nrinfos = htobe16 (0); + struct fixed_new_option_reply reply; + struct fixed_new_option_reply_info_export reply_export; + char *buffer; + uint16_t info; + + /* TODO: structured reads, block status */ + + /* Try NBD_OPT_GO */ + nbdkit_debug ("trying NBD_OPT_GO"); + opt.version = htobe64 (NEW_VERSION); + opt.option = htobe32 (NBD_OPT_GO); + opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) + + sizeof nrinfos); + if (write_full (h->fd, &opt, sizeof opt) || + write_full (h->fd, &exportnamelen, sizeof exportnamelen) || + write_full (h->fd, export, strlen (export)) || + write_full (h->fd, &nrinfos, sizeof nrinfos)) { + nbdkit_error ("unable to request NBD_OPT_GO: %m"); + return -1; + } + while (1) { + if (nbd_newstyle_recv_option_reply (h, NBD_OPT_GO, &reply, &buffer) < 0) + return -1; + switch (reply.reply) { + case NBD_REP_INFO: + /* Parse payload, but ignore all except NBD_INFO_EXPORT */ + if (reply.replylen < 2) { + nbdkit_error ("NBD_REP_INFO reply too short"); + free (buffer); + return -1; + } + memcpy (&info, buffer, sizeof info); + info = be16toh (info); + switch (info) { + case NBD_INFO_EXPORT: + if (reply.replylen != sizeof reply_export) { + nbdkit_error ("NBD_INFO_EXPORT reply wrong size"); + free (buffer); + return -1; + } + memcpy (&reply_export, buffer, sizeof reply_export); + h->size = be64toh (reply_export.exportsize); + h->flags = be16toh (reply_export.eflags); + break; + default: + nbdkit_debug ("ignoring server info %d", info); + } + free (buffer); + break; + case NBD_REP_ACK: + /* End of replies, valid if server already sent NBD_INFO_EXPORT, + which is observable since h->flags will be set */ + assert (!buffer); + if (!h->flags) { + nbdkit_error ("server omitted NBD_INFO_EXPORT reply to NBD_OPT_GO"); + return -1; + } + nbdkit_debug ("NBD_OPT_GO complete"); + return 1; + case NBD_REP_ERR_UNSUP: + /* Special case this failure to fall back to NBD_OPT_EXPORT_NAME */ + nbdkit_debug ("server lacks NBD_OPT_GO support"); + free (buffer); + return 0; + default: + /* Unexpected. Either the server sent a legitimate error or an + unexpected reply, but either way, we can't connect. */ + if (NBD_REP_IS_ERR (reply.reply)) + if (reply.replylen) + nbdkit_error ("server rejected NBD_OPT_GO with %s: %s", + name_of_nbd_rep (reply.reply), buffer); + else + nbdkit_error ("server rejected NBD_OPT_GO with %s", + name_of_nbd_rep (reply.reply)); + else + nbdkit_error ("server used unexpected reply %s to NBD_OPT_GO", + name_of_nbd_rep (reply.reply)); + free (buffer); + return -1; + } + } +} + /* Create the per-connection handle. */ static void * nbd_open (int readonly) @@ -492,6 +644,7 @@ nbd_open (int readonly) } version = be64toh (old.version); if (version == OLD_VERSION) { + nbdkit_debug ("trying oldstyle connection"); if (read_full (h->fd, (char *) &old + offsetof (struct old_handshake, exportsize), sizeof old - offsetof (struct old_handshake, exportsize))) { @@ -508,6 +661,7 @@ nbd_open (int readonly) struct new_handshake_finish finish; size_t expect; + nbdkit_debug ("trying newstyle connection"); if (read_full (h->fd, &gflags, sizeof gflags)) { nbdkit_error ("unable to read global flags: %m"); goto err; @@ -519,25 +673,36 @@ nbd_open (int readonly) goto err; } - /* For now, we don't do any option haggling, but go straight into - transmission phase */ - opt.version = htobe64 (NEW_VERSION); - opt.option = htobe32 (NBD_OPT_EXPORT_NAME); - opt.optlen = htobe32 (strlen (export)); - if (write_full (h->fd, &opt, sizeof opt) || - write_full (h->fd, export, strlen (export))) { - nbdkit_error ("unable to request export '%s': %m", export); - goto err; + /* Prefer NBD_OPT_GO if possible */ + if (gflags & NBD_FLAG_FIXED_NEWSTYLE) { + int rc = nbd_newstyle_haggle (h); + if (rc < 0) + goto err; + if (!rc) + goto export_name; } - expect = sizeof finish; - if (gflags & NBD_FLAG_NO_ZEROES) - expect -= sizeof finish.zeroes; - if (read_full (h->fd, &finish, expect)) { - nbdkit_error ("unable to read new handshake: %m"); - goto err; + else { + export_name: + /* Option haggling untried or failed, use older NBD_OPT_EXPORT_NAME */ + nbdkit_debug ("trying NBD_OPT_EXPORT_NAME"); + opt.version = htobe64 (NEW_VERSION); + opt.option = htobe32 (NBD_OPT_EXPORT_NAME); + opt.optlen = htobe32 (strlen (export)); + if (write_full (h->fd, &opt, sizeof opt) || + write_full (h->fd, export, strlen (export))) { + nbdkit_error ("unable to request export '%s': %m", export); + goto err; + } + expect = sizeof finish; + if (gflags & NBD_FLAG_NO_ZEROES) + expect -= sizeof finish.zeroes; + if (read_full (h->fd, &finish, expect)) { + nbdkit_error ("unable to read new handshake: %m"); + goto err; + } + h->size = be64toh (finish.exportsize); + h->flags = be16toh (finish.eflags); } - h->size = be64toh (finish.exportsize); - h->flags = be16toh (finish.eflags); } else { nbdkit_error ("unexpected version %#" PRIx64, version); -- 2.20.1
Eric Blake
2019-Apr-23 00:50 UTC
[Libguestfs] [nbdkit PATCH 7/7] nbd: Implement structured replies
Time to enhance the nbd plugin to request structured replies from the server. For now, deal only with structured reads. The server can now return sparse reads, even though we need nbdkit version 3 before we can in turn return sparse reads back to the client. In general, we have to assume the server is malicious, and so we must sanity check that it sends replies we expect. Thus, we have a choice between either implementing bookkeeping to ensure that the server sends exactly as many non-overlapping chunks as needed to cover the entire read request, or else ensuring that even when the server cheats, we do not leak uninitialized memory back to our client. I chose for simplicity, with the result that I end up calling memset() more frequently than necessary for a compliant server, rather than worrying about bookkeeping to detect a non-compliant server that is attempting to cause an information leak. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbd.c | 232 ++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 212 insertions(+), 20 deletions(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 313bf59..f729051 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -127,7 +127,9 @@ struct transaction { int fds[2]; } u; void *buf; + uint64_t offset; uint32_t count; + uint32_t err; struct transaction *next; }; @@ -137,6 +139,7 @@ struct handle { int fd; int flags; int64_t size; + bool structured; pthread_t reader; /* Prevents concurrent threads from interleaving writes to server */ @@ -230,9 +233,10 @@ nbd_mark_dead (struct handle *h) return -1; } -/* Find and remove the transaction corresponding to cookie from the list. */ +/* Find and possibly remove the transaction corresponding to cookie + from the list. */ static struct transaction * -find_trans_by_cookie (struct handle *h, uint64_t cookie) +find_trans_by_cookie (struct handle *h, uint64_t cookie, bool remove) { struct transaction **ptr; struct transaction *trans; @@ -244,7 +248,7 @@ find_trans_by_cookie (struct handle *h, uint64_t cookie) break; ptr = &trans->next; } - if (trans) + if (trans && remove) *ptr = trans->next; nbd_unlock (h); return trans; @@ -303,6 +307,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, } trans->buf = rep_buf; trans->count = rep_buf ? count : 0; + trans->offset = offset; nbd_lock (h); if (h->dead) { nbd_unlock (h); @@ -315,7 +320,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, nbd_unlock (h); if (nbd_request_raw (h, flags, type, offset, count, cookie, req_buf) == 0) return fd; - trans = find_trans_by_cookie (h, cookie); + trans = find_trans_by_cookie (h, cookie, true); err: err = errno; @@ -340,35 +345,196 @@ nbd_request (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset, /* Read a reply, and look up the fd corresponding to the transaction. Return the server's non-negative answer (converted to local errno - value) on success, or -1 on read failure. */ + value) on success, or -1 on read failure. If structured replies + were negotiated, fd is set to -1 if there are still more replies + expected. */ static int nbd_reply_raw (struct handle *h, int *fd) { - struct simple_reply rep; + union { + struct simple_reply simple; + struct structured_reply structured; + } rep; struct transaction *trans; - void *buf; + void *buf = NULL; uint32_t count; + int error = NBD_SUCCESS; + bool more = false; + uint64_t offset = 0; /* absolute offset of structured read chunk from buf */ + uint32_t len = 0; /* 0 except for structured reads */ + bool zero = false; /* Whether to read or memset on structured read */ *fd = -1; - if (read_full (h->fd, &rep, sizeof rep) < 0) + /* magic and handle overlap between simple and structured replies */ + if (read_full (h->fd, &rep, sizeof rep.simple)) return nbd_mark_dead (h); - if (be32toh (rep.magic) != NBD_SIMPLE_REPLY_MAGIC) + switch (be32toh (rep.simple.magic)) { + case NBD_SIMPLE_REPLY_MAGIC: + nbdkit_debug ("received simple reply for cookie %#" PRIx64 ", status %s", + rep.simple.handle, + name_of_nbd_error(be32toh (rep.simple.error))); + error = be32toh (rep.simple.error); + break; + case NBD_STRUCTURED_REPLY_MAGIC: + if (!h->structured) { + nbdkit_error ("structured response without negotiation"); + return nbd_mark_dead (h); + } + if (read_full (h->fd, sizeof rep.simple + (char *) &rep, + sizeof rep - sizeof rep.simple)) + return nbd_mark_dead (h); + rep.structured.flags = be16toh (rep.structured.flags); + rep.structured.type = be16toh (rep.structured.type); + rep.structured.length = be32toh (rep.structured.length); + nbdkit_debug ("received structured reply %s for cookie %#" PRIx64 + ", payload length %" PRId32, + name_of_nbd_reply_type(rep.structured.type), + rep.structured.handle, rep.structured.length); + if (rep.structured.length > 64 * 1024 * 1024) { + nbdkit_error ("structured reply length is suspiciously large: %" PRId32, + rep.structured.length); + return nbd_mark_dead (h); + } + if (rep.structured.length) { + /* Special case for OFFSET_DATA in order to read tail of chunk + directly into final buffer later on */ + len = (rep.structured.type == NBD_REPLY_TYPE_OFFSET_DATA && + rep.structured.length > sizeof offset) ? sizeof offset : + rep.structured.length; + buf = malloc (len); + if (!buf) { + nbdkit_error ("reading structured reply payload: %m"); + return nbd_mark_dead (h); + } + if (read_full (h->fd, buf, len)) { + free (buf); + return nbd_mark_dead (h); + } + len = 0; + } + more = !(rep.structured.flags & NBD_REPLY_FLAG_DONE); + switch (rep.structured.type) { + case NBD_REPLY_TYPE_NONE: + if (rep.structured.length) { + nbdkit_error ("NBD_REPLY_TYPE_NONE with invalid payload"); + free (buf); + return nbd_mark_dead (h); + } + if (more) { + nbdkit_error ("NBD_REPLY_TYPE_NONE without done flag"); + return nbd_mark_dead (h); + } + break; + case NBD_REPLY_TYPE_OFFSET_DATA: + if (rep.structured.length <= sizeof offset) { + nbdkit_error ("structured reply OFFSET_DATA too small"); + free (buf); + return nbd_mark_dead (h); + } + memcpy (&offset, buf, sizeof offset); + offset = be64toh (offset); + len = rep.structured.length - sizeof offset; + break; + case NBD_REPLY_TYPE_OFFSET_HOLE: + if (rep.structured.length != sizeof offset + sizeof len) { + nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); + free (buf); + return nbd_mark_dead (h); + } + memcpy (&offset, buf, sizeof offset); + offset = be64toh (offset); + memcpy (&len, buf, sizeof len); + len = be32toh (len); + if (!len) { + nbdkit_error ("structured reply OFFSET_HOLE length incorrect"); + free (buf); + return nbd_mark_dead (h); + } + zero = true; + break; + default: + if (NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) { + uint16_t errlen; + + if (rep.structured.length < sizeof error + sizeof errlen) { + nbdkit_error ("structured reply error size incorrect"); + free (buf); + return nbd_mark_dead (h); + } + memcpy (&errlen, (char *) buf + sizeof error, sizeof errlen); + errlen = be16toh (errlen); + if (errlen > rep.structured.length - sizeof error - sizeof errlen) { + nbdkit_error ("structured reply error message size incorrect"); + free (buf); + return nbd_mark_dead (h); + } + memcpy (&error, buf, sizeof error); + error = be32toh (error); + if (errlen) + nbdkit_debug ("received structured error %s with message: %.*s", + name_of_nbd_error (error), errlen, + (char *) buf + sizeof error + sizeof errlen); + else + nbdkit_debug ("received structured error %s without message", + name_of_nbd_error (error)); + free (buf); + } + else { + nbdkit_error ("received unexpected structured reply %s", + name_of_nbd_reply_type(rep.structured.type)); + free (buf); + return nbd_mark_dead (h); + } + } + break; + default: + nbdkit_error ("received unexpected magic in reply: %#" PRIx32, + rep.simple.magic); return nbd_mark_dead (h); - nbdkit_debug ("received reply for cookie %#" PRIx64 ", status %s", - rep.handle, name_of_nbd_error(be32toh (rep.error))); - trans = find_trans_by_cookie (h, rep.handle); + } + + trans = find_trans_by_cookie (h, rep.simple.handle, !more); if (!trans) { - nbdkit_error ("reply with unexpected cookie %#" PRIx64, rep.handle); + nbdkit_error ("reply with unexpected cookie %#" PRIx64, rep.simple.handle); return nbd_mark_dead (h); } - *fd = trans->u.fds[1]; + if (!more) + *fd = trans->u.fds[1]; + else if (error && !trans->err) + trans->err = error; buf = trans->buf; count = trans->count; + if (buf && h->structured && + be32toh (rep.simple.magic) == NBD_SIMPLE_REPLY_MAGIC) { + nbdkit_error ("simple read reply when structured was expected"); + return nbd_mark_dead (h); + } + if (len) { + if (!buf) { + nbdkit_error ("structured read response to a non-read command"); + return nbd_mark_dead (h); + } + if (offset < trans->offset || offset > INT64_MAX || + offset + len > trans->offset + count) { + nbdkit_error ("structured read reply with unexpected offset/length"); + return nbd_mark_dead (h); + } + buf = (char *) buf + offset - trans->offset; + if (zero) { + /* We do not check whether the server mistakenly sends + non-compliant overlapping chunks, thus we must re-zero holes + rather than relying on our pre-initialization to zero */ + memset (buf, 0, len); + buf = NULL; + } + else + count = len; + } free (trans); - switch (be32toh (rep.error)) { + switch (error) { case NBD_SUCCESS: - if (buf && read_full (h->fd, buf, count) < 0) + if (buf && read_full (h->fd, buf, count)) return nbd_mark_dead (h); return 0; case NBD_EPERM: @@ -378,8 +544,7 @@ nbd_reply_raw (struct handle *h, int *fd) case NBD_ENOMEM: return ENOMEM; default: - nbdkit_debug ("unexpected error %d, squashing to EINVAL", - be32toh (rep.error)); + nbdkit_debug ("unexpected error %d, squashing to EINVAL", error); /* fallthrough */ case NBD_EINVAL: return EINVAL; @@ -405,7 +570,9 @@ nbd_reader (void *handle) r = nbd_reply_raw (h, &fd); if (r >= 0) { - if (write (fd, &r, sizeof r) != sizeof r) { + if (fd < 0) + nbdkit_debug ("partial reply handled, waiting for final reply"); + else if (write (fd, &r, sizeof r) != sizeof r) { nbdkit_error ("failed to write pipe: %m"); abort (); } @@ -527,7 +694,25 @@ nbd_newstyle_haggle (struct handle *h) char *buffer; uint16_t info; - /* TODO: structured reads, block status */ + nbdkit_debug ("trying NBD_OPT_STRUCTURED_REPLY"); + opt.version = htobe64 (NEW_VERSION); + opt.option = htobe32 (NBD_OPT_STRUCTURED_REPLY); + opt.optlen = htobe32 (0); + if (write_full (h->fd, &opt, sizeof opt)) { + nbdkit_error ("unable to request NBD_OPT_STRUCTURED_REPLY: %m"); + return -1; + } + if (nbd_newstyle_recv_option_reply (h, NBD_OPT_STRUCTURED_REPLY, &reply, + NULL) < 0) + return -1; + if (reply.reply == NBD_REP_ACK) { + nbdkit_debug ("structured replies enabled"); + h->structured = true; + /* TODO: block status */ + } + else { + nbdkit_debug ("structured replies disabled"); + } /* Try NBD_OPT_GO */ nbdkit_debug ("trying NBD_OPT_GO"); @@ -828,6 +1013,13 @@ nbd_pread (void *handle, void *buf, uint32_t count, uint64_t offset, int c; assert (!flags); + /* We do not check whether structured read chunks cover the entire + buffer; as such, to avoid leaking uninit memory back to the + client, we must pre-zero the buffer. TODO: should nbdkit + guarantee a clean buffer to .pread, perhaps based on a flag set + when we register our plugin? */ + if (h->structured) + memset (buf, 0, count); c = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf); return c < 0 ? c : nbd_reply (h, c); } -- 2.20.1
Eric Blake
2019-Apr-23 02:17 UTC
Re: [Libguestfs] [nbdkit PATCH 7/7] nbd: Implement structured replies
On 4/22/19 7:50 PM, Eric Blake wrote:> Time to enhance the nbd plugin to request structured replies from the > server. For now, deal only with structured reads. The server can now > return sparse reads, even though we need nbdkit version 3 before we > can in turn return sparse reads back to the client. > > In general, we have to assume the server is malicious, and so we must > sanity check that it sends replies we expect. Thus, we have a choice > between either implementing bookkeeping to ensure that the server > sends exactly as many non-overlapping chunks as needed to cover the > entire read request, or else ensuring that even when the server > cheats, we do not leak uninitialized memory back to our client. I > chose for simplicity, with the result that I end up calling memset() > more frequently than necessary for a compliant server, rather than > worrying about bookkeeping to detect a non-compliant server that is > attempting to cause an information leak. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > plugins/nbd/nbd.c | 232 ++++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 212 insertions(+), 20 deletions(-) >> + case NBD_STRUCTURED_REPLY_MAGIC: > + if (!h->structured) { > + nbdkit_error ("structured response without negotiation"); > + return nbd_mark_dead (h); > + } > + if (read_full (h->fd, sizeof rep.simple + (char *) &rep, > + sizeof rep - sizeof rep.simple)) > + return nbd_mark_dead (h); > + rep.structured.flags = be16toh (rep.structured.flags); > + rep.structured.type = be16toh (rep.structured.type); > + rep.structured.length = be32toh (rep.structured.length); > + nbdkit_debug ("received structured reply %s for cookie %#" PRIx64 > + ", payload length %" PRId32, > + name_of_nbd_reply_type(rep.structured.type), > + rep.structured.handle, rep.structured.length); > + if (rep.structured.length > 64 * 1024 * 1024) { > + nbdkit_error ("structured reply length is suspiciously large: %" PRId32, > + rep.structured.length); > + return nbd_mark_dead (h); > + } > + if (rep.structured.length) { > + /* Special case for OFFSET_DATA in order to read tail of chunk > + directly into final buffer later on */ > + len = (rep.structured.type == NBD_REPLY_TYPE_OFFSET_DATA && > + rep.structured.length > sizeof offset) ? sizeof offset : > + rep.structured.length; > + buf = malloc (len);...> + case NBD_REPLY_TYPE_OFFSET_DATA: > + if (rep.structured.length <= sizeof offset) { > + nbdkit_error ("structured reply OFFSET_DATA too small"); > + free (buf); > + return nbd_mark_dead (h); > + } > + memcpy (&offset, buf, sizeof offset); > + offset = be64toh (offset); > + len = rep.structured.length - sizeof offset; > + break;leaks buf> + case NBD_REPLY_TYPE_OFFSET_HOLE: > + if (rep.structured.length != sizeof offset + sizeof len) { > + nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); > + free (buf); > + return nbd_mark_dead (h); > + } > + memcpy (&offset, buf, sizeof offset); > + offset = be64toh (offset); > + memcpy (&len, buf, sizeof len); > + len = be32toh (len); > + if (!len) { > + nbdkit_error ("structured reply OFFSET_HOLE length incorrect"); > + free (buf); > + return nbd_mark_dead (h); > + } > + zero = true; > + break;likewise -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2019-Apr-23 07:36 UTC
Re: [Libguestfs] [nbdkit PATCH 3/7] RFC: protocol: Only send EOVERFLOW when valid
On Mon, Apr 22, 2019 at 07:50:22PM -0500, Eric Blake wrote:> Previously, we were squashing EOVERFLOW into EINVAL; continue to do so > at points in the protocol where the client may not be expecting > EOVERFLOW. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > server/protocol.c | 16 ++++++++++------ > 1 file changed, 10 insertions(+), 6 deletions(-) > > diff --git a/server/protocol.c b/server/protocol.c > index a52bb56..0a9f73c 100644 > --- a/server/protocol.c > +++ b/server/protocol.c > @@ -326,7 +326,7 @@ skip_over_write_buffer (int sock, size_t count) > > /* Convert a system errno to an NBD_E* error code. */ > static int > -nbd_errno (int error) > +nbd_errno (int error, bool flag_df) > { > switch (error) { > case 0: > @@ -349,7 +349,9 @@ nbd_errno (int error) > return NBD_ESHUTDOWN; > #endif > case EOVERFLOW: > - return NBD_EOVERFLOW; > + if (flag_df) > + return NBD_EOVERFLOW; > + /* fallthrough */ > case EINVAL: > default: > return NBD_EINVAL; > @@ -368,7 +370,7 @@ send_simple_reply (struct connection *conn, > > reply.magic = htobe32 (NBD_SIMPLE_REPLY_MAGIC); > reply.handle = handle; > - reply.error = htobe32 (nbd_errno (error)); > + reply.error = htobe32 (nbd_errno (error, false)); > > r = conn->send (conn, &reply, sizeof reply); > if (r == -1) { > @@ -573,7 +575,8 @@ send_structured_reply_block_status (struct connection *conn, > > static int > send_structured_reply_error (struct connection *conn, > - uint64_t handle, uint16_t cmd, uint32_t error) > + uint64_t handle, uint16_t cmd, uint16_t flags, > + uint32_t error) > { > ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&conn->write_lock); > struct structured_reply reply; > @@ -593,7 +596,7 @@ send_structured_reply_error (struct connection *conn, > } > > /* Send the error. */ > - error_data.error = htobe32 (nbd_errno (error)); > + error_data.error = htobe32 (nbd_errno (error, flags & NBD_CMD_FLAG_DF)); > error_data.len = htobe16 (0); > r = conn->send (conn, &error_data, sizeof error_data); > if (r == -1) { > @@ -737,7 +740,8 @@ protocol_recv_request_send_reply (struct connection *conn) > extents); > } > else > - return send_structured_reply_error (conn, request.handle, cmd, error); > + return send_structured_reply_error (conn, request.handle, cmd, flags, > + error); > } > else > return send_simple_reply (conn, request.handle, cmd, buf, count, error); > -- > 2.20.1The protocol spec is unclear on whether EOVERFLOW can be returned in cases other than the DF flag being set. Whether we include this patch or not seems to hinge on the interpretation of the protocol spec which I'm not really in a position to make. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Richard W.M. Jones
2019-Apr-23 07:42 UTC
Re: [Libguestfs] [nbdkit PATCH 6/7] nbd: Implement NBD_OPT_GO client request
So patches 1-6 are fine. I'm not in a position to say whether patch 3 is required or not, but I have no objection to it either way. Patch 7 seems to have a leak, but apart from that is also fine. About memset-ing the buffer before passing it to plugin .pread methods, I'll send you a patch for that in a minute. Thanks, Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Add support for calling NBD_CMD_BLOCK_STATUS on the remote server when it is supported. I could have parsed that the server's id response to NBD_OPT_SET_META_CONTEXT is the same as what later appears in NBD_REPLY_TYPE_BLOCK_STATUS, but didn't think it worth the effort as long as we expect exactly one meta context. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbd.c | 135 +++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 129 insertions(+), 6 deletions(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 8eb8a31..9986d6c 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -130,6 +130,7 @@ struct transaction { uint64_t offset; uint32_t count; uint32_t err; + struct nbdkit_extents *extents; struct transaction *next; }; @@ -140,6 +141,7 @@ struct handle { int flags; int64_t size; bool structured; + bool extents; pthread_t reader; /* Prevents concurrent threads from interleaving writes to server */ @@ -286,7 +288,7 @@ nbd_request_raw (struct handle *h, uint16_t flags, uint16_t type, static int nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset, uint32_t count, const void *req_buf, - void *rep_buf) + void *rep_buf, struct nbdkit_extents *extents) { int err; struct transaction *trans; @@ -308,6 +310,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, trans->buf = rep_buf; trans->count = rep_buf ? count : 0; trans->offset = offset; + trans->extents = extents; nbd_lock (h); if (h->dead) { nbd_unlock (h); @@ -340,7 +343,7 @@ static int nbd_request (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset, uint32_t count) { - return nbd_request_full (h, flags, type, offset, count, NULL, NULL); + return nbd_request_full (h, flags, type, offset, count, NULL, NULL, NULL); } /* Read a reply, and look up the fd corresponding to the transaction. @@ -358,6 +361,12 @@ nbd_reply_raw (struct handle *h, int *fd) struct transaction *trans; void *buf = NULL; uint32_t count; + uint32_t id; + struct { + uint32_t length; + uint32_t flags; + } *extents = NULL; + size_t nextents = 0; int error = NBD_SUCCESS; bool more = false; uint64_t offset = 0; /* absolute offset of structured read chunk from buf */ @@ -452,6 +461,34 @@ nbd_reply_raw (struct handle *h, int *fd) } zero = true; break; + case NBD_REPLY_TYPE_BLOCK_STATUS: + if (!h->extents) { + nbdkit_error ("block status response without negotiation"); + free (buf); + return nbd_mark_dead (h); + } + if (rep.structured.length < sizeof *extents || + rep.structured.length % sizeof *extents != sizeof id) { + nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); + free (buf); + return nbd_mark_dead (h); + } + nextents = rep.structured.length / sizeof *extents; + extents = malloc (rep.structured.length - sizeof id); + if (!extents) { + nbdkit_error ("malloc: %m"); + return nbd_mark_dead (h); + } + memcpy (&id, buf, sizeof id); + id = be32toh (id); + nbdkit_debug ("parsing %zu extents for context id %" PRId32, + nextents, id); + memcpy (extents, (char *) buf + sizeof id, sizeof *extents * nextents); + for (size_t i = 0; i < nextents; i++) { + extents[i].length = be32toh (extents[i].length); + extents[i].flags = be32toh (extents[i].flags); + } + break; default: if (NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) { uint16_t errlen; @@ -497,9 +534,29 @@ nbd_reply_raw (struct handle *h, int *fd) trans = find_trans_by_cookie (h, rep.simple.handle, !more); if (!trans) { nbdkit_error ("reply with unexpected cookie %#" PRIx64, rep.simple.handle); + free (extents); return nbd_mark_dead (h); } + if (nextents) { + if (!trans->extents) { + nbdkit_error ("block status response to a non-status command"); + free (extents); + return nbd_mark_dead (h); + } + offset = trans->offset; + for (size_t i = 0; i < nextents; i++) { + /* We rely on the fact that NBDKIT_EXTENT_* match NBD_STATE_* */ + if (nbdkit_add_extent (trans->extents, offset, extents[i].length, + extents[i].flags) == -1) { + error = EINVAL; + break; + } + offset += extents[i].length; + } + free (extents); + } + if (!more) { *fd = trans->u.fds[1]; if (!error) @@ -686,8 +743,11 @@ nbd_newstyle_recv_option_reply (struct handle *h, uint32_t option, static int nbd_newstyle_haggle (struct handle *h) { + const char *const query = "base:allocation"; struct new_option opt; uint32_t exportnamelen = htobe32 (strlen (export)); + uint32_t nrqueries = htobe32 (1); + uint32_t querylen = htobe32 (strlen (query)); /* For now, we make no NBD_INFO_* requests, relying on the server to send its defaults. TODO: nbdkit should let plugins report block sizes, at which point we should request NBD_INFO_BLOCK_SIZE and @@ -710,9 +770,47 @@ nbd_newstyle_haggle (struct handle *h) NULL) < 0) return -1; if (reply.reply == NBD_REP_ACK) { - nbdkit_debug ("structured replies enabled"); + nbdkit_debug ("structured replies enabled, trying NBD_OPT_SET_META_CONTEXT"); h->structured = true; - /* TODO: block status */ + + opt.version = htobe64 (NEW_VERSION); + opt.option = htobe32 (NBD_OPT_SET_META_CONTEXT); + opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) + + sizeof nrqueries + sizeof querylen + strlen (query)); + if (write_full (h->fd, &opt, sizeof opt) || + write_full (h->fd, &exportnamelen, sizeof exportnamelen) || + write_full (h->fd, export, strlen (export)) || + write_full (h->fd, &nrqueries, sizeof nrqueries) || + write_full (h->fd, &querylen, sizeof querylen) || + write_full (h->fd, query, strlen (query))) { + nbdkit_error ("unable to request NBD_OPT_SET_META_CONTEXT: %m"); + return -1; + } + if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, + NULL) < 0) + return -1; + if (reply.reply == NBD_REP_META_CONTEXT) { + /* Cheat: we asked for exactly one context. We could double + check that the server is replying with exactly the + "base:allocation" context, and then remember the id it tells + us to later confirm that responses to NBD_CMD_BLOCK_STATUS + match up; but in the absence of multiple contexts, it's + easier to just assume the server is compliant, and will reuse + the same id, without bothering to check further. */ + nbdkit_debug ("extents enabled"); + h->extents = true; + if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, + NULL) < 0) + return -1; + } + if (reply.reply != NBD_REP_ACK) { + if (h->extents) { + nbdkit_error ("unexpected response to set meta context"); + return -1; + } + nbdkit_debug ("ignoring meta context response %s", + name_of_nbd_rep (reply.reply)); + } } else { nbdkit_debug ("structured replies disabled"); @@ -1008,6 +1106,14 @@ nbd_can_multi_conn (void *handle) return h->flags & NBD_FLAG_CAN_MULTI_CONN; } +static int +nbd_can_extents (void *handle) +{ + struct handle *h = handle; + + return h->extents; +} + /* Read data from the file. */ static int nbd_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -1024,7 +1130,7 @@ nbd_pread (void *handle, void *buf, uint32_t count, uint64_t offset, when we register our plugin? */ if (h->structured) memset (buf, 0, count); - c = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf); + c = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf, NULL); return c < 0 ? c : nbd_reply (h, c); } @@ -1038,7 +1144,7 @@ nbd_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, assert (!(flags & ~NBDKIT_FLAG_FUA)); c = nbd_request_full (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, - NBD_CMD_WRITE, offset, count, buf, NULL); + NBD_CMD_WRITE, offset, count, buf, NULL, NULL); return c < 0 ? c : nbd_reply (h, c); } @@ -1086,6 +1192,21 @@ nbd_flush (void *handle, uint32_t flags) return c < 0 ? c : nbd_reply (h, c); } +/* Read extents of the file. */ +static int +nbd_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) +{ + struct handle *h = handle; + int c; + + assert (!(flags & ~NBDKIT_FLAG_REQ_ONE) && h->extents); + c = nbd_request_full (h, flags & NBDKIT_FLAG_REQ_ONE ? NBD_CMD_FLAG_REQ_ONE : 0, + NBD_CMD_BLOCK_STATUS, offset, count, NULL, NULL, + extents); + return c < 0 ? c : nbd_reply (h, c); +} + static struct nbdkit_plugin plugin = { .name = "nbd", .longname = "nbdkit nbd plugin", @@ -1104,11 +1225,13 @@ static struct nbdkit_plugin plugin = { .can_zero = nbd_can_zero, .can_fua = nbd_can_fua, .can_multi_conn = nbd_can_multi_conn, + .can_extents = nbd_can_extents, .pread = nbd_pread, .pwrite = nbd_pwrite, .zero = nbd_zero, .flush = nbd_flush, .trim = nbd_trim, + .extents = nbd_extents, .errno_is_preserved = 1, }; -- 2.20.1
Eric Blake
2019-Apr-23 20:25 UTC
Re: [Libguestfs] [nbdkit PATCH 7/7] nbd: Implement structured replies
On 4/22/19 7:50 PM, Eric Blake wrote:> Time to enhance the nbd plugin to request structured replies from the > server. For now, deal only with structured reads. The server can now > return sparse reads, even though we need nbdkit version 3 before we > can in turn return sparse reads back to the client. >> + case NBD_STRUCTURED_REPLY_MAGIC:> + more = !(rep.structured.flags & NBD_REPLY_FLAG_DONE);> + trans = find_trans_by_cookie (h, rep.simple.handle, !more); > if (!trans) { > - nbdkit_error ("reply with unexpected cookie %#" PRIx64, rep.handle); > + nbdkit_error ("reply with unexpected cookie %#" PRIx64, rep.simple.handle); > return nbd_mark_dead (h); > } > > - *fd = trans->u.fds[1]; > + if (!more) > + *fd = trans->u.fds[1]; > + else if (error && !trans->err) > + trans->err = error; > buf = trans->buf; > count = trans->count; > + if (buf && h->structured &&> + } > free (trans);This sets up a use-after-free if the server replies with more than one chunk. The free(trans) call must happen only if !more. Looks like I'll be sending a v2 of the tail of this series on top of my work to utilize cleanup.h (I've applied the obvious bug fixes in 1, 4, and 5, and am waiting for the NBD list to respond to my question about a possible protocol spec change before deciding to push 2 alone or squashed with 3). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org